Future Proof Your Quality: Validating a Model Part 1 – Relevant Statistics and Types of Validation Testing

Hunter Weber

August 2, 2023 at 7:55 pm | Updated September 20, 2023 at 3:50 pm | 29 min read

The fourth installment in our webinar series on spectroscopy in agriculture is here!

Join us for a comprehensive 6-part series on internal quality assessment, spectroscopy, chemometrics, and more in commercial agriculture. In Part 4, Director of Applied Science Galen George will delve into relevant statistics and types of validation testing. This informative session is suitable for both beginners and experts in the field.

In Part 4, we explore the following topics:
– Model Performance Metrics
– Calibration vs. Validation Data Sets
– Validation Techniques

A Live Q&A was hosted after the webinar.

Full Transcript:

Welcome, everyone, and thank you for attending today's webinar. This is the next part in our long-running series on how to build models, validate models, and handle the challenges associated with them, a series we've been putting on to help educate people in the agricultural industry, specifically those using our F750 and F751 devices.

Today's topic is validation. We're talking about validating a model: I'll cover the relevant statistics and the types of validation testing, and I'll show you some example data sets so you can get an idea of what I mean by the different terms I'm going to introduce.

Before we get started, I want to introduce myself and touch on a little light housekeeping. My name is Galen, and I'm the Director of Applied Science here at Felix Instruments. I've been at the company for a little over four years. My background: I have a bachelor's degree in biochemistry and a master's degree in food science, I'm an IFT-Certified Food Scientist, and most of my experience is in quality and safety assessment in the food, agriculture, and cannabis industries.
On the housekeeping side: if you experience technical difficulties of any kind, if the video cuts out, the presentation disappears, or my audio drops, please use the chat feature so we can alert our webinar host, Susie Truitt, our distributor manager, and she can let me know that I'm no longer on video or my audio isn't working. If you have a question about the content of the webinar, please use the Q&A function in Zoom, because that's what I'll be monitoring at the end of the webinar when I start answering questions. I won't be looking at the chat, so please use the Q&A for anything directly related to the content I'll be presenting today.
Let's jump into things. If you were on our last webinar, it was the second part of our model-building series, where I went through some of the pre-processing techniques and the actual chemometric model-building techniques commonly used to build different types of models.

So far we've gone through the whole process of introducing what model building is and how someone would go about it. It starts with the sampling process, where we talked about the importance of having a good, representative sample set in your training data. We then talked about spectral collection and analytical testing, and how important it is to have consistency in both of those methodologies. In the last webinar we covered multivariate data analysis, the chemometrics part, which is a complex topic but doesn't have to be. We talked about artificial neural networks and the current state of how we're using them, and then a little about the future: convolutional neural networks and other advanced deep-learning networks that can potentially be used for even better modeling.

The last step of this whole process is model deployment, but between steps three and four there's a step we have to go through before we can be confident deploying a model and actually using it in practice. That process is validation.
So today we're going to give an overview of the validation process. I wouldn't think of it as a strictly linear process; there is a linear aspect to it, but it's also cyclical. Validation is conducted any time a model is changed in any way. Even if you just change some hyperparameters, or fold one small new data set into your training data, you still have to go through this validation process to make sure the model is working the way you want and expect it to.

The validation process typically follows this pattern, and it's what I'd recommend everyone do, because skipping any one of these steps can lead to problems with your model down the line, when you go to implement it in real-world practice. Validation always starts with an internal validation, which I'll expand on later. Once you've done the internal validation, you evaluate your model performance based on it. Following the internal validation, you then do an external, or independent, validation. The external validation is arguably the most important step in ensuring that your device will work well in practice: internal validation helps you ensure the model works for the data you used to build it, but the independent validation shows you how the model performs in a real-world scenario. Once you've done that external validation, you again evaluate your model performance, and then you have to decide whether that model is ready to be deployed.
Let's start with internal validation.

There are many different types of internal validation testing. I'm going to talk about the two that are probably the most commonly used for NIR spectroscopy models, specifically for agricultural purposes.

The first is the most basic form of validation testing: holdout validation, also known as train/test split validation. This testing has a couple of pros and some cons, but it should always be the first step. I recommend everyone do this automatically when building models, regardless of any other internal validation testing they want to do afterward.

The pros are that it's very straightforward and easy to implement, it's computationally very fast, so it won't take a lot of time, and the results are easy to interpret: you're simply splitting your data set into two data sets that share the same statistics, so you can compare them to one another and see how your model is performing using the exact same metrics.

The main con is that it is very dependent on how you split the data. Just as with sampling and spectral collection during model building, you have to be careful that your test data set is as similar to, and as all-encompassing of the variables as, your training data set. If you only test a small subset that isn't representative of the entire potential data set, you'll see a lot of variance in your evaluation; that high variance comes from using an unrepresentative test set. It can also lead to biased performance estimation: you might think your model is predicting very well when in fact the test data set was unrepresentative, and when you go to implement the model in practice you'll find the results aren't nearly as good as you thought they were.
So how does holdout validation work? I've been using terms like training and testing data set: you simply take the entire data set you collected for your model, designate a portion of it as the data you'll train and build the model with, and reserve a (usually smaller) portion for testing, to actually evaluate the model with. By doing this, you're not relying on your entire training data set as the metric for how well the model is performing, because that will always give you an overestimate.

As for the proportions, there's a pretty common convention, and not just in agriculture; across a lot of industries it's common to split your data set anywhere from 70/30 up to 90/10. The most common is the 80/20 split; 75/25 is also very common. As you can see from the graph on the bottom right, you'll generally see better performance on your test sets when you split your data that way. If you split 50/50, you're not putting enough data into the model to create something robust enough to test the other 50% against. If you split 95/5, so 95% of your data in training and only 5% in test, you run into the problem I was just discussing: that 5% of your data won't be representative enough to give you a good idea of how the model is performing. So the 80/20 split is probably the most common, with 75/25 a close second. If you're thinking about building your own models, I would highly recommend this be the very first thing you do: set aside 20-25% of your data for testing only, not to be used to build the model.
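To make this concrete, here is a minimal sketch of an 80/20 holdout split in Python with scikit-learn. The arrays are placeholders, and the PLS regression stands in for whatever chemometric model you are actually building; this is not our App Builder pipeline, just the idea:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

# Placeholder data standing in for real spectra and reference values
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 300))    # 200 samples x 300 wavelengths
y = rng.uniform(5, 15, size=200)   # reference values, e.g. degrees Brix

# Reserve 20% of the data for testing only (the 80/20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = PLSRegression(n_components=10).fit(X_train, y_train)

# The test set is never shown to the model during fitting
rmse_c = np.sqrt(np.mean((model.predict(X_train).ravel() - y_train) ** 2))
rmse_p = np.sqrt(np.mean((model.predict(X_test).ravel() - y_test) ** 2))
print(f"RMSEC (train): {rmse_c:.3f}   RMSEP (test): {rmse_p:.3f}")
```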
The holdout validation is your standard first pass: you do it as you build the model, and you use it to guide how you build the model. We've talked about model optimization; you're using this testing data set to help determine the optimal hyperparameters and the optimal wavelength range for your model, because you want to make sure your test set is performing its best, and ideally you want your training and test data both performing at a pretty equal level.

So that's how holdout validation works. There's another internal validation testing scheme I want to talk about that is a little more involved, but it's also very common and a very useful way of performing validation. The holdout validation is a good first-pass test and is great for model optimization, but it's not necessarily the most representative or robust way of doing internal validation testing.
If you want a good, robust method for internal validation testing, you would use something like the k-fold cross-validation method. With k-fold cross-validation, you're essentially performing that same train/test split validation, but on basically every potential grouping of your entire data set, iteratively testing each new set of test data and combining all of the resulting RMSEs and other statistics to work out exactly how well the model performs across all the data in your data set. This can be a complex process, but it doesn't have to be: you're just splitting the data into train and test, running the same model against one section of test data, then on the next iteration using the next section of test data, then the next, all the way through all possible iterations.

Say, for example, you have a hundred data points in your model. Probably not the most robust model, but for the purposes of this discussion: you could do a 100-fold cross-validation where a single data point is your only test data, and you test every single data point in a new iteration. The held-out point changes from your very first data point all the way to the very last, so by the end you've run a hundred different validations and tested each individual data point. That's one way of doing k-fold cross-validation. Another way is to group the data: for that 100-point data set, we could group it into groups of ten, so we'd have 10 iterations, each testing a new group of 10 data points at the same time. There's a whole range of other ways to run this kind of validation where you combine different groups of data, still working through all hundred points but moving that set of 10 the whole way through. In essence, you're doing the same thing as a holdout validation, just across a whole series of iterations, so that you see how every single data point in your data set predicts when it is not included in the training data, only in the testing data.

This is obviously a much more robust way of doing internal validation testing. You'll get a much better idea of how the model is performing, you're using the data to its fullest by checking every single data point throughout the entire data set, and you're reducing the dependency on any one split, the chance, as I mentioned for the holdout, of the test data not being representative of your entire data set.
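As a hedged sketch rather than a recipe, here is what 10-fold and leave-one-out (the "100-fold" case above) cross-validation might look like in Python on that hypothetical 100-point data set, again with placeholder data and a stand-in PLS model:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))    # placeholder spectra: 100 samples
y = rng.uniform(5, 15, size=100)   # placeholder reference values

def rmse_cv(splitter, X, y):
    """Pool the out-of-fold predictions, then compute a single RMSECV."""
    preds = np.empty_like(y)
    for train_idx, test_idx in splitter.split(X):
        model = PLSRegression(n_components=10).fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx]).ravel()
    return np.sqrt(np.mean((preds - y) ** 2))

# Ten folds of ten samples each
print("RMSECV, 10-fold:", rmse_cv(KFold(n_splits=10, shuffle=True, random_state=42), X, y))
# One held-out point per iteration: the "100-fold" case from the talk
print("RMSECV, leave-one-out:", rmse_cv(LeaveOneOut(), X, y))
```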
That being said, there are cons, the biggest being computational cost. As you can probably imagine, a simple holdout validation is essentially one iteration of this k-fold cross-validation, so it's really quick; if you're doing that a hundred, a thousand, or ten thousand different times, you're multiplying the time the validation takes. At a certain point it turns into: you start the validation with a script written, and you leave it running on your computer for a day or two before it gives you any results. There's also an aspect where you might be biasing yourself toward the type of model you're selecting. That con isn't as serious as the ones from a simple holdout validation, and model selection bias isn't all that common to see, but it is something that's possible with k-fold cross-validation.
Those are the two types of internal validation testing that we see used most often, especially when we're building NIR spectroscopy models for the ag industry. As I mentioned, this is always the first step of validating: you check that your model is as robust as it can be for the data you've already collected to build it. But that is all data the model has, in effect, already seen. The test data is familiar; it's similar to the training data because it's from the same fruit, the same soil, the same plant that you're measuring, the same type of spectra as the training set. As a result, any internal validation, no matter how robust, will always overestimate how well your model performs in a truly independent scenario.
So what always has to happen, every single time after you've done internal validation, is that you have to externally validate the model.

External validation is the most important step for determining model robustness, and it has a number of steps of its own. It's like a miniature model-building exercise: you actually have to go out and find a new, independent data set or sample set. Say I built a model entirely with avocados grown in California, all collected in 2022, and now I want to validate the model externally. There are a number of ways I could do that. An independent sample set could be California avocados from 2022 but from a later point in the year, when I wasn't collecting data for the model-building set. It could be independent data from a different year, say 2023 California avocados, or avocados from a different region. There are a lot of different ways you'll want to externally validate, depending on what your actual practical use of the instrument is.

Once you've collected your independent sample set, you perform the same spectral and reference method testing that you did for your model-building exercise. Then you test that data against the current model to evaluate its predictive performance. The way you do that is very much like a holdout validation, where your training data set is all of the data you collected before, and your test or validation set is the new data you just collected from your independent sampling and testing.

Once you've evaluated that performance, if it satisfies your criteria, which we'll talk about in a moment with some statistics, whatever your criteria are for predictive performance or robustness, then you can say the model is good to go, skip to step four, and perform another validation later in the year, or whenever a new variable is introduced to the samples you're measuring.

But if you aren't happy with the performance of the model after step two, you can add that data back into the training set and go through the whole cycle of validation again: do your internal validations, optimize your model, internally validate, then do a new independent sampling, evaluate it, and if you're happy with the performance, declare the model ready to be deployed.

As I briefly touched on in step four, though, this isn't a process that is ever truly done or finished. Any time a new variable is introduced to the samples you're measuring with the model, any variable including region, season, variety or cultivar, or the environmental temperature you're measuring at, you need to run an external validation to make sure the model is still predicting in a robust way and still meeting your criteria for model performance. This is something that needs to be done on a regular basis; that's just the reality of this technology and how it works.
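As a sketch of what step two of that cycle could look like in code, assuming you already have a trained model object and a newly collected independent set (the names X_new and y_new are hypothetical, and the acceptance threshold is a placeholder):

```python
import numpy as np

def external_validation(model, X_new, y_new, max_rmse=1.0):
    """Evaluate an existing model on an independent sample set."""
    preds = model.predict(X_new).ravel()
    rmsep = np.sqrt(np.mean((preds - y_new) ** 2))    # error on unseen samples
    bias = y_new.mean() - preds.mean()                # systematic over/under-prediction
    print(f"RMSEP: {rmsep:.3f}   bias: {bias:.3f}")
    if rmsep <= max_rmse:
        print("Meets criteria -> keep the model; revalidate when a new variable appears.")
    else:
        print("Fails criteria -> add this data to the training set and repeat the cycle.")
```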
I've already mentioned some statistical terms, but I want to quickly show you what I mean by the statistics most commonly used when evaluating these models.

The RMSE is the root mean square error, a very commonly used metric. It essentially measures the average difference between the predicted values and the actual values, giving you an idea of the overall prediction error of the model. I've posted the formulas for all of these below, if you're interested in seeing those. Basically, you take the sum of the squared differences between the predictions and the actual values, divide by the number of measurements, and take the square root. The reason we use this metric, and not just an average of the errors, is that, as you'll see in a bit, some of your errors will be positive and some negative, and if you just average them out they'll make the error look very small when in reality it's quite big. The root mean square error is probably the single most commonly used metric for evaluating model performance.
The mean absolute error (MAE) is almost the same thing, except that instead of squaring and then taking a square root, you just take the absolute value of the difference. The MAE is a little less sensitive to outliers because you skip that squaring step. So if you notice that you have outliers in your data set, you might want to include the MAE as an additional metric of model performance.
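A minimal sketch with made-up numbers makes both points: the signed errors partly cancel while the RMSE does not, and the RMSE reacts more strongly to the outlier than the MAE:

```python
import numpy as np

y_true = np.array([10.2, 11.5, 9.8, 12.1, 10.9])   # reference values (e.g., Brix)
y_pred = np.array([10.5, 11.1, 10.2, 11.8, 13.4])  # predictions; the last is an outlier

errors = y_pred - y_true                       # mix of positive and negative errors
print("mean error:", errors.mean())            # misleadingly small: signs cancel
print("RMSE:", np.sqrt(np.mean(errors ** 2)))  # penalizes the outlier heavily
print("MAE: ", np.mean(np.abs(errors)))        # less sensitive to the outlier
```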
The coefficient of determination, R², is probably, historically, the most widely used way to evaluate whether a model is robust or a good fit. R² is an important metric; however, on its own it provides very little actual information about how well a model is performing. I'll show you some examples in a bit of why that is, but what we really need to be doing is using these metrics in combination, not as a singular indicator.

Yes, R² does indicate goodness of fit, but it's extremely dependent on a few things. It's very sensitive to outliers, and it's very sensitive to sample size. If you have a sample size of 10 with an average error of, say, one, it's going to be really hard for that R² to come out very high; it will probably be around 0.6. If you have a hundred data points that still have an average error of one, so the average error hasn't changed, that R² will probably shoot up to 0.8 or 0.85, just because there's more data in the set you're using to evaluate it.

It's also sensitive to sample range. Say you have five data points whose actual reference method values are all within 0.1, 0.5, or even just one unit of each other, a range of one to two Brix, versus a sample set of five samples with a range of about six degrees Brix. The one spanning six degrees Brix is simply going to have a better R², because a wider range makes it easier for this statistic to take a high value. So there are a lot of downfalls to using R² on its own. It is still an important metric and should always be looked at, but always with a grain of salt and always in combination with other metrics, like the RMSE, that show you what the average prediction error is.
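Here is a small illustration of that range effect, using simulated values rather than real model output: both data sets carry roughly the same prediction error (about 0.2 Brix), and only the reference range differs:

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
noise = rng.normal(0, 0.2, size=50)     # same prediction error in both cases

y_narrow = rng.uniform(1, 2, size=50)   # reference values spanning 1 Brix
y_wide = rng.uniform(8, 14, size=50)    # reference values spanning 6 Brix

print("narrow range R²:", r2_score(y_narrow, y_narrow + noise))  # modest
print("wide range R²:  ", r2_score(y_wide, y_wide + noise))      # much higher
```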
We also have statistics like the bias. You might notice a systematic deviation of the predictions from the actual values, which shows up as predictions that are consistently higher than the reference values, or consistently lower. That can occur, and the bias is simply a mean: if you calculate the mean of the reference values minus the mean of the predictions, you get a bias value. This comes into play more with external validation, when you're actually validating independent data sets, because you might notice different biases depending on the variety, the temperature, or other factors like that.

The standard deviation is also important; it shows you the variability and reliability of the predictions. It's always good to compare the standard deviation of all your reference method measurements to the standard deviation of all your predictions. They should be on a similar level: you should see that the variability of the model's predictions is similar to the variability of your reference method measurements. That's always a useful check to have as well.
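A quick sketch of those two checks on a hypothetical validation set:

```python
import numpy as np

y_ref = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.7])    # reference method values
y_pred = np.array([10.8, 12.0, 10.5, 12.6, 11.4, 12.3])  # model predictions

bias = y_ref.mean() - y_pred.mean()
print("bias:", bias)  # negative here: the predictions run consistently high

print("SD of reference:  ", y_ref.std(ddof=1))
print("SD of predictions:", y_pred.std(ddof=1))  # should be on a similar level
```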
In general, those are the big ones you'll see used. All of these also have secondary names you'll find in the literature. When you compute the RMSE of a calibration (training) data set, it's called the RMSEC; the RMSE of a cross-validation is the RMSECV; and the RMSE of predictions on validation data is the RMSEP or RMSEV. You'll see this a lot in the literature, where extra letters are added at the end to represent what the RMSE is for. The root mean square error is the name of the statistic, and the letters that follow designate which data set that statistic is being measured on.
While we're talking about metrics and how these validations work, I also want to demonstrate some common problems, and show you why we want to combine statistics to get a good idea of how good a model is. Let me first present some common issues with internal validation, and then with external validation.

Internal validation is where you'll be able to identify things like underfitting and overfitting; the holdout validation and the k-fold cross-validation will both help you identify these issues. When I say underfitting, this is what I mean on the left: your training data is not aligned on the ideal line, and your validation data set isn't predicting anywhere near that ideal trend line either. When you see something like this, where even your training data isn't correctly aligned on the ideal line, you know you need to adjust things like your model hyperparameters or your wavelength range. You might need to go back to your original training data set and see whether any data is messing with the model, any outliers you've identified from your reference method testing, or anything like that. When a model is underfitting, it's performing poorly for both your training data and your validation data, so underfitting is pretty easily identified and solved. There's little risk of accidentally sending a model that looks like this out into the real world to be used.
Overfitting, however, is probably the more common and the more sinister of the two, because it happens quite often, especially in the literature you read. People quite often build models and publish data on them using the training data metrics, and that is a huge mistake: if you're just looking at training, you can make a model look as good as you possibly want, but that doesn't mean it will perform well on any external or internal validation data. If you have a model where all of the training data looks nearly perfect, but then you split out a validation or test data set and see it perform absolutely horribly, even though that data is the same type of data, from the same types of fruits, from your original data set, that is a really good indicator that your model is overfit. It has become hyper-specific to the exact data in the training set; it isn't broad enough to understand any new data it's given. You really want to avoid these situations, and the way you avoid them is by doing validation. This is why validation is important: if you were only to look at the training data here, you'd say, "This is the best model I've ever built, I can't wait to deploy it, everyone's going to be so happy with it," but when people go to actually use it, they'd realize very quickly that it's not a good model at all, just a very overfit one. The best way to avoid both scenarios is to do internal validation while you're building and optimizing your models.
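One simple way to catch this during internal validation, a heuristic sketch where the threshold is a judgment call rather than a standard, is to compare the train and test errors from your holdout split and flag a large gap:

```python
def overfit_check(rmse_train: float, rmse_test: float, ratio: float = 1.5) -> None:
    """Flag a suspiciously large train/test error gap (threshold is arbitrary)."""
    if rmse_test > ratio * rmse_train:
        print(f"Possible overfit: RMSEP {rmse_test:.2f} >> RMSEC {rmse_train:.2f}")
    else:
        print("Train and test errors are on a similar level.")

overfit_check(rmse_train=0.3, rmse_test=1.4)  # hypothetical values -> flagged
```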
Now, with external validation, I mentioned the bias issue, and this is how it presents itself. All your training data still looks good, well aligned with the ideal line, performing well with a low average prediction error. Your validation data's R² isn't too bad either; it's still actually fairly well aligned. The only problem is that everything is predicting high, so your RMSE, your error, is going to be naturally high. There are a lot of factors that can cause this kind of prediction bias when you're doing external validation.

Some common sources of bias: transferring a calibration or a model to a different instrument, a different spectrometer type, or even another spectrometer of the same type, because spectrometers are never exactly identical. Then there's temperature: NIR spectra are highly sensitive to temperature, and not just environmental temperature but also sensor temperature and the actual temperature of the sample you're measuring. If you build a model based only on room-temperature fruit and then use it on a really hot day out in the field, or in cold storage, that model is probably going to predict with a bias. pH has the same kind of effect on spectra as temperature, so if you're trying to model something with a highly variable pH, that can cause issues with predictions too. The other more common factors are the season or year the data was collected in, the region the data is from, the cultivar or variety, and your analytical or reference method: if you're using a slightly different reference method than the one the model was built with, that can present itself as a bias as well.

Those are all things to consider, and this is what the symptom looks like in your data. If you see something like this in your external validation, you need to go through that process of adding this data into the model, retraining, and repeating the external validation under the same conditions and the same variables as the first one. Now that you've added that data to the model and retrained it, it may do a better job of predicting for those variables.
That's just another example I wanted to bring up. This next slide is about the statistics, about using multiple statistics in combination to get a better idea of how well a model is performing. By the way, all the data I've been showing you is random data I generated in Python; it's not actual model data of any kind, it's all for example purposes. I want to show you three common scenarios you'll see when looking at training and validation data.

The first is the ideal: a high R² and a low RMSE. The higher the R², the better the goodness of fit; the lower the RMSE, the lower the average error. That's the combination we want, and it looks something like the graph on the left: both our training and validation sets have high R² and low RMSE, so they have similar statistics. We want them looking almost identical; we don't want to see a big difference between our validation set and our training set.

The absolute worst-case scenario, the complete opposite of this, is when you get a low R² and a high RMSE. This can present itself in any number of ways: a random smattering of data with absolutely zero correlation, or a little cloud of data positioned horizontally across the line. That gives you an indication that something is wrong and you need to go back and retrain the model, or go back even before that and look at the data set itself and make sure there are no entry errors. Remember from last time: keeping your data organized is one of the more important parts of this model-building process. If you have one misalignment, where you associate the wrong analytical or reference value with the wrong spectrum, that can throw all the other data points out of alignment and cause something that looks like this. So you want to go back and really retrace your steps when you get a situation like this middle one.
What I'd say is a more common scenario is a high R² with a high RMSE. You could look at this data set, accidentally use only R² as a metric, and say, "This model is great, it's got an R² of 0.8, we should definitely use it." But if you also calculate the RMSE, you can see it's about 1.5. If this were something like Brix or titratable acidity, that might be far too high an error for you in practice; it might actually be an error that's unacceptable against your acceptance criteria for analytical testing. This is why we want to use these statistics in combination: it helps us identify whether the model is actually something we want to use in practice.

So you can't just take one of these statistics and say it's a bad model. There's also a fourth scenario that does happen: a low R² with a relatively good RMSE, where you might look at the data and say, "The R² is only 0.6 or so, but our RMSE is less than one." That can happen, and it might still be an acceptable model for you, because the error might be more important than the correlation or goodness of fit. Regardless, the point of all this is that we need to be using statistics in combination with each other, and not a singular statistic, to evaluate whether a model is robust enough to be deployed.
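To make the "statistics in combination" point concrete, here is a sketch of a deployment gate you might apply; the thresholds are placeholders, not recommendations:

```python
import numpy as np
from sklearn.metrics import r2_score

def ready_to_deploy(y_true, y_pred, min_r2=0.7, max_rmse=1.0):
    """Judge a validation set on R² and RMSE together, never on either alone."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    r2 = r2_score(y_true, y_pred)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    print(f"R² = {r2:.2f}, RMSE = {rmse:.2f}")
    return r2 >= min_r2 and rmse <= max_rmse
```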
These were all just examples. This is intended to be a quick overview, not by any means the full library and wealth of information there is on how to do model validation, but it covers the most common things you'll see when doing model validation with spectroscopy on agricultural commodities. I wanted to give you that base of information so that when you go into your model building, you have this knowledge ahead of time and know what you need to do throughout the process.

That concludes the first part of our model-validation section of the webinar. We've gone through all of our model building, and we've finished part one of model validation, part four of the series. In the next webinar, we'll talk about the challenges we encounter with model validation, model building, and calibration transfer, basically the most common problems and challenges we face and what we can do to mitigate them. That will be a really good webinar for people who are struggling with their model-building process and have questions about the challenges they're encountering; it should help provide some insight. The last part of the series will be about how to maintain your model and keep it optimized, and after that you'll be ready to build your own robust models, go out there, use them, and help revolutionize the agricultural industry.
That concludes today's webinar. If you would like any information about our F750 or F751 spectrometers, or our CI-710s leaf spectrometer, you can follow this link; Susie will post it in the chat. Feel free to click on it to request a quote or request more information. As always, feel free to follow us on social media, and you can also get in touch by calling our office. We have a relatively new website where you can book consultations with myself or our application scientist, Kendra, and find a lot more information: articles, publications, and compilations of published literature. If you're interested in seeing how other people have used our instruments and want to compare your application to theirs, you can download a whole compilation of published papers that used our instruments; we have one for each of our instrument lines.

Thank you all so much for joining today. I'm going to go ahead and jump into the Q&A section now and start answering some questions.
The first question is from Andrit: are these statistical methods, k-fold cross-validation and holdout validation, included in the software that comes with the F750? In the App Builder software, you do have the ability to do either of these validations. The holdout validation is the easiest, because you simply click which data is in test and which is in train. The k-fold cross-validation is more involved: we don't have that as an automated process in App Builder, so it would be a manual process where you run multiple iterations, manually selecting which data is in which test set for each iteration. So both are possible in App Builder, but k-fold cross-validation is not a fully automated method.
Thank you for that question. Umesh asks: do we have any in-house chemometric software that can be used free of cost? We do have our App Builder software, which is specifically geared toward building models using our instruments, the F750 and the F751, and it is free to download from our website. It is open source, so you may be able to use it with non-Felix-Instruments spectrometers; however, if you don't own a Felix Instruments spectrometer, it might honestly be easier to use something like R or Python for that model building.
The next comment says the webinar is interesting and very informative; thank you. The next question is about a certificate for attending the webinar: yes, if you need a certificate, feel free to reach out to Susie and we can generate one for you.
The next question is a great one: do we generally have to build a model for each fruit cultivar? This is actually a very in-depth question, and something I want to talk about in the next webinar, but to give you a little sneak peek: sometimes yes, sometimes no. I know that's not the answer you probably want to hear, but if you always want the absolute best performance, I would highly recommend building a unique model for every single possible variable: a model for each cultivar, a model for each region, and so on. That will give you the best possible performance. However, how useful is that in practice? It might not be very useful; it might actually be cumbersome for people to try to use something that has 50 different models on it, and having a device that can hold 50 different models for the same fruit is a challenge in and of itself.

So what we go for, what we're trying to do, is include as many fruit cultivars in a single model as we can, and, using the more advanced deep neural network techniques, create a model that predicts cultivar-independently: any cultivar you've included in your training data set, it will be able to predict fine. A good example is in the avocado space. When you combine something like a Hass avocado with a smooth-skin avocado, sometimes it works really well and sometimes it doesn't. Hass and Shepard tend to be avocados that predict pretty well for each other if we include them in the same model, but once you include something like Fuerte or other smooth-skin varieties, it starts not predicting as well, so you might want to make an individual model for that specific cultivar. It's really going to be a decision you make based on what the end use of the instrument is going to be and how much work you want to put into building the models.
The next message is from Dr. Suna: congratulations, and thanks for the presentation. Thank you for being here; I appreciate it, and thanks for joining. The next question: which one would you prefer, the Pearson correlation (just r) or R²? I always use R²; it seems to be the most commonly used measure of goodness of fit. There are obviously the five statistics I touched on, and obviously many more than that you can use. What I would do, if you're interested in seeing the Pearson correlation, is add it in as another one of your statistics and use it as just another tool to help you assess your model, but not on its own. It doesn't have to be one or the other; when it comes to statistics, you can use all of them to get a better idea of how the model is performing. Thank you for that question.

I don't see any other questions, so with that, thank you all for joining the webinar today. We look forward to seeing you at the next one, and I hope you all have a great rest of your day.

Request a quote for a Felix Product

Pricing and all related materials will be sent directly to your inbox.