Future Proof Your Quality: Validating a Model Part 1 – Relevant Statistics and Types of Validation Testing

Hunter Weber

August 2, 2023 at 7:55 pm | Updated September 20, 2023 at 3:50 pm | 29 min read

The fourth installment in our webinar series on spectroscopy in agriculture is here!

Join us for a comprehensive 6-part series on internal quality assessment, spectroscopy, chemometrics, and more in commercial agriculture. In Part 4, Director of Applied Science Galen George will delve into relevant statistics and types of validation testing. This informative session is suitable for both beginners and experts in the field.

In Part 4, we explore the following topics:
– Model Performance Metrics
– Calibration vs. Validation Data Sets
– Validation Techniques

A Live Q&A was hosted after the webinar.

Full Transcript:

Welcome, everyone, and thank you for attending today's webinar. This is the next part in our long-running series on how to build models, validate models, and handle the challenges associated with them, a series we've been putting on to help educate people in the agricultural industry, specifically those using our F750 and F751 devices.

Today's topic is validation. We're talking about validating a model: I'll cover the relevant statistics and the types of validation testing, and I'll show you some example data sets so you can get an idea of what I mean by the different terms I'm going to introduce.

Before we get started, I want to introduce myself and touch on a little light housekeeping. My name is Galen, and I'm the Director of Applied Science here at Felix Instruments. I've been at the company for a little over four years. My background: I have a bachelor's degree in biochemistry and a master's degree in food science, I'm an IFT-Certified Food Scientist, and most of my experience is in quality and safety assessment in the food, agriculture, and cannabis industries.
On the housekeeping side: if you experience technical difficulties of any kind, if the video cuts out, the presentation disappears, or my audio drops, please use the chat feature so we can alert our webinar host, Susie Truitt, our distributor manager, and she can let me know that I'm no longer on video or my audio isn't working. If you have a question about the content of the webinar, please use the Q&A function in Zoom, because that's what I'll be monitoring at the end of the webinar when I start answering questions. I won't be looking at the chat, so please use the Q&A for anything directly related to the content I'll be presenting today.
Let's jump into things. If you were on our last webinar, it was the second part of our model-building series, where I went through some of the pre-processing techniques and the actual chemometric model-building techniques commonly used to build different types of models.

So far we've gone through the whole process of introducing what model building is and how someone would go about it. It starts with the sampling process, where we talked about the importance of having a good, representative sample set in your training data. We then talked about spectral collection and analytical testing, and how important it is to have consistency in both of those methodologies. In the last webinar we covered multivariate data analysis, the chemometrics part, which is a complex topic but doesn't have to be. We talked about artificial neural networks and the current state of how we're using them, and then a little about the future: convolutional neural networks and other advanced deep-learning networks that can potentially be used for even better modeling.

The last step of this whole process is model deployment, but between steps three and four there's a step we have to go through before we can be confident deploying a model and actually using it in practice. That process is validation.
So today we're going to give an overview of the validation process. I wouldn't think of it as a strictly linear process; there is a linear aspect to it, but it's also cyclical. Validation is conducted any time a model is changed in any way. Even if you just change some hyperparameters, or fold one small new data set into your training data, you still have to go through this validation process to make sure the model is working the way you want and expect it to.

The validation process typically follows this pattern, and it's what I'd recommend everyone do, because skipping any one of these steps can lead to problems with your model down the line, when you go to implement it in real-world practice. Validation always starts with an internal validation, which I'll expand on later. Once you've done the internal validation, you evaluate your model performance based on it. Following the internal validation, you then do an external, or independent, validation. The external validation is arguably the most important step in ensuring that your device will work well in practice: internal validation helps you ensure the model works for the data you used to build it, but the independent validation shows you how the model performs in a real-world scenario. Once you've done that external validation, you again evaluate your model performance, and then you have to decide whether that model is ready to be deployed.
Let's start with internal validation.

There are many different types of internal validation testing. I'm going to talk about the two that are probably the most commonly used for NIR spectroscopy models, specifically for agricultural purposes.

The first is the most basic form of validation testing: holdout validation, also known as train/test split validation. This testing has a couple of pros and some cons, but it should always be the first step. I recommend everyone do this automatically when building models, regardless of any other internal validation testing they want to do afterward.

The pros are that it's very straightforward and easy to implement, it's computationally very fast, so it won't take a lot of time, and the results are easy to interpret: you're simply splitting your data set into two data sets that share the same statistics, so you can compare them to one another and see how your model is performing using the exact same metrics.

The main con is that it is very dependent on how you split the data. Just as with sampling and spectral collection during model building, you have to be careful that your test data set is as similar to, and as all-encompassing of the variables as, your training data set. If you only test a small subset that isn't representative of the entire potential data set, you'll see a lot of variance in your evaluation; that high variance comes from using an unrepresentative test set. It can also lead to biased performance estimation: you might think your model is predicting very well when in fact the test data set was unrepresentative, and when you go to implement the model in practice you'll find the results aren't nearly as good as you thought they were.
So how does holdout validation work? I've been using terms like training and testing data set: you simply take the entire data set you collected for your model, designate a portion of it as the data you'll train and build the model with, and reserve a (usually smaller) portion for testing, to actually evaluate the model with. By doing this, you're not relying on your entire training data set as the metric for how well the model is performing, because that will always give you an overestimate.

As for the proportions, there's a pretty common convention, and not just in agriculture; across a lot of industries it's common to split your data set anywhere from 70/30 up to 90/10. The most common is the 80/20 split; 75/25 is also very common. As you can see from the graph on the bottom right, you'll generally see better performance on your test sets when you split your data that way. If you split 50/50, you're not putting enough data into the model to create something robust enough to test the other 50% against. If you split 95/5, so 95% of your data in training and only 5% in test, you run into the problem I was just discussing: that 5% of your data won't be representative enough to give you a good idea of how the model is performing. So the 80/20 split is probably the most common, with 75/25 a close second. If you're thinking about building your own models, I would highly recommend this be the very first thing you do: set aside 20-25% of your data for testing only, not to be used to build the model.
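To make this concrete, here is a minimal sketch of an 80/20 holdout split in Python with scikit-learn. The arrays are placeholders, and the PLS regression stands in for whatever chemometric model you are actually building; this is not our App Builder pipeline, just the idea:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

# Placeholder data standing in for real spectra and reference values
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 300))    # 200 samples x 300 wavelengths
y = rng.uniform(5, 15, size=200)   # reference values, e.g. degrees Brix

# Reserve 20% of the data for testing only (the 80/20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = PLSRegression(n_components=10).fit(X_train, y_train)

# The test set is never shown to the model during fitting
rmse_c = np.sqrt(np.mean((model.predict(X_train).ravel() - y_train) ** 2))
rmse_p = np.sqrt(np.mean((model.predict(X_test).ravel() - y_test) ** 2))
print(f"RMSEC (train): {rmse_c:.3f}   RMSEP (test): {rmse_p:.3f}")
```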
The holdout validation is your standard first pass: you do it as you build the model, and you use it to guide how you build the model. We've talked about model optimization; you're using this testing data set to help determine the optimal hyperparameters and the optimal wavelength range for your model, because you want to make sure your test set is performing its best, and ideally you want your training and test data both performing at a pretty equal level.

So that's how holdout validation works. There's another internal validation testing scheme I want to talk about that is a little more involved, but it's also very common and a very useful way of performing validation. The holdout validation is a good first-pass test and is great for model optimization, but it's not necessarily the most representative or robust way of doing internal validation testing.
If you want a good, robust method for internal validation testing, you would use something like the k-fold cross-validation method. With k-fold cross-validation, you're essentially performing that same train/test split validation, but on basically every potential grouping of your entire data set, iteratively testing each new set of test data and combining all of the resulting RMSEs and other statistics to work out exactly how well the model performs across all the data in your data set. This can be a complex process, but it doesn't have to be: you're just splitting the data into train and test, running the same model against one section of test data, then on the next iteration using the next section of test data, then the next, all the way through all possible iterations.

Say, for example, you have a hundred data points in your model. Probably not the most robust model, but for the purposes of this discussion: you could do a 100-fold cross-validation where a single data point is your only test data, and you test every single data point in a new iteration. The held-out point changes from your very first data point all the way to the very last, so by the end you've run a hundred different validations and tested each individual data point. That's one way of doing k-fold cross-validation. Another way is to group the data: for that 100-point data set, we could group it into groups of ten, so we'd have 10 iterations, each testing a new group of 10 data points at the same time. There's a whole range of other ways to run this kind of validation where you combine different groups of data, still working through all hundred points but moving that set of 10 the whole way through. In essence, you're doing the same thing as a holdout validation, just across a whole series of iterations, so that you see how every single data point in your data set predicts when it is not included in the training data, only in the testing data.

This is obviously a much more robust way of doing internal validation testing. You'll get a much better idea of how the model is performing, you're using the data to its fullest by checking every single data point throughout the entire data set, and you're reducing the dependency on any one split, the chance, as I mentioned for the holdout, of the test data not being representative of your entire data set.
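As a hedged sketch rather than a recipe, here is what 10-fold and leave-one-out (the "100-fold" case above) cross-validation might look like in Python on that hypothetical 100-point data set, again with placeholder data and a stand-in PLS model:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))    # placeholder spectra: 100 samples
y = rng.uniform(5, 15, size=100)   # placeholder reference values

def rmse_cv(splitter, X, y):
    """Pool the out-of-fold predictions, then compute a single RMSECV."""
    preds = np.empty_like(y)
    for train_idx, test_idx in splitter.split(X):
        model = PLSRegression(n_components=10).fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx]).ravel()
    return np.sqrt(np.mean((preds - y) ** 2))

# Ten folds of ten samples each
print("RMSECV, 10-fold:", rmse_cv(KFold(n_splits=10, shuffle=True, random_state=42), X, y))
# One held-out point per iteration: the "100-fold" case from the talk
print("RMSECV, leave-one-out:", rmse_cv(LeaveOneOut(), X, y))
```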
That being said, there are cons, the biggest being computational cost. As you can probably imagine, a simple holdout validation is essentially one iteration of this k-fold cross-validation, so it's really quick; if you're doing that a hundred, a thousand, or ten thousand different times, you're multiplying the time the validation takes. At a certain point it turns into: you start the validation with a script written, and you leave it running on your computer for a day or two before it gives you any results. There's also an aspect where you might be biasing yourself toward the type of model you're selecting. That con isn't as serious as the ones from a simple holdout validation, and model selection bias isn't all that common to see, but it is something that's possible with k-fold cross-validation.
Those are the two types of internal validation testing that we see used most often, especially when we're building NIR spectroscopy models for the ag industry. As I mentioned, this is always the first step of validating: you check that your model is as robust as it can be for the data you've already collected to build it. But that is all data the model has, in effect, already seen. The test data is familiar; it's similar to the training data because it's from the same fruit, the same soil, the same plant that you're measuring, the same type of spectra as the training set. As a result, any internal validation, no matter how robust, will always overestimate how well your model performs in a truly independent scenario.
So what always has to happen, every single time after you've done internal validation, is that you have to externally validate the model.

External validation is the most important step for determining model robustness, and it has a number of steps of its own. It's like a miniature model-building exercise: you actually have to go out and find a new, independent data set or sample set. Say I built a model entirely with avocados grown in California, all collected in 2022, and now I want to validate the model externally. There are a number of ways I could do that. An independent sample set could be California avocados from 2022 but from a later point in the year, when I wasn't collecting data for the model-building set. It could be independent data from a different year, say 2023 California avocados, or avocados from a different region. There are a lot of different ways you'll want to externally validate, depending on what your actual practical use of the instrument is.

Once you've collected your independent sample set, you perform the same spectral and reference method testing that you did for your model-building exercise. Then you test that data against the current model to evaluate its predictive performance. The way you do that is very much like a holdout validation, where your training data set is all of the data you collected before, and your test or validation set is the new data you just collected from your independent sampling and testing.

Once you've evaluated that performance, if it satisfies your criteria, which we'll talk about in a moment with some statistics, whatever your criteria are for predictive performance or robustness, then you can say the model is good to go, skip to step four, and perform another validation later in the year, or whenever a new variable is introduced to the samples you're measuring.

But if you aren't happy with the performance of the model after step two, you can add that data back into the training set and go through the whole cycle of validation again: do your internal validations, optimize your model, internally validate, then do a new independent sampling, evaluate it, and if you're happy with the performance, declare the model ready to be deployed.

As I briefly touched on in step four, though, this isn't a process that is ever truly done or finished. Any time a new variable is introduced to the samples you're measuring with the model, any variable including region, season, variety or cultivar, or the environmental temperature you're measuring at, you need to run an external validation to make sure the model is still predicting in a robust way and still meeting your criteria for model performance. This is something that needs to be done on a regular basis; that's just the reality of this technology and how it works.
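As a sketch of what step two of that cycle could look like in code, assuming you already have a trained model object and a newly collected independent set (the names X_new and y_new are hypothetical, and the acceptance threshold is a placeholder):

```python
import numpy as np

def external_validation(model, X_new, y_new, max_rmse=1.0):
    """Evaluate an existing model on an independent sample set."""
    preds = model.predict(X_new).ravel()
    rmsep = np.sqrt(np.mean((preds - y_new) ** 2))    # error on unseen samples
    bias = y_new.mean() - preds.mean()                # systematic over/under-prediction
    print(f"RMSEP: {rmsep:.3f}   bias: {bias:.3f}")
    if rmsep <= max_rmse:
        print("Meets criteria -> keep the model; revalidate when a new variable appears.")
    else:
        print("Fails criteria -> add this data to the training set and repeat the cycle.")
```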
I've already mentioned some statistical terms, but I want to quickly show you what I mean by the statistics most commonly used when evaluating these models.

The RMSE is the root mean square error, a very commonly used metric. It essentially measures the average difference between the predicted values and the actual values, giving you an idea of the overall prediction error of the model. I've posted the formulas for all of these below, if you're interested in seeing those. Basically, you take the sum of the squared differences between the predictions and the actual values, divide by the number of measurements, and take the square root. The reason we use this metric, and not just an average of the errors, is that, as you'll see in a bit, some of your errors will be positive and some negative, and if you just average them out they'll make the error look very small when in reality it's quite big. The root mean square error is probably the single most commonly used metric for evaluating model performance.
The mean absolute error (MAE) is almost the same thing, except that instead of squaring and then taking a square root, you just take the absolute value of the difference. The MAE is a little less sensitive to outliers because you skip that squaring step. So if you notice that you have outliers in your data set, you might want to include the MAE as an additional metric of model performance.
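A minimal sketch with made-up numbers makes both points: the signed errors partly cancel while the RMSE does not, and the RMSE reacts more strongly to the outlier than the MAE:

```python
import numpy as np

y_true = np.array([10.2, 11.5, 9.8, 12.1, 10.9])   # reference values (e.g., Brix)
y_pred = np.array([10.5, 11.1, 10.2, 11.8, 13.4])  # predictions; the last is an outlier

errors = y_pred - y_true                       # mix of positive and negative errors
print("mean error:", errors.mean())            # misleadingly small: signs cancel
print("RMSE:", np.sqrt(np.mean(errors ** 2)))  # penalizes the outlier heavily
print("MAE: ", np.mean(np.abs(errors)))        # less sensitive to the outlier
```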
The coefficient of determination, R², is probably, historically, the most widely used way to evaluate whether a model is robust or a good fit. R² is an important metric; however, on its own it provides very little actual information about how well a model is performing. I'll show you some examples in a bit of why that is, but what we really need to be doing is using these metrics in combination, not as a singular indicator.

Yes, R² does indicate goodness of fit, but it's extremely dependent on a few things. It's very sensitive to outliers, and it's very sensitive to sample size. If you have a sample size of 10 with an average error of, say, one, it's going to be really hard for that R² to come out very high; it will probably be around 0.6. If you have a hundred data points that still have an average error of one, so the average error hasn't changed, that R² will probably shoot up to 0.8 or 0.85, just because there's more data in the set you're using to evaluate it.

It's also sensitive to sample range. Say you have five data points whose actual reference method values are all within 0.1, 0.5, or even just one unit of each other, a range of one to two Brix, versus a sample set of five samples with a range of about six degrees Brix. The one spanning six degrees Brix is simply going to have a better R², because a wider range makes it easier for this statistic to take a high value. So there are a lot of downfalls to using R² on its own. It is still an important metric and should always be looked at, but always with a grain of salt and always in combination with other metrics, like the RMSE, that show you what the average prediction error is.
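Here is a small illustration of that range effect, using simulated values rather than real model output: both data sets carry roughly the same prediction error (about 0.2 Brix), and only the reference range differs:

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
noise = rng.normal(0, 0.2, size=50)     # same prediction error in both cases

y_narrow = rng.uniform(1, 2, size=50)   # reference values spanning 1 Brix
y_wide = rng.uniform(8, 14, size=50)    # reference values spanning 6 Brix

print("narrow range R²:", r2_score(y_narrow, y_narrow + noise))  # modest
print("wide range R²:  ", r2_score(y_wide, y_wide + noise))      # much higher
```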
We also have statistics like the bias. You might notice a systematic deviation of the predictions from the actual values, which shows up as predictions that are consistently higher than the reference values, or consistently lower. That can occur, and the bias is simply a mean: if you calculate the mean of the reference values minus the mean of the predictions, you get a bias value. This comes into play more with external validation, when you're actually validating independent data sets, because you might notice different biases depending on the variety, the temperature, or other factors like that.

The standard deviation is also important; it shows you the variability and reliability of the predictions. It's always good to compare the standard deviation of all your reference method measurements to the standard deviation of all your predictions. They should be on a similar level: you should see that the variability of the model's predictions is similar to the variability of your reference method measurements. That's always a useful check to have as well.
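A quick sketch of those two checks on a hypothetical validation set:

```python
import numpy as np

y_ref = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.7])    # reference method values
y_pred = np.array([10.8, 12.0, 10.5, 12.6, 11.4, 12.3])  # model predictions

bias = y_ref.mean() - y_pred.mean()
print("bias:", bias)  # negative here: the predictions run consistently high

print("SD of reference:  ", y_ref.std(ddof=1))
print("SD of predictions:", y_pred.std(ddof=1))  # should be on a similar level
```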
In general, those are the big ones you'll see used. All of these also have secondary names you'll find in the literature. When you compute the RMSE of a calibration (training) data set, it's called the RMSEC; the RMSE of a cross-validation is the RMSECV; and the RMSE of predictions on validation data is the RMSEP or RMSEV. You'll see this a lot in the literature, where extra letters are added at the end to represent what the RMSE is for. The root mean square error is the name of the statistic, and the letters that follow designate which data set that statistic is being measured on.
While we're talking about metrics and how these validations work, I also want to demonstrate some common problems, and show you why we want to combine statistics to get a good idea of how good a model is. Let me first present some common issues with internal validation, and then with external validation.

Internal validation is where you'll be able to identify things like underfitting and overfitting; the holdout validation and the k-fold cross-validation will both help you identify these issues. When I say underfitting, this is what I mean on the left: your training data is not aligned on the ideal line, and your validation data set isn't predicting anywhere near that ideal trend line either. When you see something like this, where even your training data isn't correctly aligned on the ideal line, you know you need to adjust things like your model hyperparameters or your wavelength range. You might need to go back to your original training data set and see whether any data is messing with the model, any outliers you've identified from your reference method testing, or anything like that. When a model is underfitting, it's performing poorly for both your training data and your validation data, so underfitting is pretty easily identified and solved. There's little risk of accidentally sending a model that looks like this out into the real world to be used.
Overfitting, however, is probably the more common and the more sinister of the two, because it happens quite often, especially in the literature you read. People quite often build models and publish data on them using the training data metrics, and that is a huge mistake: if you're just looking at training, you can make a model look as good as you possibly want, but that doesn't mean it will perform well on any external or internal validation data. If you have a model where all of the training data looks nearly perfect, but then you split out a validation or test data set and see it perform absolutely horribly, even though that data is the same type of data, from the same types of fruits, from your original data set, that is a really good indicator that your model is overfit. It has become hyper-specific to the exact data in the training set; it isn't broad enough to understand any new data it's given. You really want to avoid these situations, and the way you avoid them is by doing validation. This is why validation is important: if you were only to look at the training data here, you'd say, "This is the best model I've ever built, I can't wait to deploy it, everyone's going to be so happy with it," but when people go to actually use it, they'd realize very quickly that it's not a good model at all, just a very overfit one. The best way to avoid both scenarios is to do internal validation while you're building and optimizing your models.
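One simple way to catch this during internal validation, a heuristic sketch where the threshold is a judgment call rather than a standard, is to compare the train and test errors from your holdout split and flag a large gap:

```python
def overfit_check(rmse_train: float, rmse_test: float, ratio: float = 1.5) -> None:
    """Flag a suspiciously large train/test error gap (threshold is arbitrary)."""
    if rmse_test > ratio * rmse_train:
        print(f"Possible overfit: RMSEP {rmse_test:.2f} >> RMSEC {rmse_train:.2f}")
    else:
        print("Train and test errors are on a similar level.")

overfit_check(rmse_train=0.3, rmse_test=1.4)  # hypothetical values -> flagged
```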
Now, with external validation, I mentioned the bias issue, and this is how it presents itself. All your training data still looks good, well aligned with the ideal line, performing well with a low average prediction error. Your validation data's R² isn't too bad either; it's still actually fairly well aligned. The only problem is that everything is predicting high, so your RMSE, your error, is going to be naturally high. There are a lot of factors that can cause this kind of prediction bias when you're doing external validation.

Some common sources of bias: transferring a calibration or a model to a different instrument, a different spectrometer type, or even another spectrometer of the same type, because spectrometers are never exactly identical. Then there's temperature: NIR spectra are highly sensitive to temperature, and not just environmental temperature but also sensor temperature and the actual temperature of the sample you're measuring. If you build a model based only on room-temperature fruit and then use it on a really hot day out in the field, or in cold storage, that model is probably going to predict with a bias. pH has the same kind of effect on spectra as temperature, so if you're trying to model something with a highly variable pH, that can cause issues with predictions too. The other more common factors are the season or year the data was collected in, the region the data is from, the cultivar or variety, and your analytical or reference method: if you're using a slightly different reference method than the one the model was built with, that can present itself as a bias as well.

Those are all things to consider, and this is what the symptom looks like in your data. If you see something like this in your external validation, you need to go through that process of adding this data into the model, retraining, and repeating the external validation under the same conditions and the same variables as the first one. Now that you've added that data to the model and retrained it, it may do a better job of predicting for those variables.
That's just another example I wanted to bring up. This next slide is about the statistics, about using multiple statistics in combination to get a better idea of how well a model is performing. By the way, all the data I've been showing you is random data I generated in Python; it's not actual model data of any kind, it's all for example purposes. I want to show you three common scenarios you'll see when looking at training and validation data.

The first is the ideal: a high R² and a low RMSE. The higher the R², the better the goodness of fit; the lower the RMSE, the lower the average error. That's the combination we want, and it looks something like the graph on the left: both our training and validation sets have high R² and low RMSE, so they have similar statistics. We want them looking almost identical; we don't want to see a big difference between our validation set and our training set.

The absolute worst-case scenario, the complete opposite of this, is when you get a low R² and a high RMSE. This can present itself in any number of ways: a random smattering of data with absolutely zero correlation, or a little cloud of data positioned horizontally across the line. That gives you an indication that something is wrong and you need to go back and retrain the model, or go back even before that and look at the data set itself and make sure there are no entry errors. Remember from last time: keeping your data organized is one of the more important parts of this model-building process. If you have one misalignment, where you associate the wrong analytical or reference value with the wrong spectrum, that can throw all the other data points out of alignment and cause something that looks like this. So you want to go back and really retrace your steps when you get a situation like this middle one.
What I'd say is a more common scenario is a high R² with a high RMSE. You could look at this data set, accidentally use only R² as a metric, and say, "This model is great, it's got an R² of 0.8, we should definitely use it." But if you also calculate the RMSE, you can see it's about 1.5. If this were something like Brix or titratable acidity, that might be far too high an error for you in practice; it might actually be an error that's unacceptable against your acceptance criteria for analytical testing. This is why we want to use these statistics in combination: it helps us identify whether the model is actually something we want to use in practice.

So you can't just take one of these statistics and say it's a bad model. There's also a fourth scenario that does happen: a low R² with a relatively good RMSE, where you might look at the data and say, "The R² is only 0.6 or so, but our RMSE is less than one." That can happen, and it might still be an acceptable model for you, because the error might be more important than the correlation or goodness of fit. Regardless, the point of all this is that we need to be using statistics in combination with each other, and not a singular statistic, to evaluate whether a model is robust enough to be deployed.
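To make the "statistics in combination" point concrete, here is a sketch of a deployment gate you might apply; the thresholds are placeholders, not recommendations:

```python
import numpy as np
from sklearn.metrics import r2_score

def ready_to_deploy(y_true, y_pred, min_r2=0.7, max_rmse=1.0):
    """Judge a validation set on R² and RMSE together, never on either alone."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    r2 = r2_score(y_true, y_pred)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    print(f"R² = {r2:.2f}, RMSE = {rmse:.2f}")
    return r2 >= min_r2 and rmse <= max_rmse
```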
These were all just examples. This is intended to be a quick overview, not by any means the full library and wealth of information there is on how to do model validation, but it covers the most common things you'll see when doing model validation with spectroscopy on agricultural commodities. I wanted to give you that base of information so that when you go into your model building, you have this knowledge ahead of time and know what you need to do throughout the process.

That concludes the first part of our model-validation section of the webinar. We've gone through all of our model building, and we've finished part one of model validation, part four of the series. In the next webinar, we'll talk about the challenges we encounter with model validation, model building, and calibration transfer, basically the most common problems and challenges we face and what we can do to mitigate them. That will be a really good webinar for people who are struggling with their model-building process and have questions about the challenges they're encountering; it should help provide some insight. The last part of the series will be about how to maintain your model and keep it optimized, and after that you'll be ready to build your own robust models, go out there, use them, and help revolutionize the agricultural industry.
That concludes today's webinar. If you would like any information about our F750 or F751 spectrometers, or our CI-710s leaf spectrometer, you can follow this link; Susie will post it in the chat. Feel free to click on it to request a quote or request more information. As always, feel free to follow us on social media, and you can also get in touch by calling our office. We have a relatively new website where you can book consultations with myself or our application scientist, Kendra, and find a lot more information: articles, publications, and compilations of published literature. If you're interested in seeing how other people have used our instruments and want to compare your application to theirs, you can download a whole compilation of published papers that used our instruments; we have one for each of our instrument lines.

Thank you all so much for joining today. I'm going to go ahead and jump into the Q&A section now and start answering some questions.
The first question is from Andrit: are these statistical methods, k-fold cross-validation and holdout validation, included in the software that comes with the F750? In the App Builder software, you do have the ability to do either of these validations. The holdout validation is the easiest, because you simply click which data is in test and which is in train. The k-fold cross-validation is more involved: we don't have that as an automated process in App Builder, so it would be a manual process where you run multiple iterations, manually selecting which data is in which test set for each iteration. So both are possible in App Builder, but k-fold cross-validation is not a fully automated method.
Thank you for that question. Umesh asks: do we have any in-house chemometric software that can be used free of cost? We do have our App Builder software, which is specifically geared toward building models using our instruments, the F750 and the F751, and it is free to download from our website. It is open source, so you may be able to use it with non-Felix-Instruments spectrometers; however, if you don't own a Felix Instruments spectrometer, it might honestly be easier to use something like R or Python for that model building.
The next comment says the webinar is interesting and very informative; thank you. The next question is about a certificate for attending the webinar: yes, if you need a certificate, feel free to reach out to Susie and we can generate one for you.
The next question is a great one: do we generally have to build a model for each fruit cultivar? This is actually a very in-depth question, and something I want to talk about in the next webinar, but to give you a little sneak peek: sometimes yes, sometimes no. I know that's not the answer you probably want to hear, but if you always want the absolute best performance, I would highly recommend building a unique model for every single possible variable: a model for each cultivar, a model for each region, and so on. That will give you the best possible performance. However, how useful is that in practice? It might not be very useful; it might actually be cumbersome for people to try to use something that has 50 different models on it, and having a device that can hold 50 different models for the same fruit is a challenge in and of itself.

So what we go for, what we're trying to do, is include as many fruit cultivars in a single model as we can, and, using the more advanced deep neural network techniques, create a model that predicts cultivar-independently: any cultivar you've included in your training data set, it will be able to predict fine. A good example is in the avocado space. When you combine something like a Hass avocado with a smooth-skin avocado, sometimes it works really well and sometimes it doesn't. Hass and Shepard tend to be avocados that predict pretty well for each other if we include them in the same model, but once you include something like Fuerte or other smooth-skin varieties, it starts not predicting as well, so you might want to make an individual model for that specific cultivar. It's really going to be a decision you make based on what the end use of the instrument is going to be and how much work you want to put into building the models.
The next message is from Dr. Suna: congratulations, and thanks for the presentation. Thank you for being here; I appreciate it, and thanks for joining. The next question: which one would you prefer, the Pearson correlation (just r) or R²? I always use R²; it seems to be the most commonly used measure of goodness of fit. There are obviously the five statistics I touched on, and obviously many more than that you can use. What I would do, if you're interested in seeing the Pearson correlation, is add it in as another one of your statistics and use it as just another tool to help you assess your model, but not on its own. It doesn't have to be one or the other; when it comes to statistics, you can use all of them to get a better idea of how the model is performing. Thank you for that question.

I don't see any other questions, so with that, thank you all for joining the webinar today. We look forward to seeing you at the next one, and I hope you all have a great rest of your day.

Request a quote for a Felix Product

Pricing and all related materials will be sent directly to your inbox.