29 Cross-Validation

Why Do We Need Cross-Validation?

R², also known as the coefficient of determination, is a popular measure of quality of fit in regression. However, because it is computed on the same data that were used to fit the model, it offers little insight into how well the regression model can predict future values.
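For reference, the coefficient of determination is commonly written as follows. The symbols are notation assumed here, not taken from the text: y_i are the observed values, ŷ_i the fitted values, ȳ the sample mean of the observations, and n the sample size.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Both sums run over the observations used to fit the model, which is why a high value does not by itself say anything about observations the model has not seen.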

One way to address this issue is to collect a new sample of observations and check how well the fitted model predicts it.

A more practical alternative is cross-validation.

Cross-Validation

In cross-validation, the original sample is split into two parts.

One part is called the training sample, and the other part is called the validation sample.

For larger data sets, it is often best to split the sample in half.

For smaller samples, it is often best to use 2/3 of the sample for training and 1/3 for validation.
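The split described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the helper name `split_sample`, the fixed seed, and the 30 stand-in observations are assumptions made for the example, not part of the text.

```python
import random


def split_sample(data, train_fraction, seed=0):
    """Shuffle a copy of `data` and split it into (training, validation).

    `split_sample` is a hypothetical helper written for this example.
    """
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(data)           # copy, so the caller's data is untouched
    rng.shuffle(shuffled)           # random order avoids systematic splits
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


observations = list(range(30))      # stand-in for 30 observations

# Smaller sample: 2/3 training, 1/3 validation.
train, valid = split_sample(observations, train_fraction=2 / 3)

# Larger sample: the same helper with train_fraction=0.5 splits it in half.
half_a, half_b = split_sample(observations, train_fraction=0.5)

print(len(train), len(valid))
print(len(half_a), len(half_b))
```

Shuffling before splitting matters when the observations are stored in a meaningful order (for example, sorted by date or by value), because otherwise the two parts may not resemble each other.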

 

The Procedure

  1. Divide the data into three sets: training, validation, and test.
  2. Fit the candidate models on the training set, and use the validation set to compare their predictive capability and choose the best one.
  3. See how well the chosen model predicts the test set.
  4. Because the test set played no part in fitting or choosing the model, the test error gives an unbiased estimate of the model's predictive power.
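The procedure can be sketched as follows. This is an illustrative example under stated assumptions, not the book's own code: the data are synthetic, the 60/20/20 split is an arbitrary choice, and the two candidate models (a constant mean and a least-squares line) and the helper names `fit_mean`, `fit_line`, and `mse` are made up for the illustration.

```python
import random


def fit_mean(points):
    """Candidate 1: always predict the mean of y, ignoring x."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y


def fit_line(points):
    """Candidate 2: simple least-squares line y = a + b * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def mse(model, points):
    """Mean squared prediction error of `model` on `points`."""
    return sum((y - model(x)) ** 2 for x, y in points) / len(points)


# Synthetic data (an assumption for this sketch): y = 2 + 3x + noise.
rng = random.Random(42)
data = []
for _ in range(90):
    x = rng.uniform(0, 10)
    data.append((x, 2 + 3 * x + rng.gauss(0, 1)))

# Step 1: divide the data into training, validation, and test sets.
rng.shuffle(data)
train, valid, test = data[:54], data[54:72], data[72:]

# Step 2: fit each candidate on the training set, compare on validation.
candidates = {"mean": fit_mean, "line": fit_line}
fitted = {name: fit(train) for name, fit in candidates.items()}
valid_errors = {name: mse(model, valid) for name, model in fitted.items()}
best = min(valid_errors, key=valid_errors.get)

# Steps 3 and 4: evaluate the chosen model once on the untouched test set.
test_error = mse(fitted[best], test)

print("chosen model:", best)
print("validation errors:", valid_errors)
print("test error:", test_error)
```

The test set is used only once, after the model has been chosen, so the reported test error is not influenced by the selection step.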

 

Python 3 Example: Please click here to see the Python3 Example.

 

 

License

Building Skills for Data Science Copyright © by Dr. Nouhad Rizk. All Rights Reserved.
