Cross-Validation

Dr. Nouhad Rizk

Why Do We Need Cross-Validation?

R², also known as the coefficient of determination, is a popular measure of goodness of fit in regression. However, because it is computed on the same data used to fit the model, it offers little insight into how well the model will predict future values.

One way to address this issue is to obtain a new sample of observations and check how well the fitted model predicts them.
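To make the idea concrete, here is a minimal sketch that fits a straight line to one sample and then computes R² both on that sample and on a freshly drawn one. The data-generating process (y = 2 + 3x plus noise), the sample sizes, and the helper names fit_line, r_squared, and draw_sample are all assumptions made for illustration; they do not come from the text.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def r_squared(xs, ys, a, b):
    """R^2 = 1 - SS_res / SS_tot for the line y = a + b*x on the given data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def draw_sample(n, rng):
    """Hypothetical data-generating process: y = 2 + 3x + Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 + 3 * x + rng.gauss(0, 5) for x in xs]
    return xs, ys

rng = random.Random(0)               # fixed seed so the run is repeatable
x_old, y_old = draw_sample(15, rng)  # the sample used to fit the model
x_new, y_new = draw_sample(15, rng)  # a new sample of observations

a, b = fit_line(x_old, y_old)
r2_fit = r_squared(x_old, y_old, a, b)  # quality of fit on the fitting sample
r2_new = r_squared(x_new, y_new, a, b)  # how well the same line predicts new data

print(f"R^2 on the fitting sample: {r2_fit:.3f}")
print(f"R^2 on a new sample:       {r2_new:.3f}")
```

The two numbers generally differ. Only the second one reflects predictive ability, since the new observations played no part in estimating the line.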

A more practical alternative is cross-validation.

Cross-Validation

In cross-validation, the original sample is split into two parts.

One part is called the training sample, and the other part is called the validation sample.

For larger data sets, it is often best to split the sample in half.

For smaller samples, it is often best to use 2/3 of the sample for training and 1/3 for validation.
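The two split ratios can be sketched with a small helper. The function name train_validation_split, the fixed seed, and the 30 placeholder observations are assumptions for illustration only. Shuffling before splitting is a common precaution in case the data are stored in a meaningful order.

```python
import random

def train_validation_split(data, train_fraction, seed=0):
    """Shuffle a copy of the data and split it into (training, validation)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

observations = list(range(30))  # stand-in for 30 observations

# Larger data sets: split the sample in half.
train_half, valid_half = train_validation_split(observations, 1 / 2)

# Smaller samples: 2/3 for training, 1/3 for validation.
train_small, valid_small = train_validation_split(observations, 2 / 3)

print(len(train_half), len(valid_half))
print(len(train_small), len(valid_small))
```

Every observation ends up in exactly one of the two parts, so the validation sample never overlaps with the data used for fitting.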

The Procedure

Divide the data into three sets: training, validation, and test.
Fit the candidate models on the training set, and use the validation set to choose the best one.
See how well the chosen model predicts the test set.
The error on data that played no part in fitting or selecting the model gives an unbiased estimate of the model's predictive power.
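One common way to carry out a three-set procedure is sketched below: candidate models are fitted on the training set, compared on the validation set, and the chosen model is scored once on the test set. The simulated data, the two candidate models (an intercept-only model and a straight line), the equal three-way split, and the use of mean squared error are assumptions chosen to keep the sketch short; they are not prescribed by the text.

```python
import random

def fit_constant(xs, ys):
    """Intercept-only model: always predict the training mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: y = 2 + 3x + noise, generated in random order.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(90)]
ys = [2 + 3 * x + rng.gauss(0, 2) for x in xs]

# Divide the data into training, validation, and test sets.
train = (xs[:30], ys[:30])
valid = (xs[30:60], ys[30:60])
test = (xs[60:], ys[60:])

# Fit every candidate on the training set only.
candidates = {"constant": fit_constant, "line": fit_line}
fitted = {name: fit(*train) for name, fit in candidates.items()}

# Compare the candidates on the validation set and keep the best one.
valid_errors = {name: mse(model, *valid) for name, model in fitted.items()}
best = min(valid_errors, key=valid_errors.get)

# Score the chosen model once on the untouched test set.
test_error = mse(fitted[best], *test)

print("validation MSE per model:", valid_errors)
print("chosen model:", best)
print(f"test MSE of chosen model: {test_error:.3f}")
```

Because the test set is used only once, after the model has been chosen, its error is not flattered by the fitting or selection steps.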

Python 3 Example: Please click here to see the Python 3 example.
