Shashi Vishwakarma
Dec 30, 2023


Hello World with K-Fold Cross Validation in Data Science

Welcome, guys, to another post. Today we are going to discuss an important statistical technique in data science: K-fold cross-validation.

K-Fold Cross-Validation is a statistical method used to estimate the skill of machine learning models. It’s primarily used in situations where the objective is to predict the performance of a model on new, unseen data.

It gives a more reliable performance estimate than a single train/test split and helps detect overfitting, which in turn helps you choose more generalizable models.

In K-fold cross-validation we divide the dataset into k subsets (folds) and use them for model evaluation. Let's take an example where k = 3, i.e. you divide the dataset into 3 parts: one part is used for testing and the other two parts are used for model training. This is repeated 3 times so that each part is used as the test set exactly once.
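To make the splitting concrete, here is a small sketch that prints which rows land in the training and test sets for k = 3. The six-row toy array `X_demo` is made up purely for illustration, and shuffling is left off so the folds are easy to read.

```python
import numpy as np
from sklearn.model_selection import KFold

# Six toy samples (hypothetical data, only to show the indices)
X_demo = np.arange(6).reshape(6, 1)

# No shuffling here, so each test fold is a contiguous block of rows
kf_demo = KFold(n_splits=3)

folds = [
    (train_idx.tolist(), test_idx.tolist())
    for train_idx, test_idx in kf_demo.split(X_demo)
]

for i, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"Fold {i}: train={train_idx} test={test_idx}")
```

Every row appears in a test set exactly once and in a training set in the other two rounds.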

In each round, fit a model on the training set and evaluate it on the test set. Retain the evaluation score and discard the model.

The performance measure reported by k-fold cross-validation is then the average of the scores computed in the loop.
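As a quick illustration of that averaging step, the sketch below takes three fold scores and reports their mean, along with the standard deviation as a rough hint of how much the scores vary between folds. The scores in `fold_scores` are invented numbers for the example, not results from a real model.

```python
import numpy as np

# Hypothetical accuracy scores from 3 folds (made up for illustration)
fold_scores = np.array([0.90, 0.85, 0.80])

mean_score = fold_scores.mean()  # the value k-fold cross-validation reports
std_score = fold_scores.std()    # spread of the scores across folds

print(f"Mean accuracy: {mean_score:.2f}")
print(f"Std across folds: {std_score:.3f}")
```

A small spread suggests the model behaves similarly whichever part of the data it is tested on.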

Below is sample code where we apply logistic regression to an input dataset and evaluate it with K-fold cross-validation.

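import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Read your dataset here. The synthetic data below is only a placeholder;
# replace it with your own X (features) and y (labels) as NumPy arrays.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# We use 3 splits, as in the example above
k = 3

# Create KFold object (shuffle the rows before splitting, with a fixed seed)
kf = KFold(n_splits=k, shuffle=True, random_state=42)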


model = LogisticRegression()

accuracy_scores = []

for train_index, test_index in kf.split(X):
    # Select the rows for this fold
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # fit() retrains the model from scratch on each fold
    model.fit(X_train, y_train)

    # Score the model on the held-out part
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_scores.append(accuracy)

average_accuracy = np.mean(accuracy_scores)  # Average accuracy across all folds
print(f"Average Accuracy: {average_accuracy:.2f}")
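The manual loop above can also be written more compactly with scikit-learn's `cross_val_score` helper, which runs the fit-and-score rounds for you. This is a sketch under assumptions rather than the only way to do it: the synthetic dataset from `make_classification` and the `max_iter=1000` setting are choices made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Placeholder data (assumption); swap in your own X and y
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)

# Same splitting strategy as the manual loop: 3 shuffled folds
cv = KFold(n_splits=3, shuffle=True, random_state=42)

# Returns one accuracy score per fold
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=cv, scoring="accuracy"
)

print("Scores per fold:", scores)
print(f"Average Accuracy: {scores.mean():.2f}")
```

The explicit loop is still handy when you want to inspect predictions or keep extra metrics for each fold.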

K-fold cross-validation reduces the bias that comes from relying on a single train/test split and helps in assessing model robustness. It's always good practice in data science to perform k-fold cross-validation when you train a model, so that you understand how robust it is.

Hope you liked the blog.

Keep Learning and Keep Sharing ..!!
