Shashi Vishwakarma
2 min read · Dec 30, 2023

Hello World with K-Fold Cross Validation in Data Science

Welcome, everyone, to another post. Today we are going to discuss an important statistical technique in data science: K-fold cross-validation.

K-Fold Cross-Validation is a statistical method used to estimate the skill of machine learning models. It’s primarily used in situations where the objective is to predict the performance of a model on new, unseen data.

It helps reduce bias and overfitting, leading to more generalizable models.

In K-fold cross-validation, we divide the dataset into k subsets (folds) and use them in turn for model evaluation. Take k = 3 as an example: the dataset is split into 3 parts, one part is used for testing and the other parts are used for training.

For each fold, fit a model on the training set and evaluate it on the test set; retain the evaluation score and discard the model.

The performance measure reported by K-fold cross-validation is then the average of the values computed in the loop.
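To make the splitting concrete, here is a minimal sketch (a tiny made-up array of 6 samples is assumed) that simply prints which indices land in the train and test folds for k = 3:

import numpy as np
from sklearn.model_selection import KFold

# A tiny made-up dataset of 6 samples, only to illustrate the splits
X_demo = np.arange(6).reshape(6, 1)

kf_demo = KFold(n_splits=3)
for fold, (train_idx, test_idx) in enumerate(kf_demo.split(X_demo), start=1):
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")

# Fold 1: train=[2 3 4 5], test=[0 1]
# Fold 2: train=[0 1 4 5], test=[2 3]
# Fold 3: train=[0 1 2 3], test=[4 5]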

Below is the full sample code, where we evaluate a logistic regression model with K-fold cross-validation (a synthetic dataset stands in for your own data so the snippet runs as-is).

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Load your own dataset here as NumPy arrays; a synthetic dataset from
# make_classification is used below purely so the example runs as-is.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# As in the example above, use k = 3 splits
k = 3

# Create KFold object
kf = KFold(n_splits=k, shuffle=True, random_state=42)


model = LogisticRegression(max_iter=1000)  # higher max_iter avoids convergence warnings

accuracy_scores = []

for train_index, test_index in kf.split(X):
    # Split the data into training and test folds
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit the model on the training fold
    model.fit(X_train, y_train)

    # Evaluate on the held-out fold and keep the score
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_scores.append(accuracy)

# Average accuracy across all folds
average_accuracy = np.mean(accuracy_scores)
print(f"Average Accuracy: {average_accuracy:.2f}")

K-fold cross-validation helps reduce bias and assess model robustness. It is always good practice in data science: once you have trained your model, perform K-fold cross-validation to understand how robust it is.
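One simple way to read robustness off the results above is to look at the spread of the per-fold scores, not just their average. A small sketch using the accuracy_scores list from the code above:

# A small standard deviation means the model performs consistently across folds
print(f"Std of fold accuracies: {np.std(accuracy_scores):.2f}")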

Hope you liked the blog.

Keep Learning and Keep Sharing ..!!
