Hello World with K-Fold Cross Validation in Data Science

2 min read · Dec 30, 2023

Welcome to another post. Today we are going to discuss an important statistical technique in data science: K-fold cross-validation.

K-Fold Cross-Validation is a statistical method used to estimate the skill of machine learning models. It’s primarily used in situations where the objective is to predict the performance of a model on new, unseen data.

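Because every sample is used for testing exactly once, it gives a less biased performance estimate than a single train/test split and helps detect overfitting, which in turn helps you pick models that generalize better.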

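In K-fold cross-validation we divide the dataset into k subsets (folds) and use them for model evaluation. Let's take an example where k = 3, i.e. you divide the dataset into 3 parts: in each round, one part is used for testing and the other two parts are used for model training. This is repeated 3 times, so each part serves as the test set exactly once.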
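To make the splitting concrete, here is a minimal sketch of how the folds look for k = 3. The six-row toy array is made up for illustration, and shuffling is turned off so the folds are easy to read.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 6 samples with 1 feature (illustrative values only)
X = np.arange(6).reshape(-1, 1)

# No shuffling here, so each test fold is a contiguous block of rows
kf = KFold(n_splits=3)

folds = []
for fold, (train_index, test_index) in enumerate(kf.split(X), start=1):
    folds.append((train_index.tolist(), test_index.tolist()))
    print(f"Fold {fold}: train={train_index.tolist()} test={test_index.tolist()}")
```

Each row index appears in exactly one test fold, and in the training set of the other two rounds.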

In each round, fit a model on the training set and evaluate it on the test set. Retain the evaluation score and discard the model.

The performance measure reported by k-fold cross-validation is then the average of the scores computed across the k rounds.

Below is sample code where we apply logistic regression to an input dataset and evaluate it with K-fold cross-validation.

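import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Read your dataset here. A synthetic dataset is used as a stand-in
# so the example runs; replace it with your own NumPy arrays X and y.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# As in the example above, use 3 splits
k = 3

# Create the KFold object; rows are shuffled before splitting
kf = KFold(n_splits=k, shuffle=True, random_state=42)

accuracy_scores = []

for train_index, test_index in kf.split(X):
    # Positional indexing works for NumPy arrays; use .iloc for pandas objects
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit a fresh model on the training folds
    model = LogisticRegression()
    model.fit(X_train, y_train)

    # Evaluate on the held-out fold and keep only the score
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_scores.append(accuracy)

average_accuracy = np.mean(accuracy_scores)  # Average accuracy across the k folds
print(f"Average Accuracy: {average_accuracy:.2f}")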
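The manual loop can also be handed over to scikit-learn's `cross_val_score`, which runs the same fit-and-score steps for each fold. This is a sketch under assumptions: the synthetic dataset from `make_classification` is a stand-in for your own data, and `max_iter=1000` is just a generous setting to avoid convergence warnings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Stand-in dataset (assumption); replace with your own X and y
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Same splitting strategy as the manual loop: 3 shuffled folds
kf = KFold(n_splits=3, shuffle=True, random_state=42)

# cross_val_score clones the estimator for every fold, so no fitted state leaks between rounds
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=kf, scoring="accuracy"
)

average_accuracy = scores.mean()
print(f"Per-fold accuracy: {scores}")
print(f"Average Accuracy: {average_accuracy:.2f}")
```

The explicit loop is still useful when you want to inspect predictions or fitted models per fold; the helper is handy when only the scores matter.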

K-fold cross-validation gives a less biased performance estimate and helps in assessing model robustness. It's always good practice in data science to run k-fold cross-validation when you train a model, so you can see how stable its performance is across different splits of the data.

Hope you liked this blog.
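One simple way to look at robustness is the spread of the per-fold scores, not just their average. The sketch below uses a hypothetical helper, `summarize_scores`, and made-up fold accuracies purely for illustration; in practice you would pass in the scores collected from your own cross-validation run.

```python
import numpy as np

def summarize_scores(scores):
    """Hypothetical helper: summarize per-fold scores with mean and spread."""
    scores = np.asarray(scores, dtype=float)
    return {
        "mean": scores.mean(),
        "std": scores.std(),   # population standard deviation across folds
        "min": scores.min(),
        "max": scores.max(),
    }

# Made-up accuracies for 3 folds (illustrative only, not real results)
fold_scores = [0.90, 0.85, 0.95]
summary = summarize_scores(fold_scores)

print(f"Accuracy: {summary['mean']:.2f} +/- {summary['std']:.2f} "
      f"(min {summary['min']:.2f}, max {summary['max']:.2f})")
```

A small standard deviation suggests the model behaves similarly on every fold, while a large gap between the minimum and maximum is a hint that performance depends heavily on which rows end up in the test set.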

Keep Learning and Keep Sharing ..!!

Written by Shashi Vishwakarma

Senior Software/AI Engineer, Technical Writer
