Hello World with K-Fold Cross Validation in Data Science
Welcome, everyone, to another post. Today we are going to discuss an important statistical technique in data science: K-fold cross-validation.
K-Fold Cross-Validation is a statistical method used to estimate the skill of machine learning models. It’s primarily used in situations where the objective is to predict the performance of a model on new, unseen data.
It gives a less biased performance estimate than a single train/test split and helps you detect overfitting, which leads to choosing more generalizable models.

In K-fold cross-validation we divide the dataset into k subsets (folds) and use them for model evaluation. Let's take an example where k = 3, i.e. you divide the dataset into 3 parts: one part is used for testing and the other two parts are used for model training.
For each fold in turn, fit a model on the training set and evaluate it on the test set. Retain the evaluation score and discard the model. Every part serves as the test set exactly once.
The performance measure reported by k-fold cross-validation is then the average of the scores computed in the loop.
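To make the splitting step concrete, here is a small sketch of how the 3 folds look on a toy dataset of 6 samples. The data is made up purely for illustration, and shuffling is left off so that the folds are easy to read.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 6 samples with one feature each (made up for illustration)
X = np.arange(6).reshape(-1, 1)

# No shuffling here, so each test fold is a contiguous block of samples
kf = KFold(n_splits=3)

folds = []
for fold, (train_index, test_index) in enumerate(kf.split(X), start=1):
    folds.append((train_index.tolist(), test_index.tolist()))
    print(f"Fold {fold}: train={train_index.tolist()} test={test_index.tolist()}")

# Fold 1: train=[2, 3, 4, 5] test=[0, 1]
# Fold 2: train=[0, 1, 4, 5] test=[2, 3]
# Fold 3: train=[0, 1, 2, 3] test=[4, 5]
```

Notice that every sample lands in the test set exactly once and in the training set the other two times.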
Below is sample code where we apply logistic regression to an input dataset and evaluate it with K-fold cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Read your dataset here as NumPy arrays X (features) and y (labels).
# The synthetic data below is only a stand-in so that the example runs.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# As we have taken the example of 3 splits
k = 3

# Create KFold object
kf = KFold(n_splits=k, shuffle=True, random_state=42)

accuracy_scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit a fresh model on this fold's training data
    model = LogisticRegression()
    model.fit(X_train, y_train)

    # Evaluate on the held-out fold and keep the score
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_scores.append(accuracy)

# Average accuracy across all folds
average_accuracy = np.mean(accuracy_scores)
print(f"Average Accuracy: {average_accuracy:.2f}")
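If you don't need the explicit loop, scikit-learn's cross_val_score helper can run the same kind of evaluation in one call. The sketch below is a rough equivalent, not a drop-in for your project: the synthetic dataset from make_classification is an assumption used only so the snippet runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption); swap in your own X and y
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

kf = KFold(n_splits=3, shuffle=True, random_state=42)

# Returns one accuracy value per fold; the estimator is cloned and
# refitted for each fold, so no state leaks between folds
scores = cross_val_score(LogisticRegression(), X, y, cv=kf, scoring="accuracy")

print("Accuracy per fold:", scores.round(2))
print(f"Average Accuracy: {scores.mean():.2f}")
```

The manual loop is still worth knowing, because it lets you inspect each fold or compute custom metrics along the way.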
K-fold cross-validation gives a more reliable performance estimate than a single train/test split and helps in assessing model robustness. It's always good practice in data science, once you have trained your model, to perform k-fold cross-validation to understand how robust it is.
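One possible way to look at robustness is to check how much the score varies from fold to fold and how far the training accuracy sits above the test accuracy. The sketch below uses scikit-learn's cross_validate with return_train_score=True. The synthetic dataset is again an assumption for illustration, and what counts as a "large" spread or gap depends on your problem.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data (an assumption); replace with your own X and y
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

kf = KFold(n_splits=3, shuffle=True, random_state=42)
results = cross_validate(
    LogisticRegression(), X, y, cv=kf, scoring="accuracy", return_train_score=True
)

test_scores = results["test_score"]    # accuracy on each held-out fold
train_scores = results["train_score"]  # accuracy on each fold's training data

# A large spread across folds, or a large train/test gap, is a warning sign
print(f"Test accuracy:  {test_scores.mean():.2f} +/- {test_scores.std():.2f}")
print(f"Train accuracy: {train_scores.mean():.2f}")
print(f"Train/test gap: {train_scores.mean() - test_scores.mean():.2f}")
```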
Hope you liked the blog.
Keep Learning and Keep Sharing ..!!