Study Evaluates Machine Learning Models for Health Insurance Risk Classification
Researchers tested three ensemble algorithms on a dataset of 59,381 insurance applicants to measure accuracy, fairness, and interpretability in underwriting decisions. The analysis compared performance across binary, three-class, and eight-class risk settings and examined disparities by age and body mass index.
A peer-reviewed study published on 19 May 2026 examined machine learning methods for classifying insurance applicants into risk categories. The work focused on balancing predictive accuracy with fairness and explainability in health insurance underwriting.
Three ensemble models were tested on a benchmark dataset of 59,381 applicants. Researchers applied Random Forest, XGBoost, and LightGBM across binary, three-class, and eight-class risk classification tasks.
Feature selection used the Boruta algorithm to reduce the input space. XGBoost recorded the highest test accuracy, 0.831, with a Matthews Correlation Coefficient of 0.624, in the binary setting. Performance declined as the number of risk classes increased. Body Mass Index and applicant age together accounted for more than 40 percent of total model importance.
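For readers unfamiliar with the Matthews Correlation Coefficient, the following is a minimal sketch of how accuracy and MCC are computed from a binary confusion matrix. The function names and the toy labels are illustrative assumptions and are not taken from the study; the figures it prints are unrelated to the reported 0.831 and 0.624.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Toy labels invented for illustration (1 = high risk, 0 = low risk).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", round(mcc(*counts), 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it is less flattering when one risk class dominates the data.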
Fairness metrics showed mild differences across age groups and larger differences across BMI categories. Statistical Parity Difference and Equal Opportunity Difference were used to quantify these disparities. Bootstrap resampling over 1,000 iterations and sensitivity tests with decision thresholds from 0.1 to 0.9 indicated stable performance.
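The two fairness metrics named above can be sketched as follows, using their common definitions: Statistical Parity Difference compares the rate of positive predictions between two groups, and Equal Opportunity Difference compares their true positive rates. The group labels, the choice of reference group, and the toy data are assumptions made for illustration; the study's own grouping and sign convention may differ.

```python
def positive_rate(y_pred, group, name):
    """Share of members of `name` that receive a positive prediction."""
    preds = [p for p, g in zip(y_pred, group) if g == name]
    if not preds:
        raise ValueError(f"no members in group {name!r}")
    return sum(preds) / len(preds)

def true_positive_rate(y_true, y_pred, group, name):
    """Share of actual positives in `name` that are predicted positive."""
    preds = [p for t, p, g in zip(y_true, y_pred, group) if g == name and t == 1]
    if not preds:
        raise ValueError(f"no actual positives in group {name!r}")
    return sum(preds) / len(preds)

def statistical_parity_difference(y_pred, group, target, reference):
    """P(pred = 1 | target group) minus P(pred = 1 | reference group)."""
    return positive_rate(y_pred, group, target) - positive_rate(y_pred, group, reference)

def equal_opportunity_difference(y_true, y_pred, group, target, reference):
    """True positive rate of the target group minus that of the reference group."""
    return (true_positive_rate(y_true, y_pred, group, target)
            - true_positive_rate(y_true, y_pred, group, reference))

# Toy data invented for illustration; the group names are hypothetical.
group = ["young"] * 4 + ["older"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

spd = statistical_parity_difference(y_pred, group, "older", "young")
eod = equal_opportunity_difference(y_true, y_pred, group, "older", "young")
print("SPD:", spd)
print("EOD:", eod)
```

A value of zero on either metric means the two groups are treated alike on that criterion; values further from zero indicate a larger disparity.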
Ranking Generalisation Assessment confirmed consistent model behavior under sampling variations. The study provides a framework that combines accuracy, interpretability via SHAP values, fairness audits, and robustness checks for potential use in insurance underwriting.
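The bootstrap resampling and threshold sensitivity checks described in the article can be illustrated with a small sketch. It assumes a simple percentile confidence interval and accuracy as the tracked metric; the function names, seed, scores, and labels are invented for illustration, and the study's actual procedure is not specified at this level of detail.

```python
import random

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_accuracy_ci(y_true, y_pred, n_iter=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_iter)]
    upper = stats[int((1 - alpha / 2) * n_iter) - 1]
    return lower, upper

def threshold_sweep(y_true, scores, thresholds):
    """Accuracy obtained when scores are cut at each decision threshold."""
    return {t: accuracy(y_true, [1 if s >= t else 0 for s in scores])
            for t in thresholds}

# Toy labels and predicted scores invented for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.65, 0.35, 0.55, 0.30, 0.25, 0.20, 0.15, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

point = accuracy(y_true, y_pred)
lower, upper = bootstrap_accuracy_ci(y_true, y_pred, n_iter=1000)
sweep = threshold_sweep(y_true, scores, [k / 10 for k in range(1, 10)])
print("accuracy:", point, "95% interval:", (lower, upper))
for t, acc in sweep.items():
    print("threshold", t, "accuracy", acc)
```

A narrow bootstrap interval and a flat accuracy curve across thresholds are the kind of evidence that would support a claim of stable performance.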


