EXPLORATORY EVALUATION OF MACHINE LEARNING ALGORITHMS IN SICKLE CELL GENOTYPE DETECTION FROM HAEMORHEOLOGICAL PARAMETERS DATASET OBTAINED IN NIGERIA

Authors

  • Rahman Abiodun Olalekan Federal University of Technology Akure, Nigeria
  • Ilesanmi Paul IGE Department of Medical Laboratory Science, Federal University of Technology Akure, Nigeria
  • Oludare Alani AGBEYANGI Department of Public Health, Federal University of Technology Akure, Nigeria
  • Daniel Paditeiye REUBEN Department of Medical Laboratory Science, Achievers University Owo, Ondo State, Nigeria

DOI:

https://doi.org/10.5281/zenodo.18104588

Keywords:

Machine learning, sickle cell disease, haemorheological parameters, genotype prediction, Random Forest, Support Vector Machine, Neural Network

Abstract

Patients with sickle cell disease (SCD) carry abnormal haemoglobin that causes red blood cells to become sickle-shaped, leading to various complications. Early detection is desirable, yet existing diagnostic methods are costly and have steep learning curves. This study evaluated the potential of three machine learning (ML) algorithms (Random Forest, Support Vector Machine (SVM), and a Neural Network) to detect sickle cell genotypes (SS, AS, AA) from haemorheological parameters in a Nigerian dataset of 54 participants. We used stratified 5-fold cross-validation to obtain a reliable estimate of performance. The Random Forest and SVM models achieved the highest mean accuracy, 90.9% ± 5.8%. Feature importance analysis identified Packed Cell Volume (PCV) as the most discriminative parameter, followed by Plasma Viscosity (PV) and Age. While all models showed high sensitivity in identifying sickle cell anaemia (SS), they consistently failed to classify the sickle cell trait (AS) correctly, a critical limitation revealed by the cross-validation. Our findings suggest that ML applied to routine laboratory parameters is a promising screening tool for sickle cell disease, but is not yet viable for comprehensive genotype classification because of the small dataset size and class imbalance. Future work should focus on acquiring larger, more balanced datasets to improve detection of the AS trait.
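
As an illustration of the stratified 5-fold cross-validation mentioned in the abstract, the following is a minimal sketch in plain Python and not the authors' code. The per-genotype counts are hypothetical (the abstract reports only the total of 54 participants), and the helper name stratified_k_fold is an assumption introduced here. It shows how stratification spreads each genotype across the folds, and why a rare class leaves very few test samples per fold.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Return k lists of test indices, each keeping roughly the overall class mix.

    Hypothetical helper written for illustration; it is not taken from the study.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so its samples are spread evenly over folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical class counts (NOT from the study); they only sum to 54 participants.
labels = ["AA"] * 30 + ["SS"] * 18 + ["AS"] * 6
folds = stratified_k_fold(labels, k=5, seed=0)

for number, test_idx in enumerate(folds, start=1):
    counts = Counter(labels[i] for i in test_idx)
    print(f"fold {number}: n={len(test_idx)} {dict(sorted(counts.items()))}")
```

Under these assumed counts every test fold contains only one or two AS samples, so a single misclassification changes the per-fold AS sensitivity sharply. This is one way a small, imbalanced dataset can make results for a minority class unstable.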

Published

2025-12-31