INTERPRETABLE BINARY CLASSIFICATION MODELS USING XAI AND FEW DESCRIPTORS FOR PREDICTING BLOOD-BRAIN BARRIER PERMEABILITY OF PHARMACEUTICAL COMPOUNDS BASED ON RESAMPLING, CLUSTERING, AND MACHINE LEARNING METHODS

Aubin N’guessan; Ludovic Akonan; Jean-Louis Kouakou Kouakou; Logbo Moussé; Melalie Kéita; Raymond Kré; Nahossé Ziao; Eugène Megnassan

doi:10.22270/ujpr.v10i5.1420

Aubin N’guessan Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Ludovic Akonan Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Jean-Louis Kouakou Kouakou Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Logbo Moussé Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Melalie Kéita Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Raymond Kré Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire.
Nahossé Ziao Laboratory of Thermodynamics and Physico-chemistry of the Environment, Nangui Abrogoua University, Côte d’Ivoire.
Eugène Megnassan Fundamental Applied Physics Laboratory (FAPL), Nangui Abrogoua University, Côte d’Ivoire. QLS, ICTP-UNESCO, I 34151 Trieste, Italy.


Keywords:

blood-brain barrier permeability, curse of dimensionality, explainable AI, logBB, machine learning, QSAR

Abstract

Background: Designing pharmaceutical compounds that treat brain diseases, or drugs that act on biological targets in peripheral organs without crossing the blood-brain barrier, remains very difficult. Animal models are costly and low-throughput; the pharmaceutical industry and regulatory bodies therefore need reliable, accurate and interpretable predictive tools to assess whether pharmaceutical compounds can cross the blood-brain barrier.

Method: This study develops artificial intelligence models with greater accuracy and explanatory power for the binary classification of the blood-brain barrier permeability of drug candidate compounds. Combining a resampling approach with a clustering technique, we built five distinct artificial intelligence models (support vector machine, k-nearest neighbors, classification and regression tree, random forest, and gradient boosting machine) using only 10 molecular descriptors and a dataset of 1,726 molecular observations (1,000 original and 726 synthetic compounds).
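The abstract does not name the resampling algorithm that produced the 726 synthetic compounds. Assuming a SMOTE-style scheme (an assumption, not confirmed by the text), each synthetic row is created by interpolating between a minority-class compound and one of its nearest minority-class neighbours in descriptor space. The dependency-free sketch below illustrates that idea; the function names and toy descriptor values are hypothetical.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows for the minority class (SMOTE-style sketch).

    Assumption: the study's resampling step works like SMOTE; the abstract
    does not name the algorithm.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding the sample itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and partner
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Hypothetical toy data: four minority-class compounds, two descriptors each
bbb_minus = [[0.1, 1.2], [0.3, 1.0], [0.2, 1.5], [0.4, 1.1]]
new_rows = smote_like_oversample(bbb_minus, n_new=3, k=2)
print(len(new_rows), "synthetic rows")
```

Because every synthetic row is a convex combination of two real minority compounds, it stays inside the region of descriptor space already occupied by that class.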

Results: Of all the models evaluated, the gradient boosting machine had the best 10-fold cross-validation statistics and, on the external test set, achieved a prediction accuracy (Q) of 91.04%, a Matthews correlation coefficient (MCC) of 0.82 and an area under the ROC curve (AUC) of 1.0. The outputs of the gradient boosting machine are explained using the SHapley Additive exPlanations (SHAP) approach, which ranks the main modelling descriptors involved in predicting blood-brain barrier permeability in order of importance.
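The three reported statistics can be computed from a classifier's outputs as sketched below. This is a minimal, dependency-free illustration: Q is the fraction of correctly labelled compounds, MCC combines all four confusion-matrix cells, and AUC is computed from the ranking of predicted scores (the Mann-Whitney formulation). The labels, scores and 0.5 threshold are hypothetical values, not data from the study.

```python
import math


def confusion(y_true, y_pred):
    """Return (TP, TN, FP, FN) with 1 = BBB+ (permeable) and 0 = BBB-."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_q(y_true, y_pred):
    """Prediction accuracy Q: fraction of correctly classified compounds."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient (0.0 when a marginal total is empty)."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, scores):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted BBB+ probabilities (not study data)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.45, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # fixed 0.5 threshold

print(f"Q = {accuracy_q(y_true, y_pred):.2%}")  # Q = 75.00%
print(f"MCC = {mcc(y_true, y_pred):.2f}")       # MCC = 0.50
print(f"AUC = {roc_auc(y_true, scores):.4f}")   # AUC = 0.9375
```

AUC depends only on how the scores rank positives above negatives, whereas Q and MCC depend on the chosen decision threshold; in this toy case AUC is 0.9375 while Q is only 75%.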

Conclusion: Non-animal predictive models were designed to determine whether pharmaceutical compounds can cross the blood-brain barrier. The proposed model is accurate enough to be useful for the virtual screening of large libraries of pharmaceutical compounds. It identified two key indicators for the predictions: the spatial distribution of atomic charges and electronegativity.
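The descriptor ranking behind these indicators comes from SHAP, which rests on Shapley values. The sketch below computes them exactly for a small hypothetical scoring function by enumerating every coalition of descriptors, replacing absent descriptors with a single reference (baseline) value, and then ranks descriptors by mean absolute Shapley value. This is a simplification: the SHAP library typically averages over a background dataset and, for tree ensembles such as gradient boosting, uses the TreeSHAP algorithm instead of brute-force enumeration, whose cost grows as 2^n with the number of descriptors. The model, descriptor names and values are hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values of model(x); descriptors outside a coalition
    take their value from `baseline` (a single reference point)."""
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1, not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_descriptors(model, rows, baseline, names):
    """Rank descriptors by mean absolute Shapley value over `rows`."""
    totals = [0.0] * len(names)
    for row in rows:
        for j, v in enumerate(exact_shapley(model, row, baseline)):
            totals[j] += abs(v)
    means = [t / len(rows) for t in totals]
    return sorted(zip(names, means), key=lambda item: item[1], reverse=True)


# Hypothetical scoring function over three descriptors (not the paper's GBM)
def toy_model(z):
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
# Efficiency property: the values sum to model(x) - model(baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
print(rank_descriptors(toy_model, [x, [0.5, 0.0, 1.0]], baseline, ["d1", "d2", "d3"]))
```

The interaction term 0.5·z0·z2 is split equally between descriptors d1 and d3, which is how Shapley values share credit for joint effects.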

Peer Review History:

Received 3 August 2025; Reviewed 11 September 2025; Accepted 17 October 2025; Available online 15 November 2025

Academic Editor: Dr. Amany Mohamed Alboghdadly, Ibn Sina National College for Medical Studies in Jeddah, Saudi Arabia, amanyalboghdadly@gmail.com