What Molecular Structure Cannot Tell Us: A Taxonomy of Explainability Gaps in GNN-Based Drug Toxicity Prediction

View PDF HTML (experimental)

Abstract:Graph Neural Networks (GNNs) have emerged as a structurally natural approach for molecular toxicity prediction, operating directly on atomic connectivity without the information loss inherent to fixed-length fingerprints. However, the fraction of a drug's known pharmacological profile that is actually encodable in its molecular structure remains systematically underexplored. This study addresses this question through a systematic case study using acetylsalicylic acid (ASA, Aspirin) - one of the most comprehensively characterized drugs in pharmacology - as a model compound. A Message Passing Neural Network (MPNN) is trained on the Tox21 benchmark and GNNExplainer is applied to characterize atom-level attribution. Results indicate that molecular structure explains approximately 45% (5/11) of known ASA adverse effects. A four-category Gap Taxonomy (GAP-1 through GAP-4) is introduced distinguishing between principally non-encodable effects, data gaps arising from Missing Not At Random (MNAR) mechanisms, assay panel mismatches, and representation errors. The MNAR gap is empirically quantified via a systematic ChEMBL query (42 documented assays, 0 retrievable bioactivity entries). An attention pooling experiment localizes the representation error to the MPNN message passing layers rather than the aggregation step. The Gap Taxonomy has direct implications for drug safety signal detection workflows and regulatory frameworks including Good Pharmacovigilance Practice (GVP) guidelines and New Approach Methodologies (NAMs).

Submission history

From: Juergen Dietrich [view email]
[v1] Mon, 25 May 2026 07:51:15 UTC (13 KB)