Abstract: Deep neural networks are vulnerable to adversarial perturbations that can simultaneously degrade prediction robustness and individual fairness across diverse application settings. However, existing evaluation protocols typically assess these dimensions in isolation, thereby obscuring critical failure modes. To bridge this gap, we formalize Robust Individual Fairness (RIF): under semantic-preserving (truth-condition-preserving) perturbations, predictions should remain both correct with respect to the ground truth and invariant across semantically equivalent individuals. To surface RIF violations in practice, we introduce RIFair, a black-box adversarial framework that leverages a decoupled perturbation strategy to construct semantically equivalent yet unrobust and/or unfair instance pairs. Experiments across multiple model architectures and real-world textual datasets show that robustness-only or fairness-only metrics often miss Robust Biased and Unrobust Fair behaviors. RIFair reliably exposes these hidden vulnerabilities, supporting RIF as a necessary criterion for trustworthy model assessment. The experimental code is publicly available at this https URL.
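The abstract separates robustness from individual fairness and names the mixed cases Robust Biased and Unrobust Fair. The sketch below illustrates how a single perturbed instance pair could be sorted into these categories, and why a robustness-only or fairness-only score can look acceptable while the joint RIF rate is low. It rests on illustrative assumptions, not on the RIFair code: "robust" is taken to mean the prediction on the perturbed instance equals the ground truth, and "fair" to mean the two semantically equivalent instances receive the same prediction. The names `RIFOutcome`, `classify_pair`, `satisfies_rif`, and `summarize` are hypothetical.

```python
from enum import Enum


class RIFOutcome(Enum):
    """Hypothetical labels for the four robustness/fairness combinations."""
    ROBUST_FAIR = "Robust Fair"
    ROBUST_BIASED = "Robust Biased"
    UNROBUST_FAIR = "Unrobust Fair"
    UNROBUST_BIASED = "Unrobust Biased"


def classify_pair(pred: int, pred_counterpart: int, label: int) -> RIFOutcome:
    """Classify one perturbed instance pair (assumed definitions).

    pred:             model prediction on the perturbed instance
    pred_counterpart: prediction on its semantically equivalent counterpart
    label:            ground-truth label shared by both instances
    """
    robust = pred == label            # assumption: correct w.r.t. ground truth
    fair = pred == pred_counterpart   # assumption: invariant across the pair
    if robust and fair:
        return RIFOutcome.ROBUST_FAIR
    if robust:
        return RIFOutcome.ROBUST_BIASED
    if fair:
        return RIFOutcome.UNROBUST_FAIR
    return RIFOutcome.UNROBUST_BIASED


def satisfies_rif(pred: int, pred_counterpart: int, label: int) -> bool:
    """A pair satisfies RIF only when it is both robust and fair."""
    return classify_pair(pred, pred_counterpart, label) is RIFOutcome.ROBUST_FAIR


def summarize(triples):
    """Compare robustness-only, fairness-only, and joint RIF rates."""
    n = len(triples)
    robust = sum(p == y for p, _, y in triples) / n
    fair = sum(p == q for p, q, _ in triples) / n
    rif = sum(satisfies_rif(p, q, y) for p, q, y in triples) / n
    return {"robust_only": robust, "fair_only": fair, "rif": rif}


# One pair from each category: (pred, pred_counterpart, label).
triples = [(1, 1, 1), (1, 0, 1), (0, 0, 1), (0, 1, 1)]
print(summarize(triples))
# {'robust_only': 0.5, 'fair_only': 0.5, 'rif': 0.25}
```

In this toy batch each single-dimension metric scores 0.5, yet only one pair in four meets both conditions: the Robust Biased pair counts as a success for robustness, and the Unrobust Fair pair counts as a success for fairness.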
From: XuRan Li
[v1] Mon, 1 Apr 2024 09:29:16 UTC (2,150 KB)
[v2] Sat, 24 Jan 2026 10:11:20 UTC (1,195 KB)
[v3] Sat, 30 May 2026 05:50:35 UTC (3,316 KB)
[v4] Tue, 30 Jun 2026 01:09:01 UTC (3,316 KB)