Abstract: Ensuring safe behavior in reinforcement learning (RL) is challenging when safety constraints are implicit and cannot be densely measured. In many settings, supervision is limited to coarse approvals or rejections of whole trajectories (e.g., whether a rollout remained within an unknown safety threshold). We propose TraCeS (Trajectory-based Constraint Estimation for Safety), a method for learning per-timestep violation credit from such sparse trajectory-level labels. TraCeS trains a sequential violation estimator whose per-step credits factorize the predicted probability that a trajectory has not yet violated the constraint, and integrates this learned signal into constrained policy optimization. The method requires neither a known cost function nor a known threshold, and remains compatible with standard continuous-control algorithms. We provide a theoretical analysis of the approximation gap introduced by the learning objective, and demonstrate empirically that TraCeS improves constraint satisfaction and feedback efficiency over baselines across multiple continuous-control benchmarks, including long-horizon tasks and settings with noisy or inconsistent labels.
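The factorization mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes, purely for illustration, that each per-step credit is a number in [0, 1] read as the chance of violating at that step, so that the probability of not yet having violated is the running product of (1 - credit). The function names `survival_probabilities` and `trajectory_bce`, and the use of a binary cross-entropy loss against the trajectory-level label, are hypothetical choices made for this example.

```python
import math


def survival_probabilities(step_credits):
    """Running product of (1 - credit_t): assumed P(no violation through step t)."""
    probs = []
    running = 1.0
    for credit in step_credits:
        running *= 1.0 - credit
        probs.append(running)
    return probs


def trajectory_bce(step_credits, label_safe, eps=1e-12):
    """Binary cross-entropy between the final survival probability and a
    trajectory-level label (1 = approved, 0 = rejected). Assumed loss, not the paper's."""
    p_safe = survival_probabilities(step_credits)[-1]
    p_safe = min(max(p_safe, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(label_safe * math.log(p_safe) + (1 - label_safe) * math.log(1.0 - p_safe))


# Hypothetical per-step credits for a three-step rollout that was rejected.
credits = [0.0, 0.5, 0.5]
probs = survival_probabilities(credits)
loss = trajectory_bce(credits, label_safe=0)
```

Under these assumptions, a single trajectory-level label still yields a gradient for every step, because each per-step credit enters the final product; this is one way sparse labels could be turned into a per-timestep signal.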
From: Siow Meng Low
[v1] Thu, 17 Apr 2025 01:11:08 UTC (1,494 KB)
[v2] Wed, 23 Apr 2025 04:44:58 UTC (2,132 KB)
[v3] Tue, 30 Jun 2026 02:48:09 UTC (9,940 KB)