Abstract: Q-learning is a fundamental algorithmic primitive in reinforcement learning. This paper develops a new framework for analyzing Q-learning from a switching linear system (SLS) viewpoint. In particular, we derive a stochastic SLS representation of the Q-learning error and carry out a finite-time error analysis through the joint spectral radius (JSR) of the corresponding SLS model, where the JSR is the exact worst-case exponential rate of that SLS. To the best of our knowledge, this is the first convergence rate analysis of standard Q-learning whose leading exponential rate is expressed through the JSR. Because the resulting rate is tied to the intrinsic worst-case exponential rate of the direct SLS representation, it can be sharper than row-sum upper bounds when those bounds are conservative.
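To make the comparison between the JSR and row-sum bounds concrete, the following is a minimal sketch, not the paper's method. It uses the standard fact that, for any product length k, the largest spectral radius over all length-k products (raised to the power 1/k) is a lower bound on the JSR, while the largest induced norm over the same products (raised to 1/k) is an upper bound. With the infinity norm and k = 1, the upper bound reduces to the maximum row sum. The matrices `A1` and `A2` and the helper `jsr_bounds` are hypothetical and chosen only for illustration; they are not the SLS matrices derived in the paper.

```python
import functools
import itertools

import numpy as np


def jsr_bounds(mats, k):
    """Brute-force lower/upper bounds on the JSR from all products of length k.

    Hypothetical helper for illustration; the cost grows as len(mats) ** k.
    """
    lower, upper = 0.0, 0.0
    for combo in itertools.product(mats, repeat=k):
        prod = functools.reduce(np.matmul, combo)
        # Spectral radius of a length-k product, to the power 1/k: lower bound.
        rho = np.max(np.abs(np.linalg.eigvals(prod)))
        lower = max(lower, rho ** (1.0 / k))
        # Infinity norm (max absolute row sum), to the power 1/k: upper bound.
        upper = max(upper, np.linalg.norm(prod, np.inf) ** (1.0 / k))
    return lower, upper


# Hypothetical nonnegative matrices, assumed only for this example.
A1 = np.array([[0.5, 0.4], [0.0, 0.5]])
A2 = np.array([[0.5, 0.0], [0.4, 0.5]])
mats = [A1, A2]

# Row-sum bound: the k = 1 upper bound with the infinity norm.
row_sum = max(np.linalg.norm(A, np.inf) for A in mats)

lower, upper = jsr_bounds(mats, k=4)
print(f"row-sum bound: {row_sum:.4f}")
print(f"JSR bracket from length-4 products: [{lower:.4f}, {upper:.4f}]")
```

For this toy pair the bracket from longer products sits strictly below the row-sum bound, which is the kind of gap the abstract refers to when it calls row-sum bounds conservative. Brute-force enumeration is only practical for small sets and short products.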
From: Donghwan Lee
[v1] Tue, 21 Apr 2026 15:22:42 UTC (18 KB)
[v2] Sun, 3 May 2026 13:59:04 UTC (36 KB)
[v3] Tue, 5 May 2026 14:00:33 UTC (35 KB)
[v4] Tue, 30 Jun 2026 06:39:04 UTC (68 KB)