A Research Note on Looping Dynamics in Terminal Agents
Authors: Keyu Duan, Michael Shieh, Lijun Wu, Longxu Dou*
*Joint supervision
<aside> 🔥
This blog studies action looping in LLM-based agents and shows (Section 3) that looping is not a sudden failure, but the outcome of a gradual collapse in action entropy.
As illustrated in Figure 1(a), the action distribution progressively deviates from non-loop behavior tens to hundreds of turns before repetition begins. We operationalize this observation by comparing a recent sliding window of action entropy against a non-loop baseline using a Mann–Whitney U test, yielding a statistically grounded divergence score (Figure 1(b)). This enables reliable online early warning of loop risk.

Figure 1. Action entropy diverges gradually from non-loop behavior long before loop onset and can be detected online by statistically comparing recent actions with a non-loop baseline using a Mann–Whitney U test.
Building on this signal, in Section 4 we evaluate lightweight inference-time interventions. While adaptive temperature can effectively break loops by increasing action diversity, summarization offers a more stable and principled approach that better preserves accuracy for strong models. Overall, we argue that agent looping should be treated as an online control problem rather than as a post-hoc bug.
</aside>
Large Language Model (LLM)-based agents have demonstrated impressive capabilities in complex interactive environments, ranging from software engineering tasks, e.g., SWE-bench, to general-purpose tool-using systems. Despite their success, these agents exhibit failure modes that differ fundamentally from those seen in classical reinforcement learning systems. One particularly disruptive and under-explored failure mode is looping behavior [1], where an agent repeatedly emits identical or equivalent actions without making progress toward task completion. See an example as follows. Once an agent enters such a loop, recovery is rare: the agent keeps looping until it times out or hits the maximum output length, wasting computation and degrading performance.
Most existing approaches address looping in an ad-hoc manner by limiting repeated actions, truncating trajectories, or simply aborting the run. While these heuristics may reduce looping, they offer little insight into why it occurs. For now, looping is treated as an isolated bug rather than a signal of deeper patterns in LLM decision-making.
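To make such a heuristic concrete, here is a minimal repeat-action limiter; the function name, the exact-match equivalence criterion, and the threshold are all illustrative assumptions, not an implementation from this work.

```python
def hit_repeat_limit(actions, max_repeats=3):
    """Crude loop heuristic: flag the trajectory when its last
    `max_repeats` actions are exactly identical strings.

    Note: real agents may loop over *equivalent* rather than
    byte-identical actions, which this exact-match check misses.
    """
    if len(actions) < max_repeats:
        return False
    tail = actions[-max_repeats:]
    return all(a == tail[0] for a in tail)
```

A runner might abort or truncate the trajectory once this fires, which treats the symptom while leaving the underlying cause unexplained.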
To this end, we raise a more fundamental research question:
<aside> đź’ˇ
Is looping a sudden, unpredictable failure - or does it build up gradually in a way we can detect?
</aside>
In this work, we study looping from a dynamics and control perspective. Empirically, we observe that before entering a loop, agents often pass through a long transition phase in which their actions grow increasingly confident and predictable. We quantify this by tracking the collapse of action entropy, derived from token-level log-probabilities.
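As a sketch of how such a signal can be computed: assuming the inference API returns the log-probability of each sampled token, a simple per-action uncertainty proxy is the mean token surprisal (average negative log-probability). The exact estimator used in this work may differ.

```python
def action_entropy(token_logprobs):
    """Mean token surprisal (in nats) of one emitted action.

    `token_logprobs` holds the log-probability the model assigned
    to each token it actually sampled for this action. The average
    negative log-probability is a cheap proxy for action-level
    entropy: values near zero mean the action was almost fully
    determined, i.e. entropy has collapsed.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)
```

Tracking this value per turn yields the time series whose gradual decline precedes loop onset.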
This raises a practical question: can entropy serve as an early warning for looping? Our goal is not just to detect loops after they occur, but to anticipate them early enough to intervene. This reframes looping from post-hoc diagnosis to real-time monitoring and control.
We also investigate how different interventions address the underlying failure. Specifically, we propose two strategies: (i) action-level control, which modifies the token sampling distribution (e.g., via adaptive temperature), and (ii) state-level repair, which restructures the agent's context history (e.g., via summarization).
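One hypothetical way the two strategies could be wired into an online controller is sketched below; the trigger threshold, the temperature schedule, and the escalation from action-level control to state-level repair are all illustrative assumptions, not the design evaluated in this work.

```python
def choose_intervention(divergence_score, temperature,
                        threshold=2.0, temp_step=0.25, temp_cap=1.5):
    """Dispatch an intervention from the current loop-risk score.

    Below `threshold`: no action. Above it: first apply action-level
    control by raising the sampling temperature to inject diversity;
    once the temperature is capped, escalate to state-level repair
    by summarizing the context history.
    Returns (intervention, new_temperature).
    """
    if divergence_score < threshold:
        return "continue", temperature
    if temperature < temp_cap:
        return "raise_temperature", min(temperature + temp_step, temp_cap)
    return "summarize_context", temperature
```

A runner would apply the returned temperature to its next sampling call, or replace the context with a model-written summary when `"summarize_context"` is returned.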
Our experiments show that both strategies can reduce looping: both adaptive temperature and summarization reduce loop rates for Klear-8B-SFT (26.7% → 22.9% / 24.5%) and Qwen3-Coder-30B-A3B (13.7% → 8.2% / 9.6%). However, they show notably different trade-offs in task success and cross-model generalization. This suggests that looping is not merely a sampling artifact, but is tightly coupled with how agents represent and reason about their evolving state.
More broadly, this study contributes to a growing line of research that treats LLM agents not just as black-box text generators, but as dynamical systems whose failures can be understood, predicted, and controlled. By linking looping behavior to entropy dynamics and control mechanisms, we aim to move beyond ad-hoc fixes toward a more principled understanding of agent reliability.
In this section, we first formalize LLM-based agents and their looping behavior, and then showcase looping behavior across a wide range of models and agent workflows.