Matěj Kripner

Let's see how TRM reasons

I tapped into the Tiny Recursive Model by Alexia Jolicoeur-Martineau to see how it reasons on ARC. A few months ago, the Hierarchical Reasoning Model (HRM) showed that you can get quite far on ARC with only a 27M-parameter model when you 1) use Test-Time Training, 2) do a lot of recurrent updates in latent space, 3) are clever about which recurrent updates get backpropagated through, and 4) split the recurrent state into a frequently updated part and an infrequently updated part (hence “hierarchical”). Although the hierarchical part made it into the paper title, an independent analysis later showed that it has minimal impact on performance. Then, the Tiny Recursive Model (TRM) simplified this recurrent process, keeping only a single state (along with several other improvements). I highly recommend reading both of these papers (you can skip the section about “brain correspondence” in HRM).

What I’ve been interested in lately is the notion of compositionality (aka compositional generalization) in inductive approaches to ARC. If you want to read more about compositionality, the ExeDec paper is relatively nice. In short, let’s say we train a model to output a vector given the ARC demonstration pairs, where that vector somehow captures the shared transformation between demonstration inputs and outputs - it’s the latent program representation (à la LPN). Now, we want this program encoder to generalize to unseen evaluation tasks with completely different transformations. In hand-wavy terms, the model has a much higher chance of achieving this (read: any chance at all) if it understands a transformation as a composition of simpler concepts rather than as one monolithic transformation. By concepts, think “move the red object by 3 pixels to the right” or “draw a line between objects”. These factorized concepts can be recombined in novel ways, hopefully generalizing to unseen transformations. Another way to say this: the space of concepts is much simpler than the space of full transformations, and we’d like to model the simpler space based on the very limited training samples.
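
To make this a bit more concrete, here’s a toy sketch of such a program encoder (my own strawman, not LPN’s actual architecture - every name and dimension here is made up): embed each demonstration pair and pool the embeddings into a single latent vector that is supposed to capture the shared transformation.

```python
import torch
import torch.nn as nn


class ToyProgramEncoder(nn.Module):
    """Toy illustration of a latent program encoder (not LPN's real design)."""

    def __init__(self, n_colors=10, dim=128):
        super().__init__()
        # Separate embedding ranges for input cells vs. output cells.
        self.cell_embed = nn.Embedding(2 * n_colors, dim)
        self.pool = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, demo_inputs, demo_outputs):
        # demo_inputs, demo_outputs: (n_demos, H, W) integer grids of colors
        n_colors = self.cell_embed.num_embeddings // 2
        inp = self.cell_embed(demo_inputs)               # (n, H, W, dim)
        out = self.cell_embed(demo_outputs + n_colors)   # offset so roles differ
        per_pair = inp.mean(dim=(1, 2)) + out.mean(dim=(1, 2))  # (n, dim)
        # Pool over demonstration pairs into one "latent program" vector.
        return self.pool(per_pair.mean(dim=0))           # (dim,)
```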

There are many things to try as inductive biases for compositionality. However, before we do that, we should check whether compositionality arises naturally from the data. Could it be that the recurrent updates of HRM/TRM correspond to simple, generally useful concepts? What do the intermediate recurrent states look like? Let’s find out!

TRM iteratively updates a latent “thinking” state \(z\), mixing in the input \(x\) and occasionally updating the latent answer \(y\). The computation is split into 16 supervision steps, and gradients never flow across their boundaries - backprop only happens within a supervision step. Each supervision step consists of 3 H-steps, which update the answer \(y\); each H-step in turn consists of 4 L-steps, which only update the thinking state \(z\). In my fork of TRM, I’m decoding the ARC grid after each L-step - effectively asking what the final output would look like if we truncated the thinking process at that point.
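
To make the nesting concrete, here’s a minimal sketch of that loop structure with the grid decoded after every L-step. The names (`l_step`, `h_step`, `decode_grid`, `loss_fn`) are placeholders of mine, not the actual functions from the TRM codebase, and the readout used for probing is simplified.

```python
def supervision_step(x, y, z, l_step, h_step, decode_grid, n_h=3, n_l=4):
    """One supervision step: n_h H-steps, each preceded by n_l L-steps.

    l_step(x, y, z)   -> new thinking state z
    h_step(y, z)      -> new latent answer y
    decode_grid(y, z) -> ARC grid decoded from the current states (placeholder
                         readout; the actual probe in my fork may differ)
    """
    probes = []  # grid decoded after every L-step
    for _ in range(n_h):
        for _ in range(n_l):
            z = l_step(x, y, z)               # update the thinking state only
            probes.append(decode_grid(y, z))  # what if we stopped thinking here?
        y = h_step(y, z)                      # update the latent answer
    return y, z, probes


def deep_supervision(x, y, z, target, l_step, h_step, decode_grid, loss_fn,
                     n_supervision=16):
    """Backprop stays inside each supervision step (the real TRM is even more
    selective about which recurrent updates receive gradients) and never
    crosses supervision-step boundaries."""
    for _ in range(n_supervision):
        y, z, _ = supervision_step(x, y, z, l_step, h_step, decode_grid)
        loss_fn(decode_grid(y, z), target).backward()
        y, z = y.detach(), z.detach()  # cut the graph between supervision steps
    return y, z
```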

All of the data is visualized below (you can also explore it as a standalone web app). I think it could be useful to go through all the puzzles to deeply understand the failure modes of TRM, Karpathy-style. For now, a few observations:

Regarding the first two points, the first thing I wanted to try was detecting local optima and injecting some noise to divert the process in another direction. If this worked, maybe we could do some fancy tree search or (excuse my language) an evolutionary algorithm!! As a proof of concept, I injected significant Gaussian noise into the thinking state at step 5 - you can see the results when you select the Perturbed option from the Eval Version dropdown below. Unfortunately, the model is stubborn about where it wants to go given a puzzle input. See e.g. 17cae0c1, a59b95c0, 3ee1011a, 96a8c0cd - even though the thinking state is completely destroyed at step 5, the model converges to the same (incorrect) local optimum again. Ideally, I’d like to test pushing the model in the correct direction instead of a random one and see if it likes the new path better, but since TRM has no grid encoder, I can’t think of a straightforward way to do this.
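
For reference, the perturbation itself is trivial - a sketch of what I mean (the hook point and the noise scale here are illustrative, not the exact values from my fork):

```python
import torch


def maybe_perturb(z, step, perturb_at=5, noise_std=1.0):
    """Add Gaussian noise to the thinking state z at one chosen step."""
    if step == perturb_at:
        z = z + noise_std * torch.randn_like(z)
    return z
```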

Have fun!