arxiv:2606.27378

Formalizing Latent Thoughts: Four Axioms of Thought Representation in LLMs

Published on May 7

Submitted by Fahd Seddik on Jun 29

University of British Columbia - Okanagan Campus


Authors: Fahd Seddik

Abstract

An axiomatic evaluation framework reveals systematic failures in latent thought representations of LLMs across multiple reasoning tasks, demonstrating that current representations fail to satisfy fundamental functional axioms consistently across different model architectures.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

We introduce an axiomatic evaluation framework for latent thought representations in LLMs, comprising metrics that are independent of downstream benchmark scores and reveal representational failures that benchmark accuracy masks. Existing evaluations conflate representation quality with model capacity. Therefore, failures cannot be attributed to the representation rather than to the model that processes it. We formalize four functional axioms (Causality, Minimality, Separability, and Stability) and define a quantitative measure for each, computed directly on the representation independently of downstream accuracy. We audit open-weight LLMs across 23 reasoning tasks (e.g., Spatial Reasoning, Factual QA). We find that no candidate satisfies all four axioms simultaneously, that the representations distinguish task type reliably but cannot distinguish between two questions within the same task, and that the representations encode little information beyond what is already present in the input embedding. The failure is consistent across dense, reasoning-distilled, and RL-trained model families, indicating that the gap is structural rather than a property of model size or training procedure.


Community

FahdSeddik

Paper author Paper submitter about 17 hours ago

Rayssachiqueto

about 9 hours ago

It would work better if you used Asolaria ASI as a neural network. It uses 0 GPU.

urroxyz

about 9 hours ago

edited about 9 hours ago

One axiom I'd add: necessity.

It checks that the model actually uses the latent state. Without it, a model can sometimes make $T$ look good while routing the real answer through residual prompt information or decoder priors.

To support this axiom, I would add three checks that go beyond the paper and belong in a real training or optimization framework.

First, use latent ablation necessity. After the model produces $T$, corrupt it: swap it with another example’s $T$, zero it, or inject Gaussian noise. If answer quality barely changes, the model is not using the latent thought. A good latent reasoner should degrade gracefully under small noise but fail under semantic swaps.

Second, use counterfactual latent intervention. Take two minimally different inputs $x$ and $x'$ with different answers. Swap their latent thoughts. If the model follows the swapped latent state, $T$ is causally active. If it ignores the swap and answers from the prompt alone, the latent state is decorative.

Third, use multi-window causality, not only final-answer causality. Do not train $T$ only to predict the final answer. Split explicit reasoning traces into many windows and require latent states to substitute for intermediate reasoning prefixes. Otherwise, a latent vector that encodes only “answer = 42” could pass some final-output tests without representing the reasoning process.
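A minimal sketch of the first two checks (ablation and counterfactual swap), on a toy setup that is entirely my own assumption and not taken from the paper. The latent $T$ is a plain list of floats, and `latent_decoder` and `prompt_decoder` are hypothetical stand-ins for a model that reads $T$ and one that ignores it. With a real model, `decode` would run the decoder with the latent state patched in.

```python
import random

# Hypothetical toy data: (prompt, latent T, answer). The prompt is an int,
# the answer is prompt % 3, and T is a one-hot vector at the answer index.
examples = [(i, [1.0 if k == i % 3 else 0.0 for k in range(3)], i % 3)
            for i in range(6)]
# Consecutive examples always have different answers, so they serve as
# the (x, x') pairs for the swap test.
pairs = [(examples[i], examples[i + 1]) for i in range(len(examples) - 1)]

def latent_decoder(prompt, latent):
    # Toy model that uses T: answers with the index of the largest coordinate.
    return max(range(len(latent)), key=lambda k: latent[k])

def prompt_decoder(prompt, latent):
    # Toy model that ignores T and answers from the prompt alone.
    return prompt % 3

def zero(i, latent, data):
    # Ablation: replace T with zeros.
    return [0.0] * len(latent)

def swap_next(i, latent, data):
    # Ablation: replace T with the next example's T (a semantic swap).
    return data[(i + 1) % len(data)][1]

def make_noise(sigma, seed=0):
    # Ablation: add Gaussian noise with standard deviation sigma to T.
    rng = random.Random(seed)
    def noise(i, latent, data):
        return [v + rng.gauss(0.0, sigma) for v in latent]
    return noise

def accuracy(decode, data, corrupt=None):
    """Fraction of correct answers, optionally with T corrupted first."""
    correct = 0
    for i, (prompt, latent, answer) in enumerate(data):
        t = corrupt(i, latent, data) if corrupt else latent
        correct += decode(prompt, t) == answer
    return correct / len(data)

def swap_follow_rate(decode, example_pairs):
    """Feed x together with the latent of x'; count how often the output
    equals the answer of x' (the model followed the swapped latent)."""
    followed = 0
    for (prompt_a, _, _), (_, latent_b, answer_b) in example_pairs:
        followed += decode(prompt_a, latent_b) == answer_b
    return followed / len(example_pairs)

if __name__ == "__main__":
    ablations = [("clean", None), ("zero", zero), ("swap", swap_next),
                 ("small noise", make_noise(0.05))]
    for name, decode in [("uses T", latent_decoder),
                         ("ignores T", prompt_decoder)]:
        for label, corrupt in ablations:
            print(name, label, accuracy(decode, examples, corrupt))
        print(name, "swap-follow rate", swap_follow_rate(decode, pairs))
```

The pattern to look for is the contrast between the two decoders: the one that reads $T$ should survive small noise, lose accuracy under the swap, and follow the swapped latent, while the one that ignores $T$ should be unaffected by every ablation. The third check (multi-window causality) needs explicit reasoning traces and is not covered by this sketch.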

A practical side benefit: if $T$ never helps the model in some predictable class of situations, it can be skipped there via early exiting to save compute.