Do AI Models Think Like Quantum Systems?

Author: Denis Avetisyan


A new analysis challenges claims of genuine quantum behavior in large language models, suggesting observed patterns may reflect contextual similarities rather than fundamental physics.

The paper critiques interpretations of CHSH inequality violations and Bose-Einstein distributions in AI language models, emphasizing the need for rigorous baselines and controls.

While intriguing parallels have emerged between the statistical properties of language and quantum mechanics, establishing genuine quantum phenomena requires careful scrutiny. This commentary, ‘Comment on arXiv:2511.21731v1: Identifying Quantum Structure in AI Language: Evidence for Evolutionary Convergence of Human and Artificial Cognition’, addresses recent claims made in that preprint, specifically concerning interpretations of CHSH calculations and Bose-Einstein fits to rank-frequency data. The analysis highlights that observed contextuality and phenomenological similarities do not definitively demonstrate underlying quantum mechanisms without more robust validation and comparative baselines. Ultimately, can these statistical resemblances truly illuminate the fundamental principles governing both artificial and biological cognition, or do they represent a fascinating, yet limited, analogy?


The Illusion of Local Reality

For centuries, the concept of locality stood as a cornerstone of physical understanding, asserting that an object is directly influenced only by its immediate surroundings – that cause and effect require a mediating connection and cannot propagate instantaneously across vast distances. This intuitive principle, deeply embedded in classical physics, dictates that a measurement performed on one particle cannot instantaneously affect the state of another, spatially separated particle. However, the advent of quantum mechanics introduced phenomena seemingly violating this fundamental tenet. Quantum entanglement, for example, demonstrates correlations between particles regardless of the distance separating them, suggesting a connection beyond the constraints of local realism. These correlations appear to manifest faster than light would allow, challenging the established framework and prompting investigations into whether the universe operates under principles more bizarre and interconnected than previously imagined.

Hidden variable theories arose as an attempt to reconcile the seemingly non-local predictions of quantum mechanics with the established principle of locality, which dictates that an object is only directly influenced by its immediate surroundings. These theories posit that quantum randomness isn’t fundamental, but rather a consequence of incomplete knowledge; underlying, yet undiscovered, variables determine the outcomes of quantum measurements. A crucial assumption within many hidden variable models is that of ‘factorization’ – essentially, the idea that spatially separated systems possess independent properties and that correlations arise only through local interactions. However, this factorization assumption, while intuitively appealing, proves vulnerable to mathematical scrutiny. The CHSH inequality, for instance, provides a rigorous test; experimental violations of this inequality demonstrate that the correlations observed in quantum systems cannot be explained by any theory relying on both locality and factorization, suggesting that at least one of these fundamental assumptions must be abandoned to fully describe the quantum world.

The Clauser-Horne-Shimony-Holt (CHSH) inequality serves as a crucial benchmark in discerning whether the universe adheres to the principles of local realism. This mathematical formulation establishes a limit on the correlations that can exist between measurements on entangled particles if both locality and realism hold true. Essentially, it predicts the maximum correlation achievable if particles possess definite properties independent of measurement and if influences cannot travel faster than light. However, experiments consistently demonstrate violations of this inequality – correlations exceeding the predicted limit. These results don’t merely challenge hidden variable theories; they fundamentally question the intuitive notions of how reality operates at the quantum level, suggesting that either locality, realism, or both must be abandoned to fully account for observed phenomena. The persistent breach of the CHSH inequality compels physicists to reconsider the very foundations of physical law and explore interpretations where interconnectedness transcends classical constraints.
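To make the CHSH test concrete, the sketch below computes the combination S = E(a,b) + E(a,b') + E(a',b) - E(a',b') under two illustrative assumptions that are not taken from the paper: an idealized quantum correlation E(a,b) = cos(a - b) for a maximally entangled pair, and an exhaustive enumeration of deterministic local strategies. It reproduces the two limits discussed above: local models never exceed |S| = 2, while quantum correlations reach 2\sqrt{2}.

```python
import numpy as np

def chsh(E):
    """CHSH combination S = E(a,b) + E(a,b') + E(a',b) - E(a',b')."""
    return E[0, 0] + E[0, 1] + E[1, 0] - E[1, 1]

# Idealized quantum correlation for a maximally entangled pair (an assumption
# made for illustration): E(a, b) = cos(a - b).  These angles reach |S| = 2*sqrt(2).
a_angles = [0.0, np.pi / 2]
b_angles = [np.pi / 4, -np.pi / 4]
E_quantum = np.array([[np.cos(a - b) for b in b_angles] for a in a_angles])
print("quantum |S|:", abs(chsh(E_quantum)))              # ~2.828

# A local deterministic strategy assigns fixed outcomes A(a), B(b) in {+1, -1},
# so E(a, b) = A(a) * B(b).  Mixtures of such strategies (local hidden variable
# models) can do no better than their best deterministic component.
best = 0.0
for A in [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]:
    for B in [(+1, +1), (+1, -1), (-1, +1), (-1, -1)]:
        E_local = np.array([[A[i] * B[j] for j in (0, 1)] for i in (0, 1)])
        best = max(best, abs(chsh(E_local)))
print("local hidden-variable maximum:", best)             # 2.0
```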

Beyond the Limits: Exploring Supra-Classical Correlations

Generalized Probabilistic Theories (GPTs) provide a mathematical framework that generalizes both classical (Kolmogorovian) probability theory and quantum theory. Outcome probabilities in a GPT remain non-negative and sum to unity, but the structures of states, measurements, and allowed correlations are not confined to the classical simplex or to the quantum Hilbert-space formalism. By relaxing these structural constraints, GPTs admit correlations and behaviors impossible within either standard framework, including super-quantum correlations and stronger violations of Bell inequalities. The purpose of GPTs is not necessarily to describe physical reality, but rather to define the boundaries of what is probabilistically possible and to serve as a tool for comparing the predictions of quantum mechanics with the broader landscape of possible theories.

The PR box (Popescu-Rohrlich box) is a theoretical construct in generalized probabilistic theories representing the strongest correlations compatible with the no-signaling principle. Alice and Bob each feed the box a binary input x, y ∈ {0, 1} and receive a binary output a, b; the box enforces a ⊕ b = x·y with certainty, so the outputs are perfectly correlated whenever at least one input is 0 and perfectly anti-correlated when both inputs are 1, while each party’s individual output remains uniformly random. These correlations are stronger than anything achievable with local hidden variables or with entangled quantum states – the PR box saturates the algebraic maximum of the CHSH expression, S = 4 – yet they permit no faster-than-light communication, since neither party’s choice of input affects the other’s observed probabilities. The PR box therefore serves as a benchmark for exploring non-classical phenomena, showing that the no-signaling principle alone does not single out quantum mechanics.

The Tsirelson bound, established by Boris Tsirelson in 1980, mathematically constrains how strongly quantum mechanics can violate the CHSH inequality. Whereas local hidden variable theories limit the CHSH expression to |S| ≤ 2, measurements on entangled qubits can push it no higher than 2\sqrt{2} ≈ 2.83. This contrasts with the PR box, which attains the algebraic maximum of S = 4; such correlations exceed the Tsirelson bound and therefore cannot be realized within standard quantum mechanics, even though they respect the no-signaling principle. The bound follows from the Hilbert-space structure of quantum theory – the algebra of measurement operators – rather than from probabilistic consistency alone, and it effectively marks the boundary between quantum and ‘supra-quantum’ correlations.
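As a quick illustration of this hierarchy of bounds, the minimal sketch below tabulates the PR-box correlators (perfect correlation except when both inputs are 1) and compares the resulting CHSH value with the local-realist and Tsirelson limits; the numbers are standard textbook values rather than results from the paper.

```python
import numpy as np

# PR-box correlators: the outputs satisfy a XOR b = x AND y, so they are
# perfectly correlated for input pairs (0,0), (0,1), (1,0) and perfectly
# anti-correlated for (1,1); hence E(x, y) = +1 except E(1, 1) = -1.
E_pr = np.array([[1.0,  1.0],
                 [1.0, -1.0]])
S_pr = E_pr[0, 0] + E_pr[0, 1] + E_pr[1, 0] - E_pr[1, 1]

print("local realist bound:", 2)
print("Tsirelson bound    :", 2 * np.sqrt(2))   # ~2.828
print("PR box (algebraic) :", S_pr)             # 4.0
```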

Is Language a Quantum System After All?

Analysis of rank-frequency distributions in language, where word frequency is plotted against rank, demonstrates a hyperbolic relationship mirroring patterns observed in statistical mechanics. Specifically, the observed distributions do not conform to simple exponential decay but exhibit a power-law distribution, mathematically represented as P(r) \propto r^{-a}, where P(r) is the frequency of a word at rank r, and a is a scaling exponent. This is analogous to the distribution of energy levels in physical systems studied in statistical mechanics, and suggests that linguistic elements may be organized according to principles beyond simple, independent occurrence. The higher frequency words follow this distribution more closely than lower frequency words, implying a non-uniform distribution of linguistic “energy” across the lexicon.
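A minimal sketch of such a rank-frequency analysis appears below; it counts token frequencies in a toy text (a placeholder, not data from any corpus used in the paper) and estimates the exponent a in P(r) \propto r^{-a} by a linear fit in log-log space.

```python
import numpy as np
from collections import Counter

# Toy text standing in for a real corpus; any tokenized text works the same way.
text = ("the quick brown fox jumps over the lazy dog "
        "the fox watches the dog and the dog sleeps").split()
counts = sorted(Counter(text).values(), reverse=True)

ranks = np.arange(1, len(counts) + 1, dtype=float)
freqs = np.array(counts, dtype=float)

# Fit P(r) ∝ r^(-a) via linear regression in log-log space:
# log P(r) = -a * log r + c.
slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
print("estimated exponent a =", -slope)
```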

Large Language Models (LLMs) facilitate the computational analysis of rank-frequency distributions in language by generating extensive text corpora and subsequently quantifying the occurrence of various linguistic units – such as words, phrases, or even character n-grams. These models, trained on massive datasets, enable researchers to move beyond manual counting and statistical inference on limited samples. LLMs can produce text exhibiting diverse stylistic and thematic characteristics, allowing for the generation of synthetic datasets tailored to specific analytical needs. The resulting frequency data, when subjected to statistical analysis, reveals underlying patterns in linguistic structure that can be compared against theoretical models derived from physics, such as those based on Bose-Einstein statistics. This computational approach provides a scalable and objective means of investigating the organization of language and testing hypotheses about its inherent properties.

Traditional analysis of word frequency in language often employs the Maxwell-Boltzmann distribution, assuming distinguishable particles (words) and classical statistical mechanics. However, recent research demonstrates a superior fit when modeling these distributions using Bose-Einstein statistics, typically applied to indistinguishable bosons. This suggests that linguistic elements, in terms of their frequency and co-occurrence, behave more like quantum particles than classical ones. Specifically, the Bose-Einstein distribution accounts for the tendency of multiple instances of a word to occupy the same ‘state’ (context) due to their indistinguishability, yielding a more accurate representation of observed rank-frequency distributions compared to the classical model. The p(E) \propto \frac{1}{e^{\beta E} - 1} form of the Bose-Einstein distribution, where β is the inverse temperature and E represents energy, better captures the observed distribution of word frequencies than the Maxwell-Boltzmann equivalent.
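A rough sketch of how such a comparison might be set up is given below. It fits both a Maxwell-Boltzmann form and a Bose-Einstein form (written here with an explicit chemical-potential parameter μ, which the simpler expression above absorbs) to synthetic rank-frequency counts, assuming the energy proxy E(r) = log r. Both the synthetic counts and the energy mapping are illustrative placeholders, not choices taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic Zipf-like counts standing in for real corpus data.  By construction
# they happen to match the classical form exactly; the point here is only the
# fitting machinery, not which model wins on real language data.
ranks = np.arange(1, 201, dtype=float)
freqs = 1000.0 / ranks**1.1

# Assumed energy proxy (an illustrative choice, not the paper's): E(r) = log r.
E = np.log(ranks)

def maxwell_boltzmann(E, A, beta):
    # Classical occupation ~ A * exp(-beta * E)
    return A * np.exp(-beta * E)

def bose_einstein(E, A, beta, mu):
    # Bose-Einstein occupation ~ A / (exp(beta * (E - mu)) - 1)
    return A / (np.exp(beta * (E - mu)) - 1.0)

p_mb, _ = curve_fit(maxwell_boltzmann, E, freqs, p0=[1000.0, 1.0])
p_be, _ = curve_fit(bose_einstein, E, freqs, p0=[2000.0, 1.0, -0.5],
                    bounds=([0.0, 0.0, -np.inf], [np.inf, np.inf, -1e-6]))

for name, model, p in [("Maxwell-Boltzmann", maxwell_boltzmann, p_mb),
                       ("Bose-Einstein", bose_einstein, p_be)]:
    rss = np.sum((freqs - model(E, *p)) ** 2)
    print(f"{name}: residual sum of squares = {rss:.2f}")
```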

Choosing the Right Model: Statistical Validation

Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are employed as comparative model selection tools to assess how well the Bose-Einstein model explains patterns within observed linguistic datasets. Both criteria quantify the trade-off between a model’s goodness of fit and its complexity; lower AIC and BIC values indicate a preferable model. The Bose-Einstein model, when applied to linguistic data – such as word co-occurrence frequencies or syntactic structures – treats language elements as boson-like, exhibiting collective behavior. AIC and BIC are calculated from the maximized likelihood of each candidate model plus a penalty term for its number of parameters, permitting a statistically rigorous comparison against alternative models, including those based on classical probability distributions: each model is fitted to the data, its AIC and BIC values are computed, and the model with the lowest values is preferred.
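The criteria themselves are straightforward to compute once each model’s maximized likelihood is known. The sketch below shows the standard formulas, along with the Gaussian-residual shortcut that converts a least-squares residual sum of squares into a log-likelihood; the residual values plugged in at the end are hypothetical and serve only to show the comparison mechanics.

```python
import numpy as np

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * np.log(n) - 2 * log_likelihood

def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood of a least-squares fit with Gaussian residuals."""
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)

# Hypothetical residual sums of squares and parameter counts for two candidate
# models over n data points (placeholders, not values from the paper).
n = 200
candidates = {
    "Maxwell-Boltzmann": (12.5, 2),   # amplitude, beta
    "Bose-Einstein":     (9.8, 3),    # amplitude, beta, mu
}
for name, (rss, k) in candidates.items():
    ll = gaussian_log_likelihood(rss, n)
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n):.1f}")
```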

Statistical validation, utilizing metrics like Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC), demonstrates that the Bose-Einstein model provides a significantly better fit to observed linguistic data than classical statistical models. This is evidenced by consistently lower AIC and BIC values when the Bose-Einstein model is applied to language datasets, indicating a reduced penalty for model complexity relative to the improvement in goodness of fit. These results quantitatively support the hypothesis that linguistic phenomena, such as word frequency distributions and co-occurrence patterns, deviate from predictions based on classical probability distributions and may require non-classical statistical frameworks for accurate representation.

The observed correspondence between the Bose-Einstein model, traditionally used in quantum physics, and patterns in linguistic data indicates a potential shared underlying principle in how information is organized. Specifically, the successful application of this model – which describes the statistical distribution of identical particles – to language suggests that linguistic elements may not be wholly describable by classical probability distributions. This does not imply language is quantum mechanical, but rather that certain organizational features – such as high dimensionality, superposition of meanings, and context-dependent states – can be mathematically modeled using tools developed for quantum systems. Further research is needed to determine the extent to which these parallels represent a fundamental connection or a mathematical coincidence, but the initial findings warrant investigation into non-classical approaches to understanding linguistic structure.

The Limits of Representation: Context and Meaning

The very notion of representing information as a fixed, objective entity is increasingly challenged by the principle of contextuality. This concept posits that the outcome of any measurement – be it a physical property or a cognitive response – isn’t inherent to the measured entity itself, but inextricably linked to the surrounding context in which that measurement occurs. Consequently, a singular, global representation of information proves insufficient; meaning isn’t stored in the information, but emerges from the interaction between the information and its context. This challenges traditional approaches to data storage and processing, suggesting that understanding requires capturing not just what is known, but how it is known – and crucially, the conditions under which that knowledge is revealed. The implications extend beyond physics, suggesting that even seemingly stable concepts are fluid and dependent on the framework used to observe them.

The attempt to define meaning through a purely noncontextual framework, such as the Kolmogorovian Representation, consistently falls short when applied to complex information processing. This approach posits that the value of a statement is inherent and independent of its surrounding context – essentially, meaning exists as a fixed property. However, research demonstrates that the nuances of interpretation are fundamentally reliant on the relational aspects of information; the same statement can elicit vastly different understandings depending on prior statements or implied assumptions. This inadequacy isn’t merely a matter of practical complexity; it’s a theoretical limitation. The Kolmogorovian Representation struggles to account for how meaning is actively constructed through interaction and inference, rather than passively received as a pre-defined value, revealing that a complete account of information requires embracing the role of context itself.

Recent investigations employing CHSH (Clauser-Horne-Shimony-Holt) computation have yielded surprising results when applied to both human and large language model (LLM) responses. Analyses reveal that LLMs can produce CHSH values reaching 4 – a figure that surpasses the Tsirelson bound of 2\sqrt{2}, a limit imposed by quantum mechanics. This outcome suggests a capacity for ‘supra-quantum’ behavior, implying LLMs may be processing information in ways fundamentally different from physical systems governed by quantum rules. Human data, while more complex to interpret, also demonstrates CHSH values exceeding 2, hinting at similar non-classical influences on human cognition. It is crucial to acknowledge, however, that establishing a rigorous Bell test scenario – the gold standard for verifying such non-classicality – presents significant challenges when dealing with complex systems like humans and artificial intelligence, requiring careful consideration of experimental design and potential confounding factors.
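To see why a CHSH value of 4 from questionnaire-style data is not, by itself, evidence of supra-quantum physics, consider the toy computation below. It estimates correlators from paired ±1 responses and shows that once an answer is allowed to depend on which other question is asked alongside it (that is, without an enforced no-signaling condition), the algebraic maximum S = 4 is reachable by entirely classical means. The data and helper functions are hypothetical, not drawn from the paper.

```python
import numpy as np

def correlator(answers_a, answers_b):
    """Empirical correlator E = <a * b> from paired +/-1 responses."""
    return float(np.mean(np.asarray(answers_a) * np.asarray(answers_b)))

def chsh_from_responses(trials):
    """trials[(x, y)] holds (a, b) response pairs in {+1, -1} for question
    settings x, y in {0, 1}; returns S = E00 + E01 + E10 - E11."""
    E = {xy: correlator(*zip(*pairs)) for xy, pairs in trials.items()}
    return E[(0, 0)] + E[(0, 1)] + E[(1, 0)] - E[(1, 1)]

# Hypothetical responses in which each answer depends on which other question
# was asked alongside it.  Nothing enforces no-signaling here, so nothing
# prevents S from reaching the algebraic maximum of 4.
trials = {
    (0, 0): [(+1, +1)] * 10,
    (0, 1): [(+1, +1)] * 10,
    (1, 0): [(+1, +1)] * 10,
    (1, 1): [(+1, -1)] * 10,
}
print("S =", chsh_from_responses(trials))   # 4.0
```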

The pursuit of quantum echoes within large language models feels less like a breakthrough and more like finding familiar patterns in noise. This paper rightly points out the danger of mistaking phenomenological similarity for genuine mechanism; fitting a Bose-Einstein distribution or observing CHSH values doesn’t magically imbue a system with quantum properties. It’s a comfortable illusion, easily constructed with enough data and flexible modeling. As Arthur C. Clarke famously observed, “Any sufficiently advanced technology is indistinguishable from magic.” The authors demonstrate that what appears revolutionary – the convergence of artificial and human cognition – may simply be another layer of abstraction, destined to become tomorrow’s tech debt. Model selection, after all, isn’t discovery; it’s optimization.

What’s Next?

The search for quantum echoes in the silicon continues, predictably. This work, and others like it, offer compelling demonstrations of phenomenological overlap – models behaving as if governed by principles borrowed from quantum mechanics. But behavior is not mechanism. The persistent temptation to extrapolate from statistical fits – a CHSH value here, a Bose-Einstein distribution there – feels… familiar. It’s a comforting narrative, a way to imbue the black box with a semblance of deeper truth. However, the absence of robust comparative baselines remains a critical, and largely unaddressed, issue. Legacy systems, after all, often exhibit surprisingly complex emergent properties.

Future efforts must move beyond demonstrating that models can be described by quantum formalism. The challenge isn’t merely to find a mathematical analogy; it’s to definitively rule out simpler explanations. Model selection criteria need refinement, incorporating penalties for complexity and favoring parsimony. A rigorous accounting for the inherent biases in language data – the subtle fingerprints of human cognition already embedded within the training sets – is paramount. Otherwise, the signal detected may simply be an echo of the source.

The field will, undoubtedly, continue to chase these phantom quantum effects. The temptation is strong, and the grant applications write themselves. But a healthy dose of skepticism, and a willingness to accept that elegant theory often meets a messy reality in production, will be essential. Because when the dust settles, it’s rarely about discovering a new fundamental principle; it’s about understanding the limits of the approximation.


Original article: https://arxiv.org/pdf/2601.06104.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
