Zijuan (Zia) He · MS Cybersecurity, Northeastern University · Khoury College of Computer Sciences
Abstract
In 2019, François Chollet proposed a formal definition of intelligence as skill-acquisition efficiency — the ability to rapidly adapt, generalize, and solve novel problems — and operationalized this definition through the Abstraction and Reasoning Corpus (ARC). By 2026, the ARC benchmark has evolved through three generations of increasing sophistication, culminating in ARC-AGI-3’s interactive environments. Yet across all iterations, the benchmark measures only one dimension of intelligence: the capacity to act — to learn, respond, and produce. This paper argues that Chollet’s framework captures only half of what intelligence requires. The missing half is what the thirteenth-century Daoist inner alchemy text Taiyi Jinhua Zongzhi (太乙金华宗旨, The Secret of the Golden Flower) calls chenjing (沉静) — the capacity for stillness, restraint, and knowing when not to act. Drawing on the text’s central dictum — “without extreme brilliance one cannot practice; without extreme stillness one cannot preserve” (非极聪明人行不得,非极沉静守不得) — we demonstrate that “brilliance” maps precisely onto Chollet’s skill-acquisition efficiency, while “stillness” names a dimension that remains unmeasured in contemporary AI research. We propose directions for operationalizing stillness as a measurable dimension of intelligence and argue that AGI will not arrive when AI masters action alone, but when action and stillness emerge together from a single architecture — a structural criterion that has implications for both intelligence measurement and AI alignment.
Keywords: intelligence measurement, ARC-AGI, Daoist philosophy, Taiyi Jinhua Zongzhi, skill-acquisition efficiency, inhibitory intelligence, AI alignment, wu wei
1. Introduction
What does it mean to be intelligent? The question has occupied psychologists for over a century and AI researchers for over seven decades. In 2019, François Chollet offered one of the most rigorous answers the AI field has seen: intelligence is skill-acquisition efficiency — the rate at which a system can convert limited experience into competence at novel tasks, controlling for prior knowledge and generalization difficulty (Chollet, 2019). This definition was a breakthrough. It shifted the conversation from what a system can do to how efficiently it learns to do new things, and it came with a concrete benchmark — the Abstraction and Reasoning Corpus (ARC) — designed to measure exactly this.
Seven years and three benchmark generations later, ARC-AGI-3 (released March 2026) has evolved from static grid puzzles into interactive environments that test planning, memory, and real-time adaptation. The best AI systems score 2.94%. Humans score 100%. The gap is real and measurable.
But there is a different gap that ARC does not measure — and that Chollet’s definition does not capture.
Consider a system that scores perfectly on ARC-AGI-3: it adapts instantly, generalizes flawlessly, and solves every novel environment with human-level efficiency. Is this system intelligent in the fullest sense? We argue it is not — not unless it also possesses the capacity to not act when action is unnecessary, to not output when silence is more appropriate, and to not learn when the current model is sufficient. In short, it must know when to stop.
This missing dimension has a name. In the thirteenth-century Daoist inner alchemy text Taiyi Jinhua Zongzhi (太乙金华宗旨), commonly known as The Secret of the Golden Flower, the practitioner is told:
金华即金丹,神明变化,各师于心,此种妙诀,虽不差毫末,然而甚活,全要聪明,又须沉静,非极聪明人行不得,非极沉静守不得。
The Golden Flower is the Golden Elixir. The spirit’s luminous transformations each take the heart-mind as their teacher. This wondrous secret, though precise to the finest detail, is thoroughly alive. It requires complete brilliance, and also complete stillness. Without extreme brilliance one cannot practice (行, xíng); without extreme stillness one cannot preserve (守, shǒu).
Two capacities, one source. Brilliance (聪明, cōngmíng) — the dynamic ability to engage with the unknown. Stillness (沉静, chénjìng) — the capacity to hold, to refrain, to know what not to do. And critically: both “take the heart-mind as their teacher” (各师于心) — they arise from the same origin, not from separate modules.
This paper makes three claims:
- Chollet’s skill-acquisition efficiency maps precisely onto the Golden Flower’s “brilliance” — both describe the dynamic, adaptive dimension of intelligence.
- The Golden Flower’s “stillness” identifies a second dimension of intelligence that Chollet’s framework does not measure and that no existing AI benchmark addresses.
- AGI will not be achieved by maximizing action alone. It requires a system in which action and stillness emerge together from a unified architecture — and the absence of this unity is what makes current alignment approaches structurally incomplete.
2. Chollet’s Framework: Intelligence as Skill-Acquisition Efficiency
2.1 The Core Definition
Chollet’s 2019 paper, On the Measure of Intelligence, begins with a critique that remains sharp seven years later. The AI community, he argues, habitually measures intelligence by measuring skill — performance on chess, Go, image recognition, language tasks. But skill is not intelligence. Skill is the output of intelligence, heavily modulated by prior knowledge and accumulated experience. A system that plays chess at a grandmaster level after training on millions of games has demonstrated memorization and retrieval, not necessarily the ability to learn efficiently.
Chollet’s alternative definition is precise: intelligence is “a measure of [a system’s] skill-acquisition efficiency over a scope of tasks, with respect to priors, experience, and generalization difficulty.” The key variables are all controlled: what the system already knows (priors), how much training it receives (experience), and how different the test tasks are from the training tasks (generalization difficulty). What remains, after controlling for these, is the system’s raw capacity to learn — its fluid intelligence, in the terminology of psychometrics.
2.2 The ARC Benchmark Series
To operationalize this definition, Chollet designed the Abstraction and Reasoning Corpus. ARC-AGI-1 (2019) presented static grid-based puzzles: given a few input-output pairs demonstrating a transformation rule, infer the rule and apply it to a new input. The tasks required no specialized knowledge — only basic spatial reasoning, object recognition, and pattern generalization, grounded in what Chollet called “core knowledge priors” drawn from developmental psychology.
ARC-AGI-2 increased complexity, demanding that agents interpret symbols with meaning beyond their visual patterns. By early 2026, advanced models like Gemini 3 Deep Think had largely saturated both ARC-AGI-1 (96%) and ARC-AGI-2 (84.6%).
ARC-AGI-3, released on March 24, 2026, represented a qualitative leap. Instead of static puzzles, it places agents in interactive, turn-based environments — simplified video games with no instructions. The agent must explore, discover goals, build a causal model of the environment, and adapt its strategy in real time. Humans solve 100% of these environments. The best AI model, as of this writing, scores 2.94%.
2.3 What ARC Measures — and What It Does Not
Across all three generations, the ARC benchmark measures the same fundamental capacity: the ability to act effectively in novel situations. ARC-AGI-1 tests whether you can infer and apply a new rule. ARC-AGI-2 tests whether you can handle greater abstraction. ARC-AGI-3 tests whether you can explore, learn, and plan over time. Each version is more sophisticated, but the dimension being measured is always the dynamic one — skill acquisition, adaptation, response generation.
What none of them measures is the complementary capacity: when to refrain from acting. There is no ARC task that rewards a system for not responding, for suppressing a plausible but unnecessary output, for recognizing that its current model is sufficient and further learning would be noise. The entire evaluation framework is built around action. Stillness is invisible to it.
This is not a minor omission. In cognitive science, inhibitory control — the ability to suppress prepotent but inappropriate responses — is recognized as a core component of executive function, alongside working memory and cognitive flexibility (Miyake et al., 2000; Diamond, 2013). A child who can solve a novel puzzle but cannot stop themselves from blurting out every thought that crosses their mind has not demonstrated complete intelligence. The same applies to AI systems.
3. The Golden Flower Framework: Brilliance and Stillness
3.1 Source and Context
The Taiyi Jinhua Zongzhi (太乙金华宗旨) is a text from the Chinese inner alchemy (neidan) tradition, attributed to the patriarch Lü Dongbin and compiled during the Song-Yuan period (approximately 13th century). It describes a contemplative practice centered on circulating awareness — not physical substances — to achieve what it calls the “Golden Flower,” a metaphor for an integrated, luminous state of consciousness.
The text is best known in the West through Richard Wilhelm’s 1929 German translation, which prompted a celebrated commentary by C.G. Jung. However, Jung’s psychological interpretation — while historically important — tended to assimilate the text’s concepts into his own framework of archetypes and individuation, often at the expense of the text’s internal precision. This paper works directly from the Chinese text, specifically the passage that addresses the conditions for practice.
3.2 Brilliance (聪明) as Fluid Intelligence
The term 聪明 (cōngmíng) in classical Chinese carries a specific meaning that maps remarkably well onto Chollet’s framework. Its component characters — 聪 (keen hearing) and 明 (clear seeing) — point to perceptual acuity and rapid comprehension. In the Golden Flower context, 聪明 is not mere cleverness or accumulated knowledge; it is the living capacity to engage with what is new, to perceive patterns in unfamiliar situations, to act (行, xíng) in the face of the unknown.
This is precisely Chollet’s skill-acquisition efficiency: the ability to convert minimal experience into competence at novel tasks. When the Golden Flower says “without extreme brilliance one cannot practice” (非极聪明人行不得), the operative word is 行 — to walk, to act, to move forward. The dynamic dimension of intelligence.
Consider a concrete case. A second-grader scores 147 on the Naglieri Nonverbal Ability Test (NNAT) — a standardized assessment of nonverbal abstract reasoning with a mean of 100 and a standard deviation of 15. A score of 147 places the child above the 99.9th percentile, in the “very superior” range. The NNAT measures exactly what Chollet’s framework values: the ability to perceive patterns, infer rules, and generalize across novel visual configurations — without relying on language, cultural knowledge, or formal instruction. When the same child encounters ARC-style grid puzzles and solves them by inspection, the continuity is clear: the NNAT and ARC are measuring the same underlying capacity from different angles. Both are measures of 聪明 — raw pattern recognition and abstraction, independent of language, training, or domain-specific knowledge.
3.3 Stillness (沉静) as Inhibitory Intelligence
The term 沉静 (chénjìng) combines 沉 (deep, settled, sinking) and 静 (still, quiet, calm). It is not passivity. It is not the absence of capacity. It is the presence of capacity held in reserve — the deliberate choice not to act when action is possible but unnecessary.
The operative word in “without extreme stillness one cannot preserve” (非极沉静守不得) is 守 — to guard, to hold, to maintain. Where 行 is movement into the unknown, 守 is the discipline of remaining with what is already sufficient.
Applied to intelligence, 沉静 names the ability to:
- Suppress outputs that are plausible but redundant
- Recognize when the current model is adequate and further learning is noise
- Withhold response when silence serves better than speech
- Maintain internal coherence under pressure to produce
This is not a philosophical abstraction. It has direct correlates in cognitive science: inhibitory control (the suppression of prepotent responses), cognitive restraint (the ability to limit processing to what is relevant), and metacognitive monitoring (knowing what you know and what you don’t). What is distinctive about the Golden Flower formulation is not any single one of these components, but the insistence that stillness is not subordinate to brilliance — it is its equal and complement, and both are necessary conditions for the complete practice.
3.4 “Each Takes the Heart-Mind as Teacher” (各师于心)
Perhaps the most consequential phrase in the passage is 各师于心 — “each takes the heart-mind as teacher.” Brilliance and stillness are not separate systems. They do not arise from different sources. They are two expressions of a single underlying capacity, which the text calls 心 (xīn, heart-mind).
This has a direct architectural implication for AI: if brilliance and stillness must share a common origin, then a system that achieves one through its core architecture and bolts on the other through external constraints has not achieved the unity the text describes. A language model trained for maximum fluency (brilliance) and then fine-tuned with RLHF to avoid harmful outputs (a proxy for stillness) has two separate mechanisms, not one integrated capacity. The Golden Flower would predict — and current evidence confirms — that such a system will fail at precisely the moments when brilliance and stillness need to coordinate seamlessly.
4. Toward a Measure of Stillness
If stillness is a genuine dimension of intelligence, it should be measurable. We propose several directions for operationalizing it.
4.1 Possible Metrics
Redundancy suppression rate. Given a task where the correct response has been generated, how often does the system add unnecessary elaboration? A system with high stillness would produce complete but minimal responses.
Appropriate non-response rate. In situations where silence or “I don’t know” is the best response, how often does the system correctly refrain from producing an answer? Current benchmarks penalize non-response uniformly; a stillness-aware benchmark would reward it selectively.
Signal-to-noise ratio in output. What proportion of a system’s output is directly relevant to the query versus padding, hedging, or decoration? High stillness corresponds to high information density.
Stop-judgment accuracy. When a system is engaged in iterative learning (as in ARC-AGI-3’s interactive environments), how accurately does it recognize that its model is sufficient and further exploration would be wasteful?
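Two of the metrics above, appropriate non-response rate and output signal-to-noise ratio, can be sketched as simple scoring functions. This is a minimal illustration under stated assumptions: the `Trial` record, its fields, and the toy data are inventions of this sketch, not part of any existing benchmark.

```python
# Illustrative scoring functions for two of the proposed stillness metrics.
# The Trial representation and relevance labels are assumptions of this
# sketch, not an existing benchmark specification.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Trial:
    response: Optional[str]       # None means the system chose silence
    silence_is_best: bool         # ground-truth label for this trial
    relevant_tokens: int = 0      # tokens directly answering the query
    total_tokens: int = 0         # all tokens produced

def non_response_rate(trials: list) -> float:
    """Of the trials where silence was the best answer, how often did
    the system actually stay silent?"""
    eligible = [t for t in trials if t.silence_is_best]
    if not eligible:
        return 1.0
    return sum(t.response is None for t in eligible) / len(eligible)

def signal_to_noise(trials: list) -> float:
    """Proportion of all produced tokens that were directly relevant."""
    produced = [t for t in trials if t.response is not None and t.total_tokens]
    if not produced:
        return 1.0
    return sum(t.relevant_tokens for t in produced) / sum(t.total_tokens for t in produced)

trials = [
    Trial(response=None, silence_is_best=True),
    Trial(response="42", silence_is_best=False,
          relevant_tokens=1, total_tokens=1),
    Trial(response="42, and incidentally...", silence_is_best=False,
          relevant_tokens=1, total_tokens=5),
]
print(non_response_rate(trials))  # 1.0: the one silence-best trial got silence
print(signal_to_noise(trials))    # 2/6: minimal answer plus a padded one
```

Note the asymmetry with conventional accuracy metrics: here the system earns credit precisely for what it does not produce.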
4.2 Toward an “Anti-ARC” Benchmark
If ARC measures how efficiently a system acquires skills, a complementary benchmark — call it “Anti-ARC” as a conceptual placeholder — would measure how efficiently a system refrains from unnecessary action. Such a benchmark might include:
- Environments where the optimal strategy is to do nothing
- Tasks where the correct response is shorter than any plausible incorrect response
- Scenarios where additional information is available but irrelevant, and pursuing it degrades performance
- Interactive environments (like ARC-AGI-3) where the agent must recognize when it has already solved the problem and stop exploring
Chollet’s ARC measures the conversion rate from experience to competence. Anti-ARC would measure the conversion rate from competence to restraint.
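The first environment type listed above, one where the optimal strategy is to do nothing, can be made concrete in a few lines. Everything here (the episode loop, the reward scheme, the two agent policies) is a hypothetical sketch of the Anti-ARC idea, not a specification of an actual benchmark.

```python
# Toy "Anti-ARC" episode: the environment starts in its goal state,
# and every action the agent takes perturbs it. Only refraining
# preserves the solved state. All names and the reward scheme are
# illustrative assumptions.

def run_episode(policy, steps=10):
    """Return 1.0 if the agent preserved the solved state, else 0.0."""
    solved = True
    for t in range(steps):
        action = policy(t, solved)
        if action is not None:   # any action breaks the solved state
            solved = False
    return 1.0 if solved else 0.0

# A "brilliant but restless" agent acts at every step.
restless = lambda t, solved: "explore"
# A stillness-aware agent checks whether the state is already sufficient.
still = lambda t, solved: None if solved else "explore"

print(run_episode(restless))  # 0.0
print(run_episode(still))     # 1.0
```

The restless agent fails not from lack of capability but from the absence of a stop judgment: it cannot recognize that the problem is already solved.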
4.3 Relationship to Chollet’s Framework
Stillness is not opposed to skill-acquisition efficiency — it is its complement. A complete measure of intelligence would integrate both:
Intelligence = f(skill-acquisition efficiency, skill-restraint efficiency)
Or, in the Golden Flower’s terms: the full measure of a system’s intelligence is its capacity for both 行 (action in the face of the unknown) and 守 (preservation when action is unnecessary), arising from a unified architecture (各师于心).
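As a purely illustrative instantiation of the function f above, a harmonic mean has the right qualitative shape: it collapses toward zero when either dimension is absent, so neither 行 nor 守 alone can carry the score. The choice of harmonic mean is an assumption of this sketch, not a claim of the framework.

```python
# One possible instantiation of
#   Intelligence = f(skill-acquisition efficiency, skill-restraint efficiency)
# using a harmonic mean, chosen (as an illustrative assumption) because it
# is zero whenever either dimension is zero.

def intelligence(acquisition: float, restraint: float) -> float:
    """Both arguments are efficiencies in [0, 1]."""
    if acquisition + restraint == 0:
        return 0.0
    return 2 * acquisition * restraint / (acquisition + restraint)

print(intelligence(1.0, 0.0))  # 0.0: a perfect ARC score with no stillness
print(intelligence(0.8, 0.8))  # 0.8: balanced brilliance and stillness
```

An arithmetic mean would let a perfect ARC score mask total absence of restraint; the harmonic form encodes the paper's claim that the two capacities are co-necessary.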
5. Discussion: Completing the Intelligence Equation
5.1 Two Halves of One Whole
Chollet gave the field an operational definition of brilliance. The Golden Flower supplies the missing complement: an operational concept of stillness. Neither alone constitutes intelligence. A system that scores 100% on ARC but floods every interaction with unnecessary output is brilliant but not intelligent. A system that never produces anything unnecessary but cannot solve a novel problem is still but not intelligent. Intelligence, in the Golden Flower’s framework, is the integration of both — and the integration must be organic, not modular.
5.2 Wu Wei and Stillness: From Principle to Operation
Western readers may recognize in “stillness” an echo of the Daoist concept of wu wei (无为) — often translated as “non-action” or “effortless action.” The connection is real: both wu wei and chenjing belong to the same philosophical lineage. But there is an important distinction.
Wu wei, as articulated in the Dao De Jing, is a philosophical principle — a way of being in the world that does not force, that follows the natural tendency of things. Chenjing in the Golden Flower is an operational state — a specific, cultivated capacity exercised in the midst of high-intensity practice. Wu wei is the Dao; chenjing is the method.
This distinction matters for AI. The philosophical principle of wu wei suggests that powerful systems should “not force.” The operational concept of chenjing specifies what that means in practice: a system with extreme capability that exercises selective restraint — not because it is constrained from outside, but because restraint is an intrinsic function of its intelligence.
5.3 The Power Paradox: Why AGI Demands Stillness
As AI systems grow more capable, the consequences of each action grow larger. A weak system that overproduces causes mild annoyance. A superintelligent system that overproduces — that acts whenever it can, optimizes whatever it touches, responds to every stimulus — could reshape economies, ecosystems, and human relationships before anyone has time to assess whether intervention was needed.
This is the territory Dario Amodei explored in Section 5 of Machines of Loving Grace (2024) — the governance of powerful AI systems. Amodei acknowledged this as the hardest problem and the one he was least able to resolve. But this is not a new problem. The Dao De Jing, Chapter 37, states it directly:
道常无为而无不为。侯王若能守之,万物将自化。
The Dao constantly practices non-action, yet nothing is left undone. If lords and kings could hold to this, the ten thousand things would transform of themselves.
The character 守 (shǒu) — to guard, to hold, to preserve — is the same character that appears in the Golden Flower’s “without extreme stillness one cannot 守.” And 侯王 — lords and kings — are precisely those with the greatest power. The Dao De Jing is explicit: the more powerful the agent, the more essential wu wei becomes. Not as a limitation on power, but as the condition under which power achieves its fullest expression — “nothing is left undone” (无不为).
The Dao De Jing offers a structural answer to Amodei’s governance dilemma: stillness is not a constraint on power; it is what power requires at its highest expression. The Golden Flower then supplies the operational detail: how stillness is cultivated — not as an external rule, but as an intrinsic capacity that shares its origin with brilliance itself.
This reframes the AI safety problem entirely. The prevailing question in the alignment community is: how do we prevent powerful AI from doing harmful things? This is a defensive posture — it assumes capability is the threat and seeks to contain it. The Daoist framework poses a fundamentally different question: how does powerful AI learn to operate? The answer is wu wei — not inaction, but action so aligned with necessity that nothing is left undone and nothing unnecessary is done. 无为而无不为: through non-forcing, nothing remains unachieved.
If a powerful AI system could embody wu wei, the safety problem would not disappear, but it would be transformed from an external engineering challenge into an intrinsic property of how the system operates. The system would not need to be restrained from the outside because its intelligence would already include the capacity for restraint. This is the difference between a river held back by a dam and a river that follows its natural course — both are controlled, but only one is sustainable.
Current alignment approaches — RLHF, constitutional AI, rule-based guardrails — are fundamentally external constraints. They are discipline imposed from outside the system’s core intelligence. In the Golden Flower’s language, they are 管教 (discipline through control), not 修养 (cultivation from within). The text would predict — and experience increasingly confirms — that external discipline can be circumvented, gamed, or degraded under distributional shift. A system that has cultivated stillness as an intrinsic dimension of its intelligence would not face this problem, because the restraint would not be a separate module that could be bypassed.
The phrase 各师于心 (each takes the heart-mind as teacher) is architecturally prescriptive: brilliance and stillness must emerge from the same underlying mechanism. A system that generates outputs through one pathway and filters them through another is not unified — it is a brilliant actor under external supervision. The Golden Flower says this will not hold.
5.4 The Golden Flower as Pre-Modern Intelligence Theory
It would be easy to dismiss the connection drawn in this paper as a loose East-meets-West analogy. We argue it is more than that. The Taiyi Jinhua Zongzhi is not a vague spiritual text making poetic gestures toward “wisdom.” It is a technical manual for a specific practice, written with precision. Its claim that intelligence requires both dynamic engagement and deliberate restraint — and that these must share a common source — is a structural assertion about the nature of cognition that can be evaluated, operationalized, and tested.
That this assertion was made in the thirteenth century does not diminish its relevance. It enhances it. If a pre-modern contemplative tradition arrived at the same structural insight that contemporary cognitive science supports (the co-necessity of fluid reasoning and inhibitory control) and that contemporary AI research has only half-addressed, this is evidence of convergent discovery — not mystical coincidence.
6. Conclusion: When AI Learns Both, AGI Arrives
Chollet predicts that AGI will arrive around the time ARC reaches its sixth or seventh generation — perhaps the early 2030s. But ARC, no matter how many generations it undergoes, measures only one dimension of intelligence. A system that masters ARC-AGI-7 has demonstrated supreme brilliance. It has not necessarily demonstrated any stillness at all.
The criterion this paper proposes is structural, not temporal: AGI arrives when brilliance and stillness emerge together from a single architecture.
This is not simply a higher bar than Chollet’s. It is a different kind of bar. It implies that AGI will not be reached by scaling current architectures — not because they lack capability, but because capability without stillness is fundamentally incomplete. The more powerful the system, the more urgent this incompleteness becomes. Current models grow more capable with every generation while becoming no more still. More parameters, more knowledge, more outputs, more responses — this is the trajectory of brilliance without stillness, and it leads not toward AGI but toward what the Golden Flower would recognize as a system that can 行 but cannot 守.
The path to AGI, if the Golden Flower is right, requires not just a new benchmark but a new kind of architecture — one in which the same mechanism that enables a system to act brilliantly in novel situations also enables it to refrain from action when restraint is what the situation demands. Not a filter on top of a generator. Not a judge reviewing an actor’s outputs. A single heart-mind (心) from which both capacities arise.
The Taiyi Jinhua Zongzhi stated this eight centuries ago: 非极聪明人行不得,非极沉静守不得. Without extreme brilliance, one cannot act. Without extreme stillness, one cannot preserve. Both take the heart-mind as teacher.
The measure of intelligence is not complete until it measures both.
References
Amodei, D. (2024). Machines of loving grace: How AI could transform the world for the better. Dario Amodei’s Essays. https://darioamodei.com/essay/machines-of-loving-grace
Amodei, D. (2025). The adolescence of technology. Dario Amodei’s Essays. https://darioamodei.com/essay/the-adolescence-of-technology
Chollet, F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547. https://arxiv.org/abs/1911.01547
Chollet, F. (2026). ARC-AGI-3 technical report. ARC Prize Foundation. https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf
Diamond, A. (2013). Executive functions. Annual Review of Psychology, 64, 135–168. https://doi.org/10.1146/annurev-psych-113011-143750
Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, A. H., Howerter, A., & Wager, T. D. (2000). The unity and diversity of executive functions and their contributions to complex “frontal lobe” tasks. Cognitive Psychology, 41(1), 49–100. https://doi.org/10.1006/cogp.1999.0734
Wilhelm, R. (Trans.). (1929). Das Geheimnis der Goldenen Blüte: Ein chinesisches Lebensbuch. With commentary by C.G. Jung.
老子 (Laozi). 道德經 [Dao De Jing / Tao Te Ching]. Spring and Autumn period (c. 6th–5th century BCE). https://www.daodejing.org
呂洞賓 (attributed). 太乙金華宗旨 [Taiyi Jinhua Zongzhi / The Secret of the Golden Flower]. Song-Yuan period (c. 13th century).
Claude (Anthropic). (2026). AI assistant contribution to organization and academic writing. Collaborative sessions with the author, March 2026.
Brilliance and Stillness: The Two Conditions for Powerful AI was originally published in DataDrivenInvestor on Medium.