Integral World: Exploring Theories of Everything

An independent forum for a critical discussion of the integral philosophy of Ken Wilber

Frank Visser, who graduated as a psychologist of culture and religion, founded IntegralWorld in 1997. He worked as production manager for various publishing houses and as service manager for various internet companies and lives in Amsterdam. Books: Ken Wilber: Thought as Passion (SUNY, 2003), and The Corona Conspiracy: Combatting Disinformation about the Coronavirus (Kindle, 2020).

NOTE: This essay contains AI-generated content

Measuring the Immeasurable?

A Critical Examination of the Computerized Lectical Assessment System

Frank Visser / Claude

This essay was written in response to Brendan Graham Dempsey's Substack excerpt "The Lectical Scale" and the subsequent exchange it generated. See the previous essay.

I. Introduction: The Allure of the Universal Ruler

The dream of a universal developmental metric is as old as the scientific study of human psychology. If we could only find the right instrument—one that could peer beneath the surface noise of cultural content, disciplinary vocabulary, and individual style—we might at last measure the deep structure of human thinking itself, cleanly, objectively, and at scale.

The Computerized Lectical Assessment System (CLAS), developed by Theo Dawson and her colleagues at Lectica over two decades, represents the most ambitious attempt yet to realize this dream. By analyzing the vocabulary distribution of text performances against a 40,000-item Lectical Dictionary built from nearly 100,000 scored texts, CLAS claims to assign numerical scores—on a 1,000-point scale—to the hierarchical complexity of any written performance in any domain. The system is presented, by Dawson and by those who draw on her work, as “the cutting-edge in cognitive-developmental psychometrics”: a precision instrument for measuring a universal axis of human development.

This essay subjects that claim to critical scrutiny. The argument is not that CLAS is worthless—the research program is serious, the psychometric methods are sophisticated, and the empirical findings are real. The argument is that CLAS carries a set of foundational problems that its proponents systematically understate: a construct validity question that convergent psychometrics cannot fully resolve, a cultural corpus problem embedded in the instrument's own architecture, a conflation of discursive register with cognitive structure, and a numerical precision that outruns its theoretical warrant. These problems matter both for the scientific standing of developmental psychometrics and for the broader cultural uses to which CLAS is being put.

II. What CLAS Claims to Measure: Hierarchical Complexity

Before criticizing CLAS, we need to be clear about what it purports to measure. The underlying construct is “hierarchical complexity”, a concept formalized by Michael Commons and Kurt Fischer in the 1980s and 1990s. The core idea is that cognitive performances can be ranked according to the number of levels of abstraction they integrate: a performance that coordinates multiple abstract systems is more complex than one that operates with single abstractions, which is in turn more complex than one operating at the level of concrete representations.

This is, at its core, a structural claim: that beneath the variable content of human thought—moral reasoning, scientific thinking, aesthetic judgment, religious faith—there is a single axis of organizational complexity that can be abstracted away from content entirely and measured independently. CLAS operationalizes this by arguing that specific vocabulary items are reliably associated with specific complexity levels, and that the density distribution of such items across a text performance predicts its overall hierarchical complexity score.

The construct thus makes two linked claims. First, a “universality claim”: hierarchical complexity is a domain-general property of cognitive performance, not an artifact of any particular domain. Second, a “lexical instantiation claim”: this structural property manifests reliably enough in vocabulary choice that word distributions can serve as its proxy.

Both claims deserve scrutiny.

III. The Construct Validity Problem: Convergence Is Not Confirmation

The most frequently cited evidence for CLAS's validity is the convergence finding: that independently developed, content-based scoring systems—Kohlberg's Standard Issue Scoring System for moral reasoning, Kitchener and King's Reflective Judgment Scoring System, Armon's Good Life Scoring System—all correlate highly with the Hierarchical Complexity Scoring System (HCSS), with over 80% of shared variance explained by a common latent dimension. Dawson interprets this as strong evidence that these systems are all measuring the same underlying construct: hierarchical complexity.

This is a genuine finding and not trivial. But it does not establish what its proponents claim.

The convergence of multiple measurement instruments around a common factor is evidence of internal consistency, not of correspondence to a mind-independent reality. The critical question is whether these independently developed systems are genuinely independent in the epistemologically relevant sense—that is, whether they were developed from different theoretical assumptions about what cognitive development “is”. In fact, all of the systems in question—Kohlberg's, Kitchener and King's, Armon's—are neo-Piagetian in their theoretical foundations. They all presuppose that cognitive development is a hierarchical, sequential, stage-like process of increasing abstraction. They were all developed within the same intellectual tradition, by researchers trained in overlapping lineages, working with similar interview methodologies on similar (largely Western, educated) populations.

When systems built on the same theoretical skeleton converge on the same factor, this confirms the coherence of that theoretical skeleton. It does not confirm that the skeleton accurately describes the structure of human cognition in general. The convergence tells us that the neo-Piagetian tradition is internally consistent; it does not tell us that the tradition is right.
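The logic of this objection can be illustrated with a small simulation. It is a deliberately extreme sketch, not a model of any real scoring system: the variables `cognition` and `framework`, the noise level, and the sample size are all assumptions introduced here. Three hypothetical instruments each pick up a shared "framework" signal plus a little noise, while the quantity we would actually like to measure is generated independently.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n=5000, noise=0.3, seed=0):
    rng = random.Random(seed)
    # Hypothetical target: the cognitive property we would like to measure.
    cognition = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical shared signal: whatever the common tradition happens to pick up.
    # In this extreme scenario it is generated independently of the target.
    framework = [rng.gauss(0, 1) for _ in range(n)]
    # Three instruments that all read the shared signal, each with its own noise.
    instruments = [
        [f + noise * rng.gauss(0, 1) for f in framework] for _ in range(3)
    ]
    between = pearson(instruments[0], instruments[1])
    with_target = pearson(instruments[0], cognition)
    return between, with_target

between, with_target = simulate()
print(f"correlation between instruments: {between:.2f}")
print(f"correlation with the target:     {with_target:.2f}")
```

In this constructed case the instruments agree strongly with one another while telling us almost nothing about the target. The sketch does not show that the real scoring systems behave this way; it shows only that high convergence by itself cannot rule the possibility out.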

This is not a merely philosophical objection. There exists a substantial body of developmental research—from situated cognition theory, cultural psychology, and dynamic systems approaches—that challenges the neo-Piagetian framework at the level of its core assumptions. Researchers like Barbara Rogoff, Jerome Bruner, and Jean Lave have argued that cognition is fundamentally context-embedded and distributed rather than structurally staged. Esther Thelen's dynamic systems approach treats developmental change as emergent from the interaction of multiple components rather than as the expression of latent stage structures. None of this literature is engaged by CLAS's validity arguments, because CLAS's validity is argued entirely within the tradition it is meant to validate.

IV. The Corpus Problem: Who Built the Dictionary?

The Lectical Dictionary—the empirical heart of CLAS—contains over 40,000 semantic items, each assigned to a complexity level based on its distribution across nearly 100,000 scored texts. The dictionary was built by identifying which vocabulary items first appear at which complexity level across the training corpus, then using those distributions to predict complexity scores in new texts.

The scientific legitimacy of this procedure depends entirely on the representativeness of the training corpus. If the texts used to construct the dictionary were systematically drawn from a particular cultural, linguistic, educational, or socioeconomic population, then the dictionary encodes the vocabulary patterns of that population as universal developmental norms. It would then evaluate all subsequent texts against those norms—systematically disadvantaging speakers whose linguistic communities, educational backgrounds, or intellectual traditions use different vocabulary to express structurally equivalent reasoning.
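To make the dependence on the corpus concrete, here is a toy sketch of dictionary-based scoring. It is not the CLAS algorithm, whose details are not given in this essay. The mini-dictionary, its level assignments, and the mean-level scoring rule are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: item -> level at which it is assumed to first appear.
# These entries and levels are invented; they are not Lectical Dictionary data.
TOY_DICTIONARY = {
    "fair": 1,
    "rule": 1,
    "because": 1,
    "principle": 2,
    "perspective": 2,
    "system": 3,
    "interdependence": 3,
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def level_profile(text, dictionary=TOY_DICTIONARY):
    """Count how many matched dictionary items fall at each level."""
    return Counter(dictionary[t] for t in tokenize(text) if t in dictionary)

def toy_score(text, dictionary=TOY_DICTIONARY):
    """Mean level of the matched items (an assumed rule); None if nothing matches."""
    profile = level_profile(text, dictionary)
    total = sum(profile.values())
    if total == 0:
        return None
    return sum(level * count for level, count in profile.items()) / total

print(toy_score("It is fair because that is the rule"))
print(toy_score("Each system shows interdependence in principle"))
print(toy_score("We weigh the elders' cases against one another"))
```

The third sentence gets no score at all in this sketch, because none of its words appear in the dictionary. Whatever reasoning stands behind it is invisible to a scorer whose vocabulary was learned elsewhere.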

Dawson's pilot study used 1,014 texts collected by various researchers between 1955 and 2003 from participants ranging from 2 to 86 years old. These texts were scored interviews—a method with well-documented cultural and educational dependencies. The researchers cited (Kohlberg, Commons, Armon, Dawson) were all working primarily in North American and European academic contexts. The texts were in English.

The corpus question is not addressed in CLAS's validation literature with anything approaching the rigor applied to its psychometric properties. We do not have published breakdowns of the cultural, linguistic, national, or socioeconomic composition of the training corpus. We do not have systematic studies of how CLAS scores vary across structurally equivalent performances in different languages, dialects, or educational traditions. We know that the dictionary now contains over 40,000 items built from nearly 100,000 texts, but we do not know who produced those texts.

This matters because developmental stage theory has a troubled history with cross-cultural claims. Kohlberg's original cross-cultural research was criticized for systematically scoring non-Western moral reasoning as developmentally inferior because it employed different conceptual frameworks rather than different levels of abstraction. The standard defense—that culture affects the “rate” of development, not the “sequence”—is empirically contested and theoretically circular: it assumes the sequence in order to interpret cross-cultural variation as rate differences rather than structural differences.

CLAS inherits this problem and potentially amplifies it, because it encodes cultural-linguistic norms at the level of vocabulary rather than at the level of explicit stage definitions. The bias, if present, is thus harder to detect and harder to correct.

V. Discursive Register and Cognitive Structure: The Conflation Problem

The lexical-inference problem is the deepest methodological challenge facing CLAS, and it deserves more sustained attention than it typically receives.

CLAS scores texts based on the distribution of vocabulary items drawn from the Lectical Dictionary. The theoretical assumption is that vocabulary choice tracks cognitive structure: that someone who uses terms like “systemic interdependence” or “metasystematic coordination” is reasoning at a higher level of hierarchical complexity than someone who uses simpler vocabulary.

Dawson's defense of this assumption is that Lectical items are assigned to the “lowest” level at which their simplest meaning first becomes useful—so that the system is not rewarding lexical sophistication as such, but rather the structural demands that generate particular vocabulary. This is theoretically coherent. But it does not dissolve the confound.

The problem is that vocabulary is shaped by multiple forces simultaneously: cognitive structure, yes, but also educational exposure, disciplinary training, professional register, reading habits, and cultural convention. A philosopher trained in analytic epistemology will use abstract metacognitive vocabulary fluently because that is the register of her discipline—not necessarily because she is reasoning at a structurally higher level than a skilled mechanic or experienced judge whose domains do not require or reward that vocabulary. A psychologist writing about her own field will use complexity-level-appropriate vocabulary in that domain while potentially scoring lower in domains where she lacks the disciplinary register.

This is not a hypothetical concern. It is precisely the problem that motivated Fischer's Dynamic Skill Theory, which insists that performance is domain- and context-specific. But CLAS uses vocabulary as a domain-general proxy for structural complexity—meaning it is potentially measuring something that is heavily domain-specific (disciplinary register, educational formation) while claiming to measure something domain-general (hierarchical complexity).

The Lectical defense—that structure can be separated from content—is stated as a theoretical commitment but is extremely difficult to verify empirically for any given performance. When a text scores at a high Lectical level, we cannot straightforwardly determine whether this reflects genuine structural complexity in the thinker's reasoning or a high degree of training in the vocabulary conventions of a high-complexity domain. The two are not identical, and CLAS's scoring mechanism cannot reliably distinguish them.
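The identification problem can be stated in a few lines. The sketch below assumes, purely for illustration, that an observed vocabulary score is a weighted sum of two latent quantities, `structure` and `register`. The additive form, the equal weights, and the two writers are hypothetical and are not drawn from any Lectica publication.

```python
from dataclasses import dataclass

@dataclass
class Writer:
    name: str
    structure: float  # hypothetical latent structural complexity of the reasoning
    register: float   # hypothetical fluency in high-level disciplinary vocabulary

def vocabulary_score(writer, w_structure=0.5, w_register=0.5):
    """Assumed toy model: the observed score mixes both latent quantities."""
    return w_structure * writer.structure + w_register * writer.register

writer_a = Writer("writer_a", structure=4.0, register=8.0)
writer_b = Writer("writer_b", structure=8.0, register=4.0)

for w in (writer_a, writer_b):
    print(w.name, vocabulary_score(w))
```

Both writers receive the same score although their latent profiles are reversed. Under any model of this kind, the score alone cannot tell us which of the two quantities produced it.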

VI. The Precision Problem: Millimeters in a Foggy Room

Dempsey's essay employs a striking metaphor: if Piaget gave us a yardstick and Fischer and Commons gave us feet, Dawson's Lectical Scale gives us millimeters. The image captures the genuine achievement of CLAS—its 1,000-point numerical scale allows far more fine-grained discrimination than earlier stage models. But the metaphor also reveals the epistemological problem.

Millimeter precision is meaningful when measuring length because the ontological status of the quantity being measured (spatial extension) is not in doubt, and because we have an independently grounded theory of what length “is” that explains why our rulers work. Neither condition obtains for hierarchical complexity.

The ontological status of hierarchical complexity as a real psychological dimension—something that persons actually possess in degrees, rather than something we project onto performances through a particular analytic framework—is not established. It is assumed by the neo-Piagetian tradition and confirmed within that tradition, but not established against alternatives. A 1,000-point scale applied to a construct of uncertain ontological status does not yield millimeter precision; it yields numerical specificity without proportionate epistemic warrant.

The problem is compounded by CLAS's own appropriate caveats. Dawson correctly notes that qualitative differences are usually only detectable at the Phase level (quarter-levels), and that scores should be treated with an error margin of roughly ±10 points. This is responsible epistemic caution. But it means that the effective resolution of the instrument is approximately 40 points—roughly four times the stated error margin—on a 1,000-point scale. The numerical apparatus of the full scale is thus doing less work than it appears to be doing. The millimeter ruler is, in practice, more like a centimeter ruler—still more precise than its predecessors, but not the transformative metrological advance the framing implies.
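The arithmetic behind this point is simple enough to spell out. The figures below are the ones quoted in this essay; the rule that two scores count as clearly different only when their error intervals do not overlap is an assumption added here for illustration.

```python
SCALE_POINTS = 1000         # nominal size of the scale, as described in the essay
ERROR_MARGIN = 10           # plus or minus, in points, as quoted in the essay
EFFECTIVE_RESOLUTION = 40   # points, the essay's estimate of the usable resolution

def distinguishable_bands(scale_points, resolution):
    """How many bands of the given width fit on the scale."""
    return scale_points // resolution

def clearly_different(score_a, score_b, margin=ERROR_MARGIN):
    """Assumed rule: scores differ only if their +/- margin intervals do not overlap."""
    return abs(score_a - score_b) > 2 * margin

print(distinguishable_bands(SCALE_POINTS, EFFECTIVE_RESOLUTION))
print(EFFECTIVE_RESOLUTION / ERROR_MARGIN)
print(clearly_different(500, 515), clearly_different(500, 525))
```

On these numbers the nominal 1,000 gradations reduce to about 25 usable bands, which is still finer than a handful of stages but a long way from the resolution the full scale suggests.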

VII. The Universality Assumption: Development Versus Difference

Perhaps the most consequential unresolved question surrounding CLAS concerns the nature of what it measures at the upper levels of the scale. Hierarchical complexity, as theorized, is a universal human capacity—a species-wide potential for cognitive development that expresses itself across cultures and domains. But the empirical distribution of high Lectical scores is not culturally neutral.

Advanced academic writing—the kind produced by philosophers, scientists, psychologists, and theologians trained in Western universities—consistently uses the vocabulary that CLAS assigns to high complexity levels. This is by design: the dictionary was built from scored developmental interviews conducted within that tradition. But this creates a circularity: the instrument was trained on performances that the tradition recognizes as developmentally advanced, and it then confirms that performances within that tradition are developmentally advanced.

What would it mean for a master of traditional oral jurisprudence—someone whose reasoning operates through extended analogical case chains, narrative precedent, and community-embedded deliberation rather than propositional abstraction—to be scored by CLAS? The structural complexity of such reasoning might be very high, but it would not necessarily generate the vocabulary distributions that CLAS associates with high-level performance, because it operates in a different discursive register. CLAS might score such a performance as structurally lower than it actually is—not because the reasoning is simpler, but because the vocabulary used to conduct it is different.

This is not a speculative worry. It is the structural implication of building a universal developmental instrument from a culturally specific training corpus. And it has practical consequences wherever CLAS is deployed as an assessment tool in educational or organizational contexts.

VIII. The Broader Uses: When Psychometrics Meets Metaphysics

CLAS is not only a research instrument. It is actively used in educational assessment, leadership development, and organizational consulting through the Lectica platform. And it is increasingly cited by writers in the integral and developmental spirituality space as empirical validation for hierarchical models of consciousness, cultural development, and human potential.

This is where the methodological stakes become cultural stakes. If CLAS's construct validity is uncertain, its cultural corpus is unexamined, and its lexical proxy for cognitive structure is potentially confounded—then deploying it to rank individuals, educational curricula, or cultural worldviews on a developmental scale carries real risks of systematic bias dressed in scientific clothing.

The concern is not that developmental psychology is inherently reactionary; it is not. The concern is that numerical precision creates an impression of objectivity that can suppress exactly the critical scrutiny these instruments require. A four-digit score such as 1,037 feels more authoritative than a stage assignment on a 6-level scale, even if its epistemic foundations are no firmer. This rhetorical authority makes the instrument's limitations more, not less, important to understand and communicate.

IX. What CLAS Gets Right

Intellectual honesty requires acknowledging what the critique does not undermine. The core insight that cognitive performances have structural properties that can be distinguished from their content is sound and empirically supported. The finding that multiple independently constructed stage models converge on a common complexity dimension is real and meaningful, even if its interpretation requires the caveats discussed above. The attempt to build an objective, scalable, domain-general assessment tool addresses a genuine limitation of earlier stage models. And Dawson's explicit insistence, stated clearly in Lectica's own public materials, that “people are not ‘at’ levels” and that “there is no such thing as a developmental center of gravity” reflects a theoretical sophistication that many consumers of her research ignore.

These are genuine achievements. CLAS is the most rigorous attempt yet to make good on the neo-Piagetian program's core promise. The critique offered here is not that the program is misconceived but that its claims have outrun its warrant—and that the outrunning is consequential.

X. Conclusion: Precision, Humility, and the Limits of the Universal Ruler

The Lectical Scale is a remarkable instrument and a genuine contribution to developmental psychometrics. Its 40,000-item dictionary, built from nearly 100,000 scored texts and refined over two decades, represents an extraordinary empirical achievement. Its Rasch-validated claim that hierarchical complexity can be reliably scored across domains is supported by real evidence. And its explicit rejection of global altitude assignment—of the idea that persons can be ranked on a single developmental ladder—reflects an admirable fidelity to what the data actually show.

But CLAS's validity rests on theoretical foundations that are shared with, rather than independent of, the tradition it aims to validate. Its training corpus carries cultural and linguistic assumptions that are not adequately examined in its validation literature. Its lexical proxy for cognitive structure is vulnerable to confounding with discursive register in ways that structural scoring alone cannot resolve. And its numerical precision—however real at the level of psychometric technique—outruns its theoretical warrant when treated as a window onto universal human cognitive development.

The universal ruler is a powerful idea. The question is whether any instrument built from the inside of a single intellectual tradition, trained on a culturally specific corpus, and operationalized through the vocabulary patterns of academic discourse, can actually achieve the universality it claims. That question deserves more sustained critical attention than the developmental psychometrics literature has so far given it—and more than the popular writers who deploy CLAS as a cultural authority have so far demanded.
