How does \expandafter work: The meaning of expansion
Introducing TeX’s concept of “expansion”
Having reviewed TeX tokens and token lists, next on our list of background topics for \expandafter is TeX’s concept of expansion: the core TeX process \expandafter is designed to work with.
On the challenge of explaining expansion...
For most of us, TeX’s concept of expansion, and the reasons and mechanisms behind it, can be confusing: they are not easy to explain fully in a simple, concise manner. Many of TeX’s inner operations are interrelated in quite complex ways, which presents challenges when trying to “dissect” certain aspects of TeX’s behaviour, that is, when attempting to discuss them in isolation from the program as a whole. Expansion certainly falls into that category because it is a fundamental element of TeX’s behaviour and is deeply embedded throughout TeX’s inner processes. Nevertheless, this article attempts to construct an explanation of expansion, offering a foundation on which readers can build.
To explain TeX’s expansion mechanism we’ll proceed step-by-step, surveying a range of related/relevant topics, and piece them together to develop an understanding of this crucial element of TeX’s behaviour. Later in the article, and for the more intrepid reader who is fond of detail, we take a look inside TeX itself, summarizing the key source code which drives the expansion process.
Key points about TeX tokens
Throughout the following discussion it’s important to remember that TeX can obtain its next item of input from two sources:
- physical text files: generating new tokens by reading/scanning characters and commands within those files, or
- token lists: getting input by reading from a list of existing, pre-prepared, tokens stored in TeX’s memory.
Also recall that TeX uses tokens as a convenient way to neatly “package” information about any item (character or command) it has read-in.
When TeX is generating new tokens it will, at the point of token production, have immediate access to information about the item whose token value it is calculating. For example, when generating new command tokens, TeX first calculates a value called curcs that enables TeX to look up information about the command whose token it is about to produce.
However, if TeX is reading from token lists (sequences of stored tokens) it would first need to “unpackage” any token in that list before it could determine what each token represents. For example, TeX could be reading tokens stored as part of a macro definition: those tokens would have been generated from text that has long since been read-in, converted to tokens and stored away. The token type (character or command) determines what TeX’s “unpackaging” actually means:
- for command tokens (token value > 4095): calculate curcs via \(\text{curcs}=\text{token value}-4095\) and use curcs to look up information about the command;
- for character tokens (token value < 4095): split the character token into its constituent pairing of (character code, category code).
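The two “unpackaging” cases above can be sketched in a few lines of Python. This is only an illustration, not TeX’s actual code: the function name unpack_token is hypothetical, and the character-token layout (256 × category code + character code) is an assumption taken from Knuth’s original 8-bit TeX; other engines use a different multiplier.

```python
CS_TOKEN_FLAG = 4095  # threshold used in the text: command tokens exceed this value


def unpack_token(token):
    """Reverse the "packaging" of a token value (illustrative sketch only)."""
    if token > CS_TOKEN_FLAG:
        # Command token: recover curcs, the index used to look up the command.
        return ("command", token - CS_TOKEN_FLAG)
    # Character token: assumed layout is 256 * category code + character code.
    catcode, charcode = divmod(token, 256)
    return ("character", charcode, catcode)


print(unpack_token(4095 + 300))            # a command token whose curcs is 300
print(unpack_token(256 * 11 + ord("a")))   # the letter "a" with category code 11
```

The first call reports a command with curcs equal to 300; the second recovers the pair (character code 97, category code 11).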
In the beginning...
To begin our exploration of expansion we’ll look at how TeX engines classify, or categorize, the set of commands they can process. For the moment, assume that “expandable” is simply some, as yet unexplained, “property” of a TeX command.
The notion of “expandable” commands
All TeX-based typesetting programs (TeX “engines”) understand two main “classes” of command:
- commands built into the executable engine itself: the so-called primitives;
- commands defined by the user: so-called macros.
TeX engines also use “being expandable”, a type of command “behaviour”, as a mechanism for classifying all commands into two groups:
- expandable commands;
- non-expandable commands.
So we can consider that the set of all commands (primitives plus macros) can be classified as shown in the following graphic:
Although we don’t yet know what expansion/expandable actually means, it’s possible to see that:
- all macros are classified as expandable commands;
- some primitive (built-in) commands are also classified as expandable (the vast majority are not).
Note on active characters: In addition to primitives and macros there is a third class of “expandable” item: active characters—any characters assigned to have category code 13 are also considered “expandable” because they are processed as “miniature macros”.
How does TeX know if a command is “expandable”?
From our discussion of TeX tokens we know that tokens are nothing more than integers which TeX calculates and uses to “package” information about an item of input: a character or a command. We also noted that, given any token (integer) value, TeX can, if/when required, reverse the tokenization process to “unpackage” a token and reveal data about the command or character it represents.
Using data produced when TeX creates a new token, or data extracted from an existing token—e.g., one stored in a macro or other token list—TeX can check its internal data-storage tables to find detailed information about the command (or character) represented by that token. The following graphic summarizes the process of creating a command token (“packaging”) and then, when TeX needs some information, reversing the process (“unpackaging” the token) to obtain data about the command represented by a particular token value.
Command codes
Part of the information that TeX stores about every command (primitive or macro) is something called its command code: an integer value that TeX engines use (internally) to classify commands—commands that have similar behaviour/functionality share the same command code.
Note: For an individual, non-active, character, its command code is the category code it was assigned at the point it was read-in (scanned) by TeX. Note too that, except for characters, command codes are purely internal to TeX: they are not accessible via macros or primitive commands—to see those details you’ll need to read TeX’s source code!
Here are some examples of command codes:
- \hbox, \vbox, \vtop, \box (and others) are all classified as “box making” commands and share command code 20;
- \def, \edef, \gdef, \xdef are all “macro definition” commands and share command code 97;
- all macros are classified with one of the command codes 111, 112, 113 or 114: the different command codes are determined by the use of \long or \outer when defining the macro:
  - 111: command code for a non-long, non-outer macro (e.g., \def\foo{...})
  - 112: command code for a long, non-outer macro (e.g., \long\def\foo{...})
  - 113: command code for a non-long, outer macro (e.g., \outer\def\foo{...})
  - 114: command code for a long, outer macro (e.g., \long\outer\def\foo{...})
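The four macro command codes listed above follow a simple pattern: starting from 111, \long adds 1 and \outer adds 2. The following Python sketch reproduces that pattern; the function name macro_command_code and its parameters are hypothetical and exist only for this illustration.

```python
CALL = 111  # command code of a non-long, non-outer macro (from the list above)


def macro_command_code(is_long=False, is_outer=False):
    """Return the command code of a macro defined with the given prefixes."""
    # \long contributes 1 and \outer contributes 2, giving the codes 111..114.
    return CALL + (1 if is_long else 0) + (2 if is_outer else 0)


for is_long in (False, True):
    for is_outer in (False, True):
        print(is_long, is_outer, macro_command_code(is_long, is_outer))
```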
TeX also uses command codes to identify the subset of commands classified as “expandable”. To do that, TeX associates expandable commands with command code values that exceed the threshold value of 100 (the value used by Knuth’s original TeX, e-TeX and pdfTeX). Consequently:
- non-expandable commands (the vast majority) have a command code <= 100
- expandable commands have a command code > 100
Hence, if TeX looks up the details for a particular command and detects a command code > 100, TeX immediately knows it is expandable. All macros and a small number of primitives (built-in commands) are assigned a command code indicating they are expandable: but why does TeX bother with this “expandable” classification? As you may suspect, it simply means that those commands are somewhat “special”; i.e., they can be subjected to TeX’s process of expansion.
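As a rough sketch of this test, the following Python fragment classifies a few of the command codes mentioned in this article using the threshold of 100. The names MAX_COMMAND, EXAMPLES and is_expandable are chosen for this illustration and are not taken from any engine’s source code.

```python
MAX_COMMAND = 100  # threshold quoted in the text

# A few command codes mentioned in this article (description -> code).
EXAMPLES = {
    "letter (category code 11)": 11,
    "box-making commands such as \\hbox": 20,
    "macro-definition commands such as \\def": 97,
    "a non-long, non-outer macro": 111,
    "a long, outer macro": 114,
}


def is_expandable(command_code):
    """A command is expandable when its command code exceeds the threshold."""
    return command_code > MAX_COMMAND


for description, code in EXAMPLES.items():
    kind = "expandable" if is_expandable(code) else "non-expandable"
    print(f"{code:>3}  {kind:<14}  {description}")
```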
Note (characters): Unless a character has been defined as active (category code 13) all characters are non-expandable. Active characters can be thought of as single-letter “mini macros”.
Yes, yes but what is expansion???
We noted that token values greater than 4095 are used to represent commands; consequently, TeX can, if/when required, “unpackage” such a token to access information about the TeX command it represents. TeX can look up that command’s command-code value to determine whether it falls into the special category of being expandable. But what does TeX do with that information and what does “being expandable”, i.e., expansion, really mean?
Expansion: A general term for a set of operations
At certain points during TeX’s processing it needs to know whether a token represents an expandable command:
- No: If that command token does not fall into the expandable category, TeX simply “passes it on” to the next stage of processing.
- Yes: If that command token does fall into the expandable category, you can think of TeX needing to take action which results in “filtering out” that particular token by performing a process that TeX refers to as “expanding that token”, or performing expansion.
In essence, expansion involves TeX removing that token from its input and replacing it with tokens arising from the “expansion process” specific to that command or group of commands sharing the same command code.
It takes practice and experience to know when TeX is, and is not, performing token expansion: page 215 of The TeXbook lists the situations in which expandable tokens are not expanded. The reality is that TeX’s process of performing expansion is complex and nuanced.
Which commands are expandable?
If you look at a list of primitive (built-in) commands that TeX engines classify as expandable (e.g., pages 212–215 of The TeXbook) it becomes apparent that, in practice, expansion is a rather general term which TeX uses to describe the behaviour of a somewhat eclectic group of commands. The results of expanding a particular (expandable) command do vary, and we can see this by listing a set of expandable commands grouped by their functionality. The precise semantics of individual commands are not important here; of most interest are the general consequences of their actions, i.e., what their expansion actually does or achieves.
Based on the list above, is it possible to extract any “general behaviours” from this collection of commands—to further refine our understanding of expansion? To a fair approximation, the results of expanding a command fall into a few key activities:
- token generation
- token insertion
- token filtering
- change/control expansion of the next token
And finally: the meaning of expansion?
We can consider that expansion of a command token (including active characters) involves TeX removing that command token from the input and replacing that token with other tokens that arise from its expansion behaviour: generating/inserting tokens, filtering tokens or changing/controlling expansion of the next token. Once a token has been expanded, TeX would continue to read/process any tokens that might arise from the expansion process. Exactly when or where TeX will perform expansion is a very different question: due to the nuances and complexities of TeX’s processing it’s one that we can’t hope to fully address in this article but we can give an overview of TeX’s main input processing behaviour.
Expansion and the structure of TeX
Arguably, the best way to see how/where TeX distinguishes between expandable and non-expandable commands is to look at TeX’s source code.
Internally, TeX is an extremely complex piece of software; consequently, it is impractical to attempt a full description of the many nuanced behaviours inherent to TeX’s operation. As noted, exactly when or where TeX will actually perform expansion as it reads tokens is “somewhat complex”; consequently, the following discussion assumes TeX is in a state that requires expansion to take place. Situations where TeX would be reading/creating tokens, but not performing expansion, include creating/storing tokens for the \def macro-definition command or storing tokens in a \toks register.
The following diagram provides a simplified overview of the core parts of TeX’s structure which deal with input and processing of tokens: the objective is to highlight the way that expandable and non-expandable tokens are “filtered” during TeX’s input processing. The C programming language is used to describe the various key functions but, hopefully, the basic structure should be clear even if you are not familiar with the C language.
Explanation of the TeX structure diagram
In outline, TeX’s processing is driven by a function called maincontrol() which, in the source code of TeX, Knuth refers to as the “chief executive” and describes as follows:
“We come now to the maincontrol routine, which contains the master switch that causes all the various pieces of TeX to do their things, in the right order… this is the grand climax of the program… We are now at the hub of the web, the central nervous system that touches most of the other parts and ties them together.”
The following description is somewhat simplified but gives a “flavour” of the underlying structure of TeX-based software: how expandable commands are identified (“intercepted”) and filtered-out during TeX’s input processing.
In essence, during TeX’s earliest stage of input processing it uses the command code value of the current command (token) to detect expandable commands (macros, expandable primitives and active characters); any such command is “intercepted” and “pre-processed” by being expanded. Through this filtering (expansion) process TeX removes expandable commands from the input and replaces them with tokens arising from their respective expansion behaviour. Once the expansion process is complete, TeX will read any tokens produced from that expansion. Only non-expandable items are passed through to the next stage of TeX’s processing: the “big switch” where TeX will execute all non-expandable primitive commands and typeset characters.
Understanding maincontrol()
The maincontrol() function contains a large so-called switch statement (Knuth calls it the “big switch”) that TeX uses to action all non-expandable primitive commands: those with a command code <= 100.
To action a particular command (or character) this “big switch” statement uses a combination of TeX’s current mode:
- internal/outer vertical mode, or
- restricted/paragraph horizontal mode, or
- inline/display math mode
plus the value of the command code for each primitive or character (the command code of a non-active character is its category code value). For the record, all 6 of TeX’s modes are allocated an integer value that can be used by TeX to save and restore its current mode whilst it switches between typesetting various constructs: mathematics, paragraphs, boxes and so forth. Here are the integers allocated to TeX’s various modes:
Mode | Meaning | Mode value |
Outer vertical | Between paragraphs (building pages) | 1 |
Internal vertical | Immediately inside a \vbox{...} | -1 |
Horizontal | When typesetting/building paragraphs | 102 |
Restricted horizontal | Immediately inside an \hbox{...} | -102 |
Display math | Typesetting display equations/formulae | 203 |
Inline (formula) math | Typesetting inline equations/formulae | -203 |
Note the use of negative values for the complement of the various modes, such as internal vs. outer vertical mode. When processing commands in maincontrol(), TeX uses the absolute value of the current mode, i.e., it ignores the negative sign. So, for example, if TeX is in either of its horizontal modes and needs to process a character with category code 11, it would use the sum \(102 + 11 = 113\) inside its “big switch” to branch off to the code which processes those characters in either horizontal mode (building paragraphs or inside an \hbox).
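The arithmetic behind the “big switch” can be sketched as follows. The mode values are those in the table above; the names MODE_VALUES and big_switch_key are hypothetical and used only for this illustration, which makes no attempt to reproduce the real switch statement.

```python
# Mode values from the table above.
MODE_VALUES = {
    "outer vertical": 1,
    "internal vertical": -1,
    "horizontal": 102,
    "restricted horizontal": -102,
    "display math": 203,
    "inline math": -203,
}


def big_switch_key(mode, command_code):
    """Combine the absolute mode value with a command code, as described above."""
    return abs(MODE_VALUES[mode]) + command_code


# A letter (category code 11) gives the same key in both horizontal modes.
print(big_switch_key("horizontal", 11))
print(big_switch_key("restricted horizontal", 11))
```

Because only command codes from 0 to 100 reach the switch, the keys for the vertical modes (1 to 101), the horizontal modes (102 to 202) and the math modes (203 to 303) cannot overlap, which is presumably why the mode values are spaced as they are.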
Getting the next token
The maincontrol() function calls the function getxtoken() to get the next token of input; in turn, getxtoken() calls the lowest-level input function getnext(). After getnext() has completed its work, and set the value of key global variables, getxtoken() tests whether the item just read in represents a macro or an expandable primitive.
Some notes on getnext() and getxtoken():
- getnext(): this is TeX’s core input function. It reads and processes characters from a physical file or reads tokens from a token list. Note that getnext() doesn’t actually calculate token values but it does set a number of key global variables (e.g., the command code and curcs) which are used later in the processing.
- getxtoken(): “get expanded token” uses getnext() to read the next item of input. It then tests the command code of the item just read in to check whether that item represents an expandable command: if so, the primitive is expanded or the macro is called. For non-expandable items, getxtoken() calculates a token value and these details are passed on to the next stage of processing: the “big switch” where non-expandable primitives are executed and characters are typeset.
In essence, it is via getxtoken() that all expandable items are “filtered out” before they are processed in maincontrol(): this results in TeX’s “big switch” only processing non-expandable primitive commands.
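The filtering role of getxtoken() can be modelled with a small Python sketch. It is a heavily simplified toy, not a transcription of TeX’s source: the function names merely echo those used above, the macro table MACROS is hypothetical, macros take no arguments, expandable primitives are ignored, and every item that is not a macro is treated as a letter with command code 11.

```python
from collections import deque

MAX_COMMAND = 100        # codes above this are expandable (threshold from the text)
LETTER, CALL = 11, 111   # command codes for a letter and for an ordinary macro

# Hypothetical macro table: macro name -> replacement items.
MACROS = {"\\ab": ["a", "\\b"], "\\b": ["b"]}


def get_next(source):
    """Fetch the next item and its command code, without expanding anything."""
    item = source.popleft()
    return item, (CALL if item in MACROS else LETTER)


def get_x_token(source):
    """Return the next non-expandable item, expanding macros on the way."""
    while source:
        item, cmd = get_next(source)
        if cmd <= MAX_COMMAND:
            return item, cmd                       # passed on to the "big switch"
        source.extendleft(reversed(MACROS[item]))  # expand: push the replacement back
    return None                                    # input exhausted


def main_control(items):
    """Collect everything that would reach the "big switch"."""
    source, seen = deque(items), []
    while True:
        result = get_x_token(source)
        if result is None:
            return seen
        seen.append(result)                        # only non-expandable items arrive here


print(main_control(["x", "\\ab", "y"]))
```

In this run the macro \ab never reaches main_control(): it is replaced by a and \b, and \b is in turn replaced by b, so only the four letters x, a, b and y are passed on, each with command code 11.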
Once again we stress that the above diagram is a simplification because other areas of TeX’s processing will also cause tokens to be expanded; for example:
- when TeX is looking (scanning) for numeric values it will trigger expansion in its search for integers (by calling getxtoken());
- the primitive command \romannumeral causes expansion (see the article Expansion using \romannumeral for a nuanced use of that fact);
- the \edef (“expanded definition”) macro-definition command forces full expansion;
- TeX will expand tokens at certain points when processing the preamble of an \halign or \valign (both commands are related to TeX’s table typesetting algorithms).