How TeX macros actually work: Part 5
Introduction and overview
In Part 4 we reviewed some basic properties of TeX macros in preparation for the next two articles, where we take a close look at the underlying mechanics of TeX macros: specialized token lists. In these final two articles we use diagrams, called node lists, that were prepared from data generated using a specially modified version of Knuth’s original TeX software; those modifications were designed to access internal TeX data structures which are normally inaccessible to the user. By “hooking into” TeX’s internal macro-processing and execution routines it was possible to write out graphical data which enables a more detailed and accurate discussion of TeX’s macro-processing behaviour. Overleaf hopes that these diagrams help readers achieve a better understanding of how TeX macros really work.
Possible additional background reading
Overleaf has already published two token-related articles that provide additional background information on TeX tokens and TeX token lists. Do please take time to check them out if you need to fill any gaps in your understanding; they will help you get the most from Parts 5 and 6 of this series.
Macros as token lists
When TeX detects a macro-creation command (\def, \edef, \gdef or \xdef) within the input stream, it triggers a process which converts both sections of our macro’s definition, <parameter text><replacement text>, into one long token list, but a very particular type of token list.
Token lists for macros are slightly different to other token lists used within TeX because they contain “special” token values that only TeX’s own internal processes can generate: those special tokens cannot be created directly by any commands that you can include in your .tex file. TeX creates and uses those “special” token values to help with processing your macro call, as we’ll explore and explain below.
A brief word on how token lists are stored: nodes
To store a list of tokens (integer values) TeX uses a data structure called a linked list, which, in TeX’s case, comprises a list of so-called nodes. You can think of a node as a small package of computer memory which can be used to store a collection of data items. To store a macro, these nodes are strung together like a chain, where each node (link in the chain) can store several pieces of information, including a token value and the memory address of the next node in the list. For further information, you can read the article What is a TeX token list, but the following diagram summarizes the key features of a macro stored as a token list.
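As a rough illustration of the idea, the Python sketch below models a token list as a chain of nodes: a head node holding a reference count, followed by one node per token, with each node recording the address of the next. It is a toy model under stated assumptions, not TeX’s real memory layout: the `Node` and `Memory` classes, the starting address 1000 and the use of `None` to stand for “null” are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    address: int         # where this node lives in our simulated memory
    info: int            # a token value (or, in the head node, a reference count)
    link: Optional[int]  # address of the next node; None plays the role of "null"


class Memory:
    """Toy memory: a dictionary mapping simulated addresses to nodes."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.next_free = 1000  # arbitrary first address (hypothetical)

    def get_node(self, info: int) -> Node:
        node = Node(self.next_free, info, None)
        self.nodes[node.address] = node
        self.next_free += 1
        return node


def store_token_list(mem: Memory, tokens: List[int]) -> int:
    """Store tokens after a reference-count head node; return the head address."""
    head = mem.get_node(0)  # simplified reference count
    tail = head
    for tok in tokens:
        node = mem.get_node(tok)
        tail.link = node.address  # chain the previous node to the new one
        tail = node
    return head.address


def read_token_list(mem: Memory, head: int) -> List[int]:
    """Follow the links from the head node and collect the token values."""
    tokens = []
    addr = mem.nodes[head].link  # skip the reference-count node
    while addr is not None:
        node = mem.nodes[addr]
        tokens.append(node.info)
        addr = node.link
    return tokens


mem = Memory()
head = store_token_list(mem, [2881, 3121])  # letter A, then digit 1
print(read_token_list(mem, head))  # [2881, 3121]
```

The addresses themselves carry no meaning; only the chain of links matters, which is also why the exact memory locations in TeX’s own node diagrams are unimportant.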
Reminder: the 4 parts of a macro definition
As discussed in Part 4, the structure of any macro can be written as:
<TeX macro primitive><macro name><parameter text>{<replacement text>}
where:
- <TeX macro primitive> = one of \def, \edef, \gdef or \xdef;
- <macro name> = the name of your macro, such as \foo;
- <parameter text> can be “null” (not present) or it can be a string of delimiter tokens and macro parameter tokens;
- <replacement text> is the actual body of your macro: the section that is “executed” when you call the macro.
NOTE: (As also observed in Part 4) throughout the discussion we assume that <macro name> is followed by a space character of category code 10, which acts as a delimiter to terminate the <macro name>. We have not explicitly shown that space character in our text/discussion, but we assume it is there. Strictly speaking, we should represent it like this:
<TeX macro primitive><macro name><space><parameter text>{<replacement text>}
However, we will omit explicit inclusion of a <space> character and implicitly assume its presence.
NOTE: The characters { and } do not become part of the macro token list: their purpose is simply to tell TeX’s input scanner (which creates tokens) where the <replacement text> starts and stops.
When TeX defines a macro, the sections <parameter text><replacement text> are converted into one long, continuous token list; the total number of tokens in that list depends on the complexity of the macro. As we’ve seen, the <parameter text> section has a specific purpose: it acts as a “token template” or “blueprint” that TeX uses to pick out the tokens which form the arguments (values) to use with the actual macro, i.e., the tokens to feed into the <replacement text>.
To firm up these ideas, let’s take an example macro, but keep it short so that subsequent diagrams do not become too cluttered:
\def\foo A#1\fake{123 #1}
For our macro \foo:
- <parameter text> = A#1\fake
- <replacement text> = 123 #1
Although this example is a simple macro, it contains all the features we need to explore.
As noted, TeX will convert <parameter text><replacement text> into one long token list, which you can see in the diagram below. In our example, the tokens formed from A#1\fake{123 #1} have been converted to a consecutive sequence of tokens stored in a token list (as a linked list of nodes).
Graphic showing a real macro token list
The following diagram, showing how the macro \def\foo A#1\fake{123 #1} is stored, uses real data from inside a TeX engine. It was created using a customized version of Knuth’s TeX that was modified with additional code to intercept macro calls, examine TeX’s internal data and export it in a format that can be processed by Graphviz, an open-source graphics program.
You can download the following graphic as a PDF file (675 KB) or SVG file (1.8 MB).
Understanding the nodes
Within the diagram above you’ll see that, alongside its token value, each node contains two data items called next node and current node. These are just integer values that represent memory locations inside TeX: current node is the location where the node itself is stored, and next node is the location of the following node in the list. Their actual values are not important; they simply record the locations (memory addresses) which allow nodes to be linked together in a list.
Back to the example
In the node diagram, the token list formed from A#1\fake{123 #1} contains several of the “special tokens” mentioned at the start of this article. In addition, the node list representing our macro starts with a “special first node”: we’ll explore what these are and what they do.
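The very first item in a macro token list (and in some other types of token list) does not store a token value but a data item called the macro’s reference count, which TeX uses to keep track of how many references to the token list currently exist, so that the list is not freed while it is still in use.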
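The general idea behind a reference count can be sketched in a few lines of Python. This is a simplified model with hypothetical names (`TokenList`, `add_ref`, `release`); TeX’s own bookkeeping differs in detail. The scenario shown, a macro that is redefined while it is still being expanded, is one case where the count matters: the old definition must survive until the expansion that is using it has finished.

```python
class TokenList:
    """A stored token list guarded by a simple reference count (toy model)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.ref_count = 1  # one reference: the macro definition itself
        self.freed = False

    def add_ref(self):
        """Record one more user of this token list."""
        self.ref_count += 1

    def release(self):
        """Drop one reference; free the list when nobody uses it any more."""
        self.ref_count -= 1
        if self.ref_count == 0:
            self.tokens.clear()  # stand-in for returning nodes to free memory
            self.freed = True


definition = TokenList([2881, 3121])
definition.add_ref()     # expansion of the macro begins
definition.release()     # the macro is redefined: the definition lets go
print(definition.freed)  # False: the running expansion still holds a reference
definition.release()     # the expansion finishes
print(definition.freed)  # True
```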
The first token of the <parameter text> is stored in the node that immediately follows the reference count: you can see it is a token representing the letter A with category code 11. From the discussions in Parts 2 and 3 we know that a character token is calculated using
\[\text{token value}=256\times \text{category code} + \text{character code}\]
which, for a letter A with category code 11, is
\[\text{token value}=256\times 11 + 65\]
giving the value 2881, as shown in the node.
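The formula is easy to check with a few lines of Python. The function name `char_token` is hypothetical, and the range checks assume Knuth’s 8-bit TeX (character codes 0 to 255, category codes 0 to 15).

```python
def char_token(catcode: int, char: str) -> int:
    """Token value of a character token: 256 * category code + character code."""
    code = ord(char)
    if not 0 <= code <= 255:
        raise ValueError("character codes in 8-bit TeX lie in the range 0-255")
    if not 0 <= catcode <= 15:
        raise ValueError("category codes lie in the range 0-15")
    return 256 * catcode + code


print(char_token(11, "A"))  # letter A -> 2881
print(char_token(12, "1"))  # digit 1 -> 3121
print(char_token(10, " "))  # space -> 2592
```

Because the character code is always below 256, each combination of category code and character code yields a different integer, so a single number is enough to identify a character token.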
The “command” \fake used in \foo
Within our macro definition \def\foo A#1\fake{123 #1}, one of the delimiters is an undefined command, \fake, which is stored within the token list as part of the <parameter text> section. As you can see, within the overall macro token list \fake is a token whose value is 19491: an integer value calculated by TeX using the formula discussed in Part 3. When \foo is called, TeX expects the tokens that follow it in the input to contain the \fake token, which marks the end of the argument #1. TeX will not try to execute the \fake command because its role is merely to provide a form of “punctuation” within the <parameter text> “token template”.
Special tokens in the <parameter text> token list
The “end match” token
When a macro is called, TeX’s first task is to scan the tokens that follow the macro name in the input and compare them with the tokens contained within the template <parameter text> stored in memory (created at the time the macro was defined). Because the macro’s full definition, constructed from <parameter text><replacement text>, is stored as one long consecutive list of tokens, TeX needs to know where, in that token list, the <parameter text> stops and the <replacement text> starts. To achieve this, when TeX defines the macro (builds the token list) it inserts a special terminator token, called an end match token, as the very last token in the set of tokens generated from the <parameter text>. The end match token cannot be generated by user commands; only TeX itself can create it, so TeX is certain to detect the end of the <parameter text>.
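Because the end match token acts as a unique separator, recovering the two halves of a stored macro is a simple search. The Python sketch below assumes the token encoding of Knuth’s tex.web, where end match is 14 × 256 = 3584, a match token is 13 × 256 plus the character code of the parameter character (so `#` gives 3363) and an output parameter #n is 5 × 256 + n. The list `FOO` is our own reconstruction of the tokens for `\foo` (without the reference-count node), reusing the value 19491 for `\fake` from the diagram, and `split_macro` is a hypothetical helper.

```python
END_MATCH_TOKEN = 14 * 256  # 3584 (end_match = 14 in Knuth's tex.web)

# Our reconstruction of \def\foo A#1\fake{123 #1}, reference count omitted.
FOO = [
    2881,              # A (category code 11)
    13 * 256 + 35,     # match token for #1 (13 * 256 + code of "#")
    19491,             # \fake, value taken from the article's diagram
    END_MATCH_TOKEN,   # end of the parameter text
    3121, 3122, 3123,  # 1, 2, 3 (category code 12)
    2592,              # space (category code 10)
    5 * 256 + 1,       # output parameter token for #1
]


def split_macro(tokens):
    """Split a macro token list into (parameter text, replacement text)."""
    i = tokens.index(END_MATCH_TOKEN)  # raises ValueError if there is no end match
    return tokens[:i], tokens[i + 1:]


params, body = split_macro(FOO)
print(params)  # [2881, 3363, 19491]
print(body)    # [3121, 3122, 3123, 2592, 1281]
```

Taking the first occurrence is safe only because no character token can ever have the value 3584, which is exactly the guarantee the end match token provides.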
In the node diagram, we can see that the first token after end match is a token representing the digit 1 with category code 12. This is to be expected because the <replacement text> for our macro \foo is 123 #1, i.e., it starts with the token representing the digit 1 (with category code 12).
From the discussion in Parts 2 and 3 we know that a character token is calculated using
\[\text{token value}=256\times \text{category code} + \text{character code}\]
which, for a digit 1 with category code 12, is
\[\text{token value}=256\times 12 + 49\]
giving the token value 3121, as shown in the node.
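Running the formula in reverse recovers the category code and character code from a token value, and the same arithmetic also identifies control-sequence tokens and the special tokens described in this article. The sketch assumes the conventions of Knuth’s tex.web: control-sequence tokens are stored as `cs_token_flag` (4095) plus an internal location, and the values 5, 13 and 14 in the “category code” position are reserved for output parameter, match and end match tokens, because characters with those category codes (end of line, active, comment) never become ordinary character tokens. The name `decode` is hypothetical.

```python
CS_TOKEN_FLAG = 0o7777  # 4095, cs_token_flag in Knuth's tex.web


def decode(token: int):
    """Describe a token value from a (Knuth TeX) macro token list."""
    if token >= CS_TOKEN_FLAG:
        # control sequences are stored as cs_token_flag + an internal location
        return ("control sequence", token - CS_TOKEN_FLAG)
    code, char_code = divmod(token, 256)
    if code == 5:
        return ("output parameter", char_code)  # char_code is n in #n
    if code == 13:
        return ("match", chr(char_code))  # the parameter character, e.g. '#'
    if code == 14:
        return ("end match",)
    return ("character", code, chr(char_code))  # category code, character


for value in (2881, 3363, 19491, 3584, 3121, 2592, 1281):
    print(value, decode(value))
```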
“match parameter” tokens
When TeX stores the macro definition, it converts each parameter (#1, #2 … #9) within the <parameter text> into a special token called a match parameter token. These tokens tell TeX that it needs to start looking for tokens, within the user’s macro call, that form the arguments of the macro.
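A simplified Python sketch of this matching step is shown below. For readability it works on symbolic tokens (strings) rather than integer token values; `match_arguments`, `MATCH` and `END_MATCH` are hypothetical names. It handles only delimited parameters, such as the #1 in A#1\fake, and ignores details of TeX’s real algorithm such as undelimited parameters and the removal of an outer pair of braces around an argument.

```python
MATCH = "<match>"          # stands for a match parameter token
END_MATCH = "<end match>"  # stands for the end match token


def match_arguments(template, tokens):
    """Match input tokens against a parameter-text template.

    Returns (arguments, remaining input). Only delimited parameters are
    handled; undelimited parameters and TeX's removal of an outer pair of
    braces around an argument are not modelled.
    """
    args = []
    pos = 0  # index into the input tokens
    i = 0    # index into the template
    while template[i] != END_MATCH:
        if template[i] != MATCH:
            # a literal token in the template must appear in the input exactly
            if pos >= len(tokens) or tokens[pos] != template[i]:
                raise ValueError("input does not match the parameter text")
            pos += 1
            i += 1
            continue
        # the delimiter is every literal token up to the next match/end match
        j = i + 1
        while template[j] not in (MATCH, END_MATCH):
            j += 1
        delimiter = template[i + 1:j]
        if not delimiter:
            raise NotImplementedError("undelimited parameters are not modelled")
        start, depth = pos, 0
        # a delimiter only counts when it is not inside a {...} group
        while depth > 0 or tokens[pos:pos + len(delimiter)] != delimiter:
            if pos >= len(tokens):
                raise ValueError("delimiter not found")
            if tokens[pos] == "{":
                depth += 1
            elif tokens[pos] == "}":
                depth -= 1
            pos += 1
        args.append(tokens[start:pos])
        pos += len(delimiter)  # skip over the delimiter itself
        i = j
    return args, tokens[pos:]


template = ["A", MATCH, r"\fake", END_MATCH]  # \foo's A#1\fake
call = ["A", "x", "{", r"\fake", "}", "y", r"\fake", "z"]
args, rest = match_arguments(template, call)
print(args)  # [['x', '{', '\\fake', '}', 'y']]
print(rest)  # ['z']
```

The \fake hidden inside braces is skipped, so argument #1 runs up to the first \fake at brace depth zero; the literal A must also be present, otherwise the call does not match the definition.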
Special tokens in the <replacement text> token list
“output parameter” tokens
When TeX has processed everything and is ready to actually run (expand) the macro, the output parameter tokens tell TeX the locations within the <replacement text> where it needs to feed in the tokens representing the arguments provided by the user when the macro was called. In effect, they say “At this location, insert the tokens representing the user’s argument n, where n=1...9”.
Within the <replacement text> section of the stored macro-definition token list there will be an output parameter token corresponding to each #1, #2 ... #9 present in the original <replacement text>.
If we look at our definition of \foo (\def\foo A#1\fake{123 #1}), we see there is only 1 macro parameter (#1) in the <parameter text> (A#1\fake), and correspondingly only 1 macro parameter (#1) appears in the <replacement text> (123 #1): this results in just 1 output parameter token in the token list representing the <replacement text>.
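To illustrate the role of output parameter tokens, the Python sketch below copies a replacement text and splices in the argument tokens wherever an output parameter token appears. It assumes Knuth’s tex.web encoding, in which #n in the replacement text is stored as 5 × 256 + n (so #1 becomes 1281), and the function name `substitute` is hypothetical. Real TeX does not build a new copy like this: it reads the stored replacement text in place and, on meeting an output parameter token, switches its input to the corresponding argument.

```python
OUT_PARAM_TOKEN = 5 * 256  # 1280 (out_param = 5 in Knuth's tex.web)


def substitute(replacement, args):
    """Copy a replacement text, inserting argument n at each output parameter token."""
    result = []
    for tok in replacement:
        if OUT_PARAM_TOKEN < tok <= OUT_PARAM_TOKEN + 9:
            n = tok - OUT_PARAM_TOKEN  # parameter number, 1 to 9
            result.extend(args[n - 1])
        else:
            result.append(tok)
    return result


body = [3121, 3122, 3123, 2592, 1281]  # 123 #1 as stored for \foo
args = [[2936, 2937]]                  # argument #1 = "xy" (category code 11)
print(substitute(body, args))  # [3121, 3122, 3123, 2592, 2936, 2937]
```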
Note the following in the node list representing the <replacement text> of \foo:
- the token immediately before the output parameter token represents a space character (category code 10, character code 32) because there is a space between the 123 and the macro parameter (#1) in the original definition of \foo;
- the output parameter is the last token in the list: its next node has a special value of “null” (meaning “empty”), which is used to terminate the list. There are no more nodes after output parameter because it is the final token, indicating the end of the <replacement text> and thus the end of the macro definition.
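Pulling the pieces together, the Python sketch below builds the complete token list for \foo (without the reference-count node) from a simple model of the input tokens. It assumes the constants of Knuth’s tex.web (match = 13, end match = 14 and output parameter = 5, each multiplied by 256), models character tokens as `(category code, character)` pairs and control sequences as `("cs", value)` pairs, and reuses the value 19491 for \fake from the diagram. The name `build_macro_list` is hypothetical, and details such as `##` in the replacement text or `#{` at the end of the parameter text are not handled.

```python
MATCH_TOKEN = 13 * 256      # 3328 (match = 13)
END_MATCH_TOKEN = 14 * 256  # 3584 (end_match = 14)
OUT_PARAM_TOKEN = 5 * 256   # 1280 (out_param = 5)
PARAM_CATCODE = 6           # category code of the # character


def to_value(tok):
    """Token value of a modelled token."""
    if tok[0] == "cs":
        return tok[1]  # control sequence: value supplied directly
    catcode, char = tok
    return 256 * catcode + ord(char)


def build_macro_list(param_text, replacement_text):
    """Build a simplified macro token list (reference-count node omitted)."""
    out = []
    count = 0  # parameters seen so far in the parameter text
    i = 0
    while i < len(param_text):
        tok = param_text[i]
        if tok[0] == PARAM_CATCODE:
            count += 1
            if param_text[i + 1] != (12, str(count)):
                raise ValueError("parameters must be numbered consecutively")
            out.append(MATCH_TOKEN + ord(tok[1]))  # remembers the # character
            i += 2
        else:
            out.append(to_value(tok))
            i += 1
    out.append(END_MATCH_TOKEN)  # marks the end of the parameter text
    i = 0
    while i < len(replacement_text):
        tok = replacement_text[i]
        if tok[0] == PARAM_CATCODE:
            n = int(replacement_text[i + 1][1])
            if not 1 <= n <= count:
                raise ValueError("illegal parameter number")
            out.append(OUT_PARAM_TOKEN + n)  # where argument n goes
            i += 2
        else:
            out.append(to_value(tok))
            i += 1
    return out


param_text = [(11, "A"), (6, "#"), (12, "1"), ("cs", 19491)]
replacement_text = [(12, "1"), (12, "2"), (12, "3"), (10, " "), (6, "#"), (12, "1")]
print(build_macro_list(param_text, replacement_text))
# [2881, 3363, 19491, 3584, 3121, 3122, 3123, 2592, 1281]
```

The output reproduces, in order, the features discussed above: the letter A, a match token, the \fake delimiter, the end match separator, the characters 1, 2, 3 and a space, and a final output parameter token.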
Part 6
In Part 6 we use some detailed graphics to explain and explore the exact meaning of macro expansion and the consequences of TeX’s tokenization of macro arguments prior to feeding them into a macro’s <replacement text>.