An introduction to \endlinechar: How TeX reads lines from text files
As humans, we prefer to view and edit text files on a line-by-line basis. Once we think a line of text is long enough we hit “return” (in the text editor) to signal the end of that line. Behind the scenes, your text editor will interpret that as an instruction to add a newline character at the point where you decide to break a line.
If only it were that simple. Different operating systems have different notions of what constitutes a newline character and, to make matters worse, Windows treats newline characters differently depending on whether a file is opened in so-called binary mode or text mode. The result is that, depending on the host operating system, lines in a text file can be terminated by varying combinations of two characters: carriage return (ASCII/Unicode character 13) and line feed (ASCII/Unicode character 10), denoted by \r and \n respectively.
Clearly, to be system-independent, TeX needs a way to deal with the vagaries introduced by the different characters used to terminate lines in the text files it reads and processes.
TeX’s input buffer
You may, or may not, be surprised to learn that TeX engines (including LuaTeX and XeTeX) read input files a line at a time: they don’t read the entire text file into memory. Even though most text files processed by TeX engines are minuscule compared to the available memory on modern devices, each line in the file is individually read and stored in a small internal buffer. But, of course, TeX’s process of reading and storing a line has some additional twists.
“I’ll do it My Way”: TeX’s \endlinechar command
When TeX reads another line of text from an input file it performs two “housekeeping tasks”:
- it removes any terminating newline characters (\r or \n) found at the end of the line—i.e., it strips out all line endings added when the text file was originally saved to disk;
- it also removes all trailing space characters found at the end of the line.
Both tasks happen before TeX starts to scan the characters contained in the line itself. At this point TeX has stripped off all platform-dependent line endings (and any trailing spaces), so how will it detect where the line ends? TeX has one more “trick” up its sleeve: the \endlinechar command.
To avoid the problem of platform-dependent newline characters, TeX introduces \endlinechar, a user-definable parameter whose value is the character TeX appends, as its own end-of-line character, to the very end of each line of text it has just read from a file. Note again that this happens before TeX starts scanning the characters: it is the final step in TeX’s “housekeeping” before it is ready to scan the characters contained in the line.
TeX will use the value stored in \endlinechar to add its own end-of-line terminator if, and only if, \endlinechar is appropriately defined: in Knuth’s TeX that means a value greater than -1 and less than 256 (that is, in the range 0 to 255). Typically, \endlinechar is assigned the value 13: the carriage return character, usually denoted by \r in programming literature.
But if you write \endlinechar=-1 somewhere within your input then, starting with the next line it reads from the file, TeX will not add any terminator to the end of a line. Consequently, line ends no longer produce spaces and blank lines no longer end paragraphs: your input is treated as one long continuous string of text until you reset \endlinechar to an appropriate value, typically 13 (\r):
\endlinechar=13
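The effect can be seen in a minimal sketch, assuming plain TeX (the sample text and the use of \bye are illustrative choices, not taken from a particular document). The \relax after 13 is a precaution: the line that resets \endlinechar is itself read without a terminator, so without it TeX would keep looking for further digits on the following line.

```tex
% Plain TeX sketch; compile with, e.g., tex or pdftex.
\endlinechar=-1 % the terminator of THIS line was added before the assignment ran
Hello,
world!
\endlinechar=13\relax
% Typeset so far: "Hello,world!" (no space between the two input lines)

Hello,
world!
% Typeset here: "Hello, world!" (the end of the first line became a space)
\bye
```

Note that a change to \endlinechar only affects lines read after the assignment has been executed, because the current line is already sitting in TeX’s buffer with its terminator in place.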
One of TeX’s 16 category codes (value 5) is reserved to identify the “end of line” character. This is usually the character that \endlinechar inserts, provided \endlinechar is set to an appropriate value.
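To check these settings in your own setup, a small sketch such as the following (assuming plain TeX with its default settings; the bracketed labels are just illustrative text) writes the current values to the terminal and the log file.

```tex
% Plain TeX sketch: report the end-of-line settings currently in force.
\message{[endlinechar=\the\endlinechar]}
% \catcode accepts a character number, so \endlinechar can be passed directly:
\message{[category code of that character=\the\catcode\endlinechar]}
\bye
```

With the defaults described above, the first message should report 13 and the second 5.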
Summary of end-of-line processing
Although these details are quite low-level they will be of interest to anyone who wants to explore writing macros which deal with reading lines of text.
1. When TeX reads a line from your file it will first strip out all end-of-line characters (\r and \n) added by your text editor when the file was saved. In addition:
   - TeX also strips out any trailing space characters from the end of the line;
   - TeX does not remove trailing tab characters (ASCII character code 9).
2. After step (1) TeX adds (inserts) an additional character whose value is stored in \endlinechar (provided that it is suitably defined: greater than -1 and less than 256).
   - \endlinechar is typically set to the value 13 (\r), which means that the character added in step (2) is usually character 13 (\r). Of course, you can set \endlinechar to another value to achieve special effects via macro programming.
3. When its input scanning routines detect the character \r (character code 13) at the end of its internal buffer, TeX will, as usual, check its category code in order to decide what to do with it.
   - Character 13 usually has a category code value of 5 (“end of line”) unless, of course, its category code has been changed: some macros make the end-of-line character active in order to do sophisticated processing.
   - Depending on TeX’s internal state (in effect, what it is doing) TeX can change the end-of-line character (usually \r, category code 5) into a space character: this is how end-of-line characters become spaces.
   - Note too that TeX uses characters with category code 5 to detect when it has read an empty line and needs to generate a \par token.
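Two of the points in the summary, a non-default \endlinechar and an active end-of-line character, can be tried out with the following sketch. It assumes plain TeX; the choice of “*” as the replacement character and the sample text are purely illustrative, and \obeylines is the plain TeX macro that makes character 13 active and lets it behave like \par.

```tex
% Plain TeX sketch of two effects mentioned in the summary.
\parindent=0pt

% (a) A non-default \endlinechar: character 42 ("*", category code 12 in plain TeX).
\endlinechar=`\* % takes effect from the next line that TeX reads
one
two
\endlinechar=13 % restore the default; this comment also hides the "*" added to this line
% Typeset so far: "one*two*" (each line end became a "*" instead of a space)

% (b) An active end-of-line character: inside the group every line end acts as \par.
{\obeylines
Roses are red,
Violets are blue.
}
% Outside the group character 13 has category code 5 again,
% so the next two input lines form a single paragraph.
These two lines
form one paragraph.
\bye
```

In part (a) the line that restores the default was read while \endlinechar was still 42, so it too has a “*” appended; placing a comment at the end of that line is one way to discard it.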
Aside: One of LuaTeX’s source code files, luatex.c, which has the code to perform this stripping of spaces, contains the following note:
“David Fuchs mentions that this [space] stripping was done to ensure portability of TeX documents given the padding with spaces on fixed-record "lines" on some systems of the time, e.g., IBM VM/CMS and OS/360.”
The following graphic gives a visual summary of steps (1) and (2): stripping newline characters and trailing space characters, then inserting \endlinechar, ready for the task of scanning the input.