My PhD has too many directories in it.
I have—currently—three parts, and each of those has several chapters. All the chapters have been written independently, so I have a structure which more or less looks like this:
titlepage.tex
thesis.tex
-- 1.part1
   |
   +- 1.chapter1
   |  |
   |  +-- standalone.tex
   |  +-- title.tex
   |  +-- chapter1.tex
   +- 2.chapter2
etc, only with three parts and sensible names. Each of the chapters builds to
a separate pdf with standalone.tex, but now the Final Thing is upon me and I
needed to write thesis.tex.
First, I thought of using a jinja2 template to generate it after scanning the
whole directory. This was the right solution, but it was boring, so I abandoned
it.
Then I thought of just \input-ing the files in thesis.tex. And indeed I
started doing so, but after copy-pasting three or four times my sense of
neatness revolted. “What would Knuth think?” I asked myself. “Why did he give
you a Turing-complete language? It wasn’t so you could do all the work with
emacs macros.”
Thus I decided to do the work in TeX. When I have to write TeX I prefer raw
TeX. There’s nothing nice about programming in a macro language, but it is
curious, in the sense that museums are full of curiosities. The only thing
worse than museums is museums with guided tours, which is what LaTeX is. On the
other hand, \newcommand gets nice syntax highlighting, so we should use it for
the API. So I wrote:
\newcommand{\includeChapter}[1]{%
  \inputTitle[#1]
  \chapter{\title}
  \inputChapter[#1]
}
So far, so clear. \inputTitle has to exist because the title is a chapter
title in thesis.tex, but a section title in standalone.tex. So each chapter’s
title.tex consists of a single definition:
\ifdef{\title}{\let\title\relax}{}
\newcommand{\title}{Spiritual Tradition and Theological Tradition}
Alright. I should probably just come clean and admit that I can remember some
things in LaTeX and others in TeX, and I just mix them freely. Anyhow. This
gives us a macro \title, but how do we load it programmatically?
\includeChapter is called with a relative path, like
0.frontmatter/0.acknowledgements (yes, this is over-engineered). So loading
the title is easy: we can just append:
\def\inputTitle[#1]{\input{#1/title}}
Now all we need is to load the content. But (and I have home-rolled CI and all
kinds of other fragile things hanging on this convention) the name of each
chapter is meaningful. Thus the chapter on the controversy over Nature and
Grace is called nature-grace.tex and lives at
1.part1/1.nature-grace/nature-grace.tex. So we have to do some string
munging. What we want is something like Python’s str.split(). What we get
is another macro:
\def\splitstr #1.#2{#2}
\def\inputChapter[#1/#2]{\input{#1/#2/\expandafter\splitstr #2}}
This works because there’s actually nothing going on with [] in a TeX macro.
Or rather, TeX macros are defined with \def\name$ARGS$BODY. (And they look
as awful as that does.) The {} of LaTeX are just group markers. The [] are
just arbitrary (1-long groups of) characters between which the macro expander
looks for arguments. And those arguments are separated by whatever you put in.
So this is perfectly valid:
\def\munge Hi#1!#2Ho{Generates: #2 (#1)}
\munge Hiread this!bet you can'tHo
(If you don’t believe me, try it.) Now you know why LaTeX errors are so horrible and are never going to make sense.
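For anyone who would rather not fire up TeX, here is a rough Python imitation of what the expander does with \munge. It is only a sketch: the regex and the function name munge are mine, and it ignores the fact that real TeX also respects brace groups while scanning for delimiters.

```python
import re

# Imitates: \def\munge Hi#1!#2Ho{Generates: #2 (#1)}
# The non-greedy groups mimic TeX taking the shortest text up to each delimiter.
MUNGE = re.compile(r"Hi(.*?)!(.*?)Ho", re.DOTALL)


def munge(text: str) -> str:
    match = MUNGE.match(text)
    if match is None:
        # Stand-in for TeX's complaint that the use doesn't match the definition.
        raise ValueError("use of munge doesn't match its definition")
    first, second = match.groups()
    return f"Generates: {second} ({first})"


print(munge("Hiread this!bet you can'tHo"))
# Generates: bet you can't (read this)
```

The literal text Hi, ! and Ho plays exactly the role that [ and ] play in \inputTitle[#1]: nothing but markers for where one argument stops and the next begins.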
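Anyhow. So \def\splitstr #1.#2{#2} is equivalent to
lambda x: x.split(".", maxsplit=1)[1] in Python, and we can extract our title.
(Strictly, #2 is undelimited, so it grabs only the next token; but the rest of
the name is left sitting right behind it, so the net effect is the same.)
The same trick splits the path on /, and now we can input the chapter.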
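Spelled out in Python, the two macros together amount to something like the following. This is a sketch of the net effect only; the function names are made up for the illustration and nothing here is part of the thesis build.

```python
def split_after_dot(name: str) -> str:
    # Counterpart of \splitstr: drop everything up to the first "."
    return name.split(".", maxsplit=1)[1]


def chapter_input_path(relative_path: str) -> str:
    # Counterpart of \inputChapter[#1/#2]: split on the first "/",
    # then repeat the directory name without its numeric prefix.
    part, chapter_dir = relative_path.split("/", maxsplit=1)
    return f"{part}/{chapter_dir}/{split_after_dot(chapter_dir)}"


print(chapter_input_path("1.part1/1.nature-grace"))
# 1.part1/1.nature-grace/nature-grace
```

As with \input, the .tex extension is left off.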
At this point, mighty pleased with myself, I stuck \includeChapter{path} calls
in the right place, fixed the inevitable typos with words like “acknowledgements”
(which I simply can’t spell), and watched it compile.
Then a little later I wanted to see how close I was to the wordcount, so I ran
texcount thesis.tex. Hang on: nothing. Because of course, this unparseable
macro language is unparseable. The only way to know what it will do is to run
it.[1] So texcount has implemented just enough TeX to follow \input calls,
but it hasn’t (obviously) got any idea how to parse my macro.
Oh well, we can just count the chapters individually and sum the counts as we input them, can’t we. So we get:
\newcounter{wordCount}
\newcommand{\includeChapter}[1]{%
  \inputTitle[#1]
  \chapter{\title}
  \inputChapter[#1]
  \addWordCount[#1]
}
Hang on. If we do this, the counter will only be correct at the very end of the
file, when the final \includeChapter is expanded. But a wordcount on the back
page isn’t much use. So we need to call the macro before we output anything,
and defer actually including anything till later. In other words, we need it to
define another macro (in a non macro language we’d have it return something,
but hey, global state is what a document is). So we want a two-argument macro:
\newcommand{\includeChapter}[2]{%
  \expandafter\def\csname input#2\endcsname{%
    \inputTitle[#1]
    \chapter{\title}
    \inputChapter[#1]
  }
  \addWordCount[#1]
}
Yes! If you want key-value pairs in TeX, you use macros. At any rate I think
it’s clearer than some “hashmap” package which is just going to do this under
the hood. \csname and \endcsname make macro names,[2] and then we pass the
name in (I originally used a counter to generate \chapterOne, \chapterTwo,
etc, but that just gets silly). But how do we write \addWordCount?
My first attempt is what everyone tries:
\def\addWordCount[#1]{\addtocounter{\input[#1/words.txt]}}
My second attempt was copy-pasted from the TeX.SE question everyone then lands on:
\usepackage{readarray}
\def\addWordCount[#1]{%
  \readdef{#1/words.txt}\theReadCount
  \addtocounter{wordCount}{\theReadCount}
}
Right. Now how do we get words.txt? We could use an external shell call.
But actually I was simultaneously developing a proper build system, so we might
as well use that.
To start with, every chapter had its own makefile, generated by cookiecutter. But recursive make is bad, and anyhow we have top-level dependencies on lower files, so it makes no sense. Really we should have one makefile. But building it by hand is the same, mutatis mutandis, as before, except that RMS is probably not as polite as Knuth.
One option would be build systems. I considered Meson for about ten seconds, and CMake for fifteen. (Autotools isn’t a build system, it’s a preventable disaster.) So we have a Python script:
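#!/usr/bin/python
"""Generate the top-level Makefile from every standalone.tex in the tree."""
from pathlib import Path

# Recipe lines start with an explicit \t because make insists on a leading tab.
header = """\
.PHONY: all standalones thesis thesis-clean
LATEXMK ?= latexmk --pdf --synctex=1
"""
footer = """
thesis-clean:
\t${LATEXMK} -C
thesis: standalones titlepage.tex
\t${LATEXMK} thesis.tex
"""
# {deps} and {dir} are filled in per chapter; ${{LATEXMK}} survives as ${LATEXMK}.
pdf_template = """{deps}
\tcd {dir} && ${{LATEXMK}} standalone.tex
"""
# Top-level directories whose standalone.tex files are not part of the thesis.
skip = {"papers", "template"}
standalones = [
    x for x in Path(".").glob("**/standalone.tex") if x.parent.parent.name not in skip
]
outf = Path("Makefile")
# Map each standalone.pdf target to its dependency list and recipe.
pdf_entries = {
    str(s.with_suffix(".pdf")): pdf_template.format(
        deps=" ".join(
            str(x)
            for x in (
                s,
                # "1.nature-grace/standalone.tex" -> "1.nature-grace/nature-grace.tex"
                s.with_stem(s.parent.name.split(".")[1]),
                s.with_name("title.tex"),
            )
        ),
        dir=s.parent,
    )
    for s in standalones
}
parts = [
    header,
    "all: standalones thesis",
    "\n",
    "standalones:" + " ".join(pdf_entries.keys()),
    "\n",
    "\n".join([":".join([k, v]) for k, v in pdf_entries.items()]),
    footer,
]
outf.write_text("\n".join(parts))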
There. We get a makefile, and however naughty it is to use a modern scripting language to generate a Makefile, and however horribly readable python is, it’s a makefile in the end. And best of all it took less time to write than it would take me to remember exactly how the whole thing with $ and { and } and @ works for arrays in bash.
Now all we need to do is to add s.with_name("words.txt") to the dependencies
for pdf_entries and add a rule to generate words.txt by calling texcount.
I actually have a bona fide shell script for this, although my eyes hurt slightly
when I look at it:
#!/usr/bin/env sh
cd "$1" && texcount -brief -merge -sum=1,0,1,0,1,1,1 2>/dev/null "$2" | cut -d ':' -f1 | awk '{print $1}' > words.txt
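On the generator side, the extra rule could be produced by something like the sketch below. The script name ./wordcount.sh and the helper words_rule are assumptions made for the illustration (the script above is never named here); the argument order follows its $1 (directory) and $2 (file).

```python
from pathlib import PurePosixPath

# Hypothetical name for the shell script above; adjust to wherever it lives.
WORDCOUNT_SCRIPT = "./wordcount.sh"


def words_rule(standalone: PurePosixPath) -> str:
    """Return a Makefile rule that builds words.txt next to standalone.tex."""
    chapter_dir = standalone.parent
    # Same naming convention as the generator: "1.nature-grace" -> "nature-grace.tex"
    chapter_tex = chapter_dir.name.split(".", maxsplit=1)[1] + ".tex"
    target = chapter_dir / "words.txt"
    source = chapter_dir / chapter_tex
    # The recipe line starts with a tab, as make requires.
    return f"{target}: {source}\n\t{WORDCOUNT_SCRIPT} {chapter_dir} {chapter_tex}\n"


print(words_rule(PurePosixPath("1.part1/1.nature-grace/standalone.tex")))
```

Making words.txt depend on the chapter source means make only reruns texcount when the chapter has actually changed.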
I always forget about cut. These days I’d probably use sed for the whole
thing. (Those days I mostly used Google.) Anyhow, it gets us the word count,
which is what we want, and then we concatenate it. Slightly to my surprise
LaTeX has no built-in ability to print numbers with separators, but
\usepackage{comma} puts paid to that.
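As a sanity check outside TeX, the same total can be had in a few lines of Python. This is a sketch, assuming each words.txt holds a single integer as the script above produces; the function names and the counts in the demo are made up.

```python
import tempfile
from pathlib import Path


def total_words(root: Path) -> int:
    """Sum the integer stored in every words.txt found under root."""
    return sum(int(p.read_text().strip()) for p in root.glob("**/words.txt"))


def with_separators(count: int) -> str:
    """Format a count with thousands separators, e.g. 94477 -> '94,477'."""
    return f"{count:,}"


if __name__ == "__main__":
    # Throwaway tree with made-up counts, purely for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        for chapter, count in [("1.part1/1.one", 1200), ("1.part1/2.two", 345)]:
            chapter_dir = Path(tmp) / chapter
            chapter_dir.mkdir(parents=True)
            (chapter_dir / "words.txt").write_text(f"{count}\n")
        print(with_separators(total_words(Path(tmp))))  # 1,545
```

If this and the number on the title page disagree, one of the words.txt files is probably stale.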
Now how many words do I have? 94,477. Drat. The limit is 100,000 and I have two chapters still to go. I’d better get cutting.
[1] Technically this is true[3] of any Turing-complete language, but you can actually do a pretty good job of parsing most sane languages without executing them, assuming the programmer hasn’t done anything stupid. texcount is a Perl script, so yes, it’s just one big regex and we’re parsing with a regex.
[2] That will do for here.
[3] I think. I’m only a poor PhD student in theology.