Version 0.2 released

Over the last few months, An Infinite Descent into Pure Mathematics has undergone a large overhaul, which I think will significantly improve its usefulness for readers. The result is version 0.2, which is a ‘preview’ of what will—hopefully this summer—be the first print edition.

What follows is a description of the major changes since version 0.1.

Both versions can be downloaded from the Archive page.

New structure

The most obvious difference between v0.1 and v0.2 is the order in which the sections appear and the overhaul of the material on logic and proofs.

Introductory material. The background material in the old §1.1 (Getting started) has been moved to a standalone chapter (new Ch 0) to separate it from the emphasis of new Ch 1 on logic. Its content is essentially unchanged, but I intend to expand it so that more interesting results can be proved in the early chapters of the book.

Logic and proofs. The material on proof techniques (old §1.2) and symbolic logic (old §2.1) has been almost entirely rewritten, although many of the examples remain the same as before. This material fills up new Ch 1 (Logical structure).

A lot of thought has gone into this change, guided by the question: why do we want students to learn about logic in the first place? The answer (or what the answer should be) is that a solid understanding of the rules of logic assists with the creation, structure and wording of mathematical proofs. The updated material emphasises the connection between rules of logic and strategies for proof. How the logical structure of a proposition informs the structure and wording of a proof will be the subject of §A.2 (Writing a proof).

  • New §1.1 (Propositional logic) and §1.2 (Variables and quantifiers) introduce symbolic logic side-by-side with direct proof techniques. The goal of these sections is to learn how to decompose a proposition in terms of its logical structure, and to use that logical structure to inform a proof strategy. With this in mind, logical operators and quantifiers are now introduced in terms of the rules of inference that define them (see the example after this list), which makes them more meaningful for use in proofs than reducing them to Boolean algebra via truth tables.
  • New §1.3 (Logical equivalence) introduces the notion of logical equivalence, now defined in terms of logical inference, as it should be. Truth tables are introduced now as a tool for proving logical equivalence, rather than as a means of defining logical equivalence. The emphasis of this section is on using logical equivalence to inform proof strategies.
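To give a flavour of what it means for a logical operator to be defined by its rules of inference, here is one representative example (illustrative only; the book's exact presentation may differ). The conjunction operator $\wedge$ is characterised by its introduction rule $$\cfrac{P \qquad Q}{P \wedge Q}$$ which translates directly into a proof strategy: to prove a proposition of the form $P \wedge Q$, prove $P$ and then prove $Q$.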

Sets and functions. The material on functions was divided between old §2.3 and old §4.1; it has been rearranged for greater continuity and a more natural flow of content.

  • New §2.1 (Sets and set operations) is essentially the same as old §2.2. Some of the material has been reworded or rearranged, new material on intervals and open sets in $\mathbb{R}$ has been added, and reference to absolute complements of sets has been removed.
  • New §2.2 (Functions) is essentially the same as old §2.3, except the material on existence and uniqueness has been moved to new §1.2.
  • New §2.3 (Injections and surjections) is essentially the same as the portion of old §4.1 up to, but excluding, the content on finite sets and counting. I felt guilty proving that every surjection has a right inverse without mentioning the axiom of choice (when, in fact, this assertion is equivalent to the axiom of choice), so this matter is now treated in this section.

Finite and infinite sets. The focus of old Ch 4 was on using injective and surjective functions to compare the sizes of sets, finite or infinite. With injections and surjections defined in new Ch 2, it seemed pertinent to give finite sets and infinite sets their own treatments in their own chapters. New Ch 3 (Finite sets) focuses on everything finite, and Ch 6 (Infinite sets) focuses on infinite, or rather ‘not-necessarily-finite’, sets.

  • New §3.1 (The natural numbers) is the same in spirit as old §1.3 (Induction on the natural numbers), but has been improved in three main ways. First, definition by recursion is now covered explicitly, rather than implicitly as before. Second, proof by strong induction with multiple base cases is treated more carefully. Third, the Peano axioms and constructions of the natural numbers are covered far more carefully than before. The order of material has been shuffled to reflect the new emphasis.
  • New §3.2 (Finite sets) focuses on defining the term ‘finite’ and on proving criteria for, and properties of, finiteness. The question of interest in new §3.2 is when a set is finite, rather than how many elements a finite set has. It combines material from old §4.1 (from ‘A first look at counting’ onwards) and old §4.2 (up to but excluding the combinatorial treatment of binomials and factorials).
  • New §3.3 (Counting principles) is essentially the same as the latter part of old §4.2—the treatment of the addition and multiplication principles has been made less confusing and less messy.
  • New §6.1 (Countable and uncountable sets) is essentially the same as old §4.3. It has been moved later in the book in order to allow more interesting examples to be given.
  • Two new sections concerning infinite sets have been added, namely §6.2 (Cardinal arithmetic) and §6.3 (Ordinal arithmetic and the axiom of choice). Once these are written, there will be a natural track through the book towards further study in mathematical logic and set theory: Ch 1 → Ch 2 → §3.1 & §3.2 → Ch 5 (at least §5.1 and §5.2) → Ch 6.

Number theory. The material on number theory (old Ch 3) was placed between two sections on functions (old §2.3 and §4.1). This was to provide temporary relief from the technical content on sets and functions, but it was unnatural from a mathematical point of view. Furthermore, some proofs in old Ch 3 used induction and counting principles. For these reasons, the number theory material appears in new Ch 4 (Number theory), after induction, functions and counting have been covered in their entirety. An instructor who is willing to skip some details in proofs could cover Ch 4 immediately after Ch 1.

Real analysis. The two sections on sequences and series have been combined into new §7.2 (Sequences and series), with room created for coverage of open sets and continuous functions in new §7.3 (Continuous functions).

Probability theory. Previously, only discrete probability spaces were covered, and old §7.3 (Expectation) was unnaturally short. The section on expectation of discrete random variables has been incorporated into new §8.2 (Discrete random variables), with a new §8.3 (Measure theory) focusing on defining the notion of a measure space and the more general notion of a probability space.

Additional topics. Old Ch 8 (Additional topics) has been deleted, since it was mostly unwritten anyway. Some of these topics have found their way into other sections, e.g. Boolean algebra in old/new §5.2, ordinal and cardinal numbers in new §6.2 and §6.3, and limits in new §7.3. The other topics will hopefully find their way into future versions of the book (it is an infinite descent, after all).

Appendices. The appendices have been slightly restructured, and the confusing material on theories and models has been deleted.

The approximate correspondences between new sections and old sections are summarised in the following table.

| New section/chapter | Old section(s)/chapter(s) |
| --- | --- |
| Ch 0 Getting started | §1.1 (content unchanged) |
| §1.1 Propositional logic | §1.2 and §2.1 |
| §1.2 Variables and quantifiers | §1.2 and §2.1 |
| §1.3 Logical equivalence | §1.2 and §2.1 |
| §2.1 Sets and set operations | §2.2 (content unchanged) |
| §2.2 Functions | §2.3 (content unchanged) |
| §2.3 Injections and surjections | beginning of §4.1 |
| §3.1 The natural numbers | §1.3 |
| §3.2 Finite sets | end of §4.1 and beginning of §4.2 |
| §3.3 Counting principles | end of §4.2 |
| Ch 4 Number theory | Ch 3 (content unchanged) |
| Ch 5 Relations | Ch 5 (content unchanged) |
| §6.1 Countable and uncountable sets | §4.3 (content unchanged) |
| §6.2 Cardinal arithmetic | (new section) |
| §6.3 Ordinal arithmetic and the axiom of choice | (new section) |
| Ch 7 Real analysis | Ch 6 (content restructured) |
| Ch 8 Discrete probability theory | Ch 7 (content restructured) |
| (deleted) | Ch 8 Additional topics |
| Apx A Mathematical writing | Apx C (incorporated as §A.3) |
| Apx B Miscellany | §B.2 |
| Apx C Hints to selected exercises | Apx A (content unchanged) |

Other miscellaneous changes

  • The default size for on-screen reading is now 6″ × 9″, which will (eventually) be the dimensions of the print version of the book. I have set a parameter to display the book in two-up format (in most PDF reader software), meaning less scrolling is required. The A4, US letter, phone and tablet versions remain available.
  • Examples and exercises are now distinguished by colour; formerly they were both #007777 (for a reason I do not recall), but now exercises are #994400.
  • The book now makes use of enumerated ‘Strategies’ rather than the ‘tips’ that were in the previous version. Each strategy is an interpretation of a definition, axiom or theorem in terms of how it can be used in a proof.
  • The cleveref package has been implemented throughout, so now all references to definitions, theorems, sections, pages, and so on, are labelled automatically, and the label constitutes part of the hyperlink (see the sketch after this list).
  • The font has been changed to Times New Roman. I have nothing against Computer Modern, but for some reason the book just looks better in Times.
  • The ‘Proof’ label has been given more emphasis: it is now bold and on its own line for increased readability.
  • End-of-chapter exercises are in the process of being added. If you are an instructor, I strongly recommend not referring to these by number (e.g. Chapter 1 Q7), because the numbers will almost certainly change; instead, I recommend copying out the question.
  • An index of LaTeX notation has been added.
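For the LaTeX-curious, the page size, two-up display and cleveref changes mentioned in the list above amount to preamble settings along the following lines. This is a hypothetical reconstruction for illustration, not the book's actual source:

```latex
% Hypothetical reconstruction of the preamble settings described above
\usepackage[papersize={6in,9in}]{geometry} % 6in x 9in on-screen/print size
\usepackage{hyperref}
\hypersetup{pdfpagelayout=TwoPageRight}    % ask PDF readers to display two-up
\usepackage[nameinlink]{cleveref}          % automatic reference labels, with the
                                           % label included in the hyperlink

% Usage: \cref{thm:euclid} typesets as, e.g., 'Theorem 4.2.1', and the word
% 'Theorem' forms part of the hyperlink thanks to the 'nameinlink' option.
```

Note that cleveref must be loaded after hyperref for the two packages to cooperate.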

Future updates

I am prioritising the following major changes for future versions of the book:

  • Finishing the long-unfinished sections, notably §7.2 and §7.3 on the convergence of sequences and series. (It goes without saying that I want to fill in all the gaps that currently exist.)
  • Including more exercises throughout, particularly end-of-chapter exercises.
  • Writing the appendix on mathematical writing. I have the ideas but not the time!

It’s all arbitrary

The word arbitrary is ubiquitous in mathematical texts. It is a very useful word, but it is a common cause of confusion for a few reasons:

  • The meaning of arbitrary in mathematical texts is different from its standard meaning in English;
  • The concept of arbitrariness is confusing in its own right;
  • Even in the context of mathematics, the word arbitrary can mean more than one thing.

My aim in writing this post is to clarify the differences between the standard English and mathematical English uses of the word arbitrary, and to provide some examples of the word arbitrary at work.

“Let $x$ be arbitrary”

The word arbitrary is most commonly used when introducing variables for the purposes of proving universally quantified statements.

What I mean by this is the following. Suppose you are trying to prove that every even integer is the sum of two odd integers. Your proof might look a little bit like this:

Let $x$ be an arbitrary even integer. Then $x=2n$ for some integer $n$. It follows that $x = 1 + (2n-1)$, which is the sum of two odd integers.

Expressions of the form “let $x$ be an arbitrary […]” are typical when proving that a property $p(x)$ holds for all elements $x$ of some set $X$. These kinds of statements are called universally quantified statements. [Symbolically, we would write the assertion that $p(x)$ is true for all elements $x$ of a set $X$ as “$\forall x \in X,\ p(x)$”.]

In the above example, $X$ was the set of even integers and $p(x)$ was the property “$x$ is the sum of two odd integers”.

A direct proof of a universally quantified statement $\forall x \in X,\ p(x)$ usually looks like this:

Let $x \in X$ be arbitrary. […proof of $p(x)$ goes here…]

And this is where the confusion begins. To illustrate, consider the following (non-)proof that every even integer is the sum of two odd integers.

Let $x$ be an arbitrary even integer, say $x=42$. Then $x = 17 + 25$, which is the sum of two odd integers, as required.

A seasoned mathematician might scoff at such a proof, but to a novice it is less clear why the proof doesn’t work.

The issue here is the dissonance between the standard English usage of arbitrary (meaning ‘based on individual discretion’ or ‘determined by impulse rather than reason’) and the mathematical usage. As soon as the proof-writer wrote the word arbitrary, its everyday meaning crept in and distorted their thought process.

From the proof-writer’s perspective, having just written ‘let $x$ be an arbitrary even integer’, they picked an even integer arbitrarily (it just happened to be $42$), and the proof went through just fine!

If they took a step back, they’d realise what they wrote was not a proof that every even integer can be written as the sum of two odd integers. In fact, what they proved is that some even integer, namely $42$, can be written as the sum of two odd integers.

So what did they do wrong?

The reader and the writer

The main observation to be made is that, when you write ‘let $x \in X$ be arbitrary’, the power to choose a value of $x$ arbitrarily belongs not to the person writing the proof, but to the person reading it.

What this means is that, as soon as you have written ‘let $x \in X$ be arbitrary’, the reader should be able to replace all subsequent instances of the variable $x$ by a value of their choice, and the proof should remain valid.

In this sense, the word arbitrary really means generic: the variable $x$ is treated as an element of $X$, but when reasoning about $x$, the only things we can assume about $x$ are those things that are true of all elements of $X$.

Let’s return to the previous example, where $X$ is the set of even integers. Lots of things are known about all even integers. For example, by definition of ‘even’, every even integer can be expressed in the form $2n$ for some integer $n$. This means that when reasoning about an ‘arbitrary’ even integer $x$, we are free to write $x=2n$ for some integer $n$, which may depend on the value of $x$.

There are things that are not true of all even integers, even though they might be true of some even integers. It is safe to say that ‘$x=17+25$’ is such a statement; this is only true for the integer $42$, and hence it is not a valid thing to use in a proof that all even integers $x$ are the sum of two odd integers. This was the shortcoming of the non-proof given above.

Recall the correct proof above that every even integer is the sum of two odd integers.

Let $x$ be an arbitrary even integer. Then $x=2n$ for some integer $n$. It follows that $x = 1 + (2n-1)$, which is the sum of two odd integers.

The reader should be able to replace $x$ by an arbitrary even integer, and the remaining proof should go through.

Let’s do this. Replacing $x$ by $42$, we obtain the following:

[Then] $42=2n$ for some integer $n$. It follows that $42 = 1 + (2n-1)$, which is the sum of two odd integers.

Is this true? Well, yes. The assertion that $42=2n$ for some integer $n$ is seen to be true by taking $n=21$. In this case, the rest of the proof reads:

It follows that $42 = 1 + 41$, which is the sum of two odd integers.

We can certainly agree that $1$ and $41$ are odd and that $1+41=42$.

But the point is that the reader didn’t need to have picked $x=42$. The reader could just as well have taken $x=64101272$ and the proof would still work.
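For readers who enjoy seeing such things made fully precise, here is how the proof might look in a proof assistant. This is a sketch in Lean 4 (my own hypothetical formalisation, assuming a recent toolchain in which the omega tactic for linear integer arithmetic is available); the point is that ‘let $x$ be arbitrary’ is literally the intro step, after which nothing in the proof may depend on which $x$ is later supplied.

```lean
-- Sketch: every even integer is the sum of two odd integers.
-- 'Odd' is unfolded as '∃ k, a = 2 * k + 1' to keep the example self-contained.
theorem even_sum_of_two_odds :
    ∀ x : Int, (∃ n : Int, x = 2 * n) →
      ∃ a b : Int, (∃ j, a = 2 * j + 1) ∧ (∃ k, b = 2 * k + 1) ∧ x = a + b := by
  intro x hx                     -- "let x be an arbitrary even integer"
  cases hx with
  | intro n hn =>                -- "then x = 2n for some integer n"
    -- x = 1 + (2n - 1), and both 1 and 2n - 1 are odd
    exact ⟨1, 2 * n - 1, ⟨0, by omega⟩, ⟨n - 1, by omega⟩, by omega⟩
```

The reader's power is visible here: once proved, the theorem can be instantiated at $x = 42$, at $x = 64101272$, or at any other even integer, but the proof itself never gets to choose.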

“The values are arbitrarily small”

What makes the word arbitrary more confusing is that it can be used to mean something subtly different, especially in its adverbial form arbitrarily.

Here are some examples of statements that a learner of mathematics might encounter:

  • Since the terms of the sequence $(x_n)$ are eventually arbitrarily small, it follows that $\lim_{n \to \infty} x_n = 0$.
  • There are intervals in $S$ of arbitrarily long length.
  • If a theory $\mathbb{T}$ has arbitrarily large finite models, then $\mathbb{T}$ has an infinite model.

This kind of usage of arbitrar(il)y is even more confusing at first sight than the one discussed above (see here and here and here for some examples of questions asked by people confused by this very issue).

The best I can do to define this usage in the abstract is as follows: the expression ‘[object] is arbitrarily [adjective]’ means that no matter how [adjective] you want [object] to be, there is some instance of [object] which is at least as [adjective] as you wanted.

To illustrate, let’s look at what the relevant phrases in the three examples above really mean:

  • ‘The terms of the sequence $(x_n)$ are eventually arbitrarily small’ means that, for all $\varepsilon > 0$, there is a stage in the sequence after which all terms $x_n$ satisfy $|x_n| \le \varepsilon$.
  • ‘There are intervals in $S$ of arbitrarily long length’ means that, for all $\ell \ge 0$, there is an interval in $S$ whose length is $\ge \ell$.
  • ‘The theory $\mathbb{T}$ has arbitrarily large finite models’ means that, for all $n \in \mathbb{N}$, there is a model of $\mathbb{T}$ of size $\ge n$.
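Written out symbolically, the first of these, for instance, becomes a statement built entirely out of quantifiers: $$\forall \varepsilon > 0,\ \exists N \in \mathbb{N},\ \forall n \ge N,\ |x_n| \le \varepsilon.$$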

Notice in each case that the word arbitrar(il)y has been replaced by a universally quantified statement: ‘for all $\varepsilon > 0$’ or ‘for all $\ell \ge 0$’ or ‘for all $n \in \mathbb{N}$’.

Just like before, this means that the reader—not the writer—has the power to choose the value to be made arbitrarily [adjective].

But there is another source of confusion in such statements: people want to believe that the existence of arbitrarily [adjective] [objects] means that there exists an infinitely [adjective] [object].

For example, you might want to say that if the terms of $(x_n)$ are eventually arbitrarily small, then they are eventually zero. But this is not true: for example, taking $x_n = \frac{1}{n+1}$, we see that the terms are eventually arbitrarily small, but no value of $x_n$ is equal to zero.
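To see that the terms of this sequence really are eventually arbitrarily small, fix any $\varepsilon > 0$ and choose a positive integer $N \ge \frac{1}{\varepsilon}$; then $$n \ge N \implies x_n = \frac{1}{n+1} < \frac{1}{N} \le \varepsilon,$$ and yet $x_n > 0$ for every $n$.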

As another example, take $S = \{ [0,n] \mid n \in \mathbb{N} \}$. This is a set of intervals, and they have arbitrarily long lengths since for any $\ell \ge 0$ the interval $[0, \lceil \ell \rceil]$ has length $\ge \ell$, where $\lceil \ell \rceil$ is the smallest integer greater than or equal to $\ell$. Since $S$ contains intervals of arbitrarily long length, you might be tempted to say that it contains an interval of infinite length… but it doesn’t, since the length of each interval in $S$ is a natural number, so each interval in $S$ has finite length.

The moral of the story

To summarise:

  • The word arbitrary means generic when used in the context ‘let [variable] be arbitrary’, meaning that the reader should be able to substitute whatever value for the variable that they please, and the proof should go through.
  • The expression arbitrarily [adjective] means that there is no bound on how [adjective] the object in question can be, but it does not necessarily imply that some object is infinitely [adjective], whatever that means.
  • When using the word arbitrary in mathematical writing, always remember that it is for the reader, not the writer, to make the arbitrary decisions.

Talks at Pitt ACOG seminar

Earlier today, I gave two talks at the Pitt Algebra, Combinatorics and Geometry seminar. The first was about my research, and the second was about the book. Their titles and abstracts are below.

Talk #1 title: Algebraic structures from dependent type theory

Abstract: There are many different notions of ‘model’ of dependent type theory, with each notion allowing us to view it in a different way. In recent work with Steve Awodey, we used the notion of a natural model to view dependent type theory through the lens of polynomial functors—this allowed us to identify a surprising monoid-like structure in the syntax of dependent type theory. This talk will be a high-level overview of our results and the ideas behind the proofs; I will assume basic knowledge of category theory but no knowledge of dependent type theory.

Talk #2 title: An infinite descent into pure mathematics

Abstract: A few years ago, I wrote a set of lecture notes for a course I was teaching over the summer. Various factors led to this evolving, almost by accident, into a fully fledged textbook. In this talk, I will discuss my experience of writing a textbook alongside doing my PhD research, with emphasis on the research-based pedagogical principles underlying its design.

Slides: available here.

Inquiry-based learning and solutions to exercises

One of the things I have come to discover since becoming more interested in the research behind teaching and learning is that what students like is often at odds with what helps students learn.

(This has huge implications for the correlation between student evaluations and student learning, and how this impacts career progression in academia, but that’s a topic for another day…)

At the end of each semester that my textbook has been used to teach a course at Carnegie Mellon, I have sent a feedback questionnaire to the students in order to find out how it can be improved. Every semester, by far the most common request from students is to include solutions to the (many) exercises.

To illustrate, here are a couple of quotes:

Could you have the answers to the exercises somewhere? It was hard to know if you did it right when there were no answers.

I understand the point of exercises, but to me without solutions they don’t really help. Going to the professors and TAs all the time isn’t very appealing to know if we have done the problem correctly.

My goal in writing this post is to justify my decision to withhold solutions to exercises, by way of introducing inquiry-based learning and discussing some of the research surrounding its effectiveness for teaching mathematics.

But first, in keeping with the principle of ‘if it ain’t broke, don’t fix it’, I should motivate the need to transition away from the traditional teaching model.

Tradition, tradition!

In a traditional undergraduate mathematics course, new material is delivered to students during class time by means of lecture—some courses include additional sessions where an instructor or teaching assistant demonstrates some examples or reviews some concepts. There might or might not be an accompanying textbook, where students can look up definitions, precise statements of theorems and fully fleshed-out proofs.

Assessment is typically by means of regular problem sheets, a small number (0 to 3) of midterm examinations and a final exam; there might also be quizzes during class time. A student’s grade in the course is usually computed by taking some kind of weighted average of their scores on individual assessments (e.g. 20% homework average, 10% quiz average, 20% midterm 1 score, 20% midterm 2 score, 30% final exam score).

This formula for teaching is so entrenched in undergraduate mathematics education that it is difficult to imagine any other way of doing it. It is so widely accepted as the norm that no-one even thinks to ask whether there are improvements that can be made.

If you take a step back for a moment, things appear less than ideal.

Under this model, for example, problem sheets are the first opportunity students have to practise using their new knowledge, and yet they must turn in all their answers for credit. Whilst they may have seen examples in class or problem sessions, they have not had any chance to work independently and get things wrong without it hurting their grade.

I would argue that this is unfair.

Let me elaborate. A student’s final course grade should reflect their attainment of the course’s learning objectives upon completion of the course. Thus if a student achieves a B grade in a course, you should expect them to have a better grasp of the concepts and skills taught in the course than a student who achieves a C grade. This much I don’t think is too controversial.

If we begin assessing a skill for credit before providing opportunities to practise that skill, then we end up punishing students for acquiring a skill after it is first introduced but before the end of the course. This means that two students might end a course having met the same learning objectives, yet they could achieve different grades because one found it harder at first than the other to grasp the ideas or skills involved.

One way of resolving this problem is to build in opportunities for students to practise a skill or apply a concept before assessing their ability to do so for credit.

This is where inquiry-based learning (IBL) comes in.

Seek, and ye shall find

The main underlying principle of IBL is this: people learn more when they discover something for themselves than they do if someone tells them about it.

There are many flavours of IBL, and many possible course design models on the spectrum from ‘traditional lecture’ to ‘fully fledged IBL classroom’.

A fully fledged IBL approach changes the teacher-student dynamic from giver and receiver (of knowledge) to facilitator and inquirer. Rather than copy down notes for later reading, a student in an IBL classroom is presented by the teacher with some kind of task, and is challenged to identify what assumptions must be made to complete the task, to ask questions, to make conjectures and to discover useful facts. The teacher’s role is to guide the student through this process.

Of course, this is the other extreme of the spectrum; inquiry-based strategies can also be incorporated on a small scale, with IBL activities interspersed throughout a predominantly lecture-based class.

At a very basic level, this could look something like stating a theorem and asking students to find a proof; or indicating what a new mathematical concept will need to do and then asking students to write down a proposed definition of that concept; or stating a problem and having students work out a solution.

When students think in this way about the mathematical content being delivered, they stand to gain better conceptual knowledge. Furthermore, it helps them practise skills in an environment where they can get things wrong, and learn from being wrong, before being assessed on those skills for credit.

See here and here for just two studies demonstrating the effectiveness of IBL for teaching undergraduate mathematics.

Why there are no solutions in my textbook

Returning to the quotes from students at the beginning of this post, it is evident that students want solutions because they want to know whether they are correct.

When I was deciding whether to include solutions, I had to answer the following two questions:

  1. Does knowing when you’re correct improve your learning?
  2. If so, what is the best means of finding out if you’re correct?

After reviewing the research on IBL, I couldn’t quite convince myself either way on the first question. If the answer were to be ‘no’, then I would feel entirely justified in omitting solutions without further argument. So, for the sake of playing devil’s advocate against myself, I assumed the answer was ‘yes’ and moved to the second question.

I firmly believe that an important part of mathematical education is learning how to communicate mathematics. Being able to articulate your approach to solving a problem is a skill in its own right, and acquiring this skill is of fundamental importance to succeeding as a mathematician.

(Can you see where I’m going with this?)

By forcing students to communicate with others in order to determine whether their approach to a problem is correct, I am covertly incorporating an IBL strategy into the design of my textbook.

Each time a student approaches another student, their course instructor, their teaching assistant, or even their pet guinea pig, and articulates their approach to a problem, they are strengthening their mathematical communication skills and are performing mathematical inquiry.

To include solutions would be to deny students, particularly those lacking the self-control to avoid looking at the solutions, the opportunity to learn a key mathematical skill, and one that is not often explicitly targeted in undergraduate mathematics curricula.

And this is why I did not include solutions in my textbook.

A compromise

Having given my argument that providing solutions to exercises is antithetical to inquiry-based learning, I am now ready to announce a compromise that I recently made. Namely, there are now hints to selected exercises in the appendices of the textbook.

These hints are not really intended for students who are using the textbook; instead, they are intended for teachers: in order for a teacher and a student to have a fruitful discussion about the student’s approach to a problem, it is helpful (but certainly not necessary) if the teacher has some idea of how to solve it!

Whilst students may find it helpful to look at a hint in order to solve an exercise, it is likely that their long-term conceptual knowledge of pure mathematics will reap greater rewards from speaking to others about it.

Dependent type theory is a monad and an algebra

I recently uploaded a paper to the arXiv entitled Polynomial pseudomonads and dependent type theory, joint work with Steve Awodey. It is a rather technical paper—the goal of this post is to provide some intuition behind the ideas. I assume some familiarity with dependent type theory and category theory.

Our main result is that, in a suitable sense, dependent type theory is a monad and an algebra.

More specifically, given a dependent type theory $\mathbb{T}$, we can interpret $\mathbb{T}$ into a suitably structured category $\mathbb{C}$. Under this interpretation, each syntactic rule of $\mathbb{T}$ corresponds with a certain feature of $\mathbb{C}$. What we prove in the paper is that the rules for a unit type $\mathbf{1}$, dependent sum types $\sum_{x : A} B(x)$ and dependent product types $\prod_{x:A} B(x)$ together describe the structure of a polynomial (pseudo)monad which is, in a suitable sense, a (pseudo)algebra over itself.

Let’s see what this all means.

Natural models

In his paper on natural models, Steve introduces the notion of a natural model, which is a category $\mathbb{C}$ equipped with a representable natural transformation $p : \dot{\mathcal{U}} \to \mathcal{U}$ between presheaves $\mathcal{U}, \dot{\mathcal{U}} : \mathbb{C}^{\mathrm{op}} \to \mathbf{Set}$.

We think of $\mathbb{C}$ as being the category whose objects are the contexts $\Gamma, \Delta, \dots$ of a type theory $\mathbb{T}$ and whose morphisms are the substitutions $\sigma : \Delta \to \Gamma$.

The presheaf $\mathcal{U}$ can be thought of as the presheaf of types-in-context. Thus an element $A \in \mathcal{U}(\Gamma)$ is a type $\Gamma \vdash A \text{ type}$. The action of $\mathcal{U}$ on morphisms provides the behaviour of types under substitutions: given a type $\Gamma \vdash A \text{ type}$ and a substitution $\sigma : \Delta \to \Gamma$, we obtain a type $\Delta \vdash A\sigma \text{ type}$.

The presheaf $\dot{\mathcal{U}}$ can be thought of as the presheaf of terms-in-context. Thus an element $a \in \dot{\mathcal{U}}(\Gamma)$ is a term $\Gamma \vdash a : \dots$ wait, what’s the type of $a$? Well, that’s what $p$ tells you: to say $\Gamma \vdash a : A$ is to say that $a \in \dot{\mathcal{U}}(\Gamma)$ and $p_{\Gamma}(a) = A \in \mathcal{U}(\Gamma)$.

Naturality of $p$ tells us that substitutions on terms match up with substitutions on types: if $\sigma : \Delta \to \Gamma$ and $\Gamma \vdash a : A$, then $\Delta \vdash a\sigma : A\sigma$.

Representability of $p$ tells us that we can extend a context by any type in that context: given $\Gamma \vdash A \text{ type}$, there is an extended context $\Gamma.A$, together with a projection $\mathsf{p}_A : \Gamma.A \to \Gamma$ and a term $\mathsf{q}_A \in \dot{\mathcal{U}}(\Gamma.A)$.

If you know the definition of a category with families, this may look familiar: that’s because the two notions are equivalent!

Polynomials

In any locally cartesian closed category $\mathcal{E}$, we can regard a morphism $f : B \to A$ as a polynomial, which induces a polynomial functor $P_f : \mathcal{E} \to \mathcal{E}$ defined by $$P_f(X) = \sum_{a : A} X^{B_a}$$
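To make this concrete, here is a small sketch in Lean 4 (the names and the fibrewise encoding are mine, chosen for illustration): a morphism $f : B \to A$ is presented by its fibres, and $P_f$ sends $X$ to the type of pairs consisting of an element $a : A$ and a function $B_a \to X$.

```lean
-- Sketch: a polynomial presented by its fibres, and the functor it induces.
structure Poly where
  A : Type        -- the codomain of f : B → A
  B : A → Type    -- the fibre of f over each a : A

-- P_f(X) = Σ (a : A), X^(B a)
def Poly.obj (P : Poly) (X : Type) : Type :=
  (a : P.A) × (P.B a → X)

-- Functorial action: post-compose on the exponent part.
def Poly.map (P : Poly) {X Y : Type} (g : X → Y) : P.obj X → P.obj Y :=
  fun ⟨a, h⟩ => ⟨a, g ∘ h⟩
```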

The map $p : \dot{\mathcal{U}} \to \mathcal{U}$ lives in the presheaf category $\mathcal{E} = [\mathbb{C}^{\mathrm{op}}, \mathbf{Set}]$, which is locally cartesian closed; regarding $p$ as a polynomial, we obtain a polynomial functor $P=P_p : \mathcal{E} \to \mathcal{E}$ as before.

The polynomial functor $P$ turns out to be useful for describing the features of the natural model $(\mathbb{C},p)$ corresponding with syntactic rules of a dependent type theory $\mathbb{T}$. For example, the object $P(\mathcal{U})$ is described in the internal language of $\mathcal{E}$ by $\sum_{A:\mathcal{U}} \mathcal{U}^{[A]}$, where I’m writing $[A]=\dot{\mathcal{U}}_A$ for readability. Internally, the elements of $P(\mathcal{U})$ are pairs $\langle A,B \rangle$, where $A$ is a type and $B : A \to \mathcal{U}$ is a dependent type over $A$. Thus morphisms $P(\mathcal{U}) \to \mathcal{U}$ correspond with formation rules in type theory of the form $$\cfrac{\Gamma \vdash A \text{ type} \qquad \Gamma, x:A \vdash B(x) \text{ type}}{\Gamma \vdash C(A,B) \text{ type}}$$
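Continuing the Lean sketch above (still purely illustrative, with `Type` standing in for $\mathcal{U}$): an element of $P(\mathcal{U})$ is a pair $\langle A, B \rangle$, and a formation rule is nothing more than a map sending such a pair to a type.

```lean
-- With Type standing in for 𝒰, elements of P(𝒰) are pairs ⟨A, B⟩ of a type
-- and a dependent type over it; formation rules are maps P(𝒰) → 𝒰.
def sigmaForm : ((A : Type) × (A → Type)) → Type :=
  fun ⟨A, B⟩ => (a : A) × B a   -- Σ-formation: ⟨A, B⟩ ↦ Σ (x : A), B x

def piForm : ((A : Type) × (A → Type)) → Type :=
  fun ⟨A, B⟩ => (a : A) → B a   -- Π-formation: ⟨A, B⟩ ↦ Π (x : A), B x
```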

A monad and an algebra (kind of)

In Steve’s paper on natural models, he describes the conditions on a natural model $(\mathbb{C}, p)$ which correspond with the syntactic rules for a unit type $\mathbf{1}$, dependent sum types $\sum_{x:A} B(x)$ and dependent product types $\prod_{x:A} B(x)$.

In each case, the condition on the natural model is simply the existence of a pullback square. These conditions can be translated into the language of polynomial functors in the following way:

  • The rules for a unit type correspond with the existence of a cartesian, strong natural transformation $\eta : 1 \Rightarrow P$;
  • The rules for dependent sum types correspond with the existence of a cartesian, strong natural transformation $\mu : P \circ P \Rightarrow P$;
  • The rules for dependent product types correspond with the existence of a cartesian morphism of polynomials $\zeta : P(p) \Rightarrow p$.

Having made this observation, any self-respecting category theorist would ask the following questions:

  • Is $(P,\eta,\mu)$ a monad?
  • Is $(p,\zeta)$ an algebra for this monad in any sense?

The answer to each question is… almost.

Remember, everything we do in the natural model reflects some kind of syntactic rule of the type theory. As such, the monad and algebra laws should mean something in the syntax. The following table shows the translation from the monad and algebra laws to the syntax at the level of types:

| Monad laws | Type theory |
| --- | --- |
| Associativity | $\sum\limits_{\langle x,y \rangle : \sum_{x:A} B(x)} C(x,y) = \sum\limits_{x:A} \sum\limits_{y:B(x)} C(x,y)$ |
| Left unit | $\mathbf{1} \times A = A$ |
| Right unit | $A \times \mathbf{1} = A$ |

| Algebra laws | Type theory |
| --- | --- |
| Associativity | $\prod\limits_{\langle x,y \rangle : \sum_{x:A} B(x)} C(x,y) = \prod\limits_{x:A} \prod\limits_{y:B(x)} C(x,y)$ |
| Unit | $A^{\mathbf{1}} = A$ |

The rules in the column on the right do not hold strictly in general. Rather, they hold up to canonical isomorphism. Thus at best we may hope for $(P,\eta,\mu)$ to be a pseudomonad and for $(p,\zeta)$ to be a pseudoalgebra over this pseudomonad.

What we do in our paper is to define the appropriate notion of equivalence, called adjustment, to state what it means to be a pseudomonad and pseudoalgebra in this sense. With this done, we prove that this is indeed what we have. I will not go into the details in this post, since that would amount to repeating what you can already read in the paper!

Conclusion

Our result elucidates a structure in type theory that, to our knowledge, has not been noticed before. In part this is because noticing it required the right point of view: in this case, using natural models to view dependent type theory through the lens of polynomial functors.

The result is also an interesting and perhaps unexpected connection between logic and abstract algebra, which raises the question of what other connections there may be out there!

Talk about formalisation at the GSS

Yesterday, I gave a talk at the CMU Graduate Student Seminar entitled Formalisation or: How I Learned to Stop Worrying and Love the Computer.

Abstract: What distinguishes mathematics from the empirical sciences is that we prove stuff with complete certainty. Ha, just kidding! If you look at the mathematical literature in a bit more detail, you’ll see a body of work littered with errors, omissions, disagreements, leaps of faith, appeals to intuition and duplicated efforts. In this talk, I will describe some ongoing efforts to use computers to help us overcome these problems, including work currently being done right here in our beloved Steel City. Prerequisites: nothing but an open mind and a hungry stomach.

Slides: available here.

Video recording: