Posts

It’s all arbitrary

The word arbitrary is ubiquitous in mathematical texts. It is a very useful word, but it is a common cause of confusion for a few reasons:

  • The meaning of arbitrary in mathematical texts is different from its standard meaning in English;
  • The concept of arbitrariness is confusing in its own right;
  • Even in the context of mathematics, the word arbitrary can mean more than one thing.

My aim in writing this post is to clarify the differences between the standard English and mathematical English uses of the word arbitrary, and to provide some examples of the word arbitrary at work.

“Let $x$ be arbitrary”

The word arbitrary is most commonly used when introducing variables for the purposes of proving universally quantified statements.

What I mean by this is the following. Suppose you are trying to prove that every even integer is the sum of two odd integers. Your proof might look a little bit like this:

Let $x$ be an arbitrary even integer. Then $x=2n$ for some integer $n$. It follows that $x = 1 + (2n-1)$, which is the sum of two odd integers.

Expressions of the form “let $x$ be an arbitrary […]” are typical when proving that a property $p(x)$ holds for all elements $x$ of some set $X$. These kinds of statements are called universally quantified statements. [Symbolically, we would write the assertion that $p(x)$ is true for all elements $x$ of a set $X$ as “$\forall x \in X,\ p(x)$”.]

In the above example, $X$ was the set of even integers and $p(x)$ was the property “$x$ is the sum of two odd integers”.

A direct proof of a universally quantified statement $\forall x \in X,\ p(x)$ usually looks like this:

Let $x \in X$ be arbitrary. […proof of $p(x)$ goes here…]

And this is where the confusion begins. To illustrate, consider the following (non-)proof that every even integer is the sum of two odd integers.

Let $x$ be an arbitrary even integer, say $x=42$. Then $x = 17 + 25$, which is the sum of two odd integers, as required.

A seasoned mathematician might scoff at such a proof, but to a novice it is less clear why the proof doesn’t work.

The issue here is the dissonance between the standard English usage of arbitrary—meaning ‘based on individual discretion’ or ‘determined by impulse rather than reason’—and the mathematical usage. This meant that as soon as the proof-writer wrote the word arbitrary, their thought process was then distorted.

From the proof-writer’s perspective, having just written ‘let $x$ be an arbitrary even integer’, they picked an even integer arbitrarily (it just happened to be $42$), and the proof went through just fine!

If they took a step back, they’d realise what they wrote was not a proof that every even integer can be written as the sum of two odd integers. In fact, what they proved is that some even integer, namely $42$, can be written as the sum of two odd integers.

So what did they do wrong?

The reader and the writer

The main observation to be made is that, when you write ‘let $x \in X$ be arbitrary’, the power to choose a value of $x$ arbitrarily belongs not to the person writing the proof, but to the person reading it.

What this means is that, as soon as you have written ‘let $x \in X$ be arbitrary’, the reader should be able to replace all subsequent instances of the variable $x$ by a value of their choice, and the proof should remain true.

In this sense, the word arbitrary really means generic: the variable $x$ is treated as an element of $X$, but when reasoning about $x$, the only things we can assume about $x$ are those things that are true of all elements of $X$.

Let’s return to the previous example, where $X$ is the set of even integers. Lots of things are known about all even integers. For example, by definition of ‘even’, every even integer can be expressed in the form $2n$ for some integer $n$. This means that when reasoning about an ‘arbitrary’ even integer $x$, we are free to write $x=2n$ for some integer $n$, which may depend on the value of $x$.

There are things that are not true of all even integers, even though they might be true of some even integers. It is safe to say that ‘$x=17+25$’ is such a statement; this is only true for the integer $42$, and hence it is not a valid thing to use in a proof that all even integers $x$ are the sum of two odd integers. This was the shortcoming of the non-proof given above.

Recall the correct proof above that every even integer is the sum of two odd integers.

Let $x$ be an arbitrary even integer. Then $x=2n$ for some integer $n$. It follows that $x = 1 + (2n-1)$, which is the sum of two odd integers.

The reader should be able to replace $x$ by an arbitrary even integer, and the remaining proof should go through.

Let’s do this. Replacing $x$ by $42$, we obtain the following:

[Then] $42=2n$ for some integer $n$. It follows that $42 = 1 + (2n-1)$, which is the sum of two odd integers.

Is this true? Well, yes. The assertion that $42=2n$ for some integer $n$ is seen to be true by taking $n=21$. In this case, the rest of the proof reads:

It follows that $42 = 1 + 41$, which is the sum of two odd integers.

We can certainly agree that $1$ and $41$ are odd and that $1+41=42$.

But the point is that the reader didn’t need to have picked $x=42$. The reader could just as well have taken $x=64101272$ and the proof would still work.

“The values are arbitrarily small”

What makes the word arbitrary more confusing is that it can be used to mean something subtly different, especially in its adverbial form arbitrarily.

Here are some examples of statements that a learner of mathematics might encounter:

  • Since the terms of the sequence $(x_n)$ are eventually arbitrarily small, it follows that $\lim_{n \to \infty} x_n = 0$.
  • There are intervals in $S$ of arbitrarily long length.
  • If a theory $\mathbb{T}$ has arbitrarily large finite models, then $\mathbb{T}$ has an infinite model.

This kind of usage of arbitrar(il)y is even more confusing on first sight than the one discussed above (see here and here and here for some examples of questions asked by people confused by this very issue).

The best I can do to define this usage in the abstract is as follows: the expression ‘[object] is arbitrarily [adjective]’ means that no matter how [adjective] you want [object] to be, there is some instance of [object] which is at least as [adjective] as you wanted.

To illustrate, let’s look at what the relevant phrases in the three examples above really mean:

  • ‘The terms of the sequence $(x_n)$ are eventually arbitrarily small’ means that, for all $\varepsilon > 0$, there is a stage in the sequence after which all terms $x_n$ satisfy $|x_n| \le \varepsilon$.
  • ‘There are intervals in $S$ of arbitrarily long length’ means that, for all $\ell \ge 0$, there is an interval in $S$ whose length is $\ge \ell$.
  • ‘The theory $\mathbb{T}$ has arbitrarily large finite models’ means that, for all $n \in \mathbb{N}$, there is a model of $\mathbb{T}$ of size $\ge n$.

Notice in each case that the word arbitrar(il)y has been replaced by a univerally quantified statement: ‘for all $\varepsilon > 0$’ or ‘for all $\ell \ge 0$’ or ‘for all $n \in \mathbb{N}$’.

Just like before, this means that the reader—not the writer—has the power to choose the value to be made arbitrarily [adjective].

But there is another source of confusion in such statements, which is that what people want to believe is that the existence of arbitrarily [adjective] [objects] means that there exists an infinitely [adjective] [object].

For example, you might want to say that if the terms of $(x_n)$ are eventually arbitrarily small, then they are eventually zero. But this is not true: for example, taking $x_n = \frac{1}{n+1}$, we see that the terms are eventually arbitrarily small, but no value of $x_n$ is equal to zero.

As another example, take $S = \{ [0,n] \mid n \in \mathbb{N} \}$. This is a set of intervals, and they have arbitrarily long lengths since for any $\ell \ge 0$ the interval $[0, \lceil \ell \rceil]$ has length $\ge \ell$, where $\lceil \ell \rceil$ is the smallest integer greater than or equal to $\ell$. Since $S$ contains intervals of arbitrarily long length, you might be tempted to say that it contains an interval of infinite length… but it doesn’t, since the length of each interval in $S$ is a natural number, so each interval in $S$ has finite length.

The moral of the story

To summarise:

  • The word arbitrary means generic when used in the context ‘let [variable] be arbitrary’, meaning that the reader should be able to substitute whatever value for the variable that they please, and the proof should go through.
  • The expression arbitrarily [adjective] means that there is no bound on how [adjective] the object in question can be, but it does not necessarily imply that some object is infinitely [adjective], whatever that means.
  • When using the word arbitrary in mathematical writing, always remember that it is for the reader, not the writer, to make the arbitrary decisions.

Talks at Pitt ACOG seminar

Earlier today, I gave two talks at the Pitt Algebra, Combinatorics and Geometry seminar. The first was about my research, and the second was about the book. Their titles and abstracts are below.

Talk #1 title: Algebraic structures from dependent type theory

Abstract: There are many different notions of ‘model’ of dependent type theory, with each notion allowing us to view it in a different way. In recent work with Steve Awodey, we used the notion of a natural model to view dependent type theory through the lens of polynomial functors—this allowed us to identify a surprising monoid-like structure in the syntax of dependent type theory. This talk will be a high-level overview of our results and the ideas behind the proofs; I will assume basic knowledge of category theory but no knowledge of dependent type theory.

Talk #2 title: An infinite descent into pure mathematics

Abstract: A few years ago, I wrote a set of lecture notes for a course I was teaching over the summer. Various factors led to this evolving, almost by accident, into a fully fledged textbook. In this talk, I will discuss my experience of writing a textbook alongside doing my PhD research, with emphasis on the research-based pedagogical principles underlying its design.

Slides: available here.

Inquiry-based learning and solutions to exercises

One of the things I have come to discover since becoming more interested in the research behind teaching and learning is that what students like is often at odds with what helps students learn.

(This has huge implications for the correlation between student evaluations and student learning, and how this impacts career progression in academia, but that’s a topic for another day…)

At the end of each semester that my textbook has been used to teach a course at Carnegie Mellon, I have sent a feedback questionnaire to the students in order to find out how it can be improved. Every semester, by far the most common request from students is to include solutions to the (many) exercises.

To illustrate, here are a couple of quotes:

Could you have the answers to the exercises somewhere? It was hard to know if you did it right when there were no answers.

I understand the point of exercises, but to me without solutions they don’t really help. Going to the professors and TAs all the time isn’t very appealing to know if we have done the problem correctly.

My goal in writing this post is to justify my decision to withhold solutions to exercises, by way of introducing inquiry-based learning and discussing some of the research surrounding its effectiveness for teaching mathematics.

But first, in keeping with the principle of ‘if it ain’t broke, don’t fix it’, I should motivate the need to transition away from the traditional teaching model.

Tradition, tradition!

In a traditional undergraduate mathematics course, new material is delivered to students during class time by means of lecture—some courses include additional sessions where an instructor or teaching assistant demonstrates some examples or reviews some concepts. There might or might not be an accompanying textbook, where students can look up definitions, precise statements of theorems and fully fleshed-out proofs.

Assessment is typically by means of regular problem sheets,  a small number (0 to 3) of midterm examinations and a final exam; there might also be quizzes during class time. A student’s grade in the course is usually computed by taking some kind of weighted average of their scores on individual assessments (e.g. 20% homework average, 10% quiz average, 20% midterm 1 score, 20% midterm 2 score, 30% final exam score).

This formula for teaching is so entrenched in undergraduate mathematics education that it is difficult to imagine any other way of doing it. It is so widely accepted as the norm that no-one even things to ask whether there are improvements that can be made.

If you take a step back for a moment, things appear less than ideal.

Under this model, for example, problem sheets are the first opportunity students have to practise using their new knowledge, and yet they must turn in all their answers for credit. Whilst they may have seen examples in class or problem sessions, they have not had any chance to work independently and get things wrong without it hurting their grade.

I would argue that this is unfair.

Let me elaborate. A student’s final course grade should be a reflection of their performance of the course’s learning objectives upon completion of the course. Thus if a student achieves a B grade in a course, you should expect them to have a better grasp of the concepts and skills taught in the course than a student who achieves a C grade. This much I don’t think is too controversial.

If we begin assessing a skill for credit before providing opportunities to practise that skill, then we end up punishing students for acquiring a skill after it is first introduced but before the end of the course. This means that two students might end a course having met the same learning objectives, yet they could achieve different grades because one found it harder at first than the other to grasp the ideas or skills involved.

One way of resolving this problem is to build in opportunities for students to practise a skill or apply a concept before assessing their ability to do so for credit.

This is where inquiry-based learning (IBL) comes in.

Seek, and ye shall find

The main underlying principle of IBL is this: people learn more when they discover something for themselves than they do if someone tells them about it.

There are many flavours of IBL and many possible course design models on the spectrum of ‘traditional lecture’ to ‘fully fledged IBL classroom’.

A fully fledged IBL approach changes the teacher-student dynamic from giver and receiver (of knowledge) to facilitator and inquirer. Rather than copy down notes for later reading, a student in an IBL classroom is presented by the teacher with some kind of task, and is challenged to identify what assumptions must be made to complete the task, to ask questions, to make conjectures and to discover useful facts. The teacher’s role in this process is to guide the student through this process.

Of course, this is at the other extreme of the spectrum; inquiry-based strategies can be incorporated on a small scale, too; indeed, IBL activities could be interspersed throughout a predominantly lecture-based class.

At a very basic level, this could look something like stating a theorem and asking students to find a proof; or indicating what a new mathematical concept will need to do and then asking students to write down a proposal for a definition for that concept; or stating a problem and having students work out a solution.

By having students think in this way about the mathematical content being delivered, they stand to gain better conceptual knowledge. Furthermore, it helps students to practise skills in an environment where they can be wrong and learn from being wrong before being assessed on these skills for credit.

See here and here for just two studies demonstrating the effectiveness of IBL for teaching undergraduate mathematics.

Why there are no solutions in my textbook

Returning to the quotes from students at the beginning of this post, it is evident that the reason why students want solutions is because they want to know if they are correct.

When I was deciding whether to include exercises, I had to answer the following two questions:

  1. Does knowing when you’re correct improve your learning?
  2. If so, what is the best means of finding out if you’re correct?

After reviewing the research on IBL, I couldn’t quite convince myself either way on the first question. If the answer were to be ‘no’, then I would feel entirely justified in omitting solutions without further argument. So, for the sake of playing devil’s advocate against myself, I assumed the answer was ‘yes’ and moved to the second question.

I firmly believe that an important part of mathematical education is learning how to communicate mathematics. Being able to articulate your approach to solving a problem is a skill in its own right, and acquiring this skill is of fundamental importance to succeeding as a mathematician.

(Can you see where I’m going with this?)

By forcing students to communicate with others in order to determine whether their approach to a problem is correct, I am covertly incorporating an IBL strategy into the design of my textbook.

Each time a student approaches another student, their course instructor, their teaching assistant, or even their pet guinea pig, and articulates their approach to a problem, they are strengthening their mathematical communication skills and are performing mathematical inquiry.

To include solutions would be to deny students, particularly those lacking the necessary self-control to avoid looking at the solutions, the opportunity for learning a key mathematical skill and, at that, a skill that is not often explicitly targeted in undergraduate mathematics curricula.

And this is why I did not include solutions in my textbook.

A compromise

Having given my argument that providing solutions to exercises is antithetical to inquiry-based learning, I am now ready to announce a compromise that I recently made. Namely, there are now hints to selected exercises in the appendices of the textbook.

These hints are not really intended for students who are using the textbook; instead, they are intended for teachers: in order for a teacher and a discussion to have a fruitful solution about the student’s approach to a problem, it is helpful (but certainly not necessary) if the teacher has some idea of how to solve it!

Whilst students may find it helpful to look at a hint in order to solve an exercise, it is likely that their long-term conceptual knowledge of pure mathematics will reap greater rewards from speaking to others about it.

Dependent type theory is a monad and an algebra

I recently uploaded a paper to the arXiv entitled Polynomial pseudomonads and dependent type theory, joint work with Steve Awodey. It is a rather technical paper—the goal of this post is to provide some intuition behind the ideas. I assume some familiarity with dependent type theory and category theory.

Our main result is that, in a suitable sense, dependent type theory is a monad and an algebra.

More specifically, given a dependent type theory $\mathbb{T}$, we can interpret $\mathbb{T}$ into a suitably structured category $\mathbb{C}$. Under this interpretation, each syntactic rule of $\mathbb{T}$ corresponds with a certain feature of $\mathbb{C}$. What we prove in the paper is that the rules for a unit type $\mathbb{1}$, dependent sum types $\sum_{x : A} B(x)$ and dependent product types $\prod_{x:A} B(x)$ together describe the structure of a polynomial (pseudo)monad which is, in a suitable sense, a (pseudo)algebra over itself.

Let’s see what this all means.

Natural models

In this paper, Steve introduces the notion of a natural model, which is a category $\mathbb{C}$ equipped with a representable natural transformation $p : \dot{\mathcal{U}} \to \mathcal{U}$ between presheaves $\mathcal{U}, \dot{\mathcal{U}} : \mathbb{C}^{\mathrm{op}} \to \mathbf{Set}$.

We think of $\mathbb{C}$ as being the category whose objects are the contexts $\Gamma, \Delta, \dots$ of a type theory $\mathbb{T}$ and whose morphisms are the substitutions $\sigma : \Delta \to \Gamma$.

The presheaf $\mathcal{U}$ can be thought of as the presheaf of types-in-context. Thus an element $A \in \mathcal{U}(\Gamma)$ is a type $\Gamma \vdash A \text{ type}$. The action of $\mathcal{U}$ on morphisms provides the behaviour of types under substitutions: given a type $\Gamma \vdash A \text{ type}$ and a substitution $\sigma : \Delta \to \Gamma$, we obtain a type $\Delta \vdash A\sigma \text{ type}$.

The presheaf $\dot{\mathcal{U}}$ can be thought of as the presheaf of terms-in-context. Thus an element $a \in \mathcal{U}(\Gamma)$ is a term $\Gamma \vdash a : \dots$ wait, what’s the type of $a$? Well that’s what $p$ tells you: to say $\Gamma \vdash a : A$ is to say that $a \in \dot{\mathcal{U}}(\Gamma)$ and $p_{\Gamma}(a) = A \in \mathcal{U}(\Gamma)$.

Naturality of $p$ tells us that subtitutions on terms match up with substitutions on types, namely if $\sigma : \Delta \to \Gamma$ and $\Gamma \vdash a : A$, then $\Delta \vdash a\sigma : A\sigma$.

Representability of $p$ tells us that we can extend a context by any type in that context: given $\Gamma \vdash A \text{ type}$, there is an extended context $\Gamma.A$, together with a projection $\mathsf{p}_A : \Gamma.A \to A$ and a term $\mathsf{q}_A \in \dot{\mathcal{U}}(\Gamma.A)$.

If you know the definition of a category with families, this may look familiar: that’s because the two notions are equivalent!

Polynomials

In any locally cartesian closed category $\mathcal{E}$, we can regard a morphism $f : B \to A$ as a polynomial, which induces a polynomial functor $P_f : \mathcal{E} \to \mathcal{E}$ defined by $$P_f(X) = \sum_{a : A} X^{B_a}$$

The map $p : \dot{\mathcal{U}} \to \mathcal{U}$ lives in the presheaf category $\mathcal{E} = [\mathbb{C}^{\mathrm{op}}, \mathbf{Set}]$, which is locally cartesian closed; regarding $p$ as a polynomial, we obtain a polynomial functor $P=P_p : \mathcal{E} \to \mathcal{E}$ as before.

The polynomial functor $P$ turns out to be useful for describing the features of the natural model $(\mathbb{C},p)$ corresponding with syntactic rules of a dependent type theory $\mathbb{T}$. For example, the object $P(\mathcal{U})$ is described in the internal language of $\mathcal{E}$ by $\sum_{A:\mathcal{U}} \mathcal{U}^{[A]}$, where I’m writing $[A]=\dot{\mathcal{U}}_A$ for readability. Internally, the elements of $P(\mathcal{U})$ are pairs $\langle A,B \rangle$, where $A$ is a type and $B : A \to \mathcal{U}$ is a dependent type over $A$. Thus morphisms $P(\mathcal{U}) \to \mathcal{U}$ correspond with formation rules in type theory of the form $$\cfrac{\Gamma \vdash A \text{ type} \qquad \Gamma, x:A \vdash B(x) \text{ type}}{\Gamma \vdash C(A,B) \text{ type}}$$

A monad and an algebra (kind of)

In Steve’s paper on natural models, he describes the conditions on a natural model $(\mathbb{C}, p)$ which correspond with the syntactic rules for a unit type $\mathbf{1}$, dependent sum types $\sum_{x:A} B(x)$ and dependent product types $\prod_{x:A} B(x)$.

In each case, the condition on the natural model is simply the existence of a pullback square. These conditions can be translated into the language of polynomial functors in the following way:

  • The rules for a unit type correspond with the existence of a cartesian, strong natural transformation $\eta : 1 \Rightarrow P$;
  • The rules for dependent sum types correspond with the existence of a cartesian, strong natural transformation $\mu : P \circ P \Rightarrow P$;
  • The rules for dependent product types correspond with the existence of a cartesian morphism of polynomials $\zeta : P(p) \Rightarrow p$.

Having made this observation, any self-respecting category theorist would ask the following questions:

  • Is $(P,\eta,\mu)$ a monad?
  • Is $(p,\zeta)$ an algebra for this monad in any sense?

The answer to each question is… almost.

Remember, everything we do in the natural model reflects some kind of syntactic rule of the type theory. As such, the monad and algebra laws should mean something in the syntax. The following table shows the translation from the monad and algebra laws to the syntax at the level of types:

Monad laws Type theory
Associativity $\sum\limits_{\langle x,y \rangle \sum\limits_{x:A}B(x)} C(x,y) = \sum\limits_{x:A} \sum\limits_{y:B(x)} C(x,y)$
Left unit $\mathbf{1} \times A = A$
Right unit $A \times \mathbf{1} = A$
Algebra laws Type theory
Associativity $\prod\limits_{\langle x,y \rangle \sum\limits_{x:A}B(x)} C(x,y) = \prod\limits_{x:A} \prod\limits_{y:B(x)} C(x,y)$
Unit $A^{\mathbf{1}} = A$

The rules in the column on the right do not hold strictly in general. Rather, they hold up to canonical isomorphism. Thus at best we may hope for $(P,\eta,\mu)$ to be a pseudomonad and for $(p,\zeta)$ to be a pseudoalgebra over this pseudomonad.

What we do in our paper is to define the appropriate notion of equivalence, called adjustment, to state what it means to be a pseudomonad and pseudoalgebra in this sense. With this done, we prove that this is indeed what we have. I will not go into the details in this post, since that would amount to repeating what you can already read in the paper!

Conclusion

Our result elucidates a structure in type theory that, to our knowledge, has not been noticed before. In part this is because what was needed was the right point of view: in this case, what was needed was to use natural models to view dependent type theory through the lens of polynomial functors.

The result is also an interesting and perhaps unexpected connection between logic and abstract algebra, which raises the question of what other connections there may be out there!

Talk about formalisation at the GSS

Yesterday, I gave a talk at the CMU Graduate Student Seminar entitled Formalisation or: How I Learned to Stop Worrying and Love the Computer.

Abstract: What distinguishes mathematics from the empirical sciences is that we prove stuff with complete certainty. Ha, just kidding! If you look at the mathematical literature in a bit more detail, you’ll see a body of work littered with errors, omissions, disagreements, leaps of faith, appeals to intuition and duplicated efforts. In this talk, I will describe some ongoing efforts to use computers to help us overcome these problems, including work currently being done right here in our beloved Steel City. Prerequisites: nothing but an open mind and a hungry stomach.

Slides: available here.

Video recording: