import marimo as mo
import polars as pl
from curryparty import L, o

Curryparty: a pinch of Lambda calculus

You're tired of Turing Machines? Your functional programming friends make fun of you because you don't know what a \(\beta\)-reduction is? You think "currying" describes the action of adding spices to your dishes? Then you absolutely need to learn the basics of Lambda (\(\lambda\))-Calculus

What is this ?

This is a personal project I created from scratch, to fulfill the very specific mission of teaching lambda-calculus in an interactive, visual way. The source code of the project is freely available on GitHub, feel free to give it a star: https://github.com/rambip/curryparty Come with me, we're going on an adventure!

Prerequisites

1) To appreciate this journey into the world of pure functions, you need some prior knowledge of what a function is, or at least what one can do. I don't have a great resource for that. If you don't know what a function is, go learn some Python
  • (or any programming language of your choice that is at least 5 characters long. That's a good rule of thumb to avoid imperative and overcomplex languages: C, C++, Java, Rust)
2) You also need to have a sense of what "computability" is. This can be a very alien concept if you did not study Computer Science, and I think it gives a good motivation to understand \(\lambda\)-Calculus. As a teaser, I strongly recommend watching an introduction to the topic first.
Now that you're introduced to the subject, let's go !

A world of functions

Let's start with some rules:
  1. No loops, no if / then / else
  2. You can only use "0" and "1", and the only thing you can do on strings is concatenation: "a" + "b" to get "ab".
  3. No lists, no numbers, no operations (addition, subtraction, multiplication and so on)
  4. You cannot communicate with the external world inside the functions. You can only use "print" at the very end of your code.
  5. You can use functions.
What do you think you can do with these rules? Not much? Let's see...
"0" + "1"
# Not very interesting ...
'01'

Challenge 0

Print 0011 without using + more than 2 times.
Solution
def p0(x):
    # add "0" before
    return "0" + x

def p1(x):
    # add "1" before
    return "1" + x

p0(p0(p1(p1(""))))
This trick can seem strange, but you can do quite powerful things with it. Here is a new trick: pass functions as arguments. Here is one example to show what it allows you to do:
def p0(x):
    return "0" + x

def p1(x):
    return "1" + x

# second trick: pass functions instead of values
def four_times(f):
    return f(f(f(f(""))))

print(four_times(p0))
print(four_times(p1))
0000
1111

Challenge 1

Write a function boo that returns either 0 or 1, depending on the argument. Remember: no "if" and no arithmetic!
Solution
def true(a, b):
    # left argument
    return a

def false(a, b):
    # right argument
    return b

def boo(side):
    return side("1", "0")

boo(false) # '0'
boo(true) # '1'
We could have called these functions left and right. So why did I call these functions true and false ? There is a good reason for that. Look at this mind-bending magic:
def true(a, b):
    return a

def false(a, b):
    return b

def boo(side):
    return side("1", "0")

def logical_not(x):
    return x(false, true)

print(boo(logical_not(false)))
print(boo(logical_not(true)))
1
0

Challenge 2

Write a logical_or and a logical_and using the same structure as above.
Solution
def logical_or(a, b):
    return a(a, b)

def logical_and(a, b):
    return b(a, b)
print(
    [
        boo(logical_or(false, false)),
        boo(logical_or(false, true)),
        boo(logical_or(true, false)),
        boo(logical_or(true, true)),
    ]
)
['0', '1', '1', '1']
print(
    [
        boo(logical_and(false, false)),
        boo(logical_and(false, true)),
        boo(logical_and(true, false)),
        boo(logical_and(true, true)),
    ]
)
['0', '0', '0', '1']
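As a bonus exercise (not in the original set), the same selector trick also gives exclusive or. Everything is redefined below so the snippet stands alone, with boo returning '1' for true:

```python
def true(a, b):
    # selects the left argument
    return a

def false(a, b):
    # selects the right argument
    return b

def boo(side):
    return side("1", "0")

def logical_not(x):
    return x(false, true)

def logical_xor(a, b):
    # if a is true, the answer is "not b"; if a is false, it's just "b"
    return a(logical_not(b), b)

print([boo(logical_xor(x, y)) for x in (false, true) for y in (false, true)])
# → ['0', '1', '1', '0']
```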

Challenge 3

The Church numeral functions are defined this way:
def one(f, x):
    return f(x)

def two(f, x):
    return f(f(x))

def three(f, x):
    return f(f(f(x)))

...
Write a function product(a, b) that takes as input two "Church numeral functions" and returns a string with as many ones as the product of a and b. Hint: you can use this function:
def n_ones(n):
    def result(x):
        return n(p1, x)
    return result
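Before opening the solution, it may help to see what the hint function does on its own. A quick self-contained check (p1 and three repeated from earlier so this runs alone):

```python
# p1 and three as defined earlier in the notebook
def p1(x):
    return "1" + x

def three(f, x):
    return f(f(f(x)))

def n_ones(n):
    def result(x):
        return n(p1, x)
    return result

print(n_ones(three)(""))
# → 111
```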
Solution
def three(f, x):
    return f(f(f(x)))

def four(f, x):
    return f(f(f(f(x))))


def n_ones(n):
    def result(x):
        return n(p1, x)
    return result

def product(a, b):
    return a(n_ones(b), "")

product(four, three)
# result: '111111111111'
If you're not used to functional programming, you may have noticed something strange in the "n_ones" function:
def n_ones(n):
    def result(x):
        return n(p1, x)
    return result
We created a function on the fly, used some of the current context inside it (in this case, the variable n), and returned it immediately. We gave it a name, but this name is completely arbitrary. We could have written it like this:
def n_ones(n):
    return lambda x: n(p1, x)
What is this "lambda" keyword doing here? Well, it allows us to do exactly what we wanted: to create a function on the fly and to return it immediately. That's the core idea of Lambda-calculus. I hope you start to have an intuition of what lambda functions are, and what they let you do. The next step is to define more formally what these "lambdas" are. Personally, the first time I learned about lambda calculus was in this video from Computerphile. I highly recommend it.

Terms, lambdas and applications

What we call a "Lambda-function", or \(\lambda\)-function, is a mathematical description of the functions we played with in the last part. A lambda-function is "something with variables" that "returns something". Let's create one:
L("x").o("x")
Here is how to read it:
  • the blue block means "take a value"
  • the gray line means "pass the value"
  • the red block means "return the value"
So what we created is the simplest possible lambda function: a machine that takes x and returns x. In python, this would be:
def _lambda(x):
    return x
But a lambda can have multiple variables:
return_first = L("x", "y", "z").o("x")
return_second = L("x", "y", "z").o("y")
return_third = L("x", "y", "z").o("z")
mo.hstack([return_first, return_second, return_third])
Let's focus on the first diagram. It has multiple blue blocks, which means we take multiple things. To track them, let's give them names:
  • the first blue block (from the top) takes x
  • the second blue block takes y
  • the third blue block takes z
The red block still means "return the value". But which one? To know that, we have to look at the gray line. Since the gray line connects the first blue block with the red block, that means it returns the first value, x. From top to bottom, these functions are:
lambda x, y, z: x
lambda x, y, z: y
lambda x, y, z: z
But there is another way to think about the machines we created. Let's focus on the last one. It's like a machine that takes x, and returns a new machine. This machine takes y, and returns yet another machine. This last machine takes z, and returns z. You may find it confusing to think about the lambda-functions we created in this way, but it's worth it. You can see the exact same phenomenon in python:
def add(a, b):
    return a+b

def add_curried(a):
    def add_to_a(b):
        return a+b
    return add_to_a

# or equivalently
def add_curried(a):
    return lambda b: a + b


add_curried(3)(4)
This process of transforming a function with 2 arguments into a function that returns a new function has a name: it's called currying (in honor of Haskell Curry, another big name in the world of lambda-calculus). So we can take stuff and return stuff. We can't do much yet. We're missing something: applying (or calling) functions. Let's see that in action:
L("x", "f").o("f", "x")
Let's unpack it. This lambda function takes a value x (first blue block), then a function f (second blue block), uses the function f (left gray line) and passes the value x (right gray line) to it (black horizontal line). For readability, the function that is called is under a black dot, and has a yellow border. That's it! With these 3 ingredients (blocks that take arguments, blocks that return values, blocks that apply functions), we can build any function we want. Before we go on, some terminology:
  • a "Lambda", or \(\lambda\), is a blue block. It corresponds to a single variable.
  • a "Variable" is a red block. It is "bound" to a specific lambda, indicated by a gray line
  • an "Application" is a horizontal black line. The function is the stuff inside the yellow border and the argument is at its right.
  • a "Term" is any combination of the above. If the thing at the top is a Lambda, it's called a lambda-function.
Let's define the \(\lambda\)-terms we already saw in the python challenges:
do_nothing = L("x").o("x")
do_nothing  # it's also called the "identity" function
l_false = L("a", "b").o("a")  # takes the left (first) argument
l_true = L("a", "b").o("b")  # takes the right (second) argument
l_false
l_true
Remember the "Church" functions from the python challenges? The concept comes from \(\lambda\)-calculus, where they are often called "Church numerals" (Church was the main creator of \(\lambda\)-calculus). As a reminder:
zero = lambda f, x: x
one = lambda f, x: f(x)
two = lambda f, x: f(f(x))
...
Look at the graphical representation below (use arrows to navigate). It's the same structure each time:
  • take f (blue block)
  • take x (blue block)
  • pass x through the function f a given number of times.
zero = L("f", "x").o("x")
one = L("f", "x").o("f", "x")
two = L("f", "x").o("f", o("f", "x"))
three = L("f", "x").o("f", o("f", o("f", "x")))
four = L("f", "x").o("f", o("f", o("f", o("f", "x"))))
mo.hstack([zero, one, two, three])
Let's try something harder.
s_myst = L("n", "m", "f", "x").o("n", o("m", "f"), "x")
s_myst
This starts to look a bit abstract, but with a bit of practice, these diagrams can be read pretty easily. As an exercise, let's write the corresponding python function, just by looking at the diagram. The first step is to create 4 lambdas:
mysterious_function = lambda x1: lambda x2: lambda x3: lambda x4: ...
Then, the rule of thumb is: read bottom to top. By looking at the 2 red squares at the right of the last line, we see that x2 is called with the argument x3. Then, x1 is called on the result. Let's call this expression \(\alpha = x_1(x_2(x_3))\)
mysterious_function = lambda x1: lambda x2: lambda x3: lambda x4: ... x1(x2(x3)) ...
The last step is to call \(\alpha\) with the argument x4. That is what the yellow border without a background means.
mysterious_function = lambda x1: lambda x2: lambda x3: lambda x4: x1(x2(x3))(x4)
In a textbook, this function would be written as: \(\lambda x_1. \lambda x_2. \lambda x_3. \lambda x_4. \quad x_1 (x_2 \; x_3) \, x_4\) Exact same idea, except we place the parentheses differently for readability.

Aside: composing and forwarding

If we take a step back, we see that there are 2 ways to assemble a sequence of variables: composing and forwarding. In python, you can think about them this way:
def compose(a, b, c, d, e, x):
    return a(b(c(d(e(x)))))

def forward(a, b, c, d, e, x):
    return a(b, c, d, e, x)
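To make the difference tangible, here is a small self-contained run using the string-building functions from the earlier challenges (the helper passed to forward is my own illustration):

```python
def p0(x):
    return "0" + x

def p1(x):
    return "1" + x

def compose(a, b, c, d, e, x):
    return a(b(c(d(e(x)))))

def forward(a, b, c, d, e, x):
    return a(b, c, d, e, x)

# composing: each function feeds its result to the next one
composed = compose(p0, p1, p0, p1, p0, "")
print(composed)  # → 01010

# forwarding: `a` receives all the other pieces and decides what to do with
# them; here it turns forwarding back into composing
forwarded = forward(lambda f, g, h, k, x: f(g(h(k(x)))), p0, p1, p0, p1, "")
print(forwarded)  # → 0101
```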
Below is an illustration. The "Application" nodes are represented with the \(\circ\) symbol.

Composition

Forwarding

Now in our lambda calculus block representation:
compose = L("f", "g", "h", "x").o("f", o("g", o("h", "x")))
forward = L("f", "g", "h", "x").o("f", "g", "h", "x")
compose
forward
Cool right ?

mysterious function

Let's go back to our mysterious function:
s_myst
It turns out that this function implements the multiplication of Church numerals. Let's try it out:
s_myst(three)(four)
Wait, what is this monstrosity? Well, it's a term. It does not have a \(\lambda\) at the top, so it's clearly not a lambda function. It's still a term, a term that bundles together s_myst, three and four. But this is not what we want. We want to know the result of this operation. Right now, we can't. We did not specify how terms are supposed to be transformed. We need a way to run them, like in python: a semantics. Don't worry, I have implemented it. Let's jump directly to the result (on two and three this time):
s_myst(two)(three).reduce()
And indeed, we get 6. But you may wonder (and you should): how did we go from the monstrosity to the nice "6" result? I spent weeks animating this stuff, so I hope you will enjoy it (for lack of understanding it, at least for now). Click on the left and right arrows to navigate, click on the figure to animate.
mo.carousel([x.show_beta() or x for x in s_myst(three)(four).reduction_chain()])
click to animate, move away and back to reset

The beta reduction

As one dives deeper into the formalism and the internals of lambda-calculus, it's easy to get lost. To make it easier, it's fundamental to hold on to what we already know. And what do we know already: how python functions work! Or at least, we can have a pretty good idea. So let's take an example, and think about how python would calculate it. Here is the term we will analyse: \[ \lambda n \lambda f \lambda x. f (n f x) \] If you want, click below to see what this function computes. If you prefer trying to guess it as we detail each step of the calculation, don't.
Spoiler: what does this function compute? This function is the successor function. It takes a Church numeral (e.g. two = \(\blue{\lambda f \lambda x. f(f(x))}\)) and returns the next Church numeral (e.g. three = \(\blue{\lambda f \lambda x. f(f(f(x)))}\))
s = L("n", "f", "x").o("f", o("n", "f", "x"))
stuff_to_compute = s(one)
stuff_to_compute
and here is the python version:
succ_py = lambda n: lambda f: lambda x: f(n(f)(x))
one_py = lambda f: lambda x: f(x)
# TODO: compute `succ_py(one_py)`

Challenge 4

Try to find what the first step is for python to compute succ_py(one_py)
Solution To compute succ_py(one_py), the key is to replace the variable n in succ_py. Since we pass one_py as the argument, we must replace n by one_py. Let's do it:
# before
lambda n: lambda f: lambda x: f(n(f)(x))

# after
lambda f: lambda x: f((  lambda f: lambda x: f(x)    )(f)(x))
It starts to be hard to read, so let's use the more math-y representation: \[ \big (\green{\lambda n} \lambda f \lambda x. f(\green{n} f x) \big)(\blue{\lambda f \lambda x. f(x)})\\ \downarrow \\ \lambda f \lambda x. f\big((\blue{\lambda f \lambda x. f(x)}) f x\big) \] When you see the pattern, it starts to make sense. The first argument was \(\green n\), and since we have provided this argument with the value \(\blue{\lambda f \lambda x. f(x)}\), we replace each occurrence of \(\green n\) with \(\blue{\lambda f \lambda x. f(x)}\)
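We can also check this step in plain Python, by feeding both the "before" and "after" terms the string-builder p1 from the earlier challenges: both should behave identically.

```python
def p1(x):
    return "1" + x

one_py = lambda f: lambda x: f(x)
succ_py = lambda n: lambda f: lambda x: f(n(f)(x))

# the term before the substitution...
before_sub = succ_py(one_py)
# ...and the term after replacing n by one_py
after_sub = lambda f: lambda x: f((lambda f: lambda x: f(x))(f)(x))

print(before_sub(p1)(""), after_sub(p1)(""))
# → 11 11
```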
Once you feel like you understand, I invite you to look at the animated version below. It's exactly the same thing, just with blocks instead of letters.
stuff_to_compute.show_beta()
click to animate, move away and back to reset
Now, we have a new expression. Let's rewrite it here: \[ \lambda f \lambda x. f\big((\lambda f \lambda x. f(x)) f x\big) \] If you prefer the python version:
lambda f: lambda x: f((lambda f: lambda x: f(x))(f)(x))
There is something spooky going on here: lambda f appears twice and lambda x twice. We can apply a trick: renaming
lambda f: lambda x: f((lambda f1: lambda x1: f1(x1))(f)(x))
It's easy to get lost and rename the wrong thing. Fortunately, our graphical representation does not have this problem. Instead of having to "rename", we have to "reconnect" the variables to the right block.

Aside: renaming

Let's consider this expression: \[ \big(\lambda f. f(f(f))\big)(\blue{\lambda x \lambda y. x}) \] When we apply the first operation, we get this (quite long) expression: \[ (\blue{\lambda x \lambda y. x})\big((\blue{\lambda x \lambda y. x})(\blue{\lambda x \lambda y. x})\big) \] But \(x\) and \(y\) don't have the same role in the first group, in the second group and in the third group. So we could rename them, just like in the previous example: \[ (\blue{\lambda x_1 \lambda y_1. x_1})\big((\blue{\lambda x_2 \lambda y_2. x_2})(\blue{\lambda x_3 \lambda y_3. x_3})\big) \] Now compare this with the graphical representation:
_a = L("f").o("f", "f", "f")
_b = L("x", "y").o("x")
_a(_b)
We don't have to rename anything, because the blue blocks (corresponding to ||(\lambda x \lambda y||)) are duplicated, and the new red blocks (the variables) are connected to the right blue blocks.

Challenge 5

Let's go back to our expression. After the first step, we arrived at:
lambda f: lambda x: f((lambda f1: lambda x1: f1(x1))(f)(x))

# it's a bit hard to read, so let's rewrite it:
def lambda_(f):
    def lambda_(x):
        inside_block = lambda f1: lambda x1: f1(x1)
        return f(inside_block(f)(x))
    return lambda_
Do you see a way to simplify this function, meaning to write a simpler function that does exactly the same thing ?
Solution:
lambda f: lambda x: f(f(x))
The exterior block takes "f" and "x", and forwards them to the interior block. The interior block takes "f", renames it "f1", takes "x", renames it "x1", and returns "f1(x1)", which is just "f(x)". Last step: the exterior block takes the result "f(x)" from the interior block and applies f one last time: "f(f(x))"
You can now look at the corresponding animations:
mo.carousel([x.show_beta() or x for x in stuff_to_compute.beta().reduction_chain()])
click to animate, move away and back to reset
Congratulations! You just understood how lambda calculus works. Let's give a name to the operation used to go from one term to the next: beta-reduction. We write it like this: \[ \big (\green{\lambda n} \lambda f \lambda x. f(\green{n} f x) \big)(\blue{\lambda f \lambda x. f(x)}) \rightarrow_{\beta} \lambda f \lambda x. f\big((\blue{\lambda f \lambda x. f(x)}) f x\big) \] To discuss beta-reduction even more precisely, we need to introduce a little more terminology. Let's take this expression: \[\big (\green{\lambda n} \lambda f \lambda x. f(\green{n} f x) \big)(\blue{\lambda f \lambda x. f(x)})\] The entire formula is called a redex (the stuff we want to reduce). The \(\green{\lambda n}\) is the root-lambda of the redex, and \(\blue{\lambda f \lambda x. f(x)}\) is the argument of the redex. Each \(\green n\) appearing in the term is an occurrence "bound to the root-lambda". To recap, the algorithm is:
  • find a redex of the form \((\green{\lambda x}\, E)(\blue{A})\)
  • replace every occurrence of \(\green x\) in \(E\) that is bound to the root-lambda with a copy of \(\blue A\)
  • each time you replace, rename all the variables that need renaming (*)
  • remove \(\green{\lambda x}\) and \(\blue A\)
(*) Note: I did not detail all the cases for the renaming step. It's subtle: you have to rename a variable only if it's bound to the root lambda of the argument \(\blue A\), because you don't want to change variables that appear higher in the tree.
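To make the algorithm above concrete, here is a minimal sketch of it in plain Python. This is not curryparty's actual implementation: I encode terms as nested tuples ("var", name), ("lam", name, body) and ("app", function, argument), which is my own choice.

```python
import itertools

_fresh = itertools.count()

def free_vars(t):
    kind = t[0]
    if kind == "var":
        return {t[1]}
    if kind == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def substitute(t, name, value):
    """Replace every free occurrence of `name` in `t` by `value`."""
    kind = t[0]
    if kind == "var":
        return value if t[1] == name else t
    if kind == "app":
        return ("app", substitute(t[1], name, value), substitute(t[2], name, value))
    bound, body = t[1], t[2]
    if bound == name:                 # `name` is shadowed: nothing to do below
        return t
    if bound in free_vars(value):     # rename to avoid capturing a variable of `value`
        fresh = f"{bound}_{next(_fresh)}"
        body = substitute(body, bound, ("var", fresh))
        bound = fresh
    return ("lam", bound, substitute(body, name, value))

def beta_step(t):
    """One leftmost-outermost reduction; returns None if `t` has no redex."""
    if t[0] == "app" and t[1][0] == "lam":        # found a redex
        return substitute(t[1][2], t[1][1], t[2])
    if t[0] == "app":
        left = beta_step(t[1])
        if left is not None:
            return ("app", left, t[2])
        right = beta_step(t[2])
        if right is not None:
            return ("app", t[1], right)
        return None
    if t[0] == "lam":
        body = beta_step(t[2])
        return None if body is None else ("lam", t[1], body)
    return None                                    # a lone variable

def normalize(t, max_steps=1000):
    for _ in range(max_steps):
        nxt = beta_step(t)
        if nxt is None:
            return t
        t = nxt
    raise RuntimeError("no normal form found")

# succ applied to the Church numeral one reduces to two:
one = ("lam", "f", ("lam", "x", ("app", ("var", "f"), ("var", "x"))))
succ = ("lam", "n", ("lam", "f", ("lam", "x",
        ("app", ("var", "f"), ("app", ("app", ("var", "n"), ("var", "f")), ("var", "x"))))))
print(normalize(("app", succ, one)))
# → ('lam', 'f', ('lam', 'x', ('app', ('var', 'f'), ('app', ('var', 'f'), ('var', 'x')))))
```

The renaming logic in substitute is exactly the subtle step discussed in the note above.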
There is an important thing to note here: I said find a redex and not take the redex. We have already seen this in action, but you may not have paid attention. Look back at:
stuff_to_compute.beta().beta()
In the graphical representation, the redexes appear in blue with a yellow border. (this makes sense: it's a lambda (blue) with an argument (yellow)). Here, the redex is not at the top of the expression, it's somewhere inside. But it can be even worse, for example:
_e = L("y").o(L("x").o("x"), "y")
L("f").o("f", _e, _e)
Here, we have 2 potential redexes to choose from for our beta-reduction. Which one do we choose? It does not matter. Well, it does matter, but... it does not really matter. Let me introduce: termination and the Church-Rosser theorem

Termination and the Church-Rosser theorem

All the terms I used from the beginning were really nice: they simplified (more precisely, beta-reduced) until they reached a very simple form: a form with no potential redex. Is it always the case? It might surprise you, but no. There is even a simple expression that never stops reducing. Here it is: \[ \bigg (\lambda f. f(f)\bigg)\bigg(\lambda g. g(g)\bigg) \]

Challenge 6: Try to reduce this expression, and see what happens.

Solution \[ \bigg (\green{\lambda f}. \green f(\green f)\bigg)\bigg(\blue{\lambda g. g(g)}\bigg) \rightarrow_{\beta} (\lambda g. g(g))(\lambda g. g(g)) \rightarrow_{\beta} (\lambda g. g(g))(\lambda g. g(g)) \rightarrow_{\beta} \dots \] This expression reduces to itself!
Now for the animated version:
omega = L("f").o("f", "f")
omega(omega).show_beta()
click to animate, move away and back to reset
And it's not just a quirk of our rules: the exact same thing happens in python (click on the button to run):
if bomb_button.value:
    (lambda f: f(f))(lambda g: g(g))
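For the record (an observation about CPython, not about lambda calculus): the loop above does not actually spin forever, because every application adds a stack frame, so Python eventually aborts with a RecursionError. A safe way to try it:

```python
try:
    # infinite self-application: each call immediately makes another call
    (lambda f: f(f))(lambda g: g(g))
except RecursionError:
    print("Python gave up: infinite self-application")
# → Python gave up: infinite self-application
```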
Time for a recap:
  • A lambda-term can have zero, one or multiple "redexes" of the form \((\green{\lambda x}. E)(\blue A)\)
    • if it has no redex, we can't reduce it: we're done. Let's call it the "final form"
  • We can "beta-reduce" a term
    • start by choosing a redex if there are multiple
    • apply the procedure to transform the redex. You get a new term
  • Some terms are "nice". A "nice" term:
    • has a single redex
    • reduces to a term that has a single redex, and we can keep reducing ...
    • reaches a "final form" if we reduce enough times
  • This means that:
    • some terms are not nice because they have multiple redexes
    • some terms are not nice because they never reduce to a final form.
How do we handle "not nice" terms? How do we choose the right redex to reduce? Will this change the result? How do we know if a term will eventually reach a final form? These are the kinds of questions the creators of lambda-calculus asked themselves. And since they were mathematicians, they worked very hard to get answers. Let's discuss three fundamental theorems of lambda-calculus

Church-Rosser theorem

The Church-Rosser theorem states that if you take a term, and start applying beta-reductions in two different ways, you can't end up with completely different results. More precisely, if Alice applies some number of reductions until she reaches A, and if Bob applies some other number of reductions until he reaches B, there is a way to keep applying reductions to A and B until they become exactly the same. It DOES NOT MEAN that you can pick the beta-reductions at random. It's possible that Alice is smart and finds a way to reduce to a final form, while Bob is dumb and keeps reducing forever, because he chose the wrong redex each time. But at any step, if Bob becomes smart, he can still reach a final form. It DOES mean that there is only one possible final form. We call it the normal form. Below is an example where the choice of redex matters.

Challenge 7

Find a strategy to reduce to a normal form, and a strategy to keep reducing forever. \[ (\lambda a \lambda b. b) \bigg((\lambda f. f(f))(\lambda g. g(g))\bigg) \big(\lambda f\lambda x. f(x)\big) \] If you prefer the graphical version:
l_true(omega(omega))(one)
Solution If you start by reducing the outer redex two times, you will drop the big expression in the middle. \[ (\green {\lambda a} \lambda b. b) \bigg(\blue{(\lambda f. f(f))(\lambda g. g(g))}\bigg) \big(\lambda f\lambda x. f(x)\big) \\ \downarrow_\beta \\ (\lambda b. b) \sout {\bigg((\lambda f. f(f))(\lambda g. g(g))\bigg)} \big(\lambda f\lambda x. f(x)\big) \\ \downarrow_\beta \\ \lambda f\lambda x. f(x) \] But if you reduce the inner redex, you get the same term back. You can go on forever: \[ (\lambda a \lambda b. b) \bigg((\green{\lambda f}. \green f(\green f))(\blue{\lambda g. g(g)})\bigg) \big(\lambda f\lambda x. f(x)\big) \\ \downarrow_\beta \\ (\lambda a\lambda b. b) \bigg((\lambda f. f(f))(\lambda g. g(g))\bigg) \big(\lambda f\lambda x. f(x)\big) \]
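The two strategies can be mimicked in plain Python. Python is applicative-order (it evaluates arguments first, like always reducing the inner redex), but wrapping arguments in zero-argument lambdas (thunks) lets us play the outer strategy by hand. The names below are my own illustration:

```python
omega = lambda g: g(g)

# plays the role of (λa λb. b): it ignores `a` and only forces `b`
drop_first = lambda a: lambda b: b()

# wrapped in a thunk, the looping term is passed around but never evaluated
result = drop_first(lambda: omega(omega))(lambda: "normal form reached")
print(result)
# → normal form reached
```

Passing omega(omega) directly (without the thunk) would recurse forever, exactly like Bob's strategy.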

The leftmost-outermost reduction is normalizing

This looks scary, but it is simpler than the first theorem. It states that the smart way to reduce is to always take the first redex: "leftmost-outermost" is a fancy way to say "the first lambda you see". If you use this reduction each time, you know that you will get the final form IF IT EXISTS. This is the reduction I implemented in the graphical animations on this page. For example, we can see that the reduction of the previous expression (challenge 7) reaches the final form:
mo.carousel(
    [x.show_beta() or x for x in l_true(omega(omega))(one).reduction_chain()]
)
click to animate, move away and back to reset

Turing-completeness and the halting problem

Now for the really cool stuff. How do we know if a term will reach its normal form (in this case, we say the reduction "terminates")? Well, we can't! The only way is to keep applying the smart leftmost-outermost reduction and hope we reach the final form one day. The reason why it's impossible to know is that lambda-calculus is Turing-complete. We can do absolutely any computation with the right term at the start. You can take a program written in C, and transform it into a lambda-term that will do exactly the same thing. This is not a joke! Someone even created a compiler from C to lambda-calculus 🤯 And since we can't know if a program will stop or not, we can't know if a lambda-term will reach a normal form. If you don't already know why that is the case, go watch this video on the Halting problem:

Bonus content

This part is for the lambda-calculus hipsters.
pred = L("n", "f", "x").o(
    "n",
    L("g", "h").o("h", o("g", "f")),
    L("u").o("x"),
    L("u").o("u"),
)
pred
pred(two).reduce()
pair = L("a", "b", "x").o("x", "a", "b")
first = L("x").o("x", l_false)
second = L("x").o("x", l_true)
pair
sum = L("n", "m", "f", "x").o(o("n", "f"), o("m", "f", "x"))
sum(four)(three).reduce()
fib_pair = (
    L("p").o(pair, o(sum, o(first, "p"), o(second, "p")), o(first, "p"))
).reduce()
fib_pair
five = succ(four)
five(fib_pair)(pair(one)(zero)).reduce()
fact = L("n", "f").o(
    "n",
    L("f", "n").o("n", o("f", L("f", "x").o("n", "f", o("f", "x")))),
    L("x").o("f"),
    L("x").o("x"),
)
fact
fact(four).reduce()
# the Y combinator
y = L("f").o(
    L("g").o("f", o("g", "g")),
    L("g").o("f", o("g", "g")),
)
y
mo.carousel([x for x in fact(three).reduction_chain()])
div2 = (
    L("n")
    .o(
        "n",
        L("p").o(
            o(second, "p"),
            o(pair, o(first, "p"), l_true),
            o(pair, o(succ, o(first, "p")), l_false),
        ),
        o(pair, zero, l_false),
    )
    .reduce()
)
div2
# divide three by 2 in 51 steps
mo.carousel([x for x in div2(three).reduction_chain()])
pack = L("f", "p").o("f", o(first, "p"), o(second, "p")).reduce()
unpack = L("f", "a", "b").o("f", o(pair, "a", "b"))
pack
len(list((div2(succ(succ((four))).reduce()).reduction_chain())))
154
div2(four).reduce()
mult = L("n", "m", "f", "x").o("n", o("m", "f"), "x")
succ = L("n", "f", "x").o("f", o("n", "f", "x"))
is_not_zero = L("n").o("n", L("f").o(l_true), l_false).reduce()
syracuse = y(
    L("F", "n").o(
        o(
            L("half_and_reminder").o(
                o(is_not_zero, o(first, "half_and_reminder")),
                one,
                o(
                    pair,
                    "n",
                    o(
                        "F",
                        o(
                            o(second, "half_and_reminder"),
                            # even: n / 2
                            o(first, "half_and_reminder"),
                            # odd: 3 * n + 1
                            o(succ, o(mult, three, "n")),
                        ),
                    ),
                ),
            ),
            o(div2, "n"),
        )
    )
)
syracuse
# run at your own risk ...
# syracuse(five).reduce()