Simplicity: Holes and Side Effects
Blockstream Research Simplicity Language

Simplicity: Holes and Side Effects

Andrew Poelstra

Unlike most programming languages, Simplicity expressions are constructed from a series of combinators, which build up expressions from smaller ones. Such expressions represent functions taking some input and mapping it to some output.

Here is the simplest Simplicity program:

main := iden

This program takes an empty input and produces an empty output. Let’s do something more interesting, taking a bit and inverting it:

   bit0 := injl unit : 1 * 1 -> 2
   bit1 := injr unit : 1 * 1 -> 2

   padR := pair iden unit     : 2 -> 2 * 1 
   bitflip := case bit1 bit0  : 2 * 1 -> 2
   main := comp padR bitflip  : 2 -> 2

Here the final Simplicity expression is our main function. By reading from bottom to top, we can see that it is the composition of several smaller functions, eventually grounding out in unit combinators. Unit combinators produce an empty output. Along with iden, which takes an arbitrary input and returns it unchanged, and witness, which outputs an externally-provided value, unit combinators form the basis for every Simplicity expression.

In this code snippet, we’ve provided type annotations to help indicate what data is flowing from each part of the expression to the next. Here “1” represents the unit type, which is what we’ve been referring to as an “empty output”. “2” represents a bit, which may take the value 0 or 1.

Don’t worry if you didn’t follow this snippet in any detail. It’s only here to show visually what Simplicity code looks like. Later in this post, we’ll analyze a more interesting program more carefully.

A Specification Language


Ultimately, all programming languages express the idea of combining computations to build larger ones. Bitcoin Script (like Forth, or PostScript) does this by means of a stack machine, in which each computation manipulates a list of data objects in turn. Ethereum’s EVM uses a function call architecture where code can call other blocks of code (possibly in other contracts), which take data as input and return data as output. This is also the model used by pretty-much all mainstream programming languages today.

Programming languages use side effects to interact with the outside world, such as the transaction being spent. In EVM, contracts also have access to key-value data stores, which allow them to maintain state across transactions without explicitly passing it.

In both Bitcoin Script and EVM, bit inversion can be done with a single opcode (though the Bitcoin version can take many different inputs, most of which do not resemble bits, while the EVM version acts bitwise but always on 256 bits at once). It’s reasonable to ask why the Simplicity model leads to such verbosity for such simple things.

Our goal with Simplicity is to formally specify the semantics of programs, both in terms of what functions they represent, and what computational resources are involved in evaluating them. It would be possible, in principle, to formally specify a language which operates like Bitcoin Script or EVM, but the task would be extremely difficult, and the result would be nearly as difficult for anybody to verify. Instead, with Simplicity, we defined a small set of basic combinators, each of which reflects a mathematically fundamental way of combining functions. The resulting language’s semantics are so small that they fit on a T-shirt.

In practice, for things like bit inversion, this code will only be written once, then reused. In fact, we’ve done the writing for you and built it into the language. When writing your own code, you can simply use the shortcut jet_not in place of the above code. We will talk more about jets in the next section.

Eventually, in practice, almost nobody will be writing Simplicity code at all. Simplicity provides a well-defined foundation for blockchain programs, but it is just that: a foundation. The real work will be done in higher-level languages, which compile down to Simplicity code alongside proofs of their correct operation.

Programs Versus Expressions


Expressions that have trivial input and output values have a special name. We call these expressions programs. Mathematically speaking, only one function exists mapping a trivial input to a trivial output, so it’s not obvious that such Simplicity expressions would be useful at all. But not only are they useful, they’re the only such expressions allowed on the blockchain!

So, what is the strategy? There are actually two: one is that the Simplicity programs committed in addresses have holes in them, which are only filled in when coins are actually spent; the other is that Simplicity programs have side effects, which allow them to access blockchain data or abort early, thereby stepping outside the bounds of pure mathematical functions.

There are two ways to specify holes in Simplicity programs: the disconnect and witness combinators. disconnect is a bit subtle, and we have talked about it in the past. But witnesses are very simple: the witness combinator returns a value of a given type, called a witness, (such as a digital signature, hash preimage, or specific choice of signing keys).

There are also two ways that Simplicity supports side effects: one is by introspecting transaction data (which we will also discuss more in a later post), and the other is by assertions. Assertions halt the program execution. In Bitcoin Script, the VERIFY opcode is used for assertions, and allows the program to fail if it encounters an invalid signature or other data. EVM uses the STOP opcode for the same purpose.

We can now see how a program whose input and output types are nominally trivial, can actually effect a useful computation: witnesses serve the purpose of program inputs, while assertions provide a “pass/fail” program output.

Let’s consider another example. Here is a program that takes two witnesses and checks them for equality:

-- Witnesses
wit1 := witness : 1 -> 2^32 
wit2 := witness : 1 -> 2^32 

-- Program code
pr1 := pair wit1 wit2 : 1 -> 2^64
jt2 := jet_eq_32      : 2^64 -> 2
cp3 := comp pr1 jt2   : 1 -> 2
jt4 := jet_verify     : 2 -> 1

main := comp cp3 jt4  : 1 -> 1

We can read this program from bottom to top: the program itself is a single expression called main, of type 1->1. Here the “type” of an expression indicates what kind of values it takes as input and output. The 1 type has only a single value, so describes no information. The 2 type has two possible values, so it can represent a single bit, 2^32 is a 32-bit string, and so on.

From the bottom equality, we see main is the composition of the expressions with names cp3 and jt4. Then cp3 is itself the composition of pr1 and jt2, and so on. There are five combinators used in this program:

  • comp, or composition, takes two expressions and evaluates the second with the first’s output as input.

  • pair takes two expressions and produces a new expression whose output is a pair of values: the left part is the output of the first expression, and the right part the output of the second.

  • witness takes no input but has a polymorphic type as its output: whatever type is needed for the program to make sense, that’s its type. And the value it yields is one provided at evaluation time.

  • jet_eq_32 is a jet, which means it is a single combinator that replaces a larger Simplicity expression. The replaced expression is called its specification, and a Simplicity interpreter can evaluate the jet either by evaluating the specification, or by using optimized machine code which has the same effect. This particular jet takes two 32-bit values and outputs a single bit, indicating whether the two values were equal (1) or unequal (0).

  • jet_verify is also a jet. This one takes a single bit as input, and if that bit is 0, it aborts the program. It has no output.

Conclusion


This was a dense blog post in which we saw what Simplicity code looks like, and observed that even simple-seeming functionality can look pretty complex when it’s written out explicitly. But this complexity reflects what’s actually going on inside. As a specification language, Simplicity allows us to make everything explicit, so that we can mathematically model it, prove the essential properties that we need for our applications, and then encapsulate it, confident that we have a rock-solid foundation for the things we believe about our code.

Simplicity has several more combinators and many jets, which we’ll talk about in future posts. As we build up the language, we’ll explore more practical examples and look at what it means to “prove” properties of programs.

My next post will introduce the concept of sharing, which allows two identical expressions to be merged into one, and introduces some subtleties related to what it means to be “identical”.

Join the Simplicity discussions on GitHub to ask questions, connect with the community, and share your insights. And follow us at @blksresearch on Twitter to stay up to date with the latest in Simplicity blog post releases.


If you have specific preferences, please, mark the topic(s) you would like to read: