Not all compilers are created equal
Types, restrictions and guarantees
Coercing strings and numbers can be really handy when creating messages, but as the epic talk WAT by Gary Bernhardt shows, it can get weird really fast, which can lead to unexpected errors.
In contrast, statically typed languages such as TypeScript or PureScript make us think about types explicitly. Most languages will infer most of the types so we don’t have to be too verbose, but at some point, we’ll have to provide some information about the data we want to compute, and how we are going to compute it. That information will help other programmers (or even our future self) understand the code, and it will allow our tooling to give us information and warnings, apply automatic fixes and even assist with refactoring. If there is a problem with the program, we’ll have an error at compile time, so the feedback loop will be shorter.
Each language can introduce different restrictions that impact the way we program. These restrictions will give us certain guarantees that will increase our confidence in the code. For example, if the language doesn’t allow us to use null, we’ll have a guarantee that we won’t have NullPointerExceptions, the billion dollar mistake, and we’ll probably need a different concept to represent failure or emptiness.
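To make the idea of "a different concept to represent failure or emptiness" concrete, here is a minimal TypeScript sketch. The names Maybe, just, nothing, findFirst and describeResult are hypothetical and chosen only for illustration; they are not taken from any particular library.

```typescript
// A hypothetical Maybe type: a value is either present (Just) or absent (Nothing).
type Maybe<A> = { tag: "Just"; value: A } | { tag: "Nothing" };

const just = <A>(value: A): Maybe<A> => ({ tag: "Just", value });
const nothing = <A>(): Maybe<A> => ({ tag: "Nothing" });

// Looking up an element can fail, and the return type says so explicitly.
function findFirst<A>(items: A[], predicate: (item: A) => boolean): Maybe<A> {
  for (const item of items) {
    if (predicate(item)) return just(item);
  }
  return nothing();
}

// The caller has to handle both cases before it can touch the value.
function describeResult(result: Maybe<number>): string {
  switch (result.tag) {
    case "Just":
      return `found ${result.value}`;
    case "Nothing":
      return "not found";
  }
}
```

Because the absent case is part of the type, forgetting to handle it becomes a compile-time problem instead of a runtime surprise.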
TypeScript vs PureScript
TypeScript took off in 2015 when Angular announced that it was building its second version with it. The decision to closely follow JS, the developer experience from using tools like VS Code and the confidence given by embracing its restrictions encouraged other teams to rewrite big projects like Vue, Jest and Yarn. According to the State of JS 2018, TypeScript adoption doubled from 2016 to 2018. All of this resulted in an explosion of learning resources and a big, healthy ecosystem.
PureScript is not that popular in comparison, but functional programming, in general, has caught the eye of many developers. Languages like PHP or Java added lambda expressions, which enable the use of higher-order patterns, and the popularity of libraries like React or Redux helped people adopt pure functions and immutability. Other languages such as Elm have bigger communities and are a really good starting point in the functional world, but PS has some nice features that we’ll analyze in the post. Despite being small, the PureScript community is very active in the Functional Programming Slack (#purescript channel) and on its Discourse page.
Dissecting the output
In this section, we’ll analyze the result of the compiled output of a simple hello world module. In both cases, we’ll export a main function that prints two lines in the console and uses a helper private function. The TypeScript source pretty much resembles the compiled output. Notice that the type information is removed and some module code is added, but other than that, the code is the same.
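As a rough sketch of the kind of module described above (not necessarily the exact listing), the TypeScript source could look like this. The body of exclaim, including the "!!!" suffix, is an assumption made for illustration.

```typescript
// Private helper: it is not exported, so it is only visible inside this module.
// Only the input is annotated; the return type is left to inference.
function exclaim(str: string) {
  return str + "!!!";
}

// The module's only export. Nothing is printed until a caller invokes main().
export function main() {
  console.log("Hello");
  console.log(exclaim("World"));
}
```

Compiling a file like this mostly strips the `: string` annotation and, depending on the module option, rewrites the export into the target module format.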
TypeScript has many compiler options that can increase or decrease the strictness level and change how the output is constructed. For example, the target option, which defaults to es5, allows us to use newer language features such as arrow functions, async-await and destructuring in older browsers. Another option is module, which we can use to best suit our build process. By default, it uses commonjs, which is the default module system in Node, and its output can also serve as the input for Browserify, Webpack or Parcel. If we set the option to es6, then the output will resemble the input even more because the es6 imports are preserved, and it can later be fed to tools like Rollup.
The compiled code doesn’t look like something we’d write as our first choice. It has a lot of weird words like Semigroup, Bind and Effect and we can see it has an extra level of indirection inside the main function, where we first create a computation using Effect_Console.log("Hello"), and then immediately execute it using (). This indirection is due to a restriction imposed by the language. As its name implies, PureScript code must be pure. It’s not obvious here, but this restriction will allow us to compose and extend our computations, building complex features out of simpler ones.
The purity restriction gives us powerful guarantees. We said both examples do exactly the same thing, and at this moment they do nothing (at least not by themselves). In both cases, we are creating a module that exports a main function, and that’s it. If we want the code to actually run we should, at some point, call main(). In TypeScript we could’ve added the invocation in the same file; after all, it doesn’t impose the purity restriction on us. PureScript, on the other hand, forbids us from doing it, and thus assures us that importing a module can’t result in executing unknown side effects, such as connecting to a database. A library such as colors could use the freedom JS/TS gives to “improve its syntax” by automatically patching String.prototype when you import the library. Introducing new properties to String.prototype could seem innocuous at first, but as SmooshGate showed us, it could become a problem.
Another common C convention is that identifiers such as PI, SOME_REGEX or API_URL are written in uppercase to indicate they are constant values (as if the const keyword wasn’t enough). Keep in mind that for complex types, constant values are not the same as immutable values. This example is overly verbose and could be simplified: the compiler can infer from the value that the type is number, so there is no need to be explicit. Here we’re just showing the complete syntax.
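The difference between constant and immutable is easy to miss, so here is a small sketch of it. The identifiers PRIMES and FROZEN_PRIMES are made up for this example.

```typescript
// Full syntax: identifier, explicit type annotation, value.
const PI: number = 3.14159;

// const only prevents reassigning the binding...
const PRIMES: number[] = [2, 3, 5];
PRIMES.push(7); // ...the array itself can still be mutated.

// A readonly array type turns mutation into a compile-time error.
const FROZEN_PRIMES: readonly number[] = [2, 3, 5];
// FROZEN_PRIMES.push(7); // would not compile: push is not available on a readonly array
```

Note that readonly is only enforced by the type checker; it does not freeze the array at runtime.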
If we recall the exclaim function, we can notice that only the input was typed. It’s common in simple cases like this to omit the return type and let the inference system save our precious keystrokes. But we could add the type explicitly to work as a post-condition, making sure the compiler fails if we have some discrepancy.
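A short sketch of both styles, assuming an exclaim function that appends "!!!" to its input (the body and the name exclaimChecked are illustrative):

```typescript
// Return type omitted: the compiler infers string from the body.
function exclaim(str: string) {
  return str + "!!!";
}

// Return type stated explicitly, acting as a post-condition on the body.
function exclaimChecked(str: string): string {
  return str + "!!!";
}

// If the body drifted away from the declared type, compilation would fail:
// function exclaimBroken(str: string): string {
//   return str.length; // error: number is not assignable to string
// }
```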
We need to provide explicit types for the input of a top-level function; if we don’t, the compiler will infer the unsafe type any. This can lead to errors as the any silently propagates, which is why TS added a strictness option called noImplicitAny that reports an error in that case. To increase developer productivity through tooling, in version 3.2 TypeScript added a quick fix to its language services to suggest a type from the function usage.
Given its rationale, TypeScript has a lot of flexibility in the ways we can write functions and express their types. In the following example, exclaim1 and exclaim2 are analogous. There are many places where you have to add a function type definition, and it can be confusing knowing which syntax to use.
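To give a feel for the different syntaxes, here is one possible pair of analogous definitions, plus a third spelling of the same type. This is a sketch rather than the exact listing; Exclaimer, ExclaimerCallable and exclaim3 are hypothetical names.

```typescript
// Function declaration with inline parameter and return annotations.
function exclaim1(str: string): string {
  return str + "!!!";
}

// The same function type expressed as a reusable alias, using arrow syntax...
type Exclaimer = (str: string) => string;

// ...and as an interface with a call signature.
interface ExclaimerCallable {
  (str: string): string;
}

// The parameter type is taken from the annotation on the constant.
const exclaim2: Exclaimer = (str) => str + "!!!";

// All three spellings describe the same shape, so they are interchangeable.
const exclaim3: ExclaimerCallable = exclaim1;
```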
In the following example, functions sub and div are also analogous, but the latter is written using arrow functions, which is more concise. Receiving two parameters makes these functions harder to compose. So for mul we decided to take one argument at a time, which enables us to create new functions like times2 from it.
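A minimal version of these four definitions could look like the following; the bodies are the obvious arithmetic and are assumed rather than quoted.

```typescript
// Classic function declaration taking both arguments at once.
function sub(a: number, b: number): number {
  return a - b;
}

// Same shape, written as an arrow function.
const div = (a: number, b: number): number => a / b;

// One argument at a time: mul returns a function waiting for the second number.
const mul = (a: number) => (b: number): number => a * b;

// Partially applying mul gives us a new, more specific function.
const times2 = mul(2);
```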
The downside of having mul written like this is that it looks weird when we want to call it with both arguments: mul(2)(4). If we want the best of both worlds, we can use a curry function like Ramda’s, but it also has some limitations in TS, as it does not work with generic functions.
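To illustrate the "best of both worlds" idea without depending on a library, here is a hand-rolled helper for two-argument functions. curry2 is a hypothetical name and a deliberately simplified stand-in; it is not Ramda’s implementation.

```typescript
// Hypothetical helper: wraps a two-argument function so it can be called
// either with both arguments at once or one argument at a time.
function curry2<A, B, C>(fn: (a: A, b: B) => C) {
  function curried(a: A): (b: B) => C;
  function curried(a: A, b: B): C;
  function curried(a: A, b?: B): C | ((b: B) => C) {
    if (arguments.length >= 2) {
      return fn(a, b as B);
    }
    return (later: B) => fn(a, later);
  }
  return curried;
}

const mul = curry2((a: number, b: number) => a * b);
const times2 = mul(2);
```

With this helper both mul(2, 4) and mul(2)(4) type-check, at the cost of writing the overloads by hand for each arity.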
PureScript, like Elm and Haskell, has a Hindley-Milner based type system which is well suited for a functional language, and makes the transition between them easier. We can notice that the type information is placed above using “::” to separate the identifier from its type, and in a new line we use “=” to separate the identifier from its value. Even if the compiler can infer the type correctly, PS will warn us if we don’t provide explicit information for all top-level expressions.
Being focused on correctness, the primitive types make a distinction between floating-point numbers and integers. Also, notice that we don’t need the const or let keywords and that we write pi in lowercase, as we have the guarantee that all data is immutable.
When we describe functions, the types are also written above the function implementation, decoupling the parameter name from its type. We use an arrow to separate the input from the output, so a type like “String → String” means “a function that, given a string, returns a string”. If we don’t know the output type, we can use an underscore to produce a warning like “Wildcard type definition has the inferred type String”.
Unlike TypeScript, there is only one way to define a function type, which resembles the arrow function way in TS. All functions are automatically curried without the generic limitation, so we can create times2 just like before. By partially applying the number 2 to mul we change the signature “Number → Number → Number” into “Number → Number”.
A big syntax difference from C-family languages is that function application is not done by surrounding the parameters with parentheses; it is done by separating them with a space, so the PS expression “mul 2 4” is the same as the TS expression “mul(2)(4)”. It can be confusing at first, but it enables clearer syntax, as we’ll see in the next section.
Also notice that in both versions of “times2”, the b parameter is implicit. This technique is called point-free programming, and it can save us the keystrokes of doing something like “const times2 = b => mul(2)(b)”. This is a powerful technique, but it shouldn’t be abused, as there are times when it reduces readability.
A language made for composition
In this section, we’ll leave TypeScript to rest for a bit and focus on what makes PureScript a language made with composition in mind. Let’s recall the main function from the section “dissecting the output”. There are three things we haven’t talked about: A special symbol “do”, a not so special symbol “$”, and the type of main, which doesn’t look like a function.
PureScript has a language feature called do notation that does different things depending on the underlying type. We could write a whole post describing it in detail, but for now, let’s just say it’s a way for us to call one effectful computation after the other in a way that resembles imperative programming.
To help us investigate $ and Effect Unit we’ll use the REPL to see the type of an expression and the kind of a type. We need to have pulp installed and then execute “pulp repl”. Using the :t command we can see that log is a function that receives a String and returns an Effect Unit, the type of our main “function”.
All the expressions inside “do” must return an Effect Unit. The first call to log is trivial but the second poses a problem, as we want to log the exclaimed string. Given that function application is done using a space, if we write the expression log exclaim "World", the compiler will throw an error because it understands that we are passing two arguments to a function that only accepts one. There are three common ways to write the expression that we want: with parentheses, with apply ($) and with applyFlipped (#).
> :t log "Hello"
Effect Unit
> :t log exclaim "World"
Error found:
Could not match type String -> String with type String
> :t log (exclaim "World")
Effect Unit
> :t log $ exclaim "World"
Effect Unit
> :t exclaim "World" # log
Effect Unit
The symbols $ and # are not language features; they are just normal functions called apply and applyFlipped respectively, and they are defined in the standard library Prelude. The special feature is that we can define an infix operator for any function of two arguments. As the documentation says, apply lets you omit parentheses in some cases, making the code easier to read.
Looking at the source code, the implementation is pretty straightforward, but the types could use some explanation as these are the first abstract functions that we see. If we look at apply, the first part declares two type variables “a” and “b” that could be any concrete type. Then we receive two arguments, a function “f” of type (a → b) and a value “x” of type “a”. If we use log as our “f”, we can substitute the types to see that “a” will be String and “b” will be Effect Unit. The implementation just applies the function “f” to the argument “x”. Notice that applyFlipped is the same, but it first receives the value and then the function.
apply :: forall a b. (a -> b) -> a -> b
apply f x = f x
infixr 0 apply as $

applyFlipped :: forall a b. a -> (a -> b) -> b
applyFlipped x f = f x
infixl 1 applyFlipped as #
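For readers more comfortable with TypeScript, the same two functions can be approximated with generics. This is only an analogy: TS has no user-defined infix operators, so the calls stay in prefix form, and the helpers exclaim and shout are hypothetical.

```typescript
// apply :: forall a b. (a -> b) -> a -> b
const apply = <A, B>(f: (a: A) => B) => (x: A): B => f(x);

// applyFlipped :: forall a b. a -> (a -> b) -> b
const applyFlipped = <A>(x: A) => <B>(f: (a: A) => B): B => f(x);

// Two small string functions to apply.
const exclaim = (str: string): string => str + "!!!";
const shout = (str: string): string => str.toUpperCase();

// Roughly: shout $ exclaim "World"
const viaApply = apply(shout)(exclaim("World"));

// Roughly: exclaim "World" # shout
const viaFlipped = applyFlipped(exclaim("World"))(shout);
```

Both expressions evaluate to the same string; the only difference is whether the function or the value comes first.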
Once again, there is nothing special about $ and #. The language decisions that make this possible are: function application is done with a space, parentheses only serve to define precedence, and any function of two arguments can be used infix. This is a very powerful concept that Guy Steele describes in his talk Growing a Language: well-thought-out syntax primitives compose into more complex constructs and can eventually be used to define a Domain Specific Language.
Now that all the expressions inside do return the same type, let’s go back to the REPL to understand what the type means. We can use the :k command to inspect the kind of a type. For example, Unit and Number are regular types, but Effect and Array are type constructors. A type constructor is a function for types instead of values, hence the similar syntax “Type → Type”. The constructor can be applied to a type using a space (just like normal function application), so Array Number and Effect Unit will have the same kind “Type”. The type Unit has exactly one value, called unit, so it carries no information, which makes it analogous to void in TypeScript.
> :k Number
Type
> :k Unit
Type
> :k Effect
Type -> Type
> :k Array
Type -> Type
> :k Effect Unit
Type
> :k Array Number
Type
We can think of Array as a simple data structure or we can think of it as a way to express a computation of multiple values. In the same way, we can think of Effect as a computation that modifies the world. Purely functional languages restrict us to pure code, which enables a whole set of guarantees, but a program’s main goal is to modify the world in some way, whether by reading a file, mutating the DOM, etc. We can cope with this limitation by working with types that represent the effectful computations.
As we saw in the section “dissecting the output”, all Effects were compiled to functions, adding an extra level of indirection. This allows us to compose those computations before actually running them. In the first eight minutes of his talk “Constraints Liberate, Liberties Constrain”, Rúnar Bjarnason gives one of the best explanations of this concept that I’ve seen.
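The idea of describing an effect first and running it later can be modelled in TypeScript with plain functions. The sketch below is an approximation of the concept, not the code the PureScript compiler emits; Effect, log, andThen and the output array (standing in for the console) are all hypothetical.

```typescript
// Hypothetical model: an Effect is a zero-argument function describing work to do later.
type Effect<A> = () => A;

// Stand-in for the console so the behaviour is easy to observe.
const output: string[] = [];

// log only builds an effect; nothing is written until the effect is invoked.
const log = (message: string): Effect<void> => () => {
  output.push(message);
};

// Compose two effects into a single one that runs them in order.
const andThen = <A, B>(first: Effect<A>, second: Effect<B>): Effect<B> => () => {
  first();
  return second();
};

// Building main performs no writes; output is still empty at this point.
const main: Effect<void> = andThen(log("Hello"), log("World!!!"));
```

Only calling main() performs the two writes, which is the extra level of indirection we saw in the compiled output.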
If we are gonna work with explosives, it’s easier to work with the TNT than the exploded pieces.
The talk also includes this quote from David J. Wheeler:
We can solve any problem by introducing an extra level of indirection.
A nice thing about expressing your computations this way is that you can encode what you want to do and some notion of how you want to do it, all in the type system. And we can create our programs as a combination of multiple computations like this:
- Effect Unit: An effectful computation that changes the world in some way, such as synchronously writing a file, mutating the DOM, etc
- Array Student: A computation of multiple Students
- Maybe User: A computation that may resolve to a user or may be empty
- Either String Prime: A synchronous computation that can resolve to a prime number or fail with a string message
- Aff BlogPost: An asynchronous effectful computation that can resolve to a blog post
- State AST Number: A stateful computation that works with an AST and returns a Number
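As one example from the list, a computation in the spirit of “Either String Prime” can be approximated in TypeScript with a tagged union. Either, isPrime and toPrime are hypothetical names, and a plain number stands in for a dedicated Prime type.

```typescript
// Hypothetical Either: Left carries a failure, Right carries a result.
type Either<E, A> = { tag: "Left"; error: E } | { tag: "Right"; value: A };

// Trial division; good enough for an illustration.
function isPrime(n: number): boolean {
  if (!Number.isInteger(n) || n < 2) return false;
  for (let d = 2; d * d <= n; d++) {
    if (n % d === 0) return false;
  }
  return true;
}

// A synchronous computation that resolves to a prime or fails with a message.
function toPrime(n: number): Either<string, number> {
  return isPrime(n)
    ? { tag: "Right", value: n }
    : { tag: "Left", error: `${n} is not a prime number` };
}
```

The type tells the caller both what the computation produces and how it can fail, without throwing exceptions.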