Making my life easier with GADTs
In which I showcase one concrete example where dependent types — and more precisely, Generalized Algebraic Datatypes (GADTs) — allow me to turn the handling of different but similar compiler targets into a breeze of fresh air.
Introduction
Dependent types and assimilated type-level features get a bad rep.
They are often misrepresented presented as a futile toy for
“galaxy-brain people”, providing no benefit to the regular programmer. I
think this opinion can stem from a severe
misconception about the presumed complexity of dependent type systems.
Not only can simple dependent type theories be very intuitive, they are
fully opt in. You can write regular “simply-typed” programs in them, and
use dependent types only when necessary. Perhaps another explanation for
the confusion could be the lack of resources demonstrating the
usefulness of such features — when used sparingly — on
day-to-day programming tasks.
(I've toned down the wording a bit, as I think it was being unjustifiably harsh on people having reservations about dependent types. There are very valid reasons to stay cautious about this style of programming. Especially in languages with partial support. See this post from Matt Parsons in reply to my article.)
I am by no means a gifted programmer. Competent at the very best. Ever since I’ve been introduced to dependent type systems, it has completely changed the relationship I entertain with my compilers. Types are the language with which I hold conversations with them. I tell them my intentions, and they reply back! Their job is to keep me in check and ensure that I am a man of my word. And it does it pretty well, when I’m talking clear enough.
Today I want to showcase one such time when my typechecker became a very helpful assistant, reminding me to take care of corner cases without fault.
Let’s set the scene. This January, I am interning at IOG, working on a new Agda backend targetting the Lambda Box (λ□) intermediate language. To the uninitiated,
- Agda is a dependently-typed programming language.
- λ□ is an intermediate language very similar to the Lambda calculus. It was introduced in Pierre Letouzey’s PhD thesis, and is now formally defined as part of the MetaCoq project.
The backend I’m writing is just a program — written in Haskell — that takes as input the internal representation of Agda programs, and outputs λ□ programs. A compiler of sorts.
Given an AgdaProgram
, my backend generates a λ□
environment. Simplifying a bit, an environment contains named
function and datatype declarations. Here is how I
encode those in Haskell.
data GlobalEnv = GlobalEnv [(Name, GlobalDecl)]
data GlobalDecl
= DataDecl DataBody
| FunDecl FunBody
data DataBody = DataBody
indConstructors :: [ConstructorBody]
{
}
data ConstructorBody = ConstructorBody
ctorName :: Name
{ ctorArgs :: Int
,
}
data FunBody = FunBody
funBody :: LamBox.Term
{ }
So a function in the environment stores its expression body — a
LamBox.Term
— and a datatype declaration tells which
constructors come with it, and their respective number of arguments.
Morally, the heart of the backend lies in a function like this:
compile :: AgdaProgram -> IO GlobalEnv
Introducing Typed λ□
There is also a typed variant of λ□, whose environments contain type information. This typed intermediate language enables compilation down to targets such as Rust. In the official formalization of this variant, typed environments are completely distinct from untyped ones. If we did something similar, we would get the following:
data TGlobalEnv = TGlobalEnv [(Name, TGlobalDecl)]
data TGlobalDecl
= TDataDecl TDataBody
| TFunDecl TFunBody
| TTypeAliasDecl TTypeAliasBody
data TDataBody = TDataBody
indConstructors :: [TConstructorBody]
{
}
data TConstructorBody = TConstructorBody
ctorName :: Name
{ ctorArgs :: Int
, ctorTypes :: [LamBox.Type]
,
}
data TFunBody = TFunBody
funBody :: LamBox.Term
{ funType :: LamBox.Type
,
}
type TTypeAliasBody = Lambox.Type
Now the big question: how do I structure the backend to support both compilation targets — typed and untyped λ□ — without hurting my brain?
Pick your poison
I could indeed keep typed environments completely separate. Different datatypes, different information. But this would lead to a lot of code duplication. Given that the compilation logic will be mostly identical for these two targets, I don’t want to be responsible for the burden of keeping both implementations in sync.
compileUntyped :: AgdaProgram -> IO GlobalEnv
compileTyped :: AgdaProgram -> IO TGlobalEnv
Another way around would be to only define the typed environment — as
it contains strictly more information than its untyped
counterpart — but make all type information optional. One
simple way to do so is to wrap all the type annotations in
Maybe
s. Then, both compilation targets can be handled by
the same program.
data GlobalEnv = GlobalEnv [(Name, GlobalDecl)]
data GlobalDecl
= DataDecl DataBody
| FunDecl FunBody
| TypeAliasDecl TypeAliasBody
data DataBody = DataBody
indConstructors :: [ConstructorBody]
{
}
data ConstructorBody = ConstructorBody
ctorName :: Name
{ ctorArgs :: Int
, ctorTypes :: Maybe [LamBox.Type]
,
}
data FunBody = FunBody
funBody :: LamBox.Term
{ funType :: Maybe LamBox.Type
,
}
type TypeAliasBody = Lambox.Type
But here’s the problem: I — the programmer — need to be careful that
I don’t forget to put the type information when targetting typed λ□. If
I put Nothing
where a type signature is expected, my
backend will proceed happily. But the generated programs will be
incomplete programs, and thus invalid.
How do I make sure that my compiler gets in the way to counter my lack of discipline? How can I have it force me to consistently put the type annotations when we are targetting typed λ□?
Dependent types to the rescue.
Type discipline
I begin by defining the two typing modes my backend supports:
{-# LANGUAGE DataKinds, GADTs #-}
data Typing = Typed | Untyped
Then, I introduce the two targets of the backend. A
Target
is indexed by the typing mode, which makes
the mode we’re in appear at the type level.
import Data.Kind (Type)
data Target :: Typing -> Type where
ToTyped :: Target Typed
ToUntyped :: Target Untyped
I can define a wrapper for values that should be available during the
Typed
compilation mode.
data WhenTyped :: Typing -> Type -> Type where
None :: WhenTyped Untyped a
Some :: a -> WhenTyped Typed a
Again, this datatype is indexed by the mode, which means:
None
can only construct a value of typeWhenTyped t a
whent
is equal toUntyped
.Some x
can only construct a value of typeWhenTyped t a
whent
is equal toTyped
.
Or the other way around:
WhenTyped Untyped a
has exactly one inhabitant:None
.WhenTyped Typed a
has for inhabitants values of the formSome x
forx :: a
.
When we are in Typed
mode, we can always
retrieve the wrapped value.
getTyped :: WhenTyped Typed a -> a
Some x) = x getTyped (
Note that the type system is already helping us here! It knows for
sure that the function is total. If we add a clause for the
None
case, it throws a warning:
None = undefined getTyped
Pattern match is redundant
Finally, let’s have a way to run a computation if and only if the target is in type mode:
whenTyped :: Applicative m => Target t -> m a -> m (WhenTyped t a)
ToUntyped _ = pure None
whenTyped ToTyped x = Some <$> x whenTyped
Now we can go back to defining environments.
A function in the environment always has a body. But it only has a
type annotation in Typed
mode.
data FunBody t = FunBody
funBody :: LamBox.Term
{ funType :: WhenTyped t LamBox.Type
, }
Same thing for datatype constructors:
data DataBody t = DataBody
indConstructors :: [ConstructorBody t]
{
}
data ConstructorBody t = ConstructorBody
ctorName :: Name
{ ctorArgs :: Int
, ctorTypes :: WhenTyped t [LamBox.Type]
, }
Finally, we get environments:
type TypeAliasBody = LamBox.Type
data GlobalEnv t = TGlobalEnv [(Name, GlobalDecl t)]
data GlobalDecl :: Typing -> Type where
FunDecl :: FunBody t -> GlobalDecl t
DataDecl :: DataBody t -> GlobalDecl t
TypeAliasDecl :: TypeAliasBody -> GlobalDecl Typed
Note that type alias declarations are only allowed in
Typed
mode.
Then, our only compile function is updated accordingly:
compile :: Target t -> AgdaProgram -> IO (GlobalEnv t)
The kind of environment we get out is fully determined by the kind of target we get in! If we immediately pattern-match on the target, the Haskell typechecker kindly informs us of the expected return type in each branch:
compile :: Target t -> AgdaProgram -> IO (GlobalEnv t)
=
compile target prog case target of
ToUntyped -> _untyped
ToTyped -> _typed
• Found hole: _untyped :: IO (GlobalEnv Untyped)
• Found hole: _typed :: IO (GlobalEnv Typed)
And that’s exactly what I was looking for! It immediately prevents common mistakes, say, like generating a type alias declaration when targetting untyped λ□.
compileAlias :: AgdaDefinition -> IO (GlobalDecl Untyped)
= return $ TypeAliasDecl ... compileAlias def
• Couldn't match type ‘Typed’ with ‘Untyped’
Expected: IO (GlobalDecl Untyped)
Actual: IO (GlobalDecl Typed)
What’s more, it allows me to keep most of the compilation logic the same, and only conditionally do more work for the typed target.
For example, when compiling an Agda function to a λ□ function body:
compileTerm :: AgdaTerm -> IO LamBox.Term
compileType :: AgdaTerm -> IO LamBox.Type
compileFunction :: Target t -> AgdaFunction -> IO (FunBody t)
= do
compileFunction target fn <- compileTerm (funBody fn)
bdy <- whenTyped target $ compileType (funType fn)
typ return (FunBody bdy typ)
I never explicitely match on the target, yet compilation of
the function type only happens when targetting typed λ□, thanks to our
convenient whenTyped
utility.
Recall that typ
must have the following type:
typ :: WhenTyped t LamBox.Type
When targetting untyped λ□, its value is None
and no
type compilation happens. Simply wonderful.
Note that using a full-fletched dependently-typed language like Idris2 would make the
implementation even simpler! There would be no need for
Target
, we could use Typing
directly at the
value level and the type level.
And that’s it for today! I’m not breaking new grounds here, but I feel like the world is missing short and sweet practical examples in which sprinkles of dependent types help you avoid mistakes — for free!
Any feedback appreciated, as always.
I actually do have one thing to add! One common misconception about GADTs is that it’s tedious if not impossible to construct their values, by coming up with the type index out of thin air. But that’s what existential types are for!
In my backend, the target is chosen from a command line flag. Here is the datatype for compilation options:
data Options = forall t. Options
optTarget :: Target t
{-- ...
}
The constructor Options
is existentially
quantified over t
, the target mode. This means that even
though the target mode does not show up in the type
Options
, any time we pattern-match on the
constructor Options
, the target mode will suddenly
become available.
The rest is fairly standard:
defaultOptions :: Options
= Options { optTarget = ToUntyped }
defaultOptions
typedOpt :: Flag Options
= pure opts { optTarget = ToTyped }
typedOpt opts
run :: Options -> AgdaProgram -> IO ()
Options optTarget) prog = do
run (<- compile optTarget prog
env
-- do something with the environment
-- like write it to file
return ()
Inside run
, env
has type
GlobalEnv t
, where t
comes from matching on
Options
. optTarget
itself has type
Target t
. What we cannot do is return the
environment, as run
couldn’t be typed as is, but
that doesn’t prevent us from doing useful things with the compiled
environment before exiting!
Now that’s really it. So long! Best.