Making my life easier with GADTs
In which I showcase one concrete example where dependent types — and more precisely, Generalized Algebraic Datatypes (GADTs) — allow me to turn the handling of different but similar compiler targets into a breeze of fresh air.
Introduction
Dependent types and assimilated type-level features get a bad rep.
They are often misrepresented presented as a futile toy for
“galaxy-brain people”, providing no benefit to the regular programmer. I
think this opinion can stem from a severe
misconception about the presumed complexity of dependent type systems.
Not only can simple dependent type theories be very intuitive, they are
fully opt in. You can write regular “simply-typed” programs in them, and
use dependent types only when necessary. Perhaps another explanation for
the confusion could be the lack of resources demonstrating the
usefulness of such features — when used sparingly — on
day-to-day programming tasks.
(I've toned down the wording a bit, as I think it was being unjustifiably harsh on people having reservations about dependent types. There are very valid reasons to stay cautious about this style of programming. Especially in languages with partial support. See this post from Matt Parsons in reply to my article.)
I am by no means a gifted programmer. Competent at the very best. Ever since I’ve been introduced to dependent type systems, it has completely changed the relationship I entertain with my compilers. Types are the language with which I hold conversations with them. I tell them my intentions, and they reply back! Their job is to keep me in check and ensure that I am a man of my word. And it does it pretty well, when I’m talking clear enough.
Today I want to showcase one such time when my typechecker became a very helpful assistant, reminding me to take care of corner cases without fault.
Let’s set the scene. This January, I am interning at IOG, working on a new Agda backend targetting the Lambda Box (λ□) intermediate language. To the uninitiated,
- Agda is a dependently-typed programming language.
- λ□ is an intermediate language very similar to the Lambda calculus. It was introduced in Pierre Letouzey’s PhD thesis, and is now formally defined as part of the MetaCoq project.
The backend I’m writing is just a program — written in Haskell — that takes as input the internal representation of Agda programs, and outputs λ□ programs. A compiler of sorts.
Given an AgdaProgram, my backend generates a λ□
environment. Simplifying a bit, an environment contains named
function and datatype declarations. Here is how I
encode those in Haskell.
data GlobalEnv = GlobalEnv [(Name, GlobalDecl)]
data GlobalDecl
= DataDecl DataBody
| FunDecl FunBody
data DataBody = DataBody
{ indConstructors :: [ConstructorBody]
}
data ConstructorBody = ConstructorBody
{ ctorName :: Name
, ctorArgs :: Int
}
data FunBody = FunBody
{ funBody :: LamBox.Term
}So a function in the environment stores its expression body — a
LamBox.Term — and a datatype declaration tells which
constructors come with it, and their respective number of arguments.
Morally, the heart of the backend lies in a function like this:
compile :: AgdaProgram -> IO GlobalEnvIntroducing Typed λ□
There is also a typed variant of λ□, whose environments contain type information. This typed intermediate language enables compilation down to targets such as Rust. In the official formalization of this variant, typed environments are completely distinct from untyped ones. If we did something similar, we would get the following:
data TGlobalEnv = TGlobalEnv [(Name, TGlobalDecl)]
data TGlobalDecl
= TDataDecl TDataBody
| TFunDecl TFunBody
| TTypeAliasDecl TTypeAliasBody
data TDataBody = TDataBody
{ indConstructors :: [TConstructorBody]
}
data TConstructorBody = TConstructorBody
{ ctorName :: Name
, ctorArgs :: Int
, ctorTypes :: [LamBox.Type]
}
data TFunBody = TFunBody
{ funBody :: LamBox.Term
, funType :: LamBox.Type
}
type TTypeAliasBody = Lambox.TypeNow the big question: how do I structure the backend to support both compilation targets — typed and untyped λ□ — without hurting my brain?
Pick your poison
I could indeed keep typed environments completely separate. Different datatypes, different information. But this would lead to a lot of code duplication. Given that the compilation logic will be mostly identical for these two targets, I don’t want to be responsible for the burden of keeping both implementations in sync.
compileUntyped :: AgdaProgram -> IO GlobalEnv
compileTyped :: AgdaProgram -> IO TGlobalEnvAnother way around would be to only define the typed environment — as
it contains strictly more information than its untyped
counterpart — but make all type information optional. One
simple way to do so is to wrap all the type annotations in
Maybes. Then, both compilation targets can be handled by
the same program.
data GlobalEnv = GlobalEnv [(Name, GlobalDecl)]
data GlobalDecl
= DataDecl DataBody
| FunDecl FunBody
| TypeAliasDecl TypeAliasBody
data DataBody = DataBody
{ indConstructors :: [ConstructorBody]
}
data ConstructorBody = ConstructorBody
{ ctorName :: Name
, ctorArgs :: Int
, ctorTypes :: Maybe [LamBox.Type]
}
data FunBody = FunBody
{ funBody :: LamBox.Term
, funType :: Maybe LamBox.Type
}
type TypeAliasBody = Lambox.TypeBut here’s the problem: I — the programmer — need to be careful that
I don’t forget to put the type information when targetting typed λ□. If
I put Nothing where a type signature is expected, my
backend will proceed happily. But the generated programs will be
incomplete programs, and thus invalid.
How do I make sure that my compiler gets in the way to counter my lack of discipline? How can I have it force me to consistently put the type annotations when we are targetting typed λ□?
Dependent types to the rescue.
Type discipline
I begin by defining the two typing modes my backend supports:
{-# LANGUAGE DataKinds, GADTs #-}
data Typing = Typed | UntypedThen, I introduce the two targets of the backend. A
Target is indexed by the typing mode, which makes
the mode we’re in appear at the type level.
import Data.Kind (Type)
data Target :: Typing -> Type where
ToTyped :: Target Typed
ToUntyped :: Target UntypedI can define a wrapper for values that should be available during the
Typed compilation mode.
data WhenTyped :: Typing -> Type -> Type where
None :: WhenTyped Untyped a
Some :: a -> WhenTyped Typed aAgain, this datatype is indexed by the mode, which means:
Nonecan only construct a value of typeWhenTyped t awhentis equal toUntyped.Some xcan only construct a value of typeWhenTyped t awhentis equal toTyped.
Or the other way around:
WhenTyped Untyped ahas exactly one inhabitant:None.WhenTyped Typed ahas for inhabitants values of the formSome xforx :: a.
When we are in Typed mode, we can always
retrieve the wrapped value.
getTyped :: WhenTyped Typed a -> a
getTyped (Some x) = xNote that the type system is already helping us here! It knows for
sure that the function is total. If we add a clause for the
None case, it throws a warning:
getTyped None = undefinedPattern match is redundant
Finally, let’s have a way to run a computation if and only if the target is in type mode:
whenTyped :: Applicative m => Target t -> m a -> m (WhenTyped t a)
whenTyped ToUntyped _ = pure None
whenTyped ToTyped x = Some <$> xNow we can go back to defining environments.
A function in the environment always has a body. But it only has a
type annotation in Typed mode.
data FunBody t = FunBody
{ funBody :: LamBox.Term
, funType :: WhenTyped t LamBox.Type
}Same thing for datatype constructors:
data DataBody t = DataBody
{ indConstructors :: [ConstructorBody t]
}
data ConstructorBody t = ConstructorBody
{ ctorName :: Name
, ctorArgs :: Int
, ctorTypes :: WhenTyped t [LamBox.Type]
}Finally, we get environments:
type TypeAliasBody = LamBox.Type
data GlobalEnv t = TGlobalEnv [(Name, GlobalDecl t)]
data GlobalDecl :: Typing -> Type where
FunDecl :: FunBody t -> GlobalDecl t
DataDecl :: DataBody t -> GlobalDecl t
TypeAliasDecl :: TypeAliasBody -> GlobalDecl TypedNote that type alias declarations are only allowed in
Typed mode.
Then, our only compile function is updated accordingly:
compile :: Target t -> AgdaProgram -> IO (GlobalEnv t)The kind of environment we get out is fully determined by the kind of target we get in! If we immediately pattern-match on the target, the Haskell typechecker kindly informs us of the expected return type in each branch:
compile :: Target t -> AgdaProgram -> IO (GlobalEnv t)
compile target prog =
case target of
ToUntyped -> _untyped
ToTyped -> _typed• Found hole: _untyped :: IO (GlobalEnv Untyped)
• Found hole: _typed :: IO (GlobalEnv Typed)
And that’s exactly what I was looking for! It immediately prevents common mistakes, say, like generating a type alias declaration when targetting untyped λ□.
compileAlias :: AgdaDefinition -> IO (GlobalDecl Untyped)
compileAlias def = return $ TypeAliasDecl ...• Couldn't match type ‘Typed’ with ‘Untyped’
Expected: IO (GlobalDecl Untyped)
Actual: IO (GlobalDecl Typed)
What’s more, it allows me to keep most of the compilation logic the same, and only conditionally do more work for the typed target.
For example, when compiling an Agda function to a λ□ function body:
compileTerm :: AgdaTerm -> IO LamBox.Term
compileType :: AgdaTerm -> IO LamBox.Type
compileFunction :: Target t -> AgdaFunction -> IO (FunBody t)
compileFunction target fn = do
bdy <- compileTerm (funBody fn)
typ <- whenTyped target $ compileType (funType fn)
return (FunBody bdy typ)I never explicitely match on the target, yet compilation of
the function type only happens when targetting typed λ□, thanks to our
convenient whenTyped utility.
Recall that typ must have the following type:
typ :: WhenTyped t LamBox.TypeWhen targetting untyped λ□, its value is None and no
type compilation happens. Simply wonderful.
Note that using a full-fletched dependently-typed language like Idris2 would make the
implementation even simpler! There would be no need for
Target, we could use Typing directly at the
value level and the type level.
And that’s it for today! I’m not breaking new grounds here, but I feel like the world is missing short and sweet practical examples in which sprinkles of dependent types help you avoid mistakes — for free!
Any feedback appreciated, as always.
I actually do have one thing to add! One common misconception about GADTs is that it’s tedious if not impossible to construct their values, by coming up with the type index out of thin air. But that’s what existential types are for!
In my backend, the target is chosen from a command line flag. Here is the datatype for compilation options:
data Options = forall t. Options
{ optTarget :: Target t
-- ...
}The constructor Options is existentially
quantified over t, the target mode. This means that even
though the target mode does not show up in the type
Options, any time we pattern-match on the
constructor Options, the target mode will suddenly
become available.
The rest is fairly standard:
defaultOptions :: Options
defaultOptions = Options { optTarget = ToUntyped }
typedOpt :: Flag Options
typedOpt opts = pure opts { optTarget = ToTyped }
run :: Options -> AgdaProgram -> IO ()
run (Options optTarget) prog = do
env <- compile optTarget prog
-- do something with the environment
-- like write it to file
return ()Inside run, env has type
GlobalEnv t, where t comes from matching on
Options. optTarget itself has type
Target t. What we cannot do is return the
environment, as run couldn’t be typed as is, but
that doesn’t prevent us from doing useful things with the compiled
environment before exiting!
Now that’s really it. So long! Best.