# LambdaBuffers Compiler
The Compiler component sits between the Frontend component and the code generation component named Codegen. The purpose of the Compiler is to perform various checks on the input provided by the Frontend, ensuring that the supplied type, class, and instance clause definitions are valid, and otherwise to communicate any error conditions encountered during processing.
The end goal of the Compiler is to ensure that the Codegen component is capable of processing the Compiler Output by providing correct and complete information.
## Compiler Interface
The Compiler operates on the Compiler Input message specified in the `compiler.proto` Google Protocol Buffers schema, enabling any Frontend to interface with the Compiler in a language-agnostic manner.
Similarly, the Compiler Output message is also specified in the `compiler.proto` Google Protocol Buffers schema. It is consumed by Codegen modules, which can be written in any programming environment capable of communicating via Google Protocol Buffers.
Refer to the Compiler Proto documentation for more information.
## Checking Type Definitions
The first step the Compiler performs is kind checking and inference on the type definitions provided by the Frontend, raising kind checking errors when they fail.
When successful, the Compiler outputs a Compiler Output that annotates each type term with kind information.
In standard type checking terminology LambdaBuffers 'terms' are abstract data type declarations and their 'types' are kinds.
Currently, the Compiler accepts:

- type terms of kind `Type` (such as `Int` or `Bool`),
- type function terms of kind `Type → Type` (such as `Maybe` or `Either` - though note that type functions are not "first class" in the sense that they cannot be passed as arguments to other type functions).
There are future plans to expand this to Higher Kinded Types (such as `MaybeT`, `StateT` etc. - i.e. types parameterized on type function terms) - subject to research into Codegen of such types in the target languages.
The Compiler supports recursive types.
All LambdaBuffers type terms must be monomorphically kinded, with polymorphic kinds defaulting to monomorphic ones. For example, `Phantom a = Phantom` would resolve to the monomorphic kind `Type → Type` rather than the polymorphic kind `∀a. a → Type`.
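The defaulting rule above can be sketched in Python (the representation and function name are illustrative assumptions, not the Compiler's actual implementation): any kind variable still unconstrained after inference is replaced by `Type`.

```python
# Minimal sketch of monomorphic defaulting. Kinds are represented as:
#   "Type", ("var", name), or ("->", dom, cod)  -- an assumed encoding.

def default_monomorphic(kind):
    """Replace any unresolved kind variable with Type."""
    if kind == "Type":
        return "Type"
    if kind[0] == "var":
        return "Type"  # the polymorphic variable defaults to Type
    return ("->", default_monomorphic(kind[1]), default_monomorphic(kind[2]))

# `Phantom a = Phantom` leaves the kind of `a` unconstrained, so the
# inferred ∀a. a → Type defaults to Type → Type:
phantom = ("->", ("var", "a"), "Type")
print(default_monomorphic(phantom))  # ('->', 'Type', 'Type')
```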
### Kind Checker
The kind checker is designed around a simply typed lambda calculus type system with inference via unification (John Alan Robinson's unification algorithm is currently used). We've also introduced a `let` binding rule that allows later additions to the typing context. The typing rules are the following:
```text
x : σ ∈ Γ
----------- [Variable]
Γ ⊢ x : σ

Γ, x:σ ⊢ e:τ
------------------------ [Abstraction]
Γ ⊢ (λ x:σ. e) : (σ → τ)

Γ ⊢ x:(σ → τ)    Γ ⊢ y:σ
------------------------- [Application]
Γ ⊢ x y : τ

Γ ⊢ e₁:σ    Γ, x:σ ⊢ e₂:τ
----------------------------- [Let]
Γ ⊢ let x = e₁ in e₂ : τ
```
The type checking strategy is as follows:

- A context is built from the type definitions. Type variables which are not annotated with any further kind information are defaulted to be of kind `Type`.
- The RHS of the type terms are converted into their Sums of Products canonical representation. The variables from the left-hand side of the term are introduced to the right-hand side as abstractions.
- The derivation for each term is built.
- Then unification tries to find a solution.
Terms for which unification cannot find a consistent derivation are deemed incorrect, and a kind checking error is raised. Note that all inferred kinds are currently restricted to being monomorphic - no free or universally quantified variables can appear in the final kind signature.
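The steps above can be sketched as a small Python program (an illustrative sketch with assumed names and representations, not the Compiler's actual implementation): kinds of type terms are inferred by generating equations during the derivation and solving them with Robinson-style unification, then defaulting leftover kind variables to `Type`.

```python
import itertools

# Kinds: ("Type",), ("var", name), or ("->", dom, cod) -- assumed encoding.
KTYPE = ("Type",)
_ctr = itertools.count()

def fresh():
    """A fresh kind metavariable."""
    return ("var", f"k{next(_ctr)}")

def walk(k, s):
    """Chase a kind variable through the substitution s."""
    while k[0] == "var" and k[1] in s:
        k = s[k[1]]
    return k

def occurs(v, k, s):
    k = walk(k, s)
    if k == ("var", v):
        return True
    if k[0] == "->":
        return occurs(v, k[1], s) or occurs(v, k[2], s)
    return False

def unify(a, b, s):
    """Robinson-style unification; extends s in place or raises."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return
    if a[0] == "var":
        if occurs(a[1], b, s):
            raise TypeError("occurs check failed")
        s[a[1]] = b
        return
    if b[0] == "var":
        unify(b, a, s)
        return
    if a[0] == "->" and b[0] == "->":
        unify(a[1], b[1], s)
        unify(a[2], b[2], s)
        return
    raise TypeError(f"kind mismatch: {a} vs {b}")

def infer(term, env, s):
    """Infer the kind of a type term:
    ("var", name), ("app", f, x) or ("abs", name, body)."""
    tag = term[0]
    if tag == "var":
        return env[term[1]]
    if tag == "abs":
        _, v, body = term
        kv = fresh()
        kbody = infer(body, {**env, v: kv}, s)
        return ("->", kv, kbody)
    _, f, x = term
    kf = infer(f, env, s)
    kx = infer(x, env, s)
    kr = fresh()
    unify(kf, ("->", kx, kr), s)
    return kr

def resolve(k, s):
    """Fully apply s, defaulting leftover variables to Type
    (the monomorphic restriction described above)."""
    k = walk(k, s)
    if k[0] == "var":
        return KTYPE
    if k[0] == "->":
        return ("->", resolve(k[1], s), resolve(k[2], s))
    return k

env = {"Maybe": ("->", KTYPE, KTYPE), "Int": KTYPE}

s = {}
print(resolve(infer(("app", ("var", "Maybe"), ("var", "Int")), env, s), s))
# ('Type',)

s = {}
print(resolve(infer(("abs", "a", ("app", ("var", "Maybe"), ("var", "a"))), env, s), s))
# ('->', ('Type',), ('Type',))
```

A term such as `Int Int` fails in `unify` with a kind mismatch, which corresponds to the kind checking errors the Compiler reports.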
## Checking Type Cardinality
In addition to kind checking, the Compiler could perform a special check to determine the cardinality of types. This is especially useful for catching and reporting non-inhabited types that users might define.
For example, `data F a = F (F a)` declares a non-inhabited recursive type that can't be constructed. The LambdaBuffers Compiler SHOULD reject such types, as they can't possibly be constructed and the generated typeclass instances would in turn be ill-defined. This problem is equivalent to that of calculating graph components.
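One way to view this check is as a fixpoint computation over the graph of type definitions: a type is inhabited once some constructor of it has all of its argument types inhabited. The following Python sketch (assumed representations, type variables ignored for simplicity - not the Compiler's actual algorithm) illustrates the idea:

```python
def inhabited_types(defs, primitives=frozenset({"Int", "Bool"})):
    """defs maps a type name to its list of constructors; each
    constructor is the list of type names of its fields. Iterate to a
    fixpoint, marking a type inhabited once one of its constructors
    has all argument types inhabited."""
    inhabited = set(primitives)
    changed = True
    while changed:
        changed = False
        for name, ctors in defs.items():
            if name in inhabited:
                continue
            if any(all(arg in inhabited for arg in ctor) for ctor in ctors):
                inhabited.add(name)
                changed = True
    return inhabited

defs = {
    "F": [["F"]],           # like data F = F F      -- never marked
    "Nat": [[], ["Nat"]],   # like data Nat = Z | S Nat -- inhabited via Z
    "G": [["Int"], ["G"]],  # inhabited via its Int constructor
}
print(sorted(inhabited_types(defs)))  # ['Bool', 'G', 'Int', 'Nat']
```

Types never marked by the fixpoint (such as `F` above) are exactly the uninhabited ones the Compiler SHOULD reject.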
## Normalizing Type Definitions
Finally, the Compiler should be able to normalize expressions. For example, it may be possible to define a data type in the schema language in a form similar to `data G a = G ((Either) ((Maybe) a) Int)`, where the bracketing indicates the order of application within the term. The example term would normalize to `data G a = G (Either (Maybe a) Int)`, resulting in cleaner (and more performant) code generation.
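The normalization of the example above amounts to flattening nested applications into a head applied to its arguments and dropping redundant brackets. A Python sketch under that assumption (the term encoding is illustrative, not the Compiler's):

```python
def normalize(term):
    """term is either a string (a type name or variable) or a nested
    application ("app", f, x). Returns (head, [args]) with the
    application spine flattened."""
    if isinstance(term, str):
        return (term, [])
    _, f, x = term
    head, args = normalize(f)
    return (head, args + [x])

def pretty(term):
    """Render a term, bracketing only non-atomic arguments."""
    head, args = normalize(term)
    rendered = [head] + [
        f"({pretty(a)})" if isinstance(a, tuple) else a for a in args
    ]
    return " ".join(rendered)

# The RHS of `data G a = G ((Either) ((Maybe) a) Int)`:
g_rhs = ("app", ("app", "Either", ("app", "Maybe", "a")), "Int")
print(pretty(g_rhs))  # Either (Maybe a) Int
```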
## Checking Typeclass Definitions and Instance Clauses
The Compiler should, if possible, ensure that all instance declarations for schemata are derivable using hard-coded derivation axioms.
Other schema languages support generating type definitions in many languages from a single definition in the schema language. One key feature that sets LambdaBuffers apart from these alternatives is support for typeclasses, which enable the generation of ad-hoc polymorphic functions that operate on types generated from LambdaBuffers schemata.
The LambdaBuffers schema language doesn't allow users to specify typeclass instance implementations themselves. Users, instead, will write instance clauses as part of the schema definition, and the LambdaBuffers code generator will derive these declared instances when generating code.
Two important consequences of this design decision are:
- All instances must be derived structurally. As an example, consider the arbitrary product type `data P a b = P a b`. The semantics of the generated instance (i.e. the behavior of the generated code) must be determinable from the structure of the type - that it is a product - and the instances for its arguments `a` and `b`, and by those features alone. (Since `a` and `b` are type variables here, writing a direct instance for any interesting class is likely impossible, so LambdaBuffers supports constrained instances such as `instance (C a, C b) => C (P a b)`.)
- All instances must be uniform across supported languages. Because the LambdaBuffers Codegen component (and not the user) is responsible for generating instances, we must ensure that the Codegen component is suitably equipped to generate instances in each language that exhibit behavior which is, to the greatest extent possible, equivalent to the behavior of generated instances in any other language. We must have an extensive test suite to verify the uniform behavior of generated instances.
In languages with a typeclass system (Haskell, PureScript) or equivalent (Rust's Traits), we will utilize the existing system and should (to the extent that doing so is feasible) utilize existing typeclasses and instances from commonly used or standard libraries. In languages lacking a type system that is sufficiently rich to express typeclass relations, we will generate instances using idiomatic language features. (The details will depend on the particular language.)
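To illustrate what structural derivation means in practice, the sketch below (a hypothetical model, not the actual Codegen output) derives an equality function for a product type purely from the equality functions of its fields, mirroring `instance (C a, C b) => C (P a b)`:

```python
def derive_product_eq(field_eqs):
    """Structurally derive equality for a product type: the instance
    for the product is determined entirely by the instances of its
    fields (products are modeled here as plain tuples)."""
    def eq(x, y):
        return all(f(a, b) for f, a, b in zip(field_eqs, x, y))
    return eq

# Hypothetical "instances" for the primitive field types:
int_eq = lambda a, b: a == b
bool_eq = lambda a, b: a == b

# The derived instance for something like `P Int Bool`:
p_eq = derive_product_eq([int_eq, bool_eq])
print(p_eq((1, True), (1, True)))  # True
print(p_eq((1, True), (2, True)))  # False
```

In a language with typeclasses the same structure would be expressed as a constrained instance; in languages without them, Codegen would emit an analogous function built from the field-level functions.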