published on July 19th, 2016
So after looking at the invocation of the elaborator in the last post, I now want to look into how the resulting TT terms are further processed and simplified into the other intermediary languages. The cross-platform compilers paper by Edwin Brady gives a high level description of the later stage intermediate languages of Idris. This post will take a look at where the intermediate languages are located in the code and what functions are invoked to simplify TT further.
After converting Idris to TT
, there are a number of additional
intermediate languages, which get produced in subsequent steps. The
idea is that these languages are progressively simpler and that each
code generator can pick a lanugage which it is easiest to generate the
desired code from. The cross-platform compilers paper by Edwin Brady
ilustrates the intermediate languages used in Idris. After Idris and TT,
the paper lists the following:
TT_case in which pattern matching is converted to simple case trees via a pattern match compiler.
IR_case in which all types are erased, as well as any values which are provably not used at run-time.
IR_lift in which there are no lambda bindings other than at the top level.
IR_defunc in which all functions are fully applied. All functions are first-order, by defunctionalisation.
IR_ANF in which all functions are in Applicative Normal Form (ANF), that is all arguments to functions are trivial. By “trivial”, we mean that evaluation of an argument must terminate immediately, that is arguments are either constant or variables.
Looking at the idrisMain
function in Idris/REPL.hs
again, we first
loadInputs
(which ends up calling either loadSource
or reloading
the equivalent from ibc files). As discussed in the last post, the
loadSource
function from Idris/Parser.hs
does a lot of work,
namely parsing source files, invoking the elaborator via the
elabDecls
function from Idris/ElabDecls.hs
, simplifying case
definitions via the simplifyCasedef
function (this converts TT
to
TT_case
), totality checking (first building a size change graph via
buildSCG
, then checkDeclTotality
and verifyTotality
), and
finally writing out the ibc file.
Next we call the process
function with a Compile
command. process
(also in REPL.hs
) calls the compile
function
in IRTS/Compiler.hs
, so this seems to be the main entry point
for further processing of the TT code.
compile
does some usage analysis to find reachable names, then calls
mkDecls
on a list of these names. mkDecls
has a type of [Name] ->
Idris [(Name, LDecl)]
, so outputs a list of declarations of type
LDecl
. LDecl
is the first post-TT intermediate language, it is
defined in IRTS/Lang.hs
. If I am not mistaken, this should
correspond to IR_case
in the above list (TT_case
is a special case
of TT which is represented in the same data structure as TT
).
LDecl
just has two constructors, LFun
and LConstructor
. The
latter just has a Name
, an Int
(tag) and another Int
(arity),
the former has options, names (name and argument names) and an
expression of type LExp
, which is the actual definition of the
function (defined in the same file).
mkLDecls
is basically a wrapper around the build
function, which
creates a (Name, LDecl)
pair for each (Name, Def)
declaration that
it gets, so LDecls
seem to correspond directly to Defs
. If the
provided name is a primitive operation build
directly constructs an
LFun
which contains an LOp
with some machine generated names for
the arguments, otherwise it calls mkLDecl
to do this.
mkLDecl
handles the constructors Function
, CaseOp
, TyDecl DCon
and TyDecl TCon
explicitly, for everything else it creates an LFun
[Inline] name [] LNothing
construction, this seems to be a function
which does a no-op. For the Function
and the CaseOp
constructors,
the irTerm
and irTree
seem to be the real work horses, we'll check
these out next.
Of the two irTree
itself is short, handing of its work to irSC
,
which pattern matches on a variable of type SC
, this seems to be the
case tree representation of the TT terms generated by
simplifyCasedef
earlier. irSC
is pretty involved and looks like it
is handeling a lot of special cases. A better understanding of the
case trees is probably necessary to understand what is happening in
irSC
and why.
irTerm
on the other hand handles conversions of TT terms, handling
App
, P
, V
, Bind
, Proj
, Constant
, TType
, Erased
and
Impossible
constructors. The latter three are mapped directly to
LNothing
, transformations for P
, V
, Bind
, Proj
and
Constant
seem pretty straight-forward, however, the App
case
is very involved, it unApply
s the application and has about 20
different pattern matches on the P
constructor alone.
Nevertheless mkDecls
and the functions it calls handle the
transformation from TT
to LDecl
- some complications along
these lines should have been expected ;).
Returning to the compile
function, the inlineAll
function
from IRTS/LangOpts.hs
is invoked on our LDecls
, which, for
all LDecls
basically searches for LApp
constructs (so function
applications), checks a number of conditions to see that these are
inlineable, invents fresh names for the variables and replaces
the application with the function body.
Next allocUnique
and addTags
make some tranforms to maintain
unique names (?) and tag constructors(?). In there liftAll
does lambda lifting for all LDecls, the work horse here seems
to be the lift
function in IRTS/Lang.hs
. So this innocent
looking step does the transformation from IR_case
to IR_lift
,
all the while the data type is still LDecl
.
Next up is a call to defunctionalise
which turns LDefs
into
DDefs
, this is the representation of IR_defunc
where all function
applications are fully applied and first order. DDefs
is a Ctxt
around DDecl
, which looks pretty similar to LDecl
(some cases have
been simplified, I suppose). Next, DDefs
are inline
ed, as far as I
can tell this does not do anything at the moment.
The next call is to simplifyDefs
which turns the DDefs
into
SDefs
, where SDefs
are definitions in the simplified language
defined in IRTS/Simplified.hs
, I think this corresponds to IR_ANF
,
xthe simplest of the intermediate languages.
With this we're pretty much through the intermediate language
generation and return a CodegenInfo
, which contains the simplified
(c
, IR_ANF), defunctionalized (defuns
IR_defun) and lifted
(tagged
, IR_lift) representations.