published on July 19th, 2016
So after looking at the invocation of the elaborator in the last post, I now want to look into how the resulting TT terms are further processed and simplified into the other intermediary languages. The cross-platform compilers paper by Edwin Brady gives a high level description of the later stage intermediate languages of Idris. This post will take a look at where the intermediate languages are located in the code and what functions are invoked to simplify TT further.
After converting Idris to TT, there are a number of additional
intermediate languages, which get produced in subsequent steps. The
idea is that these languages are progressively simpler and that each
code generator can pick a lanugage which it is easiest to generate the
desired code from. The cross-platform compilers paper by Edwin Brady
ilustrates the intermediate languages used in Idris. After Idris and TT,
the paper lists the following:
TT_case in which pattern matching is converted to simple case trees via a pattern match compiler.
IR_case in which all types are erased, as well as any values which are provably not used at run-time.
IR_lift in which there are no lambda bindings other than at the top level.
IR_defunc in which all functions are fully applied. All functions are first-order, by defunctionalisation.
IR_ANF in which all functions are in Applicative Normal Form (ANF), that is all arguments to functions are trivial. By “trivial”, we mean that evaluation of an argument must terminate immediately, that is arguments are either constant or variables.
Looking at the idrisMain function in Idris/REPL.hs again, we first
loadInputs (which ends up calling either loadSource or reloading
the equivalent from ibc files). As discussed in the last post, the
loadSource function from Idris/Parser.hs does a lot of work,
namely parsing source files, invoking the elaborator via the
elabDecls function from Idris/ElabDecls.hs, simplifying case
definitions via the simplifyCasedef function (this converts TT to
TT_case), totality checking (first building a size change graph via
buildSCG, then checkDeclTotality and verifyTotality), and
finally writing out the ibc file.
Next we call the process function with a Compile
command. process (also in REPL.hs) calls the compile function
in IRTS/Compiler.hs, so this seems to be the main entry point
for further processing of the TT code.
compile does some usage analysis to find reachable names, then calls
mkDecls on a list of these names. mkDecls has a type of [Name] ->
Idris [(Name, LDecl)], so outputs a list of declarations of type
LDecl. LDecl is the first post-TT intermediate language, it is
defined in IRTS/Lang.hs. If I am not mistaken, this should
correspond to IR_case in the above list (TT_case is a special case
of TT which is represented in the same data structure as TT).
LDecl just has two constructors, LFun and LConstructor. The
latter just has a Name, an Int (tag) and another Int (arity),
the former has options, names (name and argument names) and an
expression of type LExp, which is the actual definition of the
function (defined in the same file).
mkLDecls is basically a wrapper around the build function, which
creates a (Name, LDecl) pair for each (Name, Def) declaration that
it gets, so LDecls seem to correspond directly to Defs. If the
provided name is a primitive operation build directly constructs an
LFun which contains an LOp with some machine generated names for
the arguments, otherwise it calls mkLDecl to do this.
mkLDecl handles the constructors Function, CaseOp, TyDecl DCon
and TyDecl TCon explicitly, for everything else it creates an LFun
[Inline] name [] LNothing construction, this seems to be a function
which does a no-op. For the Function and the CaseOp constructors,
the irTerm and irTree seem to be the real work horses, we'll check
these out next.
Of the two irTree itself is short, handing of its work to irSC,
which pattern matches on a variable of type SC, this seems to be the
case tree representation of the TT terms generated by
simplifyCasedef earlier. irSC is pretty involved and looks like it
is handeling a lot of special cases. A better understanding of the
case trees is probably necessary to understand what is happening in
irSC and why.
irTerm on the other hand handles conversions of TT terms, handling
App, P, V, Bind, Proj, Constant, TType, Erased and
Impossible constructors. The latter three are mapped directly to
LNothing, transformations for P, V, Bind, Proj and
Constant seem pretty straight-forward, however, the App case
is very involved, it unApplys the application and has about 20
different pattern matches on the P constructor alone.
Nevertheless mkDecls and the functions it calls handle the
transformation from TT to LDecl - some complications along
these lines should have been expected ;).
Returning to the compile function, the inlineAll function
from IRTS/LangOpts.hs is invoked on our LDecls, which, for
all LDecls basically searches for LApp constructs (so function
applications), checks a number of conditions to see that these are
inlineable, invents fresh names for the variables and replaces
the application with the function body.
Next allocUnique and addTags make some tranforms to maintain
unique names (?) and tag constructors(?). In there liftAll
does lambda lifting for all LDecls, the work horse here seems
to be the lift function in IRTS/Lang.hs. So this innocent
looking step does the transformation from IR_case to IR_lift,
all the while the data type is still LDecl.
Next up is a call to defunctionalise which turns LDefs into
DDefs, this is the representation of IR_defunc where all function
applications are fully applied and first order. DDefs is a Ctxt
around DDecl, which looks pretty similar to LDecl (some cases have
been simplified, I suppose). Next, DDefs are inlineed, as far as I
can tell this does not do anything at the moment.
The next call is to simplifyDefs which turns the DDefs into
SDefs, where SDefs are definitions in the simplified language
defined in IRTS/Simplified.hs, I think this corresponds to IR_ANF,
xthe simplest of the intermediate languages.
With this we're pretty much through the intermediate language
generation and return a CodegenInfo, which contains the simplified
(c, IR_ANF), defunctionalized (defuns IR_defun) and lifted
(tagged, IR_lift) representations.