published on July 19, 2016
So after looking at the invocation of the elaborator in the last post, I now want to look into how the resulting TT terms are further processed and simplified into the other intermediary languages. The cross-platform compilers paper by Edwin Brady gives a high level description of the later stage intermediate languages of Idris. This post will take a look at where the intermediate languages are located in the code and what functions are invoked to simplify TT further.
After converting Idris to
TT, there are a number of additional intermediate languages, which get produced in subsequent steps. The idea is that these languages are progressively simpler and that each code generator can pick a lanugage which it is easiest to generate the desired code from. The cross-platform compilers paper by Edwin Brady ilustrates the intermediate languages used in Idris. After Idris and TT, the paper lists the following:
TT_case in which pattern matching is converted to simple case trees via a pattern match compiler.
IR_case in which all types are erased, as well as any values which are provably not used at run-time.
IR_lift in which there are no lambda bindings other than at the top level.
IR_defunc in which all functions are fully applied. All functions are first-order, by defunctionalisation.
IR_ANF in which all functions are in Applicative Normal Form (ANF), that is all arguments to functions are trivial. By “trivial”, we mean that evaluation of an argument must terminate immediately, that is arguments are either constant or variables.
Looking at the
idrisMain function in
Idris/REPL.hs again, we first
loadInputs (which ends up calling either
loadSource or reloading the equivalent from ibc files). As discussed in the last post, the
loadSource function from
Idris/Parser.hs does a lot of work, namely parsing source files, invoking the elaborator via the
elabDecls function from
Idris/ElabDecls.hs, simplifying case definitions via the
simplifyCasedef function (this converts
TT_case), totality checking (first building a size change graph via
verifyTotality), and finally writing out the ibc file.
Next we call the
process function with a
process (also in
REPL.hs) calls the
compile function in
IRTS/Compiler.hs, so this seems to be the main entry point for further processing of the TT code.
compile does some usage analysis to find reachable names, then calls
mkDecls on a list of these names.
mkDecls has a type of
[Name] -> Idris [(Name, LDecl)], so outputs a list of declarations of type
LDecl is the first post-TT intermediate language, it is defined in
IRTS/Lang.hs. If I am not mistaken, this should correspond to
IR_case in the above list (
TT_case is a special case of TT which is represented in the same data structure as
LDecl just has two constructors,
LConstructor. The latter just has a
Int (tag) and another
Int (arity), the former has options, names (name and argument names) and an expression of type
LExp, which is the actual definition of the function (defined in the same file).
mkLDecls is basically a wrapper around the
build function, which creates a
(Name, LDecl) pair for each
(Name, Def) declaration that it gets, so
LDecls seem to correspond directly to
Defs. If the provided name is a primitive operation
build directly constructs an
LFun which contains an
LOp with some machine generated names for the arguments, otherwise it calls
mkLDecl to do this.
mkLDecl handles the constructors
TyDecl DCon and
TyDecl TCon explicitly, for everything else it creates an
LFun [Inline] name  LNothing construction, this seems to be a function which does a no-op. For the
Function and the
CaseOp constructors, the
irTree seem to be the real work horses, we’ll check these out next.
Of the two
irTree itself is short, handing of its work to
irSC, which pattern matches on a variable of type
SC, this seems to be the case tree representation of the TT terms generated by
irSC is pretty involved and looks like it is handeling a lot of special cases. A better understanding of the case trees is probably necessary to understand what is happening in
irSC and why.
irTerm on the other hand handles conversions of TT terms, handling
Impossible constructors. The latter three are mapped directly to
LNothing, transformations for
Constant seem pretty straight-forward, however, the
App case is very involved, it
unApplys the application and has about 20 different pattern matches on the
P constructor alone.
mkDecls and the functions it calls handle the transformation from
LDecl - some complications along these lines should have been expected ;).
Returning to the
compile function, the
inlineAll function from
IRTS/LangOpts.hs is invoked on our
LDecls, which, for all
LDecls basically searches for
LApp constructs (so function applications), checks a number of conditions to see that these are inlineable, invents fresh names for the variables and replaces the application with the function body.
addTags make some tranforms to maintain unique names (?) and tag constructors(?). In there
liftAll does lambda lifting for all LDecls, the work horse here seems to be the
lift function in
IRTS/Lang.hs. So this innocent looking step does the transformation from
IR_lift, all the while the data type is still
Next up is a call to
defunctionalise which turns
DDefs, this is the representation of
IR_defunc where all function applications are fully applied and first order.
DDefs is a
DDecl, which looks pretty similar to
LDecl (some cases have been simplified, I suppose). Next,
inlineed, as far as I can tell this does not do anything at the moment.
The next call is to
simplifyDefs which turns the
SDefs are definitions in the simplified language defined in
IRTS/Simplified.hs, I think this corresponds to
IR_ANF, xthe simplest of the intermediate languages.
With this we’re pretty much through the intermediate language generation and return a
CodegenInfo, which contains the simplified (
c, IR_ANF), defunctionalized (
defuns IR_defun) and lifted (
tagged, IR_lift) representations.