published on July 19th, 2016
So after looking at the invocation of the elaborator in the last post, I now want to look into how the resulting TT terms are further processed and simplified into the other intermediary languages. The cross-platform compilers paper by Edwin Brady gives a high level description of the later stage intermediate languages of Idris. This post will take a look at where the intermediate languages are located in the code and what functions are invoked to simplify TT further.
After converting Idris to
TT, there are a number of additional
intermediate languages, which get produced in subsequent steps. The
idea is that these languages are progressively simpler and that each
code generator can pick a lanugage which it is easiest to generate the
desired code from. The cross-platform compilers paper by Edwin Brady
ilustrates the intermediate languages used in Idris. After Idris and TT,
the paper lists the following:
TT_case in which pattern matching is converted to simple case trees via a pattern match compiler.
IR_case in which all types are erased, as well as any values which are provably not used at run-time.
IR_lift in which there are no lambda bindings other than at the top level.
IR_defunc in which all functions are fully applied. All functions are first-order, by defunctionalisation.
IR_ANF in which all functions are in Applicative Normal Form (ANF), that is all arguments to functions are trivial. By “trivial”, we mean that evaluation of an argument must terminate immediately, that is arguments are either constant or variables.
Looking at the
idrisMain function in
Idris/REPL.hs again, we first
loadInputs (which ends up calling either
loadSource or reloading
the equivalent from ibc files). As discussed in the last post, the
loadSource function from
Idris/Parser.hs does a lot of work,
namely parsing source files, invoking the elaborator via the
elabDecls function from
Idris/ElabDecls.hs, simplifying case
definitions via the
simplifyCasedef function (this converts
TT_case), totality checking (first building a size change graph via
finally writing out the ibc file.
Next we call the
process function with a
process (also in
REPL.hs) calls the
IRTS/Compiler.hs, so this seems to be the main entry point
for further processing of the TT code.
compile does some usage analysis to find reachable names, then calls
mkDecls on a list of these names.
mkDecls has a type of
Idris [(Name, LDecl)], so outputs a list of declarations of type
LDecl is the first post-TT intermediate language, it is
IRTS/Lang.hs. If I am not mistaken, this should
IR_case in the above list (
TT_case is a special case
of TT which is represented in the same data structure as
LDecl just has two constructors,
latter just has a
Int (tag) and another
the former has options, names (name and argument names) and an
expression of type
LExp, which is the actual definition of the
function (defined in the same file).
mkLDecls is basically a wrapper around the
build function, which
(Name, LDecl) pair for each
(Name, Def) declaration that
it gets, so
LDecls seem to correspond directly to
Defs. If the
provided name is a primitive operation
build directly constructs an
LFun which contains an
LOp with some machine generated names for
the arguments, otherwise it calls
mkLDecl to do this.
mkLDecl handles the constructors
TyDecl TCon explicitly, for everything else it creates an
[Inline] name  LNothing construction, this seems to be a function
which does a no-op. For the
Function and the
irTree seem to be the real work horses, we'll check
these out next.
Of the two
irTree itself is short, handing of its work to
which pattern matches on a variable of type
SC, this seems to be the
case tree representation of the TT terms generated by
irSC is pretty involved and looks like it
is handeling a lot of special cases. A better understanding of the
case trees is probably necessary to understand what is happening in
irSC and why.
irTerm on the other hand handles conversions of TT terms, handling
Impossible constructors. The latter three are mapped directly to
LNothing, transformations for
Constant seem pretty straight-forward, however, the
is very involved, it
unApplys the application and has about 20
different pattern matches on the
P constructor alone.
mkDecls and the functions it calls handle the
LDecl - some complications along
these lines should have been expected ;).
Returning to the
compile function, the
IRTS/LangOpts.hs is invoked on our
LDecls, which, for
LDecls basically searches for
LApp constructs (so function
applications), checks a number of conditions to see that these are
inlineable, invents fresh names for the variables and replaces
the application with the function body.
addTags make some tranforms to maintain
unique names (?) and tag constructors(?). In there
does lambda lifting for all LDecls, the work horse here seems
to be the
lift function in
IRTS/Lang.hs. So this innocent
looking step does the transformation from
all the while the data type is still
Next up is a call to
defunctionalise which turns
DDefs, this is the representation of
IR_defunc where all function
applications are fully applied and first order.
DDefs is a
DDecl, which looks pretty similar to
LDecl (some cases have
been simplified, I suppose). Next,
inlineed, as far as I
can tell this does not do anything at the moment.
The next call is to
simplifyDefs which turns the
SDefs are definitions in the simplified language
IRTS/Simplified.hs, I think this corresponds to
xthe simplest of the intermediate languages.
With this we're pretty much through the intermediate language
generation and return a
CodegenInfo, which contains the simplified
c, IR_ANF), defunctionalized (
defuns IR_defun) and lifted
tagged, IR_lift) representations.