A Look at the Idris Internals, Part IV - From TT Onward

published on July 19th, 2016

So after looking at the invocation of the elaborator in the last post, I now want to look into how the resulting TT terms are further processed and simplified into the other intermediary languages. The cross-platform compilers paper by Edwin Brady gives a high level description of the later stage intermediate languages of Idris. This post will take a look at where the intermediate languages are located in the code and what functions are invoked to simplify TT further.

Idris's later stage intermediate languages

After converting Idris to TT, there are a number of additional intermediate languages, which get produced in subsequent steps. The idea is that these languages are progressively simpler and that each code generator can pick a lanugage which it is easiest to generate the desired code from. The cross-platform compilers paper by Edwin Brady ilustrates the intermediate languages used in Idris. After Idris and TT, the paper lists the following:

• TT_case in which pattern matching is converted to simple case trees via a pattern match compiler.

• IR_case in which all types are erased, as well as any values which are provably not used at run-time.

• IR_lift in which there are no lambda bindings other than at the top level.

• IR_defunc in which all functions are fully applied. All functions are first-order, by defunctionalisation.

• IR_ANF in which all functions are in Applicative Normal Form (ANF), that is all arguments to functions are trivial. By “trivial”, we mean that evaluation of an argument must terminate immediately, that is arguments are either constant or variables.

Where are the intermediate languages produced?

Looking at the idrisMain function in Idris/REPL.hs again, we first loadInputs (which ends up calling either loadSource or reloading the equivalent from ibc files). As discussed in the last post, the loadSource function from Idris/Parser.hs does a lot of work, namely parsing source files, invoking the elaborator via the elabDecls function from Idris/ElabDecls.hs, simplifying case definitions via the simplifyCasedef function (this converts TT to TT_case), totality checking (first building a size change graph via buildSCG, then checkDeclTotality and verifyTotality), and finally writing out the ibc file.

Next we call the process function with a Compile command. process (also in REPL.hs) calls the compile function in IRTS/Compiler.hs, so this seems to be the main entry point for further processing of the TT code.

compile does some usage analysis to find reachable names, then calls mkDecls on a list of these names. mkDecls has a type of [Name] -> Idris [(Name, LDecl)], so outputs a list of declarations of type LDecl. LDecl is the first post-TT intermediate language, it is defined in IRTS/Lang.hs. If I am not mistaken, this should correspond to IR_case in the above list (TT_case is a special case of TT which is represented in the same data structure as TT). LDecl just has two constructors, LFun and LConstructor. The latter just has a Name, an Int (tag) and another Int (arity), the former has options, names (name and argument names) and an expression of type LExp, which is the actual definition of the function (defined in the same file).

mkLDecls is basically a wrapper around the build function, which creates a (Name, LDecl) pair for each (Name, Def) declaration that it gets, so LDecls seem to correspond directly to Defs. If the provided name is a primitive operation build directly constructs an LFun which contains an LOp with some machine generated names for the arguments, otherwise it calls mkLDecl to do this.

mkLDecl handles the constructors Function, CaseOp, TyDecl DCon and TyDecl TCon explicitly, for everything else it creates an LFun [Inline] name [] LNothing construction, this seems to be a function which does a no-op. For the Function and the CaseOp constructors, the irTerm and irTree seem to be the real work horses, we'll check these out next.

Of the two irTree itself is short, handing of its work to irSC, which pattern matches on a variable of type SC, this seems to be the case tree representation of the TT terms generated by simplifyCasedef earlier. irSC is pretty involved and looks like it is handeling a lot of special cases. A better understanding of the case trees is probably necessary to understand what is happening in irSC and why.

irTerm on the other hand handles conversions of TT terms, handling App, P, V, Bind, Proj, Constant, TType, Erased and Impossible constructors. The latter three are mapped directly to LNothing, transformations for P, V, Bind, Proj and Constant seem pretty straight-forward, however, the App case is very involved, it unApplys the application and has about 20 different pattern matches on the P constructor alone.

Nevertheless mkDecls and the functions it calls handle the transformation from TT to LDecl - some complications along these lines should have been expected ;).

Further processing

Returning to the compile function, the inlineAll function from IRTS/LangOpts.hs is invoked on our LDecls, which, for all LDecls basically searches for LApp constructs (so function applications), checks a number of conditions to see that these are inlineable, invents fresh names for the variables and replaces the application with the function body.

Next allocUnique and addTags make some tranforms to maintain unique names (?) and tag constructors(?). In there liftAll does lambda lifting for all LDecls, the work horse here seems to be the lift function in IRTS/Lang.hs. So this innocent looking step does the transformation from IR_case to IR_lift, all the while the data type is still LDecl.

Next up is a call to defunctionalise which turns LDefs into DDefs, this is the representation of IR_defunc where all function applications are fully applied and first order. DDefs is a Ctxt around DDecl, which looks pretty similar to LDecl (some cases have been simplified, I suppose). Next, DDefs are inlineed, as far as I can tell this does not do anything at the moment.

The next call is to simplifyDefs which turns the DDefs into SDefs, where SDefs are definitions in the simplified language defined in IRTS/Simplified.hs, I think this corresponds to IR_ANF, xthe simplest of the intermediate languages.

With this we're pretty much through the intermediate language generation and return a CodegenInfo, which contains the simplified (c, IR_ANF), defunctionalized (defuns IR_defun) and lifted (tagged, IR_lift) representations.