1IN2009: Language ProcessorsCoursework Part 3IntroductionThis is the 3rd and final part of the coursework. In Part 1 you created a parser for the LPLgrammar which, given a syntactically correct LPL program as input, builds an ASTrepresentation of the program. In Part 3 you develop a compiler that processes the AST togenerate IR code.For this part of the coursework we provide functional code for parsing and type checking.Module marksThis coursework is worth either 70% or 60% of the coursework marks for the module,whichever gives you the highest overall mark. This coursework is marked out of 100.PlagiarismIf you copy the work of others (either that of fellow students or of a third party), with orwithout their permission, you will score no marks and further disciplinary action will be takenagainst you.Pair-workingPair-working is permitted only if you officially registered as a pair. If working as a pair, bothmembers must submit identical files and both members must contribute equally to the work.Other stuff you need to knowSee Other stuff you need to know at the end of this document.Submission: Submit two files:1. A zip archive (not a rar file) of all your source code. If you havedeveloped in an IDE do not submit the entire project structure, only thesource code, and before you submit check that your source code can becompiled and executed using command line tools alone (javac andjava).2. Either an MS Word or PDF file containing your answers to part d.Note: Compilers which do not compile (see above) will get zero marks.2Getting startedDownload LPL-compiler.zip from Moodle and extract all files. Key contents to beaware of:1. Source code:• An LPL parser (in package lpl.parser).• An LPL type checker (in package lpl.staticanalysis).• A prototype compiler (lpl.compiler.Compiler).• A top-level program lpl.Compile for running your compiler: this createsan .ir file containing the IR code generated by your compiler.2. A jar of compiled library code Backend.jar . This provides the IR AST classes(package ir.ast) an abstract machine (package tac) an IR parser (packageir.parser) an IR compiler (package ir.compiler) and top-level programs:• ir.Compile: takes an .ir file as input and creates two new files: binarymachine code (with extension .tac) and a human-readable assembly versionof the same code (with extension .tacass).• tac.Exec: for running tac binaries.Study the code for the prototype compiler in package lpl.compiler. You will find thefollowing:1. Class Compiler which implements the Visitor interface (by extending VisitorAdapter)and contains a compile method which takes an LPL program as a parameter andreturns an IR program as its result. You need to complete the Visitor implementationas well as the top-level compile method.2. Support classes FreshLabelGenerator and IRUtils. The class IRUtils provides anumber of convenience static factory methods for building IR ASTs; you will bewriting code which uses these factory methods to build an IR program (you don’tstrictly need to use the factory methods, since you could use IR AST constructorsdirectly, but it will make your life much easier if you do).Your compiler should generate code which implements all assignable values as integers, asfollows:• int values: implement in the obvious way.• bool values: use 0 for false and 1 for true.• unit: the unit type has only one value; implement as 0.• array values: like Java, LPL uses reference semantics for arrays; implement as aninteger which is a memory address within the heap where the array data resides. Nullreferences are implemented as 0.The parts below should be attempted in sequence. When you have completed one part youshould make a back-up copy of the work and keep it safe, in case you break it in your attemptat the next part. Be sure to test the old functionality as well as the new (regression testing).We will not assess multiple versions so, if a later attempt breaks previously working code,you may gain a better mark by submitting the earlier version for assessment.3a) [30 marks] The Basic Compiler: partially complete the implementation oflpl.compiler.Compiler. To get started without becoming lost in the detail, don’tinitially implement support for function definitions, function calls, return statements, orarrays. The prototype code assumes that a program consists of a single initial “main” function(the actual function name does not matter) which has zero parameters; it just compiles thebody of this function. A proper treatment of variables is not possible in this version: compilevariables to TEMP expressions for now. Much of the code that you need to write can be foundin the Session 9 slides.b) [20 marks] Functions: add support for function definitions and function calls. Variablesnow must be implemented as MEM expressions using offsets from TEMP FP. You should nolonger assume that the initial procedure always has zero parameters. Instead, your generatedIR code should start with code which loads command-line argument values from the stack andcalls the top-level procedure using these as the actual parameters. Note: the call to the initialprocedure must be followed by a JUMP to the label _END, otherwise execution will fallthrough to the following code (the symptom is likely to be an infinite looping behaviour).Label _END is pre-defined and added by the IR compiler (do not add your own).You should generate code under the assumption that all command-line arguments have beenpushed on the stack, followed by an argument count. For example, suppose you execute acompiled program as follows:java -cp Backend.jar tac.Exec prog.tac 78 29Before executing prog.tac, the Exec program will initialise the machinestate so that the stack looks like the picture on the left.Note: initially, FP points to an address just before the base of the stack. Yourgenerated code can use negative offsets from FP to access the command-linearguments.c) [20 marks] Arrays: add support for arrays. Your generated code will need to call the predefined _malloc function to allocate heap memory for array creation. You are not requiredto generate code for bounds-checking.d) [30 Marks] LPL language extension. Design an extension for the LPL language. There aremany features present in languages like Java that are missing from LPL, so you have a widerange of options. More ambitious choices will have wider scope for marks than trivial ones.Your extension must be backwards compatible with the existing grammar (the languagegenerated by your new grammar must include the language generated by the existinggrammar). You should provide:• A short informal description of your language extension.• The grammar for your extension (you do not need to repeat all the existing rules, onlynew rules and modified versions of any rules which have changed). Use the samenotation as in the existing grammar.• A detailed description of any necessary code changes or additions to:o the set of AST classeso the type-checkero the compilerYou may use a mixture of natural language and pseudocode or Java. FP →7829SP →2 4TestingWhen you have a partially working compiler you can test it by compiling LPL programs to IRcode, then compiling the IR code to a tac executable, then executing the tac code. Forexample, you might try:javac -cp .:Backend.jar lpl/Compile.javajava -cp .:Backend.jar lpl.Compile counter.lpljava -cp Backend.jar ir.Compile counter.irjava -cp Backend.jar tac.Exec counter.tac 11(assuming that counter.lpl is the test input provided for Part 1, the expected output inthis case would be: 11 10 9 8 7 6 5 4 3 2 1 0).You can use the test inputs from Part 1 but note that these do not comprise a comprehensivetest-suite. You should invent and run your own tests. The document LPL compared with Javagives a concise summary of how LPL programs are supposed to behave.If the IR code generated by your LPL compiler is rejected by the IR compiler, or doesn’texecute as you expect, then you should study the .ir file to see why. (If it is accepted by theIR compiler you can also look at the assembly code in the .tacass file, but this is lesslikely to be useful for debugging your LPL compiler.) As always, test incrementallythroughout development, and design test inputs which are as simple as possible for thebehaviour that you want to test.Note: Windows usersshould use semi-colon (;)as the classpath separatorinstead of colon (:).5Other stuff you need to knowTo compile to correct IR code, you need to know a few things about what the IR compilerwill do with it:1. TEMP names. TEMP’s are the IR counterpart to machine registers (and will in fact becompiled to registers by the backend compiler). Some names are treated specially bythe IR compiler:FP is the frame pointerSP is the stack pointerRV this where your compiled function bodies must leave their return valuesApart from these, you are free to invent any names you like for TEMP nodes but careis needed to avoid name clashes, so you are advised to use theFreshLabelGenerator.freshLab methods.You will probably notice that in some cases it would be possible to be moreeconomical and reuse the same TEMP name in different parts of your code: DON’Tbe tempted to do this. Firstly, it is easy to get this wrong, leading to some very subtlebugs in your compiled code (IR code is deliberately designed to allow an unboundednumber of “registers” so that you can avoid these issues). Secondly, even if youmanage to get it right, it is actually likely to result in less efficient executable codebecause of the register-allocation algorithm used by the backend compiler.2. LABEL names. For the most part, you should use freshLab to create label names,since label names must be unique (but see the remarks below about compiling functiondefinitions). Don’t create any labels with names that start with an underscore: theseare reserved as label names for use by the backend IR compiler.3. Pre-defined labels. The IR compiler provides the following routines which you cancall in your generated IR code, as required. Each of them takes a single parameter._printchar : the parameter is an integer which will be interpreted as a 16-bitUnicode Plane 0 code point of a character to be printed (the 16 higher-order bits ofthe integer are ignored). Note that the first 128 code points coincide with ASCII._printint : the parameter is an integer which will be printed as text (with nonewline)._printstr : the parameter is a memory address for a null-terminated stringconstant; any valid memory address can be used but in practice you will alwaysspecify the parameter as NAME lab, where lab is a label name defined in the(optional) strings section at the start of your generated IR code._malloc : the parameter is the number of words of memory to allocate; the startaddress of the allocated block is returned. Note that _malloc will allocate memorywhich is not currently in use but makes no guarantees about the contents of theallocated memory (it may contain arbitrary junk).The IR compiler also adds a label _END to the very end of the compiled code.4. Function definitions: You will compile an LPL function definition by compiling itsbody into a sequence of IR statements, starting with LABEL foo, where foo is the LPLfunction name; your code should ensure that the return value is stored in TEMP RV.To compile the IR sequence into machine code which can be called and returned from,the backend IR compiler needs to top-and-tail the code that it generates withinstructions for pushing and popping a stack frame; to enable this, your generated codemust include a PROLOGUE at the start (immediately after LABEL foo) and anEPILOGUE at the end.
