The code for this consists of several pieces:
- Grammar
- Base/common classes
- A scanner
- A parser
- An emitter
- Put it all together
First and foremost, the software on this page is free.
EMO language .NET compiler Copyright (C) 2010 Sean Fife This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
1. Grammar
The grammar for this language is fairly simple and is spelled out below:
<stmt> := <eyes><nose><mouth> | <eyes><nose><mouth><bottom> | <top><eyes><nose><mouth> | <stmt><whitespace><stmt> | <stmt><stmt>
<eyes> := : | ; | X | =
<nose> := ^ | - | o | c | <nose><nose>
<mouth> := ( | ) | { | } | <pipecharacter> | @ | P
<top> := <
<bottom> := >
<whitespace> := space | tab | newline | linefeed | <whitespace><whitespace>
<pipecharacter> := |
From the grammar, we can start to build up classes that will represent the language inside the compiler.
2. Base/common classes
These base classes, some derivatives, and enumerations together make up the definition of the language inside the compiler. The scanner turns your source file into a flat list of tokens. The parser then checks that list for validity, builds a tree of Stmt objects from it, and passes the tree on to the code generator.
It's important to note that the language grammar is defined recursively. This is true of most languages.
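To make the recursion concrete, here is a minimal sketch of how the rule <nose> := ^ | - | o | c | <nose><nose> can be matched by a function that calls itself. The class name NoseGrammarSketch and the method MatchNose are made up for this illustration and are not part of the compiler.

```csharp
using System;

public static class NoseGrammarSketch
{
    // The characters the <nose> rule allows.
    private const string Noses = "^-oc";

    // Returns how many consecutive characters, starting at 'start',
    // form a <nose> (0 means there is no nose at that position).
    public static int MatchNose(string text, int start)
    {
        // Base case: end of input, or the character is not a nose.
        if (start >= text.Length || Noses.IndexOf(text[start]) < 0)
        {
            return 0;
        }

        // Recursive case: one nose character, then whatever <nose> follows it.
        return 1 + MatchNose(text, start + 1);
    }
}
```

Calling NoseGrammarSketch.MatchNose(":^^)", 1) walks over both carets and stops at the mouth, which is the same shape of recursion the parser's ParseNoses method relies on.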
All of these classes make use of certain using aliases, so don't get confused when you see prefixes such as Emit. or Collections. out and about in the code:
using Reflect = System.Reflection;
using Emit = System.Reflection.Emit;
using Collections = System.Collections.Generic;
using IO = System.IO;
using Text = System.Text;
So, here are the base/common classes.
public class BaseEmoticon : Stmt
{
    public bool IsEye(char ch)
    {
        foreach (char c in Eyes) { if (ch == c) { return true; } }
        return false;
    }

    public bool IsNose(char ch)
    {
        foreach (char c in Noses) { if (ch == c) { return true; } }
        return false;
    }

    public bool IsMouth(char ch)
    {
        foreach (char c in Mouths) { if (ch == c) { return true; } }
        return false;
    }

    public bool IsHat(char ch)
    {
        foreach (char c in Hats) { if (ch == c) { return true; } }
        return false;
    }

    public bool IsBeard(char ch)
    {
        foreach (char c in Beards) { if (ch == c) { return true; } }
        return false;
    }
}

/* <stmt> := <eyes><nose><mouth> | <eyes><nose><mouth><bottom> | <top><eyes><nose><mouth> | <stmt><stmt> */
public class Stmt
{
    // The order of each array matches the order of the corresponding enum below.
    public static char[] Eyes = { ':', ';', 'X', '=' };
    public static char[] Noses = { '^', '-', 'o', 'c' };
    public static char[] Mouths = { ')', '(', '|', '{', '}', 'P', '@' };
    public static char[] Hats = { '<' };
    public static char[] Beards = { '>' };
}

public class Emoticon : Stmt
{
    public Stmt Eye;
    public Stmt Nose;
    public Stmt Mouth;

    public override string ToString()
    {
        string ret = "";
        if (Eye != null) ret += Eye.ToString();
        if (Nose != null) ret += Nose.ToString();
        if (Mouth != null) ret += Mouth.ToString();
        return ret;
    }
}

public class Sequence : Stmt
{
    public Stmt First;
    public Stmt Second;
}

//<eyes> := : | ; | X | =
public class Eye : Stmt
{
    public EyeOps EyeOp;
    public override string ToString() { return Eyes[(int)EyeOp].ToString(); }
}

//<nose> := ^ | - | o | c | <nose><nose>
public class Nose : Stmt
{
    public NoseOps NoseOp;
    public override string ToString() { return Noses[(int)NoseOp].ToString(); }
}

public class NoseSequence : Stmt
{
    public Stmt First;
    public Stmt Second;
}

public class Mouth : Stmt
{
    public MouthOps MouthOp;
    public override string ToString() { return Mouths[(int)MouthOp].ToString(); }
}

public class Loop : Stmt
{
    public Stmt Body;
}

public enum EyeOps { Colon, SemiColon, X, Equals }
public enum NoseOps { Caret, Dash, LowerCaseO, LowerCaseC }
public enum MouthOps { RightParen, LeftParen, Pipe, LeftCurlyBrace, RightCurlyBrace, CapitalP, At }
public enum Tops { LeftAngleBracket }
public enum Bottoms { RightAngleBracket }
The base of them all is the Stmt class. Deriving everything from it lets us use polymorphism: a field or list typed as Stmt can hold any of these objects.
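As a rough illustration of why the shared base class helps, the sketch below walks a small tree through Stmt-typed references and counts the emoticons in it. The nested classes are trimmed-down stand-ins for the ones above, and StmtSketch, CountEmoticons, and Demo are hypothetical names used only for this example.

```csharp
using System;

public static class StmtSketch
{
    // Simplified stand-ins for the article's classes (most fields left out).
    public class Stmt { }
    public class Emoticon : Stmt { }
    public class Loop : Stmt { public Stmt Body; }
    public class Sequence : Stmt { public Stmt First; public Stmt Second; }

    // Counts the Emoticon nodes reachable from 'node', whatever its concrete type.
    public static int CountEmoticons(Stmt node)
    {
        if (node is Sequence)
        {
            Sequence seq = (Sequence)node;
            return CountEmoticons(seq.First) + CountEmoticons(seq.Second);
        }
        if (node is Loop)
        {
            return CountEmoticons(((Loop)node).Body);
        }
        return node is Emoticon ? 1 : 0;
    }

    // Builds a tree shaped like "< emoticon emoticon > emoticon" and counts its emoticons.
    public static int Demo()
    {
        Sequence inner = new Sequence { First = new Emoticon(), Second = new Emoticon() };
        Loop loop = new Loop { Body = inner };
        Sequence root = new Sequence { First = loop, Second = new Emoticon() };
        return CountEmoticons(root);
    }
}
```

The same is/cast pattern is what the code generator uses to decide whether it is looking at a Sequence, a Loop, or an Emoticon.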
3. The Scanner
The scanner is probably the simplest-looking of the classes. It's important to note that this is where comments get removed from the code: a comment starts with ~ and runs to the end of the line. Any character that isn't part of the language is skipped as well.
public class Scanner : BaseEmoticon
{
    private readonly Collections.IList<char> result;

    public Scanner(IO.TextReader input)
    {
        this.result = new Collections.List<char>();
        this.Scan(input);
    }

    public Collections.IList<char> Tokens { get { return this.result; } }

    private void Scan(IO.TextReader input)
    {
        while (input.Peek() != -1)
        {
            char token = (char)input.Peek();
            if (IsHat(token) || IsEye(token) || IsNose(token) || IsMouth(token) || IsBeard(token))
            {
                this.result.Add(token);
            }
            else if (token == '~')
            {
                //ignore comments
                token = (char)input.Peek();
                while (token != '\n' && token != '\r' && input.Peek() != -1)
                {
                    token = (char)input.Read();
                    token = (char)input.Peek();
                }
                if ((char)input.Peek() == '\r')
                {
                    input.Read();
                }
            }
            input.Read();
        }
    }
}
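To show the effect of comment removal on its own, here is a small self-contained sketch. It is a simplified re-implementation written for this example rather than the Scanner class itself, and the names ScannerSketch, TokenChars, and Scan are assumptions.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ScannerSketch
{
    // Every character the grammar treats as a token.
    private const string TokenChars = ":;X=^-oc)(|{}P@<>";

    public static List<char> Scan(string source)
    {
        List<char> tokens = new List<char>();
        using (StringReader reader = new StringReader(source))
        {
            int next;
            while ((next = reader.Read()) != -1)
            {
                char ch = (char)next;
                if (ch == '~')
                {
                    // Comment: discard everything up to (not including) the line break.
                    while (reader.Peek() != -1 && reader.Peek() != '\n' && reader.Peek() != '\r')
                    {
                        reader.Read();
                    }
                }
                else if (TokenChars.IndexOf(ch) >= 0)
                {
                    tokens.Add(ch);
                }
                // Anything else (whitespace, line breaks, stray characters) is skipped.
            }
        }
        return tokens;
    }
}
```

Scanning the two-line input ":^) ~ a comment :-(" followed by ";-@" keeps only the six characters of the two real emoticons, even though the comment text contains characters such as o and c that would otherwise count as noses.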
4. The Parser
This is pretty much the syntax checker of the compilation. If you read through the code, you'll see that every node is created as its concrete type (Emoticon, Loop, Sequence, and so on) but is stored and returned as a Stmt.
public class Parser : BaseEmoticon
{
    private int index;
    private Collections.IList<char> tokens;
    private readonly Stmt result;

    public Parser(Collections.IList<char> tokens)
    {
        this.tokens = tokens;
        this.index = 0;
        this.result = this.ParseStmt();
        if (this.index != this.tokens.Count)
            throw new System.Exception("expected EOF");
    }

    public Stmt Result { get { return result; } }

    private Stmt ParseStmt()
    {
        Stmt result;
        if (this.index == this.tokens.Count)
        {
            throw new System.Exception("expected statement, got EOF");
        }

        if (IsEye(this.tokens[this.index]))
        {
            //Start an emoticon command
            Emoticon emo = new Emoticon();
            emo.Eye = GetEyeFromChar(this.tokens[this.index]);
            this.index++;
            emo.Nose = this.ParseNoses();
            if (emo.Nose == null)
            {
                //ParseNoses skipped a token it did not consume, so step back
                this.index--;
            }
            if (this.index < this.tokens.Count && IsMouth(this.tokens[this.index]))
            {
                emo.Mouth = GetMouthFromChar(this.tokens[this.index]);
            }
            else
            {
                throw new System.Exception("Expected mouth not found");
            }
            this.index++;
            result = emo;
        }
        else if (IsHat(this.tokens[this.index]))
        {
            Loop loop = new Loop();
            this.index++;
            loop.Body = this.ParseStmt();
            result = loop;
            if (this.index == this.tokens.Count || !IsBeard(this.tokens[this.index]))
            {
                throw new System.Exception("unterminated loop body");
            }
            this.index++;
        }
        else
        {
            throw new System.Exception("Unexpected input: " + this.tokens[this.index]);
        }

        if (this.index < this.tokens.Count &&
            (IsEye(this.tokens[this.index]) || IsHat(this.tokens[this.index]) || IsBeard(this.tokens[this.index])))
        {
            //A beard closes the enclosing loop, so it does not start a new statement
            if (this.index < this.tokens.Count && !IsBeard(this.tokens[this.index]))
            {
                Sequence sequence = new Sequence();
                sequence.First = result;
                sequence.Second = this.ParseStmt();
                result = sequence;
            }
        }
        return result;
    }

    private Stmt ParseNoses()
    {
        Stmt result = null;
        if (this.index < this.tokens.Count && IsNose(this.tokens[this.index]))
        {
            Nose n = GetNoseFromChar(this.tokens[this.index]);
            result = n;
            this.index++;
            if (this.index < this.tokens.Count && IsNose(this.tokens[this.index]))
            {
                NoseSequence sequence = new NoseSequence();
                sequence.First = result;
                sequence.Second = this.ParseNoses();
                result = sequence;
            }
            else
            {
                //this.index--;
            }
        }
        else
        {
            this.index++;
        }
        return result;
    }

    private Eye GetEyeFromChar(char eye)
    {
        Eye e = new Eye();
        switch (eye)
        {
            case ':': e.EyeOp = EyeOps.Colon; break;
            case ';': e.EyeOp = EyeOps.SemiColon; break;
            case '=': e.EyeOp = EyeOps.Equals; break;
            case 'X': e.EyeOp = EyeOps.X; break;
        }
        return e;
    }

    private Mouth GetMouthFromChar(char mouth)
    {
        Mouth m = new Mouth();
        switch (mouth)
        {
            case '(': m.MouthOp = MouthOps.LeftParen; break;
            case ')': m.MouthOp = MouthOps.RightParen; break;
            case '{': m.MouthOp = MouthOps.LeftCurlyBrace; break;
            case '}': m.MouthOp = MouthOps.RightCurlyBrace; break;
            case '|': m.MouthOp = MouthOps.Pipe; break;
            case '@': m.MouthOp = MouthOps.At; break;
            case 'P': m.MouthOp = MouthOps.CapitalP; break;
        }
        return m;
    }

    private Nose GetNoseFromChar(char nose)
    {
        Nose n = new Nose();
        switch (nose)
        {
            case '^': n.NoseOp = NoseOps.Caret; break;
            case '-': n.NoseOp = NoseOps.Dash; break;
            case 'o': n.NoseOp = NoseOps.LowerCaseO; break;
            case 'c': n.NoseOp = NoseOps.LowerCaseC; break;
        }
        return n;
    }
}
5. The Emitter
This is where the compilation magic happens. Here we actually output the IL (CLR op codes) that will be JIT compiled at runtime by the .NET CLR. The CodeGen constructor builds a program with a Main function that executes whatever code is in your Emo file, and saves it as a real .NET executable.
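If emitting IL is new to you, a tiny standalone sketch may help. It uses DynamicMethod, which runs the generated code in-process, instead of saving an assembly to disk as this compiler does, so treat it as an illustration of the ILGenerator calls only. The names EmitSketch and BuildSumDown are made up for the example. It emits a loop with the same layout the generator uses for Emo loops: branch to the test, mark the body, mark the test, then branch back to the body while a value is greater than zero.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitSketch
{
    // Emits the equivalent of:
    //   int SumDown(int n) { int sum = 0; while (n > 0) { sum += n; n--; } return sum; }
    public static Func<int, int> BuildSumDown()
    {
        DynamicMethod method = new DynamicMethod("SumDown", typeof(int), new Type[] { typeof(int) });
        ILGenerator il = method.GetILGenerator();

        LocalBuilder sum = il.DeclareLocal(typeof(int));      // starts at zero
        LocalBuilder counter = il.DeclareLocal(typeof(int));
        Label test = il.DefineLabel();
        Label body = il.DefineLabel();

        il.Emit(OpCodes.Ldarg_0);            // counter = n
        il.Emit(OpCodes.Stloc, counter);
        il.Emit(OpCodes.Br, test);           // check the condition before the first pass

        il.MarkLabel(body);
        il.Emit(OpCodes.Ldloc, sum);         // sum = sum + counter
        il.Emit(OpCodes.Ldloc, counter);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, sum);
        il.Emit(OpCodes.Ldloc, counter);     // counter = counter - 1
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Sub);
        il.Emit(OpCodes.Stloc, counter);

        il.MarkLabel(test);
        il.Emit(OpCodes.Ldloc, counter);     // loop again while counter > 0
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Bgt, body);

        il.Emit(OpCodes.Ldloc, sum);         // return sum
        il.Emit(OpCodes.Ret);

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }
}
```

Calling EmitSketch.BuildSumDown()(4) runs the emitted IL and adds 4 + 3 + 2 + 1. The Emo generator follows the same pattern, except that its loop condition reads the value at the current memory address.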
public class CodeGen : BaseEmoticon
{
    Emit.ILGenerator il = null;
    Collections.Dictionary<string, Emit.LocalBuilder> symbolTable;
    bool isLoop = false;

    public CodeGen(Stmt stmt, string moduleName)
    {
        if (IO.Path.GetFileName(moduleName) != moduleName)
        {
            throw new System.Exception("can only output into current directory!");
        }

        Reflect.AssemblyName name = new Reflect.AssemblyName(IO.Path.GetFileNameWithoutExtension(moduleName));
        Emit.AssemblyBuilder asmb = System.AppDomain.CurrentDomain.DefineDynamicAssembly(name, Emit.AssemblyBuilderAccess.Save);
        Emit.ModuleBuilder modb = asmb.DefineDynamicModule(moduleName);
        System.Console.WriteLine(string.Format("Full Executable Path: {0}", modb.FullyQualifiedName));
        Emit.TypeBuilder typeBuilder = modb.DefineType("Emo");
        Emit.MethodBuilder methb = typeBuilder.DefineMethod("Main", Reflect.MethodAttributes.Static, typeof(void), System.Type.EmptyTypes);

        // CodeGenerator
        this.il = methb.GetILGenerator();
        this.symbolTable = new Collections.Dictionary<string, Emit.LocalBuilder>();

        // Go Compile!
        //INITIALIZE MEMORY
        //push 256 * 4 onto the stack, for an array of 256 ints
        int x = 256 * 4;
        this.il.Emit(Emit.OpCodes.Ldc_I4, x);
        //allocate the memory
        this.il.Emit(Emit.OpCodes.Localloc);
        //duplicate the memory address twice so we have 3 copies of it on the stack
        this.il.Emit(Emit.OpCodes.Dup);
        this.il.Emit(Emit.OpCodes.Dup);

        //declare the locals; the declaration order fixes their indices (0 to 4)
        this.symbolTable["WorkingRegister"] = this.il.DeclareLocal(typeof(int));
        this.symbolTable["MemoryStart"] = this.il.DeclareLocal(typeof(int));
        this.symbolTable["MemoryAddress"] = this.il.DeclareLocal(typeof(int));
        this.symbolTable["Register"] = this.il.DeclareLocal(typeof(int));
        this.symbolTable["TopOfStack"] = this.il.DeclareLocal(typeof(int));

        //pop the memory location to local variable 1, this is the pointer to the memory block
        //it never changes through the program
        this.il.Emit(Emit.OpCodes.Stloc, symbolTable["MemoryStart"]);
        //pop the memory location to local variable 2, this is the current memory pointer
        this.il.Emit(Emit.OpCodes.Stloc, symbolTable["MemoryAddress"]);

        //Now we have one copy of the memory address left, and we initialize the memory to zero
        //Push zero, this is the value that the memory will be initialized to
        this.il.Emit(Emit.OpCodes.Ldc_I4, 0);
        //push 256 * 4, this is the size of the memory block
        this.il.Emit(Emit.OpCodes.Ldc_I4, x);
        //Do it!
        this.il.Emit(Emit.OpCodes.Initblk);

        //Now we build the program
        this.GenStmt(stmt);

        //wait for a key press before the generated program exits
        this.il.Emit(Emit.OpCodes.Call, typeof(System.Console).GetMethod("ReadKey", new System.Type[] { }));
        this.il.Emit(Emit.OpCodes.Pop);

        //free the memory, this happens automatically upon returning from the function
        il.Emit(Emit.OpCodes.Ret);
        typeBuilder.CreateType();
        modb.CreateGlobalFunctions();
        asmb.SetEntryPoint(methb);
        asmb.Save(moduleName);
        this.symbolTable = null;
        this.il = null;
    }

    private void GenStmt(Stmt stmt)
    {
        /*
         * Working Register: (push) ldloc.0, (pop) stloc.0
         * Register 1: (push) ldloc.3, (pop) stloc.3
         * Stack: Stack used by CLR
         * Initial Memory Address: (push) ldloc.1, (pop) stloc.1
         * Current Memory Address: (push) ldloc.2, (pop) stloc.2
         */
        if (stmt is Sequence)
        {
            Sequence seq = (Sequence)stmt;
            this.GenStmt(seq.First);
            this.GenStmt(seq.Second);
        }

        if (stmt is Loop)
        {
            Emit.Label test = this.il.DefineLabel();
            this.il.Emit(Emit.OpCodes.Br, test);
            Emit.Label body = this.il.DefineLabel();
            this.il.MarkLabel(body);
            isLoop = true;
            Loop l = (Loop)stmt;
            this.GenStmt(l.Body);
            this.il.MarkLabel(test);
            //Code for the loop condition: repeat while the current memory value is > 0
            this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
            this.il.Emit(Emit.OpCodes.Ldind_I4);
            this.il.Emit(Emit.OpCodes.Ldc_I4_0);
            this.il.Emit(Emit.OpCodes.Bgt, body);
        }

        if (stmt is Emoticon)
        {
            Emoticon e = (Emoticon)stmt;
            Eye eye = (Eye)e.Eye;
            Mouth m = (Mouth)e.Mouth;

            #region EyeOperations
            switch (eye.EyeOp)
            {
                case EyeOps.Colon:
                    //read from the register
                    //and push onto the stack
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["Register"]);
                    break;
                case EyeOps.SemiColon:
                    //read from the current memory location
                    //and push onto the stack
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                    this.il.Emit(Emit.OpCodes.Ldind_I4);
                    break;
                case EyeOps.Equals:
                    //read from the keyboard
                    //and push onto the stack
                    this.il.Emit(Emit.OpCodes.Call, typeof(System.Console).GetMethod("Read"));
                    break;
                case EyeOps.X:
                    //pop from the stack
                    //and store in the working register
                    //May have to change operation to have a language specific stack
                    // that is separate from the operation stack
                    break;
                default:
                    break;
            }
            #endregion

            #region NoseOperations
            if (e.Nose != null)
            {
                //NOTE: only a single nose is handled here; a NoseSequence
                //(two or more noses in a row) would make this cast fail
                Nose n = (Nose)e.Nose;
                switch (n.NoseOp)
                {
                    case NoseOps.Caret:
                        //increment working area by one
                        if (eye.EyeOp == EyeOps.Colon)
                        {
                            this.il.Emit(Emit.OpCodes.Ldc_I4_1);
                            this.il.Emit(Emit.OpCodes.Add);
                        }
                        else if (eye.EyeOp == EyeOps.SemiColon)
                        {
                            //move the memory pointer forward one int (4 bytes)
                            this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                            this.il.Emit(Emit.OpCodes.Ldc_I4_4);
                            this.il.Emit(Emit.OpCodes.Add);
                            this.il.Emit(Emit.OpCodes.Stloc, symbolTable["MemoryAddress"]);
                        }
                        else
                        {
                            throw new System.Exception("Invalid eye-nose pair: " + e.ToString());
                        }
                        break;
                    case NoseOps.Dash:
                        //decrement working area by one
                        if (eye.EyeOp == EyeOps.Colon)
                        {
                            this.il.Emit(Emit.OpCodes.Ldc_I4_1);
                            this.il.Emit(Emit.OpCodes.Sub);
                        }
                        else if (eye.EyeOp == EyeOps.SemiColon)
                        {
                            //move the memory pointer back one int (4 bytes)
                            this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                            this.il.Emit(Emit.OpCodes.Ldc_I4_4);
                            this.il.Emit(Emit.OpCodes.Sub);
                            this.il.Emit(Emit.OpCodes.Stloc, symbolTable["MemoryAddress"]);
                        }
                        else
                        {
                            throw new System.Exception("Invalid eye-nose pair: " + e.ToString());
                        }
                        break;
                    case NoseOps.LowerCaseC:
                        //shift right
                        this.il.Emit(Emit.OpCodes.Ldc_I4_1);
                        this.il.Emit(Emit.OpCodes.Shr);
                        break;
                    case NoseOps.LowerCaseO:
                        //shift left
                        this.il.Emit(Emit.OpCodes.Ldc_I4_1);
                        this.il.Emit(Emit.OpCodes.Shl);
                        break;
                    default:
                        break;
                }
            }
            #endregion

            #region MouthOperations
            switch (m.MouthOp)
            {
                case MouthOps.LeftParen://(
                    //Pop the value and write it to the current memory location
                    //this.il.Emit(Emit.OpCodes.Pop);
                    this.il.Emit(Emit.OpCodes.Stloc, symbolTable["WorkingRegister"]);
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["WorkingRegister"]);
                    this.il.Emit(Emit.OpCodes.Stind_I4);
                    break;
                case MouthOps.RightParen://)
                    //Pop the value and write it to the register
                    this.il.Emit(Emit.OpCodes.Stloc, symbolTable["Register"]);
                    break;
                case MouthOps.CapitalP:
                    //push onto the stack
                    //BUT the value is already on the stack, so do nothing.
                    //May have to change operation to have a language specific stack
                    // that is separate from the operation stack
                    break;
                case MouthOps.Pipe://|
                    //no op, do nothing, or rather pop the top value off the stack
                    // to get rid of it because it's no longer needed
                    this.il.Emit(Emit.OpCodes.Pop);
                    break;
                case MouthOps.LeftCurlyBrace://{
                    //Copy register value to the current memory location
                    this.il.Emit(Emit.OpCodes.Pop);//remove the working location because it's no longer needed
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["Register"]);
                    this.il.Emit(Emit.OpCodes.Stind_I4);
                    break;
                case MouthOps.RightCurlyBrace://}
                    //Copy the current memory value to the register
                    this.il.Emit(Emit.OpCodes.Pop);//remove the working location because it's no longer needed
                    this.il.Emit(Emit.OpCodes.Ldloc, symbolTable["MemoryAddress"]);
                    this.il.Emit(Emit.OpCodes.Ldind_I4);
                    this.il.Emit(Emit.OpCodes.Stloc, symbolTable["Register"]);
                    break;
                case MouthOps.At://@
                    //print to screen
                    this.il.Emit(Emit.OpCodes.Call, typeof(char).GetMethod("ConvertFromUtf32", new System.Type[] { typeof(int) }));
                    this.il.Emit(Emit.OpCodes.Call, typeof(System.Console).GetMethod("Write", new System.Type[] { typeof(string) }));
                    break;
                default:
                    break;
            }
            #endregion
        }
    }
}
6. Putting it all together
So, now that you have all of these classes that will create our programs, how are they supposed to be used? Fair question. Take the function below and add it to the Emo class on the Emulator page. Then just call emo.Compile(); and you'll be good to go. Alternatively, you could use the function on its own by replacing SourceFile with the path to your source file.
public void Compile()
{
    Scanner scanner = null;
    using (TextReader input = File.OpenText(SourceFile))
    {
        scanner = new Scanner(input);
    }
    Parser parser = new Parser(scanner.Tokens);
    string file = Path.GetFileNameWithoutExtension(SourceFile) + ".exe";
    CodeGen codeGen = new CodeGen(parser.Result, file);
}
I would like to say thanks to Joel Pobar who wrote http://msdn.microsoft.com/en-us/magazine/cc136756.aspx making this compiler possible. I didn't know where to start writing a .NET compiler before I read that page and saw his example.