Published: 8 October 2017
Erik Lievaart

Recently, I wanted to write my own parser for a hobby project. Along the way I found, however, that there is a lack of gentle introductions to JavaCC.
The articles I found were either very high level and theoretical, or they always solved the same problem: basic arithmetic expressions.
I plan on writing a couple of articles that are very hands-on and practical. This first article is a getting started guide aimed at getting you up and running quickly. The next paragraph describes, from a theoretical perspective, what JavaCC is and why one would want to use a parser generator. In “Creating a workspace” I will show you how to get the hello world project up and running in a minute or so. Then I will discuss the demo project for JavaCC in depth. It is a good starting point for creating a parser.
JavaCC is an open source parser generator. You can use it to create your own custom parser. A parser is a program that validates an input file against a grammar. It reads a file in one format and, if the file is valid, converts it to another format, usually executable code. We typically create parsers when we are creating our own programming language.
Sometimes we create them for templating languages or for tokenizing raw text. Creating parsers by hand is error prone, and parser generators offer a higher-level syntax to aid the process. For simple text manipulation one would use string manipulation libraries; for moderately complex problems regex is enough.
Using JavaCC for such problems is overkill and will cause more problems than it solves. When things get complex, however, having a parser generator at your disposal is a life saver. One of the features of JavaCC that I really like is that the generated parser has no dependency on JavaCC.
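To make the "regex is enough" case concrete, here is a minimal sketch of a problem that does not need a parser generator: pulling a name out of a greeting. The greeting format, the class name RegexGreeting and the method extractName are assumptions made up for this illustration; they are not part of the demo project.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example: the greeting format and all names are assumptions.
class RegexGreeting {
    // "Hello", some whitespace, then a name consisting of letters only.
    private static final Pattern GREETING = Pattern.compile("Hello\\s+([A-Za-z]+)");

    // Returns the name if the whole (trimmed) input is a valid greeting.
    static Optional<String> extractName(String input) {
        Matcher m = GREETING.matcher(input.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractName("Hello Bob")); // prints Optional[Bob]
        System.out.println(extractName("Hello 123")); // prints Optional.empty
    }
}
```

Once the input grows nested structure or many alternative forms, a single pattern like this becomes hard to maintain, and that is roughly the point where a grammar starts to pay off.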
Erik’s Java Rants
In other words, JavaCC creates pure Java source files that can run without any external dependencies. ANTLR, in comparison, requires the ANTLR jar to be present at run time.
My focus here is practical, so this is the only theory on the matter I will be discussing. I am not going to discuss concepts such as LL(1) grammars or recursive descent parsing. I do recommend acquainting oneself with the theory behind parsing when working with parser generators.

Creating a workspace

In this section we will create a basic JavaCC project in a few minutes, so you can start learning quickly.
Create a new Java project in an Eclipse workspace. You can delete the src folder created by Eclipse; we will use our own source folders. Create a folder in the root of the workspace named demo. Download the tutorial archive and unpack it in the demo folder. Refresh the root of the project so that Eclipse sees the files.
It should look something like this: And you’re good to go. Simply double-click run-parser to run the parser, or unit to run the unit tests.
You probably want to compile the Java sources in Eclipse as well for tooling support. If all goes well, they will move to the root of the package explorer with a package symbol inside a folder. This adds JUnit to the build path and ensures the JUnit tests compile.
Your project should look like this now: The files will not compile unless you run the javacc target and refresh the workspace. This is because the Java files depend on the parser and it hasn’t been generated yet. JavaCC creates code that will give warnings in Eclipse. That is perfectly normal. There is a JavaCC plugin for Eclipse, but I’ve never tried it.
In the next section, I will create a build file for compiling, running and testing the parser. Lastly, we will examine the source files and generated results.
Here is a picture of the folder structure in the sample project: In the root of the project you can see the Ant build file. The demo uses two jar files found in the lib directory, the first of which is the JavaCC jar.
They will automatically be picked up by the build file. The sources are in a Maven-like folder structure, where main is used for application code and test for unit tests.
The boot package contains files with a main method, which will be invoked from the build file for running the demo. The demo package will be filled by JavaCC with the parser generated from the grammar file javacc. I have used a stop sign to indicate directories that should not be modified, because they are generated.
In the resource dir we see the grammar file and a manifest. The manifest is empty for now, but you could specify manifest entries here if you wanted to. Lastly, the build directory will contain any artifacts generated by the build file. The jar file will be created in the dist directory.

Ant Build File

Any Java classes in the test directory whose names end with a capital ‘U’ are executed as unit tests. There is an external target that creates a jar file for the parser. This way, one can create the parser and attach it to another project as a separate jar file.
I recommend this approach, because JavaCC generated code produces a lot of warnings. These warnings might obscure problems in your own sources if the generated code is not placed in a separate jar. Lastly, the options target dumps all configuration options for the JavaCC parser on the command line.
These can be set on the command line or in the grammar file (see the next section). Options set on the command line will override those in the grammar file. Note that when compiling with JavaCC, compilation is a two-step process.
First, JavaCC creates the parser from the grammar file. The parser is generated as Java source files. Java source files need to be compiled before they can be executed, hence the second step. In the build file, target javacc generates the parser, target compile-main compiles the parser, and lastly compile-test compiles the unit tests. These targets are called implicitly through dependencies where needed.

Every JavaCC project has a grammar file which describes valid input.
The grammar format defines the tokens used and the parser rules, and can even be used to specify code that has to be executed while parsing. Let us take a look at the first 15 lines of the grammar file: You can specify options at the start of the file, and as you can see, I set the static option to false. You will see Java code here, because the Java code is copied literally to the Java source file for the parser.
Specify the name of the parser class to generate inside the parentheses. The name you specify here is the name of the class you will be invoking from your application. It is also used as the base name for two other generated classes. I will discuss the generated classes in the next section; this section is focused solely on the grammar file.
Note that when JavaCC creates the parser it is created in the directory you specify. JavaCC does not place the generated parser in a subdirectory matching the package declaration.
Getting started in JavaCC
In other words, if you change the package declaration, you will need to modify the destination directory in the build file as well. If you forget to do this, the Java compiler will complain that the declared package does not match the expected package.
If the parser uses classes that require imports, then add them here. We will see why I need those imports later. A minimal parser declaration would have no package declaration, no imports and no options configured, but it will normally contain at least the above. If you wish, you can add a main method or other custom methods as well.
The next section of the grammar file is used for lexical analysis: Typically whitespace and newline characters are skip characters. We don’t want the parser to fail when we encounter a new line, but we also don’t have any use for them in the parser. Skip characters will not be passed on to the parser unless they occur somewhere as part of a token. Note that the parser is still aware of the line number even when carriage returns are skipped.
Next we define the token “NAME”, which consists of one or more letters, lower or upper case. This is the name we are going to use for our hello world application.
Tokens and lexical analysis will be explored in the next installment of this tutorial. Finally, we add a single parser rule to our grammar file: A method with this name will be added to the created parser.
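The relationship between a package declaration and the directory the compiler expects can be sketched in a few lines of Java. This is only an illustration of the convention; the class PackageDirs, the method outputDir, and the example package and source root are hypothetical and do not come from the demo project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: shows which directory a given package maps to.
class PackageDirs {
    // Each dot-separated package segment becomes one directory level
    // below the source root.
    static Path outputDir(Path sourceRoot, String packageName) {
        return sourceRoot.resolve(packageName.replace('.', '/'));
    }

    public static void main(String[] args) {
        // The source root and package name here are made-up examples.
        Path dir = outputDir(Paths.get("src", "main", "java"), "demo.parser");
        System.out.println(dir);
    }
}
```

Because JavaCC writes the generated files to whatever output directory it is given, the build file has to point at the directory this convention yields for the package declared in the grammar.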
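To illustrate how skipped characters can be dropped while line numbers are still tracked, here is a small hand-written tokenizer. It is a sketch of the idea only, not the code JavaCC generates; the class SkippingTokenizer, its Token type and the "token = run of non-whitespace" rule are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch, not JavaCC output; all names are hypothetical.
class SkippingTokenizer {
    // A token remembers its text and the line on which it starts.
    static final class Token {
        final String image;
        final int line;

        Token(String image, int line) {
            this.image = image;
            this.line = line;
        }

        @Override
        public String toString() {
            return image + "@" + line;
        }
    }

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int line = 1;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\n') {
                line++; // skipped, but still counted
                i++;
            } else if (Character.isWhitespace(c)) {
                i++;    // skipped silently
            } else {
                // Collect a run of non-whitespace characters as one token.
                int start = i;
                while (i < input.length() && !Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(input.substring(start, i), line));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello\n\n  World")); // prints [Hello@1, World@3]
    }
}
```

The consumer of the token list never sees the whitespace, yet each token can still report where it came from, which is what makes useful error messages possible.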
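For readers who think in regular expressions, the rule "one or more letters, lower or upper case" corresponds to the pattern [a-zA-Z]+. The snippet below is a plain Java illustration of that rule; the class NameToken and the method isName are made up for this sketch, and the grammar file itself uses JavaCC's own notation rather than java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the NAME rule using java.util.regex.
class NameToken {
    // One or more ASCII letters, lower or upper case.
    private static final Pattern NAME = Pattern.compile("[a-zA-Z]+");

    // True only if the entire string consists of letters.
    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isName("Bob"));  // prints true
        System.out.println(isName("R2D2")); // prints false
        System.out.println(isName(""));     // prints false
    }
}
```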