Java 1.5 Parser - Scanner and Lexer - Part 1


Next Chapter Scanner and Lexer - part 2


The lexical specification of Java is found in Chapter 3, Lexical Structure, of The Java Language Specification, Third Edition.

Skeleton Lexer, Scanner and Parser files

Before converting the Java syntax rules into Coco/R EBNF rules, a few skeleton files are needed. Skeleton files are also a good way to start a new project: they show that the environment is set up correctly, that JFlex and Coco/R accept the skeleton files, and that the generated Java files compile without errors (there might be warnings, though).

The Skeleton JFlex Grammar file

A JFlex grammar file is a good starting point. This file is named lexer.flex and will produce the file Lexer.java.

JFlex grammar Rule
package org.structuredparsing.java15grammar.cocor.parser_jflex_scanner;

import org.structuredparsing.java15grammar.cocor.parser_jflex_scanner.Parser;

class Token {
    public Token() { this.kind = Parser._EOF; }
    public Token( int kind ) { this.kind = kind; this.val = ""; }
    public Token( int kind, String val ) { this.kind = kind; this.val = val; }
    public Token( int kind, int col, int line, int charPos ) {
        this.kind = kind; this.col = col; this.line = line; this.charPos = charPos; this.val = "";
    }
    public Token( int kind, int col, int line, int charPos, String val ) {
        this.kind = kind; this.col = col; this.line = line; this.charPos = charPos; this.val = val;
    }
    public int kind;    // token kind
    public int pos;     // token position in bytes in the source text (starting at 0)
    public int charPos; // token position in characters in the source text (starting at 0)
    public int col;     // token column (starting at 1)
    public int line;    // token line (starting at 1)
    public String val;  // token value
    public Token next;  // ML 2005-03-11 Peek tokens are kept in linked list
}

%%
%class Lexer
%public
%type Token
%line
%column
%char
%eofval{
return new Token(Parser._EOF, yycolumn + 1, yyline + 1, yychar);
%eofval}

%{
StringBuffer textcontent = new StringBuffer();
int nColumn, nLine, nChar, nCharCount;
%}

%%

.|\n { return new Token(Parser._Illegaltoken, yycolumn + 1, yyline + 1, yychar, yytext()); }

Note: The skeleton file above is accepted by the JFlex tool, but the generated Lexer.java will not compile with javac yet, because it refers to Parser._EOF and Parser._Illegaltoken and the class Parser is not yet defined.

Note 2: Look at how the illegal token is created. The column and line values passed to the token are yycolumn and yyline incremented by 1. JFlex counts lines and columns from 0, while the Token fields count from 1, so as a result tokens start at column 1 and line 1.
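The +1 adjustment can be seen in a small stand-alone Java model of the skeleton lexer. It is a sketch, not JFlex output: it walks the input the way the single .|\n rule does, turning every character into one Illegaltoken, and converts 0-based JFlex-style counters into 1-based line and column values. The names SkeletonLexerModel, Tok and ILLEGAL are made up for this example (ILLEGAL stands in for Parser._Illegaltoken), only '\n' is treated as a line terminator (JFlex also recognizes \r, \r\n and a few Unicode separators), and the EOF token is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class SkeletonLexerModel {

    static final int ILLEGAL = 1; // hypothetical stand-in for Parser._Illegaltoken

    // Minimal stand-in for the Token class declared in lexer.flex.
    static final class Tok {
        final int kind, col, line, charPos;
        final String val;

        Tok(int kind, int col, int line, int charPos, String val) {
            this.kind = kind;
            this.col = col;
            this.line = line;
            this.charPos = charPos;
            this.val = val;
        }
    }

    // One Illegaltoken per character, like the skeleton's only rule.
    static List<Tok> lex(String input) {
        List<Tok> tokens = new ArrayList<>();
        int yyline = 0, yycolumn = 0; // JFlex-style counters, starting at 0
        for (int yychar = 0; yychar < input.length(); yychar++) {
            char c = input.charAt(yychar);
            // Same arithmetic as the rule's action: +1 gives 1-based positions.
            tokens.add(new Tok(ILLEGAL, yycolumn + 1, yyline + 1, yychar, String.valueOf(c)));
            if (c == '\n') {   // simplification: only '\n' ends a line
                yyline++;
                yycolumn = 0;
            } else {
                yycolumn++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Tok t : lex("ab\nc")) {
            System.out.println("line " + t.line + ", col " + t.col + ", charPos " + t.charPos);
        }
    }
}
```

For the input "ab\nc" the model reports line 1 for a, b and the newline (columns 1, 2 and 3), and line 2, column 1, charPos 3 for c.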

The Scanner file

A "glue" file is needed between the Lexer class and the Coco/R generated Parser class: the parser asks for tokens through Scan(), Peek() and ResetPeek(), while the JFlex lexer delivers them through yylex(). The Scanner class is that glue. It is the same Scanner file as in the previous tutorials, with an updated package statement. This file is named Scanner.java.

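package org.structuredparsing.java15grammar.cocor.parser_jflex_scanner;

import java.io.InputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Glue between the JFlex generated Lexer and the Coco/R generated Parser.
 * All tokens are read from the lexer on first use and kept in a buffer,
 * so peeking ahead is just an index into that buffer.
 */
public class Scanner {

    private List< Token > buffer = null;
    private int currentBufferIndex = 0; // index of the token the next Scan() returns
    private int peekBufferIndex = 0;    // index of the token the next Peek() returns
    private Lexer lexer = null;

    public Scanner(InputStream s) {
        lexer = new Lexer( s );
    }

    // Reads the whole input on first use; the last buffered token is always EOF.
    private void fillBuffer() {
        if ( buffer != null ) {
            return;
        }
        buffer = new ArrayList< Token >();
        Token token = null;
        do {
            try {
                token = lexer.yylex();
            } catch (IOException e) {
                e.printStackTrace();
                buffer.add( new Token( Parser._Illegaltoken ) );
                token = new Token( Parser._EOF );
            }
            buffer.add( token );
        } while( token.kind != Parser._EOF );
    }

    // Returns the next token. Once the input is exhausted, EOF is returned on
    // every call, since the parser may call Scan() again after it received EOF.
    public Token Scan() {
        fillBuffer();
        if ( currentBufferIndex >= buffer.size() ) {
            currentBufferIndex = buffer.size() - 1;
        }
        Token token = buffer.get(currentBufferIndex);
        ++currentBufferIndex;
        return token;
    }

    // Returns the token after the previously peeked one without consuming it.
    public Token Peek() {
        fillBuffer();
        if ( peekBufferIndex < currentBufferIndex ) {
            peekBufferIndex = currentBufferIndex;
        }
        if ( peekBufferIndex >= buffer.size() ) {
            peekBufferIndex = buffer.size() - 1;
        }
        Token token = buffer.get(peekBufferIndex);
        ++peekBufferIndex;
        return token;
    }

    // After a reset, Peek() starts again at the token right after the last scanned one.
    public void ResetPeek() {
        peekBufferIndex = currentBufferIndex;
    }
}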
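The buffering idea behind the Scanner can be tried out without JFlex and Coco/R. The following self-contained sketch is a simplified model, not the Scanner class itself: token kinds are plain ints, EOF = 0 stands in for Parser._EOF, and the hypothetical TokenSource interface stands in for Lexer.yylex(). In this model, scan() keeps returning EOF once the input is exhausted, and after resetPeek() the next peek() returns the token right after the one scan() returned last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BufferedScannerModel {

    static final int EOF = 0; // stand-in for Parser._EOF

    // Hypothetical stand-in for Lexer.yylex(): returns token kinds, EOF at the end.
    interface TokenSource {
        int next();
    }

    private final TokenSource source;
    private List<Integer> buffer = null;
    private int currentIndex = 0; // index of the token the next scan() returns
    private int peekIndex = 0;    // index of the token the next peek() returns

    BufferedScannerModel(TokenSource source) {
        this.source = source;
    }

    // Reads all tokens on first use; the buffer always ends with EOF.
    private void fill() {
        if (buffer != null) {
            return;
        }
        buffer = new ArrayList<>();
        int kind;
        do {
            kind = source.next();
            buffer.add(kind);
        } while (kind != EOF);
    }

    int scan() {
        fill();
        int i = Math.min(currentIndex, buffer.size() - 1); // stay on EOF at the end
        currentIndex = i + 1;
        return buffer.get(i);
    }

    int peek() {
        fill();
        int i = Math.min(Math.max(peekIndex, currentIndex), buffer.size() - 1);
        peekIndex = i + 1;
        return buffer.get(i);
    }

    void resetPeek() {
        peekIndex = currentIndex;
    }

    // Helper for trying the model: yields the given kinds, then EOF forever.
    static TokenSource of(Integer... kinds) {
        Iterator<Integer> it = Arrays.asList(kinds).iterator();
        return () -> it.hasNext() ? it.next() : EOF;
    }

    public static void main(String[] args) {
        BufferedScannerModel s = new BufferedScannerModel(of(5, 6, 7));
        System.out.println(s.scan());                  // 5
        System.out.println(s.peek() + " " + s.peek()); // 6 7
        s.resetPeek();
        System.out.println(s.peek());                  // 6
        System.out.println(s.scan() + " " + s.scan() + " " + s.scan() + " " + s.scan()); // 6 7 0 0
    }
}
```

Reading the whole input up front keeps Peek() trivial, at the cost of holding every token of the file in memory.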

The skeleton Coco/R Grammar file

A skeleton Coco/R grammar file that produces the file Parser.java is also needed. This file is named parser.atg.

Coco/R EBNF Rule
package org.structuredparsing.java15grammar.cocor.parser_jflex_scanner;

COMPILER Java15

TOKENS
Illegaltoken

PRODUCTIONS

Java15 = Illegaltoken.

END Java15.
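With the skeleton lexer, every character of the input becomes one Illegaltoken, and the production Java15 = Illegaltoken. is followed by Coco/R's final check for end of input. So the skeleton parser should accept an input of exactly one character and report a syntax error for anything else. The sketch below is a hand-written recursive-descent model of that behaviour, not Coco/R output; the class name is made up, it assumes Coco/R's numbering (EOF is kind 0, the first declared token is kind 1), and it only counts errors instead of performing Coco/R's error recovery.

```java
public class SkeletonGrammarModel {

    static final int EOF = 0;     // assumed: Coco/R gives EOF the kind 0
    static final int ILLEGAL = 1; // assumed: first declared token (Illegaltoken) is kind 1

    private final int[] kinds; // token kinds, always ending with EOF
    private int pos = 0;
    private int errors = 0;

    SkeletonGrammarModel(String input) {
        // The skeleton lexer turns every character into one Illegaltoken.
        kinds = new int[input.length() + 1];
        for (int i = 0; i < input.length(); i++) {
            kinds[i] = ILLEGAL;
        }
        kinds[input.length()] = EOF;
    }

    // Current lookahead token; stays on EOF at the end.
    private int la() {
        return kinds[Math.min(pos, kinds.length - 1)];
    }

    // Analogue of an Expect(n) call: consume on match, otherwise count an error.
    private void expect(int kind) {
        if (la() == kind) {
            pos++;
        } else {
            errors++;
        }
    }

    // Java15 = Illegaltoken.  followed by the check for end of input
    boolean parse() {
        expect(ILLEGAL);
        expect(EOF);
        return errors == 0;
    }

    public static void main(String[] args) {
        for (String in : new String[] { "x", "", "ab" }) {
            System.out.println("\"" + in + "\" accepted: " + new SkeletonGrammarModel(in).parse());
        }
    }
}
```

Running main should report true for "x" and false for "" and "ab".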

Compiling the skeleton files

This tutorial uses Eclipse IDE for Java Developers, but any IDE or text editor can be used. The following root paths are used in the commands:

  • PROJECT_ROOT_PATH - this is the Eclipse project path. For example ~/Document/workspace/java_15_parser/.
  • FLEX_ROOT_PATH - where JFlex is installed.
  • COCOR_ROOT_PATH - where Coco/R is installed.

It is convenient to put both generator commands in a Bourne Again Shell (bash) script:

BASH Script
#!/bin/bash

# Set these paths to match your own installation.
PROJECT_ROOT_PATH=~/Document/workspace/java_15_parser
FLEX_ROOT_PATH=/path/to/jflex
COCOR_ROOT_PATH=/path/to/cocor
OUT_DIR=$PROJECT_ROOT_PATH/src/org/structuredparsing/java15grammar/cocor/parser_jflex_scanner

# lexer.flex -> Lexer.java
java -jar "$FLEX_ROOT_PATH/jflex-1.4.3/lib/JFlex.jar" \
    -d "$OUT_DIR/" \
    "$PROJECT_ROOT_PATH/grammar/parser_external_scanner/lexer.flex"

# parser.atg -> Parser.java
java -jar "$COCOR_ROOT_PATH/Coco.jar" \
    "$PROJECT_ROOT_PATH/grammar/parser_external_scanner/parser.atg" \
    -frames "$COCOR_ROOT_PATH" \
    -o "$OUT_DIR/"

When the script is run, Lexer.java and Parser.java should be created without any errors or warnings. After refreshing the project (press F5, or right-click the project folder and select Refresh in the popup menu), the Eclipse project should look something like this:

(Screenshot: Eclipse_java_15_skeleton_files.png, the Eclipse project containing the skeleton and generated files)

Now it is time to write some JFlex macro definitions and tokens for Java 1.5.


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License