Coco/R Parser Creating Grammar Rules - part 2

<- back to Coco/R parser with Internal Scanner - part 1
<- back to Coco/R parser with external Flex Scanner - part 1

Go to Creating Syntax Tree part 3

Creating Grammar Rules

In this part we concentrate on the Syntax of Coco/R, found on page 34 of the Coco/R PDF manual. A Coco/R grammar must be written in EBNF, and luckily the Syntax of Coco/R is itself given in that format. But first, a short introduction to how to write Coco/R grammar rules.

Coco/R Grammar Syntax in Short

This section explains the Parser Specification part (see Coco/R Grammar Syntax in Short on the previous page).

Parser Specification

These rules operate on a stream of tokens. The rules reside below the PRODUCTIONS statement in the grammar file.

A Coco/R grammar rule has the following syntax:


This syntax is somewhat simplified. See the EBNF of the Syntax of Coco/R in the PDF manual for all the details of how to write grammar rules.

Let's simplify the syntax a little more. Consider this syntax:

identifier = <GRAMMAR_RULE_EXPRESSION> '.'

The <GRAMMAR_RULE_EXPRESSION> is a sequence of:

  • tokens - these are defined in the TOKENS part.
  • rule identifiers - these are defined in this part, the PRODUCTIONS part.

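More complex rules may be formed by using [], {}, () and |:

  • [EXPRESSION] - means zero or one occurrence, i.e. the expression is optional.
  • {EXPRESSION} - means zero or more occurrences.
  • (EXPRESSION) - groups the elements of the expression together.
  • EXPRESSION1 | EXPRESSION2 - means either EXPRESSION1 or EXPRESSION2.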
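To make these operators concrete, here is a minimal hand-written recursive descent sketch in Java. It is not code generated by Coco/R, although Coco/R also produces recursive descent parsers. The grammar in the comment is a hypothetical arithmetic example chosen because it uses all four operators, and the token kinds are plain strings invented for this sketch.

```java
import java.util.List;

// Hand-written recursive descent sketch for a hypothetical grammar:
//   Expr   = Term {(Plus | Minus) Term}.
//   Term   = [Minus] Factor.
//   Factor = Number | LeftParenthesis Expr RightParenthesis.
// Tokens are plain strings: "Plus", "Minus", "LeftParenthesis",
// "RightParenthesis", or a decimal number such as "42".
class ExprParser {
    private final List<String> tokens;
    private int pos = 0;

    ExprParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // The lookahead token, or "<EOF>" once the input is exhausted.
    private String la() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String kind) {
        if (!la().equals(kind)) {
            throw new IllegalStateException("expected " + kind + " but found " + la());
        }
        pos++;
    }

    // {...} becomes a while loop; (Plus | Minus) is a choice on the lookahead.
    int expr() {
        int value = term();
        while (la().equals("Plus") || la().equals("Minus")) {
            boolean plus = la().equals("Plus");
            pos++;
            int rhs = term();
            value = plus ? value + rhs : value - rhs;
        }
        return value;
    }

    // [...] becomes an if statement.
    int term() {
        boolean negate = false;
        if (la().equals("Minus")) {
            negate = true;
            pos++;
        }
        int value = factor();
        return negate ? -value : value;
    }

    // Number | LeftParenthesis ...: the lookahead token selects the alternative.
    int factor() {
        if (la().equals("LeftParenthesis")) {
            pos++;
            int value = expr();
            expect("RightParenthesis");
            return value;
        }
        if (la().matches("[0-9]+")) {
            return Integer.parseInt(tokens.get(pos++));
        }
        throw new IllegalStateException("unexpected " + la());
    }

    // Parses the whole token list and rejects trailing tokens.
    int parseAll() {
        int value = expr();
        expect("<EOF>");
        return value;
    }

    public static void main(String[] args) {
        // 1 - (2 + 3)
        List<String> tokens = List.of("1", "Minus", "LeftParenthesis", "2", "Plus", "3", "RightParenthesis");
        System.out.println(new ExprParser(tokens).parseAll()); // prints -4
    }
}
```

Each operator maps onto one control structure: [ ] becomes an if, { } a while loop, ( ) only groups, and | becomes a decision based on the next token.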


An example of a rule that defines a Java class is ClassDeclaration below, though it is simplified. ClassDeclaration expects the keyword class, represented by the token Class, followed by an Identifier token and a ClassBody. ClassBody is an example of a grammar rule that is defined elsewhere in the grammar (in the PRODUCTIONS part).

ClassDeclaration =
    Class Identifier ClassBody.

The ClassBody rule is defined below. It expects the character '{', represented by the token LeftCurlyBrace, followed by zero or more ClassBodyDeclarations. ClassBodyDeclaration is also a grammar rule that must be defined in the grammar. Finally, ClassBody expects the character '}', represented by the RightCurlyBrace token.

ClassBody =
    LeftCurlyBrace {ClassBodyDeclaration} RightCurlyBrace.
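A hand-written sketch of how ClassDeclaration and ClassBody could be recognized. Token kinds are assumed to be plain strings, and since ClassBodyDeclaration is not shown on this page, the sketch uses a hypothetical placeholder rule ClassBodyDeclaration = Identifier Identifier Semicolon (roughly a field such as int x;).

```java
import java.util.List;

class ClassParser {
    private final List<String> tokens;
    private int pos = 0;

    ClassParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // The lookahead token, or "<EOF>" once the input is exhausted.
    private String la() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String kind) {
        if (!la().equals(kind)) {
            throw new IllegalStateException("expected " + kind + " but found " + la());
        }
        pos++;
    }

    // ClassDeclaration = Class Identifier ClassBody.
    // Returns the number of ClassBodyDeclarations found in the body.
    int classDeclaration() {
        expect("Class");
        expect("Identifier");
        return classBody();
    }

    // ClassBody = LeftCurlyBrace {ClassBodyDeclaration} RightCurlyBrace.
    int classBody() {
        expect("LeftCurlyBrace");
        int count = 0;
        // Loop while the lookahead can start a ClassBodyDeclaration;
        // under the placeholder rule below that is only Identifier.
        while (la().equals("Identifier")) {
            classBodyDeclaration();
            count++;
        }
        expect("RightCurlyBrace");
        return count;
    }

    // Hypothetical placeholder: ClassBodyDeclaration = Identifier Identifier Semicolon.
    void classBodyDeclaration() {
        expect("Identifier");
        expect("Identifier");
        expect("Semicolon");
    }

    public static void main(String[] args) {
        // class Point { int x; int y; }
        List<String> tokens = List.of("Class", "Identifier", "LeftCurlyBrace",
            "Identifier", "Identifier", "Semicolon",
            "Identifier", "Identifier", "Semicolon",
            "RightCurlyBrace");
        System.out.println(new ClassParser(tokens).classDeclaration()); // prints 2
    }
}
```

The loop for {ClassBodyDeclaration} only continues while the next token can start a ClassBodyDeclaration; as soon as anything else appears, the parser insists on RightCurlyBrace.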

The Statement rule defines a few Java statements. It consists of several token sequences (alternatives) separated by the OR operator, a vertical bar, which means that a Statement matches exactly one of the alternatives. Also note that the alternative beginning with the Assert token contains an optional part: a Colon token followed by an Expression rule.

Statement =
    Assert Expression [ Colon Expression] Semicolon |
    If ParExpression Statement |
    Else Statement |
    For LeftParenthesis ForControl RightParenthesis Statement |
    While ParExpression Statement |
    Do Statement While ParExpression Semicolon.
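Coco/R generates LL(1) parsers, which choose an alternative by looking only at the next token. The sketch below assumes that every alternative starts with a terminal token (true for Statement above); it selects an alternative by its first token and lists any token that starts more than one alternative. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StatementAlternatives {
    // Assumption: the first word of each alternative is the terminal token that starts it.
    static String firstToken(String alternative) {
        return alternative.trim().split("\\s+")[0];
    }

    // Picks the alternative whose first token equals the lookahead token.
    // If several match, this sketch simply takes the first one.
    static String select(List<String> alternatives, String lookahead) {
        for (String alternative : alternatives) {
            if (firstToken(alternative).equals(lookahead)) {
                return alternative;
            }
        }
        throw new IllegalStateException("no alternative starts with " + lookahead);
    }

    // Tokens that start more than one alternative, i.e. LL(1) conflicts.
    static List<String> conflicts(List<String> alternatives) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String alternative : alternatives) {
            counts.merge(firstToken(alternative), 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> statement = List.of(
            "Assert Expression [Colon Expression] Semicolon",
            "If ParExpression Statement",
            "Else Statement",
            "For LeftParenthesis ForControl RightParenthesis Statement",
            "While ParExpression Statement",
            "Do Statement While ParExpression Semicolon");
        System.out.println(select(statement, "While")); // prints While ParExpression Statement
        System.out.println(conflicts(statement));       // prints []

        List<String> extended = new ArrayList<>(statement);
        extended.add("If ParExpression Statement Else Statement");
        System.out.println(conflicts(extended));        // prints [If]
    }
}
```

Adding an If ParExpression Statement Else Statement alternative would give two alternatives starting with If, which a one-token lookahead cannot tell apart; Coco/R reports such situations as LL(1) warnings.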

The Coco/R Grammar

The EBNF of the Coco/R syntax on page 34 of the Coco/R PDF manual is not that easy to read if EBNF is new to you. It helps, though, to visualize how the rules are connected to each other.



The topmost rule, CocoR, is at the top of the diagram, which helps in understanding the grammar. In this diagram only rules, i.e. non-terminal symbols, are drawn as boxes. Terminal tokens, and references to other non-terminals, are listed inside the boxes as part of the rule definitions. A colored box indicates that the rule is referenced from other rules; a brown box indicates that the rule is not referenced from any other rule.
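The box coloring can be computed from the grammar itself. A minimal sketch, assuming each rule is represented by the list of non-terminals on its right-hand side (tokens left out); the map in main uses the example rules from earlier on this page. A rule that only refers to itself, like Statement here, still counts as unreferenced.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RuleReferences {
    // grammar: rule name -> non-terminals used on its right-hand side.
    // Returns the rules that no *other* rule refers to (the brown boxes).
    static Set<String> unreferenced(Map<String, List<String>> grammar) {
        Set<String> referenced = new HashSet<>();
        for (Map.Entry<String, List<String>> rule : grammar.entrySet()) {
            for (String used : rule.getValue()) {
                // Self-references do not make a rule "used elsewhere".
                if (!used.equals(rule.getKey())) {
                    referenced.add(used);
                }
            }
        }
        Set<String> result = new TreeSet<>(grammar.keySet());
        result.removeAll(referenced);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> grammar = new LinkedHashMap<>();
        grammar.put("ClassDeclaration", List.of("ClassBody"));
        grammar.put("ClassBody", List.of("ClassBodyDeclaration"));
        grammar.put("Statement", List.of("Expression", "ParExpression", "Statement", "ForControl"));
        System.out.println(unreferenced(grammar)); // prints [ClassDeclaration, Statement]
    }
}
```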

The Top Grammar Rule CocoR

All grammars have a starting definition, the top grammar rule. In the Syntax of Coco/R it is called Cocol; in this grammar it is the CocoR rule. It is defined in the Syntax of Coco/R as

Cocol = 
  {ANY}              // using clauses in C#, import clauses in Java, 
                     // #include clauses in C++ 
  "COMPILER" ident 
  {ANY}              // global fields and methods 
  "END" ident '.'.

In this grammar, the rule above is translated into


CocoR = 
  Compiler Ident
  End Ident Point.

This is a quite straightforward translation. In Coco/R it is possible to write literal tokens such as "COMPILER" directly in the PRODUCTIONS part. In this grammar, however, COMPILER is already defined in the TOKENS part as

  Compiler = "COMPILER".

The keyword ANY needs some comment. It is a Coco/R keyword and should not be confused with the token Any, which this grammar defines in the TOKENS part as

  Any = "ANY".

ANY means that any token may appear here, except a token that could legally come next at that point (such as COMPILER). Enclosed in { }, ANY means zero or more such tokens. That makes sense, because before the COMPILER keyword the user code may consist of any package and import statements, which should be copied verbatim into the parser.

Also, directly after COMPILER ident there is an area where user source code, such as global fields and methods, may appear. So the {ANY} construct makes sense there too.
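A hand-written sketch of what the two {ANY} parts of the Cocol rule do: everything before COMPILER, and everything between the grammar name and END, is collected as user code. The tokens are assumed to be pre-split strings, and the check that the name after END repeats the name after COMPILER is an extra assumption added for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of:  Cocol = {ANY} "COMPILER" ident {ANY} "END" ident '.'.
class CocolSkeleton {
    private final List<String> tokens;
    private int pos = 0;
    final List<String> prelude = new ArrayList<>();  // user code before COMPILER
    final List<String> globals = new ArrayList<>();  // user code before END
    String grammarName;

    CocolSkeleton(List<String> tokens) {
        this.tokens = tokens;
    }

    private String la() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String token) {
        if (!la().equals(token)) {
            throw new IllegalStateException("expected " + token + " but found " + la());
        }
        pos++;
    }

    private String expectIdent() {
        String t = la();
        boolean keyword = t.equals("COMPILER") || t.equals("END");
        if (keyword || !t.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalStateException("expected ident but found " + t);
        }
        pos++;
        return t;
    }

    // {ANY}: take every token up to, but not including, the token that must follow.
    private void skipAnyUntil(String follower, List<String> into) {
        while (!la().equals(follower) && pos < tokens.size()) {
            into.add(tokens.get(pos++));
        }
    }

    void parse() {
        skipAnyUntil("COMPILER", prelude);
        expect("COMPILER");
        grammarName = expectIdent();
        skipAnyUntil("END", globals);
        expect("END");
        String endName = expectIdent();
        // Assumed extra check: END must repeat the grammar name.
        if (!endName.equals(grammarName)) {
            throw new IllegalStateException("END " + endName + " does not match " + grammarName);
        }
        expect(".");
    }

    public static void main(String[] args) {
        CocolSkeleton p = new CocolSkeleton(List.of(
            "import", "java.util.*", ";",
            "COMPILER", "Calc",
            "int", "count", ";",
            "END", "Calc", "."));
        p.parse();
        // prints Calc [import, java.util.*, ;] [int, count, ;]
        System.out.println(p.grammarName + " " + p.prelude + " " + p.globals);
    }
}
```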

The ScannerSpecification grammar rule

ScannerSpecification is defined in the Syntax of Coco/R as

ScannerSpecification =
  ["CHARACTERS" {SetDecl}]
  ["TOKENS"  {TokenDecl}]
  ["PRAGMAS" {PragmaDecl}]
  {CommentDecl}
  {WhiteSpaceDecl}.
SetDecl        = ident '=' Set. 
Set            = BasicSet {('+'|'-') BasicSet}. 
BasicSet       = string | ident | char [".." char] | "ANY". 
TokenDecl      = Symbol ['=' TokenExpr '.']. 
TokenExpr      = TokenTerm {'|' TokenTerm}. 
TokenTerm      = TokenFactor {TokenFactor} ["CONTEXT" '(' TokenExpr ')']. 
TokenFactor    = Symbol 
               | '(' TokenExpr ')' 
               | '[' TokenExpr ']' 
               | '{' TokenExpr '}'. 
Symbol         = ident | string | char. 
PragmaDecl     = TokenDecl [SemAction]. 
CommentDecl    = "COMMENTS" "FROM" TokenExpr "TO" TokenExpr ["NESTED"]. 
WhiteSpaceDecl = "IGNORE" (Set | "CASE").

This is translated into

ScannerSpecification =
  [Character {SetDecl}]
  [Tokens  {TokenDecl}]
  [Pragmas {PragmaDecl}]
  {CommentDecl}
  {WhiteSpaceDecl}.

SetDecl = Ident Equal Set Point.

Set = BasicSet {(Plus|Minus) BasicSet}.

BasicSet = String | Ident | Char [PointPoint Char] | Any.

TokenDecl =
  Symbol [Equal TokenExpr Point].

TokenExpr = TokenTerm {VerticalBar TokenTerm}.

TokenTerm = TokenFactor {TokenFactor} [Context LeftParenthesis TokenExpr RightParenthesis].

TokenFactor =
  Symbol |
  LeftParenthesis TokenExpr RightParenthesis |
  LeftSquareBracket TokenExpr RightSquareBracket |
  LeftCurlyBrace TokenExpr RightCurlyBrace.

Symbol =
  Ident | String | Char.

PragmaDecl     = TokenDecl [SemAction].
CommentDecl    = Comments From TokenExpr To TokenExpr [Nested].
WhiteSpaceDecl = Ignore Set.

This is also translated almost as is. WhiteSpaceDecl needs a comment, though: in the manual there is a typo. The manual reads "IGNORE" (Set | "CASE"), while this grammar uses just Ignore Set. Note the difference.
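The Set and BasicSet rules describe character sets combined with + (union) and - (difference), for example letter = 'a'..'z' + 'A'..'Z' + "_". A minimal sketch that evaluates such an expression with java.util.BitSet, assuming the expression is already split into parts, that char and string literals contain no escape sequences, and that ANY stands for all 65536 16-bit char values.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Sketch of:  Set = BasicSet {('+' | '-') BasicSet}.
//             BasicSet = string | ident | char [".." char] | "ANY".
// Parts look like "'a'", "..", "\"xyz\"", "+", "-", "ANY" or a set name.
class CharSetEvaluator {
    static final int CHAR_COUNT = 65536;  // assumption: ANY covers every 16-bit char

    private final List<String> parts;
    private final Map<String, BitSet> named;  // previously declared sets, e.g. letter
    private int pos = 0;

    CharSetEvaluator(List<String> parts, Map<String, BitSet> named) {
        this.parts = parts;
        this.named = named;
    }

    private String la() {
        return pos < parts.size() ? parts.get(pos) : "<EOF>";
    }

    private static boolean isChar(String t) {
        return t.length() == 3 && t.charAt(0) == '\'' && t.charAt(2) == '\'';
    }

    // '+' is set union, '-' is set difference.
    BitSet set() {
        BitSet result = basicSet();
        while (la().equals("+") || la().equals("-")) {
            boolean union = la().equals("+");
            pos++;
            BitSet next = basicSet();
            if (union) {
                result.or(next);
            } else {
                result.andNot(next);
            }
        }
        return result;
    }

    BitSet basicSet() {
        String t = la();
        pos++;
        BitSet s = new BitSet(CHAR_COUNT);
        if (t.equals("ANY")) {
            s.set(0, CHAR_COUNT);
        } else if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            for (char c : t.substring(1, t.length() - 1).toCharArray()) {
                s.set(c);  // a string contributes each of its characters
            }
        } else if (isChar(t)) {
            char from = t.charAt(1);
            char to = from;
            if (la().equals("..")) {  // optional range part: char .. char
                pos++;
                String end = la();
                pos++;
                if (!isChar(end) || end.charAt(1) < from) {
                    throw new IllegalArgumentException("bad range end " + end);
                }
                to = end.charAt(1);
            }
            s.set(from, to + 1);
        } else if (named.containsKey(t)) {
            s.or(named.get(t));  // copy, so the named set itself is not modified
        } else {
            throw new IllegalArgumentException("unknown set element " + t);
        }
        return s;
    }

    public static void main(String[] args) {
        // letter = 'a'..'z' + 'A'..'Z' + "_".
        BitSet letter = new CharSetEvaluator(
            List.of("'a'", "..", "'z'", "+", "'A'", "..", "'Z'", "+", "\"_\""), Map.of()).set();
        System.out.println(letter.cardinality()); // prints 53
    }
}
```

Evaluating left to right with union and difference matches how the rule reads: each BasicSet is folded into the running result by the operator in front of it.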

The ParserSpecification grammar rule

ParserSpecification is defined in the Syntax of Coco/R as

ParserSpecification = "PRODUCTIONS" {Production}. 
Production  = ident [Attributes] [SemAction] '=' Expression '.'. 
Expression  = Term {'|' Term}. 
Term        = [[Resolver] Factor {Factor}]. 
Factor      = ["WEAK"] Symbol [Attributes] 
            | '(' Expression ')' 
            | '[' Expression ']' 
            | '{' Expression '}' 
            | "ANY" 
            | "SYNC" 
            | SemAction. 
Attributes  = '<' {ANY} '>' | "<." {ANY} ".>". 
SemAction   = "(." {ANY} ".)". 
Resolver    = "IF" '(' {ANY} ')' .

This is translated into

ParserSpecification =
  Productions {Production}.

Production =
  Ident [Attributes] [SemAction] Equal Expression Point.

Expression =
  Term {VerticalBar Term}.

Term =
  [[Resolver] Factor {Factor}].

Factor =
  [Weak] Symbol [Attributes]  |
  LeftParenthesis Expression RightParenthesis |
  LeftSquareBracket Expression RightSquareBracket  |
  LeftCurlyBrace Expression RightCurlyBrace  |
  Any  |
  Sync  |
  SemAction.

Resolver = If LeftParenthesis Ident LeftParenthesis RightParenthesis RightParenthesis.

END CocoR.
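Because the ParserSpecification rules describe grammar rules themselves, they can be tried out on the small rules from the beginning of this page. The following is a minimal hand-written recursive descent sketch for a subset (Production, Expression, Term and Factor, without Attributes, SemAction, WEAK and Resolver). Its tokenizer is a simplistic assumption, and it returns a nested text description rather than real syntax tree nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch for a subset of the ParserSpecification rules:
//   Production = Ident Equal Expression Point.
//   Expression = Term {VerticalBar Term}.
//   Term       = [Factor {Factor}].
//   Factor     = Symbol | (Expression) | [Expression] | {Expression} | ANY | SYNC.
class ProductionParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    // Simplistic tokenizer (assumption): words plus the symbols = . | ( ) [ ] { }
    ProductionParser(String source) {
        StringBuilder word = new StringBuilder();
        for (char c : source.toCharArray()) {
            boolean symbol = "=.|()[]{}".indexOf(c) >= 0;
            if (!Character.isWhitespace(c) && !symbol) {
                word.append(c);
                continue;
            }
            if (word.length() > 0) tokens.add(word.toString());
            word.setLength(0);
            if (symbol) tokens.add(String.valueOf(c));
        }
        if (word.length() > 0) tokens.add(word.toString());
    }

    private String la() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String token) {
        if (!la().equals(token)) {
            throw new IllegalStateException("expected " + token + " but found " + la());
        }
        pos++;
    }

    String production() {
        String name = la();
        if (!Character.isLetter(name.charAt(0))) throw new IllegalStateException("expected rule name but found " + name);
        pos++;
        expect("=");
        String body = expression();
        expect(".");
        return name + ": " + body;
    }

    String expression() {
        List<String> alternatives = new ArrayList<>();
        alternatives.add(term());
        while (la().equals("|")) {
            pos++;
            alternatives.add(term());
        }
        return alternatives.size() == 1 ? alternatives.get(0) : "Alt(" + String.join(", ", alternatives) + ")";
    }

    // A Term may be empty, as in the second alternative of  A = B | .
    String term() {
        List<String> factors = new ArrayList<>();
        while (startsFactor(la())) factors.add(factor());
        if (factors.isEmpty()) return "Empty";
        return factors.size() == 1 ? factors.get(0) : "Seq(" + String.join(", ", factors) + ")";
    }

    private static boolean startsFactor(String t) {
        return t.equals("(") || t.equals("[") || t.equals("{") || Character.isLetter(t.charAt(0));
    }

    String factor() {
        String t = la();
        pos++;
        switch (t) {
            case "(": { String e = expression(); expect(")"); return e; }
            case "[": { String e = expression(); expect("]"); return "Opt(" + e + ")"; }
            case "{": { String e = expression(); expect("}"); return "Rep(" + e + ")"; }
            case "ANY": return "Any";
            case "SYNC": return "Sync";
            default: return t;  // a token name or a rule name
        }
    }

    public static void main(String[] args) {
        String rule = "ClassBody = LeftCurlyBrace {ClassBodyDeclaration} RightCurlyBrace.";
        // prints ClassBody: Seq(LeftCurlyBrace, Rep(ClassBodyDeclaration), RightCurlyBrace)
        System.out.println(new ProductionParser(rule).production());
    }
}
```

Each method mirrors one rule of the ParserSpecification: Expression collects alternatives, Term collects a possibly empty sequence of factors, and Factor handles grouping, options and repetitions.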


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License