How to find the length of a token in ANTLR?

I am trying to create a grammar that accepts any character, number, or just about anything, provided its length is equal to 1.

Is there a function to check the length?

EDIT

Let me make my question clearer with an example. I wrote the following code:

    grammar first;

    tokens {
        SET =   'set';
        VAL =   'val';
        UND =   'und';
        CON =   'con';
        ON  =   'on';
        OFF =   'off';
    }

    @parser::members {
      private boolean inbounds(Token t, int min, int max) {
        int n = Integer.parseInt(t.getText());
        return n >= min && n <= max;
      }
    }

    parse   :   SET expr;

    expr    :   VAL('u'('e')?)? STRING |
            UND('e'('r'('l'('i'('n'('e')?)?)?)?)?)? (ON | OFF) |
            CON('n'('e'('c'('t')?)?)?)? onechar
        ;

    CHAR    :   'a'..'z';

    DIGIT   :   '0'..'9';

    STRING  :   (CHAR | DIGIT)+;

    dot :   .;

    onechar :   dot { $dot.text.length() == 1;} ;

    SPACE  : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

I want my grammar to do the following things:

Accept commands like 'set value abc', 'set underli on' and 'set conn #'. The grammar should be intelligent enough to accept incomplete words, like 'underl' instead of 'underline', and so on.
The third syntax, 'set connect onechar', should accept any character, but only one character. It can be a numeric digit, a letter, or any special character. I am getting a compiler error in the generated parser file because of this.
The first syntax, 'set value', should accept all possible strings, even 'on' and 'off'. But when I give it something like 'set value offer', the grammar fails. I think this is happening because I have a token 'off'.

In my grammar, none of the three requirements listed above works properly. I don't know why.

There are some mistakes and/or bad practices in your grammar:

#1

The following is not a validating predicate:

    {$dot.text.length() == 1;}

A proper validating predicate in ANTLR has a question mark at the end, and the inner code has no semicolon at the end. So it should be:

{$dot.text.length() == 1}?

instead.
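To make the idea concrete, here is a rough plain-Java sketch of what a validating predicate amounts to: a boolean check on the matched text that rejects the input when it is false. The class, method, and exception names are hypothetical and are not ANTLR's generated code; real ANTLR reports the failure and then tries to recover rather than simply aborting.

```java
final class PredicateSketch {
    // Hypothetical exception type, standing in for ANTLR's own failure report.
    static final class FailedPredicate extends RuntimeException {
        FailedPredicate(String predicate) {
            super("failed predicate: {" + predicate + "}?");
        }
    }

    // Returns the text unchanged when the check holds, otherwise rejects it.
    static String oneChar(String tokenText) {
        if (!(tokenText.length() == 1)) {
            throw new FailedPredicate("text.length() == 1");
        }
        return tokenText;
    }

    public static void main(String[] args) {
        System.out.println(oneChar("x"));
        try {
            oneChar("xy");
        } catch (FailedPredicate e) {
            System.out.println(e.getMessage());
        }
    }
}
```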

#2

You should not be handling these alternative commands:

    expr
      :  VAL('u'('e')?)? STRING
      |  UND('e'('r'('l'('i'('n'('e')?)?)?)?)?)? (ON | OFF)
      |  CON('n'('e'('c'('t')?)?)?)? onechar
      ;

in a parser rule. You should let the lexer handle this instead. Something like this will do it:

    expr
      :  VAL STRING
      |  UND (ON | OFF)
      |  CON onechar
      ;

    // ...

    VAL : 'val' ('u' ('e')?)?;
    UND : 'und' ( 'e' ( 'r' ( 'l' ( 'i' ( 'n' ( 'e' )?)?)?)?)?)?;
    CON : 'con' ( 'n' ( 'e' ( 'c' ( 't' )?)?)?)?;

(Also see #5!)
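The nested optional groups in those lexer rules accept exactly the prefixes of the full keyword that are at least three characters long. As a minimal sketch of that matching logic in plain Java (the helper name `isAbbreviation` is hypothetical and not part of ANTLR):

```java
final class AbbrevSketch {
    // Hypothetical helper: true when word is a prefix of full and at least
    // minLen characters long, which is what a rule such as
    // 'und' ('e' ('r' ('l' ('i' ('n' ('e')?)?)?)?)?)?  accepts for "underline".
    static boolean isAbbreviation(String word, String full, int minLen) {
        return word.length() >= minLen && full.startsWith(word);
    }

    public static void main(String[] args) {
        System.out.println(isAbbreviation("underl", "underline", 3)); // a valid prefix
        System.out.println(isAbbreviation("un", "underline", 3));     // too short
        System.out.println(isAbbreviation("undx", "underline", 3));   // not a prefix
    }
}
```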

#3

Your lexer rules:

    CHAR    :   'a'..'z';
    DIGIT   :   '0'..'9';
    STRING  :   (CHAR | DIGIT)+;

are making things complicated for you. The lexer can produce three different kinds of tokens because of this: CHAR, DIGIT or STRING. Ideally, you should only create STRING tokens, since a STRING can already be a single CHAR or DIGIT. You can do that by adding the fragment keyword before these rules:

    fragment CHAR  : 'a'..'z' | 'A'..'Z';
    fragment DIGIT : '0'..'9';
    STRING : (CHAR | DIGIT)+;

There will then be no CHAR and DIGIT tokens in the token stream, only STRING tokens. In short: fragment rules are only used inside lexer rules, by other lexer rules. They will never be tokens of their own (and can therefore never appear in any parser rule!).

#4

The rule:

    dot :   .;

does not do what you think it does. It matches "any token", not "any character". Inside a lexer rule, the . matches any character, but in parser rules it matches any token. Realize that parser rules can only make use of the tokens created by the lexer.

The input source is first tokenized based on the lexer rules. After that has been done, the parser (through its parser rules) can then operate on these tokens (not characters!!!). Make sure you understand this! (If not, ask for clarification or grab a book about ANTLR.)
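The two phases described above can be sketched in plain Java. This is a hand-written stand-in, not ANTLR-generated code: `tokenize` plays the role of the lexer by simply splitting on whitespace, and `isSetConnect` plays the role of a parser rule that only ever sees whole tokens. Both names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TwoPhaseSketch {
    // Phase 1 (lexer stand-in): turn characters into tokens, dropping whitespace.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Phase 2 (parser stand-in): works on whole tokens, never on single characters.
    // Accepts "set", then a prefix of "connect" of length >= 3, then a one-character token.
    static boolean isSetConnect(List<String> tokens) {
        return tokens.size() == 3
            && tokens.get(0).equals("set")
            && tokens.get(1).length() >= 3
            && "connect".startsWith(tokens.get(1))
            && tokens.get(2).length() == 1;
    }

    public static void main(String[] args) {
        System.out.println(isSetConnect(tokenize("set conn x")));  // one-character argument
        System.out.println(isSetConnect(tokenize("set conn xy"))); // two-character argument
    }
}
```

The length check lives in the second phase because only there does a "token" exist whose text can be measured.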

- example -

Take the following grammar:

    p : . ;
    A : 'a' | 'A';
    B : 'b' | 'B';

The parser rule p will match any token the lexer produces, which here means only an A or a B token. So p can only match one of the characters 'a', 'A', 'b' or 'B', and nothing else.

And in the following grammar:

    prs : . ;
    FOO : 'a';
    BAR : . ;

the lexer rule BAR matches any single character in the range \u0000 .. \uFFFF, but it can never match the character 'a', since the lexer rule FOO is defined before the BAR rule and captures this 'a' already. And the parser rule prs again matches any token, which is either FOO or BAR.
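The "first defined rule wins" behaviour for FOO and BAR can be illustrated with a small plain-Java sketch. It is a simplified model under the assumption that each rule matches exactly one character; it is not how ANTLR implements its lexer, and the names `RULES` and `tokenTypeFor` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

final class RuleOrderSketch {
    // Rules in definition order; LinkedHashMap preserves insertion order.
    static final Map<String, Predicate<Character>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("FOO", c -> c == 'a'); // FOO : 'a';
        RULES.put("BAR", c -> true);     // BAR : . ;
    }

    // Returns the name of the first rule, in definition order, that matches c.
    static String tokenTypeFor(char c) {
        for (Map.Entry<String, Predicate<Character>> rule : RULES.entrySet()) {
            if (rule.getValue().test(c)) {
                return rule.getKey();
            }
        }
        throw new IllegalStateException("no rule matches " + c);
    }

    public static void main(String[] args) {
        System.out.println(tokenTypeFor('a')); // FOO is tried first and matches
        System.out.println(tokenTypeFor('b')); // only BAR matches
    }
}
```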

#5

Putting single characters like 'u' inside your parser rules will cause the lexer to tokenize a u as a separate token: you don't want that. Also, by putting them in parser rules, it becomes unclear which token has precedence over the other tokens. You should keep all such literals outside your parser rules and make them explicit lexer rules instead. Only use lexer rules in your parser rules.

So, don't do:

    prule  : 'u' ':' STRING ;
    STRING : ...

but do:

    prule  : U ':' STRING ;
    U      : 'u';
    STRING : ...

You could make ':' a lexer rule as well, but that is of less importance. The 'u', however, can also be a STRING, so it must appear as a lexer rule before the STRING rule.

Okay, those were the most obvious things that come to mind. Based on them, here's a proposed grammar:

    grammar first;

    parse
      :  (SET expr {System.out.println("expr = " + $expr.text);} )+ EOF
      ;

    expr
      :  VAL STRING    {System.out.print("a :: ");}
      |  UL (ON | OFF) {System.out.print("b :: ");}
      |  CON onechar   {System.out.print("c :: ");}
      ;

    onechar
      :  STRING {$STRING.text.length() == 1}?
      ;

    SET : 'set';
    VAL : 'val' ('u' ('e')?)?;
    UL  : 'und' ( 'e' ( 'r' ( 'l' ( 'i' ( 'n' ( 'e' )?)?)?)?)?)?;
    CON : 'con' ( 'n' ( 'e' ( 'c' ( 't' )?)?)?)?;
    ON  : 'on';
    OFF : 'off';

    STRING : (CHAR | DIGIT)+;

    fragment CHAR  : 'a'..'z' | 'A'..'Z';
    fragment DIGIT : '0'..'9';

    SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

That can be tested with the following class:

    import org.antlr.runtime.*;

    public class Main {
        public static void main(String[] args) throws Exception {
            String source =
                    "set value abc  \n" +
                    "set underli on \n" +
                    "set conn x     \n" +
                    "set conn xy      ";
            ANTLRStringStream in = new ANTLRStringStream(source);
            firstLexer lexer = new firstLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            firstParser parser = new firstParser(tokens);
            System.out.println("parsing:\n======\n" + source + "\n======");
            parser.parse();
        }
    }

which, after generating the lexer and parser:

    java -cp antlr-3.2.jar org.antlr.Tool first.g
    javac -cp antlr-3.2.jar *.java
    java -cp .:antlr-3.2.jar Main

prints the following output:

    parsing:
    ======
    set value abc
    set underli on
    set conn x
    set conn xy
    ======
    a :: expr = value abc
    b :: expr = underli on
    c :: expr = conn x
    line 0:-1 rule onechar failed predicate: {$STRING.text.length() == 1}?
    c :: expr = conn xy

As you can see, the last command, c :: expr = conn xy, produces an error, as expected.
