Boman: Colossal memory usage/stack problems with ANTLR lexer/parser

Sunday, 15 September 2013

I'm porting a grammar over from flex/bison and mostly have everything up
and running (in particular, my token stream looks fine, and my parser
grammar compiles and runs), but I'm running into runaway stack/memory
usage even with small to moderately sized inputs to my grammar. What is
the preferred construct for chaining together an unbounded sequence of the
same nonterminal? In my Bison grammar I had production rules of the form:
statements: statement | statement statements
words: | word words
In ANTLR, if I keep the same rule setup, the parser performs admirably on
small inputs (on the order of 4 kB), but overflows the stack on larger
inputs (on the order of 100 kB). In both cases the automatically generated
parse tree is also rather ungainly.
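
To illustrate why a right-recursive rule might behave this way, here is a
minimal sketch of a hand-written recursive-descent parser for the
`statements` rule. It is an assumption-laden toy, not ANTLR's generated
code: the function names, the token list, and the tuple-based tree are all
hypothetical. It only shows that both the call depth and the tree nesting
grow with the number of statements.

```python
# Hypothetical hand-written recursive-descent sketch; NOT ANTLR output.
# Grammar mirrored:  statements: statement | statement statements
# Assumes a non-empty token list where every token is one statement.

def parse_statements(tokens, pos=0, depth=1):
    """Parse tokens[pos:] with the right-recursive rule.

    Returns (tree, max_depth): a nested tuple tree and the deepest
    call level reached while parsing.
    """
    stmt = ("statement", tokens[pos])
    if pos + 1 == len(tokens):
        # Alternative 1: statements: statement
        return ("statements", stmt), depth
    # Alternative 2: statements: statement statements
    rest, max_depth = parse_statements(tokens, pos + 1, depth + 1)
    return ("statements", stmt, rest), max_depth


def tree_depth(tree):
    """Count how many 'statements' nodes are nested inside each other."""
    depth = 0
    while tree[0] == "statements":
        depth += 1
        if len(tree) == 2:  # innermost node has no nested 'statements'
            break
        tree = tree[2]
    return depth


tokens = [f"s{i}" for i in range(200)]
tree, max_depth = parse_statements(tokens)
print(max_depth, tree_depth(tree))
```

In this sketch both numbers equal the statement count, so an input with
many thousands of statements would need as many nested calls and would
produce an equally deep chain of `statements` nodes.
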
I experimented with changing these production rules to have an explicitly
additive (rather than recursive) form:
statements: statement+
words: word*
However, this seems to have led to an absolutely horrific blowup in memory
usage (upwards of 1 GB) on even very small inputs, and the parser has not
yet managed to return a parse tree after 20 minutes of running.
Any pointers would be appreciated.
