## jsparagus/js_parser: Generating a parser for JavaScript
In this directory:
* **esgrammar.pgen** - A grammar for the mini-language the ECMAScript
standard uses to describe ES grammar.
* **es.esgrammar** - The actual grammar for ECMAScript, in emu-grammar
format, extracted automatically from the spec.
* **extract_es_grammar.py** - The script that creates *es.esgrammar*.
* **es-simplified.esgrammar** - A hacked version of *es.esgrammar* that
jsparagus can actually handle.
* **generate_js_parser_tables.py** - A script to generate a JS parser
based on *es-simplified.esgrammar*. Read on for instructions.
## How to run it
To generate a parser, follow these steps:
```console
$ cd ..
$ make init
$ make all
```
**Note:** The last step currently takes about 35 seconds to run on my
laptop. jsparagus is slow.
Once you're done, to see your parser run, try this:
```console
$ cd crates/driver
$ cargo run --release
```
The build also produces a copy of the JS parser in Python.
After `make all`, you can use `make jsdemo` to run that.
### How simplified is "es-simplified"?
Here are the differences between *es.esgrammar*, the actual ES grammar,
and *es-simplified.esgrammar*, the simplified version that jsparagus can
actually handle:
* The four productions with `[~Yield]` and `[~Await]` conditions are dropped.
This means that `yield` and `await` do not match *IdentifierReference*
or *LabelIdentifier*. I think it's better to handle those two names in the lexer.
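* Truncated lookahead. Lookahead assertions that look more than one token
ahead are truncated, because jsparagus rejects them with this error:
`ValueError: unsupported: lookahead > 1 token, [['{'], ['function'], ['async', ('no-LineTerminator-here',), 'function'], ['class'], ['let', '[']]`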
* Delete a rule that uses `but not` since it's not implemented:
`Identifier : IdentifierName but not ReservedWord`.
Making sense of this rule in the context of an LR parser is an
interesting task; see issue #28.
* Ban loops of the form `for (async of EXPR) STMT` by adjusting a
lookahead assertion. The grammar is not LR(1).
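
To make the lexer idea from the `yield`/`await` and `but not` items concrete, here is a minimal Python sketch of one way a lexer could decide whether a name is an *Identifier* or a reserved word. It is an illustration only, not jsparagus's actual lexer: `classify_name`, `ALWAYS_RESERVED`, and the `in_generator`/`in_async` flags are hypothetical names, and the sketch ignores strict-mode and module-specific rules.

```python
# Hypothetical sketch, not jsparagus code.
# Reserved words from the ECMAScript ReservedWord production, except
# `yield` and `await`, which are treated as context-dependent below.
ALWAYS_RESERVED = frozenset("""
    break case catch class const continue debugger default delete do else
    enum export extends false finally for function if import in instanceof
    new null return super switch this throw true try typeof var void while
    with
""".split())


def classify_name(name, *, in_generator=False, in_async=False):
    """Return a token kind for a lexeme that matches IdentifierName.

    The result is either the reserved word itself or "Identifier".
    This plays the role of `IdentifierName but not ReservedWord`, and of
    the dropped [~Yield] / [~Await] productions, outside the grammar.
    """
    if name in ALWAYS_RESERVED:
        return name
    # Assumption: the parser tells the lexer which context it is in.
    if name == "yield" and in_generator:
        return "yield"
    if name == "await" and in_async:
        return "await"
    return "Identifier"


if __name__ == "__main__":
    for word in ["foo", "class", "yield", "await"]:
        print(word, "->", classify_name(word))
    print("yield in generator ->", classify_name("yield", in_generator=True))
```

With this approach the grammar only ever sees an `Identifier` token or a keyword token, so it needs neither `but not` nor the parameterized `yield`/`await` alternatives. The cost is that the lexer has to be told about the surrounding context.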