Parsing
Introduction
Parsing Prolog terms and theories is the purpose of the parser-core
and parser-theory
modules, respectively.
Both modules leverage on ANTLR for parsing strings, behind the scenes, but they only expose an ANTLR-agnostic public API.
Users of 2P-Kt will only rely on types and methods from these modules to parse their Prolog strings.
In principle, if the ANTLR technology is replaced in the future, the public API of the parser-core
and parser-theory
modules should not change at all.
ANTLR is a parser generator capable of producing Java or JavaScript (JS) code, i.e., platform-specific sources. Any parser generated by ANTLR requires a runtime library to actually function in production. There are many platform-specific runtime libraries (or simply "runtimes", henceforth). Of course, Java-based parser require the Java runtime (that is, a Java library), whereas JS-based parsers require the JS runtime, (that is, a JS library).
To handle such complexity, we introduced two more modules, namely, parser-jvm
and parser-js
.
They are two Kotlin platform-specific projects.
In particular, parser-jvm
is a Kotlin/JVM project, whereas parser-js
is a Kotlin/JS project.
They are to be considered implementation modules and they must only depend on their respective ANTLR runtime.
Accordingly, they must NOT depend on any other module in 2P-Kt.
TODO documentation below must be rewritten
ANTLR Grammar for Prolog
The main ANTLR grammar feature is the expression
rule.
The purpose is to parse an expression, like 3 + 4 * 5
, with right operators' associativity and priorities: how to do that?
Prolog has an internal representation of priority, that is a number in the range between 0 and 1200
(0 identifies the highest priority).
When we have an expression, we have to understand which of the following situations we are facing: XFY
, YFX
, YF
, XF
, XFX
, FX
, FY
.
The action we have to do is different depending on these situations.
If we have the YFX
, we need to memorize the left term, then parse the operator, making sure it's not part of disabled operators (for example,
if we have a List with elements separated by ',', than ',' must be a disabled operator, because in this context identifies the separation between elements),
and memorize right expressions (can be more than one). In this case, right expressions have higher priority, so we have to parse the expression
with a lower priority level, as aforementioned. Below a parser expression extract of grammar.
expression[int priority, String[] disabled] -> local utilities variables
: (left=term
( operators+=op[YFX] right+=expression[$op.priority - 1, $disabled]
(operators+=op[YFX] right+=expression[$op.priority - 1, $disabled])*
| operators+=op[XFY] right+=expression[$op.priority, $disabled]
(operators+=op[XFY] right+=expression[$op.priority, $disabled])*
| operators+=op[XFX] right+=expression[$op.priority - 1, $disabled]
| operators+=op[YF] (operators+=op[YF])*
| operators+=op[XF]
)?
| operators+=op[FX] right+=expression[$op.priority - 1, $disabled]
| operators+=op[FY] (operators+=op[FY])* right+=expression[$op.priority, $disabled]
) (outers+=outer[$priority, $bottom, disabled])*
;
Looking at the last instruction, we can see an "outers" accumulator and an "outer" rule. To understand why we introduce this rule,
the better way is to do an example. All above cases consider the right expression as a high priority term (excluded XFY and FY that consider it as the same priority).
So in all these cases, the priority level decreases (we invoke the expression rule with priority-1). But, what happen when we have an expression like 3 + 4 * 5 - 1
?
We have left expression with high priority level (so low priority) 3 + expression
, then we have a lower priority level expression (so higher priority) left * expression
.
Then we have a higher priority level (so lower priority) left - expression
and we have to handle this priority increase. To do that, we introduce the "outer" term, in which
we memorize expression that go back in priority level. So the parse tree of this expression becomes:
.
+-- EOF
+-- singletonExpr
| +-- "3"
| +-- "+"
+-- expr
| +-- "4"
| +-- "*"
| +-- "5"
+-- outer
| +-- "-"
| +-- "1"
We can see how the go-back expression is transformed into an "outer" branch, which
contains all the original information. Nevertheless this tree is not the right parse tree, in fact the ?
operation must be the first
appearing. The right tree corresponding is '-'('+'(3, '*'(4, 5)), 1)
.
To do this, all outer branches are rotated in the order in which they appear.
The steps are:
- We build the tree until outer branch
('+'(3, '*'(4, 5))
- We parse the outer branch
'-'(X,1)
- we substitute the previous expression to X and we obtain the tree
'-'('+'(3, '*'(4, 5)), 1)
. Same steps for more than one outer.
parser-jvm
JVM specific module that aim to generate a jvm specific parser with ANTLR. It depends only on antlr4 features and jdk libraries.
When ANTLR generates the grammar in jvm specific target, it also generates:
PrologLexer.java
: it contains all lexer support methods, token recognition and other minor featuresPrologParser.java
: it contains the parser main features. There are also a lot ofContext
classes, corresponding to every node/rule found in the grammar.PrologParserBaseVisitor
: it extends an interface (PrologParserVisitor) and it contains semantic actions to do in every node and every context. It must be extended to add right semantic actions.PrologParserBaseListener
: it extends an interface (PrologParserListener) and it contains all semantic actions for specific events that happen during the parsing (enter/exit Rule ecc). It's like an event handler.
To implement semantic in the right way, it was necessary to define custom methods and add them to parser features.
To do that, as introduced into ANTLR paragraph, we created a class (DynamicLexer
/Parser
for Lexer
/Parser
) that implements dynamic features
Lexer/Parser need to have, for example into DynamicLexer
we memorize found operators and we define support methods like isOperator
and addOperators
.
In DynamicParser we must track also operator's associativity and priority, so we need a lot of support methods to understand how to handle
every cases. For example, with option { superClass = DynamicParser}
into PrologParser.g4 we specify that generated PrologParser.java
extends DynamicParser.
parser-js
JS specific module that aim to generate a js specific parser with ANTLR. It depends only on antlr4 and js features.
To create dynamic behavior to PrologLexer, in parser-jvm
, as mentioned above, it was created a DynamicLexer.java
, extended by PrologLexer.java
(thanks to superClass ANTLR option). Unfortunately, in js class concept does not exist.
JavaScript is a prototype-based language, meaning object properties and methods can be shared through generalized objects that have the ability to be cloned and extended.
This is known as prototype inheritance and differs from class inheritance. So, to affront this problem, the solution is:
- Create a
DynamicLexer.js
, that contains a constructor (DynamicLexer) and implements all behavior that must be added toPrologLexer
- Use
@headers
(PrologLexer.g4
) annotation to "import" this constructor into the generated PrologLexer (@headers{var DynamicLexer = require("./DynamicLexer").DynamicLexer}
) - Use
@members
(PrologLexer.g4
) annotation to create the prototype inheritance between PrologLexer and DynamicLexer using call method:@members {DynamicLexer.call(this);}
. This add involve that this code will be injected into PrologLexer constructor. Now, all functionality defined intoDynamicLexer
can be used fromPrologLexer
(the same of a class extension) The same procedure was use to developPrologParser
(withDynamicParser
). After that, js specific tests have been implemented to ensure proper functionality (necessity to copy/paste js test files into right build directory) Despite in jvm module this was enough, in js module is missing an important feature: how to map from specific js to common? Kotlin helps us with external class mapping in several simple steps: - Exports necessarily js modules (for example,
exports.Associativity = Associativity;
) - Create Kotlin class with header
@file:JsModule("./Associativity")
(followed by@file:JsNonModule
) - Create class/val/fun with same js signature and with "external" keyword (for example, external enum class
Associativity
) Now this class can be used in Kotlin common modules. It was necessarily to createPrologParser.kt
,PrologLexer.kt
,Associativity.kt
,StringType.kt
,PrologParserVisitor.kt
and most of antlr4 sources file.
parser-core
Bridge-like module that aim to link platform specific world to common world. See "Rationale and Architecture/parsing.md"
It was necessary to define common methods in commonMain
: implemented interface TermParser
(with parse method extension in
main project classes, and also in String/Integer Kotlin classes, to simplify parsing). This interface had to be implemented in both platform specific
submodules (jvm and js). So, expect methods have been defined (that needed an actual match in both jvm and js submodules) to simplify TermParser
creation. After that, common tests have been prepared, with aim to run in both jvm and js submodules. To prepare those, as parser-core/common
does not depend on any platform, there have benn complications: JUnit cannot be used. To resolve this inconvenience, two support classes
have been developed, one to simplify the assertion between stringToBeParser and expected term, the other to introduce an exhausting test set (written
in Prolog dsl, developed in Kotlin in dslX modules).
parser-core/jvm
Platform specific submodule for jvm.
Considering parser-jvm
module, it was necessarily to extend PrologParserBaseVisitor
, PrologParserBaseListener
and simplify Parser creation:
- Implemented
PrologExpressionVisitor
that extendsPrologParserBaseVisitor
: new structures of the core project, like List or Scope (that includes memory management) - Added
PrologParserFactory
interface injvmMain
with parse methods and Singleton Pattern - Implemented
PrologParserFactory
with a Singleton Impl and with Kotlin's methods and structures: this class is very useful to simplify the initialization of parsing - Implemented
DynamicOpListener
that extendsPrologParserBaseListener
: it adds exitClause functionality - In fact, now, to trigger the parsing (in TermParserImpl) we just need to do:
PrologParserFactory().parseExpression(input, operators).accept(PrologExpressionVisitor())
parser-core/js
Platform specific submodule for js.
Considering parser-js
module, it was necessarily to extend PrologParserVisitor.kt to develop expect semantic. Then, like
parser-core/jvm
, a PrologParserFactory has been developed to simplify Parser creation and parsing.
As in parser-js
module the antlr grammar does not create listener, DynamicOpListener has not been developed.
PrologVisitor
: extendsPrologParserVisitor<Term>
and implements visit logic for each node.PrologParserFactory
with a Singleton Impl and with Kotlin's methods and structures: to simplify the initialization of parsing Now to trigger the parsing (in TermParserImpl) we just need to do:
PrologParserFactory().parseExpression(input,operators).accept(PrologVisitor())}
Main issues:
- JS specific code into file.g4 does not consider ';' character. Probably an ANTLR4 issue. The solution is to use double ';;' instead. JS specific test from commonTest gave problems:
- No way to add bare js sources in build folder, actually a Gradle issue. So to launch tests it is necessarily to copy/paste every time js sources into build folder. [fixed]