An automated Layout-sensitive AMBiguity detector for layout-sensitive grammars, written in Python 3, powered by Z3 solver, and (its theories) verified in the Coq proof assistant.
Layout-sensitive (a.k.a. indentation-sensitive) grammars were first proposed by Landin. Nowadays, they have been adopted in Python, Haskell, F#, Yaml, Markdown, and so on. These grammars are also introduced as new features during language evolution, such as the indentation rules now have been introduced in Scala 3. Check Wikipedia for more languages that use layout-sensitive syntax.
Due to the presence of layout constraints, indentations and whitespaces intensively affect how a program should be parsed. Think of Python: your program cannot be executed if the indentations are incorrect. Taking indentations and whitespaces into account brings more difficulties and challenges in the study of layout-sensitive grammars than context-free grammars. That’s way tooling is still limited and immature for layout-sensitive grammars.
If a grammar is indeed ambiguous (there exists a sentence with more than one ways of parsing), it could be the case that some program is parsed into an abstract syntax tree that the user does not expect, which may further leads to misunderstandings of the semantics during the compilation. Therefore, in a serious languages design phase, the grammar that describes the syntax specification must be unambiguous.
Although the language designers can choose to prove (or even worse, to admit) their grammars are unambiguous themselves, in reality, they may easily miss some layout constraints that are indeed necessary, particularly in large grammars. Hence, detecting potential ambiguity automatically is beneficial. The ambiguous sentences produced by Lamb are the shortest, which make it easier for the users to quickly identify the cause of ambiguity. Once the ambiguity issue is understood, the users could resolve it by adding extra layout constraints to their grammars.
- Simple input format
- Fully automatic
- Shortest ambiguous sentences
- Visualization of parse trees
- Machine-checked theories
Inspired by a previous work on checking the bounded ambiguity of context-free grammars via SAT solving, we intensively extend their approach to support layout-sensitive grammars but via SMT solving to express the ordering and quantitative relations over line/column numbers.
Our key novelty lies in a reachability condition, which takes the impact of layout constraints on ambiguity into careful account. With this condition in hand, we propose an equivalent ambiguity notion called local ambiguity for the convenience of SMT encoding. We translate local ambiguity into an SMT formula and developed a bounded ambiguity checker that automatically finds a shortest nonempty ambiguous sentence (if exists) for a user-input grammar. The soundness and completeness of our SMT encoding are mechanized in the Coq proof assistant.