Operator no-precedence parsing

Here we use a simplified version of the shunting-yard algorithm for operator precedence parsing, which was invented by Edsger Dijkstra. It is simplified because fully parenthesized expressions are required. This requirement will later be relaxed by adding operator precedences.

The goal is to parse a sequence of characters into a postfix form of a logical expression (tree).

A simplified shunting algorithm is used to show the essential ideas of a compiler.

A compiler typically includes the following parts.

source handler to process and return characters in sequence as needed
scanner or lexer to take a sequence of characters and return a sequence of tokens as needed
parser to take a sequence of tokens and construct a parse tree or code as needed
code generator (compiler) or code execution (interpreter)

A source handler separates the source of the characters in a program from the scanner, which separates the characters into tokens.

Here, for logical expressions, we consider only single-character tokens; the extension to multiple-character tokens is left as an exercise. The following expression is used as the example.

( X & Y ) | ( ( ! X ) & ( ! Y ) )

Here is the extended truth table for this logical expression.

X Y | ( X & Y ) | ( ( ! X ) & ( ! Y ) )
---------------------------------------
0 0 | ( 0 0 0 ) 1 ( ( 1 0 ) 1 ( 1 0 ) )
0 1 | ( 0 0 1 ) 0 ( ( 1 0 ) 0 ( 0 1 ) )
1 0 | ( 1 0 0 ) 0 ( ( 0 1 ) 0 ( 1 0 ) )
1 1 | ( 1 1 1 ) 1 ( ( 0 1 ) 0 ( 0 1 ) )

For more on manually creating this table, see Truth tables: manual method.
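
The column under the | operator can also be checked mechanically. The following is a minimal sketch, not part of the original examples, that evaluates the expression with JavaScript's own Boolean operators for each combination of X and Y. The function name truthRows is made up for illustration.

```javascript
// Hypothetical helper: evaluate ( X & Y ) | ( ( ! X ) & ( ! Y ) ) for all inputs.
function truthRows() {
	var rows = [];
	for (var x = 0; x <= 1; x++) {
		for (var y = 0; y <= 1; y++) {
			// Convert the Boolean result to 0 or 1 so rows read like the table.
			var result = ((x && y) || ((!x) && (!y))) ? 1 : 0;
			rows.push([x, y, result]);
		}
	}
	return rows;
}

truthRows().forEach(function (row) {
	console.log(row[0] + " " + row[1] + " | " + row[2]);
});
```

The four printed results, 1 0 0 1, agree with the column under the | operator in the table.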

Rather than a file or other source, the text of the infix logical expression is hard-coded to a string variable. Here is a way to iterate through that string.

Here is the JavaScript code.

var s1 = "( X & Y ) | (( ! X ) & (! Y ))";
var n1 = s1.length;
for (var i1=0; i1 <= n1-1; i1++) {
	var ch1 = s1.substring(i1,i1+1);
	console.log(i1+". ["+ch1+"]\n");
	}

Here is the output of the JavaScript code.

0. [(]

1. [ ]

2. [X]

3. [ ]

4. [&]

5. [ ]

6. [Y]

7. [ ]

8. [)]

9. [ ]

10. [|]

11. [ ]

12. [(]

13. [(]

14. [ ]

15. [!]

16. [ ]

17. [X]

18. [ ]

19. [)]

20. [ ]

21. [&]

22. [ ]

23. [(]

24. [!]

25. [ ]

26. [Y]

27. [ ]

28. [)]

29. [)]

Note that this loop is not particularly useful to a scanner, which needs a way to request the next character in sequence whenever it is ready for one.

A source handler is initialized with a source, provides characters one at a time, and reports whether the end of the source (e.g., the end of a file) has been reached. Here is a simple source handler.

Here is the JavaScript code.

///:CODEPART x=[0],t=[Hard-coded test data]

var testText1 = "( X & Y ) | (( ! X ) & (! Y ))";
///:CODEPART x=[1],t=[Source handler code]

var sourceText1 = "";
var sourceLen1 = 0;
var sourcePos1 = 0;
var sourceCh1 = "";

function sourceInit1(sourceText0) {
	sourceText1 = sourceText0;
	sourceLen1 = sourceText1.length;
	sourcePos1 = 0;
	sourceNext1();
	}

function sourceNext1() {
	if (sourcePos1 == sourceLen1) {
		sourceCh1 = "";
		}
	else {
		sourceCh1 = sourceText1.charAt(sourcePos1);
		sourcePos1++;
		}
	}

function sourceEof1() {
	return sourcePos1 == sourceLen1;
	}

function sourceDone1() {
	}
///:TESTPART x=[1],t=[Source handler test code]

function sourceTest1() {
	sourceInit1(testText1);
	while (! sourceEof1()) {
		console.log(sourcePos1+". ["+sourceCh1+"]\n");
		sourceNext1();
		}
	sourceDone1();
	}
sourceTest1();

As with examples to date, a module or global scope is used. When objects and classes are covered, some of the examples used previously will be updated to see how they would be expressed using objects and classes.

In the above code, the prefix source is used. These parts might become part of a module or part of a class, depending on what is being done.

The sourceDone1 routine is typically used to clean up after all the input is read, raise an error if not all the input was read, etc.

The above code will be assumed in the following examples, but not shown in order to concentrate on the essential parts being discussed.

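The output matches the previous output with two differences. Positions are numbered from 1, since it is convenient to have sourcePos1 be one more than the value of i1 in the previous example. Also, the final ")" is not shown: sourceEof1 compares sourcePos1 with sourceLen1, and these are already equal once the last character has been fetched into sourceCh1, so the test loop ends before printing it. Testing for an empty sourceCh1 instead would include the last character.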

Here is the output of the JavaScript code.

1. [(]

2. [ ]

3. [X]

4. [ ]

5. [&]

6. [ ]

7. [Y]

8. [ ]

9. [)]

10. [ ]

11. [|]

12. [ ]

13. [(]

14. [(]

15. [ ]

16. [!]

17. [ ]

18. [X]

19. [ ]

20. [)]

21. [ ]

22. [&]

23. [ ]

24. [(]

25. [!]

26. [ ]

27. [Y]

28. [ ]

29. [)]

The source handler is a simple example of a state machine: it remembers its position in the source text so that it can return the next character on request.
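
One way to keep that state together is sketched below. It also makes the end-of-source test depend on the current character rather than on the position, so the last character is delivered before the end is reported. This is an illustrative variant, not the code used in the rest of this section; the name makeSource and its method names are assumptions.

```javascript
// Hypothetical variant of the source handler: the state lives in a closure.
function makeSource(text) {
	var pos = 0;   // index of the next character to fetch
	var ch = "";   // current character, "" once the source is exhausted

	function next() {
		if (pos < text.length) {
			ch = text.charAt(pos);
			pos++;
		} else {
			ch = "";
		}
	}

	next(); // prime the handler with the first character

	return {
		current: function () { return ch; },
		position: function () { return pos; },
		// End of source is reported only after the last character was consumed.
		eof: function () { return ch === ""; },
		next: next
	};
}

var src = makeSource("( X & Y ) | (( ! X ) & (! Y ))");
var seen = [];
while (!src.eof()) {
	seen.push(src.current());
	src.next();
}
console.log(seen.length + " characters, last is [" + seen[seen.length - 1] + "]");
// prints "30 characters, last is [)]"
```

Because each call to makeSource has its own pos and ch, several sources can be open at the same time, which the global-variable version does not allow.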

An enhanced source handler would allow input from a file and the inclusion of one file from within another. This requires a stack: the file being included is pushed, it is popped when it has been read completely, and reading resumes with the file below it until all of the input is read.
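
The stack mechanism can be sketched as follows. This is only an illustration under several assumptions: the "files" are strings held in an object rather than read from disk, an include directive is a line of the form #include name, and the names fileTable and readAll are made up for this example.

```javascript
// Hypothetical in-memory "files"; a real handler would read from disk.
var fileTable = {
	"main": "A\n#include part\nD",
	"part": "B\nC"
};

// Return the lines of the named file with includes expanded, using an explicit stack.
function readAll(name) {
	var output = [];
	var stack = [{ lines: fileTable[name].split("\n"), index: 0 }];
	while (stack.length > 0) {
		var top = stack[stack.length - 1];
		if (top.index >= top.lines.length) {
			stack.pop(); // this file is finished, resume the including file
			continue;
		}
		var line = top.lines[top.index];
		top.index++;
		if (line.indexOf("#include ") === 0) {
			var included = line.substring("#include ".length);
			// Push the included file; it is read before the rest of the current one.
			stack.push({ lines: fileTable[included].split("\n"), index: 0 });
		} else {
			output.push(line);
		}
	}
	return output;
}

console.log(readAll("main").join(" ")); // prints "A B C D"
```

The sketch does not check for missing files or for a file that includes itself; a practical handler would need both checks.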

A scanner, or lexer, takes as input a sequence of characters and returns a sequence of tokens, one at a time. There is a special token to indicate the end of the input.

In systems that provide source error location, the location of the token is part of the token structure returned. This is omitted for example purposes.

Note that the scanner removes the white space and returns only the symbols (or tokens).

Left as exercises:

Raise an error if an attempt is made to get the next symbol after the end of the input has been reached.
Allow symbols to contain more than one character.
Return the type of the symbol.
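
One possible approach to the second and third exercises is sketched below, with the position of each token included as well. It is a hedged illustration rather than the scanner used in this section: it returns all tokens at once instead of one at a time, treats every character that is not a letter, digit, or white space as a single-character symbol, and the name tokenize and the token fields type, text, and pos are assumptions.

```javascript
// Hypothetical sketch: split text into tokens that may span several characters.
function tokenize(text) {
	var tokens = [];
	var i = 0;
	while (i < text.length) {
		var ch = text.charAt(i);
		if (ch === " " || ch === "\t") {
			i++; // skip white space
			continue;
		}
		var start = i;
		var type;
		if (/[A-Za-z]/.test(ch)) {
			// A name is a letter followed by any number of letters or digits.
			type = "var";
			while (i < text.length && /[A-Za-z0-9]/.test(text.charAt(i))) { i++; }
		} else if (/[0-9]/.test(ch)) {
			// An integer is a run of digits.
			type = "int";
			while (i < text.length && /[0-9]/.test(text.charAt(i))) { i++; }
		} else {
			type = "symbol"; // everything else stays a single-character token
			i++;
		}
		tokens.push({ type: type, text: text.substring(start, i), pos: start });
	}
	// A special token marks the end of the input.
	tokens.push({ type: "eof", text: "", pos: text.length });
	return tokens;
}

tokenize("( abc & x1 ) | 42").forEach(function (t) {
	console.log(t.pos + ". [" + t.text + "][" + t.type + "]");
});
```

Here abc and x1 are each returned as one token of type var, and 42 as one token of type int.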

Here is the JavaScript code.

///:CODEPART x=[2],t=[Scanner code]

function isSpace1(ch1) {
	return " \t".indexOf(ch1) != -1;
	}

function isDigit1(ch1) {
	return "0123456789".indexOf(ch1) != -1;
	}

function isLetter1(ch1) {
	return "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".indexOf(ch1) != -1;
	}
var opDict1 = {
	"!" : ["op1", "not"],
	"&" : ["op2", "and"],
	"|" : ["op2", "or"],
	"=" : ["op2", "eqv"],
	"^" : ["op2", "xor"]
	};
var scanSymbol1 = "";
var scanType1 = "";
var scanName1 = "";
var scanItem1 = [];

function scanInit1(scanText0) {
	sourceInit1(scanText0);
	scanNext1();
	}

function scanNext1() {
	while (true) {
		if (sourceEof1()) {
			scanSymbol1 = "";
			scanType1 = "eof";
			scanName1 = "eof";
			return;
			}
		if (! isSpace1(sourceCh1)) {
			scanSymbol1 = sourceCh1;
			if (scanSymbol1 == "(") {
				scanType1 = "symbol";
				scanName1 = "lparen";
				}
			else if (scanSymbol1 == ")") {
				scanType1 = "symbol";
				scanName1 = "rparen";
				}
			else if (opDict1.hasOwnProperty(scanSymbol1)) {
				var opList2 = opDict1[scanSymbol1];
				scanType1 = opList2[0];
				scanName1 = opList2[1];
				}
			else if (isDigit1(scanSymbol1)) {
				scanType1 = "int";
				scanName1 = scanSymbol1;
				}
			else if (isLetter1(scanSymbol1)) {
				scanType1 = "var";
				scanName1 = scanSymbol1;
				}
			else {
				scanType1 = "error";
				scanName1 = scanSymbol1;
				}
			scanItem1 = [];
			scanItem1.push(scanType1);
			scanItem1.push(scanName1);
			sourceNext1();
			return;
			}
		sourceNext1();
		}
	}

function scanDone1() {
	sourceDone1();
	}
///:TESTPART x=[2],t=[Scanner test code]

function scanTest1() {
	scanInit1(testText1);
	console.log("[symbol][type][name]\n");
	while (scanType1 != "eof") {
		console.log("["+scanSymbol1+"]["+scanType1+"]["+scanName1+"]\n");
		scanNext1();
		}
	scanDone1();
	}
scanTest1();

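Here is the output of the JavaScript code. The final ")" of the input does not appear as a token: sourceEof1 already returns true once the last character has been fetched, so scanNext1 reports eof before that character is scanned.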

[symbol][type][name]

[(][symbol][lparen]

[X][var][X]

[&][op2][and]

[Y][var][Y]

[)][symbol][rparen]

[|][op2][or]

[(][symbol][lparen]

[(][symbol][lparen]

[!][op1][not]

[X][var][X]

[)][symbol][rparen]

[&][op2][and]

[(][symbol][lparen]

[!][op1][not]

[Y][var][Y]

[)][symbol][rparen]

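The parser takes the tokens from the scanner and builds the postfix code list. Variables and integers are appended to the code list directly, operators and left parentheses are pushed onto a stack, and a right parenthesis pops the stack onto the code list until the matching left parenthesis is found. Anything left on the stack at the end of the input is then popped onto the code list.

Here is the JavaScript code.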

var codeList1 = [];

function parseInit1(parseText0) {
	scanInit1(parseText0);
	codeList1 = [];
	}

function parseDo1() {
	var stackItem1;
	var stackName1;
	var stackSize1;
	var stackList1 = [];
	while (scanType1 != "eof") {
		if (scanType1 == "var") {
			codeList1.push(scanItem1);
			}
		else if (scanType1 == "int") {
			codeList1.push(scanItem1);
			}
		else if (scanType1 == "op2") {
			stackList1.push(scanItem1);
			}
		else if (scanType1 == "op1") {
			stackList1.push(scanItem1);
			}
		else if (scanType1 == "symbol") {
			if (scanName1 == "lparen") {
				stackList1.push(scanItem1);
				}
			else if (scanName1 == "rparen") {
				while (true) {
					stackSize1 = stackList1.length;
					if (stackSize1 == 0) {
						console.log("Parse error: Extra right paren. (stack underflow)\n");
						return;
						}
					stackItem1 = stackList1.pop();
					stackName1 = stackItem1[1];
					if (stackName1 == "lparen") {
						break;
						}
					codeList1.push(stackItem1);
					}
				}
			else {
				console.log("Parse error: Symbol "+scanName1+" not handled. (should not happen)\n");
				}
			}
		else {
			console.log("Parse error: Type "+scanType1+" not handled. (should not happen)\n");
			}
		scanNext1();
		}
	stackSize1 = stackList1.length;
	while (stackSize1 != 0) {
		stackItem1 = stackList1.pop();
		stackName1 = stackItem1[1];
		if (stackName1 == "lparen") {
			console.log("Parse error: unmatched left parentheses.\n");
			}
		else {
			codeList1.push(stackItem1);
			}
		stackSize1 = stackList1.length;
		}
	}

function parseDone1() {
	scanDone1();
	}

function parseShow1() {
	console.log("(begin code list)\n");
	var n1 = codeList1.length;
	for (var i1=0; i1 <= n1-1; i1++) {
		var code1 = codeList1[i1];
		var type1 = code1[0];
		var value1 = code1[1];
		console.log("\t"+i1+". [\""+type1+"\",\""+value1+"\"]\n");
		}
	console.log("(end code list)\n");
	}

function parseTest1() {
	parseInit1(testText1);
	parseDo1();
	parseDone1();
	parseShow1();
	}
parseTest1();

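Here is the output of the JavaScript code. The parse error appears because the scanner never delivers the final ")" of the input (sourceEof1 returns true as soon as the last character has been fetched), so one "(" is still on the stack when the input ends. The remaining operators are still moved to the code list, so the postfix form shown is the expected one.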

Parse error: unmatched left parentheses.

(begin code list)

	0. ["var","X"]

	1. ["var","Y"]

	2. ["op2","and"]

	3. ["var","X"]

	4. ["op1","not"]

	5. ["var","Y"]

	6. ["op1","not"]

	7. ["op2","and"]

	8. ["op2","or"]

(end code list)
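
Because the code list is in postfix order, it can be executed with a single stack, which corresponds to the code execution (interpreter) part listed at the start. The following sketch is one assumption about how that might look. The names codeList, evalPostfix, and env are illustrative, values are assumed to be 0 or 1, and the code list is copied by hand from the output above.

```javascript
// Code list copied from the parser output above.
var codeList = [
	["var", "X"], ["var", "Y"], ["op2", "and"],
	["var", "X"], ["op1", "not"],
	["var", "Y"], ["op1", "not"],
	["op2", "and"], ["op2", "or"]
];

// Hypothetical evaluator: env maps variable names to 0 or 1.
function evalPostfix(code, env) {
	var stack = [];
	code.forEach(function (item) {
		var type = item[0];
		var name = item[1];
		if (type === "var") {
			stack.push(env[name]);
		} else if (type === "int") {
			stack.push(Number(name));
		} else if (type === "op1") {
			var a = stack.pop();
			stack.push(a ? 0 : 1); // "not" is the only unary operator
		} else if (type === "op2") {
			// The right operand was pushed last, so it is popped first.
			var right = stack.pop();
			var left = stack.pop();
			if (name === "and") { stack.push(left & right); }
			else if (name === "or") { stack.push(left | right); }
			else if (name === "xor") { stack.push(left ^ right); }
			else if (name === "eqv") { stack.push(left === right ? 1 : 0); }
			else { throw new Error("Unknown operator: " + name); }
		}
	});
	return stack.pop();
}

[[0, 0], [0, 1], [1, 0], [1, 1]].forEach(function (xy) {
	var result = evalPostfix(codeList, { X: xy[0], Y: xy[1] });
	console.log(xy[0] + " " + xy[1] + " | " + result);
});
```

The results 1, 0, 0, 1 match the column under the | operator in the truth table for the original infix expression.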