Parse Trees
Python’s parser is an LL(1) parser mostly based off of the implementation laid out in the Dragon Book [Aho86].
The grammar file for Python can be found in Grammar/Grammar with the numeric value of grammar rules stored in Include/graminit.h. The list of types of tokens (literal tokens, such as :, numbers, etc.) can be found in Grammar/Tokens with the numeric value stored in Include/token.h. The parse tree is made up of node *structs (as defined in Include/node.h).
Querying data from the node structs can be done with the following macros (which are all defined in Include/node.h):
CHILD(node *, int)- Returns the nth child of the node using zero-offset indexing
RCHILD(node *, int)- Returns the nth child of the node from the right side; use negative numbers!
NCH(node *)- Number of children the node has
STR(node *)- String representation of the node; e.g., will return
:for aCOLONtoken TYPE(node *)- The type of node as specified in
Include/graminit.h REQ(node *, TYPE)- Assert that the node is the type that is expected
LINENO(node *)- Retrieve the line number of the source code that led to the creation of the parse rule; defined in
Python/ast.c
For example, consider the rule for ‘while’:
while_stmt ::= "while"expression":"suite: ["else" ":"suite]
The node representing this will have TYPE(node) == while_stmt and the number of children can be 4 or 7 depending on whether there is an ‘else’ statement. REQ(CHILD(node, 2), COLON) can be used to access what should be the first : and require it be an actual : token.