Parse Trees

Python’s parser is an LL(1) parser mostly based off of the implementation laid out in the Dragon Book [Aho86].

The grammar file for Python can be found in Grammar/Grammar with the numeric value of grammar rules stored in Include/graminit.h. The list of types of tokens (literal tokens, such as :, numbers, etc.) can be found in Grammar/Tokens with the numeric value stored in Include/token.h. The parse tree is made up of node *structs (as defined in Include/node.h).

Querying data from the node structs can be done with the following macros (which are all defined in Include/node.h):

CHILD(node *, int): Returns the nth child of the node using zero-offset indexing
RCHILD(node *, int): Returns the nth child of the node from the right side; use negative numbers!
NCH(node *): Number of children the node has
STR(node *): String representation of the node; e.g., will return : for a COLON token
TYPE(node *): The type of node as specified in Include/graminit.h
REQ(node *, TYPE): Assert that the node is the type that is expected
LINENO(node *): Retrieve the line number of the source code that led to the creation of the parse rule; defined in Python/ast.c

For example, consider the rule for ‘while’:

while_stmt ::=  "while" expression ":" suite : ["else" ":" suite]

The node representing this will have TYPE(node) == while_stmt and the number of children can be 4 or 7 depending on whether there is an ‘else’ statement. REQ(CHILD(node, 2), COLON) can be used to access what should be the first : and require it be an actual : token.

Categories

Explore

Help

Get In Touch

Learn Python – Interactive Python

Curriculum

Parse Trees

Leave a Comment Cancel Reply

Categories

Explore

Help

Get In Touch

Login

Register

Modal title