Commit 8ac6581

bpo-30455: Generate all token related code and docs from Grammar/Tokens. (pythonGH-10370)
"Include/token.h", "Lib/token.py" (now containing some data moved from "Lib/tokenize.py"), and the new files "Parser/token.c" (containing the code moved from "Parser/tokenizer.c") and "Doc/library/token-list.inc" (included in "Doc/library/token.rst") are now generated from "Grammar/Tokens" by "Tools/scripts/generate_token.py". The script overwrites files only if needed and can be used on a read-only source tree. "Lib/symbol.py" is now generated by "Tools/scripts/generate_symbol_py.py" instead of being executable itself. Added new make targets "regen-token" and "regen-symbol", which are now dependencies of "regen-all". The documentation now contains strings for operator and punctuation tokens.
1 parent c1b4b0f commit 8ac6581
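
The commit message says the generator overwrites files only if needed. The following is a minimal sketch of that idea, not the code from "Tools/scripts/generate_token.py"; the helper name `update_file` and its return value are assumptions made for illustration.

```python
def update_file(path, content):
    """Write `content` to `path` only if the file is missing or differs.

    Returns True if the file was written, False if it was left untouched.
    Hypothetical helper: the real generator script may work differently.
    """
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                # Identical content: skip the write so timestamps stay
                # unchanged and a read-only tree is never touched.
                return False
    except FileNotFoundError:
        pass  # No existing file, so it has to be created.
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True
```

Because the write is skipped when nothing changed, running the generator against an up-to-date tree needs only read access.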

18 files changed, +940 -462 lines changed

.gitattributes

+4
@@ -55,3 +55,7 @@ Include/opcode.h linguist-generated=true
 Python/opcode_targets.h linguist-generated=true
 Objects/typeslots.inc linguist-generated=true
 Modules/unicodedata_db.h linguist-generated=true
+Doc/library/token-list.inc linguist-generated=true
+Include/token.h linguist-generated=true
+Lib/token.py linguist-generated=true
+Parser/token.c linguist-generated=true

Doc/library/token-list.inc

+206
Some generated files are not rendered by default.

Doc/library/token.rst

+1-58
@@ -44,64 +44,7 @@ functions. The functions mirror definitions in the Python C header files.
 
 The token constants are:
 
-.. data:: ENDMARKER
-          NAME
-          NUMBER
-          STRING
-          NEWLINE
-          INDENT
-          DEDENT
-          LPAR
-          RPAR
-          LSQB
-          RSQB
-          COLON
-          COMMA
-          SEMI
-          PLUS
-          MINUS
-          STAR
-          SLASH
-          VBAR
-          AMPER
-          LESS
-          GREATER
-          EQUAL
-          DOT
-          PERCENT
-          LBRACE
-          RBRACE
-          EQEQUAL
-          NOTEQUAL
-          LESSEQUAL
-          GREATEREQUAL
-          TILDE
-          CIRCUMFLEX
-          LEFTSHIFT
-          RIGHTSHIFT
-          DOUBLESTAR
-          PLUSEQUAL
-          MINEQUAL
-          STAREQUAL
-          SLASHEQUAL
-          PERCENTEQUAL
-          AMPEREQUAL
-          VBAREQUAL
-          CIRCUMFLEXEQUAL
-          LEFTSHIFTEQUAL
-          RIGHTSHIFTEQUAL
-          DOUBLESTAREQUAL
-          DOUBLESLASH
-          DOUBLESLASHEQUAL
-          AT
-          ATEQUAL
-          RARROW
-          ELLIPSIS
-          OP
-          ERRORTOKEN
-          N_TOKENS
-          NT_OFFSET
-
+.. include:: token-list.inc
 
 The following token type values aren't used by the C tokenizer but are needed for
 the :mod:`tokenize` module.
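
To illustrate how the documented constants surface at run time, here is a small sketch that uses only the standard `token` and `tokenize` modules. The function name `token_names` is made up for this example; the snippet is an illustration and not part of the commit.

```python
import io
import token
import tokenize


def token_names(source):
    """Return the exact token-type name of every token in `source`."""
    readline = io.StringIO(source).readline
    return [
        # exact_type resolves a generic OP token to its specific constant
        # (for example EQUAL); tok_name maps the number back to its name.
        token.tok_name[tok.exact_type]
        for tok in tokenize.generate_tokens(readline)
    ]


names = token_names("x = 1  # c\n")
print(names)
```

The output includes `COMMENT`, one of the token types that only the `tokenize` module produces.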

Grammar/Tokens

+62
@@ -0,0 +1,62 @@
+ENDMARKER
+NAME
+NUMBER
+STRING
+NEWLINE
+INDENT
+DEDENT
+
+LPAR '('
+RPAR ')'
+LSQB '['
+RSQB ']'
+COLON ':'
+COMMA ','
+SEMI ';'
+PLUS '+'
+MINUS '-'
+STAR '*'
+SLASH '/'
+VBAR '|'
+AMPER '&'
+LESS '<'
+GREATER '>'
+EQUAL '='
+DOT '.'
+PERCENT '%'
+LBRACE '{'
+RBRACE '}'
+EQEQUAL '=='
+NOTEQUAL '!='
+LESSEQUAL '<='
+GREATEREQUAL '>='
+TILDE '~'
+CIRCUMFLEX '^'
+LEFTSHIFT '<<'
+RIGHTSHIFT '>>'
+DOUBLESTAR '**'
+PLUSEQUAL '+='
+MINEQUAL '-='
+STAREQUAL '*='
+SLASHEQUAL '/='
+PERCENTEQUAL '%='
+AMPEREQUAL '&='
+VBAREQUAL '|='
+CIRCUMFLEXEQUAL '^='
+LEFTSHIFTEQUAL '<<='
+RIGHTSHIFTEQUAL '>>='
+DOUBLESTAREQUAL '**='
+DOUBLESLASH '//'
+DOUBLESLASHEQUAL '//='
+AT '@'
+ATEQUAL '@='
+RARROW '->'
+ELLIPSIS '...'
+
+OP
+ERRORTOKEN
+
+# These aren't used by the C tokenizer but are needed for tokenize.py
+COMMENT
+NL
+ENCODING
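
The format of "Grammar/Tokens" is simple enough to read with a few lines of Python. The sketch below is an assumption-laden illustration, not the logic of "Tools/scripts/generate_token.py": the function name `parse_tokens` is hypothetical, and it assumes one token per line, an optional quoted string after the name, and `#` starting a comment.

```python
import ast


def parse_tokens(text):
    """Parse Grammar/Tokens-style text.

    Returns (names, strings): `names` lists token names in file order,
    and `strings` maps each operator or punctuation string to its name.
    """
    names = []
    strings = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace; skip empty lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        names.append(fields[0])
        if len(fields) > 1:
            # The second field is a quoted literal such as '(' or '->'.
            strings[ast.literal_eval(fields[1])] = fields[0]
    return names, strings


sample = "ENDMARKER\nLPAR '('\n\n# only needed by tokenize.py\nCOMMENT\n"
names, strings = parse_tokens(sample)
print(names, strings)
```

If the numeric value of each constant is taken to be its position in `names` (an assumption here), a single ordered list is enough to emit the C header, the Python module, and the documentation consistently.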

Include/token.h

+3-8
Some generated files are not rendered by default.
