Commit 705df7c

Merge pull request #276 from jlapeyre/change-lexing-openqasm-version
Return a single token for OpenQASM version statement from lexer

This commit changes how the OpenQASM version statement is lexed. The token includes flags recording failure to recognize the major and minor versions. An example of the statement is

```
OPENQASM 3.1;
```

Here are some choices for lexing the OpenQASM version statement. This commit changes the implementation from the first choice to the second.

This was first implemented by doing nothing in the lexer to treat this statement. This is the way almost all keywords are lexed: they are lexed as identifiers and recognized as keywords when parsing, rather than when lexing. The number `3.1` is tokenized as a float literal (actually an integer literal, converted to float somewhere before parsing; this is inherited from r-a). But parsing does not have access to the input text, so this defers checking that `3.1` is a valid version to the semantic analysis. This last step is easy enough to do, but was not done.

Another way to implement lexing the version statement is to recognize the entire statement as a token. This means recognizing the character sequence `OPENQASM` + whitespace + valid version number. If this fails, it will be lexed as an invalid identifier, or an invalid version statement, depending on where the error is. This allows catching errors where they ought to be caught, at the lexing and/or parsing stages. The version is then extracted from the token. This can be done with an API call created from the ungrammar, and/or coded by hand.

Another way is to have the lexer recognize `OPENQASM` as a token, then a whitespace token, and finally a version number token kind, say `OPENQASM_VERSION`. This would be easier to parse and consume in semantic analysis. But the version number text would ordinarily be lexed as a float literal. Furthermore, the lexer produces a stream of tokens and maintains as little state as possible. We would need to break this invariant in some way to accommodate this way of parsing the version string, for example by saving tokens and emitting all three when done, or by entering a special mode. The reference parser enters a "mode" enabled by the antlr parser generator. The lexer is currently very simple and fast. I don't want to experiment with adding complexity just for this purpose.
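The commit message says the version can be extracted from the single token "by hand". As a rough illustration only, the sketch below shows one way that could look. The names `split_version` and `parse_digits` are hypothetical and do not appear in the commit, and the return type is an assumption made for the example.

```rust
/// Hypothetical helper (not part of the commit): pull the numeric parts out of
/// the text of a version-statement token such as "OPENQASM 3.1".
/// Returns `None` if the text is not a well-formed version statement.
fn split_version(token_text: &str) -> Option<(u32, Option<u32>)> {
    // The token starts with the keyword, followed by at least one whitespace char.
    let rest = token_text.strip_prefix("OPENQASM")?;
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let version = rest.trim_start();

    // Split "3.1" into "3" and "1"; a bare "3" has no minor part.
    let (major, minor) = match version.split_once('.') {
        Some((major, minor)) => (major, Some(minor)),
        None => (version, None),
    };

    let major = parse_digits(major)?;
    let minor = match minor {
        Some(text) => Some(parse_digits(text)?),
        None => None,
    };
    Some((major, minor))
}

/// Accept only non-empty runs of ASCII digits (so "+3" and "" are rejected).
fn parse_digits(text: &str) -> Option<u32> {
    if text.is_empty() || !text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    text.parse().ok()
}

fn main() {
    for text in ["OPENQASM 3.1", "OPENQASM 3", "OPENQASM 3."] {
        println!("{:?} -> {:?}", text, split_version(text));
    }
}
```

Because the lexer has already validated the shape of the token, a real implementation could probably skip most of these checks; they are kept here so the sketch is safe to call on arbitrary text.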
2 parents 54d23f3 + bbcfa1d commit 705df7c

18 files changed: +282 -201 lines changed

crates/oq3_lexer/src/lib.rs

Lines changed: 72 additions & 0 deletions
@@ -76,6 +76,11 @@ pub enum TokenKind {
     /// Like the above, but containing invalid unicode codepoints.
     InvalidIdent,
 
+    OpenQasmVersionStmt {
+        major: bool,
+        minor: bool,
+    },
+
     Pragma,
 
     Dim,
@@ -301,6 +306,22 @@ impl Cursor<'_> {
 
             'p' => self.pragma_or_ident_or_unknown_prefix(),
 
+            // This implementation stores the entire "OPENQASM x.y" string as a single token.
+            // There are several options, but others seem to come with a cost.
+            // At present the tokenizer has in a sense no context, very little state.
+            // There are no "modes". It might be possible to do this in a way that does not
+            // affect performance. But I don't want to take the time to experiment. Breaking
+            // an abstraction for one small feature seems unwise.
+            'O' => {
+                if self.have_openqasm() {
+                    self.eat_while(is_whitespace);
+                    let (major, minor) = self.openqasm_version();
+                    OpenQasmVersionStmt { major, minor }
+                } else {
+                    self.ident_or_unknown_prefix()
+                }
+            }
+
             // Identifier (this should be checked after other variant that can
             // start as identifier).
             c if is_id_start(c) => self.ident_or_unknown_prefix(),
@@ -483,6 +504,7 @@ impl Cursor<'_> {
         }
     }
 
+    /// Yikes!! assumes the *previous* token was also whitespace.
     fn whitespace(&mut self) -> TokenKind {
         debug_assert!(is_whitespace(self.prev()));
         self.eat_while(is_whitespace);
@@ -528,6 +550,55 @@ impl Cursor<'_> {
         false
     }
 
+    /// This is called if we just consumed 'O'
+    /// If we consume "OPENQASM" + whitespace, then it is a version statement.
+    /// Otherwise, an identifier, valid or not.
+    fn have_openqasm(&mut self) -> bool {
+        if self.first() == 'P' {
+            self.bump();
+            if self.first() == 'E' {
+                self.bump();
+                if self.first() == 'N' {
+                    self.bump();
+                    if self.first() == 'Q' {
+                        self.bump();
+                        if self.first() == 'A' {
+                            self.bump();
+                            if self.first() == 'S' {
+                                self.bump();
+                                if self.first() == 'M' {
+                                    self.bump();
+                                    return is_whitespace(self.first());
+                                }
+                            }
+                        }
+                    }
+                }
+            }
+        }
+        false
+    }
+
+    fn openqasm_version(&mut self) -> (bool, bool) {
+        if !self.eat_decimal_digits() {
+            return (false, false);
+        }
+        let c = self.first();
+        if c == '.' {
+            self.bump();
+
+            if !self.eat_decimal_digits() {
+                // Do not allow "3."
+                return (true, false);
+            }
+        }
+        let c = self.first();
+        if c != ';' && !is_whitespace(c) {
+            return (false, false);
+        }
+        (true, true)
+    }
+
     fn pragma_or_ident_or_unknown_prefix(&mut self) -> TokenKind {
         if self.have_pragma() {
             Pragma
@@ -856,6 +927,7 @@ impl Cursor<'_> {
     }
 
     fn eat_decimal_digits(&mut self) -> bool {
+        //
         let mut has_digits = false;
         loop {
             match self.first() {
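To see how the two flags and the token extent come out for different inputs, the lexer change can be modeled in isolation. The following is a minimal sketch, not the crate's code: `MiniCursor`, `VersionFlags`, and `lex_version` are hypothetical names, `char::is_whitespace` stands in for the lexer's own `is_whitespace`, the keyword is matched with a loop instead of nested `if`s, and digit runs accept only ASCII digits. The decision table in `version` is meant to follow `openqasm_version` from the diff.

```rust
use std::str::Chars;

/// Flags carried by the single version-statement token, modeled on
/// `OpenQasmVersionStmt { major, minor }` in the diff.
#[derive(Debug, PartialEq)]
struct VersionFlags {
    major: bool,
    minor: bool,
}

/// A tiny stand-in for the lexer's `Cursor` (hypothetical, for illustration).
struct MiniCursor<'a> {
    chars: Chars<'a>,
    len: usize,
}

impl<'a> MiniCursor<'a> {
    fn new(input: &'a str) -> Self {
        MiniCursor { chars: input.chars(), len: input.len() }
    }

    /// Peek at the next character; '\0' stands for end of input.
    fn first(&self) -> char {
        self.chars.clone().next().unwrap_or('\0')
    }

    fn bump(&mut self) {
        self.chars.next();
    }

    /// Number of bytes consumed so far.
    fn consumed(&self) -> usize {
        self.len - self.chars.as_str().len()
    }

    /// Consume a run of decimal digits; report whether there was at least one.
    fn eat_decimal_digits(&mut self) -> bool {
        let mut has_digits = false;
        while self.first().is_ascii_digit() {
            has_digits = true;
            self.bump();
        }
        has_digits
    }

    /// Try to lex "OPENQASM" + whitespace + version at the start of the input.
    /// Returns the token's byte length and flags, or `None` if the input does
    /// not begin with the keyword followed by whitespace.
    fn version_stmt(&mut self) -> Option<(usize, VersionFlags)> {
        for expected in "OPENQASM".chars() {
            if self.first() != expected {
                return None;
            }
            self.bump();
        }
        if !self.first().is_whitespace() {
            return None;
        }
        while self.first().is_whitespace() {
            self.bump();
        }
        let (major, minor) = self.version();
        Some((self.consumed(), VersionFlags { major, minor }))
    }

    /// Same decision table as `openqasm_version` in the diff.
    fn version(&mut self) -> (bool, bool) {
        if !self.eat_decimal_digits() {
            return (false, false);
        }
        if self.first() == '.' {
            self.bump();
            if !self.eat_decimal_digits() {
                return (true, false); // "3." is rejected
            }
        }
        let c = self.first();
        if c != ';' && !c.is_whitespace() {
            return (false, false);
        }
        (true, true)
    }
}

/// Return the token text and flags for a version statement at the start of `input`.
fn lex_version(input: &str) -> Option<(&str, VersionFlags)> {
    let mut cursor = MiniCursor::new(input);
    let (len, flags) = cursor.version_stmt()?;
    Some((&input[..len], flags))
}

fn main() {
    for input in ["OPENQASM 3.0;", "OPENQASM int;", "OPENQASM 3 3;", "OPENQASM 3.x;"] {
        println!("{:?} -> {:?}", input, lex_version(input));
    }
}
```

The four inputs in `main` correspond to cases in the `headers.qasm-lex` and `comments.qasm-lex` snapshots in this commit. Note that the token stops at the first character that cannot belong to the version, so an invalid statement such as `OPENQASM 3.x;` still yields a version token (`"OPENQASM 3."`) followed by ordinary tokens.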

crates/oq3_parser/src/grammar/expressions.rs

Lines changed: 8 additions & 0 deletions
@@ -101,6 +101,14 @@ pub(crate) fn stmt(p: &mut Parser<'_>) {
         return q_or_c_reg_declaration(p, m);
     }
 
+    if p.at(VERSION_STRING) {
+        p.bump_any();
+        if !p.eat(T![;]) {
+            p.error("Expecting semicolon terminating statement");
+        }
+        m.complete(p, VERSION_STRING);
+        return;
+    }
     // FIXME: straighten out logic
     if !(p.current().is_classical_type() && (p.nth(1) == T!['('] || p.nth(1) == T!['[']))
         && !p.at_ts(EXPR_FIRST)

crates/oq3_parser/src/lexed_str.rs

Lines changed: 13 additions & 0 deletions
@@ -225,6 +225,19 @@ fn inner_extend_token<'a>(
                 COMMENT
             }
 
+            oq3_lexer::TokenKind::OpenQasmVersionStmt { major, minor } => {
+                if *major {
+                    if *minor {
+                        // All good
+                    } else {
+                        err = "Invalid minor version in OpenQASM version statement";
+                    }
+                } else {
+                    err = "Invalid version number in OpenQASM version statement";
+                }
+                VERSION_STRING
+            }
+
             oq3_lexer::TokenKind::Whitespace => WHITESPACE,
             oq3_lexer::TokenKind::Ident if token_text == "_" => UNDERSCORE,
 
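The nested `if` in the match arm above amounts to a small decision table over the two flags. As an illustration only, the same mapping can be written as a match on the pair. The free function `version_error` is a hypothetical name for this sketch; in the commit the logic lives inline and assigns to `err`.

```rust
/// Hypothetical helper mirroring the match arm in `inner_extend_token`:
/// map the lexer's two flags to an optional diagnostic message.
fn version_error(major: bool, minor: bool) -> Option<&'static str> {
    match (major, minor) {
        // Both parts recognized: no error.
        (true, true) => None,
        // Major part recognized, but the minor part is malformed (e.g. "3.").
        (true, false) => Some("Invalid minor version in OpenQASM version statement"),
        // No usable version number at all.
        (false, _) => Some("Invalid version number in OpenQASM version statement"),
    }
}

fn main() {
    for flags in [(true, true), (true, false), (false, false)] {
        println!("{:?} -> {:?}", flags, version_error(flags.0, flags.1));
    }
}
```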

crates/oq3_semantics/src/syntax_to_semantics.rs

Lines changed: 4 additions & 2 deletions
@@ -536,8 +536,10 @@ fn stmt_to_asg_stmt(stmt: synast::Stmt, context: &mut Context) -> Option<asg::St
         synast::Stmt::ExprStmt(expr_stmt) => expr_stmt_to_asg_stmt(expr_stmt, context),
 
         synast::Stmt::VersionString(version_string) => {
-            let version = version_string.version().unwrap().version().unwrap();
-            let _ = version.split_into_parts();
+            // let version = version_string.version().unwrap().version().unwrap();
+            // not_impl!(context, version_string)
+            context.insert_error(NotImplementedError, &version_string);
+            // Better None than NullStmt.
             None
         }
 
543545

crates/oq3_semantics/tests/from_string_tests.rs

Lines changed: 1 addition & 1 deletion
@@ -515,7 +515,7 @@ bit[2] out;
 out[0] = measure $0;
 "#;
     let (program, errors, _symbol_table, _have_syntax_errors) = parse_string(code);
-    assert!(errors.is_empty());
+    assert_eq!(errors.len(), 1);
     assert_eq!(program.len(), 2);
 }
 

crates/oq3_semantics/tests/spec.rs

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ include "stdgates.inc";
 // Rest of QASM program
 "#;
     let (program, errors, _symbol_table) = parse_string(code);
-    assert!(errors.is_empty());
+    assert_eq!(errors.len(), 1);
     assert!(program.is_empty());
 }
 

crates/pipeline-tests/tests/snapshots/runner__tests__snippets__invalid__statements__headers.qasm-lex.snap

Lines changed: 35 additions & 40 deletions
@@ -18,43 +18,38 @@ include "hello;
 ok: false
 errors: 1
 [0] Whitespace "\n" @0..1
-[1] Ident "OPENQASM" @1..9
-[2] Whitespace " " @9..10
-[3] Ident "int" @10..13
-[4] Semi ";" @13..14
-[5] Whitespace "\n" @14..15
-[6] Ident "OPENQASM" @15..23
-[7] Whitespace " " @23..24
-[8] Literal { kind: Str { terminated: true }, suffix_start: 14 } "'hello, world'" @24..38
-[9] Semi ";" @38..39
-[10] Whitespace "\n" @39..40
-[11] Ident "OPENQASM" @40..48
-[12] Whitespace " " @48..49
-[13] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "3" @49..50
-[14] Whitespace " " @50..51
-[15] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "3" @51..52
-[16] Semi ";" @52..53
-[17] Whitespace "\n" @53..54
-[18] Ident "OPENQASM" @54..62
-[19] Whitespace " " @62..63
-[20] Literal { kind: Float { base: Decimal, empty_exponent: false }, suffix_start: 2 } "3.x" @63..66
-[21] Semi ";" @66..67
-[22] Whitespace "\n" @67..68
-[23] Ident "include" @68..75
-[24] Whitespace " " @75..76
-[25] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "3" @76..77
-[26] Semi ";" @77..78
-[27] Whitespace "\n" @78..79
-[28] Ident "include" @79..86
-[29] Whitespace " " @86..87
-[30] Ident "include" @87..94
-[31] Semi ";" @94..95
-[32] Whitespace "\n" @95..96
-[33] Ident "include" @96..103
-[34] Whitespace " " @103..104
-[35] Ident "def" @104..107
-[36] Semi ";" @107..108
-[37] Whitespace "\n" @108..109
-[38] Ident "include" @109..116
-[39] Whitespace " " @116..117
-[40] Literal { kind: Str { terminated: false }, suffix_start: 8 } ""hello;\n" @117..125
+[1] OpenQasmVersionStmt { major: false, minor: false } "OPENQASM " @1..10
+[2] Ident "int" @10..13
+[3] Semi ";" @13..14
+[4] Whitespace "\n" @14..15
+[5] OpenQasmVersionStmt { major: false, minor: false } "OPENQASM " @15..24
+[6] Literal { kind: Str { terminated: true }, suffix_start: 14 } "'hello, world'" @24..38
+[7] Semi ";" @38..39
+[8] Whitespace "\n" @39..40
+[9] OpenQasmVersionStmt { major: true, minor: true } "OPENQASM 3" @40..50
+[10] Whitespace " " @50..51
+[11] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "3" @51..52
+[12] Semi ";" @52..53
+[13] Whitespace "\n" @53..54
+[14] OpenQasmVersionStmt { major: true, minor: false } "OPENQASM 3." @54..65
+[15] Ident "x" @65..66
+[16] Semi ";" @66..67
+[17] Whitespace "\n" @67..68
+[18] Ident "include" @68..75
+[19] Whitespace " " @75..76
+[20] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "3" @76..77
+[21] Semi ";" @77..78
+[22] Whitespace "\n" @78..79
+[23] Ident "include" @79..86
+[24] Whitespace " " @86..87
+[25] Ident "include" @87..94
+[26] Semi ";" @94..95
+[27] Whitespace "\n" @95..96
+[28] Ident "include" @96..103
+[29] Whitespace " " @103..104
+[30] Ident "def" @104..107
+[31] Semi ";" @107..108
+[32] Whitespace "\n" @108..109
+[33] Ident "include" @109..116
+[34] Whitespace " " @116..117
+[35] Literal { kind: Str { terminated: false }, suffix_start: 8 } ""hello;\n" @117..125

crates/pipeline-tests/tests/snapshots/runner__tests__snippets__invalid__statements__headers.qasm-parse.snap

Lines changed: 1 addition & 1 deletion
@@ -7,6 +7,6 @@ expect-parse: Diag
 --- parser ---
 ok: false
 panicked: false
-errors: 1
+errors: 4
 --- ast ---
 (no ast)

crates/pipeline-tests/tests/snapshots/runner__tests__snippets__reference__comments__comments.qasm-lex.snap

Lines changed: 52 additions & 54 deletions
@@ -21,57 +21,55 @@ cx q[0], q[1];
 ok: true
 errors: 0
 [0] Whitespace "\n" @0..1
-[1] Ident "OPENQASM" @1..9
-[2] Whitespace " " @9..10
-[3] Literal { kind: Float { base: Decimal, empty_exponent: false }, suffix_start: 3 } "3.0" @10..13
-[4] Semi ";" @13..14
-[5] Whitespace "\n" @14..15
-[6] LineComment "// Line comment before include" @15..45
-[7] Whitespace "\n" @45..46
-[8] Ident "include" @46..53
-[9] Whitespace " " @53..54
-[10] Literal { kind: Str { terminated: true }, suffix_start: 14 } ""stdgates.inc"" @54..68
-[11] Semi ";" @68..69
-[12] Whitespace " " @69..70
-[13] LineComment "// Inline comment" @70..87
-[14] Whitespace "\n" @87..88
-[15] BlockComment { terminated: true } "/* Block comment before declaration */" @88..126
-[16] Whitespace "\n" @126..127
-[17] Ident "qubit" @127..132
-[18] OpenBracket "[" @132..133
-[19] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "2" @133..134
-[20] CloseBracket "]" @134..135
-[21] Whitespace " " @135..136
-[22] Ident "q" @136..137
-[23] Semi ";" @137..138
-[24] Whitespace " " @138..139
-[25] BlockComment { terminated: true } "/* Inline block comment */" @139..165
-[26] Whitespace "\n\n" @165..167
-[27] LineComment "// Comment before gate" @167..189
-[28] Whitespace "\n" @189..190
-[29] Ident "h" @190..191
-[30] Whitespace " " @191..192
-[31] Ident "q" @192..193
-[32] OpenBracket "[" @193..194
-[33] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "0" @194..195
-[34] CloseBracket "]" @195..196
-[35] Semi ";" @196..197
-[36] Whitespace " " @197..198
-[37] LineComment "// Gate with comment" @198..218
-[38] Whitespace "\n" @218..219
-[39] BlockComment { terminated: true } "/* Multi-line block comment\n spanning multiple lines */" @219..276
-[40] Whitespace "\n" @276..277
-[41] Ident "cx" @277..279
-[42] Whitespace " " @279..280
-[43] Ident "q" @280..281
-[44] OpenBracket "[" @281..282
-[45] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "0" @282..283
-[46] CloseBracket "]" @283..284
-[47] Comma "," @284..285
-[48] Whitespace " " @285..286
-[49] Ident "q" @286..287
-[50] OpenBracket "[" @287..288
-[51] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "1" @288..289
-[52] CloseBracket "]" @289..290
-[53] Semi ";" @290..291
-[54] Whitespace "\n" @291..292
+[1] OpenQasmVersionStmt { major: true, minor: true } "OPENQASM 3.0" @1..13
+[2] Semi ";" @13..14
+[3] Whitespace "\n" @14..15
+[4] LineComment "// Line comment before include" @15..45
+[5] Whitespace "\n" @45..46
+[6] Ident "include" @46..53
+[7] Whitespace " " @53..54
+[8] Literal { kind: Str { terminated: true }, suffix_start: 14 } ""stdgates.inc"" @54..68
+[9] Semi ";" @68..69
+[10] Whitespace " " @69..70
+[11] LineComment "// Inline comment" @70..87
+[12] Whitespace "\n" @87..88
+[13] BlockComment { terminated: true } "/* Block comment before declaration */" @88..126
+[14] Whitespace "\n" @126..127
+[15] Ident "qubit" @127..132
+[16] OpenBracket "[" @132..133
+[17] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "2" @133..134
+[18] CloseBracket "]" @134..135
+[19] Whitespace " " @135..136
+[20] Ident "q" @136..137
+[21] Semi ";" @137..138
+[22] Whitespace " " @138..139
+[23] BlockComment { terminated: true } "/* Inline block comment */" @139..165
+[24] Whitespace "\n\n" @165..167
+[25] LineComment "// Comment before gate" @167..189
+[26] Whitespace "\n" @189..190
+[27] Ident "h" @190..191
+[28] Whitespace " " @191..192
+[29] Ident "q" @192..193
+[30] OpenBracket "[" @193..194
+[31] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "0" @194..195
+[32] CloseBracket "]" @195..196
+[33] Semi ";" @196..197
+[34] Whitespace " " @197..198
+[35] LineComment "// Gate with comment" @198..218
+[36] Whitespace "\n" @218..219
+[37] BlockComment { terminated: true } "/* Multi-line block comment\n spanning multiple lines */" @219..276
+[38] Whitespace "\n" @276..277
+[39] Ident "cx" @277..279
+[40] Whitespace " " @279..280
+[41] Ident "q" @280..281
+[42] OpenBracket "[" @281..282
+[43] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "0" @282..283
+[44] CloseBracket "]" @283..284
+[45] Comma "," @284..285
+[46] Whitespace " " @285..286
+[47] Ident "q" @286..287
+[48] OpenBracket "[" @287..288
+[49] Literal { kind: Int { base: Decimal, empty_int: false }, suffix_start: 1 } "1" @288..289
+[50] CloseBracket "]" @289..290
+[51] Semi ";" @290..291
+[52] Whitespace "\n" @291..292

crates/pipeline-tests/tests/snapshots/runner__tests__snippets__reference__comments__comments.qasm-parse.snap

Lines changed: 0 additions & 1 deletion
@@ -23,7 +23,6 @@ h q[0]; // Gate with comment
 cx q[0], q[1];
 
 VERSION_STRING@1..14: OPENQASM 3.0;
-VERSION@10..14: 3.0;
 INCLUDE@46..69: include "stdgates.inc";
 FILE_PATH@54..68: "stdgates.inc"
 QUANTUM_DECLARATION_STATEMENT@127..138: qubit[2] q;
