Commit 11a1b2b

Add a functionality to import external PEG files
1 parent 35b71a7 commit 11a1b2b

13 files changed: +1897 -117 lines changed

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 PackCC: a packrat parser generator for C.

-Copyright (c) 2014, 2019-2022 Arihiro Yoshida. All rights reserved.
+Copyright (c) 2014, 2019-2024 Arihiro Yoshida. All rights reserved.

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 88 additions & 33 deletions
@@ -1,6 +1,6 @@
-# PackCC #
+# PackCC

-## Overview ##
+## Overview

 **PackCC** is a parser generator for C.
 Its main features are as follows:
@@ -41,14 +41,14 @@ This feature is irrelevant to common users, but helpful for PackCC developers to

 PackCC itself is under MIT license, but you can distribute your generated code under any license you like.

-## Installation ##
+## Installation

 You can obtain the executable `packcc` by compiling [`src/packcc.c`](src/packcc.c) using your favorite C compiler.
 For convenience, the build environments using GCC, Clang, and Microsoft Visual Studio are prepared under [`build`](build) directory.

-### Using GCC ###
+### Using GCC

-#### Other than MinGW ####
+#### Other than MinGW

 `packcc` will be built in both directories `build/gcc/debug/bin` and `build/gcc/release/bin` using `gcc` by executing the following commands:

@@ -60,7 +60,7 @@ make check # bats-core and uncrustify are required (see tests/README.md)

 `packcc` in the directory `build/gcc/release/bin` is suitable for practical use.

-#### MinGW ####
+#### MinGW

 `packcc` will be built in both directories `build/mingw-gcc/debug/bin` and `build/mingw-gcc/release/bin` using `gcc` by executing the following commands:

@@ -72,9 +72,9 @@ make check # bats-core and uncrustify are required (see tests/README.md)

 `packcc` in the directory `build/mingw-gcc/release/bin` is suitable for practical use.

-### Using Clang ###
+### Using Clang

-#### Other than MinGW ####
+#### Other than MinGW

 `packcc` will be built in both directories `build/clang/debug/bin` and `build/clang/release/bin` using `clang` by executing the following commands:

@@ -86,7 +86,7 @@ make check # bats-core and uncrustify are required (see tests/README.md)

 `packcc` in the directory `build/clang/release/bin` is suitable for practical use.

-#### MinGW ####
+#### MinGW

 `packcc` will be built in both directories `build/mingw-clang/debug/bin` and `build/mingw-clang/release/bin` using `clang` by executing the following commands:

@@ -98,10 +98,11 @@ make check # bats-core and uncrustify are required (see tests/README.md)

 `packcc` in the directory `build/mingw-clang/release/bin` is suitable for practical use.

-### Using Microsoft Visual Studio ###
+### Using Microsoft Visual Studio

 You have to install Microsoft Visual Studio 2019 in advance.
 After that, you can build `packcc.exe` by the following instructions:
+
 - Open the solution file `build\msvc\msvc.sln`,
 - Select a preferred solution configuration (*Debug* or *Release*) and a preferred solution platform (*x64* or *x86*),
 - Invoke the *Build Solution* menu item.
@@ -110,20 +111,21 @@ After that, you can build `packcc.exe` by the following instructions:
 Here, `XXX` is `x64` or `x86`, and `YYY` is `Debug` or `Release`.
 `packcc.exe` in the directory `build\msvc\XXX\Release` is suitable for practical use.

-## Usage ##
+## Usage

-### Command ###
+### Command

-You must prepare a PEG source file (see the following section).
-Let the file name `example.peg` for example.
+You must prepare a PEG source file in advance.
+For details of the PEG source syntax, see the section "Syntax".
+Here, let the file name `example.peg` for example.

 ```
 packcc example.peg
 ```

 By running this, the parser source `example.h` and `example.c` are generated.

-If no PEG file name is specified, the PEG source is read from the standard input, and `-.h` and `-.c` are generated.
+If no PEG file name is specified, the PEG source is read from the standard input, and `-.h` and `-.c` will be generated.

 The base name of the parser source files can be changed by `-o` option.

@@ -132,6 +134,19 @@ packcc -o parser example.peg
 ```

 By running this, the parser source `parser.h` and `parser.c` are generated.
+This option can be specified only once.
+
+A directory to search for import files can be added by `-I` option (version 2.0.0 or later).
+This option can be specified as many times as needed.
+The firstly specified directory will be searched first, the secondly specified directory will be searched next, and so on.
+
+```
+packcc -I foo -I bar/baz example.peg
+```
+
+By running this, the directory `foo` is searched first, and the directory `bar/baz` is searched next.
+The directories specified by this option have higher priority than those specified in the environment variable `PCC_IMPORT_PATH` and the default directories.
+For more details of import, see the explanation of `%import` written in the section "Syntax".

 If you want to disable UTF-8 support, specify the command line option `-a` or `--ascii` (version 1.4.0 or later).

@@ -144,7 +159,7 @@ If you want to confirm the version of the `packcc` command, execute the below.
 packcc -v
 ```

-### Syntax ###
+### Syntax

 A grammar consists of a set of named rules.
 A rule definition can be split into multiple lines.
@@ -317,37 +332,37 @@ All matched actions are guaranteed to be executed only once.

 In the action, the C source code can use the predefined variables below.

-- **`$$`**
+- **`$$`** :
 The output variable, to which the result of the rule is stored.
 The data type is the one specified by `%value`.
 The default data type is `int`.
-- **`auxil`**
+- **`auxil`** :
 The user-defined data that has been given via the API function `pcc_create()`.
 The data type is the one specified by `%auxil`.
 The default data type is `void *`.
-- _variable_
+- _variable_ :
 The result of another rule that has already been evaluated.
 If the rule has not been evaluated, it is ensured that the value is zero-cleared (version 1.7.1 or later).
 The data type is the one specified by `%value`.
 The default data type is `int`.
-- **`$`**_n_
+- **`$`**_n_ :
 The string of the captured text.
 The _n_ is the positive integer that corresponds to the order of capturing.
 The variable `$1` holds the string of the first captured text.
-- **`$`**_n_**`s`**
+- **`$`**_n_**`s`** :
 The start position in the input of the captured text, inclusive.
 The _n_ is the positive integer that corresponds to the order of capturing.
 The variable `$1s` holds the start position of the first captured text.
-- **`$`**_n_**`e`**
+- **`$`**_n_**`e`** :
 The end position in the input of the captured text, exclusive.
 The _n_ is the positive integer that corresponds to the order of capturing.
 The variable `$1e` holds the end position of the first captured text.
-- **`$0`**
+- **`$0`** :
 The string of the text between the start position in the input at which the rule pattern begins to match
 and the current position in the input at which the element immediately before the action ends to match.
-- **`$0s`**
+- **`$0s`** :
 The start position in the input at which the rule pattern begins to match.
-- **`$0e`**
+- **`$0e`** :
 The current position in the input at which the element immediately before the action ends to match.

 An example is shown below.
@@ -390,17 +405,20 @@ rule2 <- (e1 e2 e3) ~{ error("one of e[123] has failed"); }
 The specified C source code is copied verbatim to the C header file before the generated parser API function declarations.
 Any braces in the C source code must be properly nested.
 Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.
+When `%header` is used multiple times, the respective C source codes are copied in order of their appearance.

 **`%source` `{` _c source code_ `}`**

 The specified C source code is copied verbatim to the C source file before the generated parser implementation code.
 Any braces in the C source code must be properly nested.
 Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.
+When `%source` is used multiple times, the respective C source codes are copied in order of their appearance.

 **`%common` `{` _c source code_ `}`**

 The specified C source code is copied verbatim to both of the C header file and the C source file
 before the generated parser API function declarations and the implementation code respectively.
+This has the same effect as `%header {` _c source code_ `} %source {` _c source code_ `}`.
 Any braces in the C source code must be properly nested.
 Note that braces in directive lines and in comments (`/*`...`*/` and `//`...) are appropriately ignored.

@@ -419,15 +437,42 @@ This can be useful for example when it is necessary to modify behavior of standa

 The type of output data, which is output as `$$` in each action and can be retrieved from the parser API function `pcc_parse()`,
 is changed to the specified one from the default `int`.
+This can be used only once and cannot be used in imported files.

 **`%auxil` `"`_user-defined data type_`"`**

 The type of user-defined data, which is passed to the parser API function `pcc_create()`,
 is changed to the specified one from the default `void *`.
+This can be used only once and cannot be used in imported files.

 **`%prefix` `"`_prefix_`"`**

 The prefix of the parser API functions is changed to the specified one from the default `pcc`.
+This can be used only once and cannot be used in imported files.
+
+**`%import` `"`_import file name_`"`**
+
+The content of the specified import file is expanded at the text location of `%import` (version 2.0.0 or later).
+This can be used multiple times anywhere and can be used also in imported files.
+The _import file name_ can be a relative path to the current directory or an absolute path.
+If it is a relative path, the directories listed below are searched for the import file in the listed order.
+
+1. the directory where the file that imports the import file is located
+2. the directories specified with `-I` options
+- They are prioritized in order of their appearance in the command line.
+3. the directories specified by the environment variable `PCC_IMPORT_PATH`
+- They are prioritized in order of their appearance in the value of this variable.
+- The character used as a delimiter between directory names is the colon `':'` if PackCC is built for a Unix-like platform such as Linux, macOS, and MinGW.
+The character is the semicolon `';'` if PackCC is built as a native Windows executable.
+(This is exactly the same manner as the environment variable `PATH`.)
+4. the per-user default directory
+- This is the subdirectory `.packcc/import` in the home directory if PackCC is built for a Unix-like platform,
+and in the user profile directory, "`C:\Users\`_username_" for example, if PackCC is built as a native Windows executable.
+5. the system-wide default directory
+- This is the directory `/usr/share/packcc/import` if PackCC is built for a Unix-like platform,
+and is the subdirectory `packcc/import` in the common application data directory, "`C:\ProgramData`" for example.
+
+Note that the file imported once is silently ignored when it is attempted to be imported again.

 **`#`_comment_**

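Reviewer note: the first three steps of the `%import` search order documented in this hunk can be sketched in C as below. This is a minimal illustration under stated assumptions, not the code in `src/packcc.c`. The names `build_search_order`, `add_dir`, and `DIR_MAX_LEN` are hypothetical, and the per-user and system-wide default directories (steps 4 and 5) are left out because they are platform-specific.

```c
#include <stddef.h>
#include <string.h>

#define DIR_MAX_LEN 256 /* hypothetical limit, chosen for this sketch only */

/* Hypothetical helper: appends the first len bytes of dir as one entry of out,
 * skipping empty or overlong names; returns the new entry count. */
static size_t add_dir(char out[][DIR_MAX_LEN], size_t n, size_t max,
                      const char *dir, size_t len) {
    if (n < max && len > 0 && len < DIR_MAX_LEN) {
        memcpy(out[n], dir, len);
        out[n][len] = '\0';
        n++;
    }
    return n;
}

/* Hypothetical sketch of steps 1 to 3 of the documented order:
 *   1. the directory of the importing file,
 *   2. the -I directories, in command-line order,
 *   3. the entries of the PCC_IMPORT_PATH value, split at delim
 *      (':' on Unix-like builds, ';' on native Windows builds).
 * Returns the number of directories written to out. */
size_t build_search_order(const char *importer_dir,
                          const char *const *opt_dirs, size_t n_opt,
                          const char *env_value, char delim,
                          char out[][DIR_MAX_LEN], size_t max) {
    size_t n = 0, i;
    n = add_dir(out, n, max, importer_dir, strlen(importer_dir));
    for (i = 0; i < n_opt; i++) {
        n = add_dir(out, n, max, opt_dirs[i], strlen(opt_dirs[i]));
    }
    while (env_value != NULL && *env_value != '\0') {
        const char *end = strchr(env_value, delim);
        size_t len = end ? (size_t)(end - env_value) : strlen(env_value);
        n = add_dir(out, n, max, env_value, len);
        if (end == NULL) break;
        env_value = end + 1;
    }
    return n;
}
```

For the README's example `packcc -I foo -I bar/baz example.peg`, the sketch would place the importing file's directory first, then `foo`, then `bar/baz`, then whatever is passed as the `PCC_IMPORT_PATH` value.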
@@ -440,7 +485,16 @@ All text following `%%` is copied verbatim to the C source file after the genera

 <small>(The specification is determined by referring to [peg/leg](http://piumarta.com/software/peg/) developed by Ian Piumarta.)</small>

-### Macros ###
+### Import Files
+
+The following import files are currently bundled.
+
+- [`import/char/ascii_character_group.peg`](import/char/ascii_character_group.peg) :
+This contains various rules to match an ASCII character belonging to a specific character group.
+- [`import/char/unicode_general_category.peg`](import/char/unicode_general_category.peg) :
+This contains various rules to match a Unicode character belonging to a specific [general category](https://unicode.org/reports/tr44/#General_Category_Values).
+
+### Macros

 Some macros are prepared to customize the parser.
 The macro definition should be in <u>`%source` section</u> in the PEG source.
@@ -560,9 +614,10 @@ For other events, `buffer` and `length` indicate a part of the currently loaded
 The user-defined data passed to the API function `pcc_create()` can be retrieved from this argument.

 There are currently three supported events:
-- `PCC_DBG_EVALUATE` (= 0) - called when the parser starts to evaluate `rule`
-- `PCC_DBG_MATCH` (= 1) - called when `rule` is matched, at which point buffer holds entire matched string
-- `PCC_DBG_NOMATCH` (= 2) - called when the parser determines that the input does not match currently evaluated `rule`
+
+- `PCC_DBG_EVALUATE` (= 0) - called when the parser starts to evaluate `rule`
+- `PCC_DBG_MATCH` (= 1) - called when `rule` is matched, at which point buffer holds entire matched string
+- `PCC_DBG_NOMATCH` (= 2) - called when the parser determines that the input does not match currently evaluated `rule`

 A very simple implementation could look like this:

@@ -590,7 +645,7 @@ The initial size (the number of elements) of the internal arrays other than the
 The arrays are expanded as needed.
 The default is `2`.

-### API ###
+### API

 The parser API has only 3 simple functions below.

@@ -653,9 +708,9 @@ while (pcc_parse(ctx, &ret));
 pcc_destroy(ctx);
 ```

-## Examples ##
+## Examples

-### Desktop calculator ###
+### Desktop calculator

 A simple example which provides interactive four arithmetic operations of integers is shown here.
 Note that **left-recursive** grammar rules are defined in this example.
@@ -700,7 +755,7 @@ int main() {
 }
 ```

-### AST builder for Tiny-C ###
+### AST builder for Tiny-C

 You can find the more practical example in the directory [`examples/ast-tinyc`](examples/ast-tinyc).
 It builds an AST (abstract syntax tree) from an input source file
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# This file is hereby placed in the public domain.
+#
+# THIS SOFTWARE IS PROVIDED BY THE AUTHORS AS IS AND ANY EXPRESS
+# OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE
+# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+# BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+# OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+ASCII_Printable_Character <- ASCII_Special_Character / ASCII_Number / ASCII_Letter
+ASCII_Letter <- ASCII_Uppercase_Letter / ASCII_Lowercase_Letter
+
+ASCII_Control_Character <- [\x00-\x1f\x7f]
+ASCII_Special_Character <- [\x20-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]
+ASCII_Number <- [0-9]
+ASCII_Uppercase_Letter <- [A-Z]
+ASCII_Lowercase_Letter <- [a-z]
+
+ASCII_C_alnum <- [0-9A-Za-z]
+ASCII_C_alpha <- [A-Za-z]
+ASCII_C_blank <- [ \t]
+ASCII_C_cntrl <- [\x00-\x1f\x7f]
+ASCII_C_digit <- [0-9]
+ASCII_C_graph <- [\x21-\x7e]
+ASCII_C_lower <- [a-z]
+ASCII_C_print <- [\x20-\x7e]
+ASCII_C_punct <- [\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]
+ASCII_C_space <- [ \t\n\v\f\r]
+ASCII_C_upper <- [A-Z]
+ASCII_C_xdigit <- [0-9A-Fa-f]
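
Reviewer note: judging by their names, the `ASCII_C_*` rules above appear to mirror the `<ctype.h>` classification functions in the default "C" locale. That is an inference, not something the commit states. The sketch below checks the inference for the seven-bit range by re-encoding a few of the character classes by hand and comparing them with the standard functions. All function names in it are hypothetical and none of this code is part of PackCC.

```c
#include <ctype.h>

/* Hypothetical helper: nonzero if lo <= c <= hi. */
static int in_range(int c, int lo, int hi) { return lo <= c && c <= hi; }

/* ASCII_C_punct <- [\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e] */
static int peg_punct(int c) {
    return in_range(c, 0x21, 0x2f) || in_range(c, 0x3a, 0x40) ||
           in_range(c, 0x5b, 0x60) || in_range(c, 0x7b, 0x7e);
}

/* ASCII_C_space <- [ \t\n\v\f\r] */
static int peg_space(int c) {
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/* ASCII_C_blank <- [ \t] */
static int peg_blank(int c) { return c == ' ' || c == '\t'; }

/* ASCII_C_cntrl <- [\x00-\x1f\x7f] */
static int peg_cntrl(int c) { return in_range(c, 0x00, 0x1f) || c == 0x7f; }

/* ASCII_C_graph <- [\x21-\x7e] */
static int peg_graph(int c) { return in_range(c, 0x21, 0x7e); }

/* ASCII_C_print <- [\x20-\x7e] */
static int peg_print(int c) { return in_range(c, 0x20, 0x7e); }

/* ASCII_C_xdigit <- [0-9A-Fa-f] */
static int peg_xdigit(int c) {
    return in_range(c, '0', '9') || in_range(c, 'A', 'F') || in_range(c, 'a', 'f');
}

/* Counts the (character, class) pairs in 0..127 where a hand-encoded class
 * disagrees with the matching <ctype.h> function. */
int count_ctype_mismatches(void) {
    int c, n = 0;
    for (c = 0; c < 128; c++) {
        n += (peg_punct(c)  != !!ispunct(c));
        n += (peg_space(c)  != !!isspace(c));
        n += (peg_blank(c)  != !!isblank(c));
        n += (peg_cntrl(c)  != !!iscntrl(c));
        n += (peg_graph(c)  != !!isgraph(c));
        n += (peg_print(c)  != !!isprint(c));
        n += (peg_xdigit(c) != !!isxdigit(c));
    }
    return n;
}
```

On an ASCII-based implementation running in the default "C" locale, `count_ctype_mismatches()` is expected to return 0, which would support reading the `ASCII_C_*` rules as counterparts of `ispunct()`, `isspace()`, and the other `<ctype.h>` functions.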
