WIP: rewrite the C code generator #1333
base: devel
There's not much to it. The code could be shortened a bit using templates, but that can happen at a later point. The definition of `CodeGenEnv` is hand-waved into the future.
They're meant to be easy to use and have low overhead.
All relevant C code generator modules are suffixed with a "2", in order to make room for the new modules. The suffixed modules are not removed yet, so that their code can still be referenced easily.
The general structure is similar to the old `cbackend`, but with two important differences:
* the global and per-module types are now owned by the orchestrator, not `cgendata`
* the output (i.e., the C files) is funnelled through a dedicated type (`Output`)

It works much like the previous version, but with more generalized support for header files. Compared to before, all the write-to-disk management is now fully handled by the orchestrator, not the code generator (i.e., `cgen`). The compiler compiles again (but the result cannot compile the compiler, for obvious reasons).
To help with development, I've added a small profiling utility: the […] Various key procedures are instrumented with […]
`CNode` erroneously used a raw `uint32` for `ident`.
The simplest solution for now. Moving them to a separate type might be better, but that can happen later.
Some field names were outdated.
This also includes some mid-end processing, like destructor call optimizations, in order to get a better relative feel for where time is spent.
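For readers who want a feel for what such instrumentation can look like, here is a minimal sketch in Python. It is not the utility added in this PR; the names `measure`, `totals`, `counts` and `report` are all made up for illustration. It only shows the general idea of cheap, label-based timing with a relative breakdown at the end.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical accumulators: total seconds and number of calls per label.
totals = defaultdict(float)
counts = defaultdict(int)

@contextmanager
def measure(label):
    """Time the enclosed block and add the elapsed time to `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start
        counts[label] += 1

def report():
    """Return (label, calls, share of measured time), largest share first."""
    grand = sum(totals.values()) or 1.0
    labels = sorted(totals, key=totals.get, reverse=True)
    return [(label, counts[label], totals[label] / grand) for label in labels]
```

Usage would be `with measure("cgen"): ...` around a key procedure. Note that this sketch counts nested sections in both the inner and the outer label, so the shares are only meaningful for non-overlapping sections.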
The orchestrator will need it to concatenate partial MIR bodies.
The MIR environment is owned by the `CodeGenEnv` now.
Simple: if assembling produced some code, append it to the output list, otherwise don't. In other words, much like before, no C file is created for modules that don't result in any code.
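The rule above can be sketched roughly as follows, in Python rather than the compiler's own language. Everything here is an assumption made for illustration: the `Output` class is only a stand-in for the real `Output` type, and `assemble`, `collect` and `write_all` are hypothetical names. The point is that empty results never reach the output list, and that only the orchestrator side touches the disk.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from pathlib import Path
import tempfile

@dataclass
class Output:
    """Hypothetical stand-in for the `Output` type: (file name, C source) pairs."""
    files: list[tuple[str, str]] = field(default_factory=list)

def assemble(decls: list[str]) -> str:
    """Render one module's declarations; returns "" if there is nothing to emit."""
    return "\n".join(decls)

def collect(modules: dict[str, list[str]]) -> Output:
    """Code generator side: produce text only, never write files."""
    out = Output()
    for name, decls in modules.items():
        code = assemble(decls)
        if code:  # no code produced, so no C file for this module
            out.files.append((name + ".c", code + "\n"))
    return out

def write_all(out: Output, directory: Path) -> list[Path]:
    """Orchestrator side: the only place that writes to disk."""
    paths = []
    for name, code in out.files:
        path = directory / name
        path.write_text(code)
        paths.append(path)
    return paths
```

With `collect({"a": ["int x;"], "b": []})`, only `a.c` ends up in the output list; module `b` produces no file.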
Everything only needed within a single module is stored in `BModule`, things that are shared are stored globally (in `BModuleList`). This keeps the scopes of local entities small, and will make it easy to free memory early (by destroying a `BModule` instance once the C code for it has been generated).
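A rough Python sketch of that split, for illustration only. The class names mirror `BModule` and `BModuleList`, but the fields (`local_decls`, `shared_types`, `modules`) and the `finish` method are assumptions, not the actual definitions from this PR.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class BModule:
    """State needed only while generating code for a single module."""
    name: str
    local_decls: list[str] = field(default_factory=list)

@dataclass
class BModuleList:
    """State shared by all modules, plus the modules still being processed."""
    shared_types: dict[str, str] = field(default_factory=dict)
    modules: dict[str, BModule] = field(default_factory=dict)

    def finish(self, name: str) -> str:
        """Render a module and drop its local state right away."""
        module = self.modules.pop(name)  # the module's state is released here
        return "\n".join(module.local_decls)
```

Once `finish` has been called for a module, nothing in the list refers to its local declarations anymore, while the shared types stay available for the remaining modules.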
Some details are still missing, but the general flow is there. CIR is generated for the various entities, which is then put into either the global or module-local AST. Once all CIR has been generated, `assemble` gathers everything the TU needs into a single place and renders the result.
The genX procedures are expected to output at least *something*, otherwise sadness ensues, so an empty block is temporarily emitted.
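To illustrate the gathering step, here is a small Python sketch. It assumes (this is not taken from the PR) that every declaration records which global entities it uses; `Decl`, its fields, and this particular `assemble` signature are hypothetical. Global declarations are pulled into the translation unit only if something in the module needs them, and always after their own dependencies.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Decl:
    name: str
    text: str                    # already-rendered C text, to keep the sketch short
    uses: tuple[str, ...] = ()   # names of the global entities this decl needs

def assemble(global_ast: dict[str, Decl], local_ast: list[Decl]) -> str:
    """Gather the global decls the module needs, then render everything."""
    needed: list[Decl] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        decl = global_ast[name]
        for dep in decl.uses:
            visit(dep)
        needed.append(decl)  # appended after its dependencies

    for decl in local_ast:
        for dep in decl.uses:
            visit(dep)
    return "\n".join(d.text for d in needed + local_ast)
```

A global type that no local declaration reaches, directly or indirectly, is simply left out of the rendered file.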
I've implemented the basic code orchestration flow. It's rather simple, especially compared to before: the orchestrator runs common backend processing ([…]

Except for registering some new identifiers, the new C code generator itself doesn't modify any global state -- it simply takes a MIR body and outputs the CIR for it. No new entities are registered with the MIR environment. This makes it possible to handle […]

Finally, the new orchestrator also addresses the issue of ostensibly small changes in one module causing many modules (sometimes the whole project) to be recompiled, something which got exacerbated when the code generation orchestrator architecture was first introduced.

The next big blocker is the missing type IR for the MIR. It's the basis for lowering […] With the MIR's type IR in place, further work on both […]
The list of things that need to be moved outside the code generator (via separate PRs) before the rewrite can be completed:
This PR is a from-scratch rewrite of the C code generator. It's the
successor to #424, incorporating many ideas and lessons learned from
the latter. The concrete goals are to:
the legacy C code generator
to inspect)
A big focus is on using a data-oriented design. The planned
architecture is as follows:
CIR)
C's syntax
(types, procs, etc.) -- the orchestrator does
rendered into textual C code and assembled into complete C files,
which are then written to disk
Using an intermediate IR has multiple benefits over directly
translating to text:
duplicate identifiers
used at the end
types, etc.)
generator doesn't have to worry about formatting)
`(*a).x` to `a->x`)

The new code generator operates directly on the MIR, no full in-between
IR (like the CGIR) is used. Prior to code generation, the MIR is
lowered to a degree where:
entities
The PR is a work in progress. While the broad design and direction is
likely final already, many details are most likely going to change.
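As an illustration of the kind of simplification an AST-based IR allows (the `(*a).x` to `a->x` case mentioned among the benefits above), here is a small Python sketch. The node types `Ident`, `Deref`, `Member` and `Arrow` and the functions `simplify` and `render` are invented for this example and are not the CIR from this PR.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Ident:
    name: str

@dataclass(frozen=True)
class Deref:      # *operand
    operand: Expr

@dataclass(frozen=True)
class Member:     # obj.field
    obj: Expr
    field: str

@dataclass(frozen=True)
class Arrow:      # ptr->field
    ptr: Expr
    field: str

Expr = Union[Ident, Deref, Member, Arrow]

def simplify(e: Expr) -> Expr:
    """Rewrite `(*p).f` into `p->f`, bottom-up."""
    if isinstance(e, Member):
        obj = simplify(e.obj)
        if isinstance(obj, Deref):
            return Arrow(obj.operand, e.field)
        return Member(obj, e.field)
    if isinstance(e, Deref):
        return Deref(simplify(e.operand))
    if isinstance(e, Arrow):
        return Arrow(simplify(e.ptr), e.field)
    return e

def render(e: Expr) -> str:
    """Turn the tree into C text; dereferences are always parenthesized."""
    if isinstance(e, Ident):
        return e.name
    if isinstance(e, Deref):
        return f"(*{render(e.operand)})"
    if isinstance(e, Member):
        return f"{render(e.obj)}.{e.field}"
    return f"{render(e.ptr)}->{e.field}"
```

The rewrite is a plain pattern match on the tree, which would be awkward to do once the code only exists as text.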