implement a new CGIR and C code generator #1625
base: devel
Imported object types that contain a pointer to themselves weren't handled properly, leading to the interior types' symbols pointing to the internal object type instead of the imported type.
The section specifier wasn't at the start of the string, meaning it was placed into the procedure section.
Also adjust the related specification test.
It's not used for anything, nor will it be needed.
Instead of a dedicated tagged union type, tagged union support is now provided by allowing union fields to be associated with tag fields.
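As a rough illustration of the idea of a union field being associated with a tag field, here is a minimal C sketch. It is not the CGIR representation; the `Value` type, its field names, and `value_as_double` are all hypothetical.

```c
/* Hypothetical example type; none of these names come from the compiler. */
enum ValueKind { VAL_INT, VAL_FLOAT };

struct Value {
  enum ValueKind kind; /* the tag field */
  union {
    int i;             /* meaningful when kind == VAL_INT */
    double f;          /* meaningful when kind == VAL_FLOAT */
  } as;                /* the union field governed by `kind` */
};

/* Reads the active union member, selected by the tag. */
double value_as_double(const struct Value *v) {
  switch (v->kind) {
  case VAL_INT:
    return (double)v->as.i;
  case VAL_FLOAT:
    return v->as.f;
  }
  return 0.0; /* unreachable for well-formed tags */
}
```

The union itself carries no information about which member is active; the association with the tag is what makes access checkable.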
It's misplaced, but injecting the initialization earlier is currently not possible.
There's no field at position -1 in PType-based type representation, which previously caused rendering to crash.
Allows for static cleanup of the environment in some cases.
The new shape is a lot more robust and also easier to parse.
Lookup in generic instance types won't return the expected field ID otherwise.
Nim Debug Information files are not generated anymore, making the module for creating them obsolete.
Both concepts are now represented via zero-length arrays
This is meant as an accommodation for the C code generator.
* remove the array-in-struct wrapping; only emit array typedefs
* remove the obsolete `cnkPtrToArrayTy` and `cnkFlexField` handling
* translate pointers-to-inline-array to pointers-to-element types; taking the address of an array lets the lvalue "decay"
* handle inline array types properly (they only appear in field declarations)
* use a common procedure for emitting non-function C declarations
They're arrays underneath, which cannot be passed by value in C (without extra code generator support).
Now that CGIR arrays are translated to C arrays, using normal assignments for CGIR arrays no longer works.
A proper memcopy can only be emitted when the source expression operand is a proper `Expr` (otherwise taking the address is not possible), so the `genAsgn` overload taking an `Expr` value has to be used whenever a memcopy might be necessary.
Not pretty, but it's required now that CGIR arrays are translated directly to C arrays.
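The C limitation behind these commits can be sketched as follows. This is a hand-written illustration, not output of the code generator; `ARR_LEN`, `copy_arr`, and `sum_of_copy` are hypothetical names.

```c
#include <string.h>

#define ARR_LEN 4 /* hypothetical fixed length, for illustration only */

/* C arrays are not assignable: `dst = src;` is rejected by the compiler.
   The elements have to be copied, here with memcpy. */
void copy_arr(int dst[ARR_LEN], const int src[ARR_LEN]) {
  memcpy(dst, src, ARR_LEN * sizeof(int));
}

/* Copies `src` into a local array, modifies the copy, and returns the
   sum of the copy. The original array is left untouched. */
int sum_of_copy(const int src[ARR_LEN]) {
  int tmp[ARR_LEN];
  int total = 0;
  copy_arr(tmp, src);
  tmp[0] += 1; /* only the copy changes */
  for (int i = 0; i < ARR_LEN; i++) {
    total += tmp[i];
  }
  return total;
}
```

Note that `memcpy` needs the address of both operands, which is why the source has to be something whose address can be taken.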
The parameter list for functions without any parameters must be `void` prior to C23.
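For context: before C23, an empty parameter list in a declaration (`int f();`) leaves the parameters unspecified rather than declaring that there are none, so calls with arguments are not diagnosed. A minimal sketch with hypothetical names:

```c
/* `(void)` means "takes no arguments" in every C standard version. */
int answer(void) {
  return 42;
}

/* The same applies to function pointer types. */
typedef int (*NoArgProc)(void);

/* Calls a parameterless procedure through a pointer. */
int call_it(NoArgProc p) {
  return p();
}
```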
In C, arrays are not first-class types: they're implicitly converted to pointers to their first element, array declarations decay to pointers in parameter positions, and returning arrays from a procedure (without a pointer indirection) is not possible. To keep

    {.emit: "/*TYPESECTION*/ struct Foreign { int x[2]; };".}
    type Foreign {.importc: "struct Foreign", nodecl.} = object
      x: array[2, cint]

In the context of the declaration

There are multiple ways to address this problem, but for now, I've simply opted for using the same approach as the previous code generator, namely to translate NimSkull/MIR arrays to C arrays directly and accommodate the array limitations in the code generator (mostly

    type Obj = object
      x: ptr array[2, array[2, Obj]]
    # the C code generated for `Obj` doesn't compile

so this is quite unfortunate. Still, the changes also allowed for simplifying the CGIR type system a bit, by removing the dedicated pointer-to-array type and flexible struct fields (both are subsumed by zero-length arrays, inspired by LLVM).
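The array rules described in this comment can be seen in a few lines of C. The sketch below reuses the `struct Foreign` layout from the comment; `make_foreign` and `first_elem` are hypothetical helpers.

```c
struct Foreign { int x[2]; };

/* A struct containing an array can be returned by value, whereas a
   bare `int[2]` return type is not allowed in C. */
struct Foreign make_foreign(int a, int b) {
  struct Foreign f = {{a, b}};
  return f;
}

/* An array argument decays to a pointer to its first element, so the
   callee only ever sees an `int *`. */
int first_elem(const int *p) {
  return p[0];
}
```

Passing `f.x` to `first_elem` compiles without a cast precisely because of the implicit array-to-pointer conversion.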
Replacing the return type of .tailcall procedures with the `Continuation` is tricky to do in `mir2cg`, as it would happen during type translation, which doesn't have access to a mutable type environment (and neither should it).
Emit event handling may cause new entities to be registered with the MIR environment, which also have to be queued for processing.
Only parts of the test matrix fail (the ones using `--tlsemulation:on`), so the test is simply disabled wholesale.
When the slice length is zero, the array pointer must not be accessed.
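One way to read this rule, sketched in C under the assumption that a slice is a pointer plus a length (the `IntSlice` type and `slice_copy` are hypothetical, not the runtime's actual definitions): an empty slice may carry a null or dangling pointer, so every access, including handing the pointer to `memcpy`, has to be guarded by the length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical slice representation: pointer plus element count. */
typedef struct {
  const int *data;
  size_t len;
} IntSlice;

/* Copies the slice's elements into `dst`. For an empty slice the data
   pointer is never touched; it may be NULL. */
void slice_copy(int *dst, IntSlice s) {
  if (s.len == 0) {
    return; /* nothing to copy, and s.data must not be used */
  }
  memcpy(dst, s.data, s.len * sizeof(int));
}
```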
`.compilerproc`s are treated as never raising when used in source code, leading to the necessary error handling being omitted.
An `if` followed by a `scope` can still "contain" defs that are not "scoped", as it's possible for there to still be defs between the end of the scope and the end of the `if`. Aside: using a pseudo basic-block representation instead of a real one was a major mistake. Oh well.
ae5ed40
to
453370a
Quite an edge case, with undefined behaviour.
Some time measurements performed on Windows using `hyperfine --warmup 1 "<exe> --compileOnly --verbosity:0 --hints:off --warnings:off compiler/nim.nim"` on the compiler sources at devel. The C compiler used is
These results show that the new backend is significantly faster at producing C code, but the produced C code is a lot slower. Surface-level profiling at an earlier point during development only showed that everything became a little slower, and I haven't taken a deeper look at it yet. My guess is that the
Summary
Add an all new CGIR, together with a new code generation architecture
using it and a new C code generator.
Details
New CGIR
Core design decisions for the language:
asm.js
easier
already knows NimSkull and its AST
Core design decisions for the IR:
are interned
precision debug information possible
The IR also comes with a grammar and type checker, to help with
debugging, troubleshooting, and codifying the static semantics. It's
always built into the compiler, but due to its overhead, has to be
enabled at run-time by passing `-d:validateCgir` to the compiler.
The old CGIR is still used for the JS and VM code generators, and thus
has to be kept for now.
New Architecture
The intertwined MIR -> CGIR -> C translation is replaced with
separate MIR -> CGIR and CGIR -> C translation steps. This allows
reusing the MIR -> CGIR parts for other code generators.
A code generator may only support a subset/dialect of the CGIR, which the
MIR -> CGIR translation facilitates by accepting a set of code generator
capabilities.
As a preparation for incremental compilation, the CGIR has support for
being split into multiple units (i.e., modules), though this feature is not
actually used right now.
Breaking Changes
exists, but enabling emulation now causes an error
`emit` statements anymore
Changes To The Produced C Code
NIM_NOALIAS
.compilerprocs
is not omitted anymore
used anymore
`TFrame` instances doesn't use C macros anymore
by sem (which is roughly the order they appear in the source code)
To-Do
`--expandArc`-output related test failures
Notes For Reviewers
The code is quite old and went through multiple major refactors (for example, `mir2cg` once used the `subTree` approach to tree construction). I've made multiple Q/A passes over it, but given the time I've spent working on the changes, it's likely that I've become blind to some issues.
Points Of Interest:
* `cgir2`; contains the type definitions and traversal code for the CGIR
* `validation`; implements all validation logic. It's not pretty, as both the grammar and type checker are crammed into the module
* `mir2cg`; implements the MIR -> CGIR translation. Given the size of the language that falls out of the MIR stage, this module is enormous, with some of its logic moved into companion modules (`rtti`, `mirflow`, and `mirtypes2cg`)
* `cgen`; implements the CGIR -> C translation. The pass for generating a C translation unit description from a CGIR module is also located here
* `cgbackend`; implements the generic backend