Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify Named ABIs and C type sizes and alignments #117

Merged
merged 6 commits into from
Oct 14, 2019

Conversation

lenary
Copy link
Contributor

@lenary lenary commented Oct 9, 2019

I have found the specification confusing, because C type sizes and
alignments have so far been defined only for the architectures RV32G
and RV64G. This is confusing, as there are 32- and 64-bit architectures
without all parts of the G extension which also share these type sizes
and alignments.

This pull request separates the definitions of C type sizes and alignments
into new section, and defines them for particular ABIs, not
architectures. The definitions themselves remain largely unchanged,
though wording has been clarified and extra subsection headers have been
introduced.

For the avoidance of doubt around C struct alignment rules, I have
defined the RISC-V psABI rules around size and alignment of structures
and unions. This is based on both the x86-64 and Intel386 System V psABI
(which is not mentioned in the specification so far).

This now explicitly enumerates the sizes and alignments of C11's complex
types into the tables of sizes and alignments, based on the defined rules.

This also adds definitions of the following "named" ABIs, in terms of the
calling convention and C type details. These definitions match the
interpretation given by the -mabi argument to GCC and Clang.

  • ilp32
  • ilp32f
  • ilp32d
  • ilp32e
  • lp64
  • lp64f
  • lp64d
  • lp64q (I don't know if this is actually supported by GCC and Clang, but this combination is mentioned in "Default ABIs", so it seems reasonable to define it based on the existing naming scheme)

@lenary
Copy link
Contributor Author

lenary commented Oct 9, 2019

For clarity, these changes do not include any changes to sizes/alignments of atomic types, as raised in #112 - those changes are on hold while the x86-64 ABI change is discussed.

@aswaterman
Copy link
Contributor

aswaterman commented Oct 9, 2019 via email

Copy link
Collaborator

@asb asb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically looks good to me, though I add a few inline comments. Thanks Sam.

riscv-elf.md Outdated Show resolved Hide resolved
riscv-elf.md Outdated Show resolved Hide resolved
The alignment of `max_align_t` is 16.

Structs and unions are aligned to the alignment of their most strictly aligned
member. The size of any object is a multiple of its alignment.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this isn't true of packed structs. Although the psABI might not want to over-specify behaviour in this case, it might still be worth adding an explicit acknowledgement that this statement won't hold in some cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just went spelunking in the clang codebase, as well as every spec I could find.

  • The C spec leaves this implementation-/un-defined. This applies to any value returned from sizeof(…), but also the implementation of #pragma packed and the like.
  • the C11 spec does require objects to have a size in bits that is a multiple of CHAR_BITS (§6.2.6.1p4). (This is unchanged in the C17 spec)
  • CHAR_BITS is required by C11 (and C17) to be at least 8.
  • This psABI defines sizeof(char) and _Alignof(char): they are both 1 (table above).
  • This has to be the weakest alignment requirement (C11 §6.2.8p6).
  • The psABIs I've found so far have not mentioned packed structs at all.
  • #pragma packed(n) is documented pretty well by microsoft (the n specifies the maximal allowed alignment, in bytes). The GCC docs match this understanding.
  • #pragma packed is the same as #pragma packed(1), ie choosing to align to the nearest byte.
  • C11 defines arrays as "a contiguously allocated nonempty set of objects with a particular member object type" (§6.2.5p10). This means that there is no padding between array elements. To ensure that each element of an array of structs is aligned correctly, trailing padding may have to be added to the struct itself (trailing padding on structs is allowed), to bring it to its specified alignment.
  • Looking at the implementation in clang (which I expect, though have not checked, matches GCC fairly closely), packed structs have the same alignment as char, unless you're explicit about which alignment is requested (using n, described above).

Therefore:

  • If you request #pragma packed, your struct is aligned to the same alignment as char (1), and so this sentence is true.
  • If you request #pragma packed(n), each field of your struct is aligned to at most n times the alignment of char, and so the whole struct is aligned to at most n times the alignment of char (using the sentence above). This is a case where trailing padding may have to be used so that the whole struct size (in multiples of char) is a multiple of n.

So I think the definition in this paragraph remains correct in the presence of #pragma packed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I could have been convinced without such a detailed response, but I appreciate the super in-depth correction all the same! :)

@lenary
Copy link
Contributor Author

lenary commented Oct 10, 2019

Noted about the redundancy in XLEN. Happy to update those paragraphs.

I think we're also missing a definition of CHAR_BITS, which I will add. It needs to be 8, I think.

Copy link
Collaborator

@asb asb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like @palmer-dabbelt, @jim-wilson, or @aswaterman to give an LGTM too, but this looks good to me. Thanks!

@asb
Copy link
Collaborator

asb commented Oct 10, 2019

Also, please squash into a single commit (or at least squash the "Removes redundant XLEN clauses" if you'd rather keep "Adds definition of CHAR_BITS separate).

Copy link
Collaborator

@jim-wilson jim-wilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks OK to me except for the wchar_t and wint_t references as these should be specified by the C library, not the processor specific ABI.

It would be OK to specify this stuff in a linux specific ABI, but not in an ABI that is used for all targets.

riscv-elf.md Outdated Show resolved Hide resolved
riscv-elf.md Outdated Show resolved Hide resolved
riscv-elf.md Outdated
short | 2 | 2
int | 4 | 4
wchar_t | 4 | 4
wint_t | 4 | 4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, wint_t is a property of the C library, and different C libraries have different definitions for wint_t, with different type sizes. I think it should not be mentioned here.

riscv-elf.md Outdated Show resolved Hide resolved
@lenary
Copy link
Contributor Author

lenary commented Oct 10, 2019

@asb I shall squash out the XLEN change, but not the CHAR_BITS change.

@jim-wilson I think definitions of the signedness, size, and alignment of wchar_t and wint_t were introduced in #57 and #58, so it would be good to get approval from @aswaterman and @asb before I remove them. I'm happy to do so, I just want to ensure this is the consensus. I'll note there is no mention of them in the system-v psABI documents for intel that I have.

@aswaterman
Copy link
Contributor

@jim-wilson what I’m confused about is, how are they not part of the ABI, given that you can’t correctly link objects that use different wchar_t disciplines? Wouldn’t it be more correct to say that it is part of the ABI, but these different C libraries implement different ABIs than each other?

@jim-wilson
Copy link
Collaborator

Wouldn’t it be more correct to say that it is part of the ABI, but these different C libraries implement different ABIs than each other?

Yes, that is one way to look at it. There is no expectation that code compiled for linux will work on freebsd and vice versa for instance, so they can have different ABIs. But we need a common base for portability between different systems, and that is the part we are defining here. So part of the ABI is specified by the processor, and part of the ABI is specified by the C library and/or OS. We should only be defining the processor specific parts of the ABI here, and that does not include wchar_t and wint_t. Unless you want to start specifying C library and/or OS specific ABIs, then you could include wchar_t and wint_t in those documents.

I suppose another way to look at this is as historical error, that we didn't make things like wchar_t and wint_t processor specific at the start. But these C libraries and operating systems are mostly written independently, and it is hard to get everyone to agree on everything before hand. We didn't discover that we had a problem until too late. Some of these problems happened before linux and gcc/clang started dominating development, and coordination was a lot harder back then. When everyone has their own compiler and OS sources that are different than everyone else, it is easy for different systems to end up with different types for things like wchar_t and wint_t. But maybe someday in the future these wchar_t and wint_t problems will be fixed.

@aswaterman
Copy link
Contributor

Makes sense. Maybe we should move wchar_t and friends to another table at the bottom of this document (“Additional Linux ABI Types” or something) so this table can be “Base ABI Types” or something.

@asb
Copy link
Collaborator

asb commented Oct 11, 2019

I think it's definitely worth continuing to indicate the properties of wchar_t and co under "standard" or "Linux" ABIs - it's useful to compiler authors and there's not much point in moving out to a separate document. Putting it under "Additional Linux ABI Types" sounds sensible.

I have found the specicifaction confusing, because C type sizes and
alignments have so far been defined only for the *architectures* RV32G
and RV64G. This is confusing, as there are 32- and 64-bit architectures
without all parts of the G extension which also share these type sizes
and alignments.

This commit separates the definitions of C type sizes and alignments
into new section, and defines them for particular ABIs, not
architectures. The definitions themselves remain largely unchanged,
though wording has been clarified and extra subsection headers have been
introduced.

For the avoidance of doubt around C struct alignment rules, I have
defined the RISC-V psABI rules around size and alignment of structures
and unions. This is based on both the x86-64 and Intel386 System V
psABI. These rules apply to both packed and non-packed structs.

This commit explicitly enumerates the sizes and alignments of C11's
complex types into the tables of sizes and alignments, based on the
definition these types as struct-like and struct representation details.

I have removed the definitions for `wint_t` and `wchar_t` pending a
future commit to add them to a Linux-specific RISC-V psABI. It is noted
that these definitions were wrong for some platforms such as FreeBSD
that use ELF but have their own requirements around `wint_t` and
`wchar_t`.
The specification was previously over-perscriptive in the definitions of
the size, alignment, and signedness of `wchar_t` and `wint_t`.

After some discussion it has been agreed to split these definitions out
from the main specification and into a separate section that only
applies on Linux. This commit introduces that section with the existing
definitions for `wchar_t` and `wint_t`.
@lenary
Copy link
Contributor Author

lenary commented Oct 11, 2019

I have:

  • Squashed the XLEN commit
  • Fixed up the table of contents and reworded the first commit message to finish the unfinished sentence.
  • Introduced the linux-specific ABI section (see the final commit in the branch)
  • Moved the definitions for wchar_t and wint_t into the linux-specific ABI section.

@lenary lenary changed the title Clarify C type sizes and alignments Clarify Named ABIs and C type sizes and alignments Oct 11, 2019
@lenary
Copy link
Contributor Author

lenary commented Oct 11, 2019

I have added a commit to name some specific ABIs. These should match commonly accepted definitions currently used in GCC/clang (though I know clang does not yet support all of them).

floating-point register in the ABI. The ISA might have wider
floating-point registers than the ABI.
For the purposes of this section, FLEN refers to the width of a floating-point
register in the ABI. The ABI's FLEN must be no wider than the ISA's FLEN. The
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added sentence is "The ABI's FLEN must be no wider than the ISA's FLEN."

This calling convention is the same as the integer calling convention,
except for the following differences. The stack pointer need only be aligned
to a 32-bit boundary. Registers x16-x31 do not participate in the ABI, so
The ILP32E calling convention is designed to be usable with the RV32E ISA. This
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this ABI to "ILP32E", because that is the ABI name, not the ISA name.

riscv-elf.md Outdated
* **LP64D**: LP64 with hardware floating-point calling convention for FLEN=64
(i.e. ELFCLASS64 and EF_RISCV_FLOAT_ABI_DOUBLE).

* **LP64Q**: LP64 with hardware floating-point calling convention for FLEN=128
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This combination is noted under the "Default ABIs" section, so I decided to give it an explicit name.

Copy link
Collaborator

@jim-wilson jim-wilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current proposal looks OK to me.

This commit adds explicitly named ABI definitions, to match and extend
the conventions used so far by GCC and Clang.

The aim here is to be explicit about what each named ABI means in terms
of which parts of the calling conventions are used, when they are
compatible with different ISAs, and how this maps to C types and sizes.

This Commit defines
- ILP32 (32-bit, no hardware floating-point)
- ILP32F (32-bit, single-precision hardware floating-point)
- ILP32D (32-bit, double-precision hardware floating-point)
- ILP32E (32-bit, no hardware floating point, RVE registers only)
- LP64 (64-bit, no hardware floating-point)
- LP64F (64-bit, single-precision hardware floating-point)
- LP64D (64-bit, double-precision hardware floating-point)
- LP64Q (64-bit, quad-precision hardware floating-point)
@lenary
Copy link
Contributor Author

lenary commented Oct 14, 2019

These two commits do two things:

  • Add <a name=>s so we can link to the named ABIs.
  • Reword the Default ABIs in terms of the named ABIs.

@aswaterman
Copy link
Contributor

Any objections to my merging this PR?

@aswaterman
Copy link
Contributor

@lenary can you also update the PR to include your name in the list of contributors at the top of the file, and perhaps reformat it to add some line breaks :)

@lenary
Copy link
Contributor Author

lenary commented Oct 14, 2019

@aswaterman Done. Rendered

This uses the "two spaces before a newline" linebreak in markdown, so stripping all whitespace on save will remove the linebreaks. Happy to change it to a bulleted list if this trailing whitespace will be problematic.

@aswaterman
Copy link
Contributor

Thank you. I’m going to pull the trigger; if there are any lingering concerns, please address with follow-up PRs/issues.

@aswaterman aswaterman merged commit 16964f4 into riscv-non-isa:master Oct 14, 2019
@lenary lenary deleted the lenary/c-types-alignments branch October 14, 2019 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants