-
Notifications
You must be signed in to change notification settings - Fork 4
Bumb libclang version
#1292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Bumb libclang version
#1292
Conversation
Closes #1220.
02b2df8 to
6ef41ae
Compare
|
@TravisCardwell @dschrempf This PR works, but only for Option 1First, we can skip the new test I add in this PR for However, the way that we deal with macros in names is not actually great. The solution that we adopted previously is -- If the expansion location and the spelling location /of the name/
-- are different, this means that the name is constructed using a
-- macro. This must therefore have been done by the user.However, this is not quite accurate. If we have a macro that would construct the entire struct, for example, it may very well be that the struct anonymous. Option 2I'm wondering if there isn't an alternative solution. The heart of the problem is that later typedef struct { .. } foo;
typedef struct foo { .. } foo;because So here's my proposal: what if we don't distinguish either? In other words, we treat these two examples as exactly the same, just like struct outer {
struct {
int inner_field;
} outer_field;
}that inner struct is still anonymous; ProposalSo concretely, I'm proposing: an unnamed struct defined inside a typedef is not considered anonymous; instead, we consider its tag to be derived from the typedef name. I think this can work, and it means we no longer have to jump through so many hoops (and then still fail) to make things work with Thoughts? |
|
Consider the following example, where struct foo { char a; char b; };
typedef struct { double x; double y; } foo;Such terrible (yet valid) C code is an example of code that we cannot handle well by default: we generate conflicting
With this proposal, we would no longer distinguish the first two; they would both be |
|
I have been thinking about possibilities... While I have not yet thought of any particularly good ones, my best candidate so far is something that we previously decided to not do: global renaming. After parsing declarations, where anonymous declarations declared inside of a For example, we could rename by appending Note that using source locations would be a bad idea, because that would be extremely fragile. This strategy is not very elegant, but users would only be exposed to it when dealing with particularly terrible C code. It would allow users to distinguish all declarations in binding specifications as well as selection predicates. We could provide an Thoughts? |
|
Note that test (I ran across this while reviewing changes for the |
|
Is this only affecting structs and unions? Could we separate the situations typedef struct foo {..} foo;
typedef struct {..} foo;using a hand-rolled parser on If not, globally checking for conflicting declarations may be the only way to go. I am a bit skeptical though, because, even though the bindings will be deterministic, there may be unexpected renamings. It also means that the names of the translated declarations of a module will depend on other modules in the closure which is a bit of a red flag. |
|
Parsing tokens is indeed straightforward if there is no CPP. Macros significantly complicate things, and use of macros to generate type declarations is not uncommon practice. (Sometimes macros are used to implement naming conventions, with the body outside the macro. Sometimes macros are used to implement templating of similar types, with the whole declaration generated by the macro.) This is what we are doing now. My understanding is that we are discovering that differences across different versions of LLVM/Clang (due to improvements as well as bugs) make this difficult/impossible to get correct across all supported versions. Indeed, we avoided global renaming for good reasons. I just wonder if it is less complicated than the alternatives, especially given that users will only see it when translating particularly poorly written C.
I am not sure I understand what you mean. Imports in a module are all qualified (aside from a small set of standard |
I think I miscomunicated, using Is it true that global renaming means that other "C files" in the translation unit can influence the name of a declaration? That is, can the name of a declaration at hand change depending on other files we include, or do not include in the translation unit? |
|
Yes: we would need to consider names for all of the declarations that we "parse," which (may) include declarations across transitive includes (depending on the parse predicate). As a minimal example of when this matters, consider the following struct foo { char a; char b; };That declaration is translated to Haskell type typedef struct { double x; double y; } foo;These types are valid C. There are no conflicts because Note, however, that this case only arises when there is a name collision, caused by different C types where the tag of one type is the same as the ordinary name of another type. Currently, we generate Haskell types with name collisions, and it is up to users to provide a prescriptive binding specification in order to even generate valid Haskell. With this change, we would generate valid Haskell by default, and users may optionally use a prescriptive binding specification to configure better names. |
|
Thank you, this clarifies things. This means we should fall back to global renaming only if we don't find another solution. On the other hand, it seems like we have tried hard, and didn't find one so far. |
|
Interestingly, for @TravisCardwell ' example of struct foo { char a; char b; };
typedef struct { double x; double y; } foo;
So we indeed have two structs both called For me numbering declarations is a no; we've worked really hard not do to that. For now I'll go with "Option 1" above as a stop-gap measure and I'll open a ticket for us to reconsider this. |
IMHO no, but it should not matter much if such C code is rare. 😄 Perhaps we should document that |
Closes #1220.