-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use constant terms as identifiers in names and variables #1281
base: next
Are you sure you want to change the base?
Conversation
cf2adb4
to
8150259
Compare
Should be ready for reviewing. I am running some tests to be sure that I did not break anything with variables. |
46f86ef
to
32a687f
Compare
This PR changes the way we construct identifiers for names (in Symbols) and variables (in Var). We plan to completely replace the AE identifiers with identifiers from Dolmen. This PR serves as an intermediate step from string-based identifiers to Dolmen identifiers. - All the identifiers are now managed in the Id module. - Constant terms are used for declared or defined identifiers in the input file. - Internal identifiers are still pre-mangled strings. Replacing them with Dolmen identifiers requires careful investigation, particularly in the AC(X) implementation. - I haven't removed Id.typed yet. After this PR, the user identifiers will be typed, so we do not need to store them separately. If we use the internal type of the identifier, we will need to implement a conversion function from Dolmen type to Alt-Ergo type, which is not trivial. I prefer keeping this type and remove it after transitioning from Alt-Ergo types to Dolmen types. - The new API of Id prevents us from creating duplicate fresh/fresh_ac/abstract identifiers. - I still check that user identifiers do not start with . or @. In Before this PR, we mangled the identifier if it happened. In this PR, we crash. I don't have a strong opinion on this behavior, it was just simpler to implement this way. - I use identifiers for term variables too. I preserved the internal representation because: 1. The notion of local variables is AE-specific, whereas the local name in Dolmen identifiers is a distinct concept. 2. After this PR, variables are typed but what is the type of Underscore? May be forall a. a? This PR have been tested on ae-format (+2-2).
src/lib/structures/id.ml
Outdated
Dolmen.Std.Expr.Term.Const.hash tcst | ||
| Hstring { hs; _ } -> | ||
(* NB: Internal identifiers are pre-mangled, which means that we do not | ||
need to take the name space into consideration when hashing. *) | ||
Hstring.hash hs | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are probably a lot of collisions here because both Dolmen and Alt-Ergo use increasing integers starting from 0 for Hstring
and Id
hashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That actually reminds me that, even though it's often tempting (and dolmen falls into that trap), using exactly the unique id as a hash is not always a very good choice: it might be good in the case where you expect all (or most) of the values to be in the hastbl, but especially with hashtables that uses hash modulo the table size, there may be some surprising collisions, particularly if there is some pattern to which values are added to the table (e.g. if only values with an even unique id are added to the table, or other similar situations).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it is good when usage is dense (the hash table effectively becomes a dynarray -- e.g. for hash consing) but I'm partial myself to hashing the integer identifier in hash
as a default and exposing it so that people that know they are in the dense scenario can use it instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm, that could indeed be the source of the slowdown. I will try another hash function.
This PR changes the way we construct identifiers for names (in
Symbols
) and variables (inVar
).We plan to completely replace the AE identifiers with identifiers from Dolmen. This PR serves as an intermediate step from string-based identifiers to Dolmen identifiers.
Id
module.particularly in the AC(X) implementation.
Id.typed
yet. After this PR, the user identifiers will be typed, so we do not need to store them separately. If we use the internal type of the identifier, we will need to implement a conversion function from Dolmen type to Alt-Ergo type, which is not trivial. I prefer keeping this type and remove it after transitioning from Alt-Ergo types to Dolmen types.Id
prevents us from creating duplicate fresh/fresh_ac/abstract identifiers..
or@
. In Before this PR, we mangled the identifier if it happened. In this PR, we crash. I don't have a strong opinion on this behavior, it was just simpler to implement this way.Underscore
? May beforall a. a
?This PR have been tested on ae-format (+2-2).