Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use constant terms as identifiers in names and variables #1281

Draft
wants to merge 7 commits into
base: next
Choose a base branch
from

Conversation

Halbaroth
Copy link
Collaborator

@Halbaroth Halbaroth commented Jan 30, 2025

This PR changes the way we construct identifiers for names (in Symbols) and variables (in Var).

We plan to completely replace the AE identifiers with identifiers from Dolmen. This PR serves as an intermediate step from string-based identifiers to Dolmen identifiers.

  • All the identifiers are now managed in the Id module.
  • Constant terms are used for declared or defined identifiers in the input file.
  • Internal identifiers are still pre-mangled strings. Replacing them with Dolmen identifiers requires careful investigation,
    particularly in the AC(X) implementation.
  • I haven't removed Id.typed yet. After this PR, the user identifiers will be typed, so we do not need to store them separately. If we use the internal type of the identifier, we will need to implement a conversion function from Dolmen type to Alt-Ergo type, which is not trivial. I prefer keeping this type and remove it after transitioning from Alt-Ergo types to Dolmen types.
  • The new API of Id prevents us from creating duplicate fresh/fresh_ac/abstract identifiers.
  • I still check that user identifiers do not start with . or @. In Before this PR, we mangled the identifier if it happened. In this PR, we crash. I don't have a strong opinion on this behavior, it was just simpler to implement this way.
  • I use identifiers for term variables too. I preserved the internal representation because:
    1. the notion of local variables is AE-specific, whereas the local name in Dolmen identifiers is a distinct concept.
    2. After this PR, variables are typed but what is the type of Underscore? May be forall a. a?

This PR have been tested on ae-format (+2-2).

@Halbaroth Halbaroth force-pushed the use-term-cst-in-mdl branch 2 times, most recently from cf2adb4 to 8150259 Compare January 30, 2025 17:31
@Halbaroth Halbaroth marked this pull request as ready for review January 30, 2025 17:32
@Halbaroth
Copy link
Collaborator Author

Should be ready for reviewing. I am running some tests to be sure that I did not break anything with variables.

@Halbaroth Halbaroth marked this pull request as draft February 4, 2025 15:02
@Halbaroth Halbaroth force-pushed the use-term-cst-in-mdl branch from 46f86ef to 32a687f Compare February 4, 2025 15:46
This PR changes the way we construct identifiers for names (in Symbols) and
variables (in Var).

We plan to completely replace the AE identifiers with identifiers from Dolmen.
This PR serves as an intermediate step from string-based identifiers to Dolmen
identifiers.

- All the identifiers are now managed in the Id module.
- Constant terms are used for declared or defined identifiers in the input file.
- Internal identifiers are still pre-mangled strings. Replacing them with
  Dolmen identifiers requires careful investigation, particularly in the AC(X)
  implementation.
- I haven't removed Id.typed yet. After this PR, the user identifiers will be
  typed, so we do not need to store them separately. If we use the internal
  type of the identifier, we will need to implement a conversion function from
  Dolmen type to Alt-Ergo type, which is not trivial. I prefer keeping this
  type and remove it after transitioning from Alt-Ergo types to Dolmen types.
- The new API of Id prevents us from creating duplicate fresh/fresh_ac/abstract
  identifiers.
- I still check that user identifiers do not start with . or @. In Before this
  PR, we mangled the identifier if it happened. In this PR, we crash. I don't
  have a strong opinion on this behavior, it was just simpler to implement
  this way.
- I use identifiers for term variables too. I preserved the internal
  representation because:
  1. The notion of local variables is AE-specific, whereas the local name in
     Dolmen identifiers is a distinct concept.
  2. After this PR, variables are typed but what is the type of Underscore?
     May be forall a. a?

This PR have been tested on ae-format (+2-2).
Comment on lines 117 to 122
Dolmen.Std.Expr.Term.Const.hash tcst
| Hstring { hs; _ } ->
(* NB: Internal identifiers are pre-mangled, which means that we do not
need to take the name space into consideration when hashing. *)
Hstring.hash hs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are probably a lot of collisions here because both Dolmen and Alt-Ergo use increasing integers starting from 0 for Hstring and Id hashes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That actually reminds me that, even though it's often tempting (and dolmen falls into that trap), using exactly the unique id as a hash is not always a very good choice: it might be good in the case where you expect all (or most) of the values to be in the hastbl, but especially with hashtables that uses hash modulo the table size, there may be some surprising collisions, particularly if there is some pattern to which values are added to the table (e.g. if only values with an even unique id are added to the table, or other similar situations).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it is good when usage is dense (the hash table effectively becomes a dynarray -- e.g. for hash consing) but I'm partial myself to hashing the integer identifier in hash as a default and exposing it so that people that know they are in the dense scenario can use it instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, that could indeed be the source of the slowdown. I will try another hash function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants