-
Notifications
You must be signed in to change notification settings - Fork 535
Add type hints #611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add type hints #611
Conversation
eliben
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of initial questions
In general, I'd really prefer someone with Python type checking experience to take a careful look at this
eliben
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question and general comment: would it be possible to split this PR to multiple? Even if just for the sake of review - could we start with a small-medium PR with a representative set of changes? It's OK if the intermediate steps don't fully type check until everything has landed
Yes, I can do that. Any advise on how to split best? Internel / low-level / high-level? |
Yes, by layers could be a great way to slice it. Starting at the lowest possible and then progressing upwards |
|
Where do we stand with this PR? How much of it has already been added? |
Sorry for the delay, I'm currently busy otherwise. Hope to find some time next weekend. |
|
What's the status here? |
b8d0cab to
c5f4dc8
Compare
The bad news: Sadly I'm busy otherwise .
I have been experimenting with typeguard, which turns those type-hints into runtime checks. This allows validating those hints when running the test-suite. Sadly that drastically increases the runtime and – even more sad – shows several errors in my type annotations. |
Timmmm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me. It would be good to split into smaller PRs - maybe you could do one with all the basic str, int, None types that are pretty obvious and then a second one with everything else?
On the other hand that's super tedious to do and nobody has time for that. If this were my project I think I would just merge it as-is and improve it in future PRs.
If this passes Pyright that's already a huge achievement and improvement.
Signed-off-by: Philipp Hahn <[email protected]>
Convert from legacy `collections.namedtuple` to `typing.NamedTuple` to allow adding type hints. Signed-off-by: Philipp Hahn <[email protected]>
`pyelftools` does not use class hierarchies in all cases, but relies on _duck typing_. Introduce Protocols for those cases to allow type hinting. Signed-off-by: Philipp Hahn <[email protected]>
- [x] `pyright` - [x] `mypy` with several issues: > Cannot determine type of "dwarfinfo" > Cannot determine type of "structs" Signed-off-by: Philipp Hahn <[email protected]>
Several classes inheriting from `Construct`, which only has a generic `__getattr__(self, name: str) -> Any`. To improve typing explicitly name several attributes with their type. For `name` we know that it (almost) never will be `None` – an unnamed entity does not make much sense. Overwrite the type-hint `str | None` inherited from `Construct` with just `str` which saves us from a ton if `name is not None` checks. Signed-off-by: Philipp Hahn <[email protected]>
Several variable may have a `Union` type. Explicitly assert that those variables have the expected type. Signed-off-by: Philipp Hahn <[email protected]>
Several variable may have an `Optional` type and can be `None`. Explicitly assert that those variables are `not None`. Signed-off-by: Philipp Hahn <[email protected]>
Declare `lsda_pointer` and `aug_dict` and `last_line_in_CIE` before the conditional. Static type checking will complain otherwise that variables might not be declared on the `else` case. Signed-off-by: Philipp Hahn <[email protected]>
`Dwarf_dw_form` contains `None` values, but type checkers fail to do a static lookup to see, that they are constant and `not None`. Signed-off-by: Philipp Hahn <[email protected]>
`die.tag` can be an `int` for "user-defined tags". Invert the type check to silence type checking. Signed-off-by: Philipp Hahn <[email protected]>
`describe_reg_name()` may return `None`, but `str + None` raises a `TypeError`. Signed-off-by: Philipp Hahn <[email protected]>
`DIE._terminator` may be `None` and `search._terminator.offset` would raise a `AttributeError`. Signed-off-by: Philipp Hahn <[email protected]>
Declaring constants as `list` is bad as `list` is modifiable. Declare them as `Final[tuple]` instead to help type-checkers de-reference individual entries for checking and using their correct type `dict` or `None`. Signed-off-by: Philipp Hahn <[email protected]>
Improve type hinting - at least for `t` and `sz` - `data` remains `Any`. Signed-off-by: Philipp Hahn <[email protected]>
`section_names` is only used internally as a `list`, as section names might get modified and the list is appended. Signed-off-by: Philipp Hahn <[email protected]>
`ENUM_D_TAG` is a constant, but built by code. Convert it into a dict-comprehension. Signed-off-by: Philipp Hahn <[email protected]>
`get_symbol()` may return `None` and `symbol.name` would raise an `AttributeError`. Check for `not None` explicitly. Signed-off-by: Philipp Hahn <[email protected]>
`entry_translate` may contain `None`. Silence type checking. Signed-off-by: Philipp Hahn <[email protected]>
An unknown `ST_SHNDC` `str` will raise `ValueError`. Signed-off-by: Philipp Hahn <[email protected]>
`get_table_offset()` may return `tuple[…, None]`. Signed-off-by: Philipp Hahn <[email protected]>
`entry_translate` may contain `None` in the DWARF-5-case. Silence type checking. Signed-off-by: Philipp Hahn <[email protected]>
Code explicitly checks for `AttributeError`, but `mypy` is picky here. Signed-off-by: Philipp Hahn <[email protected]>
`params` is first declared as a `tuple` and then changed to `str`, which static type checkers like `mypy` do not like. Rename the first variable. Signed-off-by: Philipp Hahn <[email protected]>
`reveal_type` is first declared as `TypeDesc` and then changed to `str`, which static type checkers like `mypy` do not like. Rename the first variable. PS: Better rename `reveal_type()` to somethings else as there is `typing.reveal_type()`[^1]. [^1]: https://docs.python.org/3/library/typing.html#typing.reveal_type Signed-off-by: Philipp Hahn <[email protected]>
`all_offsets` is first declared as a `set` and then changed to `list`, which static type checkers like `mypy` do not like. Rename the first variable. Signed-off-by: Philipp Hahn <[email protected]>
`get_parent()` may return `None`, in wich case `parent.tag` would raise an `AttributeError`. Make the check explicit for type checking. Signed-off-by: Philipp Hahn <[email protected]>
Raise `ValueError` in all `else` cases. Signed-off-by: Philipp Hahn <[email protected]>
Combine the `if` statements to help type checking: Otherwise the 2nd DWARF-5-case need another case to check for `die not None`. FYI: This is a behavioral change as the file position gets changed in the error case. Signed-off-by: Philipp Hahn <[email protected]>
`dict.get() -> … | None` Signed-off-by: Philipp Hahn <[email protected]>
Also store the reference to `RelocationTable` in a local variable to help type checkers using the correct type when checking `entry_size`. Signed-off-by: Philipp Hahn <[email protected]>
`Dynamic` expects an instance following the protocol `_StringTable`, but `get_section()` just returns a `Section`. Signed-off-by: Philipp Hahn <[email protected]>
Explicitly return `None` to silence `mypy`. Signed-off-by: Philipp Hahn <[email protected]>
Also store the reference to `Container` in a local variable to help type checkers using the correct type when checking `bloom_size` and `nbuckets`. Signed-off-by: Philipp Hahn <[email protected]>
`Attribute` and its related classes `Attribute(Sub)*Section` have two concrete sub-classes for ARM and RISCV with *different* constructors. This confuses type checkers like `mypy` and `pyright` and humans like me, as this is not type safe: the `AttributeSubsubsection` classes are factory methods for `Attributes`, but their actual signature differs from the prototype. Basically you have to tell type-checkers: Expect a class with this signature, but you will get back an instance of another class having a different signature for `__init__()`. After having tried `Protocols` and `TypeVars` I gave up and re-implemented all 3*4 classes to have sane constructors: - The concrete sub-classes for ARM and RISCV sections only set different class variables; no extra code needed. - For `…Attributes` there's a new class method as `structs` must be handled at runtime. Signed-off-by: Philipp Hahn <[email protected]>
Signed-off-by: Philipp Hahn <[email protected]>
|
I have a hunch this will never see the light of day. Would it be feasible to scale back the mission - type-annotate the user facing part of the API and mark the private stuff as off limits for the type checker? |
|
Yeah this should just be merged IMO. It's pretty much impossible to take an untyped Python project and add correct type hints to it all in one go. Once this is merged you can gradually fix the errors until it all type checks, and then enable type checking with Pyright in CI (or Pyrefly/Ty maybe by the time that actually happens!). |
This is the mayor PR to add Python type hints #514 – without #609 this will not be complete as
elftools.construct.Containeris used in many places, which is a container forAnything: retrieving values from it will be typedAny, which basically means untyped: without manually type-hinting every such use case those values do propagate further and even spill into the public API.Because of missing 6f99ce0 running
mypywill find the following errors:Similar missing db4fb21 is responsible for
These I do not know how to fix - their type is not static and depends dynamically on the opened ELF file:
And finally the last group of issues, which are also caused by missing cc7b1ea:
Please have a 1st look.
Then we can decide on how to proceed, e.g. just merge it or try to extract a subset for only some public API files.