Cache type_object_type() #19514


Open · wants to merge 2 commits into master

Conversation

ilevkivskyi (Member)

This gives almost a 4% performance boost (Python 3.12, compiled). Unfortunately, using this in fine-grained mode is tricky; essentially I have three options:

  • Use some horrible hacks to invalidate cache when needed
  • Add (expensive) class target dependency from __init__/__new__
  • Only allow constructor caching during initial load, but disable it in fine-grained increments

I decided to choose the last option. I think it has the best balance of complexity and benefits.


@ilevkivskyi (Member, Author)

Non-trivial results in mypy_primer are surprising. I know there is a bug in type_object_type(): when it is called before the constructor is processed, we may get a wrong result (e.g. because it is decorated) instead of deferring the current node. But I thought the way I implemented caching should not be affected by this bug. I will take a look today or tomorrow.

Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

spark (https://github.com/apache/spark)
- python/pyspark/pandas/indexes/multi.py:1180: error: Argument 1 to "_index_fields_for_union_like" of "Index" has incompatible type "DataFrame[Any] | Series[Any] | Index | list[Any]"; expected "Index"  [arg-type]
+ python/pyspark/pandas/indexes/multi.py:1177: error: Redundant cast to "MultiIndex"  [redundant-cast]
+ python/pyspark/pandas/indexes/category.py:249: error: Incompatible return value type (got "Index", expected "CategoricalIndex | None")  [return-value]
+ python/pyspark/pandas/indexes/category.py:273: error: Incompatible return value type (got "Index", expected "CategoricalIndex | None")  [return-value]
+ python/pyspark/pandas/indexes/category.py:295: error: Incompatible return value type (got "Index", expected "CategoricalIndex | None")  [return-value]
+ python/pyspark/pandas/indexes/category.py:340: error: Incompatible return value type (got "Index", expected "CategoricalIndex | None")  [return-value]
+ python/pyspark/pandas/indexes/category.py:370: error: Incompatible return value type (got "Index", expected "CategoricalIndex | None")  [return-value]
+ python/pyspark/pandas/indexes/category.py:431: error: Incompatible return value type (got "Index", expected "CategoricalIndex | None")  [return-value]
+ python/pyspark/pandas/indexes/category.py:484: error: Incompatible return value type (got "Index", expected "CategoricalIndex | None")  [return-value]
+ python/pyspark/pandas/indexes/category.py:558: error: Incompatible return value type (got "Index", expected "CategoricalIndex | None")  [return-value]
+ python/pyspark/pandas/indexes/base.py:2021: error: Invalid index type "int | Any | tuple[Any, ...] | list[int | Any | tuple[Any, ...]]" for "list[Any | tuple[Any, ...]]"; expected type "SupportsIndex"  [index]
+ python/pyspark/pandas/indexes/base.py:2091: error: Argument 1 to "from_tuples" of "MultiIndex" has incompatible type "Index"; expected "list[tuple[Any, ...]]"  [arg-type]

@ilevkivskyi
Copy link
Member Author

Although the remaining errors are arguably correct, they are still weird. It looks like they appear because previously

```python
class C:
    @no_type_check
    def __new__(cls): ...
```

resulted in Any as the type of the C class object. But somehow it is now C in the case of an import cycle. FWIW, I can't reproduce this in a simple test.
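For context on the snippet above, here is what `typing.no_type_check` does at runtime: it simply marks the decorated callable with a `__no_type_check__ = True` attribute, which type checkers interpret as "do not check annotations here" (mypy then treats the function, and hence the constructor, as Any). The class body below is just a runnable demonstration of that marker; it says nothing about mypy's import-cycle behaviour.

```python
from typing import no_type_check

class C:
    @no_type_check
    def __new__(cls):
        # The decorator does not change runtime behaviour at all;
        # it only sets a marker attribute consumed by type checkers.
        return object.__new__(cls)

print(C.__new__.__no_type_check__)  # the marker set by @no_type_check
```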
