Conversation

julianpollmann
Collaborator

@julianpollmann julianpollmann commented Jun 10, 2025

Hey @gojomo @hechth @piskvorky @mpenkov,
I've created a new PR for the migration to NumPy 2.0 and the removal of deprecated SciPy functions. What has been done so far:

  • Updated dtypes in functions whose tests were failing
  • Fixed some NumPy assertions (alltrue -> all)
  • Removed Python 3.8 from the supported versions
  • Fixed the deprecated SciPy import for csc_matrix
  • Fixed differing argsort behaviour, where tied results were ordered differently
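
As a rough illustration of the alltrue and argsort items (a minimal sketch with made-up data, not code from this PR): np.alltrue was removed in NumPy 2.0 and np.all is its replacement, and passing kind="stable" to argsort is one way to make the order of tied values deterministic. Whether the PR uses kind="stable" or a different fix is not stated in this thread, so treat that part as an assumption.

```python
import numpy as np

# Hypothetical similarity scores containing ties (illustrative data only).
scores = np.array([0.5, 0.9, 0.5, 0.9, 0.1])

# np.alltrue was removed in NumPy 2.0; np.all is the replacement.
all_non_negative = bool(np.all(scores >= 0))

# The default argsort algorithm does not guarantee the order of tied values.
# kind="stable" keeps ties in their original index order.
descending = np.argsort(-scores, kind="stable")

print(all_non_negative)     # True
print(descending.tolist())  # [1, 3, 0, 2, 4]
```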

What needs to be done:

Hope we get this done now 💪

@piskvorky
Owner

piskvorky commented Jul 24, 2025

@julianpollmann can you please merge develop into this PR & see what else is missing before merge?

Also, since we are upgrading numpy, maybe update cython as well? As per the report in #3611.

Inside #3611 I also see a recent link to this (claude-authored) commit: flext-sh@8c418b7 . Is there anything in there that might help with this PR?

pyproject.toml Outdated
# If we build our extensions with Cython 3.0.0, then they will be an
# order of magnitude slower, so avoid it for now.
#
"Cython>=0.29.32,<3.0.0",
Owner

@piskvorky piskvorky Jul 24, 2025

Is 3.1.2 any better regarding speed (see the comment above this line)?

If so let's change this line to 3.1.2 plus update that comment.

Collaborator Author

@piskvorky I removed the upper version bound and tested with 3.1.2. However, builds and tests take ages. It seems that Cython >=3 has stricter type checking, which makes it slower.

Owner

@piskvorky piskvorky Jul 29, 2025

I doubt it's type checking that takes ages. Maybe something (a dependency?) gets built from source, whereas it used to be pulled pre-built before?

Collaborator Author

@piskvorky I tried several things, unfortunately with no luck. In particular, testing test_doc2vec.py::TestDoc2VecModel seems to be much slower on Cython 3.1.2.
With the upgrade to 3.1.2, however, Python 3.13 seems to work?!

@julianpollmann
Collaborator Author

I'll look into that tomorrow!

@morotti
Contributor

morotti commented Jul 30, 2025

With the upgrade to 3.1.2, however, Python 3.13 seems to work?!

In case it helps: support for Python 3.13 was only added in Cython 3.1.x,
so you have to update Cython to support newer Python versions, but older Python versions could continue to use an older Cython if you want.
https://cython.readthedocs.io/en/latest/src/changes.html

If you need it, I think there is an old syntax (environment markers) for setup.py/setup.cfg to specify different versions per Python version.

 requires = [
    'Cython>=0.29.32,<3.0.0; python_version<"3.13"',
    'Cython>=3.1.2; python_version>="3.13"',
 ]
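
To make the effect of the two markers above concrete, here is a small sketch in plain Python of which pin each interpreter version would end up with. The function name cython_requirement is hypothetical and only mirrors the marker logic; the real selection is done by the packaging tools when they evaluate the markers.

```python
import sys


def cython_requirement(version_info=sys.version_info):
    """Return the Cython pin that the two markers above would select.

    Hypothetical helper for illustration; pip and build front ends
    evaluate the markers themselves.
    """
    # python_version compares only the major.minor part of the version.
    if tuple(version_info[:2]) >= (3, 13):
        return "Cython>=3.1.2"
    return "Cython>=0.29.32,<3.0.0"


print(cython_requirement((3, 12)))
print(cython_requirement((3, 13)))
```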

@julianpollmann
Collaborator Author

Hey @morotti @piskvorky @gojomo,
I updated the Cython version to >=3 on Python 3.13 accordingly. With this, the tests on Python 3.13 pass, but they are very slow (~45 min on my machine), and this feels more like a workaround.
The CI timeout might need to be adjusted.

I don't have enough Cython experience to judge, but profiling e.g. test_word2vec.py showed that train_batch_cbow and train_batch_sg are very slow (16-20 s). I suspect some kind of type conversion issue, but I don't know.
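
For anyone who wants to repeat this kind of measurement, a minimal profiling sketch with the standard library's cProfile could look like the following. The function slow_training_step is a hypothetical stand-in, not gensim code, and the thread does not say which profiler was actually used.

```python
import cProfile
import io
import pstats


def slow_training_step(n=200_000):
    """Hypothetical stand-in for a slow call such as a training batch."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, top=5):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the slowest call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(slow_training_step, 1000)
print(report)
```

The report lists the entries with the highest cumulative time; functions implemented in a compiled extension show up as built-in calls unless the extension was built with profiling enabled.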

Also, when compiling by hand I get a lot of messages like:

performance hint: gensim\models\word2vec_inner.pyx:246:0: Exception check on 'w2v_fast_sentence_cbow_hs' will always require the GIL to be acquired.
Possible solutions:
        1. Declare 'w2v_fast_sentence_cbow_hs' as 'noexcept' if you control the definition and you're sure you don't want the function to raise exceptions.
        2. Use an 'int' return type on 'w2v_fast_sentence_cbow_hs' to allow an error code to be returned.

Hope somebody can figure this out?!

@piskvorky
Owner

piskvorky commented Aug 1, 2025

That's an unusually lucid and helpful warning message / hint! Thanks for spotting it.

Can you please try adding that noexcept and see if it makes any difference?

@julianpollmann
Collaborator Author

julianpollmann commented Aug 2, 2025

@piskvorky It seems that Cython 3 checks for exceptions and therefore acquires the GIL. After adding noexcept there is a speedup. However, I cannot judge whether noexcept is appropriate for all nogil methods. It would be good if somebody with more Cython experience could have a look.

With this, the tests for Python 3.13 are passing 😀
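
One way to support such a review is to list the nogil functions that carry no explicit exception specification, since those are the ones the performance hint is about. The sketch below is a hypothetical regex-based helper, not part of this PR; it only looks at module-level cdef signatures in a text sample, and the sample signatures are made up.

```python
import re

# Matches a module-level cdef function signature, possibly spanning lines:
# group 1 = function name, group 2 = qualifiers between ")" and ":".
SIGNATURE = re.compile(
    r"^cdef[ \t]+[^\n(]*?(\w+)[ \t]*\(.*?\)[ \t]*([^:\n]*):",
    re.MULTILINE | re.DOTALL,
)

# Made-up Cython source used only to exercise the helper.
SAMPLE_PYX = '''
cdef void fast_step(
    const int size, float *work) nogil:
    pass

cdef void safe_step(const int size) noexcept nogil:
    pass

cdef int checked_step(const int size) except -1 nogil:
    return 0
'''


def missing_noexcept(source):
    """Return names of nogil cdef functions without noexcept/except."""
    names = []
    for match in SIGNATURE.finditer(source):
        name = match.group(1)
        qualifiers = match.group(2).split()
        has_exception_spec = any(
            q.startswith(("noexcept", "except")) for q in qualifiers
        )
        if "nogil" in qualifiers and not has_exception_spec:
            names.append(name)
    return names


print(missing_noexcept(SAMPLE_PYX))  # ['fast_step']
```

A function flagged here still needs a human decision: noexcept is only safe if the function can never raise.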

P.S.: I'll be N/A till Aug 18th, so cannot make any changes to this PR.

@julianpollmann
Collaborator Author

Hey @piskvorky, could you rerun the failed jobs from the CI workflows? There are some HTTP errors, which should disappear.
I updated to Cython 3, since noexcept did speed up the builds and specifying different versions would cause the CI build to fail.
I also disabled the Python 3.8 and 3.14 builds.
With that, everything should work.

@piskvorky
Owner

piskvorky commented Aug 27, 2025

Sure. Re-running now under https://github.com/piskvorky/gensim/actions/runs/17163546119?pr=3615

@julianpollmann
Collaborator Author

@piskvorky thanks. It looks like the builds should pass. There is one issue with test_parallel, which I encountered only on the CI:
self = <gensim.test.test_word2vec.TestWord2VecModel testMethod=test_parallel>

  def test_parallel(self):
      """Test word2vec parallel training."""
      corpus = utils.RepeatCorpus(LeeCorpus(), 10000)  # repeats about 33 times
  
      for workers in [4, ]:  # [4, 2]
          model = word2vec.Word2Vec(corpus, vector_size=16, min_count=(10 * 33), workers=workers)
          origin_word = 'israeli'
          expected_neighbor = 'palestinian'
          sims = model.wv.most_similar(origin_word, topn=len(model.wv))
          # the exact vectors and therefore similarities may differ, due to different thread collisions/randomization
          # so let's test only for topN
          neighbor_rank = [word for word, sim in sims].index(expected_neighbor)
>         self.assertLess(neighbor_rank, 6)
  E     AssertionError: 7 not less than 6

Rerunning the failed (macOS) job should fix this. Then the tests should pass.
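
For readers unfamiliar with the check in the traceback above: neighbor_rank is a zero-based position in the similarity list, so "7 not less than 6" means the expected neighbour came out eighth while the test accepts only the first six positions. A tiny sketch with made-up words and scores:

```python
def neighbor_rank(sims, expected):
    """Zero-based position of expected in a (word, similarity) list."""
    return [word for word, _ in sims].index(expected)


# Hypothetical result list, sorted by descending similarity.
sims = [(f"word{i}", 1.0 - 0.05 * i) for i in range(10)]

rank = neighbor_rank(sims, "word7")
print(rank)      # 7
print(rank < 6)  # False, which is what makes assertLess fail
```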

@piskvorky
Owner

piskvorky commented Aug 28, 2025

@julianpollmann re-run #3 still says "failed".

Is there a way for you to be able to (re)run these tests yourself? I added you as a "collaborator" now, please let me know if that helps!

@julianpollmann
Collaborator Author

Thanks, will try this evening!

@julianpollmann
Collaborator Author

@piskvorky Builds and tests are passing. I guess this can be merged.

@piskvorky
Owner

piskvorky commented Aug 29, 2025

Thanks a lot!

@mpenkov @gojomo please review. Then we can merge whenever ready, as other PRs depend on this one.

@ecederstrand

Any update on this PR? It would be really great to gain support for both numpy2 and Python 3.13 in gensim.

Collaborator

@mpenkov mpenkov left a comment

Good work!

@mpenkov
Collaborator

mpenkov commented Oct 2, 2025

@piskvorky @gojomo If there are no objections, let's merge and release

@piskvorky
Owner

I already OKed above; please merge & let's release!

Thanks to everyone, I know this has been in the oven for a very long time.

@julianpollmann julianpollmann merged commit 8f81545 into piskvorky:develop Oct 2, 2025
52 of 62 checks passed
