DataJoint import error due to missing pyarrow (a pandas dependency) #1202

Open
@ttngu207

Bug Report

Description

A fresh DataJoint installation on Python 3.10 succeeds.
However, on import (import datajoint as dj), the following error is raised:

Traceback (most recent call last):
  File "C:\Users\thinh\.conda\envs\microns_phase3\lib\site-packages\IPython\core\interactiveshell.py", line 3579, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-0b6eed5a3415>", line 1, in <module>
    import datajoint
  File "C:\Program Files\JetBrains\PyCharm 2023.3.4\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\thinh\.conda\envs\microns_phase3\lib\site-packages\datajoint\__init__.py", line 62, in <module>
    from .schemas import Schema
  File "C:\Program Files\JetBrains\PyCharm 2023.3.4\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\thinh\.conda\envs\microns_phase3\lib\site-packages\datajoint\schemas.py", line 10, in <module>
    from .jobs import JobTable
  File "C:\Program Files\JetBrains\PyCharm 2023.3.4\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\thinh\.conda\envs\microns_phase3\lib\site-packages\datajoint\jobs.py", line 4, in <module>
    from .table import Table
  File "C:\Program Files\JetBrains\PyCharm 2023.3.4\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\thinh\.conda\envs\microns_phase3\lib\site-packages\datajoint\table.py", line 6, in <module>
    import pandas
  File "C:\Program Files\JetBrains\PyCharm 2023.3.4\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\thinh\.conda\envs\microns_phase3\lib\site-packages\pandas\__init__.py", line 39, in <module>
    from pandas.compat import (
  File "C:\Program Files\JetBrains\PyCharm 2023.3.4\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\thinh\.conda\envs\microns_phase3\lib\site-packages\pandas\compat\__init__.py", line 27, in <module>
    from pandas.compat.pyarrow import (
  File "C:\Program Files\JetBrains\PyCharm 2023.3.4\plugins\python\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "C:\Users\thinh\.conda\envs\microns_phase3\lib\site-packages\pandas\compat\pyarrow.py", line 10, in <module>
    _palv = Version(Version(pa.__version__).base_version)
AttributeError: module 'pyarrow' has no attribute '__version__'

Upon further investigation, it looks like pandas>2.2 requires pyarrow as a dependency. However, pyarrow is not explicitly specified as a requirement for pandas (for good reasons, with lots of things to consider; see this discussion), so it is not installed when pandas is installed.
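The traceback ends in an AttributeError rather than an ImportError, so "import pyarrow" evidently found something, just not a module exposing __version__. A minimal diagnostic sketch, using only the standard library, can show what the interpreter resolves for a given name. The helper name describe_module is hypothetical and not part of DataJoint, pandas, or pyarrow; the interpretation of the result is left to the reader.

```python
import importlib
import importlib.util
from importlib import metadata


def describe_module(name):
    """Report how the interpreter resolves a top-level module name.

    Hypothetical helper for debugging; it only inspects, it fixes nothing.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing importable under this name at all.
        return {"importable": False}

    # Version recorded in the installed distribution metadata, if any.
    # Assumes the distribution name matches the module name.
    try:
        dist_version = metadata.version(name)
    except metadata.PackageNotFoundError:
        dist_version = None

    # Try the actual import and record any failure instead of raising.
    try:
        module = importlib.import_module(name)
        import_error = None
    except Exception as exc:
        module = None
        import_error = repr(exc)

    return {
        "importable": True,
        # origin is None for namespace packages (a directory without __init__.py).
        "origin": spec.origin,
        "search_locations": list(spec.submodule_search_locations or []),
        "has_version_attr": module is not None and hasattr(module, "__version__"),
        "dist_version": dist_version,
        "import_error": import_error,
    }


if __name__ == "__main__":
    for key, value in describe_module("pyarrow").items():
        print(f"{key}: {value}")
```

If the report shows a resolvable pyarrow with no origin and no distribution version, a leftover or partial pyarrow directory in site-packages would be one possible explanation, though that is an assumption to verify in the affected environment.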

For DataJoint, we can do one of the following:

  1. pin pandas<2
  2. install pandas[pyarrow]
  3. add pyarrow as a dependency in pyproject.toml
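To see which of the options above an existing environment already satisfies, a small preflight check can read the installed versions from the distribution metadata. This is a sketch under the issue's own hypothesis (pandas 2.x without pyarrow is the problematic combination); the function names are hypothetical, and the version parsing is deliberately naive (it only reads the leading major number).

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major(version):
    """Leading major number of a version string such as '2.2.3' (naive parse)."""
    return int(version.split(".")[0])


def check_environment(versions):
    """Classify an environment given a mapping of name -> version or None.

    Follows the hypothesis stated in the issue; it is not a pandas or
    DataJoint rule.
    """
    pandas_v = versions.get("pandas")
    pyarrow_v = versions.get("pyarrow")
    if pandas_v is None:
        return "pandas is not installed"
    if major(pandas_v) < 2:
        return "ok: pandas<2 (option 1)"
    if pyarrow_v is not None:
        return "ok: pyarrow is installed (option 2 or 3)"
    return "at risk: pandas>=2 without pyarrow"


if __name__ == "__main__":
    current = {name: installed_version(name) for name in ("pandas", "pyarrow")}
    print(current)
    print(check_environment(current))
```

Passing the versions in as a mapping keeps the classification testable without installing or uninstalling anything.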

Reproducibility

Include:

  • OS: Windows
  • Python Version: 3.10
  • DataJoint Version: 0.14.3

Metadata

Labels

bug: Indicates an unexpected problem or unintended behavior
