[python-package] Follow symlinks to lib_lightgbm in library-loading #6977
Thanks for your interest in LightGBM. Before we review this, can you please explain precisely how you ended up in a situation where the That is unexpected in any of the installation methods this project supports. |
Please also update your branch to the latest state of |
Of course, it is a bit odd; it's happening with a customized Databricks setup. Databricks has several built-in runtimes, e.g. 16.4 LTS, that come with lightgbm built in. When spinning up compute with a built-in runtime and attaching a notebook, everything works fine. For a more customized setup, Databricks supports custom containers that can be used to initialize compute instances with a custom environment. In my Docker container, I have lightgbm installed via a pixi environment. When running and attaching a Databricks notebook to the custom compute, Databricks sets up symlinks between a) the Python environment I am exposing via the container and b) the environment that's used in the notebook UI. So, when executing commands in the notebook UI, everything runs via a Python environment that's symlinked to the Docker container environment. All of the other Python libraries I'm using work fine in this setup. But when trying to use lightgbm, I have to patch a link between the environments, and it's a bit clunky. When looking at the path resolution logic, it seemed like 1) using Happy to provide more details if it's helpful. Thanks for taking a look. |
That is very, very weird and unexpected. I don't understand how anything about what you described would require symlinks (it seems like regular Docker volume mounts should be fine). But anyway, I'm just complaining about Databricks and its choices... that's not helpful because neither you nor I can change that behavior. Thanks for the details. I don't necessarily agree that this change is "a good idea regardless"... it adds a slight bit of complexity to debugging library-loading problems (now we have to consider symlinks). But I think it is probably harmless and I can see how it would help in the type of situation you described. So all that said... I'm happy to consider this! Please do the following:
Let's see if this passes in CI. |
Thanks a ton! (1) done and (2) done! |
@microsoft-github-policy-service agree company="QuantCo, Inc." |
/AzurePipelines run |
Looks good to me! Seeing all CI pass, and I spot-checked some logs and didn't see any issues.
Based on that + the very clear description you gave, I'm confident in merging this. Thanks for taking the time to contribute, we hope you'll come back and contribute more in the future 😊
Thanks @jameslamb! Really appreciate your helpful feedback and thorough review 😄 |
Motivation
When working with the LightGBM python package in an environment referenced via symlinks, the _find_lib_path() function fails to find the LightGBM library files.
For example, if I'm working in /path/to/my/environment/my/app/lgbm_code.py, but a symlink exists from
Then _find_lib_path() looks for the LightGBM library files in /path/to/my/environment/* instead of /path/to/the/real/environment/*.
I've come across this problem when trying to use LightGBM with Databricks' notebook UI while running on a compute created with a custom Docker image.
Changes
Changes the two existing Path.absolute() calls to Path.resolve().
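To illustrate why the switch matters, here is a minimal sketch of how the two pathlib methods differ when a file is reached through a symlinked directory. It is not LightGBM's actual library-loading code: the function name `compare_absolute_and_resolve` and the directory and file names are hypothetical, and it assumes a platform where the current user is allowed to create symlinks.

```python
import os
import tempfile
from pathlib import Path


def compare_absolute_and_resolve():
    """Return (real_env, link_env, dir_via_absolute, dir_via_resolve).

    Hypothetical illustration only; the names below are made up.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Resolve the temp dir itself first, since some platforms
        # place it behind a symlink of their own.
        base = Path(tmp).resolve()

        # The "real" environment where files physically live.
        real_env = base / "real_environment"
        real_env.mkdir()

        # A symlinked view of that environment.
        link_env = base / "my_environment"
        os.symlink(real_env, link_env, target_is_directory=True)

        # A stand-in for a module file, accessed through the symlink.
        module_file = link_env / "fake_module.py"
        module_file.touch()

        # absolute() makes the path absolute but keeps the symlink component.
        dir_via_absolute = module_file.absolute().parent
        # resolve() follows the symlink to the real location.
        dir_via_resolve = module_file.resolve().parent

        return real_env, link_env, dir_via_absolute, dir_via_resolve


if __name__ == "__main__":
    real_env, link_env, via_absolute, via_resolve = compare_absolute_and_resolve()
    print("absolute() stays in the symlinked dir:", via_absolute == link_env)
    print("resolve() lands in the real dir:", via_resolve == real_env)
```

Any lookup that builds candidate paths relative to the directory returned by absolute() would search under the symlinked location, while one based on resolve() would search under the real one.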
References
According to the docs, Path.absolute():
And Path.resolve():