Hi, I cannot currently reproduce this as a bug, but I'm confident some people will stumble on it in the long run (a colleague of mine just did, using Airflow 1.10, and I think the issue is the same in Airflow 2). So I'm opening this issue to at least document the subject.
The filesystem sensor has used glob matching since PR #5358.
Yet this sensor can be used interchangeably with hooks that refer to a remote FS, and glob does not handle that.
On the one hand, the Python documentation states that glob() uses a combination of os.scandir() and fnmatch.fnmatch(), which makes the code suitable only for a local FS. On the other hand, Airflow provides hooks like the SFTPHook that manage a remote FS (not accessible through "os"), and those hooks are eligible for the sensor via inheritance.
Thus, using a path with a glob pattern together with a hook to a remote FS should result in inconsistent behaviour:
- either you're lucky and glob() does not find an equivalent path locally, so it just reports that the path does not exist (the sensor will never trigger);
- or, in the worse case, the sensor triggers for a file that exists locally but not on the remote FS as expected (a false trigger).
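The false-trigger case can be illustrated without Airflow. This is only a minimal sketch: `remote_listing` is a hypothetical stand-in for whatever a remote hook would report, and the file name is made up. The point is that `glob.glob()` only ever looks at the local disk.

```python
import glob
import os
import tempfile

# Hypothetical stand-in for the file names a remote hook would list.
remote_listing = []  # the remote directory is empty

with tempfile.TemporaryDirectory() as local_dir:
    # A file that happens to exist locally at the "same" path.
    open(os.path.join(local_dir, "data_2021.csv"), "w").close()

    pattern = os.path.join(local_dir, "data_*.csv")
    # glob() walks the local filesystem; it knows nothing about the remote side.
    local_matches = glob.glob(pattern)

# The pattern matched locally although the remote side has nothing.
false_trigger = bool(local_matches) and not remote_listing
```

Here `false_trigger` ends up true: a sensor relying on the glob result would fire even though no matching file exists remotely.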
In my opinion this could be fixed in two ways:
- glob compatibility should be exposed as a function of the hook (hook.hasGlobbing() -> true/false; false by default in SFTPHook etc., and of course true for a local FS, to keep the current behaviour for already existing DAGs), and the sensor should adapt its behaviour accordingly;
- the sensor should avoid using globs when possible (because they are not 100% portable), by offering things like startsWith or endsWith path search functions (implemented as a directory listing + lookup, which would be the portable way to do things).
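Both suggestions could look roughly like the sketch below. Everything in it is an assumption made for illustration: `FakeRemoteHook`, `has_globbing()` (a snake_case spelling of the `hasGlobbing()` idea), `list_directory()` and `find_matches()` are hypothetical names, not existing Airflow APIs. The matching uses only string operations, so it does not depend on the local filesystem.

```python
import fnmatch
from typing import Iterable, List


class FakeRemoteHook:
    """Hypothetical stand-in for a remote hook; not an Airflow class."""

    def __init__(self, names: Iterable[str]) -> None:
        self._names = list(names)

    def has_globbing(self) -> bool:
        # First suggestion: a remote hook reports that local glob() does not apply.
        return False

    def list_directory(self, path: str) -> List[str]:
        # A real hook would ask the remote server; here the listing is canned.
        return list(self._names)


def find_matches(hook, directory: str, pattern: str = "*",
                 prefix: str = "", suffix: str = "") -> List[str]:
    # Second suggestion: list the directory through the hook, then filter names.
    names = hook.list_directory(directory)
    return sorted(
        name for name in names
        if name.startswith(prefix)
        and name.endswith(suffix)
        and fnmatch.fnmatchcase(name, pattern)  # pure string matching, no FS access
    )


hook = FakeRemoteHook(["data_2021.csv", "data_2022.csv", "notes.txt"])
matches = find_matches(hook, "/incoming", prefix="data_", suffix=".csv")
```

A sensor could check `hook.has_globbing()` first and fall back to a lookup like `find_matches()` whenever the answer is false, which would leave the behaviour of local-FS DAGs unchanged.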
BR. And thanks for the existing code, bug or not :)
This discussion was converted from issue #15069 on January 31, 2022 21:08.