-
I do not think this is a known issue, and you should likely open a "real issue" for it. However, it might be that this is simply a problem with the client (or maybe a timeout is not propagated as an error?). Hard to say: we have a few thousand of those operators, but there is a chance that if you open an issue someone will pick it up and look at it closely.
-
FWIW I'd wait / am waiting for #30067 to be merged. I've had several of these weird errors occur as well, and until we can update to the latest SDKs I'm just chalking it up to the fact that we are stuck on old versions of the SDK.
-
Do you have any solution for this problem? I've encountered the very same issue. I also found that the timestamp at which the operator was marked as success is the same as the last getQueryResults entry in the BigQuery logs. The weird task is also a multi-statement SQL with a transaction. I thought it might be a BigQuery Python SDK bug in detecting the state of a multi-statement job with a transaction.
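For anyone poking at this, below is a minimal sketch (not from the original post; the project, job id, and location are placeholders) of how the google-cloud-bigquery client can be used to compare the state of a multi-statement (script) job with the states of its child jobs, to see whether the parent is reported DONE before all children finish:

```python
from google.cloud import bigquery

# Placeholder project id, job id and location, not values from this thread.
client = bigquery.Client(project="my-project")

# Fetch the parent multi-statement (script) job and check its state.
parent = client.get_job("my_script_job_id", location="EU")
print(parent.job_id, parent.state)  # PENDING / RUNNING / DONE

# Each statement of a script runs as a child job; list them and compare
# their states with the parent's.
for child in client.list_jobs(parent_job=parent.job_id):
    print(child.job_id, child.job_type, child.state)
```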
-
What happened
It is true that the apache-airflow-providers-google version is 6.8.0 (Apr 2022) and Airflow is 2.5.0 / 2.4.2, so maybe updating the packages could fix the error. The error happened with a long query (~3 h), in which the task was marked as successful before it really finished. In the logs we can see both started at almost the same time, but Airflow thought it was already finished at 7:17 UTC, while BigQuery didn't really finish until 8:05 UTC. The query was the first step of a transactional script that calculates some ST_INTERSECTS and a cross join between 2 tables.
The log from the task in Airflow:
BigQuery job information:
The code used to run the operator:
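(The actual DAG code was not attached here, so the following is only a minimal sketch of what such a task could look like, assuming BigQueryInsertJobOperator from the google provider and a multi-statement, transactional query; the project, dataset, table and location names are placeholders.)

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Placeholder SQL: a multi-statement transaction doing a cross join with
# ST_INTERSECTS, roughly matching the kind of query described above.
TRANSACTIONAL_SQL = """
BEGIN TRANSACTION;

DELETE FROM `my-project.my_dataset.intersections` WHERE TRUE;

INSERT INTO `my-project.my_dataset.intersections` (id, other_id)
SELECT a.id, b.id
FROM `my-project.my_dataset.table_a` AS a
CROSS JOIN `my-project.my_dataset.table_b` AS b
WHERE ST_INTERSECTS(a.geom, b.geom);

COMMIT TRANSACTION;
"""

with DAG(
    dag_id="bigquery_long_transaction_example",  # placeholder DAG id
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    run_query = BigQueryInsertJobOperator(
        task_id="run_long_transactional_query",
        configuration={
            "query": {
                "query": TRANSACTIONAL_SQL,
                "useLegacySql": False,
            }
        },
        location="EU",            # placeholder
        project_id="my-project",  # placeholder
    )
```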
Because Airflow said the data was already processed, the following task failed, saying that the table was empty. I only had to wait for the query to really finish and clear the failed tasks, but I'm not sure why or when this can happen.
What you think should happen instead
I think the task should keep running until the query is finished. In this version there wasn't a deferrable parameter yet, but maybe the polling didn't work quite well, got detached from the job at some point, and the operator thought it was completed because of something I'm not seeing.
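As a side note, newer releases of apache-airflow-providers-google do expose a deferrable mode on BigQueryInsertJobOperator, which hands the polling off to the triggerer instead of the worker. A hedged sketch of that usage (placeholder names; check the provider changelog for the exact version that introduced deferrable):

```python
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Placeholder query; in practice this would be the long transactional script above.
LONG_SQL = "SELECT 1"

run_query_deferrable = BigQueryInsertJobOperator(
    task_id="run_long_transactional_query",
    configuration={"query": {"query": LONG_SQL, "useLegacySql": False}},
    location="EU",            # placeholder
    project_id="my-project",  # placeholder
    deferrable=True,          # only in newer provider versions; requires a running triggerer
)
```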
How to reproduce
I didn't try to test it because it takes too long to run. My idea was to ask whether you knew about this or had some theory to test it further, and if not, to upgrade apache-airflow-providers-google to 10.0.0 and hope it doesn't happen any more, or only happens in really long task runs.
Operating System
Ubuntu 20.04
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==6.0.0
apache-airflow-providers-cncf-kubernetes==4.4.0
apache-airflow-providers-common-sql==1.2.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-google==6.8.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-postgres==5.2.2
apache-airflow-providers-sendgrid==3.0.0
apache-airflow-providers-slack==6.0.0
apache-airflow-providers-sqlite==3.2.1