Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
8.27.0
Apache Airflow version
2.7.2
Operating System
Amazon Linux
Deployment
Amazon (AWS) MWAA
Deployment details
Standard MWAA deployment - V2.7.2
What happened
SageMakerProcessingJobs have a hard limit of 64 characters for the ProcessingJobName.
In the SageMakerBaseOperator there is a check for uniqueness for the name.
If the name is not unique, the operator appends a timestamp to prevent a collision. However, there is no check to prevent the updated name from exceeding 64 characters, which causes creation of the SageMakerProcessingJob to fail.
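To illustrate the length overflow, here is a minimal sketch. `append_timestamp` is a hypothetical stand-in for the operator's suffixing step, not the provider's actual code. It assumes the suffix is a hyphen plus a 13-digit millisecond timestamp (14 characters), which would be consistent with the 50-character threshold in the reproduction steps.

```python
import time
from typing import Optional

MAX_JOB_NAME_LENGTH = 64  # limit as stated in this report


def append_timestamp(base_name: str, now_ms: Optional[int] = None) -> str:
    """Hypothetical stand-in: append '-<millisecond timestamp>' with no length check."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    return f"{base_name}-{now_ms}"


# Fixed timestamp so the lengths are reproducible (13 digits).
fixed_ms = 1700000000000

for base_length in (50, 51):
    candidate = append_timestamp("a" * base_length, now_ms=fixed_ms)
    too_long = len(candidate) > MAX_JOB_NAME_LENGTH
    print(base_length, len(candidate), too_long)
```

Under these assumptions, a 50-character base name lands exactly on the limit, and anything longer goes over it once the suffix is added.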
What you think should happen instead
The SageMaker Pipelines SDK truncates the base name before adding the timestamp, so we recommend taking a similar approach for consistency.
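A rough sketch of the truncate-then-append approach, for discussion only. The function name `unique_job_name`, the millisecond timestamp format, and the `max_length` default are assumptions made for illustration; this is not the SageMaker SDK's or the provider's implementation.

```python
import time
from typing import Optional

MAX_JOB_NAME_LENGTH = 64  # limit as stated in this report


def unique_job_name(
    base_name: str,
    now_ms: Optional[int] = None,
    max_length: int = MAX_JOB_NAME_LENGTH,
) -> str:
    """Truncate the base name so that base + '-<timestamp>' fits in max_length."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    suffix = f"-{now_ms}"
    keep = max_length - len(suffix)
    if keep < 1:
        raise ValueError("max_length leaves no room for the base name")
    # Slicing is a no-op for names that already fit.
    return base_name[:keep] + suffix


long_name = "my-team-feature-engineering-processing-job-for-daily-batch-scoring"
result = unique_job_name(long_name, now_ms=1700000000000)
print(result, len(result))
```

Truncating the base rather than the suffix keeps the timestamp intact, so uniqueness is preserved. The trade-off is that two long names sharing the same leading characters become harder to tell apart in the console.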
How to reproduce
Create a SageMaker ProcessingJob using the SageMakerProcessingOperator with a name longer than 50 characters, and trigger it more than once. On the second run the timestamp is appended, and the Airflow logs show an error stating that the SageMaker processing job failed to create because the name exceeds the character limit.
Anything else
This happens on every scheduled run that reuses the same name, after the first run.
Are you willing to submit PR?
Code of Conduct