s3_object fails to copy in AWS when source is larger than 5GiB #2117

Closed
1 task done
colin-nolan opened this issue May 29, 2024 · 3 comments · Fixed by #2509
Labels
bug This issue/PR relates to a bug


colin-nolan commented May 29, 2024

Summary

amazon.aws.s3_object fails to copy files within AWS when they are larger than 5GiB. We encountered this issue when copying between buckets (mode: copy with copy_src set), but it likely affects all copy usage.

I'd guess that switching to a multipart upload strategy is required for files over 5GiB.
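
To make the size limit concrete, here is a minimal sketch of the arithmetic a multipart strategy would need. The threshold is the 5368709120-byte figure S3 reports for this failure (5 * 1024**3). The helper names `needs_multipart_copy` and `part_ranges` and the 1 GiB default part size are assumptions for illustration, not taken from the module or from boto3. The `bytes=first-last` strings follow the byte-range format S3 accepts for part copies.

```python
GIB = 1024 ** 3
# Largest source S3 accepts for a single CopyObject request.
MAX_SINGLE_COPY_BYTES = 5 * GIB  # 5368709120


def needs_multipart_copy(size_bytes: int) -> bool:
    """Return True when the source is too large for a single CopyObject call."""
    return size_bytes > MAX_SINGLE_COPY_BYTES


def part_ranges(size_bytes: int, part_size: int = GIB) -> list[str]:
    """Split an object into inclusive, zero-based 'bytes=first-last' ranges.

    part_size is a hypothetical default; each range covers at most
    part_size bytes and the last range may be shorter.
    """
    if size_bytes <= 0 or part_size <= 0:
        raise ValueError("size_bytes and part_size must be positive")
    ranges = []
    start = 0
    while start < size_bytes:
        end = min(start + part_size, size_bytes) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges
```

For a 7 GiB object such as the one in this report, `needs_multipart_copy(7 * GIB)` is true and `part_ranges(7 * GIB)` yields seven 1 GiB ranges.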

Issue Type

Bug Report

Component Name

s3_object

Ansible Version

$ ansible --version
ansible [core 2.16.7]
  config file = <redacted>/ansible/ansible.cfg
  configured module search path = ['<redacted>/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = <redacted>/ansible/.venv/lib/python3.12/site-packages/ansible
  ansible collection location = <redacted>/.ansible/collections:/usr/share/ansible/collections
  executable location = <redacted>/ansible/.venv/bin/ansible
  python version = 3.12.3 (main, Apr  9 2024, 08:09:14) [Clang 15.0.0 (clang-1500.3.9.4)] (<redacted>/ansible/.venv/bin/python)
  jinja version = 3.1.4
  libyaml = True

Collection Versions

$ ansible-galaxy collection list
# <redacted>/.ansible/collections/ansible_collections
Collection                               Version
---------------------------------------- -------
amazon.aws                               8.0.0  

# <redacted>/ansible/.venv/lib/python3.12/site-packages/ansible_collections
Collection                               Version
---------------------------------------- -------
amazon.aws                               7.6.0  
ansible.netcommon                        5.3.0  
ansible.posix                            1.5.4  
ansible.utils                            2.12.0 
ansible.windows                          2.3.0  
arista.eos                               6.2.2  
awx.awx                                  23.9.0 
azure.azcollection                       1.19.0 
check_point.mgmt                         5.2.3  
chocolatey.chocolatey                    1.5.1  
cisco.aci                                2.9.0  
cisco.asa                                4.0.3  
cisco.dnac                               6.13.3 
cisco.intersight                         2.0.9  
cisco.ios                                5.3.0  
cisco.iosxr                              6.1.1  
cisco.ise                                2.9.1  
cisco.meraki                             2.18.1 
cisco.mso                                2.6.0  
cisco.nxos                               5.3.0  
cisco.ucs                                1.10.0 
cloud.common                             2.1.4  
cloudscale_ch.cloud                      2.3.1  
community.aws                            7.2.0  
community.azure                          2.0.0  
community.ciscosmb                       1.0.9  
community.crypto                         2.20.0 
community.digitalocean                   1.26.0 
community.dns                            2.9.1  
community.docker                         3.10.1 
community.general                        8.6.1  
community.grafana                        1.8.0  
community.hashi_vault                    6.2.0  
community.hrobot                         1.9.2  
community.library_inventory_filtering_v1 1.0.1  
community.libvirt                        1.3.0  
community.mongodb                        1.7.4  
community.mysql                          3.9.0  
community.network                        5.0.2  
community.okd                            2.3.0  
community.postgresql                     3.4.1  
community.proxysql                       1.5.1  
community.rabbitmq                       1.3.0  
community.routeros                       2.15.0 
community.sap                            2.0.0  
community.sap_libs                       1.4.2  
community.sops                           1.6.7  
community.vmware                         4.4.0  
community.windows                        2.2.0  
community.zabbix                         2.4.0  
containers.podman                        1.13.0 
cyberark.conjur                          1.2.2  
cyberark.pas                             1.0.25 
dellemc.enterprise_sonic                 2.4.0  
dellemc.openmanage                       8.7.0  
dellemc.powerflex                        2.4.0  
dellemc.unity                            1.7.1  
f5networks.f5_modules                    1.28.0 
fortinet.fortimanager                    2.5.0  
fortinet.fortios                         2.3.6  
frr.frr                                  2.0.2  
gluster.gluster                          1.0.2  
google.cloud                             1.3.0  
grafana.grafana                          2.2.5  
hetzner.hcloud                           2.5.0  
hpe.nimble                               1.1.4  
ibm.qradar                               2.1.0  
ibm.spectrum_virtualize                  2.0.0  
ibm.storage_virtualize                   2.3.1  
infinidat.infinibox                      1.4.5  
infoblox.nios_modules                    1.6.1  
inspur.ispim                             2.2.1  
inspur.sm                                2.3.0  
junipernetworks.junos                    5.3.1  
kaytus.ksmanage                          1.2.1  
kubernetes.core                          2.4.2  
lowlydba.sqlserver                       2.3.2  
microsoft.ad                             1.5.0  
netapp.aws                               21.7.1 
netapp.azure                             21.10.1
netapp.cloudmanager                      21.22.1
netapp.elementsw                         21.7.0 
netapp.ontap                             22.11.0
netapp.storagegrid                       21.12.0
netapp.um_info                           21.8.1 
netapp_eseries.santricity                1.4.0  
netbox.netbox                            3.18.0 
ngine_io.cloudstack                      2.3.0  
ngine_io.exoscale                        1.1.0  
openstack.cloud                          2.2.0  
openvswitch.openvswitch                  2.1.1  
ovirt.ovirt                              3.2.0  
purestorage.flasharray                   1.28.0 
purestorage.flashblade                   1.17.0 
purestorage.fusion                       1.6.1  
sensu.sensu_go                           1.14.0 
splunk.es                                2.1.2  
t_systems_mms.icinga_director            2.0.1  
telekom_mms.icinga_director              1.35.0 
theforeman.foreman                       3.15.0 
vmware.vmware_rest                       2.3.1  
vultr.cloud                              1.12.1 
vyos.vyos                                4.1.0  
wti.remote                               1.0.5  

AWS SDK versions

$ pip show boto boto3 botocore
WARNING: Package(s) not found: boto
Name: boto3
Version: 1.34.99
Summary: The AWS SDK for Python
Home-page: https://github.com/boto/boto3
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: <redacted>/ansible/.venv/lib/python3.12/site-packages
Requires: botocore, jmespath, s3transfer
Required-by: 
---
Name: botocore
Version: 1.34.99
Summary: Low-level, data-driven core of boto 3.
Home-page: https://github.com/boto/botocore
Author: Amazon Web Services
Author-email: 
License: Apache License 2.0
Location: <redacted>/ansible/.venv/lib/python3.12/site-packages
Requires: jmespath, python-dateutil, urllib3
Required-by: boto3, s3transfer

Configuration

$ ansible-config dump --only-changed
CONFIG_FILE() = <redacted>/ansible/ansible.cfg
DEFAULT_INVENTORY_PLUGIN_PATH(<redacted>/ansible/ansible.cfg) = ['<redacted>/ansible/plugins/inventory']
DUPLICATE_YAML_DICT_KEY(<redacted>/ansible/ansible.cfg) = ignore
INVENTORY_IGNORE_EXTS(<redacted>/ansible/ansible.cfg) = ["{{(REJECT_EXTS + ('.orig'", '.cfg', "'.retry'))}}"]
INVENTORY_UNPARSED_IS_FAILED(<redacted>/ansible/ansible.cfg) = True

OS / Environment

N/A

Steps to Reproduce

- amazon.aws.s3_object:
    bucket: bucket-wanting-big-file
    mode: copy
    copy_src:
      bucket: bucket-with-big-file

Expected Results

Expected any files over 5GiB to be copied to the destination bucket in an idempotent manner.

Actual Results

Task failure, resulting in the traceback:

The full traceback is:
Traceback (most recent call last):
  File "/var/folders/hh/s685n1156js1ll921mgz3b8r0000gn/T/ansible_amazon.aws.s3_object_payload_p8li_bbd/ansible_amazon.aws.s3_object_payload.zip/ansible_collections/amazon/aws/plugins/modules/s3_object.py", line 1320, in copy_object_to_bucket
    s3.copy_object(aws_retry=True, **params)
  File "/var/folders/hh/s685n1156js1ll921mgz3b8r0000gn/T/ansible_amazon.aws.s3_object_payload_p8li_bbd/ansible_amazon.aws.s3_object_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/retries.py", line 105, in deciding_wrapper
    return retrying_wrapper(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/folders/hh/s685n1156js1ll921mgz3b8r0000gn/T/ansible_amazon.aws.s3_object_payload_p8li_bbd/ansible_amazon.aws.s3_object_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/cloud.py", line 119, in _retry_wrapper
    return _retry_func(
           ^^^^^^^^^^^^
  File "/var/folders/hh/s685n1156js1ll921mgz3b8r0000gn/T/ansible_amazon.aws.s3_object_payload_p8li_bbd/ansible_amazon.aws.s3_object_payload.zip/ansible_collections/amazon/aws/plugins/module_utils/cloud.py", line 68, in _retry_func
    return func()
           ^^^^^^
  File "<redacted>/ansible/.venv/lib/python3.12/site-packages/botocore/client.py", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<redacted>/ansible/.venv/lib/python3.12/site-packages/botocore/client.py", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidRequest) when calling the CopyObject operation: The specified copy source is larger than the maximum allowable size for a copy source: 5368709120

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/folders/hh/s685n1156js1ll921mgz3b8r0000gn/T/ansible_amazon.aws.s3_object_payload_p8li_bbd/ansible_amazon.aws.s3_object_payload.zip/ansible_collections/amazon/aws/plugins/modules/s3_object.py", line 1579, in main
    func(module, s3, s3_v4, s3_object_params)
  File "/var/folders/hh/s685n1156js1ll921mgz3b8r0000gn/T/ansible_amazon.aws.s3_object_payload_p8li_bbd/ansible_amazon.aws.s3_object_payload.zip/ansible_collections/amazon/aws/plugins/modules/s3_object.py", line 1354, in s3_object_do_copy
    updated, result = copy_object_to_bucket(
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/var/folders/hh/s685n1156js1ll921mgz3b8r0000gn/T/ansible_amazon.aws.s3_object_payload_p8li_bbd/ansible_amazon.aws.s3_object_payload.zip/ansible_collections/amazon/aws/plugins/modules/s3_object.py", line 1331, in copy_object_to_bucket
    raise S3ObjectFailure(
S3ObjectFailure: Failed while copying object 7G.bin from bucket None.
fatal: [staging]: FAILED! => {
    "boto3_version": "1.34.99",
    "botocore_version": "1.34.99",
    "changed": false,
    "error": {
        "code": "InvalidRequest",
        "message": "The specified copy source is larger than the maximum allowable size for a copy source: 5368709120"
    },
    "invocation": {
        "module_args": {
            "access_key": "<redacted>",
            "aws_ca_bundle": null,
            "aws_config": null,
            "bucket": "<redacted>",
            "ceph": false,
            "content": null,
            "content_base64": null,
            "copy_src": {
                "bucket": "<redacted>",
                "object": null,
                "prefix": "",
                "version_id": null
            },
            "debug_botocore_endpoint_logs": false,
            "dest": null,
            "dualstack": false,
            "encrypt": true,
            "encryption_kms_key_id": null,
            "encryption_mode": "AES256",
            "endpoint_url": null,
            "expiry": 600,
            "headers": null,
            "ignore_nonexistent_bucket": false,
            "marker": "",
            "max_keys": 1000,
            "metadata": null,
            "mode": "copy",
            "object": null,
            "overwrite": "different",
            "permission": [],
            "prefix": "",
            "profile": null,
            "purge_tags": true,
            "region": null,
            "retries": 0,
            "secret_key": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "session_token": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "sig_v4": true,
            "src": null,
            "tags": null,
            "validate_bucket_name": true,
            "validate_certs": true,
            "version": null
        }
    },
    "msg": "Failed while copying object 7G.bin from bucket None.: An error occurred (InvalidRequest) when calling the CopyObject operation: The specified copy source is larger than the maximum allowable size for a copy source: 5368709120",
    "response_metadata": {
        "host_id": "<redacted>",
        "http_headers": {
            "connection": "close",
            "content-type": "application/xml",
            "date": "Wed, 29 May 2024 12:59:34 GMT",
            "server": "AmazonS3",
            "transfer-encoding": "chunked",
            "x-amz-id-2": "<redacted>",
            "x-amz-request-id": "<redacted>"
        },
        "http_status_code": 400,
        "request_id": "<redacted>",
        "retry_attempts": 0
    }
}

Code of Conduct

  • I agree to follow the Ansible Code of Conduct
@colin-nolan colin-nolan changed the title s3_object fails to copy in AWS when source is larger than 5GiB s3_object fails to copy in AWS when source is larger than 5GiB May 29, 2024
@gravesm gravesm added bug This issue/PR relates to a bug and removed needs_triage labels Jun 4, 2024
@alinabuzachis alinabuzachis self-assigned this Jun 26, 2024
@colin-nolan
Author

@alinabuzachis many thanks for assigning yourself to this one. I understand that you undoubtedly have a lot to do, but could you give an indication of whether/when this might sit on your roadmap?

Ideally, we would push a fix up from our side but we're not currently in a great position to do this. I'm trying to determine whether we should invest in a temp work-around, or just put up with manually syncing some of our larger data until a fix is in place.

Thanks again.

fuggla commented Feb 4, 2025

I had the same problem on 7.6.0. However, everything works as expected after updating to 9.1.1!

@tremble
Contributor

tremble commented Feb 4, 2025

@fuggla There was a separate bug that could trigger out of memory errors (#2107), which was fixed in 8.0.1.

@colin-nolan I've pushed #2509 which should address this issue.

The underlying problem for this issue is that the API doesn't allow server-side copies for 5G+ files; this can only be fixed by using download/upload instead of copy_object. #2509 switches over to boto3's S3 "copy" method, which switches between the two mechanisms depending on the size of the file. If you're able to verify that this fixes your issue, it would be appreciated.
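
For anyone wanting to try the same approach outside the module, the change in call shape can be sketched roughly as below. This is not the code from #2509: `copy_any_size` and its parameter names are hypothetical, and the client is passed in so the sketch does not depend on how credentials are configured. The only boto3 API assumed is the S3 client's managed `copy(CopySource, Bucket, Key, ExtraArgs=...)` method, used here in place of `copy_object`.

```python
from typing import Any, Optional


def copy_any_size(
    client: Any,
    src_bucket: str,
    src_key: str,
    dst_bucket: str,
    dst_key: str,
    extra_args: Optional[dict] = None,
) -> None:
    """Copy one object via the client's managed ``copy`` method.

    ``client`` is assumed to be a boto3 S3 client (or anything exposing a
    compatible ``copy`` method).
    """
    copy_source = {"Bucket": src_bucket, "Key": src_key}
    # Managed transfer: unlike a direct copy_object call, this is expected
    # to cope with sources above the 5 GiB single-request limit.
    client.copy(
        CopySource=copy_source,
        Bucket=dst_bucket,
        Key=dst_key,
        ExtraArgs=extra_args,
    )
```

With a real client this might be called as `copy_any_size(boto3.client("s3"), "bucket-with-big-file", "7G.bin", "bucket-wanting-big-file", "7G.bin", {"ServerSideEncryption": "AES256"})`, using the bucket names and object key from the report above.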

patchback bot pushed a commit that referenced this issue Feb 4, 2025
SUMMARY
fixes: #2117
The copy_object API call has a built-in limit of 5G when copying objects.  The copy method is aware of this limit and performs a multipart download/upload instead when the 5G limit has been exceeded.
ISSUE TYPE

Bugfix Pull Request

COMPONENT NAME
s3_object
ADDITIONAL INFORMATION
See also: boto/boto3#1715

Reviewed-by: Bikouo Aubin
(cherry picked from commit 21306fd)
softwarefactory-project-zuul bot pushed a commit that referenced this issue Feb 4, 2025
This is a backport of PR #2509 as merged into main (21306fd).
SUMMARY
fixes: #2117
The copy_object API call has a built-in limit of 5G when copying objects.  The copy method is aware of this limit and performs a multipart download/upload instead when the 5G limit has been exceeded.
ISSUE TYPE

Bugfix Pull Request

COMPONENT NAME
s3_object
ADDITIONAL INFORMATION
See also: boto/boto3#1715

Reviewed-by: Mark Chappell