Discrepancy between playback and actual headers when dealing with binary data  #194

@kfcaio

@sigmavirus24 I wrote a test for a function that downloads a large zip file using the requests module. I found a discrepancy between the Content-Length header and the actual body length when the test runs with Betamax, compared with running the same code without it. With Betamax, the binary string extracted from the response is much larger. I also need to pass that binary string to BytesIO and then to zipfile.ZipFile, but doing so raises zipfile.BadZipFile: Bad magic number for central directory.

My test setup:

import betamax
from betamax.fixtures import unittest
import os


mode = os.getenv('BETAMAX_RECORD_MODE')
with betamax.Betamax.configure() as config:
    config.cassette_library_dir = 'tests/test_funcs/cassettes'
    config.default_cassette_options['record_mode'] = mode
    print(f'Using record mode <{mode}>')


def the_function(session):
    # session = requests.Session()
    from io import BytesIO
    from zipfile import ZipFile

    response = session.get("https://ww2.stj.jus.br/docs_internet/processo/dje/xml/stj_dje_20211011_xml.zip")

    zip_in_memory = BytesIO(response.content)

    try:
        my_zip = ZipFile(zip_in_memory, 'r')
        my_zip.testzip()
        result = True
    except Exception:
        result = False

    return result


class BaseTest(unittest.BetamaxTestCase):
    custom_headers = None
    custom_proxies = None
    _path_to_ignore = None
    _no_generator_return_search = False

    def setUp(self):
        super(BaseTest, self).setUp()
        if self.custom_headers:
            self.session.headers.update(self.custom_headers)
        if self.custom_proxies:
            self.session.proxies.update(self.custom_proxies)
        self.worker_under_test = self.worker_class()
        self.worker_under_test._session = self.session

    def test_search(self):
        result = the_function(self.session)
        assert result

I pass self.session to the function under test and use it to request an endpoint. From that endpoint, I get the zip file as a bytes string (response.content). The test runs without errors if I don't use the Betamax session.
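The symptom described above (a larger body plus a BadZipFile error) can be reproduced without any network access. The sketch below is only an illustration of one possible cause, namely binary bytes being decoded as text and re-encoded somewhere along the way; it is an assumption, not a confirmed description of what Betamax does. The helper names (make_zip_bytes, round_trip_as_text, is_valid_zip) and the latin-1/UTF-8 pair are hypothetical choices made for the example.

```python
import io
import zipfile


def make_zip_bytes() -> bytes:
    """Build a small zip archive in memory (stored, not compressed)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        # bytes(range(256)) guarantees the archive contains bytes >= 0x80.
        zf.writestr("sample.bin", bytes(range(256)))
    return buf.getvalue()


def round_trip_as_text(raw: bytes) -> bytes:
    """Hypothetical mangling step: treat binary data as text, then re-encode.

    Every byte >= 0x80 becomes a two-byte UTF-8 sequence, so the result
    is longer than the input and is no longer the same binary payload.
    """
    return raw.decode("latin-1").encode("utf-8")


def is_valid_zip(data: bytes) -> bool:
    """Return True only if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False


original = make_zip_bytes()
mangled = round_trip_as_text(original)

print("original length:", len(original), "valid:", is_valid_zip(original))
print("mangled length: ", len(mangled), "valid:", is_valid_zip(mangled))
```

If this is what happens during recording, Betamax's preserve_exact_body_bytes cassette option, which stores the body base64-encoded, may be worth trying; the cassette would need to be re-recorded afterwards.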

Test execution (with Betamax)

Session headers

{'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

Response headers

{'Accept-Ranges': 'bytes', 'ETag': 'W/"159406-1633990217000"', 'Last-Modified': 'Mon, 11 Oct 2021 22:10:17 GMT', 'Content-Type': 'application/zip', 'Content-Length': '159406', 'Date': 'Thu, 21 Oct 2021 14:37:27 GMT', 'Set-Cookie': 'BIGipServerpool_wserv=973081866.20480.0000; path=/; Httponly, TS01dc523b=016a5b383346ca02628a7c1dd47ef26e8cadf4a1b22fa9261c6b9ac1de8ac5665e99bd4a42c5b1d0af72b97105f57020b5e0f78fa7452df6080bf5ea3ee7a85d2de98968a2; Path=/; Domain=.www.stj.jus.br', 'Strict-Transport-Security': 'max-age=604800; includeSubDomains', 'Content-Security-Policy': "upgrade-insecure-requests; frame-ancestors 'self' https://*.stj.jus.br https://*.web.stj.jus.br https://stjjus.sharepoint.com/"}

Actual content length

len(response.content) == 288055

Script execution (without Betamax)

Session headers

{'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

Response headers

{'Accept-Ranges': 'bytes', 'ETag': 'W/"159406-1633990217000"', 'Last-Modified': 'Mon, 11 Oct 2021 22:10:17 GMT', 'Content-Type': 'application/zip', 'Content-Length': '159406', 'Date': 'Thu, 21 Oct 2021 14:39:24 GMT', 'Set-Cookie': 'BIGipServerpool_wserv=973081866.20480.0000; path=/; Httponly, TS01dc523b=016a5b3833746a54a2d1276a2b3de87f48f672e9cd7c18c4dad842ddddeac244bcbcf1a470b59eecf83bd6a3bdeffc7c7017210981de929d01df6c054118625399d2b04ad2; Path=/; Domain=.www.stj.jus.br', 'Strict-Transport-Security': 'max-age=604800; includeSubDomains', 'Content-Security-Policy': "upgrade-insecure-requests; frame-ancestors 'self' https://*.stj.jus.br https://*.web.stj.jus.br https://stjjus.sharepoint.com/"}

Actual content length

len(response.content) == 159406
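The comparison between the two runs can be turned into a small check. This is a minimal sketch that works on a plain dict of headers and a bytes body rather than on a real requests response; the function name content_length_delta is hypothetical, and the bodies are dummy zero bytes sized to match the lengths reported above.

```python
from typing import Mapping, Optional


def content_length_delta(headers: Mapping[str, str], body: bytes) -> Optional[int]:
    """Return len(body) minus the declared Content-Length.

    Returns None when the comparison is not meaningful: either the header
    is missing, or a Content-Encoding is set (a decoded body is expected
    to differ from the length sent on the wire).
    Header lookup here is case-sensitive because a plain dict is used.
    """
    if headers.get("Content-Encoding"):
        return None
    declared = headers.get("Content-Length")
    if declared is None:
        return None
    return len(body) - int(declared)


# Headers reduced to the fields relevant to the check.
headers = {"Content-Type": "application/zip", "Content-Length": "159406"}

# Dummy bodies with the lengths observed in the two runs.
script_body = b"\0" * 159406
betamax_body = b"\0" * 288055

print(content_length_delta(headers, script_body))   # 0
print(content_length_delta(headers, betamax_body))  # 128649
```

A non-zero result with no Content-Encoding header, as in the Betamax run, suggests the body was altered after it left the server.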

I'm using Python 3.8.2, Betamax 0.8.1, Requests 2.25.1, and Pytest 5.4.1 to run the tests.

Related question: https://stackoverflow.com/questions/69653406/how-to-mock-a-function-that-downloads-a-large-binary-content-using-betamax

Related issue: #122
