cache installers (e.g. miniconda one) locally #211

Open
yarikoptic opened this issue Nov 7, 2024 · 0 comments
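We should cache installers locally and avoid re-downloading them if they have not changed on the server.

For the miniconda installer, the server currently responds with: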

❯ curl -I https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
HTTP/2 200 
date: Thu, 07 Nov 2024 16:46:22 GMT
content-type: application/octet-stream
content-length: 148337011
x-amz-id-2: VLknl+KFa51qeFWTDSlHWCfiRg2FxD80aEU09CdhCYmTOZgsa6yDC13RaJA1Kv+ycx/Na4FebvM=
x-amz-request-id: NMX8JNVVAD0C8QV7
last-modified: Wed, 23 Oct 2024 09:21:04 GMT
x-amz-version-id: K6gTIgQxXjLAU4upfUiB1tKY8plWPo0w
etag: "2114faf08535c9ca25f31e0d17638506-18"
cf-cache-status: HIT
age: 1312399
expires: Thu, 07 Nov 2024 16:46:52 GMT
cache-control: public, max-age=30
accept-ranges: bytes
set-cookie: __cf_bm=xDK8MEi2Qz2R2PpnSIKSlLMx5vPYd2hp93Wh.RUEWK4-1730997982-1.0.1.1-wh9vFWkx8a4X78iDzEiaTPqzrCj2.3samxW7YYqhEPW.I9IH7qAM4BVAfNPsUFJZdFEB10gUK4AIjSrgP9TjVA; path=/; expires=Thu, 07-Nov-24 17:16:22 GMT; domain=.anaconda.com; HttpOnly; Secure; SameSite=None
content-security-policy: upgrade-insecure-requests; frame-ancestors 'self' https://content.anaconda.com/
x-robots-tag: noindex
server: cloudflare
cf-ray: 8deecc90bdda6ac5-BOS

So the response carries both etag and cache-control headers (plus last-modified), which is enough to revalidate a cached copy.

Some info on the matter from ChatGPT:

Python’s standard libraries don’t offer built-in support for handling ETag or Cache-Control headers directly for caching HTTP responses. However, you can implement this logic by combining modules like requests (third-party) or http.client with custom caching logic. Here’s an overview of how you can handle caching with ETag and Cache-Control:

  1. ETag Handling: When you first request a resource, the server might return an ETag header. For subsequent requests, you can send this ETag as an If-None-Match header. If the server responds with 304 Not Modified, you know the content hasn’t changed, and you can use your cached version.

  2. Cache-Control Header Handling: The server might provide a Cache-Control or Expires header that specifies how long the response is valid. You can store the expiration time alongside the cached response and check this before re-requesting.
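The two steps above can be sketched as small helpers that need no network access. This is only an illustration: the function names `max_age_seconds`, `is_fresh`, and `conditional_headers` are hypothetical and not part of any library. Note that the response shown earlier advertises `max-age=30`, so for this server the freshness window is short and ETag revalidation would do most of the work.

```python
import re
import time


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or 0 if absent."""
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control or "", re.IGNORECASE)
    return int(match.group(1)) if match else 0


def is_fresh(fetched_at, cache_control, now=None):
    """True while the cached copy is still inside its max-age window."""
    now = time.time() if now is None else now
    return now < fetched_at + max_age_seconds(cache_control)


def conditional_headers(etag=None, last_modified=None):
    """Build revalidation headers from validators saved with the cached copy."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers
```

A caller would first check `is_fresh(...)` and, only if the copy is stale, send a request carrying `conditional_headers(...)`; a 304 response then means the cached file can be reused.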

Here’s a simplified example using requests and shelve to cache responses with ETag and Cache-Control:

import re
import shelve
import time

import requests

# Initialize a simple cache using shelve
cache = shelve.open('http_cache')

def parse_cache_control(cache_control):
    # Very simple Cache-Control parser: extract max-age (in seconds).
    # A regex handles any directive order, e.g. "max-age=30, public".
    match = re.search(r'max-age\s*=\s*(\d+)', cache_control)
    # No caching if Cache-Control header isn’t usable
    return int(match.group(1)) if match else 0

def get_with_cache(url):
    cached_response = cache.get(url)
    headers = {}

    if cached_response:
        # Still fresh according to max-age: skip the server call entirely
        if cached_response['expires'] > time.time():
            print("Using cached version (still fresh)")
            return cached_response['content']
        # Expired: revalidate with the server using the stored ETag, if any
        if cached_response.get('etag'):
            headers['If-None-Match'] = cached_response['etag']

    # Make the request
    response = requests.get(url, headers=headers)
    max_age = parse_cache_control(response.headers.get('Cache-Control', ''))

    # Handle 304 Not Modified: content unchanged, so only extend the freshness
    if response.status_code == 304:
        print("Using cached version (not modified)")
        cached_response['expires'] = time.time() + max_age
        cache[url] = cached_response
        return cached_response['content']

    # Do not cache error responses
    response.raise_for_status()
    print("Fetched new version")
    # Update cache with new content, ETag and expiration.
    # Use response.content (bytes) because the installer is a binary file.
    cache[url] = {
        'content': response.content,
        'etag': response.headers.get('ETag'),
        'expires': time.time() + max_age,
    }
    return response.content

# Usage
url = 'https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh'
content = get_with_cache(url)

# Close the cache when done
cache.close()

Explanation

  • etag: Stores the ETag from the server and sends it as If-None-Match to avoid redownloading if unchanged.
  • expires: Determines when to skip the server call if max-age is provided in Cache-Control.

This code requires requests, a third-party library that greatly simplifies HTTP request and response handling. With only the standard library, you’d use http.client or urllib.request and implement the caching manually.
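For comparison, here is a minimal standard-library sketch of the same idea using urllib.request, streaming the download to a file instead of holding it in memory. It is an assumption-laden illustration, not a proposed implementation: the function name `fetch_if_changed`, the `.meta.json` sidecar file, the `.part` temporary name, and the injectable `urlopen` parameter are all hypothetical choices made for this example.

```python
import json
import os
import shutil
import urllib.error
import urllib.request


def fetch_if_changed(url, dest, urlopen=urllib.request.urlopen):
    """Download url to dest unless the server says the cached copy is current.

    Returns True if a new copy was written, False if the cached one was kept.
    """
    meta_path = dest + ".meta.json"  # hypothetical sidecar file for validators
    meta = {}
    if os.path.exists(dest) and os.path.exists(meta_path):
        with open(meta_path) as f:
            meta = json.load(f)

    # Send the saved validators so the server can answer 304 Not Modified
    request = urllib.request.Request(url)
    if meta.get("etag"):
        request.add_header("If-None-Match", meta["etag"])
    if meta.get("last_modified"):
        request.add_header("If-Modified-Since", meta["last_modified"])

    try:
        with urlopen(request) as response:
            # Write to a temporary name first so an interrupted download
            # never replaces a good cached copy.
            tmp_path = dest + ".part"
            with open(tmp_path, "wb") as out:
                shutil.copyfileobj(response, out)
            os.replace(tmp_path, dest)
            new_meta = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 Not Modified as an HTTPError
            return False
        raise

    with open(meta_path, "w") as f:
        json.dump(new_meta, f)
    return True
```

Calling `fetch_if_changed(url, "Miniconda3-latest-Linux-x86_64.sh")` twice in a row should download once and then keep the cached file, provided the server honors `If-None-Match`. Passing a replacement for `urlopen` makes the function testable without network access.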
