Download of images with redirects from http to https fails #247

@CodenameGreyFox

Java’s HttpURLConnection (which image-service uses) does not follow redirects from http to https. It follows redirects only as long as the protocol doesn’t change.

Take the following example (from this dataset):

ubuntu@risky-images:~$ wget http://www.boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg
--2025-10-31 22:27:09--  http://www.boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg
Resolving www.boldsystems.org (www.boldsystems.org)... 172.234.202.175
Connecting to www.boldsystems.org (www.boldsystems.org)|172.234.202.175|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg [following]
--2025-10-31 22:27:09--  https://www.boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg
Connecting to www.boldsystems.org (www.boldsystems.org)|172.234.202.175|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg [following]
--2025-10-31 22:27:10--  https://boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg
Resolving boldsystems.org (boldsystems.org)... 172.234.202.175
Connecting to boldsystems.org (boldsystems.org)|172.234.202.175|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://bench.boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg [following]
--2025-10-31 22:27:11--  https://bench.boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg
Resolving bench.boldsystems.org (bench.boldsystems.org)... 131.104.63.48
Connecting to bench.boldsystems.org (bench.boldsystems.org)|131.104.63.48|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31271 (31K) [image/jpeg]
Saving to: ‘CBG-A02096-D09+1681236992.jpg’

In this case, the download works if we start the chain with https://www.boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg but not with http://www.boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg (http vs. https is the only difference).

HttpURLConnection has a method, setInstanceFollowRedirects, that controls whether it follows redirects (default: true). But even with that set to true, it never follows a redirect that changes the protocol. The Java engineers did this intentionally for security reasons. What about our use case? Is the current behaviour in image-service intentional for security reasons, or is it something to be fixed?
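
If we decide this should be fixed, one possible approach is to turn off the built-in redirect handling and follow the Location header ourselves. The following is only a rough sketch, not the image-service code: the class name RedirectFollower, its helper methods, and the redirect limit are all made up for illustration, and blocking https-to-http downgrades is an assumption about what we would want.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of image-service.
class RedirectFollower {

    /** True for the status codes that normally carry a Location header. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a (possibly relative) Location value against the current URL. */
    static URL resolveLocation(URL current, String location) throws IOException {
        return new URL(current, location);
    }

    /**
     * Assumed policy: same-protocol hops and http -> https upgrades are
     * allowed; https -> http downgrades and non-HTTP protocols are not.
     */
    static boolean isAllowedHop(URL from, URL to) {
        String f = from.getProtocol();
        String t = to.getProtocol();
        if (!t.equals("http") && !t.equals("https")) {
            return false;
        }
        return !(f.equals("https") && t.equals("http"));
    }

    /**
     * Opens an http(s) URL, following redirects manually (including
     * http -> https). The caller reads from and closes the returned connection.
     */
    static HttpURLConnection openFollowingRedirects(URL start, int maxRedirects)
            throws IOException {
        URL current = start;
        for (int hops = 0; hops <= maxRedirects; hops++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            // Disable the built-in handling so every hop goes through our policy.
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn;
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header: " + current);
            }
            URL next = resolveLocation(current, location);
            if (!isAllowedHop(current, next)) {
                throw new IOException("Refusing redirect " + current + " -> " + next);
            }
            current = next;
        }
        throw new IOException("More than " + maxRedirects + " redirects from " + start);
    }
}
```

With something like this, a call such as RedirectFollower.openFollowingRedirects(new URL("http://www.boldsystems.org/pics/GMPTA/CBG-A02096-D09+1681236992.jpg"), 5) would be expected to walk the same chain wget shows above, while still refusing a hop from https back to http.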
