This repository has been archived by the owner on Jan 6, 2022. It is now read-only.

Feed results into HTTPS Everywhere #140

Open
Hainish opened this issue Oct 15, 2017 · 3 comments

@Hainish

Hainish commented Oct 15, 2017

Since you're already maintaining a list of news sites with HTTPS support, you could easily auto-generate rulesets for HTTPS Everywhere.

Any site that is fully available over HTTPS (i.e. no content becomes unavailable when the domain is loaded only over HTTPS) but is not HSTS preloaded is eligible for inclusion in HTTPS Everywhere.

To create a new HTTPS Everywhere ruleset, you can clone the repo and run a simple ruleset generation script:

git clone git@github.com:EFForg/https-everywhere.git
cd https-everywhere/rules
./make-trivial-rule ExampleNewsSite.com

You can follow the common format generated from this example to create rules for other sites and subdomains. Refer to https://github.com/EFForg/https-everywhere/blob/master/CONTRIBUTING.md for full contribution documentation.
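For auto-generation, the output of the script could in principle be reproduced directly from a list of domains. The following is a minimal sketch, assuming the trivial ruleset consists of a `ruleset` element with one `target` per host and a single `http:` to `https:` rewrite rule; the function name `trivial_ruleset` and the `include_www` parameter are hypothetical, and the real script's output should be treated as the reference.

```python
from xml.sax.saxutils import quoteattr


def trivial_ruleset(domain: str, include_www: bool = True) -> str:
    """Build a minimal ruleset for one domain (hypothetical helper).

    Assumes the trivial format: one <target> per host and a single
    rule that rewrites the http: scheme to https:.
    """
    host = domain.lower()
    hosts = [host]
    if include_www and not host.startswith("www."):
        hosts.append("www." + host)

    lines = [f"<ruleset name={quoteattr(domain)}>"]
    lines += [f"\t<target host={quoteattr(h)} />" for h in hosts]
    lines.append('\t<rule from="^http:" to="https:" />')
    lines.append("</ruleset>")
    return "\n".join(lines) + "\n"


print(trivial_ruleset("ExampleNewsSite.com"))
```

Sites served from a subdomain probably should not get an automatic `www.` target, which is why the sketch makes that optional.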

@thisisparker

Hi Bill! Excited to get a chance to work on this :) It's pretty close to an opportunity I identified back in February so really I feel like I'm already six months behind on delivering.

I hope that identifying which domains are eligible for new rules is as easy as you suggest, but I'm worried about how we could pick out which sites are "fully available over HTTPS." Before we figure out how to automate this, I'd like to walk through what I'd do to generate a one-time dump of new rules.

The scorecard has a field called "Available over HTTPS" which is actually a combination of the scraper properties "valid_https" (which must be true) and "downgrades_https" (which must be false). That's certainly a start — of the 131 sites on the scorecard, fully half (66) are in the Goldilocks zone of being available over HTTPS but not HSTS preloaded.
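As a rough illustration of that filter, here is a minimal sketch. The keys `valid_https` and `downgrades_https` are the scraper properties named above; the `hsts_preloaded` key, the function name, and the sample records are assumptions made up for the example, not real scan data.

```python
def is_rule_candidate(scan: dict) -> bool:
    """True if a site is available over HTTPS but not HSTS preloaded."""
    # "Available over HTTPS" combines two scraper properties.
    available_over_https = scan["valid_https"] and not scan["downgrades_https"]
    # "hsts_preloaded" is an assumed key name for the preload status.
    return available_over_https and not scan["hsts_preloaded"]


# Made-up records, one per outcome of interest.
scans = [
    {"domain": "a.example", "valid_https": True, "downgrades_https": False, "hsts_preloaded": False},
    {"domain": "b.example", "valid_https": True, "downgrades_https": True, "hsts_preloaded": False},
    {"domain": "c.example", "valid_https": True, "downgrades_https": False, "hsts_preloaded": True},
    {"domain": "d.example", "valid_https": False, "downgrades_https": False, "hsts_preloaded": False},
]

candidates = [s["domain"] for s in scans if is_rule_candidate(s)]
print(candidates)
```

Only `a.example` passes: the others downgrade, are already preloaded, or lack valid HTTPS.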

Of those 66, about 44 look like they already have rules. Without delving into the contents of the XML, each of these domains, or its slug, already appears in the name of a rule. I've listed them at the bottom of this issue.
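The name check was roughly the following, shown here as a minimal sketch rather than the exact procedure. The helper names, the crude definition of "slug" (the first label of the domain after any `www.`), and the sample rule names are all assumptions for illustration.

```python
def slug(domain: str) -> str:
    """Crude slug: the first label of the domain, ignoring a leading www."""
    labels = domain.lower().split(".")
    if labels[0] == "www" and len(labels) > 1:
        labels = labels[1:]
    return labels[0]


def likely_has_rule(domain: str, rule_names: list) -> bool:
    """True if the domain or its slug appears in any rule name (case-insensitive)."""
    needle_domain = domain.lower()
    needle_slug = slug(domain)
    return any(
        needle_domain in name.lower() or needle_slug in name.lower()
        for name in rule_names
    )


# Hypothetical rule file names, not taken from the real rules directory.
rule_names = ["SampleDaily.xml", "Example-Gazette.com.xml"]

print(likely_has_rule("sampledaily.example", rule_names))
print(likely_has_rule("unlisted.example", rule_names))
```

A substring match like this over-reports for very short slugs and under-reports for irregularly named rules, so its result is only a first pass before checking by hand.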

That leaves about 20 sites that might be rule-eligible. That seems like the ceiling, too, as poking around will certainly shake loose rules that are slightly irregularly named, or in some cases sites that our scanner identifies as "available over HTTPS" but which aren't "fully available over HTTPS," as you specify.

This is probably a small enough number that it makes sense to check those 20 or so domains by hand to confirm that (a) they actually don't have a rule and (b) everything works over HTTPS. Unless that number changes dramatically, for example because STN starts tracking many more sites, my inclination would be to just repeat this process manually every once in a while.

How does that work?

Eligible domains that may not have existing rules

axios.com
cbsnews.com
cnet.com
indiatimes.com
infobae.com
oglobo.globo.com
gazzetta.it
lastampa.it
mic.com
nzz.ch
onet.pl
scroll.in
techcrunch.com
theatlantic.com
theguardian.com
nytimes.com
thetimes.co.uk
thestar.com
theundefeated.com
univision.com
usnews.com
washingtonpost.com

Eligible domains with existing rules found

abcnews.go.com
alarabiya.net
arstechnica.com
ap.org
bloomberg.com
bostonglobe.com
buzzfeed.com
cnbc.com
cnn.com
welt.de
elpais.com
ft.com
forbes.com
foxnews.com
gizmodo.com
golem.de
heise.de
hongkongfp.com
lemonde.fr
nbcnews.com
nypost.com
nj.com
nrk.no
politico.com
propublica.org
qz.com
reuters.com
salon.com
taz.de
thedailybeast.com
theglobeandmail.com
independent.co.uk
themoscowtimes.com
newyorker.com
theverge.com
wsj.com
weather.com
usatoday.com
vanityfair.com
vice.com
vox.com
washingtontimes.com
wired.com
wp.pl

@thisisparker

Sorry, didn't mean to close!

@thisisparker thisisparker reopened this Oct 16, 2017
@brainwane

@thisisparker We spoke a couple weeks ago about your PRs and whether any of the ruleset generation scripts helped you get further -- any progress?
