checkhost-report-scraper

Scraper for permalinked check-reports made on check-host.net, e.g., https://check-host.net/check-report/23d4f6aekc8

check-host.net is a website that provides free network-diagnostic services, such as checking the status of a website. It offers an API for creating reports, but not for fetching existing ones; this package fills that gap.
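
If you are starting from a permalink rather than a bare report ID, the ID is simply the last path segment of the URL. A minimal sketch (the report_id_from_permalink helper below is illustrative, not part of this package):

from urllib.parse import urlparse

def report_id_from_permalink(url: str) -> str:
    # The report ID is the final path segment, e.g. "23d4f6aekc8".
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]

report_id_from_permalink("https://check-host.net/check-report/23d4f6aekc8")
# '23d4f6aekc8'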

Usage

Standalone

python cli.py --report_id 23d4f6aekc8
{
    "report_id": "23d4f6aekc8",
    "permalink": "https://check-host.net/check-report/23d4f6aekc8",
    "report_type": "check-http",
    "target": "http://google.com:80",
    "date": "2025-03-08T19:45:31",
    "results": [
        {
            "country_code": "BR",
            "location": "Brazil, Sao Paulo",
            "result": "OK",
            "time": "0.076 s",
            "code": "301 (Moved Permanently)",
            "ip": "1.1.1.1",
        }
    ]
}

As a package

from checkhost_scraper.scraper import CheckHostReportScraper

scraper = CheckHostReportScraper()
report = scraper.scrape("23d63999k1c2")

# report
CheckHostReport(
    report_id='23d63999k1c2',
    permalink='https://check-host.net/check-report/23d63999k1c2',
    report_type='check-http',
    target='http://google.com:80',
    date='2025-03-08T19:45:31',
    results=[
        CheckHostResult(
            country_code='BR',
            location='Brazil, Sao Paulo',
            result='OK',
            time='0.076 s',
            code='301 (Moved Permanently)',
            ip='1.1.1.1'
        ),
        ...
        CheckHostResult(
            country_code='BG',
            location='Bulgaria, Sofia',
            result='OK',
            time='0.293 s',
            code='301 (Moved Permanently)',
            ip='1.1.1.1'
        )
    ]
)
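
Once scraped, report.results is an ordinary sequence and can be iterated directly. A small sketch using only the attributes visible in the repr above (see checkhost_scraper/models.py for the full set of fields per report type):

from checkhost_scraper.scraper import CheckHostReportScraper

scraper = CheckHostReportScraper()
report = scraper.scrape("23d63999k1c2")

# Print a one-line summary per checking node, using the fields shown above.
for result in report.results:
    print(f"{result.country_code} {result.location}: {result.result} ({result.time})")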

Install

Add to requirements.txt

checkhost_scraper @ git+https://github.com/chrisji/checkhost-report-scraper.git@main

or clone and install

git clone https://github.com/chrisji/checkhost-report-scraper.git
cd checkhost-report-scraper
pip install -e .

Data structure

Report metadata is available and consistent for all report types:

  • The report_type. One of check-http, check-dns, check-ping, check-tcp, check-udp.
  • The target of the report. Example: https://example.com:443
  • The date the report was made (ISO format). Example: 2021-01-01T00:00:00

The results structure changes depending on the report type, and parsing currently covers:

  • check-http
  • check-dns
  • check-ping
  • check-tcp
  • check-udp

See checkhost_scraper/models.py for details on the results structure for each type.
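
For orientation, the check-http shapes look roughly like the sketch below. Field names are inferred from the examples in this README; the dataclass-style layout is an assumption, and the definitions in checkhost_scraper/models.py are authoritative (other report types carry different result fields).

from dataclasses import dataclass, field

# Rough sketch only -- inferred from the check-http examples in this README,
# not copied from models.py. Other report types expose different result fields.
@dataclass
class CheckHostResult:
    country_code: str   # e.g. "BR"
    location: str       # e.g. "Brazil, Sao Paulo"
    result: str         # e.g. "OK"
    time: str           # e.g. "0.076 s"
    code: str           # e.g. "301 (Moved Permanently)"
    ip: str             # e.g. "142.251.129.78"

@dataclass
class CheckHostReport:
    report_id: str      # e.g. "23d4f6aekc8"
    permalink: str      # https://check-host.net/check-report/<report_id>
    report_type: str    # e.g. "check-http"
    target: str         # e.g. "http://google.com:80"
    date: str           # ISO format, e.g. "2025-03-08T19:45:31"
    results: list[CheckHostResult] = field(default_factory=list)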

Example check-http results, serialized as JSON:

[
    {
        "country_code": "BR",
        "location": "Brazil, Sao Paulo",
        "result": "OK",
        "time": "0.076 s",
        "code": "301 (Moved Permanently)",
        "ip": "142.251.129.78"
    },
    {
        "country_code": "BG",
        "location": "Bulgaria, Sofia",
        "result": "OK",
        "time": "0.293 s",
        "code": "301 (Moved Permanently)",
        "ip": "142.250.187.110"
    }
]

Testing and development

  1. Pull known test data from the site: python tests/generate_test_data.py
  2. Run tests with coverage: pytest --cov=checkhost_scraper

Run all tests and checks and update the coverage badge: sh run-checks.sh

Disclaimer

I am not responsible for the use of this scraper. While all requested data is public, I recommend (a) being respectful of the site's resources and (b) using the data ethically.
