Need help with python -m ichrome.web #140

@juanfrilla

If I launch the browser as a service:
python -m ichrome.web
and then run:

import requests
from bs4 import BeautifulSoup

headers = {
   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
   'Accept-Language': 'es-ES,es;q=0.9',
   'Cache-Control': 'max-age=0',
   'Connection': 'keep-alive',
   'Sec-Fetch-Dest': 'document',
   'Sec-Fetch-Mode': 'navigate',
   'Sec-Fetch-Site': 'none',
   'Sec-Fetch-User': '?1',
   'Upgrade-Insecure-Requests': '1',
   'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36',
   'sec-ch-ua': '"Google Chrome";v="117", "Not;A=Brand";v="8", "Chromium";v="117"',
   'sec-ch-ua-mobile': '?0',
   'sec-ch-ua-platform': '"macOS"',
}

params = (
   ('url', "https://oficinajudicialvirtual.pjud.cl/home/index.php"),
)

response = requests.get('http://127.0.0.1:8080/chrome/preview', headers=headers, params=params)

soup = BeautifulSoup(response.text, 'html.parser')

recaptcha_url = soup.select('iframe[title="reCAPTCHA"]')[0]["src"]

This gives me the reCAPTCHA URL.

But if I do it like this:

from bs4 import BeautifulSoup
from torequests import tPool
from inspect import getsource
req = tPool()



async def tab_callback(task, tab, data, timeout):
    await tab.wait_loading(20)
    return await tab.html

json = {
    'tab_callback': getsource(tab_callback),
    "timeout": 20,
    "incognito_args": {
        "url": "https://oficinajudicialvirtual.pjud.cl/home/index.php",
        "proxyServer": "37.19.220.129:8443"
    }
}

response = req.post('http://127.0.0.1:8080/chrome/do',json=json)

soup = BeautifulSoup(response.text, 'html.parser')

recaptcha_url = soup.select('iframe[title="reCAPTCHA"]')[0]["src"]

The soup does not contain the fully loaded page, so the reCAPTCHA iframe is missing. I guess it could be a security measure on the site I'm scraping.
Any help?
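
To compare the two responses without `[0]` raising an `IndexError`, here is a minimal debugging sketch that uses only the standard library. It assumes the body returned by `/chrome/do` may be either raw HTML or a JSON envelope around the callback's return value. The envelope key `"result"` is a guess, not something confirmed by the ichrome docs, and the helper names `unwrap_body` and `find_recaptcha_src` are hypothetical.

```python
import json
from html.parser import HTMLParser


class _IframeFinder(HTMLParser):
    """Collect the src of every <iframe> whose title matches exactly."""

    def __init__(self, title):
        super().__init__()
        self.title = title
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = dict(attrs)
        if attributes.get("title") == self.title and attributes.get("src"):
            self.sources.append(attributes["src"])


def unwrap_body(body, key="result"):
    """Return HTML from a body that is raw HTML or JSON wrapping the HTML.

    ASSUMPTION: `key` is a guessed envelope field name; adjust it to
    whatever the service actually returns.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return body  # not JSON, so treat it as raw HTML
    if isinstance(payload, dict) and isinstance(payload.get(key), str):
        return payload[key]
    if isinstance(payload, str):
        return payload  # the HTML was returned as a bare JSON string
    return body


def find_recaptcha_src(body, title="reCAPTCHA"):
    """Return the first matching iframe src, or None if it is absent."""
    finder = _IframeFinder(title)
    finder.feed(unwrap_body(body))
    return finder.sources[0] if finder.sources else None
```

Calling `find_recaptcha_src(response.text)` on both responses and printing `response.text[:300]` whenever it returns `None` should show whether the second body is an envelope, an error message, or a page that really lacks the iframe.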
