If I launch a browser as a service:
python -m ichrome.web
Then I run:
import requests
from bs4 import BeautifulSoup
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Accept-Language': 'es-ES,es;q=0.9',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Google Chrome";v="117", "Not;A=Brand";v="8", "Chromium";v="117"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
}
params = (
    ('url', "https://oficinajudicialvirtual.pjud.cl/home/index.php"),
)
response = requests.get('http://127.0.0.1:8080/chrome/preview', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'html.parser')
recaptcha_url = soup.select('iframe[title="reCAPTCHA"]')[0]["src"]
With this, I get the reCAPTCHA URL.
But if I do it like this:
from bs4 import BeautifulSoup
from torequests import tPool
from inspect import getsource
req = tPool()
async def tab_callback(task, tab, data, timeout):
    await tab.wait_loading(20)
    return await tab.html
json = {
    'tab_callback': getsource(tab_callback),
    "timeout": 20,
    "incognito_args": {
        "url": "https://oficinajudicialvirtual.pjud.cl/home/index.php",
        "proxyServer": "37.19.220.129:8443"
    }
}
response = req.post('http://127.0.0.1:8080/chrome/do',json=json)
soup = BeautifulSoup(response.text, 'html.parser')
recaptcha_url = soup.select('iframe[title="reCAPTCHA"]')[0]["src"]
With this approach the soup is not fully loaded, so the reCAPTCHA iframe is missing. I guess it could be some security measure of the origin website I'm scraping.
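To compare the two responses without hitting an IndexError on `[0]`, here is a minimal sketch of a check that reports whether the reCAPTCHA iframe is present in a given HTML string. It uses only the standard library's `html.parser` instead of BeautifulSoup, and the names `IframeFinder` and `find_recaptcha_src` are hypothetical helpers written for this illustration, not part of ichrome or torequests.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collect the src of every <iframe> whose title attribute matches."""

    def __init__(self, title):
        super().__init__()
        self.title = title
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag != "iframe":
            return
        attributes = dict(attrs)
        if attributes.get("title") == self.title:
            self.sources.append(attributes.get("src"))


def find_recaptcha_src(html, title="reCAPTCHA"):
    """Return the first matching iframe src, or None if no such iframe exists."""
    finder = IframeFinder(title)
    finder.feed(html)
    return finder.sources[0] if finder.sources else None
```

Calling `find_recaptcha_src(response.text)` on both responses should show whether the iframe is absent from the `/chrome/do` result. If it returns None there, one possible (unconfirmed) explanation is that the iframe had not been added to the page yet when the HTML was captured, or that the site served a different page through the proxy.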
Any help?