Ability to pass in user agent header, connection timeout etc.? #2

Open
giorgio79 opened this issue Jun 19, 2014 · 2 comments
Comments

@giorgio79

Hello,

I would like to pass in the user agent, connection timeout, etc. with the various drivers. Perhaps also check whether robots.txt allows spidering.
Such options are handled well in curl; I am not sure about the other drivers.

Regarding RequestFactory: https://github.com/OpenBuildings/spiderling/blob/3f2da1a3bc6b8a7b48639ce159e3668ae65e10b8/src/Openbuildings/Spiderling/Driver/Simple/RequestFactory/HTTP.php

@giorgio79 giorgio79 changed the title Ability to pass in curl opts? Ability to pass in user agent header, connection timeout etc.? Jun 19, 2014
@ivank
Contributor

ivank commented Jun 19, 2014

Hi,
Nice to see some interest in this library :) It was developed mainly to facilitate testing, not crawling, so I didn't really have those concerns.
All the drivers already support setting user_agent, so that's one thing crossed off your list.
You can easily add a method to pass arbitrary curl options in the class you referenced, and make a pull request out of it.
Connection timeout and robots.txt checking could also be added to other drivers, but that's work that I don't really have time to do ATM, sorry. I will be very appreciative of pull requests though.

@giorgio79
Author

Thanks!

> nice to see some interest in this library :)

Yes, Spiderling is awesome.

> Connection timeout and robots.txt checking could also be added to other drivers, but that's work that I don't really have time to do ATM, sorry. I will be very appreciative of pull requests though.

Meanwhile, I notice there are plenty of robots.txt classes on GitHub... I might just throw something together and run with it.
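
For anyone following along, here is a rough, language-agnostic sketch of what such a robots.txt check might look like before a driver visits a URL. It is written in Python with the standard library's `urllib.robotparser` purely for illustration; it is not part of Spiderling, and the helper names (`build_robots_checker`, `is_allowed`), the sample rules, and the `ExampleBot/1.0` user agent are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real crawler would download this
# from the target site's /robots.txt before spidering.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_robots_checker(robots_txt: str) -> RobotFileParser:
    """Parse a robots.txt body that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(checker: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let user_agent fetch url."""
    return checker.can_fetch(user_agent, url)


checker = build_robots_checker(SAMPLE_ROBOTS_TXT)

if __name__ == "__main__":
    for url in ("https://example.com/index.html",
                "https://example.com/private/data"):
        print(url, is_allowed(checker, "ExampleBot/1.0", url))
```

A PHP equivalent would presumably run the same kind of check just before the driver issues its request, using whichever robots.txt parser class is chosen.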
