Simplify your web scraping tasks with ScraperAPI. This SDK uses the Guzzle client and provides asynchronous (concurrent) requests and convenient PSR-7 responses.
Requires PHP 7.2 or higher.
$ composer require evgeek/scraperapi-sdk
<?php
use Evgeek\Scraperapi\Client;
require '/path/to/your/vendor/autoload.php';
//Create and configure a new SDK client
$sdk = new Client('YOUR_TOKEN');
//Send request
$response = $sdk->get('https://example.com');
//Work with \Psr\Http\Message\ResponseInterface
echo $response->getBody()->getContents();
The client is configured through the constructor parameters:
$apiKey (required) - your API key from the ScraperAPI dashboard.
$defaultApiParams - default API parameters for requests.
$defaultHeaders - default HTTP headers.
$timeout (default 60) - request timeout.
$tries (default 3) - number of request attempts.
$delayMultiplier (default 1) - multiplier, in seconds, for the delay before each new request attempt. A multiplier of 3 means the 2nd attempt is made after 3 sec, the 3rd after 6 sec, and so on.
$printDebugInfo (default false) - if true, debug messages are printed. Useful for debugging async requests.
$showApiKey (default false) - if false, the API key in debug messages is replaced by the 'API_KEY' string.
$logger (default null) - PSR-3 compatible logger for debug messages. If null, they are sent to STDOUT.
$logLevel (default null) - log level for the PSR-3 logger. If null, debug messages are sent at the DEBUG level.
$maxExceptionsLength (default 120) - maximum length of Guzzle exception messages.
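The retry schedule described for $delayMultiplier can be sketched as follows. This is only an illustration of the documented behaviour, not the SDK's own code: the helper name retryDelaySeconds is hypothetical, and the linear growth beyond the 3rd attempt is an assumption based on the "and so on" in the description.

```php
<?php
// Hypothetical helper (not part of the SDK): seconds to wait before a given
// attempt, assuming the delay grows linearly with the attempt number.
function retryDelaySeconds(int $attempt, int $delayMultiplier): int
{
    // The first attempt is sent immediately; each later attempt waits
    // one more multiple of $delayMultiplier.
    return max(0, $attempt - 1) * $delayMultiplier;
}

// With a multiplier of 3, list the wait before each of the default 3 tries.
foreach ([1, 2, 3] as $attempt) {
    echo "Attempt {$attempt}: wait " . retryDelaySeconds($attempt, 3) . " sec" . PHP_EOL;
}
```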
Default API parameters are configured according to the ScraperAPI documentation. They apply to all requests unless they are overridden at the request level. Defaults can be set only through the constructor (the SDK client is immutable), using the second parameter:
$defaultApiParams = [
    'country_code' => 'us', //activate country geotargeting
    'render' => true, //activate javascript rendering
    'premium' => false, //activate premium residential and mobile IPs
    'session_number' => 123, //reuse the same proxy
    'keep_headers' => true, //use your own custom headers
    'device_type' => 'mobile', //set mobile or desktop user agents
    'autoparse' => false, //activate auto parsing for select websites
];
$sdk = new Client('YOUR_TOKEN', $defaultApiParams);
You can add default headers with the third parameter of the constructor. Don't forget to enable keep_headers to allow your headers to be used!
$defaultHeaders = [
    'Referer' => 'https://example.com/',
    'Accept' => 'application/json',
];
$sdk = new Client('YOUR_TOKEN', ['keep_headers' => true], $defaultHeaders);
The SDK supports the GET, POST and PUT HTTP methods. Standard parameters of each request method:
$url (required) - URL of the resource to scrape.
$apiParams (default null) - API settings for the request. They override the defaults from the SDK client (only those that overlap).
$headers (default null) - headers for the request. Just like $apiParams, they override the defaults from the SDK client (only those that overlap).
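The override rule for $apiParams and $headers can be pictured as a key-by-key merge. The sketch below is a guess at the semantics rather than the SDK's implementation: mergeWithDefaults is a hypothetical helper, and using array_merge for the merge is an assumption.

```php
<?php
// Hypothetical helper (not part of the SDK): request-level values replace
// defaults that share a key; every other default is kept as-is.
function mergeWithDefaults(array $defaults, ?array $overrides): array
{
    // null means "no request-level settings", so the defaults pass through.
    return array_merge($defaults, $overrides ?? []);
}

$defaultApiParams = ['country_code' => 'us', 'render' => true];
$requestApiParams = ['render' => false, 'device_type' => 'mobile'];

$effective = mergeWithDefaults($defaultApiParams, $requestApiParams);
// $effective keeps country_code from the defaults, takes render from the
// request, and gains device_type from the request.
```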
Pretty simple:
$response = $sdk->get(
    'https://example.com',
    ['keep_headers' => true],
    [
        'Referer' => 'https://example.com/',
        'Accept' => 'application/json',
    ]
);
$content = $response->getBody()->getContents();
A bit more complicated:
$response = $sdk->post('https://example.com', $apiParams, $headers, $body, $formParams, $json);
$content = $response->getBody()->getContents();
You can use three types of payload:
$body - for a raw string, fopen() resource or Psr\Http\Message\StreamInterface.
$formParams - for form data. Associative array of form field names to values, where each value is a string or an array of strings.
$json - for JSON. The passed associative array is automatically converted to JSON data.
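To see roughly what goes over the wire for the two array-based payload types, the sketch below encodes the same data both ways with plain PHP functions. It assumes the SDK passes these arrays to Guzzle's form_params and json request options, which the text above does not confirm; the sample field names are made up for illustration.

```php
<?php
// Illustrative data only; the field names are not taken from any real API.
$formParams = ['name' => 'Luke', 'tags' => ['jedi', 'pilot']];
$json = ['name' => 'Luke', 'tags' => ['jedi', 'pilot']];

// Form data: URL-encoded key/value pairs (application/x-www-form-urlencoded).
$encodedForm = http_build_query($formParams, '', '&');

// JSON: the associative array becomes a JSON object.
$encodedJson = json_encode($json);

echo $encodedForm . PHP_EOL;
echo $encodedJson . PHP_EOL;
```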
There are also short forms of methods for different types of payloads:
$response = $sdk->postBody($url, $body, $apiParams, $headers);
$response = $sdk->postForm($url, $formParams, $apiParams, $headers);
$response = $sdk->postJson($url, $json, $apiParams, $headers);
By the way, it is convenient to pass a GraphQL payload through $json:
$query = '
    query HeroNameAndFriends($episode: Episode) {
        hero(episode: $episode) {
            name
            friends {
                name
            }
        }
    }
';
$json = ['query' => $query, 'variables' => ['episode' => 'JEDI']];
$response = $sdk->postJson('https://example.com', $json);
Asynchronous requests work much like synchronous ones, except that the request methods return promises instead of responses:
//Create array with promises
$promises = [
    'first' => $sdk->getPromise('https://example.com', ['country_code' => 'us']),
    'second' => $sdk->postPromiseBody('https://example.com', 'payload'),
];
//Asynchronous fulfillment of promises
$responses = $sdk->resolvePromises($promises);
//Work with array of responses
foreach ($responses as $response) {
    echo $response->getBody()->getContents() . PHP_EOL;
}
You can get your ScraperAPI account information using the accountInfo() method:
$info = $sdk->accountInfo();
var_dump(json_decode($info, true));
array(5) {
  ["concurrencyLimit"]=>
  int(5)
  ["concurrentRequests"]=>
  int(0)
  ["failedRequestCount"]=>
  int(258)
  ["requestCount"]=>
  int(588)
  ["requestLimit"]=>
  string(4) "1000"
}
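As a small follow-up, the decoded account data can be used to work out how many requests are left. This sketch hard-codes a JSON string with the same values as the sample output so that it runs on its own; in real use $info would come from $sdk->accountInfo(). Note that requestLimit is a string in the sample, so it is cast before the subtraction.

```php
<?php
// Sample payload with the same values as the output above (hard-coded for the sketch).
$info = '{"concurrencyLimit":5,"concurrentRequests":0,"failedRequestCount":258,'
    . '"requestCount":588,"requestLimit":"1000"}';

$account = json_decode($info, true);

// requestLimit arrives as a string in the sample, so cast it before doing arithmetic.
$remaining = (int) $account['requestLimit'] - $account['requestCount'];

echo "Requests remaining: {$remaining}" . PHP_EOL;
```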