Commit 7339ba6

Author: Ronald Schmidt (committed)
fixes for #23 #31
1 parent 4df0495 commit 7339ba6

File tree

3 files changed: +28 −24 lines


docs/configuration.rst

Lines changed: 24 additions & 20 deletions
@@ -15,23 +15,25 @@ Ensure the executing user has read/write permissions for this folder.
 Default configuration
 ---------------------

-* cachedir: '/tmp/.serpscrap/' - path cachefiles
-* clean_cache_after: 24 - clean cached files older then x hours
-* database_name: '/tmp/serpscrap' - path and name sqlite db (stores scrape results)
-* do_caching: True - enable / disable caching
-* headers: - dict to customize request header, see below
-* num_pages_for_keyword: 2 - number of result pages to scrape
-* num_results_per_page: 10 - number results per searchengine page
-* proxy_file: '' - path to proxy file, see below
-* scrape_urls: False - scrape urls of search results
-* search_engines: ['google'] - search engines (google)
-* url_threads: 3 - number of threads if scrape_urls is true
-* use_own_ip: True - if using proxies set to False
-* sleeping_min: 5 - min seconds to sleep between scrapes
-* sleeping_max: 15 - max seconds to sleep between scrapes
-* screenshot: True - enable screenshots for each query
-* dir_screenshot: '/tmp/screenshots' - basedir for saved screenshots
-* chrome_headless: True - run chrome in headless mode, default is True
+* cachedir: '/tmp/.serpscrap/' - path cachefiles
+* chrome_headless: True - run chrome in headless mode, default is True
+* clean_cache_after: 24 - clean cached files older than x hours
+* database_name: '/tmp/serpscrap' - path and name sqlite db (stores scrape results)
+* dir_screenshot: '/tmp/screenshots' - basedir for saved screenshots
+* do_caching: True - enable / disable caching
+* executable_path: '/usr/local/bin/chromedriver' - path to chromedriver
+* google_search_url: 'https://www.google.com/search?' - base search url, modify for other countries
+* headers: - dict to customize request header, see below
+* num_pages_for_keyword: 2 - number of result pages to scrape
+* num_results_per_page: 10 - number of results per search engine page
+* proxy_file: '' - path to proxy file, see below
+* scrape_urls: False - scrape urls of search results
+* screenshot: True - enable screenshots for each query
+* search_engines: ['google'] - search engines (google)
+* sleeping_max: 15 - max seconds to sleep between scrapes
+* sleeping_min: 5 - min seconds to sleep between scrapes
+* url_threads: 3 - number of threads if scrape_urls is true
+* use_own_ip: True - if using proxies set to False

 Custom configuration
 --------------------
@@ -48,7 +50,9 @@ Change some config params.
     scrap = serpscrap.SerpScrap()
     scrap.init(config=config.get(), keywords=keywords)

-Using your own configuration
+You can apply your own config dictionary. You do not need to provide every
+possible config key: on apply, the default values are overwritten by the
+new values, while the defaults remain in place for keys you do not provide.

 .. code-block:: python
@@ -61,10 +65,10 @@ Using your own configuration
         'database_name': '/tmp/serpscrap',
         'do_caching': True,
         'num_pages_for_keyword': 2,
-        'proxy_file': '',
         'scrape_urls': True,
         'search_engines': ['google'],
-        'url_threads': 3,
+        'google_search_url': 'https://www.google.com/search?',
+        'executable_path': '/usr/local/bin/chromedriver',
     }

     config.apply(config_new)
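The merge behaviour behind ``config.apply()`` can be sketched with a minimal stand-in class. This is a hypothetical illustration mirroring the ``apply()``/``get()`` calls shown in this diff, not the real ``serpscrap.Config`` implementation; the default keys are a subset of the documented list above.

```python
class Config:
    """Minimal stand-in (hypothetical) for the patched config class."""

    def __init__(self):
        # a few of the documented default values
        self.config = {
            'do_caching': True,
            'scrape_urls': False,
            'url_threads': 3,
        }

    def get(self):
        return self.config

    def apply(self, config):
        # merge instead of replace: the caller's values win on key
        # collisions, untouched defaults survive
        self.config = {**self.config, **config}


config = Config()
config.apply({'scrape_urls': True})
merged = config.get()
# 'scrape_urls' is overwritten; 'do_caching' and 'url_threads'
# keep their default values
```

This is why a partial ``config_new`` dict, like the one in the snippet above, no longer wipes out unrelated defaults.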

docs/results.rst

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ If you prefer to save the results use the as_csv() method.

     {
         'query': 'example',
-        'query_num_results total': 'Ungefähr 1.740.000.000 Ergebnisse (0,50 '
+        'query_num_results_total': 'Ungefähr 1.740.000.000 Ergebnisse (0,50 '
                                    'Sekunden)\xa0',
         'query_num_results_page': 10,
         'query_page_number': 1,

serpscrap/config.py

Lines changed: 3 additions & 3 deletions
@@ -88,10 +88,10 @@ def set(self, key, value):
         self.config.__setitem__(key, value)

     def apply(self, config):
-        """apply an individual conig
+        """apply an individual config, replacing default config
+        values with the values of the new config

         Args:
             config (dict): new configuration
         """
-
-        self.config = config
+        self.config = {**self.config, **config}
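The one-line fix relies on Python's dict unpacking (PEP 448): in ``{**self.config, **config}`` the right-hand mapping wins on duplicate keys, and the expression builds a new dict rather than mutating either input. A standalone illustration with plain dicts (the key names are examples taken from the config documentation above):

```python
defaults = {'do_caching': True, 'num_pages_for_keyword': 2, 'proxy_file': ''}
overrides = {'num_pages_for_keyword': 4}

# later unpacked mappings override earlier ones on duplicate keys
merged = {**defaults, **overrides}

# the merge produces a fresh dict; `defaults` is left untouched
```

Before this commit ``apply()`` assigned the new dict directly, so any key missing from the caller's dict silently lost its default.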
