Google search for Amazon Data results crawler, get google search results that you need

iotwlw/MagicGoogle

This branch is 37 commits ahead of, 11 commits behind howie6879/magic_google:master.

Latest commit: 89ac477 by iotwlw, Feb 26, 2019 (67 commits)
MagicGoogle

1. What's MagicGoogle

MagicGoogle is a simple Google search crawler that lets you extract whatever you need from a results page.

While crawling, keep in mind that Google rate-limits requests per IP address and may respond with a warning or an exception page. I suggest pausing between requests and using your own proxy IPs.

PHP version: php - MagicGoogle

2. How to Use

Run

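pip install MagicGoogle
# Or install directly from GitHub
pip install git+https://github.com/howie6879/MagicGoogle.git
# Or clone the repository
git clone https://github.com/howie6879/MagicGoogle.git
cd MagicGoogle
# then edit the example script
vim google_search.py
# or install the package from source
python setup.py install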

Example

from MagicGoogle import MagicGoogle
import pprint

# Optional list of proxies; set PROXIES = None to connect without one
PROXIES = [{
    'http': 'http://192.168.2.207:1080',
    'https': 'http://192.168.2.207:1080'
}]

# MagicGoogle() with no arguments also works
mg = MagicGoogle(PROXIES)

# Crawl the whole results page
result = mg.search_page(query='python')

# Crawl only the result URLs
for url in mg.search_url(query='python'):
    pprint.pprint(url)

# Output
# 'https://www.python.org/'
# 'https://www.python.org/downloads/'
# 'https://www.python.org/about/gettingstarted/'
# 'https://docs.python.org/2/tutorial/'
# 'https://docs.python.org/'
# 'https://en.wikipedia.org/wiki/Python_(programming_language)'
# 'https://www.codecademy.com/courses/introduction-to-python-6WeG3/0?curriculum_id=4f89dab3d788890003000096'
# 'https://www.codecademy.com/learn/python'
# 'https://developers.google.com/edu/python/'
# 'https://learnpythonthehardway.org/book/'
# 'https://www.continuum.io/downloads'

# Get a dict with 'title', 'url' and 'text' for each result
for i in mg.search(query='python', num=1):
    pprint.pprint(i)

# Output
# {'text': 'The official home of the Python Programming Language.',
# 'title': 'Welcome to Python .org',
# 'url': 'https://www.python.org/'}

See google_search.py for a complete example.
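
The PROXIES list in the example takes one dict per proxy, with 'http' and 'https' keys. A minimal sketch of building that list from plain host:port strings follows. The helper name `build_proxies` and its `scheme` parameter are illustrative assumptions and are not part of MagicGoogle.

```python
def build_proxies(addresses, scheme="http"):
    """Turn 'host:port' strings into the list-of-dicts shape used by PROXIES.

    build_proxies is a hypothetical helper, not a MagicGoogle function.
    """
    proxies = []
    for address in addresses:
        url = f"{scheme}://{address}"
        # The same proxy URL serves both plain and TLS traffic,
        # mirroring the example above.
        proxies.append({"http": url, "https": url})
    return proxies


PROXIES = build_proxies(["192.168.2.207:1080"])
```

The resulting list can then be passed as `MagicGoogle(PROXIES)`, exactly as in the example.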

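If you need to run a large number of queries from a single IP address, I suggest waiting 5 to 30 seconds between requests.

If searches always come back empty, Google has most likely flagged your IP address and is redirecting your requests to its "sorry" page, with a response like this: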
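
One way to apply such a delay is to wrap the search call in a small loop that sleeps for a random interval between queries. This is only a sketch: `throttled_search` and its parameters are hypothetical names, and `search_fn` stands for any callable you supply, for example `lambda q: mg.search_url(query=q)`.

```python
import random
import time


def throttled_search(queries, search_fn, min_delay=5.0, max_delay=30.0,
                     sleep=time.sleep):
    """Run search_fn on each query, pausing a random interval between calls.

    throttled_search is a hypothetical helper, not a MagicGoogle function.
    search_fn takes a query string and returns an iterable of results.
    """
    results = {}
    for index, query in enumerate(queries):
        if index > 0:
            # Randomised pause so requests do not arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        results[query] = list(search_fn(query))
    return results
```

The `sleep` parameter exists only so the pause can be replaced in tests. Repeated queries overwrite earlier entries, since results are keyed by query string.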

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://ipv4.google.com/sorry/index?continue=https://www.google.me/s****">here</A>.
</BODY></HTML>
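
A quick check for this response can save time when debugging empty results. The sketch below assumes you have the raw HTML as a string (for instance, if `search_page` returns the page text, which is an assumption here). `looks_blocked` is a hypothetical helper, not part of MagicGoogle.

```python
def looks_blocked(html):
    """Return True if html resembles Google's redirect to its /sorry/ page.

    Hypothetical helper: it only matches the two markers visible in the
    response shown above, so treat a True result as a hint, not proof.
    """
    if not html:
        return False
    lowered = html.lower()
    return "302 moved" in lowered and "/sorry/" in lowered
```

When it returns True, pausing longer or switching to another proxy is usually the next step.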
