GitHub - iotwlw/MagicGoogle: Google search for Amazon Data results crawler, get google search results that you need

MagicGoogle

1. What's MagicGoogle

MagicGoogle is a simple Google search crawler that lets you extract whatever you need from the results page.

While crawling, keep in mind that Google limits requests per IP address and may respond with a warning or error page, so I suggest pausing between requests and using your own proxy IPs.

php - MagicGoogle

2. How to Use?

Run

pip install MagicGoogle
# Or
pip install git+https://github.com/howie6879/MagicGoogle.git
# Or
git clone https://github.com/howie6879/MagicGoogle.git
cd MagicGoogle
vim google_search.py
# Or 
python setup.py install

Example

from MagicGoogle import MagicGoogle
import pprint

# Or PROXIES = None
PROXIES = [{
    'http': 'http://192.168.2.207:1080',
    'https': 'http://192.168.2.207:1080'
}]

# Or MagicGoogle()
mg = MagicGoogle(PROXIES)

# Crawl the whole page
result = mg.search_page(query='python')

# Crawl the result URLs
for url in mg.search_url(query='python'):
    pprint.pprint(url)
    
# Output
# 'https://www.python.org/'
# 'https://www.python.org/downloads/'
# 'https://www.python.org/about/gettingstarted/'
# 'https://docs.python.org/2/tutorial/'
# 'https://docs.python.org/'
# 'https://en.wikipedia.org/wiki/Python_(programming_language)'
# 'https://www.codecademy.com/courses/introduction-to-python-6WeG3/0?curriculum_id=4f89dab3d788890003000096'
# 'https://www.codecademy.com/learn/python'
# 'https://developers.google.com/edu/python/'
# 'https://learnpythonthehardway.org/book/'
# 'https://www.continuum.io/downloads'

# Get a dict with 'title', 'url' and 'text' for each result
for i in mg.search(query='python', num=1):
    pprint.pprint(i)
    
# Output
# {'text': 'The official home of the Python Programming Language.',
# 'title': 'Welcome to Python .org',
# 'url': 'https://www.python.org/'}

For more details, see google_search.py.
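The PROXIES value in the example above is a list of dicts with 'http' and 'https' keys. If you keep your proxies as plain host:port strings, a small helper can build that structure. The sketch below is only an illustration: `build_proxies` and its `scheme` parameter are hypothetical names and are not part of MagicGoogle.

```python
def build_proxies(addresses, scheme="http"):
    """Turn 'host:port' strings into the list-of-dicts format shown for PROXIES.

    Hypothetical helper, not part of MagicGoogle.
    """
    proxies = []
    for address in addresses:
        proxy_url = f"{scheme}://{address}"
        # The same proxy URL is used for both keys, as in the example above.
        proxies.append({"http": proxy_url, "https": proxy_url})
    return proxies
```

The result can then be passed in the same way as the hand-written list, for example `MagicGoogle(build_proxies(['192.168.2.207:1080']))`.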

If you need to run a large number of queries from a single IP address, I suggest waiting 5 to 30 seconds between requests.
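One way to apply that delay is to wrap the search call in a loop that sleeps for a random interval between queries. This is a minimal sketch, not something MagicGoogle provides: `paced_search`, `search_fn`, and the `sleep` parameter are hypothetical names chosen for illustration.

```python
import random
import time


def paced_search(queries, search_fn, min_delay=5.0, max_delay=30.0, sleep=time.sleep):
    """Call search_fn on each query, pausing a random interval between calls.

    Hypothetical helper; search_fn is any callable that takes a query string
    and returns an iterable of results.
    """
    results = {}
    for index, query in enumerate(queries):
        if index > 0:
            # Wait a random time in [min_delay, max_delay] before every query
            # except the first one.
            sleep(random.uniform(min_delay, max_delay))
        results[query] = list(search_fn(query))
    return results
```

With the `mg` object from the example above, this could be called as `paced_search(['python', 'golang'], lambda q: mg.search_url(query=q))`.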

If a search always returns an empty result, the likely reason is that Google responded with a redirect like the following:

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://ipv4.google.com/sorry/index?continue=https://www.google.me/s****">here</A>.
</BODY></HTML>
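A response like the one above can be detected before you try to parse it. The following sketch checks for two markers taken from that sample; `looks_blocked` is a hypothetical helper, and the markers are assumptions based only on this one response, so Google may return other variants.

```python
def looks_blocked(html):
    """Heuristically detect the '302 Moved' redirect to Google's 'sorry' page.

    Hypothetical helper; the two markers below come from the sample response
    shown above and may not cover every block page.
    """
    if not html:
        return False
    lowered = html.lower()
    return "<title>302 moved</title>" in lowered or "google.com/sorry/" in lowered
```

Assuming `mg.search_page(query='python')` returns the page body as text, you could call `looks_blocked` on the result and, when it returns True, pause for longer or switch to another proxy before retrying.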

