Fixed issue with links not being found #298

Status: Open
Wants to merge 44 commits into base: master

Commits on Feb 5, 2020

  1. Fixed issue with links not being found

    Google recently changed the way it presents image data, so the links were no longer being scraped.
    I worked out how to get the image URLs under the new system and made the changes needed for scraping to work again.
    
    Unfortunately, Google no longer provides file format data, so I had to try to retrieve it from the image URL, which does not work in some cases.
    Joeclinton1 authored Feb 5, 2020
    Commit aa1f012
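The fallback described in aa1f012, reading the file format from the image URL, might look roughly like the sketch below. The function name `guess_format_from_url` and the `KNOWN_FORMATS` set are hypothetical and not taken from the patch; the sketch also shows why the approach fails for URLs that carry no extension.

```python
import os
from urllib.parse import urlparse

# Hypothetical allow-list; the real patch may accept a different set.
KNOWN_FORMATS = {"jpg", "jpeg", "png", "gif", "bmp", "svg", "webp", "ico"}


def guess_format_from_url(url):
    """Return the lowercase extension found in the URL path, or None.

    Only the path component is inspected, so query strings such as
    "?w=100" do not interfere with the result.
    """
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return ext if ext in KNOWN_FORMATS else None
```

A URL such as `https://example.com/image?id=42` has no extension in its path, so the function returns `None`; this is the kind of case the commit message says does not work.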

Commits on Feb 9, 2020

  1. Fixed None type

    By filtering out the image objects that had data[0]==2, I have removed the null items, so it no longer gives the error: "TypeError: 'NoneType' object is not subscriptable".
    Joeclinton1 authored Feb 9, 2020
    Commit 66f69d6
  2. Commit 8b794e0
  3. Commit a36a378
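The filter described in 66f69d6 ("Fixed None type") can be illustrated with a small sketch. The entry layout used here (a type marker at index 0, image data at index 1) and the name `drop_null_entries` are assumptions made for illustration; only the `data[0]==2` condition comes from the commit message.

```python
def drop_null_entries(image_objects):
    """Keep only entries whose first element is not the marker 2.

    Assumption: entries marked 2 are the ones that carry None where the
    image data would be, which is what triggers the TypeError.
    """
    return [obj for obj in image_objects if obj[0] != 2]


# Synthetic sample data, not a real Google response.
sample = [
    [1, ["https://example.com/a.jpg", 640, 480]],
    [2, None],
    [1, ["https://example.com/b.png", 800, 600]],
]
urls = [obj[1][0] for obj in drop_null_entries(sample)]
```

Without the filter, `obj[1][0]` on the second entry would raise "TypeError: 'NoneType' object is not subscriptable".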

Commits on Feb 10, 2020

  1. Fix more none type errors

    This system is not very flexible: Google does not seem to keep target items at the same positions, so it sometimes doesn't work. I added a try-except in case there are more problems.
    Joeclinton1 authored Feb 10, 2020
    Commit fbc4a16

Commits on Mar 14, 2020

  1. Fix download of >100 items

    It is based on the patch by https://github.com/Joeclinton1, but for some
    reason we get an escaped string when getting the results page directly
    (limit < 101) and an unescaped one when getting the results page using
    Selenium. This is not the most elegant solution, but it works for me.
    voins committed Mar 14, 2020
    Commit ef577fc
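One way to handle the escaped versus unescaped difference described in ef577fc is to unescape only on the direct-fetch path. This is a minimal sketch under that assumption; the function name `normalize_page_data` and the `used_selenium` flag are hypothetical, and the actual patch may decide differently.

```python
def normalize_page_data(raw, used_selenium):
    """Return the data string with backslash escapes resolved.

    Assumption from the commit message: the directly fetched page
    (limit < 101) is escaped, while the Selenium page is already
    unescaped and must be left alone.
    """
    if used_selenium:
        return raw
    # Encode to ASCII first (non-ASCII characters become \uXXXX escapes),
    # then let the unicode_escape codec resolve every escape sequence.
    return raw.encode("ascii", "backslashreplace").decode("unicode_escape")
```

For example, the escaped text `a\u003db` becomes `a=b` on the direct path and is returned unchanged on the Selenium path.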

Commits on Mar 24, 2020

  1. Intercept ajax calls

    voins committed Mar 24, 2020
    Commit 90e52a4
  2. Decode data from ajax calls

    voins committed Mar 24, 2020
    Commit 7db9a46

Commits on Mar 25, 2020

  1. Commit 2cd6817
  2. Commit 068712b

Commits on Jun 17, 2020

  1. Google changed their format a little. Again.

    Alexey Voinov committed Jun 17, 2020
    Commit d8dd8a9

Commits on Jun 27, 2020

  1. Merge pull request #2 from Joeclinton1/master

    Merged master and patch-1
    Joeclinton1 authored Jun 27, 2020
    Commit 18b0e45
  2. Commit 36f798f
  3. Commit 620e7f5

Commits on Sep 6, 2020

  1. Fixed end_object find code

    Previously, the end_object for the data pack was found by searching for '</script>' and then going 4 characters back. However, in a recent update Google added ', sideChannel: {}});' to the end of the data pack, which throws this off. To fix it, the end_object search now finds '</script>' first and then looks for the first ']' to the left of that closing script tag. This should be more flexible.
    Joeclinton1 authored Sep 6, 2020
    Commit bcb2af3
  2. Improved exception handling

    Previously, if the data unpacking failed, the user was told that the URL could not be opened, which is the wrong exception. I fixed this by splitting the data unpacking and the URL opening into separate parts so that each can have its own exception. This should make it easier to identify what has gone wrong.
    Joeclinton1 authored Sep 6, 2020
    Commit 58a190b
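The two Sep 6, 2020 commits can be sketched together: find the end of the data pack by searching left from '</script>' for the nearest ']', and report unpacking failures with their own exception. The names `find_data_end` and `unpack_data`, and the sample page, are hypothetical; only the search strategy comes from the commit messages.

```python
import json


def find_data_end(page, start):
    """Return the index just past the last ']' before the next '</script>'."""
    script_close = page.find("</script>", start)
    if script_close == -1:
        raise ValueError("closing </script> tag not found")
    # rfind scans right to left, so trailing text such as
    # ", sideChannel: {}});" between ']' and the tag is skipped.
    end = page.rfind("]", start, script_close)
    if end == -1:
        raise ValueError("no ']' found before </script>")
    return end + 1


def unpack_data(page, start):
    """Parse the data pack, raising an unpacking-specific error on failure."""
    end = find_data_end(page, start)
    try:
        return json.loads(page[start:end])
    except json.JSONDecodeError as exc:
        raise ValueError("could not unpack the data pack") from exc


# Synthetic page fragment for illustration only.
page = "<script>AF_initDataCallback({data:[1,[2,3]], sideChannel: {}});</script>"
data = unpack_data(page, page.find("["))
```

Because the search is anchored on ']' rather than on a fixed character offset, the sketch tolerates suffixes of any length as long as they contain no ']'.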

Commits on Jan 31, 2021

  1. Commit aa817df

Commits on May 25, 2021

  1. Add files via upload

    estuhr1206 authored May 25, 2021
    Commit 2a310f1
  2. Delete google_images_download.py

    Just added to the wrong directory by accident.
    estuhr1206 authored May 25, 2021
    Commit c17c55d
  3. Add files via upload

    estuhr1206 authored May 25, 2021
    Commit 4c5e6a4

Commits on Jun 1, 2021

  1. Merge pull request #6 from estuhr1206/patch-1

    Adding offset
    Joeclinton1 authored Jun 1, 2021
    Commit dd0b83d

Commits on Jun 16, 2021

  1. Get more than 400 images

    Fix clicking on the "Show more results" button with Selenium.
    
    - The button no longer has the "smb" id
    - We need to scroll down further before clicking
    NicolasGrosjean authored Jun 16, 2021
    Commit 2f9f801

Commits on Jun 30, 2021

  1. Fix JSONDecodeError: Extra Data

    This may have been caused by Google changing their Ajax response. Looking at the response, lines[4] only contained a single number and not any JSON. Removing it and simply pulling from lines[3] seems to fix the issue. The problem only manifested when downloading more than 100 images, which required launching ChromeDriver.
    matthewlehew authored Jun 30, 2021
    Commit df2e289
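The failure mode in df2e289 can be reproduced with a synthetic response. The response text below is invented for illustration and is not a real Google Ajax payload; the sketch also assumes the earlier code combined lines[3] and lines[4] before decoding, which the commit message implies but does not state outright.

```python
import json

# Synthetic response: lines[3] holds JSON, lines[4] holds a single number.
response = ")]}'\n\n1234\n[[\"a\", 1]]\n56\n"
lines = response.split("\n")


def parse_ajax_payload(lines):
    """Decode only lines[3], ignoring the bare number in lines[4]."""
    return json.loads(lines[3])


payload = parse_ajax_payload(lines)

# Appending the numeric line leaves trailing text after a complete JSON
# value, which is what json reports as "Extra data".
try:
    json.loads(lines[3] + lines[4])
    error_message = None
except json.JSONDecodeError as exc:
    error_message = str(exc)
```

Here `payload` holds the decoded list, while `error_message` carries the "Extra data" text named in the commit title.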

Commits on Aug 25, 2021

  1. Manage API change

    We extracted images from json.loads(data)[31][0]... because json.loads(data)[31] was a list of 1 value.
    Now json.loads(data)[31] is a list of 2 values and we want the last one.
    So replacing 0 with -1 handles this new case, and also the old one if Google reverts this change.
    NicolasGrosjean authored Aug 25, 2021
    Commit a8e28e2
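The reasoning in a8e28e2 is that index -1 selects the last element whether the list has one value or two. A minimal sketch, using made-up stand-ins for the decoded data (the helper name `image_block` and the placeholder strings are hypothetical):

```python
def image_block(decoded):
    """Return the last element of decoded[31], the slot named in the commit."""
    return decoded[31][-1]


# Stand-ins for json.loads(data): 31 filler slots, then the slot of interest.
old_layout = [None] * 31 + [["images"]]            # list of 1 value
new_layout = [None] * 31 + [["other", "images"]]   # list of 2 values

old_result = image_block(old_layout)
new_result = image_block(new_layout)
```

Both calls return the same element, whereas index 0 would return the unwanted first value under the new layout.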

Commits on Sep 20, 2021

  1. Fix time_range argument

    The time range feature has changed; I used this tweet thread to fix it: https://twitter.com/i/events/1174066444029419520.
    
    We could rework the time_range format to avoid changing the "API".
    NicolasGrosjean authored Sep 20, 2021
    Commit 375b6bb

Commits on Sep 22, 2021

  1. Remove time range from directory names

    It is not very useful to have the time range expression in the image directory names.
    NicolasGrosjean authored Sep 22, 2021
    Commit a0c18fd

Commits on Sep 26, 2021

  1. Merge pull request #7 from NicolasGrosjean/patch-3

    Get more than 400 images
    Joeclinton1 authored Sep 26, 2021
    Commit 7c91e00
  2. Merge pull request #8 from matthewlehew/patch-1

    Fix JSONDecodeError: Extra Data
    Joeclinton1 authored Sep 26, 2021
    Commit 9a0008d
  3. Merge pull request #9 from NicolasGrosjean/patch-4

    Manage API change
    Joeclinton1 authored Sep 26, 2021
    Commit 9070776
  4. Merge pull request #10 from NicolasGrosjean/patch-6

    Fix time_range argument
    Joeclinton1 authored Sep 26, 2021
    Commit e13cc55
  5. Commit c773e1c

Commits on Feb 23, 2022

  1. Fix exact_size parameter #11

    Update the URL building to use the new way of getting the exact image size, thanks to this article:
    https://www.labnol.org/internet/google-image-size-search/26902/
    NicolasGrosjean authored Feb 23, 2022
    Commit 36e5c06
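As a rough illustration of the exact_size fix in 36e5c06, the sketch below appends an `imagesize:WIDTHxHEIGHT` term to the search query. This is an assumption about what the linked article recommends and about how the patch builds the URL; the function name `build_exact_size_query` and the "width,height" input format are likewise assumed rather than taken from the commit.

```python
from urllib.parse import quote


def build_exact_size_query(search_term, exact_size):
    """Return a URL-encoded query with an imagesize term appended.

    exact_size is assumed to be a "width,height" string such as "1920,1080".
    """
    width, height = (part.strip() for part in exact_size.split(","))
    return quote(f"{search_term} imagesize:{width}x{height}")


query = build_exact_size_query("cats", "1920, 1080")
```

The encoded value can then be placed in the `q` parameter of the search URL; whether the real patch does so, or uses another parameter, is not confirmed by the commit message.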

Commits on Mar 3, 2022

  1. Merge pull request #12 from NicolasGrosjean/patch-7

    Fix exact_size parameter
    Joeclinton1 authored Mar 3, 2022
    Commit ce512d9

Commits on Aug 5, 2022

  1. Support Firefox

    voronaam committed Aug 5, 2022
    Commit cf190d8

Commits on Aug 15, 2022

  1. Merge pull request #21 from voronaam/patch-1

    Support Firefox
    Joeclinton1 authored Aug 15, 2022
    Commit ae03d01

Commits on Aug 18, 2022

  1. Commit dcb4619
  2. Merge pull request #23 from Joeclinton1/patch-3

    Fix issue #20
    Joeclinton1 authored Aug 18, 2022
    Commit 03671f3

Commits on Sep 23, 2022

  1. Merge pull request #1 from Joeclinton1/patch-1

    upstream updates
    ellisbrown authored Sep 23, 2022
    Commit 945aeff
  2. Commit dffca08
  3. Commit 3f58a9a

Commits on Sep 24, 2022

  1. fix chromium downloads

    ellisbrown committed Sep 24, 2022
    Commit 219b850

Commits on Sep 26, 2022

  1. Commit 1421a43

Commits on Sep 30, 2022

  1. revert rollback from 9/26

    ellisbrown committed Sep 30, 2022
    Commit 2e117f3
  2. Merge pull request #26 from ellisbrown/patch-1

    fix breaking change due to google's response format
    Joeclinton1 authored Sep 30, 2022
    Commit e91e6a3