Fixed issue with links not being found #298

Joeclinton1 · 2020-02-05T22:57:13Z

Google recently changed the way they present the image data, so the links were no longer being scraped.
I figured out how to get the image URLs with the new system and made the appropriate changes so it works again.

Unfortunately, Google no longer provides file format data, so I had to try to retrieve it from the URL of the image, which does not work in some cases.
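
As a rough illustration of why reading the format from the URL only works some of the time, here is a minimal sketch. The function name guess_format_from_url and the KNOWN_EXTENSIONS set are assumptions made for this example and are not taken from the patch.

```python
import os
from urllib.parse import urlparse

# Hypothetical allow-list; the real tool may accept a different set of formats.
KNOWN_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp", ".svg", ".ico"}


def guess_format_from_url(url):
    """Return the lowercase extension without the dot, or None if it cannot be inferred."""
    path = urlparse(url).path  # ignores the query string and fragment
    ext = os.path.splitext(path)[1].lower()
    return ext[1:] if ext in KNOWN_EXTENSIONS else None


print(guess_format_from_url("https://example.com/a/cat.JPG?w=200"))  # has a usable extension
print(guess_format_from_url("https://example.com/image?id=123"))     # no extension in the path
```

URLs that serve an image from an extensionless path are the cases where this approach cannot tell the format.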

EDIT: Since this keeps being asked, here are the commands to download and install the patch on Windows:

git clone https://github.com/Joeclinton1/google-images-download.git
cd google-images-download && python setup.py install

landing-insights-bot

Seems like this will only get the first 100 images, correct?
The rest of the images get dynamically loaded through the batchexecute call.

Joeclinton1 · 2020-02-05T23:31:22Z

Seems like this will only get the first 100 images, correct?
The rest of the images get dynamically loaded through the batch execute call.

Sorry, I wasn't downloading more than 100, so I didn't think about this. I have not tested whether this works with more than 100 images, but my guess is that it will not.

However, I do know that downloading fewer than 100 does not work without these changes.

landing-insights-bot · 2020-02-05T23:32:25Z

cool, well 100 is much better than 0 :)

MarlonHie · 2020-02-06T12:26:27Z

I get this error every time after about 20 downloaded images.
I tried from the command line and with a Python file.

Traceback (most recent call last):
  File "/home/user/.local/bin/googleimagesdownload", line 11, in <module>
    load_entry_point('google-images-download==2.8.0', 'console_scripts', 'googleimagesdownload')()
  File "/home/user/.local/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 1005, in main
    paths,errors = response.download(arguments) #wrapping response in a variable just for consistency
  File "/home/user/.local/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 832, in download
    paths, errors = self.download_executor(arguments)
  File "/home/user/.local/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 959, in download_executor
    items,errorCount,abs_path = self._get_all_items(raw_html,main_directory,dir_name,limit,arguments) #get all image items and download images
  File "/home/user/.local/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 769, in _get_all_items
    object = self.format_object(image_objects[i])
  File "/home/user/.local/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 276, in format_object
    main = data[3]
TypeError: 'NoneType' object is not subscriptable

vk379 · 2020-02-08T17:43:27Z

Hey, Much like MarlonHie,
I have also received the same error.
Could you please advise? It keeps saying NoneType object is not subscriptable.
Thanks,

Rian-T · 2020-02-08T22:30:14Z

I made a quick fix for the NoneType error. I was working on a project using this so I needed it to work again rapidly. Still working only under 100 images though.

Joeclinton1#1

Joeclinton1 · 2020-02-09T15:34:59Z

Sorry for not replying faster. The NoneType error occurs because every so often an item with a null value for the image data is returned. Fortunately, all of these items are marked with 2 in the data[0] column, so I will just remove them. This should fix the problem. Rian-T's solution also works.

By filtering out the image objects which had data[0]==2, I have removed the null items and it will no longer give the error: "TypeError: 'NoneType' object is not subscriptable".
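
The filtering described above could look roughly like the sketch below. The helper name drop_null_items and the layout of the sample entries are assumptions for illustration; only the rule "skip entries whose first field is 2" comes from the comment.

```python
def drop_null_items(image_objects):
    """Keep only entries whose first field is not 2 (the marker for null image data)."""
    return [obj for obj in image_objects if obj and obj[0] != 2]


# Hypothetical sample: [marker, data], where data is None for the null entries.
sample = [
    [1, [0, "id-a", ["thumb-a", 90, 120], ["https://example.com/a.jpg", 600, 800]]],
    [2, None],
    [1, [0, "id-b", ["thumb-b", 90, 120], ["https://example.com/b.png", 300, 400]]],
]

kept = drop_null_items(sample)
for marker, data in kept:
    main = data[3]  # safe now: entries with data set to None were filtered out
    print(main[0])
```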

greg-oz · 2020-02-10T01:13:50Z

I am still getting these errors with the latest Joeclinton1 version:

  File "google_images_download.py", line 1017, in <module>
    main()
  File "google_images_download.py", line 1006, in main
    paths,errors = response.download(arguments) #wrapping response in a variable just for consistency
  File "google_images_download.py", line 842, in download
    paths, errors = self.download_executor(arguments)
  File "google_images_download.py", line 960, in download_executor
    items,errorCount,abs_path = self._get_all_items(raw_html,main_directory,dir_name,limit,arguments) #get all image items and download images
  File "google_images_download.py", line 763, in _get_all_items
    image_objects = self._get_image_objects(page)
  File "google_images_download.py", line 752, in _get_image_objects
    object_decode = bytes(object_raw, "utf-8").decode("unicode_escape")
TypeError: str() takes at most 1 argument (2 given)

This system is not very flexible. It seems Google does not keep the target items in the same positions, so sometimes it doesn't work. I added a try-except just in case there are more problems.

hodsonus · 2020-02-10T19:29:43Z

Doesn't seem to work with more than 100 photos, I attempted with 1000 and it gave me this.

edit: Oops, read a little bit closer and that's a known issue

edgabaldi · 2020-02-10T19:49:55Z

I ran 20 queries and some of them return this exception:

Traceback (most recent call last):
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/celery/app/trace.py", line 648, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/deploy/curador/releases/20200201133617/apps/ean/tasks.py", line 14, in download_image
    cmd.download()
  File "/home/deploy/curador/releases/20200201133617/apps/ean/domain/googleimages.py", line 32, in download
    response.download(config_dict)
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 838, in download
    paths, errors = self.download_executor(arguments)
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 965, in download_executor
    items,errorCount,abs_path = self._get_all_items(raw_html,main_directory,dir_name,limit,arguments)    #get all image items and download images
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 768, in _get_all_items
    image_objects = self._get_image_objects(page)
  File "/home/deploy/curador/venv/lib/python3.6/site-packages/google_images_download/google_images_download.py", line 758, in _get_image_objects
    image_objects = json.loads(object_decode)[31][0][12][2]
IndexError: list index out of range
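
The traceback above fails on a lookup at a fixed position ([31][0][12][2]). A minimal sketch of a guarded lookup follows; the helper name dig is hypothetical and not part of the library. It does not fix a changed layout, it only lets the caller report "no images found" instead of crashing.

```python
def dig(obj, path, default=None):
    """Follow a sequence of indexes or keys into nested lists/dicts.

    Returns `default` as soon as any step is missing or not indexable.
    """
    current = obj
    for key in path:
        try:
            current = current[key]
        except (IndexError, KeyError, TypeError):
            return default
    return current


parsed = [[1, [2, 3]]]                          # stand-in for json.loads(object_decode)
print(dig(parsed, (0, 1, 1)))                   # path exists
print(dig(parsed, (31, 0, 12, 2), default=[]))  # path missing, falls back to []
```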

codefreak404 · 2020-02-10T21:03:15Z

Hi all,

For the time being, a probable workaround is to add the Image Downloader extension to your Chrome browser (https://chrome.google.com/webstore/detail/image-downloader/cnpniohnfphhjihaiiggeabnkjhpaldj?hl=en-US).
I am working on a fix for the issue and will give an update shortly.

Thanks.

Joeclinton1 · 2020-02-11T10:47:23Z

I believe the solution I have is too inflexible for deployment, as Google does not seem to keep a stable enough structure for the data sent back in the callback. A different solution, perhaps one that collects the links inside the callback that are not thumbnails, might work better.
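
One way the "collect every link that is not a thumbnail" idea could be sketched is shown below. This is not the approach used in the patch. The function names are hypothetical, and treating hosts under gstatic.com as thumbnails is a heuristic assumed for this example.

```python
from urllib.parse import urlparse


def is_thumbnail(url):
    """Heuristic (an assumption): treat anything served from gstatic.com as a thumbnail."""
    host = urlparse(url).hostname or ""
    return host.endswith("gstatic.com")


def collect_image_urls(node, found=None):
    """Walk nested lists/dicts and collect http(s) strings that do not look like thumbnails."""
    if found is None:
        found = []
    if isinstance(node, str):
        if node.startswith(("http://", "https://")) and not is_thumbnail(node):
            found.append(node)
    elif isinstance(node, list):
        for child in node:
            collect_image_urls(child, found)
    elif isinstance(node, dict):
        for child in node.values():
            collect_image_urls(child, found)
    return found


# Hypothetical callback data; positions are deliberately irregular.
data = [
    None,
    [0, ["https://encrypted-tbn0.gstatic.com/images?q=x", 1, 2], ["https://example.com/a.jpg", 3, 4]],
    {"k": ["https://example.org/b.png"]},
    "not a url",
]
urls = collect_image_urls(data)
print(urls)
```

Because it ignores positions entirely, a walk like this would also pick up page links that are not images, so the results would still need filtering.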

decidev22 · 2020-02-18T03:12:49Z

How do you import this fixed version and run it?

hodsonus · 2020-02-18T03:31:19Z

there isn't a working solution right now.

seth814 · 2020-02-18T04:20:57Z

I've been trying to get limit > 100 to work. It seems Selenium's browser.page_source returns many more newlines than the raw_html you typically get. I've tried stripping the newlines, but with no success. Eventually it searches for "AF_initDataCallback({key: \'ds:2\'" but gets -1. If I search for just "AF_initDataCallback" I can get a start index, but this still results in a JSONDecodeError. So it seems the entire raw_html from download_extended_page is getting parsed incorrectly.

EDIT: Converting the string to a bytearray and back to a string allowed the image_objects to parse correctly. len(image_objects) was only 100 though so maybe selenium isn't scrolling far down enough? Will keep looking...

EDIT2: It seems my string from download_extended_page is larger, but the object length stays at 100. Comparing a run with a short limit against one with limit > 100, the delta between the start and stop indexes is ~122400 for both raw_html strings after parsing. So no new images seem to be included in the expanded page_source, despite it being a larger string.
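
For anyone reproducing the parsing step, here is a minimal sketch of pulling the array that follows each AF_initDataCallback marker without computing a stop index by hand. It uses json.JSONDecoder.raw_decode, which parses one JSON value starting at a given offset and reports where it ended. The function name and the sample page are assumptions, and real pages may not contain strictly valid JSON at that position.

```python
import json

MARKER = "AF_initDataCallback"


def extract_callback_arrays(raw_html):
    """Return every JSON array found right after an AF_initDataCallback marker."""
    decoder = json.JSONDecoder()
    arrays = []
    pos = raw_html.find(MARKER)
    while pos != -1:
        start = raw_html.find("[", pos)
        if start == -1:
            break
        try:
            value, _end = decoder.raw_decode(raw_html, start)
            arrays.append(value)
        except json.JSONDecodeError:
            pass  # not valid JSON at this position; try the next marker
        pos = raw_html.find(MARKER, pos + len(MARKER))
    return arrays


# Synthetic page, loosely shaped like the callbacks discussed above.
page = (
    "<script>AF_initDataCallback({key: 'ds:1', data:[1, [2, 3], null]});</script>\n"
    "<script>AF_initDataCallback({key: 'ds:2', data:"
    + json.dumps([["https://example.com/a.jpg"]])
    + "});</script>"
)
arrays = extract_callback_arrays(page)
print(len(arrays))
```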

Unfortunately, it appears the Google image formatting has been changed; this is a temporary solution from "hardikvasa/google-images-download#298". Change-Id: Iadcfa995e6b7c6229505ec0872810876575d738e Signed-off-by: goodmeow <[email protected]>

RetroSeasons · 2022-08-21T20:31:00Z

Getting an error with every try, for example:

googleimagesdownload --keywords "ty cobb" --limit 10
Item no.: 1 --> Item name = ty cobb
Evaluating...
str() takes at most 1 argument (2 given)
Image objects data unpacking failed. Please leave ...

Python 2.7.17
Ubuntu 18.04.4 LTS
Selenium 3.141.0
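
For context on the message above: under Python 2, bytes is an alias for str, so the call bytes(object_raw, "utf-8") seen in the earlier traceback raises "str() takes at most 1 argument (2 given)". Under Python 3 the same call encodes the string. Below is a minimal Python 3 sketch of the equivalent decode step. The helper name is hypothetical, and the version check is only there to make the requirement explicit.

```python
import sys


def decode_unicode_escapes(object_raw):
    """Turn literal escape sequences (for example \\u003d) into the characters they name."""
    if sys.version_info < (3,):
        raise RuntimeError("this sketch targets Python 3; run it with python3")
    # Note: unicode_escape treats non-escape bytes as Latin-1, so only ASCII
    # input with escape sequences round-trips cleanly.
    return object_raw.encode("utf-8").decode("unicode_escape")


print(decode_unicode_escapes("a \\u003d b"))
```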

mrclean789 · 2022-08-24T13:46:44Z

@RetroSeasons pip uninstall google-images-download and then run setup.py again

mrclean789 · 2022-08-26T18:29:46Z

I'm now getting this error too. I've run the command multiple times and it always works in the beginning, but then the error appears at random. Sometimes it's after the first 1 or 2 keywords; the most it's gone up to is around 30 keywords before it gives me the error.

list index out of range
Image objects data unpacking failed.

Jerick5555 · 2022-09-23T07:12:25Z

I am also getting this error.

list index out of range
Image objects data unpacking failed.

It seems to happen after at least 2 keywords then it fails somewhat randomly at the start of any keyword afterwards.

ignaciodamiang · 2022-09-23T12:45:12Z

I'm getting this error, the same as @Jerick5555.

Evaluating...
list index out of range
Image objects data unpacking failed.

I've tried it in a virtual machine and I'm getting the same error. It's very strange because yesterday I used the program and it worked fine. If anyone comes up with something, let me know.

Btw I have Ubuntu 22.

Update:

I executed the test provided in the project like this:
python3 -m unittest test_google_images_download.py
and obtained this output:
Looks like we cannot locate the path the 'chromedriver' (use the '--chromedriver' argument to specify the path to the executable.) or google chrome browser is not installed on your machine (exception: Message: 'chromedriver.exe' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home

Joeclinton1 · 2022-09-23T16:28:28Z

@mrclean789
@Jerick5555
@ignaciodamiang

I ran test_google_images_download.py and was able to reproduce the error. Thank you for alerting me!
The problem occurs for both <100 and >100 images.

The issue is likely caused by Google once again changing the way they format their image object array.
I'll try to fix the issue when I have more time.

ignaciodamiang · 2022-09-23T16:47:37Z

Great. I hope you find the time. If I knew Python I would try to fix it. Thank you!

upstream updates

ellisbrown · 2022-09-23T23:13:28Z

@mrclean789 @Jerick5555 @ignaciodamiang

I ran test_google_images_download.py and was able to reproduce the error. Thank you for alerting me! The problem occurs for both <100 and >100 images.

The issue is likely caused by Google once again changing the way they format their image object array. I'll try to fix the issue when I have more time.

@Joeclinton1 looks like they changed it. I found the issue and am fixing it, I'll raise a PR.

Update: Joeclinton1#26

eamonnkenny · 2022-09-30T08:27:59Z

It seems that the download list is always empty now since yesterday or the day before. This is using the joe clinton version. It was working for quite some time with some strange periodic problems that would occur for 1/2 a day at a time and then disappear, but since yesterday no search term has downloaded anything for me. Are others finding this?

ellisbrown · 2022-09-30T16:51:27Z

It seems that the download list is always empty now since yesterday or the day before. This is using the joe clinton version. It was working for quite some time with some strange periodic problems that would occur for 1/2 a day at a time and then disappear, but since yesterday no search term has downloaded anything for me. Are others finding this?

@eamonnkenny see #298 (comment)

fix breaking change due to google's response format

modikush80 · 2022-10-12T18:11:21Z

Getting this error. Did anyone find a solution to this?

Evaluating...
'NoneType' object is not subscriptable
Image objects data unpacking failed.

tallevy22 · 2022-10-23T14:01:58Z

Is there a way to encode the returned metadata? I get \u05de\u05d3\u05d5\u05d6\u05d4 \u05d7\u05d5\u05e3 \u05d0\u05e9\u05d3\u05d5\u05d3 instead of Hebrew. I tried adding the following around line 1130:
json_file = open("logs/" + search_keyword[i] + ".json", "w",encoding="utf-8")
json.dump(items, json_file, indent=4, sort_keys=True, ensure_ascii=False)
json_file.close()
but it didn't help
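
One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the strings in items already contain literal backslash-u sequences as text. In that case ensure_ascii=False has nothing to convert. A minimal sketch of unescaping such strings before dumping follows. The helper name and the image_description key are hypothetical, and the regex only handles four-digit escapes, not surrogate pairs.

```python
import json
import re

_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_literal_unicode(value):
    """Recursively replace literal backslash-u sequences in strings with real characters."""
    if isinstance(value, str):
        return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)
    if isinstance(value, list):
        return [unescape_literal_unicode(v) for v in value]
    if isinstance(value, dict):
        return {k: unescape_literal_unicode(v) for k, v in value.items()}
    return value


# Hypothetical item whose description holds the escapes as plain text.
items = [{"image_description": "\\u05de\\u05d3\\u05d5\\u05d6\\u05d4"}]
cleaned = unescape_literal_unicode(items)
dumped = json.dumps(cleaned, indent=4, sort_keys=True, ensure_ascii=False)
print(dumped)
```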

Jerick5555 · 2022-10-27T08:44:34Z

Evaluating...
'NoneType' object is not subscriptable
Image objects data unpacking failed.

Got this error too

Jerick5555 · 2022-10-28T00:53:01Z

Evaluating... 'NoneType' object is not subscriptable Image objects data unpacking failed.

Got this error too

nvm, i updated to the latest version and it is working now. @modikush80

Jerick5555 · 2022-10-28T00:54:03Z

just pull from the repo and do the setup again

Joeclinton1 · 2023-02-23T23:52:58Z

As of now, I think Google has changed their JSON again and it no longer works. I am currently very busy and have not had a chance to fix it, but on GitHub there are a few PRs which claim to have fixed the problem: https://github.com/Joeclinton1/google-images-download/pulls

I will test these at some point, but in the meantime, if you need it to work, you may consider one of their forks. If it works for you, please tell me and I'll just merge their fork.

Thank you for your understanding.

galantra · 2023-02-24T13:28:48Z

I've tried Joeclinton1#35 and it works for me (using it as part of https://github.com/galantra/FluentForeverVocabBuilder/)

ellisbrown · 2023-02-27T16:49:42Z

I am working on a project that depends heavily on this functionality. I refactored it and am maintaining it here https://github.com/ellisbrown/google-images-download/tree/wrapperless if it helps anyone

copperwiring · 2024-03-20T12:09:07Z

Doesn't work for me.

What are the correct instructions to use the updated version? I used the following:

git clone https://github.com/d0codesoft/google-images-download.git
cd google-images-download
git checkout patch-1
python setup.py install
googleimagesdownload -k "children in park" -l 10

I get


Item no.: 1 --> Item name = children in park
Evaluating...
Starting Download...
'NoneType' object is not subscriptable
Traceback (most recent call last):
  File "/Users/srishtiy/anaconda3/bin/googleimagesdownload", line 33, in <module>
    sys.exit(load_entry_point('google-images-download==2.8.0', 'console_scripts', 'googleimagesdownload')())
  File "/Users/srishtiy/anaconda3/lib/python3.10/site-packages/google_images_download-2.8.0-py3.10.egg/google_images_download/google_images_download.py", line 1167, in main
  File "/Users/srishtiy/anaconda3/lib/python3.10/site-packages/google_images_download-2.8.0-py3.10.egg/google_images_download/google_images_download.py", line 971, in download
  File "/Users/srishtiy/anaconda3/lib/python3.10/site-packages/google_images_download-2.8.0-py3.10.egg/google_images_download/google_images_download.py", line 1119, in download_executor
  File "/Users/srishtiy/anaconda3/lib/python3.10/site-packages/google_images_download-2.8.0-py3.10.egg/google_images_download/google_images_download.py", line 907, in _get_all_items
TypeError: 'NoneType' object is not subscriptable

ellisbrown · 2024-03-20T17:30:51Z

@copperwiring see my above comment for a working fork. the following worked for me just now:

git clone git@github.com:ellisbrown/google-images-download.git
cd google-images-download
pip install .
python tests/test_google_images_download.py --limit 10

landing-insights-bot reviewed Feb 5, 2020

RiddlerQ mentioned this pull request Feb 6, 2020

Cannot Find Images for this Search Filter #280

Open

Joeclinton1 added 3 commits February 9, 2020 16:53

Fixed None type

66f69d6

By filtering out the image objects which had data[0]==2, I have removed the null items and it will no longer give the error: "TypeError: 'NoneType' object is not subscriptable".

Merge branch 'patch-1' into master

8b794e0

Update google_images_download.py

a36a378

Joeclinton1 mentioned this pull request Feb 9, 2020

Reverse image feature not working any more #297

Open

Fix more none type errors

fbc4a16

This system is not very flexible. It seems Google does not keep the target items in the same positions, so sometimes it doesn't work. I added a try-except just in case there are more problems.

hodsonus mentioned this pull request Feb 11, 2020

error when run the example code with vpn #300

Open

yashshahdata approved these changes Feb 13, 2020

gblue1223 mentioned this pull request Feb 14, 2020

Unfortunately all 20 could not be downloaded because some images were not downloadable #301

Open

adekmaulana mentioned this pull request Feb 28, 2020

scrappers: .img: fallback to temporary solution mkaraniya/OpenUserBot#16

Merged

Merge pull request #1 from Joeclinton1/patch-1

945aeff

upstream updates

ellisbrown added 5 commits September 23, 2022 23:31

fix breaking change due to google's response format

dffca08

update error message to point to this PR

3f58a9a

fix chromium downloads

219b850

fix again after new update 9/26

1421a43

revert rollback from 9/26

2e117f3

Merge pull request #26 from ellisbrown/patch-1

e91e6a3

fix breaking change due to google's response format

NicolasGrosjean mentioned this pull request Oct 5, 2022

Could not open URL #366

Open

Joeclinton1 requested a review from landing-insights-bot November 6, 2022 15:38

NicolasGrosjean mentioned this pull request Nov 13, 2022

Does not work with any search terms. #370

Open
