added support for downloading from collections (folders, favourites, and galleries) #5

Open: shemetz wants to merge 1 commit into base: master

@shemetz shemetz commented Oct 9, 2019

No description provided.

@kent-lee
Owner

kent-lee commented Oct 13, 2019

@itamarcu thank you for your contribution; I am very glad that you took the time to improve the functionality of this program :)

I have tested the code and it is working as expected, but there are a few minor problems:

  • the parser function seems incomplete
  • the feature to download a particular gallery folder is somewhat already implemented. In the user_artworks function, I have set the folder to All by default, but it can be modified to accept parameters that point to other gallery folders
  • the output folder structure needs some changes for the new collection feature to work properly (please see here for the issue)

Because of the above issues, I will change some of your code after merging the pull request. Please let me know if this is okay with you. Also, for the last point, I have made some design choices on the issue; please have a look here for more details. I am most likely going to implement version 2, but am open to any suggestion.

Thank you.

@chrsmlls333

Regarding your design doc, I agree that version 2 is the best approach, but perhaps favourites collections could go in a subfolder separate from the gallery files, under each User! They too have a kind of "All" folder which could interfere with the "All" gallery. This also allows for distinct separation of favs and works.

@kent-lee
Owner

If you like distinction between favs and works, then version 3 is actually better. The main reasons I am leaning towards version 2 are:

  1. it requires minimal changes to the current code, so it would be faster to implement
  2. I assume that people want to download all artworks from a given user, so there is no point in providing options to choose other gallery folders as you are downloading everything

I don't know if this assumption works for the majority of people, so in version 3 and 4, I provide the option to select which gallery folder to download. This has the best output file structure in my opinion, but there are two problems:

  1. as pointed out in the doc, it would require a lot more work than version 2
  2. suppose you want to download the artworks in gallery folders A and B from a user, some of the images in folder A may exist in folder B and vice versa, meaning you are downloading duplicate files. This may not be desirable and is especially bad if the gallery folder is All, because then you will be duplicating most, if not all of the files across all other gallery folders
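
To make the duplicate problem concrete, here is a minimal sketch of one possible way to avoid it. It is not part of the current program: it assumes each artwork has a stable unique ID, and `plan_downloads` is a hypothetical helper that keeps every artwork only in the first folder that lists it.

```python
def plan_downloads(folders):
    """Map each folder name to the artwork IDs it should download.

    `folders` is {folder_name: [artwork_id, ...]} in download order.
    An ID already claimed by an earlier folder is skipped, so every
    artwork is saved exactly once. (Hypothetical helper, assumed IDs.)
    """
    seen = set()
    plan = {}
    for name, artwork_ids in folders.items():
        fresh = []
        for artwork_id in artwork_ids:
            if artwork_id not in seen:
                seen.add(artwork_id)
                fresh.append(artwork_id)
        plan[name] = fresh
    return plan


plan = plan_downloads({"A": [1, 2, 3], "B": [2, 3, 4]})
print(plan)
```

With this approach the order of the folders matters: if All is processed first, every other gallery folder ends up empty, which may or may not be what the user wants.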

@shemetz
Author

shemetz commented Oct 14, 2019

I am okay with any changes to my code - this is open source after all :)

The use case that the collection-downloading feature is trying to solve is, well, downloading specific collections. Many users (like me) just want to download a specific collection of images - usually either their favorite images or a list of images that fit into a certain theme.

People who are using this feature will probably not want to get extra artworks that they didn't ask for (would only make the process longer and require more storage space). For example, this user has a collection called "Landscapes" with about 40 pictures in it, but the user has many many other collections, so their "all" folder has nearly 10000 pictures!

Therefore, whatever approach you pick, it is quite important that you allow downloading specific collections without downloading all of the artworks/collections of a particular user.

I'll slightly prefer versions 1 and 2, but no strong preference. There is however an extra option - you could have it be like version 1 except the collection names are prefixed by the username. for example, "souveraines - Landscapes". It's not a huge difference from just having a "Landscapes" folder within a "souveraines" folder, though.

@chrsmlls333

The main downside of version 2 is the lack of labeling in the filesystem; in that respect version 3 is much better, you're right. Based on @itamarcu's comment, I don't think you can assume everyone wants to download everything all the time, so some folder handling and sorting in the filesystem is probably necessary.

I would recommend not overwhelming yourself by parsing all separate galleries by default. As you say, duplicates and sorting become complex. If no gallery is specified in the config or on the command line, the files could be saved where they are now, or to User/Gallery-All/file.jpg. That may let you keep your current functionality as the default. Otherwise you risk a user dumping multiple galleries into that user's root folder and everything getting mixed up.

Regarding config, perhaps this is a good structure?

{
    "save_directory": "D:\\Pictures\\deviantart",
    "users": [
        {
            "GUWEIZ": {
                "galleries": [ "Landscapes" ],
                "collections": [ "All" ]
            }
        },
        "wataboku"
    ]
}

Then wataboku downloads all galleries and no collections by default, the behaviour you already have, and the other is self explanatory. You could check to see if each user in the list is a simple string or object, so they can be input simply or with more definition.
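
The string-or-object check could look roughly like the sketch below. It rests on a few assumptions: each entry in "users" is either a bare username or a single-key object mapping the username to its options, a bare username falls back to all galleries and no collections, and a key missing from the object form means an empty list. `normalize_users` and `DEFAULTS` are hypothetical names, not part of the existing program.

```python
import json

# Assumed default for a bare username: the behaviour the program has today.
DEFAULTS = {"galleries": ["All"], "collections": []}


def normalize_users(raw_users):
    """Return {username: {"galleries": [...], "collections": [...]}}."""
    users = {}
    for entry in raw_users:
        if isinstance(entry, str):
            # Simple form: just a username, so apply the defaults.
            users[entry] = {key: list(value) for key, value in DEFAULTS.items()}
        elif isinstance(entry, dict):
            # Detailed form: {"username": {"galleries": [...], "collections": [...]}}
            for name, options in entry.items():
                users[name] = {
                    "galleries": list(options.get("galleries", [])),
                    "collections": list(options.get("collections", [])),
                }
        else:
            raise TypeError(f"unsupported user entry: {entry!r}")
    return users


config = json.loads(r"""
{
    "save_directory": "D:\\Pictures\\deviantart",
    "users": [
        {"GUWEIZ": {"galleries": ["Landscapes"], "collections": ["All"]}},
        "wataboku"
    ]
}
""")
users = normalize_users(config["users"])
print(users)
```

After normalization the rest of the program only has to deal with one shape, regardless of how the user wrote the entry.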

@kent-lee
Owner

kent-lee commented Oct 15, 2019

Sorry if I wasn't clear, but when I talked about collection folders and gallery folders, those two mean different things.

Collection folders are the folders under the FAVOURITES tab on the website. I think it makes sense to download user-specified folders and not all collection folders; hence in all versions in the design doc, the collection folders have names like collection A, collection B, etc., indicating that they are specific collection folders provided by the users. For example:

save directory
├── souveraines
│   ├── Landscapes
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   ├── image3.jpg
│   │   ...
│   ├── Characters
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   ├── image3.jpg
│   │   ...
│   ...
...

Gallery folders are the folders under the GALLERY tab on the website. The original program is set to download the gallery folder All by default; hence in versions 1 and 2 of the design doc there are no folders like gallery A or gallery B in the user A folder, because there is no option to download other specific gallery folders.

save directory
├── souveraines
│   ├── Landscapes
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   ├── image3.jpg
│   │   ...
│   ├── Characters
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   ├── image3.jpg
│   │   ...
│   ├── image1.jpg ──┐
│   ├── image2.jpg ──┼── # these are artworks in gallery all folder
│   ├── image3.jpg ──┘
│   ...
...

So the question I had was that I am not sure if I should allow users to download specific gallery folders, as it adds more complexity to the program and has the duplicate file problem I mentioned before.

As for the potential file structure for version 3 suggested by @chrsmlls333, it looks good to me, but I would probably keep all users consistent like so:

{
    "save_directory": "D:\\Pictures\\deviantart",
    "users": {
        "souveraines": {
             "galleries": [ ],
             "collections": [ "Landscapes" ]
        },
        "wataboku": {
             "galleries": [ "All" ],
             "collections": [ ]
        }
    }
}

However, with this approach, I am having some difficulty deciding on the command line input. For example, what should the input for adding a user be? Something like this?

python main.py artwork -a wataboku-All
python main.py collection -a souveraines-Landscapes

What if the folder name contains spaces or dashes? Is the above proposed json file structure too inconvenient to edit manually?

@shemetz
Author

shemetz commented Oct 15, 2019 via email

@chrsmlls333

chrsmlls333 commented Oct 16, 2019

I think the key to convenience is having fallback behavior. So accept python main.py collection -a souveraines:Landscapes and tokenize based on colons (or another char not allowed in DeviantArt usernames)
or
python main.py collection -a souveraines which is equivalent to "All"
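
A rough sketch of that tokenizing with the "All" fallback, assuming a colon never appears in a username (`parse_target` is a hypothetical helper, not an existing function in the program):

```python
def parse_target(token, default_folder="All"):
    """Split 'user:folder' into (user, folder).

    Only the first colon separates the two parts, so the folder name
    may itself contain dashes, spaces, or further colons. A missing or
    empty folder falls back to `default_folder`.
    """
    user, _, folder = token.partition(":")
    user = user.strip()
    folder = folder.strip()
    if not user:
        raise ValueError(f"missing username in {token!r}")
    return user, folder or default_folder


print(parse_target("souveraines:Landscapes"))
print(parse_target("souveraines"))
```

Folder names with spaces would still need quoting in the shell, e.g. -a "souveraines:Sci-Fi art", but dashes no longer clash with the separator.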

I understand being wary of too much functionality, but accepting individual collections and not individual galleries seems very counter-intuitive. If you are worried about file duplication, the solution is straightforward: keep "All" in the user root or in its own subfolder, and make a subfolder for each gallery. So use version 2 when galleries are not specified and version 3 when they are. This seems closest to other scraper tools such as dagr.py, which have fallen by the wayside recently.
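
As a sketch of that folder rule, assuming "All" stays in the user root and every other gallery gets its own subfolder (`gallery_dir` is a hypothetical helper; it does not sanitize folder names that are invalid on the filesystem):

```python
from pathlib import Path


def gallery_dir(save_directory, user, folder="All"):
    """Return the directory a gallery's files should be saved to.

    "All" (the default when no gallery is specified) maps to the user
    root, matching the current layout; any other gallery maps to a
    subfolder named after it.
    """
    user_root = Path(save_directory) / user
    return user_root if folder == "All" else user_root / folder


print(gallery_dir("deviantart", "wataboku"))
print(gallery_dir("deviantart", "wataboku", "Landscapes"))
```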

The JSON you specify (more consistent) seems very logical and well arranged.


3 participants