added support for downloading from collections (folders, favourites, and galleries) #5
@itamarcu thank you for your contribution; I am very glad that you took the time to improve the functionality of this program :) I have tested the code and it is working as expected, but there are a few minor problems:
Due to the above issues, I will change some of your code after merging the pull request. Please let me know if this is okay with you. Also, for the last point, I have made some design choices on the issue; please have a look here for more details. I am most likely going to implement version 2, but am open to any suggestions. Thank you. |
Regarding your design doc, I agree that version 2 is the best approach, but perhaps favourites collections could go in a subfolder separate from the gallery files, under each User! They too have a kind of "All" folder which could interfere with the "All" gallery. This also allows for distinct separation of favs and works. |
If you like distinction between favs and works, then version 3 is actually better. The main reasons I am leaning towards version 2 are:
I don't know if this assumption works for the majority of people, so in versions 3 and 4, I provide the option to select which gallery folder to download. This has the best output file structure in my opinion, but there are two problems:
|
I am okay with any changes to my code - this is open source after all :)
The use case that the collection-downloading feature is trying to solve is, well, downloading specific collections. Many users (like me) just want to download a specific collection of images - usually either their favorite images or a list of images that fit into a certain theme. People who are using this feature will probably not want to get extra artworks that they didn't ask for (it would only make the process longer and require more storage space). For example, this user has a collection called "Landscapes" with about 40 pictures in it, but the user has many, many other collections, so their "all" folder has nearly 10000 pictures!
Therefore, whatever approach you pick, it is quite important that you allow downloading specific collections without downloading all of the artworks/collections of a particular user.
I'll slightly prefer versions 1 and 2, but no strong preference. There is however an extra option - you could have it be like version 1 except the collection names are prefixed by the username. For example, "souveraines - Landscapes". It's not a huge difference from just having a "Landscapes" folder within a "souveraines" folder, though. |
The main downside of version 2 is the lack of labeling in the filesystem; in this case version 3 is way better, you're right. Based on @itamarcu's comment, I don't think you can assume everyone wants to download all, all the time, so perhaps some folder handling and sorting in the filesystem is necessary. I would recommend you don't overwhelm yourself and parse all separate galleries by default. Like you say, duplicates and sorting become complex. If unspecified in the config or command line, the files could be saved to where they are now or
Regarding config, perhaps this is a good structure?
{
  "save_directory": "D:\\Pictures\\deviantart",
  "users": [
    {
      "GUWEIZ": {
        "galleries": [ "Landscapes" ],
        "collections": [ "All" ]
      }
    },
    "wataboku"
  ]
}
Then wataboku downloads all galleries and no collections by default, the behaviour you already have, and the other is self-explanatory. You could check to see if each user in the list is a simple string or object, so they can be input simply or with more definition. |
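The string-or-object check suggested above could be sketched roughly as follows. This is only an illustration, not code from the project: `normalize_user` is a hypothetical helper, and it assumes a detailed entry is written as a single-key object (`{"GUWEIZ": {...}}`), since a bare `"name": {...}` pair inside a JSON array is not valid JSON. It also assumes the default "all galleries" behaviour can be represented as the gallery folder `"All"`.

```python
import json

# Hypothetical config text; the raw string keeps the JSON backslash escapes intact.
CONFIG_TEXT = r"""
{
  "save_directory": "D:\\Pictures\\deviantart",
  "users": [
    {"GUWEIZ": {"galleries": ["Landscapes"], "collections": ["All"]}},
    "wataboku"
  ]
}
"""


def normalize_user(entry):
    """Return (username, galleries, collections) for one "users" entry."""
    if isinstance(entry, str):
        # Bare username: fall back to the existing behaviour
        # (gallery "All", no collections). This default is an assumption.
        return entry, ["All"], []
    if isinstance(entry, dict) and len(entry) == 1:
        name, options = next(iter(entry.items()))
        galleries = list(options.get("galleries", []))
        collections = list(options.get("collections", []))
        return name, galleries, collections
    raise ValueError(f"unsupported user entry: {entry!r}")


config = json.loads(CONFIG_TEXT)
users = [normalize_user(entry) for entry in config["users"]]
for user in users:
    print(user)
```

With this shape, the simple form stays a one-word entry while the detailed form can name individual folders.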
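Sorry if I wasn't clear, but when I talked about collection folders and gallery folders, those two mean different things.
Collection folders are the folders under the FAVOURITES tab on the website. I think it makes sense to download user-specified folders and not all collection folders; hence in all versions in the design doc, the collection folders have names collection A, collection B, etc., indicating that they are specific collection folders provided by the users. For example:
save directory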
├── souveraines
│ ├── Landscapes
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ ├── image3.jpg
│ │ ...
│ ├── Characters
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ ├── image3.jpg
│ │ ...
│ ...
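...
Gallery folders are the folders under the GALLERY tab on the website. The original program is set to download the gallery folder All by default; hence in design doc versions 1 and 2, there are no folders like gallery A, gallery B in the user A folder, because there is no option to download other specific gallery folders.
save directory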
├── souveraines
│ ├── Landscapes
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ ├── image3.jpg
│ │ ...
│ ├── Characters
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ ├── image3.jpg
│ │ ...
│ ├── image1.jpg ──┐
│ ├── image2.jpg ──┼── # these are artworks in gallery all folder
│ ├── image3.jpg ──┘
│ ...
...
So the question I had was that I am not sure if I should allow users to download specific gallery folders, as it adds more complexity to the program and has the duplicate file problem I mentioned before.
As for the potential file structure for version 3 suggested by @chrsmlls333, it looks good to me, but I would probably keep all users consistent like so:
{
  "save_directory": "D:\\Pictures\\deviantart",
  "users": {
    "souveraines": {
      "galleries": [ ],
      "collections": [ "Landscapes" ]
    },
    "wataboku": {
      "galleries": [ "All" ],
      "collections": [ ]
    }
  }
}
However, with this approach, I am having some difficulties deciding the command line input. For example, what should be the input for adding a user? Something like these?
python main.py artwork -a wataboku-All
python main.py collection -a souveraines-Landscapes
What if the folder name contains spaces or dashes? Is the above proposed |
Can't users wrap arguments with quotation marks to handle spaces?
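As a rough sketch of how quoting could work: if the user and folder are passed as two separate values instead of one dash-joined string, spaces and dashes in folder names stop being ambiguous. The two-value form of `-a`, the `--add` long name, and the folder name "Sci-Fi Landscapes" are assumptions for illustration, not the project's actual interface. `shlex.split` stands in for the shell's own quote handling.

```python
import argparse
import shlex


def build_parser():
    """Build a hypothetical CLI with 'artwork' and 'collection' subcommands."""
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ("artwork", "collection"):
        sub = subparsers.add_parser(name)
        # Two separate values, so no delimiter has to be parsed out of a
        # single "user-folder" string.
        sub.add_argument("-a", "--add", nargs=2, metavar=("USER", "FOLDER"))
    return parser


# Simulates: python main.py collection -a souveraines "Sci-Fi Landscapes"
argv = shlex.split('collection -a souveraines "Sci-Fi Landscapes"')
args = build_parser().parse_args(argv)
user, folder = args.add
print(args.command, user, folder)
```

One caveat with this approach: a folder name that starts with a dash would still be read by `argparse` as an option, so such names would need separate handling.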
…On Tue, Oct 15, 2019, 09:53 Kent Lee ***@***.***> wrote:
Sorry if I wasn't clear, but when I talked about collection folders and
gallery folders, those two mean different things.
Collection folders are the folders under FAVOURITES tab on the website. I
think it makes sense to download user specified folders and not all
collection folders; hence in all versions in the design doc, the collection
folders have names collection A, collection B, etc, indicating that they
are specific collection folders provided by the users. For example:
save directory
├── souveraines
│ ├── Landscapes
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ ├── image3.jpg
│ │ ...
│ ├── Characters
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ ├── image3.jpg
│ │ ...
│ ...
...
Gallery folders are the folders under GALLERY tab on the website. The
original program is set to download the gallery folder All by default;
hence in the design doc version 1 and 2, there is no folders like gallery
A, gallery B in user A folder, because there is no option to download
other specific gallery folders.
save directory
├── souveraines
│ ├── Landscapes
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ ├── image3.jpg
│ │ ...
│ ├── Characters
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ ├── image3.jpg
│ │ ...
│ ├── image1.jpg ──┐
│ ├── image2.jpg ──┼── # these are artworks in gallery all folder
│ ├── image3.jpg ──┘
│ ...
...
So the question I had was that I am not sure if I should allow users to
download specific gallery folders, as it adds more complexity to the
program and will have the duplicate file problem I mentioned before.
As for the potential file structure for version 3 suggested by
@chrsmlls333 <https://github.com/chrsmlls333>, it looks good to me, but I
would probably keep all users consistent like so:
{
  "save_directory": "D:\\Pictures\\deviantart",
  "users": {
    "souveraines": {
      "galleries": [ ],
      "collections": [ "Landscapes" ]
    },
    "wataboku": {
      "galleries": [ "All" ],
      "collections": [ ]
    }
  }
}
However, with this approach, I am having some difficulties deciding the
command line input. For example, what should be the input for adding a
user? Something like these? What if the folder name contains spaces or
dashes?
python main.py artwork -a wataboku-All
python main.py collection -a souveraines-Landscapes
|
I think the key to convenience is having fallback behavior. So accept
I understand being wary of too much functionality, but accepting individual collections and not individual galleries seems very counter-intuitive. And if you are worried about file duplication, the solution is straightforward: keep "All" in the user root or its own subfolder, and then make subfolders for each gallery. So use version 2 when galleries are not specified and version 3 when they are. This seems the most similar to other scraper tools like dagr.py that have fallen by the wayside recently. The JSON you specify (more consistent) seems very logical and well arranged. |
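The fallback described here could look something like the sketch below. It is a guess at the layout, not the design doc's actual version 2 or version 3: `gallery_dir` is a hypothetical helper, and it assumes the "All" gallery lives in the user root while every explicitly named gallery gets its own subfolder.

```python
from pathlib import PurePosixPath


def gallery_dir(save_dir, user, gallery=None):
    """Pick the output folder for a user's gallery artworks.

    Assumed layout: an unspecified gallery or "All" goes to the user root
    (the current behaviour); a named gallery goes to its own subfolder.
    """
    user_root = PurePosixPath(save_dir) / user
    if gallery is None or gallery == "All":
        return user_root
    return user_root / gallery


print(gallery_dir("deviantart", "wataboku"))
print(gallery_dir("deviantart", "GUWEIZ", "Landscapes"))
```

An artwork that appears both in "All" and in a named gallery would then be saved in both places unless duplicates are detected separately.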