Challenge: Download user favorites (and create recommendations) #2

Open
DonaldTsang opened this issue Jun 19, 2019 · 6 comments

@DonaldTsang

DonaldTsang commented Jun 19, 2019

  1. Given a user account, or a user's collection, download all of the images within the favorites/collection
    • For user favorites https://www.deviantart.com/<user>/favourites/
    • For user collections https://www.deviantart.com/<user>/favourites/<collectionID>
    • Create an API that outputs a JSON list of favorite URLs with the artist name/ID for both of these types, plus a marker stating which favorites page or collection each entry came from
  2. All downloaded images should retain their metadata, including:
    • Name of the piece and tags (ease of search)
    • Description (good for finding collaborations and descriptions)
    • Artist name/ID (will be important for point no.3)
    • The metadata should be stored as a JSON file paired alongside each image
    • The artist name/ID should be easily accessible through the filename or some other means
  3. Given a user account, or a set of multiple user accounts, recommend a list of artists based on one of these criteria:
    • Quantity-based (the users' favorites/collections contain more than X pieces of art from artist Y)
    • Percentage-based (more than X% of the art in the users' favorites/collections is from artist Y)
    • Any other applicable strategy
    • Create an API that outputs a JSON list of recommended artists based on the data, along with a marker listing the input user or list of users
  4. Given a user account, or a set of multiple user accounts, find as many artists as possible and draw a network diagram of user favorites
    • There are two variables, X and N (X as in point no. 3, N for depth)
    • The system will find all artists that are up to N steps "away" from the main search group
    • e.g. if A likes artist B and B likes artist C, then B is 1-away and C is 2-away
    • (extra credit) Use Matplotlib + NetworkX to achieve the result
    • (extra credit) cluster different artists in the network into sub-groups
    • Create an API that outputs a list of all nodes (users/artists/collections, each containing a list of its art pieces) and connections (favorites, each of which has an artist, a faver, and a list of the art pieces being liked)
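
A minimal sketch of points 3 and 4, assuming the favorites have already been scraped into a plain mapping from each user to the artists they favorited, with one entry per piece. The names `recommend_artists` and `artists_within`, the input shape, and the sample data are all hypothetical and not part of the scraper. Drawing with Matplotlib + NetworkX is left out so the block runs on the standard library alone.

```python
from collections import Counter, deque


def recommend_artists(favs, users, min_count=None, min_percent=None):
    """Return artists that pass the quantity and/or percentage threshold.

    favs maps a user to a list of artist names, one entry per favorited
    piece (assumed input shape). "More than X" is read as strictly greater.
    """
    counts = Counter(a for user in users for a in favs.get(user, []))
    total = sum(counts.values())
    picked = []
    for artist, n in counts.items():
        if min_count is not None and n <= min_count:
            continue
        if min_percent is not None and 100 * n / total <= min_percent:
            continue
        picked.append(artist)
    # The "input_users" key plays the role of the marker listing the input.
    return {"input_users": sorted(users), "artists": sorted(picked)}


def artists_within(favs, start_users, depth, min_count=0):
    """Breadth-first search mapping each reachable artist to its distance."""
    seen = set(start_users)
    distance = {}
    queue = deque((user, 0) for user in sorted(seen))
    while queue:
        user, d = queue.popleft()
        if d == depth:
            continue  # do not expand past N steps
        liked = recommend_artists(favs, [user], min_count=min_count)["artists"]
        for artist in liked:
            if artist not in seen:
                seen.add(artist)
                distance[artist] = d + 1
                queue.append((artist, d + 1))
    return distance


# Made-up sample: A favorited three pieces by B and one by D, and so on.
favs = {"A": ["B", "B", "B", "D"], "B": ["C", "C"], "C": ["E"]}
by_count = recommend_artists(favs, ["A"], min_count=2)
by_percent = recommend_artists(favs, ["A"], min_percent=50)
reachable = artists_within(favs, ["A"], depth=2)
print(by_count, by_percent, reachable)
```

With `depth=2` the sample reproduces the example above: B is 1-away from A and C is 2-away, while E is never reached. The `distance` mapping could be fed into a NetworkX graph for the diagram, though that step is not shown here.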
@kent-lee
Owner

@DonaldTsang

First of all, thank you for suggesting these interesting challenges! I have already looked into them and implemented some code for (1) and (2). However, I found that my program was producing inconsistent results. For example, the regex for certain elements (e.g. the csrf validation token) sometimes fails to find a match.

Upon further investigation, I realized that the problems were caused by the new UI changes that have been rolling out recently. For more details, please have a look at the readme of this repository.

Due to the above issue, I have decided to pause development of this project until the new UI is stable and released publicly. In the meantime, I will look into pixiv_scraper. Thank you again for these challenges; I will certainly return to them when the new UI is available.

@DonaldTsang
Author

@kent-lee thanks for understanding the issue, and I respect that you paused dA until the UI is "safe".
Do you think such a feature would be useful for art discovery? And do you think APIs are useful?

Also, on a side note: Do you have a Discord account? If not, Matrix/IRC handle?

@DonaldTsang
Author

@kent-lee please check the DMs in Discord, there are some hints as to how the UI changes can be fixed.

@kent-lee
Owner

kent-lee commented Sep 3, 2019

@DonaldTsang sorry for the long wait; I have been busy with other stuff lately. Anyway, the program is now updated for the new UI, so it should work like before. I will look into your suggestions soon (if nothing else major happens in real life, that is). Thank you.

@DonaldTsang
Author

Don't worry, hope that you are doing well in school/work. The program is starting to take shape.
Also be aware that the Pixiv scraper's results should be similar to dA for ease of cross-checking.

@DonaldTsang
Author

DonaldTsang commented Dec 13, 2019

Even better, you can now scrape people's "watching" list from the new "Eclipse"/dark-mode UI!
Open https://www.deviantart.com/<username>/about#watching (e.g. https://www.deviantart.com/tonibabelony/about#watching), repeatedly click the <button> nested in a <div> inside a <div> inside <div id="watching">, then collect every <a> inside a <span> nested three <div>s deep within that same <div id="watching">.
From that, we can borrow some tricks from "Twitter Following Graphs" (who follows whom on Twitter) and rank people based on what they liked (link prediction and community detection).
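
A rough sketch of the extraction step only, assuming the page HTML has already been captured after the button has been clicked enough times (the clicking itself needs a browser and is not shown). The class name `WatchingLinks` and the sample markup and usernames are made up for illustration. The sketch is also looser than the path described above: it keeps every <a> anywhere under <div id="watching"> rather than checking the exact <span>/<div> nesting.

```python
from html.parser import HTMLParser


class WatchingLinks(HTMLParser):
    """Collect href values of <a> tags nested inside <div id="watching">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # <div> nesting depth inside the container; 0 = outside
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1          # a nested <div> inside the container
            elif attrs.get("id") == "watching":
                self.depth = 1           # entering the container itself
        elif tag == "a" and self.depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1              # reaches 0 when the container closes


# Hypothetical markup loosely following the structure described above.
sample = """
<div id="watching">
  <div><div><div><span><a href="https://www.deviantart.com/example-artist">example-artist</a></span></div></div></div>
  <div><div><button>Load more</button></div></div>
</div>
<div><a href="https://www.deviantart.com/not-watched">elsewhere</a></div>
"""

parser = WatchingLinks()
parser.feed(sample)
print(parser.links)
```

The resulting list of profile URLs gives the "who watches whom" edges needed for a following graph. The real page structure may differ from this sample, so the selector logic should be checked against live markup.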

2 participants