docs: blog How to scrape Bluesky with Python #2784
base: master
Pretty nice one. I've added a few comments.
Also, please follow this: https://www.notion.so/apify/Apify-tone-and-style-cheat-sheet-0fe6873372e44d88a1bd029d5fd76cea
Basic rules for writing: the A in Actor is always capitalized, and never use title case in titles when writing for Apify, e.g. "Making A Bluesky Actor Using Crawlee" -> "Making a Bluesky Actor using Crawlee".
A few more are attached. Please fix all of them.
Also, please always try to link to the relevant Apify or Crawlee docs, blog posts, and resources wherever possible.
The section for the GitHub star CTA is missing too.

### Project setup

1. If you don't have UV installed yet, follow the [guide](https://docs.astral.sh/uv/getting-started/installation/) or use this command:
Could you add a little about UV?
I mean, as a new reader, I don't know what it is or why we need it.
When first exploring Bluesky, it might be disconcerting to find that the [main page](https://bsky.app/) lacks a search function without authentication. The same applies when trying to access individual [posts](https://bsky.app/profile/github-trending-js.bsky.social/post/3ldbe7b3ict2v).

Even if you navigate directly to the [search page](https://bsky.app/search?q=apify), while you'll see data, you'll encounter a limitation - the site doesn't allow viewing results beyond the first page.

Fortunately, Bluesky provides a well-documented [API](https://docs.bsky.app/docs/get-started) that's accessible to any registered user without additional permissions. This is what we'll use for data collection.
Maybe adding screenshots would explain this better.

### 5. Saving data to files

For saving results, we'll use the `write_to_json` method in Dataset.
Please link to the method's documentation.
---

[Bluesky](https://bsky.app/) is an emerging social network developed by former members of the [Twitter](https://x.com/) development team. The platform has been showing significant growth recently, reaching 132.9 million visits according to [SimilarWeb](https://www.similarweb.com/website/bsky.app/#traffic). Like Twitter, Bluesky generates a vast amount of data that can be used for analysis. In this article, we'll explore how to collect this data using [Crawlee for Python](https://github.com/apify/crawlee-python).
The part where you list all the sections of the blog, from the intro to making an Actor, is missing.

[Image: Users Example]

## Create Apify actor for Bluesky crawler
Please link to what an Actor is, and add a short explanation of why we are making an Actor (because it's the easiest way to deploy software to the cloud, etc.).
Also, loved it :)
View results in the Dataset:

[Image: Dataset Results]
Maybe also show how to publish it on Apify Store.
Co-authored-by: Saurav Jain <[email protected]>
new draft @souravjain540