docs: blog How to scrape Bluesky with Python #2784

Draft: wants to merge 10 commits into base: master

Mantisus (Contributor):

new draft @souravjain540

souravjain540 (Collaborator) left a comment:

Screenshot 2025-02-17 at 11 42 01 PM
Pretty nice one. I added a few comments.

Also please follow this: https://www.notion.so/apify/Apify-tone-and-style-cheat-sheet-0fe6873372e44d88a1bd029d5fd76cea

Basic rules for writing: the A in Actor is always capitalized, and never use title case in titles when writing for Apify, e.g. Making A Bluesky Actor Using Crawlee -> Making a Bluesky Actor using Crawlee.

A few more are attached. Please fix all of them.

Also, please always try to link to the relevant docs, blog posts, and resources by Apify or Crawlee wherever possible.

The section for the GitHub star CTA is missing too.


### Project setup

1. If you don't have UV installed yet, follow the [guide](https://docs.astral.sh/uv/getting-started/installation/) or use this command:

Collaborator:

Could you add a little about UV?

Collaborator:

I mean, as a new reader I don't know what it is or why we need it.

Comment on lines 53 to 57
When first exploring Bluesky, it might be disconcerting to find that the [main page](https://bsky.app/) lacks a search function without authentication. The same applies when trying to access individual [posts](https://bsky.app/profile/github-trending-js.bsky.social/post/3ldbe7b3ict2v).

Even if you navigate directly to the [search page](https://bsky.app/search?q=apify), while you'll see data, you'll encounter a limitation - the site doesn't allow viewing results beyond the first page.

Fortunately, Bluesky provides a well-documented [API](https://docs.bsky.app/docs/get-started) that's accessible to any registered user without additional permissions. This is what we'll use for data collection.
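
As a rough illustration of the API route described above, here is a minimal sketch that only builds the two HTTP requests involved (logging in, then searching posts with a pagination cursor) without sending them. It assumes the account is hosted on `bsky.social` and uses the `com.atproto.server.createSession` and `app.bsky.feed.searchPosts` endpoints. The helper names `build_session_request` and `build_search_request` are hypothetical and are not taken from the article's code.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the account lives on the default bsky.social host.
BASE_URL = "https://bsky.social/xrpc"


def build_session_request(identifier: str, app_password: str) -> Request:
    """Build (but do not send) the login request; its response carries an access token."""
    body = json.dumps({"identifier": identifier, "password": app_password}).encode()
    return Request(
        f"{BASE_URL}/com.atproto.server.createSession",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_request(
    access_jwt: str, query: str, limit: int = 100, cursor: Optional[str] = None
) -> Request:
    """Build (but do not send) an authenticated post search request."""
    params = {"q": query, "limit": limit}
    if cursor is not None:
        # The cursor returned with one page of results requests the next page.
        params["cursor"] = cursor
    return Request(
        f"{BASE_URL}/app.bsky.feed.searchPosts?{urlencode(params)}",
        headers={"Authorization": f"Bearer {access_jwt}"},
        method="GET",
    )
```

Passing the `cursor` value from each response into the next request is how the API gets past the first-page limit of the web search.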

Collaborator:

Maybe adding screenshots would explain this better.


### 5. Saving data to files

For saving results, we'll use the `write_to_json` method in Dataset.

Collaborator:

Link to the method's documentation.

---

[Bluesky](https://bsky.app/) is an emerging social network developed by former members of the [Twitter](https://x.com/) development team. The platform has been showing significant growth recently, reaching 132.9 million visits according to [SimilarWeb](https://www.similarweb.com/website/bsky.app/#traffic). Like Twitter, Bluesky generates a vast amount of data that can be used for analysis. In this article, we'll explore how to collect this data using [Crawlee for Python](https://github.com/apify/crawlee-python).

Collaborator:

The part that lists all the sections of the blog, from the intro to making an Actor, is missing.


![Users Example](./img/users.webp)

## Create Apify actor for Bluesky crawler

Collaborator:

Link to what an Actor is, and add a little explanation of why we are making an Actor: because it's the easiest way to deploy software to the cloud, etc.

Also, loved it :)

View results in the Dataset:

![Dataset Results](img/actor_results.webp)

Collaborator:

Maybe also show how to publish it on Apify Store.
