New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

docs: blog How to scrape Bluesky with Python #2784

Draft

Mantisus wants to merge 10 commits into apify:master from Mantisus:blog-bsky-crawler

Contributor

Mantisus commented Dec 28, 2024

new draft @souravjain540

Mantisus added 5 commits

December 28, 2024 03:58


          docs: blog How to scrape Bluesky with Python

7f92da5


          Merge branch 'master' into blog-bsky-crawler

9290ba4


          update folder

ef96744


          fix image text

00e0a47


          add section for apify actor

8d00ab3

souravjain540 reviewed

View reviewed changes

Collaborator

souravjain540 left a comment

Pretty nice one. Add a few comments.

Also please follow this: https://www.notion.so/apify/Apify-tone-and-style-cheat-sheet-0fe6873372e44d88a1bd029d5fd76cea

Basic rules for writing, A is big in Actor always, never use Title case in titles when writing for Apify i.e, Making An BlueSky Actor Using Crawlee -> Making an BlueSky Actor using Crawlee.

and few more attached. please fix all of them.

And also please try always to link to the relevant docs/blog/resources by Apify or Crawlee wherever possible.

Missing section for GitHub Star CTA too.

website/blog/2025/02-12-scrape-bluesky-using-python/index.md Outdated Show resolved Hide resolved

website/blog/2025/02-12-scrape-bluesky-using-python/index.md Show resolved Hide resolved

website/blog/2025/02-12-scrape-bluesky-using-python/index.md Outdated


		### Project setup

		1. If you don't have UV installed yet, follow the [guide](https://docs.astral.sh/uv/getting-started/installation/) or use this command:

Collaborator

souravjain540 Feb 17, 2025

a little about UV?

Collaborator

souravjain540 Feb 17, 2025

i mean i dont know as a new reader what it is and why we need it

website/blog/2025/02-12-scrape-bluesky-using-python/index.md Outdated Show resolved Hide resolved

website/blog/2025/02-12-scrape-bluesky-using-python/index.md Outdated

Comment on lines 53 to 57

+              When first exploring Bluesky, it might be disconcerting to find that the [main page](https://bsky.app/) lacks a search function without authentication. The same applies when trying to access individual [posts](https://bsky.app/profile/github-trending-js.bsky.social/post/3ldbe7b3ict2v).
+              Even if you navigate directly to the [search page](https://bsky.app/search?q=apify), while you'll see data, you'll encounter a limitation - the site doesn't allow viewing results beyond the first page.
+              Fortunately, Bluesky provides a well-documented [API](https://docs.bsky.app/docs/get-started) that's accessible to any registered user without additional permissions. This is what we'll use for data collection.

Collaborator

souravjain540 Feb 17, 2025

maybe adding screenshots will explain more

website/blog/2025/02-12-scrape-bluesky-using-python/index.md Outdated Show resolved Hide resolved

website/blog/2025/02-12-scrape-bluesky-using-python/index.md Outdated


		### 5. Saving data to files

		For saving results, we'll use the `write_to_json` method in Dataset.

Collaborator

souravjain540 Feb 17, 2025

link to the method doc

website/blog/2025/02-12-scrape-bluesky-using-python/index.md

		---

		[Bluesky](https://bsky.app/) is an emerging social network developed by former members of the [Twitter](https://x.com/) development team. The platform has been showing significant growth recently, reaching 132.9 million visits according to [SimilarWeb](https://www.similarweb.com/website/bsky.app/#traffic). Like Twitter, Bluesky generates a vast amount of data that can be used for analysis. In this article, we'll explore how to collect this data using [Crawlee for Python](https://github.com/apify/crawlee-python).

Collaborator

souravjain540 Feb 17, 2025

missing part where you list all the sections of the blog from intro to making an Actor.

website/blog/2025/02-12-scrape-bluesky-using-python/index.md Outdated


		![Users Example](./img/users.webp)

		## Create Apify actor for Bluesky crawler

Collaborator

souravjain540 Feb 17, 2025

link to what is an Actor, and a little explanation why are we making Actor, because its the easiest way to deploy a software on cloud, etc, etc.

also loved it :)

website/blog/2025/02-12-scrape-bluesky-using-python/index.md

		View results in the Dataset:

		![Dataset Results](img/actor_results.webp)

Collaborator

souravjain540 Feb 17, 2025

maybe also show how to publish it on Apify Store

Mantisus and others added 5 commits

February 17, 2025 20:18


          Update website/blog/2025/02-12-scrape-bluesky-using-python/index.md

ea3fb7b

Co-authored-by: Saurav Jain <[email protected]>


          Update website/blog/2025/02-12-scrape-bluesky-using-python/index.md

Co-authored-by: Saurav Jain <[email protected]>


          Update website/blog/2025/02-12-scrape-bluesky-using-python/index.md

cb73315

Co-authored-by: Saurav Jain <[email protected]>


          Update website/blog/2025/02-12-scrape-bluesky-using-python/index.md

e8a2fc9

Co-authored-by: Saurav Jain <[email protected]>


          update bluesky article

f7a4fac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet