Hello everyone!
I'm a Java developer who has recently written a few web crawlers in Java to scrape news articles and official announcement documents. To be frank, I'm still a novice at web crawling: at the moment I can only scrape basic information and files successfully. However, when I run a crawler several times in succession, it hits temporary connection failures and cannot continue scraping.
I've tried using some low-cost proxy pools, but the results have been less than satisfactory.
Now, following a recommendation, I'd like to try crawl4AI to tackle a task in my development workflow: scraping and downloading annual reports from the official Bursa Malaysia website (https://www.bursamalaysia.com/) for analysis. However, I have no idea where to start.
If implementing this in Java, here are the steps I would follow:
Step 1: Call the list retrieval API endpoint
https://www.bursamalaysia.com/api/v1/announcements/search?ann_type=company&company=0129&keyword=&dt_ht=&dt_lt=&cat=AR%2CARCO&sub_type=&mkt=&sec=&subsec=&per_page=20&page=1&_=1767088639073
The 13-digit number at the very end of the URL (the value of the trailing "_" parameter) is the current timestamp in milliseconds.
This list API returns the following response:
{
  "recordsTotal": 22,
  "recordsFiltered": 22,
  "category_message": "",
  "data": [
    [
      1,
      "\u003cdiv class='d-lg-none'\u003e31 Oct\u003cbr/\u003e2025\u003c/div\u003e\u003cdiv class='d-lg-inline-block d-none'\u003e31 Oct 2025\u003c/div\u003e",
      "\u003ca href='/trade/trading_resources/listing_directory/company-profile?stock_code=0129' target=_blank\u003eSILVER RIDGE HOLDINGS BHD\u003c/a\u003e",
      "\u003ca href='/market_information/announcements/company_announcement/announcement_details?ann_id=3604910' target=_blank\u003eAnnual Report \u0026 CG Report - 2025\u003c/a\u003e"
    ],
    [
      2,
      "\u003cdiv class='d-lg-none'\u003e30 Oct\u003cbr/\u003e2024\u003c/div\u003e\u003cdiv class='d-lg-inline-block d-none'\u003e30 Oct 2024\u003c/div\u003e",
      "\u003ca href='/trade/trading_resources/listing_directory/company-profile?stock_code=0129' target=_blank\u003eSILVER RIDGE HOLDINGS BHD\u003c/a\u003e",
      "\u003ca href='/market_information/announcements/company_announcement/announcement_details?ann_id=3496498' target=_blank\u003eAnnual Report \u0026 CG Report - 2024\u003c/a\u003e"
    ]...
  ]
}
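To make the step concrete, here is a minimal sketch of Step 1 in plain Java (java.net.http). It assumes the API answers a bare GET without extra headers, and it pulls the ann_id values out of the embedded HTML with a simple regex rather than a full JSON parser:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnouncementList {
    public static void main(String[] args) throws Exception {
        // Step 1: call the announcement list API. The trailing "_" value is the
        // current timestamp in milliseconds (the 13-digit number in the example URL).
        String url = "https://www.bursamalaysia.com/api/v1/announcements/search"
                + "?ann_type=company&company=0129&keyword=&dt_ht=&dt_lt="
                + "&cat=AR%2CARCO&sub_type=&mkt=&sec=&subsec=&per_page=20&page=1"
                + "&_=" + System.currentTimeMillis();

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

        // The "data" rows embed links like ...announcement_details?ann_id=3604910,
        // so a simple regex is enough to collect every ann_id on the page.
        List<String> annIds = new ArrayList<>();
        Matcher m = Pattern.compile("ann_id=(\\d+)").matcher(body);
        while (m.find()) {
            annIds.add(m.group(1));
        }
        System.out.println(annIds); // e.g. [3604910, 3496498, ...]
    }
}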
Step 2: Call the detail retrieval API endpoint
https://disclosure.bursamalaysia.com/FileAccess/viewHtml?e=3604910
Here, the parameter 3604910 corresponds to the ann_id=3604910 returned by the list API in Step 1.
This endpoint returns an HTML page whose "Attachments" section embeds the download link for the report; the link path found there is what Step 3 uses.
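A similar sketch for Step 2. The regex that extracts the download path from the Attachments section is an assumption about the page markup, based only on the link shape shown in Step 3:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnouncementDetail {
    // Step 2: fetch the detail page for one ann_id and pull out the attachment path
    // under "Attachments", e.g. /FileAccess/apbursaweb/download?id=...&name=EA_DS_ATTACHMENTS
    static String findDownloadPath(String annId) throws Exception {
        String url = "https://disclosure.bursamalaysia.com/FileAccess/viewHtml?e=" + annId;
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

        // Assumed link shape; the raw HTML may encode "&" as "&amp;".
        Matcher m = Pattern
                .compile("/FileAccess/apbursaweb/download\\?id=\\d+&(?:amp;)?name=EA_DS_ATTACHMENTS")
                .matcher(html);
        return m.find() ? m.group().replace("&amp;", "&") : null;
    }
}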
Step 3: Call the download API endpoint
https://disclosure.bursamalaysia.com/FileAccess/apbursaweb/download?id=247313&name=EA_DS_ATTACHMENTS
The path for this download endpoint (/FileAccess/apbursaweb/download?id=247313&name=EA_DS_ATTACHMENTS) is taken directly from the link returned in Step 2.
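Finally, a sketch of Step 3: saving the attachment to disk with the same HTTP client. It assumes the path extracted in Step 2 is passed in, and the target filename is purely illustrative:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class AnnouncementDownload {
    // Step 3: download the attachment found in Step 2 and save it to disk.
    static Path download(String downloadPath, Path target) throws Exception {
        String url = "https://disclosure.bursamalaysia.com" + downloadPath;
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // in case the endpoint redirects
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<Path> response = client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // The output filename is just an illustrative choice.
        download("/FileAccess/apbursaweb/download?id=247313&name=EA_DS_ATTACHMENTS",
                 Path.of("annual_report_2025.pdf"));
    }
}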
These three steps enable the querying and downloading of the corresponding annual reports. I'm wondering if crawl4AI offers a simpler, more streamlined solution for this task. Thank you very much!