@Luis-manzur
Contributor

Fixes issue #1714.

@Luis-manzur Luis-manzur requested a review from grossir December 18, 2025 21:02
@Luis-manzur Luis-manzur linked an issue Dec 18, 2025 that may be closed by this pull request
@Luis-manzur Luis-manzur moved this to PRs to Review in Case Law Sprint Dec 18, 2025
Contributor

@grossir grossir left a comment

Looks like the court filter is not working.

@grossir grossir assigned Luis-manzur and unassigned grossir Dec 18, 2025
Luis-manzur and others added 6 commits December 19, 2025 08:46
…ted-url' into 1714-masssuperct-changed-or-deleted-url

# Conflicts:
#	tests/examples/opinions/united_states/massappct_example.compare.json
#	tests/examples/opinions/united_states/masssuperct_example.compare.json
…ted-url' into 1714-masssuperct-changed-or-deleted-url
@Luis-manzur Luis-manzur assigned grossir and unassigned Luis-manzur Dec 19, 2025
Contributor

@grossir grossir left a comment

  1. Clean up the example file; it has data it shouldn't have.

  2. Have you tried running the scraper? I am getting a 403, both locally and on the server. This seems to need further research, probably some intermediate requests to get cookies.

@grossir grossir assigned Luis-manzur and unassigned grossir Dec 19, 2025
@Luis-manzur
Contributor Author

Luis-manzur commented Dec 19, 2025

> Have you tried running the scraper? I am getting 403, both locally and in the server. Seems like this will need further research, probably some intermediate requests to get cookies

Yes, it is responding OK locally.

I don't see any 403 messages in Sentry.

@Luis-manzur Luis-manzur assigned grossir and unassigned Luis-manzur Dec 19, 2025
@Luis-manzur Luis-manzur requested a review from grossir December 19, 2025 20:47
@grossir
Contributor

grossir commented Dec 22, 2025

Not sure why it doesn't show up on Sentry (even when running in a ./manage.py shell), but this happens on the server, and it happens locally for me too, with and without VPN. It's a Cloudflare block, by the way.

Same result with python sample_caller.py -c juriscraper.opinions.united_states.state.mass -vvv --save-responses

Is it not happening to you?

import requests

# Court filter only
requests.get("https://www.socialaw.com/services/slip-opinions/", params={"Court": "Supreme Judicial Court"})
# Court and month filters
requests.get("https://www.socialaw.com/services/slip-opinions/", params={"Court": "Supreme Judicial Court", "Month": "July 2025"})
# Court filter with a custom User-Agent header
requests.get("https://www.socialaw.com/services/slip-opinions/", params={"Court": "Supreme Judicial Court"}, headers={"User-Agent": "Juriscraper"})

Response body (Cloudflare challenge page, abridged; elided parts are marked [...]):

<!DOCTYPE html><html lang="en-US"><head><title>Just a moment...</title>[...]</head><body><div class="main-wrapper" role="main"><div class="main-content"><noscript><div class="h2"><span id="challenge-error-text">Enable JavaScript and cookies to continue</span></div></noscript></div></div><script>(function(){window._cf_chl_opt = {cvId: '3',cZone: 'www.socialaw.com',cType: 'interactive',cRay: '9b213906b862ff01',[...]};[...]}());</script></body></html>
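To tell this kind of block apart from an ordinary 403, a scraper could look for the markers visible in the response above. The following is a minimal sketch, not an official Cloudflare detection method: the function name is hypothetical, and the markers (the "Just a moment..." title and the `_cf_chl_opt` script variable) are taken only from this one response, so they may change.

```python
def looks_like_cloudflare_challenge(status_code: int, body: str) -> bool:
    """Heuristically flag a Cloudflare challenge page.

    Assumption: the challenge is served with a 403 (as observed in this
    thread) or a 503, and the body contains one of the markers seen in
    the response pasted above.
    """
    if status_code not in (403, 503):
        return False
    markers = ("Just a moment...", "_cf_chl_opt")
    return any(marker in body for marker in markers)
```

With `requests`, this would be called as `looks_like_cloudflare_challenge(r.status_code, r.text)` on a response `r`, which may make it easier to log a distinct error for a bot-protection block.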

@grossir grossir removed their assignment Dec 22, 2025
@Luis-manzur
Contributor Author

Luis-manzur commented Dec 27, 2025

Yes, now it's blocking me as well (with or without VPN).

The website is using Cloudflare bot protection.

@grossir grossir moved this from PRs to Review to Blocked in Case Law Sprint Dec 29, 2025
@grossir
Contributor

grossir commented Dec 29, 2025

@Luis-manzur Can you try using cookies / replicating headers or something similar and see if that fixes the issue?
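One way to try the cookies / replicated headers idea is to make an intermediate request with a shared session before the filtered request. This is a rough sketch under assumptions: the function name and the header values are illustrative, the `session` argument is expected to behave like `requests.Session`, and an interactive JavaScript challenge like the one above may well not be satisfied by cookies and headers alone.

```python
from typing import Any, Optional

BASE_URL = "https://www.socialaw.com/services/slip-opinions/"

# Illustrative header values (assumptions); what the site expects is unknown
REPLICATED_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_with_warmup(session: Any, court: str, month: Optional[str] = None) -> Any:
    """Request the landing page first so the session can collect cookies,
    then request the filtered listing with the same session."""
    # Intermediate request: a requests.Session stores any cookies it receives
    session.get(BASE_URL, headers=REPLICATED_HEADERS, timeout=60)

    params = {"Court": court}
    if month:
        params["Month"] = month
    # Filtered request: the session re-sends the stored cookies
    return session.get(
        BASE_URL, params=params, headers=REPLICATED_HEADERS, timeout=60
    )
```

For example, `fetch_with_warmup(requests.Session(), "Supreme Judicial Court", "July 2025")` would issue both requests; checking the status code of the returned response shows whether the block persists.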

@Luis-manzur
Contributor Author

I found a request library, curl-cffi, that may help us bypass Cloudflare bot protection. It can be used to impersonate a web browser.

I tested it with mass: it responds with 200, with no noticeable difference in running time.

Solution: override the _request_url_get method.

from curl_cffi import requests as curl_requests

    def _request_url_get(self, url):
        """Override to use curl_cffi to bypass Cloudflare protection

        Execute GET request using curl_cffi with browser impersonation
        to bypass Cloudflare's bot detection.
        """
        self.request["url"] = url

        # Use curl_cffi to impersonate a real Chrome browser
        self.request["response"] = curl_requests.get(
            url,
            impersonate="chrome",
            timeout=60,
            **self.request["parameters"],
        )

        if self.save_response:
            self.save_response(self)

@grossir
Contributor

grossir commented Dec 29, 2025

If this depends on curl being available, we don't have that in the Docker deployment. Try it in your local env. We would have to add that as a dependency.

The solution looks promising, but I am not sure we should "impersonate" something we are not, given our scraping policy. I know we do change some headers sometimes, but this is a more complicated step. We should ask in the sprint channel.
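On the dependency question, a quick check that can be run in any environment (local or inside the Docker image) is whether the `curl_cffi` Python package is importable at all. A minimal sketch, with a hypothetical helper name; it only checks the Python package and does not settle whether a system curl is also required:

```python
import importlib.util


def curl_cffi_installed() -> bool:
    """Return True when the curl_cffi package can be found in this environment."""
    # find_spec returns None for a top-level package that is not installed
    return importlib.util.find_spec("curl_cffi") is not None


if __name__ == "__main__":
    print("curl_cffi installed:", curl_cffi_installed())
```

A scraper could use such a check to fail with a clear message when the package is missing, rather than with an ImportError at module load.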

Labels

None yet

Projects

Status: Blocked

Development

Successfully merging this pull request may close these issues.

masssuperct changed or deleted url

3 participants