This repository contains the scripts that scrape Moldovan court cases and make the data available through an API. Scraping is a very heavy process that takes days to complete, so don't run the scraper regularly; use the API to access the court case data instead.
The following system diagram shows the overall flow: the Scrapy spider pulls information from the court sites, saves it to a MySQL database, and stores the downloaded files separately. The API reads that information from the database, and the OCDS portal uses the API to show court information on company pages.
Install the scraper dependencies:
pip install scrapy
pip install peewee
pip install MySQL-python
- Clean up the data before a fresh scrape:
mysql -uroot -p -e "DROP DATABASE IF EXISTS moldova_courtcases; CREATE DATABASE moldova_courtcases CHARACTER SET utf8 COLLATE utf8_general_ci;"
- Copy dbconfig.py.bak to dbconfig.py and update the database information.
- Run scrapy list; it should show the spider name.
- Run scrapy crawl Cases to start crawling; it creates the HTML files and saves the data to the database.
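- To pause and resume a long crawl, follow https://doc.scrapy.org/en/latest/topics/jobs.html and start the spider with a job directory:
scrapy crawl Cases -s JOBDIR=crawls/cases-1
- Press Ctrl+C once to stop the job and wait for it to shut down cleanly (a second Ctrl+C forces an unclean shutdown), then run the same command again to resume.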
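The crawl saves each scraped case to the database, and in Scrapy that step usually lives in an item pipeline. The sketch below is hypothetical, not this project's code: it uses the standard-library sqlite3 module as a stand-in for MySQL/peewee, and the class name CasesSQLitePipeline, the table name cases, and its columns (borrowed from the JSON fields the API returns) are all assumptions. It commits after every item, which fits a days-long crawl that may be stopped with Ctrl+C.

```python
import sqlite3


class CasesSQLitePipeline:
    """Hypothetical Scrapy item pipeline that stores court cases in SQLite.

    Scrapy calls open_spider/close_spider once per crawl and process_item
    for every scraped item. The class only relies on that interface, so it
    needs no Scrapy import. The real project uses MySQL through peewee.
    """

    # Assumed columns, mirroring the JSON fields the API returns.
    FIELDS = ("caseNumber", "caseType", "court", "deliveryDate", "theme", "title")

    def __init__(self, db_path="courtcases.sqlite3"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cases ("
            "caseNumber TEXT PRIMARY KEY, caseType TEXT, court TEXT, "
            "deliveryDate TEXT, theme TEXT, title TEXT)"
        )

    def process_item(self, item, spider):
        # Missing fields are stored as empty strings (assumed convention).
        values = tuple(item.get(field, "") for field in self.FIELDS)
        # INSERT OR REPLACE keeps a re-scraped case from creating a duplicate row.
        self.conn.execute("INSERT OR REPLACE INTO cases VALUES (?, ?, ?, ?, ?, ?)", values)
        # Commit per item so a crawl stopped with Ctrl+C keeps what it already saved.
        self.conn.commit()
        return item  # pipelines must return the item (or raise DropItem)

    def close_spider(self, spider):
        self.conn.close()
```

To try it, you would register it in the Scrapy settings under ITEM_PIPELINES, for example {"moldovacourts.pipelines.CasesSQLitePipeline": 300}; that module path is hypothetical.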
- To try out selectors interactively, run:
scrapy shell https://cac.instante.justice.md/ro/hot
- In the shell you can run Scrapy code and see the results step by step.
- Run the following one line at a time:
from scrapy.selector import Selector
sel = Selector(response)
decisions = sel.xpath('//table/tbody/tr')
courtName = sel.xpath('//h2[contains(@class,"site-name")]/a/text()').extract()
courtName
- You will see the court name printed in the shell.
- Once the code works, copy it into the spider's source file.
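If you want to re-check selector logic against a saved page without hitting the court site again, a rough offline stand-in is possible with the standard library. The sketch below swaps Scrapy's XPath for html.parser and only approximates the two selectors above: the text of the link inside an h2 whose class contains site-name, and the cells of each row inside a tbody. The class name CourtPageParser is hypothetical, and the real page markup may differ from what it expects.

```python
from html.parser import HTMLParser


class CourtPageParser(HTMLParser):
    """Approximates the shell selectors with the stdlib HTML parser.

    court_names: text of <a> inside <h2> whose class contains "site-name"
    rows: one list of cell texts per <tr> inside a <tbody>
    """

    def __init__(self):
        super().__init__()
        self.court_names = []
        self.rows = []
        self._in_site_name_h2 = False
        self._in_link = False
        self._tbody_depth = 0
        self._current_row = None
        self._current_cell = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h2" and "site-name" in (attrs.get("class") or ""):
            self._in_site_name_h2 = True  # like contains(@class,"site-name")
        elif tag == "a" and self._in_site_name_h2:
            self._in_link = True
        elif tag == "tbody":
            self._tbody_depth += 1
        elif tag == "tr" and self._tbody_depth:
            self._current_row = []  # like //table/tbody/tr (header rows skipped)
        elif tag in ("td", "th") and self._current_row is not None:
            self._current_cell = []

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_site_name_h2 = False
        elif tag == "a":
            self._in_link = False
        elif tag == "tbody" and self._tbody_depth:
            self._tbody_depth -= 1
        elif tag in ("td", "th") and self._current_cell is not None:
            self._current_row.append("".join(self._current_cell).strip())
            self._current_cell = None
        elif tag == "tr" and self._current_row is not None:
            self.rows.append(self._current_row)
            self._current_row = None

    def handle_data(self, data):
        if self._in_link:
            self.court_names.append(data)
        if self._current_cell is not None:
            self._current_cell.append(data)
```

Feed it the saved HTML with parser.feed(html), then inspect parser.court_names and parser.rows.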
Install the API dependencies:
pip install flask
pip install gunicorn
Run python api.py to serve the API on port 8090.
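api.py exposes /courtcasescount and /courtcases, both taking the company name in q. How it matches names against the database is not documented here, so the sketch below assumes a substring match on the case title. It uses sqlite3 in place of MySQL and a hypothetical cases table whose columns mirror the JSON fields the API returns; count_cases and list_cases are illustrative names, and the Flask routing is left out.

```python
import sqlite3

# Hypothetical columns, mirroring the JSON fields returned by /courtcases.
FIELDS = ("caseNumber", "caseType", "court", "deliveryDate", "theme", "title")


def _like_pattern(name):
    # Escape LIKE wildcards so a name containing % or _ is matched literally.
    escaped = name.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def count_cases(conn, name):
    """Logic behind /courtcasescount?q=name (assumed: substring match on title).

    SQLite's LIKE is case-insensitive for ASCII letters only.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM cases WHERE title LIKE ? ESCAPE '\\'",
        (_like_pattern(name),),
    ).fetchone()
    return row[0]


def list_cases(conn, name):
    """Logic behind /courtcases?q=name: matching cases as JSON-ready dicts."""
    cursor = conn.execute(
        f"SELECT {', '.join(FIELDS)} FROM cases WHERE title LIKE ? ESCAPE '\\' "
        "ORDER BY caseNumber",
        (_like_pattern(name),),
    )
    return [dict(zip(FIELDS, row)) for row in cursor]
```

Passing the name as a query parameter (never formatting it into the SQL) keeps user input from being interpreted as SQL.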
- To run the API as a service, copy moldovacourts_api.service.bak to moldovacourts_api.service and update the project directory information.
- Create a symbolic link so systemd can find the unit:
ln -s /home/moldova-ocds/pydev/src/moldovacourts/moldovacourts_api.service /etc/systemd/system/moldovacourts_api.service
- Start the API's Gunicorn server:
systemctl start moldovacourts_api.service
- domain:8090/courtcasescount?q=name returns the number of cases for the given company name.
- domain:8090/courtcases?q=name returns the list of cases for the given company name as JSON, for example:
[
{
"caseNumber": "26-2-587-02022017",
"caseType": "Civil",
"court": "Judec\u0103toria Drochia",
"deliveryDate": "",
"theme": "Ac\u021biuni privind \u00eencasarea datoriei",
"title": "AE\u00ce Sofmicrocredit vs R\u0103di\u021b\u0103 Igor Profire, \u021aurlea Violeta, Banu Sergiu - \u00eencasarea datoriei"
},
{
"caseNumber": "20-2c-5683-27022017",
"caseType": "Civil",
"court": "Judec\u0103toria Chi\u0219in\u0103u",
"deliveryDate": "",
"theme": "Litigii privind executarea obligatiilor",
"title": "Casa Nationala de Asigurari So vs SRL Ladita Fermecata"
},
...
]
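A minimal client sketch for the two endpoints, using only the standard library. The base URL http://localhost:8090 is an assumption (replace it with your domain). The q value is URL-encoded because company names contain spaces and diacritics. summarize_by_court is a hypothetical helper showing how a consumer such as the OCDS portal might group the returned cases; the sketch also assumes the count endpoint's body parses as JSON (a bare number does).

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8090"  # assumption: replace with your API domain


def build_url(endpoint, company_name, base_url=BASE_URL):
    """Build e.g. <base>/courtcases?q=... with the company name URL-encoded."""
    return f"{base_url}/{endpoint}?{urlencode({'q': company_name})}"


def fetch_json(url, timeout=30):
    """GET a URL and decode its JSON body (\\uXXXX escapes become text)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_by_court(cases):
    """Hypothetical helper: number of cases per court, most frequent first."""
    return Counter(case["court"] for case in cases).most_common()
```

For example, summarize_by_court(fetch_json(build_url("courtcases", "SRL Ladita Fermecata"))) fetches the case list shown above and counts the cases per court.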
