KoreaNewsCrawler

This crawler collects news articles from media organizations that are posted on the NAVER portal.
Crawlable article categories are politics, economy, life and culture, world, IT/science, and society. Sports articles cover Korean baseball, Korean soccer, world baseball, world soccer, basketball, volleyball, golf, general sports, and e-sports.

How to install

pip install KoreaNewsCrawler

Method

set_category(category_name)

This method sets the categories you want to collect.
You can pass more than one category.
category_name: one or more of 'politics', 'economy', 'society', 'living_culture', 'IT_science', 'world', and 'opinion'
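
Because category names are passed as plain strings, a misspelled name is easy to miss. The following is a minimal sketch of checking names against the list above before handing them to set_category. `VALID_CATEGORIES` and `check_categories` are hypothetical helpers written for this illustration; they are not part of KoreaNewsCrawler.

```python
# Hypothetical helper, not part of KoreaNewsCrawler: checks category names
# against the list documented above before they are passed to set_category().
VALID_CATEGORIES = (
    "politics", "economy", "society", "living_culture",
    "IT_science", "world", "opinion",
)


def check_categories(*category_names):
    """Return the names unchanged, or raise ValueError listing unknown ones."""
    unknown = [name for name in category_names if name not in VALID_CATEGORIES]
    if unknown:
        raise ValueError(f"Unknown categories: {', '.join(unknown)}")
    return category_names


if __name__ == "__main__":
    print(check_categories("politics", "IT_science", "economy"))
```

The returned tuple could then be unpacked into the call, for example `set_category(*check_categories("politics", "economy"))`.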

set_date_range(startyear, startmonth, endyear, endmonth)

This method sets the time period of the news you want to collect. By default, it collects data from startmonth of startyear through endmonth of endyear. The examples below pass the range as two date strings, such as "2017-01" and "2018-04-20".
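
To make the month-based range concrete, here is a small sketch that lists every (year, month) pair such a range would cover. It assumes both the start month and the end month are included. `months_in_range` is a hypothetical function for illustration only and does not show how KoreaNewsCrawler handles dates internally.

```python
def months_in_range(startyear, startmonth, endyear, endmonth):
    """List (year, month) pairs from the start month to the end month.

    Assumption for this sketch: both endpoints are included.
    """
    if not (1 <= startmonth <= 12 and 1 <= endmonth <= 12):
        raise ValueError("months must be between 1 and 12")
    if (startyear, startmonth) > (endyear, endmonth):
        raise ValueError("start must not be after end")

    months = []
    year, month = startyear, startmonth
    while (year, month) <= (endyear, endmonth):
        months.append((year, month))
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return months


if __name__ == "__main__":
    for year, month in months_in_range(2017, 11, 2018, 2):
        print(f"{year}-{month:02d}")
```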

start()

This method starts the crawl.

Article News Crawler Example

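from korea_news_crawler.articlecrawler import ArticleCrawler

crawler = ArticleCrawler()
# Collect three categories; names must match those accepted by set_category
crawler.set_category("politics", "IT_science", "economy")
# Date range: January 2017 through April 20, 2018
crawler.set_date_range("2017-01", "2018-04-20")
crawler.start()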

This crawls news in the politics, IT/science, and economy categories from January 2017 to April 20, 2018, in parallel using multiple processes.

Sports News Crawler Example

The methods are used in the same way as those of ArticleCrawler.

from korea_news_crawler.sportcrawler import SportCrawler

spt_crawler = SportCrawler()
# Collect two sports categories
spt_crawler.set_category("korea baseball", "korea soccer")
# Date range: January 2017 through April 20, 2018
spt_crawler.set_date_range("2017-01", "2018-04-20")
spt_crawler.start()

This crawls Korean baseball and Korean soccer news from January 2017 to April 20, 2018, in parallel using multiple processes.

Results

Column A: Article date and time
Column B: Article category
Column C: Media company
Column D: Article title
Column E: Article body
Column F: Article address
All collected data is saved in CSV files.
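
The column layout above can be read back with Python's standard csv module. The sketch below maps columns A to F onto labels and assumes the file has no header row and exactly six columns per row. The labels in `FIELDS`, the `read_articles` function, and the sample row are all made up for this illustration and are not defined by KoreaNewsCrawler.

```python
import csv
import io

# Column order as listed above (A to F). The field names are labels chosen
# for this sketch, not identifiers defined by KoreaNewsCrawler.
FIELDS = ("datetime", "category", "press", "title", "body", "url")


def read_articles(csv_file):
    """Yield one dict per CSV row, keyed by the FIELDS labels.

    Assumes each row has the six columns in the documented order and that
    the file has no header row.
    """
    for row in csv.reader(csv_file):
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed rows
        yield dict(zip(FIELDS, row))


if __name__ == "__main__":
    # Made-up sample row standing in for a real output file.
    sample = io.StringIO(
        '2018-04-20 09:00,politics,Example Press,'
        '"Title, with comma",Body text,https://example.com/a\n'
    )
    for article in read_articles(sample):
        print(article["title"], article["url"])
```

For a real output file, pass a file object opened with `newline=""` and the encoding the file was written in; the file name and encoding depend on the crawler's output and are not specified here.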
