Skip to content

egcodes/aristotle

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Aristotle is a highly customizable tool that collects links from sites.

Aristotle

With the properties in the config files, it scans all the defined sites and saves the metadata [title, description, imageLink, publishDate] of the site in the database.

Usage

config/properties.yaml

These settings are basically:

  1. database: Currently, databases in this list ((https://docs.sqlalchemy.org/en/13/dialects/)) are supported. The settings of the DB where the links will be stored are entered here. For the name property, a database must be created in the DB and its name must be entered in this parameter.
  2. locale: According to the language of the sites to be fetched, the feature to be localized must be entered here. For example, in English, en_EN should be entered.
  3. request: General features of the request.
  4. parser: In the parsing phase, if desired, title and description strings can be trimmed as much as the parameter given
database:
  dialect: mysql+pymysql
  url: localhost
  port: 3306
  name: aristotle
  userName: root
  password: root

locale: en_EN

request:
  timeout: 3
  userAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML) Chrome/23.0.1271.97 Safari/537.11

parser:
  titleCharLimit: 100
  descriptionCharLimit: 300

config/sources-{locale}.yaml

article:
  - domain: cnn.com
    active: true
    link: https://edition.cnn.com/
    filterForLink:
      mandatoryWords: ["/politics/"]
      permissibleWords: []
      impermissibleWords: []
    tagForMetadata:
      title:
      description:
      image:
      publishDate:
      publishDateFormat: "%Y-%m-%d"

technology:
  - domain: mashable.com
    active: true
    link: https://mashable.com
    filterForLink:
      mandatoryWords: ["-"]
      permissibleWords: ['/article/']
      impermissibleWords: []
    tagForMetadata:
      title:
      description:
      image:
      publishDate: datetime
      publishDateFormat: "%d.%m.%Y"

Development

If you'd like to contribute the project, feel free to clone a development version of this repository locally:

git clone https://github.com/egcodes/aristotle.git

Once you have a copy of the source, you can embed it in your Python package, or install it into your site-packages easily:

$ pip3 install -r requirements.txt
$ python3 setup.py install

Requirements

  • Python 3.x
    • beautifulsoup4>=4.9.1
    • requests>=2.24.0
    • PyYAML>=5.3.1
    • SQLAlchemy>=1.3.18

For database dialect, you must install the special dialect package for the database you use. For example, if you are using MySQL, the PyMySQL package must be installed.

About

highly customizable news collector

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages