This is a Ruby implementation of the crawley framework.

Coming Soon Features!

High-speed web crawler built on EventMachine.
Supports database engines such as PostgreSQL, MySQL, Oracle, and SQLite.
Command-line tools.
Data extraction using XPath.
Cookie handlers.
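
Since XPath extraction is listed as an upcoming feature, here is a minimal sketch of what pulling data out of a page with XPath can look like in Ruby. It uses REXML from the Ruby distribution, not crawley itself. The `extract_texts` helper and the sample markup are made up for illustration, and REXML only accepts well-formed XML/XHTML, so real-world HTML would likely need a more forgiving parser.

```ruby
require 'rexml/document'

# Hypothetical helper, not part of crawley: returns the stripped text of
# every element matched by an XPath expression in a well-formed XML string.
def extract_texts(markup, xpath)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, xpath).map { |node| node.text.to_s.strip }
end

# Made-up sample markup, for illustration only.
sample = <<~XML
  <table>
    <tr><td class="package">alpha</td><td class="description">First package</td></tr>
    <tr><td class="package">beta</td><td class="description">Second package</td></tr>
  </table>
XML

puts extract_texts(sample, '//td[@class="package"]')
```

The XPath expression selects only the cells whose `class` attribute is `package`, so the description cells are skipped.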

Write your Models

""" models.rb """

require 'rubygems'
require 'data_mapper'
require 'dm-migrations'

class Package
    include DataMapper::Resource

    # DataMapper requires every model to declare a key
    property :id,           Serial
    property :updated,      String
    property :package,      String
    property :description,  String
end

Write your Scrapers

""" crawlers.rb """

require 'crawlers'
require 'scrapers'

class PypiScraper < BaseScraper

    @@matching_urls = ["%pypi.python.org/pypi%"]

    def scrape(response)
        super(response)
    end
end

class PypiCrawler < BaseCrawler

    # add your starting URLs here
    @@start_urls = ["http://pypi.python.org/pypi"]

    # add your scraper classes here
    @@scrapers = [PypiScraper.new]

    # specify your maximum crawling depth level
    @@max_depth = 1

end
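
To make the `@@matching_urls` and `@@max_depth` settings above more concrete, here is a rough, self-contained sketch of how such settings could be interpreted. It is not crawley's implementation. It assumes that `%` in a URL pattern is a wildcard (as in SQL LIKE), that start URLs sit at depth 0, and that links are followed only from pages below the maximum depth. The helpers `like_to_regexp`, `url_matches?`, and `crawl`, and the link table, are hypothetical.

```ruby
require 'set'

# Hypothetical helper, not crawley's API: treats "%" as a wildcard,
# as SQL LIKE does, and every other character literally.
def like_to_regexp(pattern)
  parts = pattern.split('%', -1).map { |part| Regexp.escape(part) }
  Regexp.new("\\A#{parts.join('.*')}\\z")
end

# True when the URL matches at least one of the patterns.
def url_matches?(url, patterns)
  patterns.any? { |pattern| like_to_regexp(pattern).match?(url) }
end

# Breadth-first walk over a link table (a Hash standing in for the network).
# Start URLs are at depth 0; links are followed only from pages whose
# depth is below max_depth.
def crawl(start_urls, max_depth, links)
  visited = Set.new
  queue = start_urls.map { |url| [url, 0] }
  until queue.empty?
    url, depth = queue.shift
    next if visited.include?(url)
    visited << url
    next if depth >= max_depth
    links.fetch(url, []).each { |link| queue << [link, depth + 1] }
  end
  visited.to_a
end

# Made-up link table, for illustration only.
links = {
  "http://pypi.python.org/pypi" => ["http://pypi.python.org/pypi/a", "http://example.com/"],
  "http://pypi.python.org/pypi/a" => ["http://pypi.python.org/pypi/a/1.0"]
}

pages = crawl(["http://pypi.python.org/pypi"], 1, links)
to_scrape = pages.select { |url| url_matches?(url, ["%pypi.python.org/pypi%"]) }
puts to_scrape
```

With a maximum depth of 1, the crawl visits the start page and the two pages it links to, but never reaches the `a/1.0` page; the pattern then keeps only the two PyPI URLs for scraping.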
