More details on used datasets and results #3

@nikonyrh


Is there a more detailed description of the datasets, hardware, queries and results than what is given at http://www.revolutionanalytics.com/sites/default/files/revolution-analytics-sas-benchmark-whitepaper-mar2014.pdf and http://www.slideshare.net/RevolutionAnalytics/is-revolution-r-enterprise-faster-than-sas-benchmarking-results-revealed-34739767?

I'd like to reproduce those different datasets (5 million rows x 591 cols, 50 million rows x 21 cols etc.) and all queries which were executed on them. For each experiment it would be nice to see:

  • Configuration options for the generation script
  • SHA1 hashes of CSV files (to ensure identical data)
  • Description of queries (in some explicit human-readable form)
  • Results of queries (to ensure identical queries)
  • Running times of queries (not just totals)
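
To make the hash request concrete, here is a minimal Python sketch of how the SHA1 of a generated CSV file could be computed. The function name `sha1_of_file` and the chunk size are my own assumptions for illustration and do not come from the benchmark scripts.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of the file at `path`.

    The file is read in fixed-size chunks so that multi-gigabyte CSV
    files do not have to fit in memory. `chunk_size` is an arbitrary
    choice (1 MiB), not a value taken from the benchmark.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Publishing the resulting hex digest next to each dataset's generation options would let anyone confirm that their regenerated file is byte-for-byte identical.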

Actually, for now I am only interested in the first three queries (Descriptive statistics, Median and deciles, Frequency distribution), since those are implemented in various database solutions as well.
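
As an illustration of what I mean by those three queries, here is a rough Python sketch on a single numeric column, using only the standard library. This is my own guess at the intended semantics, not the benchmark's actual definitions. All function names are hypothetical, and the decile method in particular is an assumption.

```python
import statistics
from collections import Counter


def descriptive_statistics(values):
    """Basic summary of one numeric column (assumed set of statistics)."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def median_and_deciles(values):
    """Return the median and the 9 cut points that split the data into deciles.

    statistics.quantiles defaults to the "exclusive" method; other tools
    may interpolate differently, so results can differ slightly.
    """
    return statistics.median(values), statistics.quantiles(values, n=10)


def frequency_distribution(values):
    """Count how many times each distinct value occurs."""
    return Counter(values)
```

Since different tools compute quantiles with different interpolation rules, an explicit description of each query plus its expected result would be needed to confirm that two implementations are really answering the same question.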
