Description
Is there a more detailed description of the datasets, hardware, queries and results than what is given at http://www.revolutionanalytics.com/sites/default/files/revolution-analytics-sas-benchmark-whitepaper-mar2014.pdf and http://www.slideshare.net/RevolutionAnalytics/is-revolution-r-enterprise-faster-than-sas-benchmarking-results-revealed-34739767?
I'd like to reproduce those datasets (5 million rows x 591 cols, 50 million rows x 21 cols, etc.) and all the queries that were executed on them. For each experiment it would be nice to see:
- Configuration options for the generation script
- SHA1 hashes of CSV files (to ensure identical data)
- Description of queries (in an explicit, human-readable form)
- Results of queries (to ensure identical queries)
- Running times of queries (not just totals)
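For the hashes, something along these lines would be enough. This is only a minimal sketch using Python's standard `hashlib`; the function name `sha1_of_file` and the chunk size are my own choices, not anything taken from the benchmark scripts.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte CSV files.
    `chunk_size` (1 MiB here) is an arbitrary choice for illustration.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Publishing one such digest per generated CSV file would let anyone confirm that their generated data is byte-identical before comparing timings.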
Actually, for now I am only interested in the first three queries (Descriptive statistics, Median and deciles, Frequency distribution), since those are implemented in various database solutions as well.
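To make clear what I mean by an explicit description, here is how I would guess those three queries look on a single column, using only the Python standard library. This is purely my assumption about their meaning. The benchmark's actual definitions (which statistics, which quantile method, which columns) are exactly what I am asking about, and the function names below are hypothetical.

```python
import statistics
from collections import Counter


def descriptive_statistics(values):
    """Basic summary of one numeric column (assumed set of statistics)."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def median_and_deciles(values):
    """Median plus the nine cut points that split the data into ten parts.

    The "inclusive" method is an assumption; other tools may use a
    different quantile definition and give slightly different numbers.
    """
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return statistics.median(values), deciles


def frequency_distribution(values):
    """Count of occurrences of each distinct value in one column."""
    return Counter(values)
```

Differences in details such as the quantile method are the reason the expected results of each query would be useful in addition to the running times.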