Simon Urbanek edited this page Jan 15, 2014 · 2 revisions

Welcome to the iotools wiki!

See DistributedR for a short document on the concepts.

Purpose

iotools provides a set of tools for streaming data through connections, together with tools for converting raw bytes to R objects and vice versa. It also contains a set of experimental functions that use streaming to run map/reduce or divide/recombine jobs on Hadoop, as well as general chunk-wise processing.
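To make the idea of chunk-wise processing concrete, here is a minimal sketch in base R only. It does not use the iotools API; the sample records, the `|` separator, and the chunk size `chunk_lines` are assumptions chosen for illustration. It reads a connection a few lines at a time, converts each chunk of text into a matrix, and processes it before reading the next one.

```r
# Base-R sketch of chunk-wise streaming; this is NOT the iotools API.
# Sample data (hypothetical): '|'-separated records held in memory.
lines_in <- c("a|x|1", "b|y|2", "c|x|3", "d|z|4", "e|y|5")
con <- textConnection(lines_in)

chunk_lines <- 2L   # hypothetical chunk size, in lines
results <- list()
repeat {
  chunk <- readLines(con, n = chunk_lines)
  if (length(chunk) == 0L) break   # connection exhausted
  # Convert the text of this chunk into a character matrix, one row per record.
  m <- do.call(rbind, strsplit(chunk, "|", fixed = TRUE))
  # Process the chunk; here we only count its rows.
  results[[length(results) + 1L]] <- nrow(m)
}
close(con)

# Combine the per-chunk results.
total_rows <- sum(unlist(results))
```

Only one chunk is held in memory at a time, which is the property that makes the same pattern usable on inputs larger than RAM.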

Simple examples

Unique entries of field 2:

library(iotools)
hmr(hinput("/my/data"),
    map=function(x) unique(x[,2]), reduce=unique)
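The reason `reduce=unique` is needed can be seen in a local simulation. The sketch below uses base R only and involves neither Hadoop nor `hmr`; the sample matrix and the split into two chunks are assumptions for illustration. Each chunk is de-duplicated by the map step on its own, so values that occur in more than one chunk still appear more than once until the reduce step runs.

```r
# Local base-R simulation of the map and reduce steps; no Hadoop involved.
# Hypothetical input: a character matrix whose second column is the field of interest.
x <- cbind(id = c("1", "2", "3", "4", "5", "6"),
           field2 = c("a", "b", "a", "c", "b", "a"))

# Pretend the input was split into two chunks of three rows each.
chunks <- list(x[1:3, , drop = FALSE], x[4:6, , drop = FALSE])

map <- function(x) unique(x[, 2])   # same function as in the example above
reduce <- unique

# Per-chunk results concatenated: duplicates across chunks remain.
mapped <- unlist(lapply(chunks, map))
# The reduce step removes the cross-chunk duplicates.
result <- reduce(mapped)
```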

Distribution of field 2:

library(iotools)
hmr(hinput("/my/data"),
    map=function(x) table(x[,2]),
    reduce=function(x) ctapply(as.numeric(x), names(x), sum))
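The reduce step sums the per-chunk counts that share a name. The following base-R sketch simulates this locally under stated assumptions: the sample data and the two-chunk split are made up, the map output is flattened to a named character vector to mimic values arriving as text (which is why the example above calls `as.numeric`), and base `tapply` stands in for `ctapply`. Unlike `ctapply`, `tapply` does not rely on equal keys being adjacent, so it is a substitute for illustration rather than the same function.

```r
# Local base-R simulation; tapply is a stand-in for iotools' ctapply.
# Hypothetical input: a character matrix whose second column is the field of interest.
x <- cbind(id = as.character(1:6),
           field2 = c("a", "b", "a", "c", "b", "a"))
chunks <- list(x[1:3, , drop = FALSE], x[4:6, , drop = FALSE])

map <- function(x) table(x[, 2])

# Flatten each per-chunk table to a named character vector
# (assumption: this mimics counts being passed between steps as text).
mapped <- unlist(lapply(chunks, function(ch) {
  counts <- map(ch)
  setNames(as.character(counts), names(counts))
}))

# Sum the partial counts by name.
reduce <- function(x) tapply(as.numeric(x), names(x), sum)
result <- reduce(mapped)

# The combined counts agree with a table computed on the full data.
full <- table(x[, 2])
```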