Home
Simon Urbanek edited this page Jan 15, 2014 · 2 revisions
Welcome to the iotools wiki!
See DistributedR for a short document on the concepts.
iotools provides a set of tools for streaming data through connections and for converting raw bytes to R objects and vice versa. It also contains a set of experimental functions that use streaming to run map/reduce or divide/recombine jobs on Hadoop, as well as general chunk-wise processing.
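As a rough illustration of the chunk-wise processing mentioned above, the following minimal sketch reads "|"-separated records from a connection chunk by chunk and converts the raw bytes into a character matrix. It assumes the `chunk.reader`, `read.chunk` and `mstrsplit` functions from the package; the in-memory input and the variable names are made up for the example.

```r
library(iotools)

# Hypothetical in-memory input standing in for a file or pipe:
# three "|"-separated records, one per line.
con <- rawConnection(charToRaw("k1|a\nk2|b\nk3|a\n"))
reader <- chunk.reader(con)

seen <- character()
# read.chunk() returns a raw vector of complete lines; it is empty at end of input.
while (length(chunk <- read.chunk(reader)) > 0) {
  m <- mstrsplit(chunk, sep = "|")   # raw bytes -> character matrix
  seen <- unique(c(seen, m[, 2]))    # accumulate unique values of field 2
}
close(con)
```

With input this small everything arrives in a single chunk; the loop is written so that the same code would keep working if the input were split across several chunks.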
Unique entries of field 2:
library(iotools)
hmr(hinput("/my/data"),
    map=function(x) unique(x[,2]),
    reduce=unique)
Distribution of field 2:
library(iotools)
hmr(hinput("/my/data"),
    map=function(x) table(x[,2]),
    reduce=function(x) ctapply(as.numeric(x), names(x), sum))
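To see what the map and reduce functions in the two examples compute without a Hadoop cluster, here is a small local sketch in base R only. The toy data, the two-way chunking and the explicit sort between map and reduce are assumptions made for illustration; it also uses `tapply` as a stand-in for `ctapply`, and it is not how `hmr` itself is implemented.

```r
# Toy data standing in for /my/data; field 2 holds the values of interest.
x <- cbind(id = as.character(1:6),
           field2 = c("a", "b", "a", "c", "b", "a"))

# Split the rows into two chunks, as a streaming job would see them.
chunks <- list(x[1:3, , drop = FALSE], x[4:6, , drop = FALSE])

# Unique entries of field 2: map each chunk, then reduce the combined result.
uniq <- unique(unlist(lapply(chunks, function(ch) unique(ch[, 2]))))

# Distribution of field 2: the map step yields one partial table per chunk.
mapped <- lapply(chunks, function(ch) table(ch[, 2]))

# Assumed shuffle step: concatenate the partial counts and order them by key.
combined <- unlist(lapply(mapped, function(t) setNames(as.numeric(t), names(t))))
combined <- combined[order(names(combined))]

# Reduce step: sum the partial counts per key.
counts <- tapply(combined, names(combined), sum)
```

Here `uniq` holds the distinct values of field 2 and `counts` holds their total frequencies across both chunks.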