Consistency of spark and rmr backends #68

@piccolbo

Because of the deep differences between the backends, and despite our best efforts, some semantic differences have trickled into the API:

  • The path argument of the output function is mandatory for spark, because we don't have a system of temp files as we do for rmr (we use rdds instead). Related is the fact that the output function returns a path on the spark backend and a big data object (temp file) on rmr. The big data object can encapsulate either a temporary or a permanent location. Its equivalent on spark is the rdd, which is always temporary.
  • The list of supported formats differs between the two backends.
  • The system of custom formats is much more restricted in sparkR.
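
To make the first difference concrete, here is a toy model of the two return conventions. It is only a sketch and not the package's API: `BigDataObject`, `output_rmr_like` and `output_spark_like` are hypothetical names invented for illustration, plain local files stand in for both backends, and Python is used only so the sketch can be run on its own.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class BigDataObject:
    """Hypothetical stand-in for the rmr-style big data object."""
    path: str
    temporary: bool  # True when the object wraps a temp location


def output_rmr_like(data: list, path: Optional[str] = None) -> BigDataObject:
    """rmr-like convention (assumed): path is optional, result is an object."""
    temporary = False
    if path is None:
        # No path given: fall back to a temp file, as a temp-file system would.
        fd, path = tempfile.mkstemp(suffix=".txt")
        os.close(fd)
        temporary = True
    with open(path, "w") as f:
        f.writelines(f"{row}\n" for row in data)
    return BigDataObject(path=path, temporary=temporary)


def output_spark_like(data: list, path: str) -> str:
    """spark-like convention (assumed): path is mandatory, result is the path."""
    with open(path, "w") as f:
        f.writelines(f"{row}\n" for row in data)
    return path
```

In this model, code written against one convention breaks on the other: a caller that omits the path fails on the spark-like function, and a caller that expects a plain path gets an object from the rmr-like one.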

The goal of this issue is to list these differences so that they can spawn specific efforts to reduce or eliminate them or, if necessary, to document them.
