FAQ
How does TensorFrames compare to SparkNet?
Both TensorFrames and SparkNet provide bridges between Spark and deep learning frameworks, and both have roots in the AMPLab. The two projects address different needs, though: TensorFrames aims to be an efficient wrapper that seamlessly integrates Spark DataFrames and TensorFlow, while SparkNet is a more general framework geared towards distributed computer vision and deep learning. There are some technical differences as well:
- SparkNet uses the RDD API, while TensorFrames uses Spark's Dataset/DataFrame API, with two major consequences: 1) TensorFrames natively understands all the numeric types supported by Spark DataFrames and can provide quick, understandable error messages when tensor shapes are not aligned, and 2) TensorFrames has intimate knowledge of Spark's memory-efficient data representation, minimizing memory copies between the two frameworks (a short example follows this list).
- SparkNet supports both Caffe and TensorFlow as backends
- the SparkNet API is currently more geared towards research in computer vision
- SparkNet has dedicated ingestion points for image data, while TensorFrames only communicates with DataFrames and relies on Spark for ingestion
- TensorFrames has (limited) support for expressing TensorFlow operations as a pure Scala program, and may be used without Python.
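To make the DataFrame integration concrete, here is a minimal sketch in the spirit of the TensorFrames Python API as shown in the project README: it builds a small TensorFlow graph and maps it over the blocks of a DataFrame, appending the result as a new column. The entry points (`tfs.block`, `tfs.map_blocks`) and the availability of a `sqlContext` are assumptions drawn from the README, not a definitive usage guide.

```python
import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row

# A DataFrame with one numeric column 'x'; sqlContext is assumed to exist
# (e.g. in a pyspark session launched with the TensorFrames package).
df = sqlContext.createDataFrame([Row(x=float(i)) for i in range(10)])

with tf.Graph().as_default():
    # Placeholder bound to the 'x' column; the dtype and shape are taken from
    # the DataFrame schema, which is how shape mismatches get clear errors.
    x = tfs.block(df, "x")
    z = tf.add(x, 3.0, name="z")
    # Runs the graph on each block of rows and appends the output column 'z'.
    df2 = tfs.map_blocks(z, df)
```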
Also, from an engineering perspective, the TensorFrames code is well-covered by more than 90 unit tests and is readily available as a Spark package.
Is TensorFrames data-parallel, like SparkNet?
Yes. Like SparkNet, it runs the same model on distributed pieces of data. The main difference from SparkNet is the choice of underlying data representation: SparkNet converts Spark RDDs into C++ buffers, while TensorFrames directly accesses the internal representation of DataFrames/Datasets.
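As a hedged sketch of what "the same model on distributed pieces of data" looks like in practice, the snippet below uses the `tfs.reduce_blocks` entry point from the README: every worker feeds its local block of rows through the same graph, and the per-block partial results are merged by Spark.

```python
import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row

df = sqlContext.createDataFrame([Row(x=float(i)) for i in range(5)])

with tf.Graph().as_default():
    # Every worker runs this same graph on its local block of the data...
    x_input = tfs.block(df, "x", tf_name="x_input")
    # ...the output name matches the column name, per the README convention...
    x = tf.reduce_sum(x_input, [0], name="x")
    # ...and the per-block partial sums are merged into one final value.
    res = tfs.reduce_blocks(x, df)
```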
Can TensorFrames train a single model across multiple machines?
It depends on what you mean. Training a single large model spread across multiple machines has not been tested, and the support for variables may not be complete enough. Models that fit on a single machine are fine, though.
Are the TensorFlow monitoring tools available?
So far, only the binaries are deployed, not the monitoring tools. There are ways to collect statistics from the worker nodes, so doing so would pose no big technical difficulty.
Does the Scala API cover all of TensorFlow?
TensorFlow is built around a small set of primitives (exposed in the C++ API) that are used as building blocks by the Python API. As such, some features such as autodifferentiation are implemented purely in Python. The TensorFrames Scala API covers at least this same set of primitives and is fully compliant with the output of the Python API (the generated protocol buffer is strictly identical).
For more complex primitives, it will depend on the requests. In general, since Python is the officially recommended language for accessing TensorFlow, all the other languages will lag behind Python to varying degrees.
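To illustrate what "the generated protocol buffer" refers to, the sketch below uses plain TensorFlow (no TensorFrames-specific calls) to print the GraphDef message for a tiny graph; it is this serialized form that the Scala API reproduces.

```python
import tensorflow as tf

with tf.Graph().as_default() as g:
    # A trivial graph computing z = a + 3.
    a = tf.placeholder(tf.float64, shape=[None], name="a")
    z = tf.add(a, tf.constant(3.0, dtype=tf.float64), name="z")
    # GraphDef is the language-neutral protobuf description of the graph;
    # a compliant Scala frontend must emit a strictly identical message.
    print(g.as_graph_def())
```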
How is the computation split between TensorFlow and Spark on a cluster?
In the current version, all the computations done by TensorFlow are local, and Spark is in charge of the network communications. Spark has a rich collection of communication primitives, which should cover most use cases. Now that TensorFlow 0.8 is out with its own communication stack, however, we are thinking of exposing some of it to the user. As with regular TensorFlow programs, you will have to configure the cluster manually to enable inter-node communication.
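For reference, here is a sketch of what that manual configuration looks like in plain distributed TensorFlow 0.8+; all hostnames, ports, and the job layout are placeholders, and TensorFrames does not set any of this up for you.

```python
import tensorflow as tf

# Hypothetical cluster layout: two workers and one parameter server.
# All hostnames and ports below are placeholders.
cluster = tf.train.ClusterSpec({
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    "ps": ["ps0.example.com:2222"],
})

# Each process in the cluster starts a server announcing its own role.
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Ops can then be pinned to specific nodes; TensorFlow's own communication
# stack moves tensors between them.
with tf.device("/job:ps/task:0"):
    w = tf.Variable(tf.zeros([10]), name="w")
```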
Is there a design document?
There is no document for now; however, the code is overall fairly well documented, in my opinion. Feel free to ask the developer for some pointers.