This package contains all of the classes and functions you need to interact with Splice Machine's scale-out, SQL-on-Hadoop RDBMS from Python. It also contains several machine learning utilities for use with Apache Spark.
(sudo) pip install splicemachine
(sudo) pip install splicemachine[notebook]
(sudo) pip install splicemachine[stats]
(sudo) pip install splicemachine[all]
NOTE: If you use zsh and plan to install extras, you must escape the brackets (pip install splicemachine\[all\]).
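As an alternative to escaping the brackets by hand, the requirement can be quoted. The following is a minimal sketch, not part of the package: `pip_install_command` is a hypothetical helper that builds the install command for a given extra and lets Python's standard `shlex.quote` add the quoting that shells such as zsh need.

```python
import shlex


def pip_install_command(extra=None):
    """Build a shell-safe pip command for splicemachine (hypothetical helper).

    `extra` is one of the extras named above (notebook, stats, all) or None.
    """
    requirement = "splicemachine" if extra is None else f"splicemachine[{extra}]"
    # shlex.quote wraps the requirement in single quotes when it contains
    # characters the shell would interpret, such as square brackets.
    return "pip install " + shlex.quote(requirement)


if __name__ == "__main__":
    for extra in (None, "notebook", "stats", "all"):
        print(pip_install_command(extra))
```

Single-quoting the requirement has the same effect in zsh as escaping each bracket with a backslash.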
This package contains four main external modules. First, splicemachine.spark.context, which houses our Python-wrapped Native Spark Datasource, as well as our External Native Spark Datasource for use outside of the Kubernetes cluster. Second, splicemachine.mlflow_support, which houses our Python interface to MLManager. Third, splicemachine.stats, which houses functions and classes that simplify machine learning (such as Decision Tree Visualizers and Model Evaluators). Lastly, splicemachine.notebook, which provides Jupyter Notebook specific functionality such as an embedded MLflow UI and Spark Jobs UI.
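Because some of these modules are only usable when the matching extras are installed, it can help to check which ones are importable in the current environment. This is a minimal sketch using only the standard library; `available_modules` is a hypothetical helper and not part of the package, and the module names are the ones listed above.

```python
import importlib.util

# Module names taken from the description above.
MODULES = (
    "splicemachine.spark.context",
    "splicemachine.mlflow_support",
    "splicemachine.stats",
    "splicemachine.notebook",
)


def available_modules(names=MODULES):
    """Return a dict mapping each module name to whether it can be located."""
    found = {}
    for name in names:
        try:
            # For dotted names find_spec imports the parent package first,
            # so a missing package surfaces as ImportError.
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found[name] = False
    return found


if __name__ == "__main__":
    for name, ok in available_modules().items():
        print(f"{name}: {'available' if ok else 'missing'}")
```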
1) splicemachine.spark.context: Native Spark Datasource for interacting with Splice Machine from Spark
   1.1) splicemachine.spark.context.ExtPySpliceContext: External Native Spark Datasource for interacting with Splice Machine from Spark. Usage is mostly identical to the above after instantiation (with a few extra functions available). To instantiate, you must provide the kafkaServers parameter pointing to the Kafka URL of the Splice cluster you want to connect to. In Standalone, that URL is the parameter's default value (localhost:9092)
2) splicemachine.mlflow_support: MLflow-wrapped MLManager interface from Python. The majority of documentation is identical to MLflow. Additional functions and functionality are available in the docs
3) splicemachine.features: The Python SDK entrypoint to the Splice Machine Feature Store
4) Extensions
   4.1) splicemachine.stats: houses utilities for machine learning
   4.2) splicemachine.notebook: houses utilities for use in Jupyter Notebooks running in the Kubernetes cloud environment
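The instantiation of `ExtPySpliceContext` described above might look roughly like the sketch below. Only the class path, the `kafkaServers` parameter, and its `localhost:9092` default come from this README. The `connect_external` wrapper is hypothetical, and the assumption that the constructor takes a Spark session first and a `JDBC_URL` keyword should be checked against the docs.

```python
def connect_external(spark, jdbc_url, kafka_servers="localhost:9092"):
    """Create an external Splice Machine context (hypothetical wrapper).

    spark:         an existing SparkSession
    jdbc_url:      JDBC URL of the Splice Machine database (assumed parameter)
    kafka_servers: Kafka URL of the Splice cluster; the default matches the
                   Standalone default given in this README
    """
    # Imported lazily so this sketch can be loaded without splicemachine installed.
    from splicemachine.spark.context import ExtPySpliceContext

    # JDBC_URL is an assumed keyword name; kafkaServers is named in this README.
    return ExtPySpliceContext(spark, JDBC_URL=jdbc_url, kafkaServers=kafka_servers)


# Example usage (requires pyspark, splicemachine, and a reachable cluster):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   splice = connect_external(spark, "<your JDBC URL>", "<your Kafka URL>")
```

Keeping the Kafka URL as a keyword with the Standalone default means local experiments need only the session and the JDBC URL, while cluster deployments pass their own Kafka URL explicitly.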
The docs are managed by Read the Docs and Sphinx. See latest docs here
cd docs
make html