Demos for Nessie. Nessie provides Git-like capabilities for your Data Lake.

projectnessie/nessie-demos

Nessie Binder Demos

These demos run under Binder and can be found at:

They are automatically rebuilt every time we push to main. They are unit tested with the testbook library to ensure we keep getting correct results as the underlying libraries continue to grow and mature.

Upgrade instructions

Because of the split between Binder and the unit tests, it was not trivial to create a single place to update all versions. Some versions have to be updated in multiple places:

Nessie

The Nessie version is set in Binder at docker/binder/requirements_base.txt. Currently, the demos use Nessie 0.74.x.
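Before bumping a version, it can help to read the current pin programmatically. The following is a minimal sketch, not part of this repo: it assumes pins use the `name==version` form, and both the helper name `read_pins` and the sample `pynessie` pin are illustrative assumptions rather than the actual file contents.

```python
import re

# Matches lines such as "package==1.2.3"; other specifier forms (>=, ~=) are skipped.
_PIN = re.compile(r"^\s*([A-Za-z0-9_.\-\[\]]+)\s*==\s*([^\s;#]+)")


def read_pins(requirements_text: str) -> dict[str, str]:
    """Return {package: version} for every exactly pinned requirement."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = _PIN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


# Hypothetical content; the real docker/binder/requirements_base.txt may differ.
sample = """
# base requirements
pynessie==0.74.0
"""
print(read_pins(sample))
```

To inspect the real file, pass the text of docker/binder/requirements_base.txt instead of `sample`.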

Iceberg

Currently we are using Iceberg 1.4.2, which is specified in both Iceberg notebooks as well as in docker/utils/__init__.py.
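Since the Iceberg version lives in several places, a small consistency check can catch a missed update. This is a hedged sketch only: the function names, the regular expression, and the sample strings are assumptions for illustration, and the way the version is actually written in the notebooks and in docker/utils/__init__.py may differ.

```python
import re


def find_versions(sources: dict[str, str], pattern: str) -> dict[str, set[str]]:
    """Map each source name to the set of versions its text mentions."""
    regex = re.compile(pattern)
    return {name: set(regex.findall(text)) for name, text in sources.items()}


def is_consistent(found: dict[str, set[str]]) -> bool:
    """True when every source mentions exactly the same single version."""
    distinct = set().union(*found.values()) if found else set()
    return len(distinct) == 1 and all(v == distinct for v in found.values())


# Hypothetical snippets standing in for the notebooks and the utils module.
sources = {
    "notebook_a": "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.4.2",
    "notebook_b": "org.apache.iceberg:iceberg-flink-runtime-1.17:1.4.2",
    "utils": "org.apache.iceberg:iceberg-flink-runtime-1.17:1.4.2",
}
# Assumed pattern: an Iceberg artifact name followed by ":<version>".
pattern = r"iceberg-[\w.\-]+:(\d+\.\d+\.\d+)"

found = find_versions(sources, pattern)
print(found, is_consistent(found))
```

A source that mentions no version, or a different one, makes `is_consistent` return `False`, which is the case worth flagging after an upgrade.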

Spark

The Spark version only has to be updated in docker/binder/requirements.txt. Iceberg currently supports Spark 3.2, 3.3, 3.4 and 3.5; the demos use Spark 3.2.

Flink

The Flink version is set in Binder at docker/binder/requirements_flink.txt. Currently, we are using Flink 1.17.1.

Hadoop

The Hadoop libraries are used by Flink and are currently specified only in docker/utils/__init__.py. We use Hadoop 2.10.1 with Flink and Hive.

Hive

The Hive version currently in use is 2.3.9, which supports Hadoop 2.10.1. To update the version, it only needs to be changed in docker/utils/__init__.py.

Binder

Binder is a customizable platform for Jupyter notebooks and more (see their website). Binder generates a Dockerfile and an image based on the settings in the source GitHub repository (other sources are possible). It is possible to pre-install, for example, Ubuntu and/or Python packages into the Docker image generated by Binder.

For the user, Binder makes starting a notebook as simple as clicking a link.

Development

For development, you will need to make sure to have the following installed:

  • Python 3.10+
  • pre-commit

After installing pre-commit, run pre-commit install to set up the hooks locally, since this repo executes several scripts at the pre-commit stage.

To run the notebook unit tests, run the following commands in the notebook folder:

  1. python -m pip install -r requirements_dev.txt
  2. tox

Running the unit tests takes time, since all the binary files for Hive, Flink, etc. must be downloaded before the tests run.
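The two steps above can also be scripted. The following is a minimal sketch under stated assumptions: the helper names `build_commands` and `run_notebook_tests` are hypothetical, and the notebook folder is passed in by the caller rather than hard-coded.

```python
import subprocess
import sys
from pathlib import Path


def build_commands() -> list[list[str]]:
    """Return the install and test commands, in the order they must run."""
    return [
        # Use the current interpreter so pip installs into the active environment.
        [sys.executable, "-m", "pip", "install", "-r", "requirements_dev.txt"],
        ["tox"],
    ]


def run_notebook_tests(notebook_dir: Path) -> None:
    """Run each command inside notebook_dir, stopping at the first failure."""
    for cmd in build_commands():
        subprocess.run(cmd, cwd=notebook_dir, check=True)


# Example call (path is an assumption; adjust to your checkout):
# run_notebook_tests(Path("path/to/notebook/folder"))
```

With `check=True`, a failing install raises `subprocess.CalledProcessError` before tox starts, so the long download step is not triggered against a broken environment.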