Merge pull request #1495 from BlazingDB/branch-0.19
 0.19 Release
wmalpica authored Apr 21, 2021
2 parents 44aeef8 + 5964463 commit ff4ece0
Showing 2,671 changed files with 4,492,660 additions and 4,774 deletions.
5 changes: 5 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
Original file line number Diff line number Diff line change
@@ -23,6 +23,11 @@ A clear and concise description of what you expected to happen.
- Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
- Method of BlazingSQL install: [conda, Docker, or from source]
- If method of install is [Docker], provide `docker pull` & `docker run` commands used
- **BlazingSQL Version**, which can be obtained as follows:
```
import blazingsql
print(blazingsql.__info__())
```

**Environment details**
Please run and paste the output of the `print_env.sh` script here, to gather any other relevant environment details
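
For reference, a minimal stdlib sketch of the kind of details a script like `print_env.sh` collects — this is not the actual script, just an illustration of gathering environment info from Python:

```python
import platform
import sys

# Collect a few basic environment details (illustrative stand-in for print_env.sh).
env = {
    "python": sys.version.split()[0],
    "os": platform.system(),
    "machine": platform.machine(),
}
for key, value in env.items():
    print(f"{key}: {value}")
```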
48 changes: 48 additions & 0 deletions .github/workflows/build-docs.yml
@@ -0,0 +1,48 @@
# This is a basic workflow to help you get started with Actions

name: Build docs

# Controls when the action will run.
on:
push:
branches:
- main

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
# This workflow contains a single job called "build"
docs:
# The type of runner that the job will run on
runs-on: ubuntu-latest

# Steps represent a sequence of tasks that will be executed as part of the job
steps:
# Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
- uses: actions/checkout@v2

# Runs a single command using the runner's shell
- uses: mattnotmitt/doxygen-action@v1
with:
working-directory: 'docsrc/'
doxyfile-path: 'source/Doxyfile'
- uses: ammaraskar/sphinx-action@master
with:
build-command: "make html -e"
docs-folder: "docsrc/"
- name: Commit documentation changes
run: |
git clone https://github.com/romulo-auccapuclla/blazingsql.git --branch main --single-branch main
cp -a docsrc/build/html/. docs/
cd docs
touch .nojekyll
git config --local user.email "[email protected]"
git config --local user.name "GitHub Action"
git add .
git commit -m "Update documentation" -a || true
- name: Push changes
uses: ad-m/github-push-action@master
with:
branch: main
directory: docs
force: true
github_token: ${{ secrets.GITHUB_TOKEN }}
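
The workflow's `cp -a docsrc/build/html/. docs/` step merges the Sphinx output into a directory that may already exist. The same operation in Python's stdlib looks like this (the temporary paths here are stand-ins for `docsrc/build/html` and `docs`):

```python
import pathlib
import shutil
import tempfile

# Stand-in source and destination directories.
src = pathlib.Path(tempfile.mkdtemp())
dst = pathlib.Path(tempfile.mkdtemp())
(src / "index.html").write_text("<html></html>")

# Equivalent of `cp -a src/. dst/`: copy into an existing directory,
# merging with whatever is already there (Python 3.8+).
shutil.copytree(src, dst, dirs_exist_ok=True)
print(sorted(p.name for p in dst.iterdir()))  # ['index.html']
```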
10 changes: 10 additions & 0 deletions .gitignore
100644 → 100755
@@ -6,6 +6,7 @@
.idea/

engine/cmake-build-debug/
cmake-*

algebra.log

@@ -92,3 +93,12 @@ powerpc/blazingsql.tar.gz
powerpc/developer/requirements.txt
powerpc/developer/core
powerpc/developer/blazingsql.tar.gz

# mac junk
.DS_Store
._.*

# docs build folders
docsrc/build/
docsrc/source/doxyfiles/
docsrc/source/xml
1 change: 1 addition & 0 deletions .nojekyll
@@ -0,0 +1 @@

69 changes: 67 additions & 2 deletions CHANGELOG.md
100644 → 100755
@@ -1,3 +1,65 @@
# BlazingSQL 0.19.0 (April 21, 2021)

## New Features
- #1367 OverlapAccumulator Kernel
- #1364 Implement the concurrent API (bc.sql with token, bc.status, bc.fetch)
- #1426 Window Functions without partitioning
- #1349 Add e2e test for Hive Partitioned Data
- #1396 Create tables from other RDBMS
- #1427 Support for CONCAT alias operator
- #1424 Add get physical plan with explain
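
The concurrent API above (#1364) follows a submit/poll/fetch shape. A self-contained toy sketch of that pattern — the class, its method bodies, and the `str.upper` stand-in for query execution are illustrative only, not the BlazingSQL implementation:

```python
from concurrent.futures import ThreadPoolExecutor

class MiniContext:
    """Toy token-based API echoing the bc.sql / bc.status / bc.fetch shape."""
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs = {}
        self._next_token = 0

    def sql(self, query):
        # Submit the "query" asynchronously and hand back a token.
        token = self._next_token
        self._next_token += 1
        self._jobs[token] = self._pool.submit(str.upper, query)
        return token

    def status(self, token):
        # Poll without blocking.
        return self._jobs[token].done()

    def fetch(self, token):
        # Block until the job finishes and return its result.
        return self._jobs[token].result()

bc = MiniContext()
token = bc.sql("select 1")
print(bc.fetch(token))  # SELECT 1
```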

## Improvements
- #1325 Refactored CacheMachine.h and CacheMachine.cpp
- #1322 Updated and enabled several E2E tests
- #1333 Fixing build due to cudf update
- #1344 Removed GPUCacheDataMetadata class
- #1376 Fixing build due to some strings refactor in cudf, undoing the replace workaround
- #1430 Updating GCP to >= version
- #1331 Added flag to enable null e2e testing
- #1418 Adding support for docker image
- #1434 Added documentation for C++ and Python in Sphinx
- #1419 Added concat cache machine timeout
- #1444 Updating GCP to >= version
- #1349 Add e2e test for Hive Partitioned Data
- #1447 Improve getting estimated output num rows
- #1473 Added Warning to Window Functions
- #1480 Improve dependencies script

## Bug Fixes
- #1335 Fixing uninitialized var in orc metadata and handling the parseMetadata exceptions properly
- #1339 Handling properly the nulls in case conditions with strings
- #1346 Delete allocated host chunks
- #1348 Capturing error messages due to exceptions properly
- #1350 Fixed bug where there are no projects in a bindable table scan
- #1359 Avoid cuda issues when free pinned memory
- #1365 Fixed build after sublibs changes on cudf
- #1369 Updated java path for powerpc build
- #1371 Fixed e2e settings
- #1372 Recompute `columns_to_hash` in DistributeAggregationKernel
- #1375 Fix empty row_group_ids for parquet
- #1380 Fixed issue with int64 literal values
- #1379 Remove ProjectRemoveRule
- #1389 Fix issue when CAST a literal
- #1387 Skip getting orc metadata for decimal type
- #1392 Fix substrings with nulls
- #1398 Fix performance regression
- #1401 Fix support for minus unary operation
- #1415 Fixed bug where num_batches was not getting set in BindableTableScan
- #1413 Fix for null tests 13 and 23 of windowFunctionTest
- #1416 Fix full join when both tables contains nulls
- #1423 Fix temporary directory for hive partition test
- #1351 Fixed 'count distinct' related issues
- #1425 Fix for new joins API
- #1400 Fix for Column aliases when exists a Join op
- #1456 Raising exceptions on Python side for RAL
- #1466 SQL providers: update README.md
- #1470 Fix pre compiler flags for sql parsers


## Deprecated Features
- #1394 Disabled support for outer joins with inequalities

# BlazingSQL 0.18.0 (February 24, 2021)

## New Features
@@ -17,6 +79,10 @@
- #1284 Initial support for Windows Function
- #1303 Add support for INITCAP
- #1313 getting and using ORC metadata
- #1347 Fixing issue when reading orc metadata from DATE dtype
- #1338 Window Function support for LEAD and LAG statements
- #1362 give useful message when file extension is not recognized
- #1361 Supporting first_value and last_value for Window Function
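
LEAD/LAG and first_value/last_value are standard SQL window functions, so their semantics can be tried in isolation with stdlib `sqlite3` (SQLite ≥ 3.25) — this illustrates the SQL behavior only, not the BlazingSQL engine:

```python
import sqlite3  # SQLite >= 3.25 ships window-function support

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (k INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# LAG looks one row back; FIRST_VALUE returns the first value in the frame.
rows = con.execute(
    "SELECT k, LAG(k) OVER (ORDER BY k), FIRST_VALUE(k) OVER (ORDER BY k) FROM t"
).fetchall()
print(rows)  # [(1, None, 1), (2, 1, 1), (3, 2, 1)]
```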


## Improvements
@@ -38,7 +104,7 @@
- #1314 Added unit tests to verify that OOM error handling works well
- #1320 Revamping cache logger
- #1323 Made progress bar update continuously and stay after query is done

- #1336 Improvements for the cache API

## Bug Fixes
- #1249 Fix compilation with cuda 11
@@ -52,7 +118,6 @@
- #1312 Fix progress bar for jupyterlab
- #1318 Disabled require acknowledge


# BlazingSQL 0.17.0 (December 10, 2020)

## New Features
67 changes: 67 additions & 0 deletions Dockerfile
@@ -0,0 +1,67 @@
ARG CUDA_VER="10.2"
ARG UBUNTU_VERSION="16.04"
FROM nvidia/cuda:${CUDA_VER}-runtime-ubuntu${UBUNTU_VERSION}
LABEL Description="blazingdb/blazingsql is the official BlazingDB environment for BlazingSQL on NVIDIA RAPIDS." Vendor="BlazingSQL" Version="0.4.0"

ARG CUDA_VER=10.2
ARG CONDA_CH="-c blazingsql -c rapidsai -c nvidia"
ARG PYTHON_VERSION="3.7"
ARG RAPIDS_VERSION="0.18"

SHELL ["/bin/bash", "-c"]
ENV PYTHONDONTWRITEBYTECODE=true

RUN apt-get update -qq && \
apt-get install curl git -yqq --no-install-recommends && \
apt-get clean -y && \
curl -s -o /tmp/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
bash /tmp/miniconda.sh -bfp /usr/local/ && \
rm -rf /tmp/miniconda.sh && \
conda create --no-default-packages python=${PYTHON_VERSION} -y -n bsql && \
conda install -y --freeze-installed -n bsql \
${CONDA_CH} \
-c conda-forge -c defaults \
cugraph=${RAPIDS_VERSION} cuml=${RAPIDS_VERSION} \
cusignal=${RAPIDS_VERSION} \
cuspatial=${RAPIDS_VERSION} \
cuxfilter clx=${RAPIDS_VERSION} \
python=${PYTHON_VERSION} cudatoolkit=${CUDA_VER} \
blazingsql=${RAPIDS_VERSION} \
jupyterlab \
networkx statsmodels xgboost scikit-learn \
geoviews seaborn matplotlib holoviews colorcet && \
conda clean -afy && \
rm -rf /var/cache/apt /var/lib/apt/lists/* /tmp/miniconda.sh /usr/local/pkgs/* && \
rm -rf /usr/local/envs/bsql/conda-meta && \
rm -rf /usr/local/envs/bsql/include && \
rm /usr/local/envs/bsql/lib/libpython3.7m.so.1.0 && \
find /usr/local/envs/bsql -name '__pycache__' -type d -exec rm -rf '{}' '+' && \
find /usr/local/envs/bsql -follow -type f -name '*.pyc' -delete && \
rm -rf /usr/local/envs/bsql/lib/libasan.so.5.0.0 \
/usr/local/envs/bsql/lib/libtsan.so.0.0.0 \
/usr/local/envs/bsql/lib/liblsan.so.0.0.0 \
/usr/local/envs/bsql/lib/libubsan.so.1.0.0 \
/usr/local/envs/bsql/bin/x86_64-conda-linux-gnu-ld \
/usr/local/envs/bsql/bin/sqlite3 \
/usr/local/envs/bsql/bin/openssl \
/usr/local/envs/bsql/share/terminfo \
/usr/local/envs/bsql/bin/postgres \
/usr/local/envs/bsql/bin/pg_* \
/usr/local/envs/bsql/man \
/usr/local/envs/bsql/qml \
/usr/local/envs/bsql/qsci \
/usr/local/envs/bsql/mkspecs && \
find /usr/local/envs/bsql/lib/python3.7/site-packages -name 'tests' -type d -exec rm -rf '{}' '+' && \
find /usr/local/envs/bsql/lib/python3.7/site-packages -name '*.pyx' -delete && \
find /usr/local/envs/bsql -name '*.c' -delete && \
git clone --branch=master https://github.com/BlazingDB/Welcome_to_BlazingSQL_Notebooks /blazingsql && \
rm -rf /blazingsql/.git && \
mkdir /.local /.jupyter /.cupy && chmod 777 /.local /.jupyter /.cupy

WORKDIR /blazingsql
COPY run_jupyter.sh /blazingsql

# Jupyter
EXPOSE 8888
CMD ["/blazingsql/run_jupyter.sh"]

54 changes: 32 additions & 22 deletions README.md
@@ -99,13 +99,14 @@ conda install -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults bla
```

## Nightly Version
For the nightly version, only CUDA 11+ is supported; see https://github.com/rapidsai/cudf#cudagpu-requirements
```bash
conda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=$PYTHON_VERSION cudatoolkit=$CUDA_VERSION
```
Where $CUDA_VERSION is 10.1, 10.2 or 11.0 and $PYTHON_VERSION is 3.7 or 3.8
*For example for CUDA 10.1 and Python 3.7:*
Where $CUDA_VERSION is 11.0 or 11.2 and $PYTHON_VERSION is 3.7 or 3.8
*For example for CUDA 11.2 and Python 3.8:*
```bash
conda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=3.7 cudatoolkit=10.1
conda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=3.8 cudatoolkit=11.2
```

# Build/Install from Source (Conda Environment)
@@ -117,18 +118,14 @@ This is the recommended way of building all of the BlazingSQL components and dep
```bash
conda create -n bsql python=$PYTHON_VERSION
conda activate bsql
conda install --yes -c conda-forge spdlog=1.7.0 google-cloud-cpp=1.16 ninja
conda install --yes -c rapidsai -c nvidia -c conda-forge -c defaults dask-cuda=0.18 dask-cudf=0.18 cudf=0.18 ucx-py=0.18 ucx-proc=*=gpu python=3.7 cudatoolkit=$CUDA_VERSION
conda install --yes -c conda-forge cmake=3.18 gtest gmock cppzmq cython=0.29 openjdk=8.0 maven jpype1 netifaces pyhive tqdm ipywidgets
./dependencies.sh 0.19 $CUDA_VERSION
```
Where $CUDA_VERSION is 10.1, 10.2 or 11.0 and $PYTHON_VERSION is 3.7 or 3.8
*For example for CUDA 10.1 and Python 3.7:*
```bash
conda create -n bsql python=3.7
conda activate bsql
conda install --yes -c conda-forge spdlog=1.7.0 google-cloud-cpp=1.16 ninja
conda install --yes -c rapidsai -c nvidia -c conda-forge -c defaults dask-cuda=0.18 dask-cudf=0.18 cudf=0.18 ucx-py=0.18 ucx-proc=*=gpu python=3.7 cudatoolkit=10.1
conda install --yes -c conda-forge cmake=3.18 gtest gmock cppzmq cython=0.29 openjdk=8.0 maven jpype1 netifaces pyhive tqdm ipywidgets
./dependencies.sh 0.19 10.1
```

### Build
@@ -149,21 +146,18 @@ $CONDA_PREFIX now has a folder for the blazingsql repository.
## Nightly Version

### Install build dependencies
For the nightly version, only CUDA 11+ is supported; see https://github.com/rapidsai/cudf#cudagpu-requirements
```bash
conda create -n bsql python=$PYTHON_VERSION
conda activate bsql
conda install --yes -c conda-forge spdlog=1.7.0 google-cloud-cpp=1.16 ninja
conda install --yes -c rapidsai-nightly -c nvidia -c conda-forge -c defaults dask-cuda=0.19 dask-cudf=0.19 cudf=0.19 ucx-py=0.19 ucx-proc=*=gpu python=3.7 cudatoolkit=$CUDA_VERSION
conda install --yes -c conda-forge cmake=3.18 gtest==1.10.0=h0efe328_4 gmock cppzmq cython=0.29 openjdk=8.0 maven jpype1 netifaces pyhive tqdm ipywidgets
./dependencies.sh 0.20 $CUDA_VERSION nightly
```
Where $CUDA_VERSION is 10.1, 10.2 or 11.0 and $PYTHON_VERSION is 3.7 or 3.8
*For example for CUDA 10.1 and Python 3.7:*
Where $CUDA_VERSION is 11.0 or 11.2 and $PYTHON_VERSION is 3.7 or 3.8
*For example for CUDA 11.2 and Python 3.8:*
```bash
conda create -n bsql python=3.7
conda create -n bsql python=3.8
conda activate bsql
conda install --yes -c conda-forge spdlog=1.7.0 google-cloud-cpp=1.16 ninja
conda install --yes -c rapidsai-nightly -c nvidia -c conda-forge -c defaults dask-cuda=0.19 dask-cudf=0.19 cudf=0.19 ucx-py=0.19 ucx-proc=*=gpu python=3.7 cudatoolkit=10.1
conda install --yes -c conda-forge cmake=3.18 gtest==1.10.0=h0efe328_4 gmock cppzmq cython=0.29 openjdk=8.0 maven jpype1 netifaces pyhive tqdm ipywidgets
./dependencies.sh 0.20 11.2 nightly
```

### Build
@@ -196,18 +190,34 @@ To build without the storage plugins (AWS S3, Google Cloud Storage) use the next
```
NOTE: By disabling the storage plugins you don't need to previously install the AWS SDK for C++ or Google Cloud Storage (nor any of their dependencies).

#### SQL providers
To build without the SQL providers (MySQL, PostgreSQL, SQLite) use the following arguments:
```bash
# Disable all SQL providers
./build.sh disable-mysql disable-sqlite disable-postgresql

# Disable MySQL provider
./build.sh disable-mysql

...
```
NOTES:
- By disabling the SQL providers you don't need to install mysql-connector-cpp=8.0.23, libpq=13, or sqlite=3 (nor any of their dependencies).
- Currently only the MySQL provider is supported, but PostgreSQL and SQLite will be ready in the next version!
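
The source side of what a SQL provider does — pulling rows out of an external RDBMS for the engine to consume — can be sketched with stdlib `sqlite3`. The table and data below are made up, and this is not the BlazingSQL provider API:

```python
import sqlite3

# A tiny external "RDBMS" with one table; a provider's job is to issue
# a plain SELECT like this and hand the rows to the engine as columns.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE inventory (item TEXT, qty INTEGER)")
src.executemany("INSERT INTO inventory VALUES (?, ?)", [("nut", 4), ("bolt", 9)])

rows = src.execute("SELECT item, qty FROM inventory ORDER BY item").fetchall()
print(rows)  # [('bolt', 9), ('nut', 4)]
```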

# Documentation
User guides and public API documentation can be found [here](https://docs.blazingdb.com/docs)

Documentation for our internal code architecture can be built using Sphinx.
```bash
pip install recommonmark exhale
conda install -c conda-forge doxygen
cd $CONDA_PREFIX
cd blazingsql/docs
cd blazingsql/docsrc
pip install -r requirements.txt
make doxygen
make html
```
The generated documentation can be viewed in a browser at `blazingsql/docs/_build/html/index.html`
The generated documentation can be viewed in a browser at `blazingsql/docsrc/build/html/index.html`


# Community
Expand All @@ -230,4 +240,4 @@ The RAPIDS suite of open source software libraries aim to enable execution of en

## Apache Arrow on GPU

The GPU version of [Apache Arrow](https://arrow.apache.org/) is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow are supported.
Binary file not shown.
