Commit ff4ece0

Merge pull request #1495 from BlazingDB/branch-0.19: 0.19 Release
2 parents: 44aeef8 + 5964463

File tree: 2,671 files changed (+4,492,660 −4,774 lines)


.github/ISSUE_TEMPLATE/bug_report.md

Lines changed: 5 additions & 0 deletions

````diff
@@ -23,6 +23,11 @@ A clear and concise description of what you expected to happen.
 - Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
 - Method of BlazingSQL install: [conda, Docker, or from source]
 - If method of install is [Docker], provide `docker pull` & `docker run` commands used
+- **BlazingSQL Version** which can be obtained by doing as follows:
+```
+import blazingsql
+print(blazingsql.__info__())
+```
 
 **Environment details**
 Please run and paste the output of the `print_env.sh` script here, to gather any other relevant environment details
````

.github/workflows/build-docs.yml

Lines changed: 48 additions & 0 deletions (new file)

```yaml
# This is a basic workflow to help you get started with Actions

name: Build docs

# Controls when the action will run.
on:
  push:
    branches:
      - main

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  docs:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v2

      # Runs a single command using the runners shell
      - uses: mattnotmitt/doxygen-action@v1
        with:
          working-directory: 'docsrc/'
          doxyfile-path: 'source/Doxyfile'
      - uses: ammaraskar/sphinx-action@master
        with:
          build-command: "make html -e"
          docs-folder: "docsrc/"
      - name: Commit documentation changes
        run: |
          git clone https://github.com/romulo-auccapuclla/blazingsql.git --branch main --single-branch main
          cp -a docsrc/build/html/. docs/
          cd docs
          touch .nojekyll
          git config --local user.email "[email protected]"
          git config --local user.name "GitHub Action"
          git add .
          git commit -m "Update documentation" -a || true
      - name: Push changes
        uses: ad-m/github-push-action@master
        with:
          branch: main
          directory: docs
          force: true
          github_token: ${{ secrets.GITHUB_TOKEN }}
```
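For reference, the CI docs build above can be reproduced locally. A command sketch, assuming doxygen, Sphinx, and the packages listed in `docsrc/requirements.txt` are installed (it mirrors the docsrc build steps this commit also documents in the README):

```shell
# Local equivalent of the CI docs build (sketch; requires doxygen and Sphinx)
cd blazingsql/docsrc
pip install -r requirements.txt   # Sphinx and the extensions the docs use
make doxygen                      # generate the C++ API XML from source/Doxyfile
make html -e                      # build the HTML site into docsrc/build/html
```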

.gitignore (mode changed 100644 → 100755)

Lines changed: 10 additions & 0 deletions

```diff
@@ -6,6 +6,7 @@
 .idea/
 
 engine/cmake-build-debug/
+cmake-*
 
 algebra.log
 
@@ -92,3 +93,12 @@ powerpc/blazingsql.tar.gz
 powerpc/developer/requirements.txt
 powerpc/developer/core
 powerpc/developer/blazingsql.tar.gz
+
+# mac junk
+.DS_Store
+._.*
+
+# docs build folders
+docsrc/build/
+docsrc/source/doxyfiles/
+docsrc/source/xml
```
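The new ignore entries are glob patterns. As a rough illustration, Python's `fnmatch` approximates git's per-path-component matching (it is not git's exact algorithm, and is used here only to show what the new patterns catch):

```python
from fnmatch import fnmatch

# Patterns added to .gitignore in this commit
patterns = ["cmake-*", ".DS_Store", "._.*"]

def ignored(name: str) -> bool:
    """Return True if a single path component matches any ignore pattern."""
    return any(fnmatch(name, p) for p in patterns)

print(ignored("cmake-build-release"))  # True: caught by cmake-*
print(ignored("._.Trashes"))           # True: caught by ._.*
print(ignored("CMakeLists.txt"))       # False: tracked files stay visible
```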

.nojekyll

Lines changed: 1 addition & 0 deletions (new file containing a single empty line; its presence disables Jekyll processing for GitHub Pages)

CHANGELOG.md (mode changed 100644 → 100755)

Lines changed: 67 additions & 2 deletions

```diff
@@ -1,3 +1,65 @@
+# BlazingSQL 0.19.0 (April 21, 2021)
+
+## New Features
+- #1367 OverlapAccumulator Kernel
+- #1364 Implement the concurrent API (bc.sql with token, bc.status, bc.fetch)
+- #1426 Window Functions without partitioning
+- #1349 Add e2e test for Hive Partitioned Data
+- #1396 Create tables from other RDBMS
+- #1427 Support for CONCAT alias operator
+- #1424 Add get physical plan with explain
+
+## Improvements
+- #1325 Refactored CacheMachine.h and CacheMachine.cpp
+- #1322 Updated and enabled several E2E tests
+- #1333 Fixing build due to cudf update
+- #1344 Removed GPUCacheDataMetadata class
+- #1376 Fixing build due to some strings refactor in cudf, undoing the replace workaround
+- #1430 Updating GCP to >= version
+- #1331 Added flag to enable null e2e testing
+- #1418 Adding support for docker image
+- #1434 Added documentation for C++ and Python in Sphinx
+- #1419 Added concat cache machine timeout
+- #1444 Updating GCP to >= version
+- #1349 Add e2e test for Hive Partitioned Data
+- #1447 Improve getting estimated output num rows
+- #1473 Added Warning to Window Functions
+- #1480 Improve dependencies script
+
+## Bug Fixes
+- #1335 Fixing uninitialized var in orc metadata and handling the parseMetadata exceptions properly
+- #1339 Handling properly the nulls in case conditions with strings
+- #1346 Delete allocated host chunks
+- #1348 Capturing error messages due to exceptions properly
+- #1350 Fixed bug where there are no projects in a bindable table scan
+- #1359 Avoid cuda issues when free pinned memory
+- #1365 Fixed build after sublibs changes on cudf
+- #1369 Updated java path for powerpc build
+- #1371 Fixed e2e settings
+- #1372 Recompute `columns_to_hash` in DistributeAggregationKernel
+- #1375 Fix empty row_group_ids for parquet
+- #1380 Fixed issue with int64 literal values
+- #1379 Remove ProjectRemoveRule
+- #1389 Fix issue when CAST a literal
+- #1387 Skip getting orc metadata for decimal type
+- #1392 Fix substrings with nulls
+- #1398 Fix performance regression
+- #1401 Fix support for minus unary operation
+- #1415 Fixed bug where num_batches was not getting set in BindableTableScan
+- #1413 Fix for null tests 13 and 23 of windowFunctionTest
+- #1416 Fix full join when both tables contains nulls
+- #1423 Fix temporary directory for hive partition test
+- #1351 Fixed 'count distinct' related issues
+- #1425 Fix for new joins API
+- #1400 Fix for Column aliases when exists a Join op
+- #1456 Raising exceptions on Python side for RAL
+- #1466 SQL providers: update README.md
+- #1470 Fix pre compiler flags for sql parsers
+
+
+## Deprecated Features
+- #1394 Disabled support for outer joins with inequalities
+
 # BlazingSQL 0.18.0 (February 24, 2021)
 
 ## New Features
@@ -17,6 +79,10 @@
 - #1284 Initial support for Windows Function
 - #1303 Add support for INITCAP
 - #1313 getting and using ORC metadata
+- #1347 Fixing issue when reading orc metadata from DATE dtype
+- #1338 Window Function support for LEAD and LAG statements
+- #1362 give useful message when file extension is not recognized
+- #1361 Supporting first_value and last_value for Window Function
 
 
 ## Improvements
@@ -38,7 +104,7 @@
 - #1314 Added unit tests to verify that OOM error handling works well
 - #1320 Revamping cache logger
 - #1323 Made progress bar update continuously and stay after query is done
-
+- #1336 Improvements for the cache API
 
 ## Bug Fixes
 - #1249 Fix compilation with cuda 11
@@ -52,7 +118,6 @@
 - #1312 Fix progress bar for jupyterlab
 - #1318 Disabled require acknowledge
 
-
 # BlazingSQL 0.17.0 (December 10, 2020)
 
 ## New Features
```
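Among the 0.19 features, #1364's concurrent API turns `bc.sql` into a submit/poll/fetch workflow. A pseudocode sketch of the pattern follows; the changelog names only `bc.sql` with a token, `bc.status`, and `bc.fetch`, so the token-request keyword and the table setup below are illustrative assumptions, not taken from the BlazingSQL docs:

```python
# Pseudocode sketch of the #1364 concurrent API; `return_token=True` is an
# assumed spelling of the token-request option, and the table is hypothetical.
from blazingsql import BlazingContext

bc = BlazingContext()
bc.create_table("taxi", "taxi.parquet")

token = bc.sql("SELECT COUNT(*) FROM taxi", return_token=True)  # submit without blocking
while not bc.status(token):                                     # poll for completion
    pass
result = bc.fetch(token)                                        # collect the finished result
```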

Dockerfile

Lines changed: 67 additions & 0 deletions (new file)

```dockerfile
ARG CUDA_VER="10.2"
ARG UBUNTU_VERSION="16.04"
FROM nvidia/cuda:${CUDA_VER}-runtime-ubuntu${UBUNTU_VERSION}
LABEL Description="blazingdb/blazingsql is the official BlazingDB environment for BlazingSQL on NIVIDA RAPIDS." Vendor="BlazingSQL" Version="0.4.0"

ARG CUDA_VER=10.2
ARG CONDA_CH="-c blazingsql -c rapidsai -c nvidia"
ARG PYTHON_VERSION="3.7"
ARG RAPIDS_VERSION="0.18"

SHELL ["/bin/bash", "-c"]
ENV PYTHONDONTWRITEBYTECODE=true

RUN apt-get update -qq && \
    apt-get install curl git -yqq --no-install-recommends && \
    apt-get clean -y && \
    curl -s -o /tmp/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    bash /tmp/miniconda.sh -bfp /usr/local/ && \
    rm -rf /tmp/miniconda.sh && \
    conda create --no-default-packages python=${PYTHON_VERSION} -y -n bsql && \
    conda install -y --freeze-installed -n bsql \
        ${CONDA_CH} \
        -c conda-forge -c defaults \
        cugraph=${RAPIDS_VERSION} cuml=${RAPIDS_VERSION} \
        cusignal=${RAPIDS_VERSION} \
        cuspatial=${RAPIDS_VERSION} \
        cuxfilter clx=${RAPIDS_VERSION} \
        python=${PYTHON_VERSION} cudatoolkit=${CUDA_VER} \
        blazingsql=${RAPIDS_VERSION} \
        jupyterlab \
        networkx statsmodels xgboost scikit-learn \
        geoviews seaborn matplotlib holoviews colorcet && \
    conda clean -afy && \
    rm -rf /var/cache/apt /var/lib/apt/lists/* /tmp/miniconda.sh /usr/local/pkgs/* && \
    rm -rf /usr/local/envs/bsql/conda-meta && \
    rm -rf /usr/local/envs/bsql/include && \
    rm /usr/local/envs/bsql/lib/libpython3.7m.so.1.0 && \
    find /usr/local/envs/bsql -name '__pycache__' -type d -exec rm -rf '{}' '+' && \
    find /usr/local/envs/bsql -follow -type f -name '*.pyc' -delete && \
    rm -rf /usr/local/envs/bsql/lib/libasan.so.5.0.0 \
        /usr/local/envs/bsql/lib/libtsan.so.0.0.0 \
        /usr/local/envs/bsql/lib/liblsan.so.0.0.0 \
        /usr/local/envs/bsql/lib/libubsan.so.1.0.0 \
        /usr/local/envs/bsql/bin/x86_64-conda-linux-gnu-ld \
        /usr/local/envs/bsql/bin/sqlite3 \
        /usr/local/envs/bsql/bin/openssl \
        /usr/local/envs/bsql/share/terminfo \
        /usr/local/envs/bsql/bin/postgres \
        /usr/local/envs/bsql/bin/pg_* \
        /usr/local/envs/bsql/man \
        /usr/local/envs/bsql/qml \
        /usr/local/envs/bsql/qsci \
        /usr/local/envs/bsql/mkspecs && \
    find /usr/local/envs/bsql/lib/python3.7/site-packages -name 'tests' -type d -exec rm -rf '{}' '+' && \
    find /usr/local/envs/bsql/lib/python3.7/site-packages -name '*.pyx' -delete && \
    find /usr/local/envs/bsql -name '*.c' -delete && \
    git clone --branch=master https://github.com/BlazingDB/Welcome_to_BlazingSQL_Notebooks /blazingsql && \
    rm -rf /blazingsql/.git && \
    mkdir /.local /.jupyter /.cupy && chmod 777 /.local /.jupyter /.cupy

WORKDIR /blazingsql
COPY run_jupyter.sh /blazingsql

# Jupyter
EXPOSE 8888
CMD ["/blazingsql/run_jupyter.sh"]
```
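The image above can be built and launched along these lines. This is a sketch: the tag name is an arbitrary example, and `--gpus all` assumes the NVIDIA Container Toolkit is installed on the host:

```shell
# Build from the repository root, optionally overriding a build ARG (example tag)
docker build -t blazingsql-local --build-arg CUDA_VER=10.2 .

# JupyterLab listens on 8888 (EXPOSE above); publish it and grant GPU access
docker run --gpus all -p 8888:8888 blazingsql-local
```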

README.md

Lines changed: 32 additions & 22 deletions

````diff
@@ -99,13 +99,14 @@ conda install -c blazingsql -c rapidsai -c nvidia -c conda-forge -c defaults bla
 ```
 
 ## Nightly Version
+For nightly version cuda 11+ are only supported, see https://github.com/rapidsai/cudf#cudagpu-requirements
 ```bash
 conda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=$PYTHON_VERSION cudatoolkit=$CUDA_VERSION
 ```
-Where $CUDA_VERSION is 10.1, 10.2 or 11.0 and $PYTHON_VERSION is 3.7 or 3.8
-*For example for CUDA 10.1 and Python 3.7:*
+Where $CUDA_VERSION is 11.0 or 11.2 and $PYTHON_VERSION is 3.7 or 3.8
+*For example for CUDA 11.2 and Python 3.8:*
 ```bash
-conda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=3.7 cudatoolkit=10.1
+conda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=3.8 cudatoolkit=11.2
 ```
 
 # Build/Install from Source (Conda Environment)
@@ -117,18 +118,14 @@ This is the recommended way of building all of the BlazingSQL components and dep
 ```bash
 conda create -n bsql python=$PYTHON_VERSION
 conda activate bsql
-conda install --yes -c conda-forge spdlog=1.7.0 google-cloud-cpp=1.16 ninja
-conda install --yes -c rapidsai -c nvidia -c conda-forge -c defaults dask-cuda=0.18 dask-cudf=0.18 cudf=0.18 ucx-py=0.18 ucx-proc=*=gpu python=3.7 cudatoolkit=$CUDA_VERSION
-conda install --yes -c conda-forge cmake=3.18 gtest gmock cppzmq cython=0.29 openjdk=8.0 maven jpype1 netifaces pyhive tqdm ipywidgets
+./dependencies.sh 0.19 $CUDA_VERSION
 ```
 Where $CUDA_VERSION is is 10.1, 10.2 or 11.0 and $PYTHON_VERSION is 3.7 or 3.8
 *For example for CUDA 10.1 and Python 3.7:*
 ```bash
 conda create -n bsql python=3.7
 conda activate bsql
-conda install --yes -c conda-forge spdlog=1.7.0 google-cloud-cpp=1.16 ninja
-conda install --yes -c rapidsai -c nvidia -c conda-forge -c defaults dask-cuda=0.18 dask-cudf=0.18 cudf=0.18 ucx-py=0.18 ucx-proc=*=gpu python=3.7 cudatoolkit=10.1
-conda install --yes -c conda-forge cmake=3.18 gtest gmock cppzmq cython=0.29 openjdk=8.0 maven jpype1 netifaces pyhive tqdm ipywidgets
+./dependencies.sh 0.19 10.1
 ```
 
 ### Build
@@ -149,21 +146,18 @@ $CONDA_PREFIX now has a folder for the blazingsql repository.
 ## Nightly Version
 
 ### Install build dependencies
+For nightly version cuda 11+ are only supported, see https://github.com/rapidsai/cudf#cudagpu-requirements
 ```bash
 conda create -n bsql python=$PYTHON_VERSION
 conda activate bsql
-conda install --yes -c conda-forge spdlog=1.7.0 google-cloud-cpp=1.16 ninja
-conda install --yes -c rapidsai-nightly -c nvidia -c conda-forge -c defaults dask-cuda=0.19 dask-cudf=0.19 cudf=0.19 ucx-py=0.19 ucx-proc=*=gpu python=3.7 cudatoolkit=$CUDA_VERSION
-conda install --yes -c conda-forge cmake=3.18 gtest==1.10.0=h0efe328_4 gmock cppzmq cython=0.29 openjdk=8.0 maven jpype1 netifaces pyhive tqdm ipywidgets
+./dependencies.sh 0.20 $CUDA_VERSION nightly
 ```
-Where $CUDA_VERSION is is 10.1, 10.2 or 11.0 and $PYTHON_VERSION is 3.7 or 3.8
-*For example for CUDA 10.1 and Python 3.7:*
+Where $CUDA_VERSION is 11.0 or 11.2 and $PYTHON_VERSION is 3.7 or 3.8
+*For example for CUDA 11.2 and Python 3.8:*
 ```bash
-conda create -n bsql python=3.7
+conda create -n bsql python=3.8
 conda activate bsql
-conda install --yes -c conda-forge spdlog=1.7.0 google-cloud-cpp=1.16 ninja
-conda install --yes -c rapidsai-nightly -c nvidia -c conda-forge -c defaults dask-cuda=0.19 dask-cudf=0.19 cudf=0.19 ucx-py=0.19 ucx-proc=*=gpu python=3.7 cudatoolkit=10.1
-conda install --yes -c conda-forge cmake=3.18 gtest==1.10.0=h0efe328_4 gmock cppzmq cython=0.29 openjdk=8.0 maven jpype1 netifaces pyhive tqdm ipywidgets
+./dependencies.sh 0.20 11.2 nightly
 ```
 
 ### Build
@@ -196,18 +190,34 @@ To build without the storage plugins (AWS S3, Google Cloud Storage) use the next
 ```
 NOTE: By disabling the storage plugins you don't need to install previously AWS SDK C++ or Google Cloud Storage (neither any of its dependencies).
 
+#### SQL providers
+To build without the SQL providers (MySQL, PostgreSQL, SQLite) use the next arguments:
+```bash
+# Disable all SQL providers
+./build.sh disable-mysql disable-sqlite disable-postgresql
+
+# Disable MySQL provider
+./build.sh disable-mysql
+
+...
+```
+NOTES:
+- By disabling the storage plugins you don't need to install mysql-connector-cpp=8.0.23 libpq=13 sqlite=3 (neither any of its dependencies).
+- Currenlty we support only MySQL. but PostgreSQL and SQLite will be ready for the next version!
+
 # Documentation
 User guides and public APIs documentation can be found at [here](https://docs.blazingdb.com/docs)
 
 Our internal code architecture can be built using Spinx.
 ```bash
-pip install recommonmark exhale
 conda install -c conda-forge doxygen
 cd $CONDA_PREFIX
-cd blazingsql/docs
+cd blazingsql/docsrc
+pip install -r requirements.txt
+make doxygen
 make html
 ```
-The generated documentation can be viewed in a browser at `blazingsql/docs/_build/html/index.html`
+The generated documentation can be viewed in a browser at `blazingsql/docsrc/build/html/index.html`
 
 
 # Community
@@ -230,4 +240,4 @@ The RAPIDS suite of open source software libraries aim to enable execution of en
 
 ## Apache Arrow on GPU
 
-The GPU version of [Apache Arrow](https://arrow.apache.org/) is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow are supported.
+The GPU version of [Apache Arrow](https://arrow.apache.org/) is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow are supported.
````