Spark NLP 5.5.0: Launching Llama.cpp Integration, Llama3, QWEN2, Phi-3, StarCoder2, MiniCPM, NLLB, Nomic, Snowflake, MxBai, more ONNX and OpenVino integrations, more than 50,000 new models, and many more! #14417

maziyarpanahi (Maintainer) · 2024-09-25T20:20:27Z

📢 Spark NLP 5.5.0: Unlocking New Horizons with Llama.cpp Integration and More!

We're thrilled to announce the release of Spark NLP 5.5.0, a groundbreaking update that pushes the boundaries of natural language processing! This release is packed with exciting new features, optimizations, and integrations that will transform your NLP workflows. At the heart of this update is our game-changing integration with Llama.cpp, but that's just the beginning of what's in store!

🌟 Spotlight Feature: Llama.cpp Integration

Introducing Llama.cpp Integration: A New Era of Efficient Language Models!

We're proud to present the centerpiece of Spark NLP 5.5.0: the integration of Llama.cpp! This revolutionary addition brings unparalleled efficiency and performance to large language models within the Spark NLP ecosystem.

Optimized Performance: Llama.cpp's C/C++ implementation allows for blazing-fast inference on CPUs, making large language models more accessible than ever.
Reduced Memory Footprint: Enjoy the power of advanced language models with significantly lower RAM requirements.
Quantization Support: Take advantage of various quantization options to further optimize model size and speed without sacrificing quality.
Seamless Integration: Easily incorporate Llama.cpp models into your existing Spark NLP pipelines with our new AutoGGUFModel annotator.

This integration opens up new possibilities for deploying state-of-the-art language models in resource-constrained environments, making advanced NLP capabilities available to a wider range of applications and users.
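As a minimal sketch of the "Seamless Integration" point above, the AutoGGUFModel annotator can be placed after a DocumentAssembler like any other Spark NLP stage. The generation settings, the column names, and the helper names `build_gguf_pipeline` and `demo` are illustrative assumptions, not recommended values. The Spark imports are deferred into the functions so the snippet can be loaded without a Spark installation.

```python
# Illustrative generation settings (assumptions, not tuned recommendations).
GENERATION_SETTINGS = {
    "n_predict": 20,     # maximum number of tokens to generate per input
    "temperature": 0.4,
    "top_k": 40,
    "top_p": 0.9,
}


def build_gguf_pipeline(settings=None):
    """Return an unfitted Pipeline: DocumentAssembler -> AutoGGUFModel."""
    # Lazy imports: these require pyspark and spark-nlp 5.5.0 or later.
    from pyspark.ml import Pipeline
    from sparknlp.annotator import AutoGGUFModel
    from sparknlp.base import DocumentAssembler

    settings = settings or GENERATION_SETTINGS
    document = DocumentAssembler().setInputCol("text").setOutputCol("document")

    # pretrained() without arguments downloads the default GGUF model.
    gguf = (
        AutoGGUFModel.pretrained()
        .setInputCols(["document"])
        .setOutputCol("completions")
        .setNPredict(settings["n_predict"])
        .setTemperature(settings["temperature"])
        .setTopK(settings["top_k"])
        .setTopP(settings["top_p"])
    )
    return Pipeline().setStages([document, gguf])


def demo():
    """Run the pipeline on one prompt (needs a working Spark NLP setup)."""
    import sparknlp

    spark = sparknlp.start()
    data = spark.createDataFrame([["Hello, I am a"]]).toDF("text")
    result = build_gguf_pipeline().fit(data).transform(data)
    result.select("completions.result").show(truncate=False)
```

Calling `demo()` on a machine with Spark NLP installed should print one completion per input row. On GPU clusters the annotator also exposes options such as `setNGpuLayers` to offload layers.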

We extend our heartfelt thanks to all contributors who made this release possible. Your innovative ideas, code contributions, and feedback continue to drive Spark NLP forward. Our Models Hub now contains more than 83,000 free and truly open-source models & pipelines. 🎉

🔥 New Features & Enhancements

Introducing QWEN2Transformer

We have added the QWEN2Transformer annotator, supporting the Qwen-2 model architecture known for its efficiency and performance in various NLP tasks like text generation and summarization.

View Pull Request

Introducing MiniCPM

The MiniCPM annotator is now available, providing support for the MiniCPM model designed for efficient language modeling with smaller parameter sizes without compromising performance.

View Pull Request

Introducing NLLB (No Language Left Behind)

We are excited to include the NLLB annotator, supporting No Language Left Behind models aimed at providing high-quality machine translation capabilities for a wide range of languages, especially low-resource languages.

View Pull Request

Implementing Nomic Embeddings

Introducing support for Nomic Embeddings, which provide robust semantic representations for downstream tasks like clustering and classification.

View Pull Request

Implementing SnowFlake Embeddings

We have added the SnowFlakeEmbeddings annotator, which produces sentence embeddings from the Snowflake Arctic Embed (snowflake-arctic-embed) family of retrieval-oriented text embedding models.

View Pull Request

Introducing CamemBertForZeroShotClassification

The CamemBertForZeroShotClassification annotator is now available, enabling zero-shot classification capabilities using the CamemBERT model, optimized for French language processing.

View Pull Request

Implementing MxBai Embeddings

We have added support for MxBaiEmbeddings, providing embeddings from the MxBai model designed for multilingual text representation.

View Pull Request

ONNX Support for Vision Annotators

We have extended ONNX support to our vision annotators, allowing for optimized and accelerated inference for image-related NLP tasks.

View Pull Request

OpenVINO and ONNX Support for Additional Annotators

Building upon our commitment to performance optimization, we have added OpenVINO and ONNX support to several additional annotators, ensuring you can leverage hardware acceleration across a broader range of models.

View Pull Request

Introducing AlbertForZeroShotClassification

We are excited to introduce the AlbertForZeroShotClassification annotator, bringing zero-shot classification capabilities using the ALBERT model known for its parameter efficiency and strong performance.

View Pull Request

Introducing Phi-3

We have integrated Phi-3 models into Spark NLP, providing enhanced performance with high-efficiency quantization, supporting INT4 and INT8 quantization for CPUs via OpenVINO.

View Pull Request
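To give a rough sense of why INT8 and INT4 quantization matter on CPUs, the memory needed for the weights alone can be estimated as parameters × bits per weight ÷ 8. The sketch below assumes the 3.8B-parameter Phi-3-mini variant and ignores activations, the KV cache, and per-block quantization metadata, so real usage will be somewhat higher. The function name is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: Phi-3-mini, which has about 3.8 billion parameters.
PHI3_MINI_PARAMS = 3.8e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: {weight_memory_gb(PHI3_MINI_PARAMS, bits):.1f} GB")
# FP16: 7.6 GB
# INT8: 3.8 GB
# INT4: 1.9 GB
```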

Introducing StarCoder2 for Causal Language Modeling

The StarCoder2 model is now supported for causal language modeling tasks, enabling advanced code generation and understanding capabilities.

View Pull Request

Introducing LLAMA 3

Continuing our support for the latest in language modeling, we have introduced support for LLAMA 3, bringing the latest advancements in the LLaMA model series to Spark NLP.

View Pull Request

🐛 Bug Fixes

OpenVINO Installation Instructions: Updated the installation instructions for OpenVINO to ensure a smoother setup process.

View Pull Request

Fixed Default Auto GGUF Pretrained Model: Addressed issues with the default auto GGUF pretrained model in the Llama.cpp integration.

View Pull Requests, View Pull Request

Updated Models Hub: Improved the Models Hub for better accessibility and search functionality.

View Pull Requests, View Pull Request, View Pull Request

Artifact Creation Optimization: Switched to using 7zip instead of vimtor/action-zip for creating artifacts to enhance compatibility and performance.

View Pull Request

📦 Dependencies

Published New OpenVINO Artifacts: Built and published new OpenVINO artifacts for both CPU and GPU to enhance performance and compatibility.
Upgraded ONNX Runtime: Updated onnxruntime to the latest version for improved stability and performance on both CPU and GPU.

📝 Models

We have added more than 50,000 new models and pipelines. The complete list of all 83,000+ models & pipelines in 230+ languages is available on our Models Hub.

❤️ Community support

Slack: For live discussion with the Spark NLP community and the team
GitHub: Bug reports, feature requests, and contributions
Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
Medium: Spark NLP articles
JohnSnowLabs: official Medium
YouTube: Spark NLP video tutorials

Installation

Python

# PyPI

pip install spark-nlp==5.5.0

Spark Packages

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.0

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:5.5.0

GPU

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.5.0

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.5.0

Apple Silicon (M1 & M2)

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.5.0

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-silicon_2.12:5.5.0

AArch64

spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.5.0

pyspark --packages com.johnsnowlabs.nlp:spark-nlp-aarch64_2.12:5.5.0
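The same package coordinates can also be passed programmatically through `spark.jars.packages` when building a session from Python. The sketch below is one way to do that. The helper names `maven_coordinate` and `start_session` and the flavor keys are hypothetical; the coordinates come from the commands above, and the Kryo settings follow commonly documented Spark NLP session settings.

```python
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"

# Flavor keys are hypothetical labels for the artifacts listed above.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def maven_coordinate(flavor: str = "cpu", version: str = "5.5.0") -> str:
    """Build the group:artifact_scala:version string used by --packages."""
    return f"{GROUP_ID}:{ARTIFACTS[flavor]}_{SCALA_VERSION}:{version}"


def start_session(flavor: str = "cpu"):
    """Create a SparkSession that pulls in the chosen Spark NLP artifact."""
    # Lazy import so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName("Spark NLP")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.kryoserializer.buffer.max", "2000M")
        .config("spark.jars.packages", maven_coordinate(flavor))
        .getOrCreate()
    )


print(maven_coordinate("gpu"))
# com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:5.5.0
```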

Maven

spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>5.5.0</version>
</dependency>

spark-nlp-gpu:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>5.5.0</version>
</dependency>

spark-nlp-silicon:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-silicon_2.12</artifactId>
    <version>5.5.0</version>
</dependency>

spark-nlp-aarch64:

<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-aarch64_2.12</artifactId>
    <version>5.5.0</version>
</dependency>

FAT JARs

CPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-assembly-5.5.0.jar
GPU on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-gpu-assembly-5.5.0.jar
M1 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-silicon-assembly-5.5.0.jar
AArch64 on Apache Spark 3.0.x/3.1.x/3.2.x/3.3.x/3.4.x: https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/jars/spark-nlp-aarch64-assembly-5.5.0.jar

What's Changed

SparkNLP 997 Introducing QWEN2Transformer by @prabod in #14188
SparkNLP 1004 - Introducing MiniCPM by @prabod in #14205
SparkNLP 1018 - Introducing NLLB by @prabod in #14209
SparkNLP 1005 implement nomic embeddings by @prabod in #14217
implementing SnowFlake by @ahmedlone127 in #14353
Introducing CamemBertForZeroShotClassification annotator by @danilojsl in #14354
Implementing Mxbai Embeddings by @ahmedlone127 in #14355
Introducing onnx support to vision annotators by @ahmedlone127 in #14356
Introducing onnx and OpenVino support to Missing Annotators by @ahmedlone127 in #14359
[SPARKNLP-855] Introducing AlbertForZeroShotClassification by @danilojsl in #14361
SparkNLP introducing Phi-3 by @prabod in #14373
OpenVINO install instructions by @DevinTDHa in #14382
SPARKNLP 1034 implement starcoder2 for causal lm by @prabod in #14358
SPARKNLP Introducing LLAMA 3 by @prabod in #14379
550 rc export notebooks by @prabod in #14393
[SPARKNLP-1027] llama.cpp integration by @DevinTDHa in #14364
Models hub by @maziyarpanahi in #14383
Update create_search_index.yml by @pabla in #14406
Adding openvino support to missing annotators by @ahmedlone127 in #14390
[SPARKNLP-1027] Change Default AutoGGUF pretrained model by @DevinTDHa in #14411
[SPARKNLP-1027] Fix issue with pretrained model by @DevinTDHa in #14413
Use 7zip instead of vimtor/action-zip for creating artifacts by @pabla in #14414
Models hub by @maziyarpanahi in #14416
release/550-release-candidate by @maziyarpanahi in #14389

Full Changelog: 5.4.2...5.5.0

This discussion was created from the release Spark NLP 5.5.0: Launching Llama.cpp Integration, Llama3, QWEN2, Phi-3, StarCoder2, MiniCPM, NLLB, Nomic, Snowflake, MxBai, more ONNX and OpenVino integrations, more than 50,000 new models, and many more!.

jonathanapp · 2024-10-24T15:42:17Z

Has LLM support or Phi-3 in particular been tested on a non-trivially sized number of rows, or for multi-node clusters?

I ask because in initial evaluation (following this example), it works for 100 or maybe 1000 rows, but never for more.

For example, for 10K rows (which is a tiny Spark dataframe!), it spins for an hour with no result. I see no cluster upsize events (this is on Databricks).

maziyarpanahi Oct 24, 2024
Maintainer Author

Which Phi-3 feature are you referring to: the ONNX/OpenVINO integration or the GGUF integration?

As for the specs, I have personally tested the AutoGGUFModel, which uses Phi-3 by default, on both CPU and GPU clusters. The number of rows should not be a problem in itself: rows are processed in parallel across partitions, so whether you’re dealing with 10, 100, or 10 million rows, throughput should scale with the cluster, as long as the cluster is sized and configured for this kind of workload.

I recommend monitoring the Spark UI closely and being mindful of potential data skew. Some inputs may be disproportionately long or short, causing parts of the cluster to remain idle. Additionally, ensure data is sufficiently partitioned to allow proper scaling.
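As a rough illustration of the partitioning and skew advice above, the sketch below picks a partition count that keeps every core busy while capping the rows per partition, and summarizes input lengths to spot skew. The helper names, the `text` column, and the 500-row cap are assumptions for illustration only. Suitable values depend on the model, the hardware, and the maximum number of generated tokens.

```python
import math


def suggest_partitions(n_rows, total_cores, max_rows_per_partition=500):
    """Partition count with at least one task per core and bounded task size."""
    by_size = math.ceil(n_rows / max_rows_per_partition)
    return max(total_cores, by_size)


def repartition_for_inference(df, total_cores, max_rows_per_partition=500):
    """Round-robin repartition a DataFrame before the LLM stage runs."""
    n_partitions = suggest_partitions(df.count(), total_cores, max_rows_per_partition)
    return df.repartition(n_partitions)


def input_length_summary(df, text_col="text"):
    """Summarize input lengths; a max far above the median hints at skew."""
    # Lazy import so the pure-Python helper above works without pyspark.
    from pyspark.sql import functions as F

    lengths = df.select(F.length(F.col(text_col)).alias("n_chars"))
    return lengths.summary("min", "50%", "95%", "max")


print(suggest_partitions(10_000, total_cores=16))  # 20
print(suggest_partitions(100, total_cores=16))     # 16
```

A DataFrame that arrives in only one or two partitions would keep most executors idle no matter how large the cluster is, which is one way a 10K-row job can appear to hang.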

There are many factors to consider, especially with compute-intensive LLMs. If the model is configured with a large maximum number of output tokens, processing times can increase significantly.

In summary, the GGUF integration has been thoroughly tested at scale. If you encounter any issues, please open a new ticket and provide the necessary details to help us reproduce the problem.

jonathanapp Oct 24, 2024

GGUF.

Thanks for the detailed response. This suggests the problem lies in our (a) partitioning or (b) cluster sizing. Glad to hear it's performant at scale! Will report back if I find reproducible issues.
