
Johnsnowlabs 5.1.8 #765

Merged: 14 commits, Nov 17, 2023
4 changes: 4 additions & 0 deletions docs/_data/navigation.yml
@@ -330,6 +330,10 @@ jsl:
url: /docs/en/jsl/aws-emr-utils
- title: Utilities for AWS Glue
url: /docs/en/jsl/aws-glue-utils
- title: Utilities for Haystack
url: /docs/en/jsl/haystack-utils
- title: Utilities for Langchain
url: /docs/en/jsl/langchain-utils
- title: Release Testing Utilities
url: /docs/en/jsl/testing-utils
- title: Module Structure
70 changes: 38 additions & 32 deletions docs/en/jsl/databricks_utils.md

Large diffs are not rendered by default.

66 changes: 66 additions & 0 deletions docs/en/jsl/haystack_utils.md
@@ -0,0 +1,66 @@
---
layout: docs
seotitle: NLP | John Snow Labs
title: Utilities for Haystack
permalink: /docs/en/jsl/haystack-utils
key: docs-install
modify_date: "2020-05-26"
header: true
show_nav: true
sidebar:
nav: jsl
---

<div class="main-docs" markdown="1">


Johnsnowlabs provides the following nodes, which can be used inside the [Haystack Framework](https://haystack.deepset.ai/) for scalable pre-processing and embedding on
[Spark clusters](https://spark.apache.org/). With them you can build easily scalable, production-grade LLM and RAG applications.
See the [Haystack with Johnsnowlabs Tutorial Notebook](https://github.com/JohnSnowLabs/johnsnowlabs/blob/release/master/notebooks/haystack_with_johnsnowlabs.ipynb).

## JohnSnowLabsHaystackProcessor
Pre-process your documents in a scalable fashion in Haystack,
based on [Spark-NLP's DocumentCharacterTextSplitter](https://sparknlp.org/docs/en/annotators#documentcharactertextsplitter). It supports all of its [parameters](https://sparknlp.org/api/python/reference/autosummary/sparknlp/annotator/document_character_text_splitter/index.html#sparknlp.annotator.document_character_text_splitter.DocumentCharacterTextSplitter).

```python
# Create a Pre-Processor that is connected to the Spark cluster
from johnsnowlabs.llm import embedding_retrieval
processor = embedding_retrieval.JohnSnowLabsHaystackProcessor(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)
# Process documents, distributed on the Spark cluster
processor.process(some_documents)
```
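
For reference, a minimal sketch of what `some_documents` can look like, assuming Haystack v1.x, where `Document` is importable from the top-level package:

```python
from haystack import Document

# Hypothetical input; any list of Haystack Documents works here
some_documents = [
    Document(content="Spark NLP splits long texts into smaller chunks."),
    Document(content="Each chunk can later be embedded and retrieved."),
]

# The processor returns one (smaller) Document per split chunk
split_docs = processor.process(some_documents)
```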

## JohnSnowLabsHaystackEmbedder
Scalable embedding computation with [any Sentence Embedding](https://nlp.johnsnowlabs.com/models?task=Embeddings) from John Snow Labs in Haystack.
You must provide the **NLU reference** of a sentence embeddings model to load it.
If you want to use a GPU with the embedding model, set `use_gpu=True`; on localhost this starts a Spark session with GPU jars.
For clusters, you must set up the cluster environment correctly; using [nlp.install_to_databricks()](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#into-a-freshly-created-databricks-cluster-automatically) is recommended.

```python
from johnsnowlabs.llm import embedding_retrieval
from haystack.document_stores import InMemoryDocumentStore

# Write some processed data to the Doc store so we can retrieve it later
document_store = InMemoryDocumentStore(embedding_dim=512)
document_store.write_documents(some_documents)

# Create Embedder which is connected to the Spark cluster
retriever = embedding_retrieval.JohnSnowLabsHaystackEmbedder(
    embedding_model='en.embed_sentence.bert_base_uncased',
    document_store=document_store,
    use_gpu=False,
)

# Compute Embeddings distributed in a cluster
document_store.update_embeddings(retriever)

```
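
Once embeddings are stored, retrieval goes through the usual Haystack retriever interface; a sketch, assuming `JohnSnowLabsHaystackEmbedder` exposes Haystack's standard `retrieve` method:

```python
# Retrieve the most similar documents for a query
# (assuming the standard Haystack EmbeddingRetriever interface)
results = retriever.retrieve(query="What does the document say?", top_k=3)
for doc in results:
    print(doc.content)
```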
</div>
6 changes: 5 additions & 1 deletion docs/en/jsl/install_advanced.md
@@ -177,8 +177,12 @@ Where to find your Databricks Access Token:
You can set the following parameters on the `nlp.install()` function to define properties of the cluster which will be created.
See [Databricks Cluster Creation](https://docs.databricks.com/dev-tools/api/latest/clusters.html#create) for a detailed description of all parameters.

You can use the `extra_pip_installs` parameter to install a list of additional PyPI libraries on the cluster.
Just set `nlp.install_to_databricks(extra_pip_installs=['langchain','farm-haystack==1.2.3'])` to install the libraries.

| Cluster creation Parameter | Default Value |
|----------------------------|--------------------------------------------|
| extra_pip_installs | `None` |
| block_till_cluster_ready | `True` |
| num_workers | `1` |
| cluster_name | `John-Snow-Labs-Databricks-Auto-Cluster🚀` |
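
Putting a few of these together, a hypothetical cluster-creation call might look like the following sketch (parameter names taken from the table above; host and token are placeholders):

```python
from johnsnowlabs import nlp

# Hypothetical example combining the table's parameters; values are placeholders
nlp.install_to_databricks(
    databricks_host='https://your_host.cloud.databricks.com',
    databricks_token='dbapi_token123',
    num_workers=1,
    cluster_name='my-jsl-cluster',
    extra_pip_installs=['langchain', 'farm-haystack==1.2.3'],
    block_till_cluster_ready=True,
)
```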
@@ -390,7 +394,7 @@ You can get it from:

``` python
# Create a new Cluster with Spark NLP and all licensed libraries ready to go:
nlp.install_to_databricks(databricks_host='https://your_host.cloud.databricks.com', databricks_token='dbapi_token123')
```
</div><div class="h3-box" markdown="1">

20 changes: 19 additions & 1 deletion docs/en/jsl/jsl_release_notes.md
@@ -13,7 +13,25 @@

<div class="main-docs" markdown="1">

See [Github Releases](https://github.com/JohnSnowLabs/johnsnowlabs/releases) for detailed information on Release History and Features.


## 5.1.8
Release date: 17-11-2023

The John Snow Labs 5.1.8 library was released with the following pre-installed and recommended dependencies:


| Library | Version |
|-----------------------------------------------------------------------------------------|---------|
| [Visual NLP](https://nlp.johnsnowlabs.com/docs/en/spark_ocr_versions/ocr_release_notes) | `5.0.2` |
| [Enterprise NLP](https://nlp.johnsnowlabs.com/docs/en/licensed_annotators) | `5.1.3` |
| [Finance NLP](https://nlp.johnsnowlabs.com/docs/en/financial_release_notes) | `1.X.X` |
| [Legal NLP](https://nlp.johnsnowlabs.com/docs/en/legal_release_notes) | `1.X.X` |
| [NLU](https://github.com/JohnSnowLabs/nlu/releases) | `5.1.0` |
| [Spark-NLP-Display](https://sparknlp.org/docs/en/display) | `4.4` |
| [Spark-NLP](https://github.com/JohnSnowLabs/spark-nlp/releases/) | `5.1.4` |
| [Pyspark](https://spark.apache.org/docs/latest/api/python/) | `3.1.2` |

## 5.1.7
Release date: 19-10-2023
94 changes: 94 additions & 0 deletions docs/en/jsl/langchain_utils.md
@@ -0,0 +1,94 @@
---
layout: docs
seotitle: NLP | John Snow Labs
title: Utilities for Langchain
permalink: /docs/en/jsl/langchain-utils
key: docs-install
modify_date: "2020-05-26"
header: true
show_nav: true
sidebar:
nav: jsl
---

<div class="main-docs" markdown="1">





Johnsnowlabs provides the following components, which can be used inside the [Langchain Framework](https://www.langchain.com/) as Agent Tools and Pipeline components for scalable pre-processing and embedding on
[Spark clusters](https://spark.apache.org/). With them you can build easily scalable, production-grade LLM and RAG applications.
See the [Langchain with Johnsnowlabs Tutorial Notebook](https://github.com/JohnSnowLabs/johnsnowlabs/blob/release/master/notebooks/langchain_with_johnsnowlabs.ipynb).

## JohnSnowLabsLangChainCharSplitter
Pre-process your documents in a scalable fashion in Langchain,
based on [Spark-NLP's DocumentCharacterTextSplitter](https://sparknlp.org/docs/en/annotators#documentcharactertextsplitter). It supports all of its [parameters](https://sparknlp.org/api/python/reference/autosummary/sparknlp/annotator/document_character_text_splitter/index.html#sparknlp.annotator.document_character_text_splitter.DocumentCharacterTextSplitter).

```python
from langchain.document_loaders import TextLoader
from johnsnowlabs.llm import embedding_retrieval

loader = TextLoader('/content/state_of_the_union.txt')
documents = loader.load()

# Create a Pre-Processor that is connected to the Spark cluster
jsl_splitter = embedding_retrieval.JohnSnowLabsLangChainCharSplitter(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)
# Process documents, distributed on the Spark cluster
pre_processed_docs = jsl_splitter.split_documents(documents)

```
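
The result is a list of LangChain `Document` objects, one per chunk; a quick sanity check using the standard LangChain `Document` fields:

```python
# Each chunk is a LangChain Document with page_content and metadata
print(len(pre_processed_docs))
print(pre_processed_docs[0].page_content)
print(pre_processed_docs[0].metadata)
```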

## JohnSnowLabsLangChainEmbedder
Scalable embedding computation with [any Sentence Embedding](https://nlp.johnsnowlabs.com/models?task=Embeddings) from John Snow Labs.
You must provide the **NLU reference** of a sentence embeddings model to load it.
You can start a Spark session by setting `hardware_target` to one of `cpu`, `gpu`, `apple_silicon`, or `aarch` in localhost environments.
For clusters, you must set up the cluster environment correctly; using [nlp.install_to_databricks()](https://nlp.johnsnowlabs.com/docs/en/jsl/install_advanced#into-a-freshly-created-databricks-cluster-automatically) is recommended.

```python
# Create Embedder which is connected to the Spark cluster
from johnsnowlabs.llm import embedding_retrieval
embeddings = embedding_retrieval.JohnSnowLabsLangChainEmbedder('en.embed_sentence.bert_base_uncased', hardware_target='cpu')

# Compute Embeddings distributed
from langchain.vectorstores import FAISS
retriever = FAISS.from_documents(pre_processed_docs, embeddings).as_retriever()

# Create A tool
from langchain.agents.agent_toolkits import create_retriever_tool
tool = create_retriever_tool(
    retriever,
    "search_state_of_union",
    "Searches and returns documents regarding the state-of-the-union.",
)


# Create an LLM Agent that uses the Tool
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(openai_api_key='YOUR_API_KEY')
agent_executor = create_conversational_retrieval_agent(llm, [tool], verbose=True)
result = agent_executor({"input": "what did the president say about going to east of Columbus?"})
result['output']

>>>
> Entering new AgentExecutor chain...
Invoking: `search_state_of_union` with `{'query': 'going to east of Columbus'}`
[Document(page_content='miles east of', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='in America.', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='out of America.', metadata={'source': '/content/state_of_the_union.txt'}), Document(page_content='upside down.', metadata={'source': '/content/state_of_the_union.txt'})]I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
> Finished chain.
I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
```
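
For a lighter-weight check without the agent, the retriever can also be queried directly; a sketch using the standard LangChain retriever API:

```python
# Query the FAISS-backed retriever directly (standard LangChain API)
docs = retriever.get_relevant_documents("east of Columbus")
for doc in docs:
    print(doc.page_content)
```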


</div>
2 changes: 2 additions & 0 deletions johnsnowlabs/__init__.py
@@ -13,6 +13,8 @@
if try_import_lib("sparkocr") and try_import_lib("sparknlp"):
from johnsnowlabs import visual

from johnsnowlabs import llm


def new_version_online():
from .utils.pip_utils import get_latest_lib_version_on_pypi