Why Route0x?
Query routing cannot be fully solved by text classification alone or by semantic similarity alone. Here is why: the query "How can I improve my credit score?" is In-Domain (ID) but Out-of-Scope (OOS) for a banking chatbot, and Out-of-Domain (OOD) OOS for an e-commerce chatbot. A good query router should learn to accept ID In-Scope queries and reject both ID OOS and OOD OOS. The research literature is vast, and it hints that chatbots, or more formally Task-Oriented Dialogue Systems (TODS) and Goal-Oriented Dialogue Systems (GODS), have been grappling with this hard problem for a long time now.
route0x is a production-grade solution that takes a comprehensive approach to query routing, leveraging LLMs while optimizing for cost per query ($/query). 0x your latency, cost and learning curve.
Hitherto, route0x is the only (practically) free, low-latency (no external I/O needed), DIY + improvable, low-shot (only 2 user samples per route/intent needed), production-grade solution, fine-tuned for only 100 steps, that matches or beats contemporary query routing research and solutions. We tested it on 8 different TODS datasets (some are deployed systems) spanning different domains, granularities and complexities, against 2 different research works and 1 library (detailed tables below).
Contributions
- Loss function - a variant of large-scale angular margin loss.
- Unsupervised OOS detection - a novel combination.
- Uncertainty-based fallback mechanism - via nearest neighbours (NN).
- A simple trick to do multi-vector (late-interaction) reranking.
Ablations suggest each of these items contributes to the results.
Check out the highlight reel of empirical evals, dig deeper into the numbers, or get hands-on with the starter notebook.
- Route0x: Getting Started
- Route0x Evals: A Highlight reel
- I want to know how it works
- I want to understand the knobs and tinker with it
- I want to see more usage samples
- I want to see the detailed numbers & reproduce benchmarks
- Features and Roadmap
- Scope, Caveats and Limitations
- Citations
We've disentangled the resource-heavy route building (entails training) from query routing (entails inference):
pip install route0x[build] # (Built with SWEs as target audience)
pip install route0x[route] # (Light-weight without any heavy torch dependencies)
from route0x.route_finder import RouteFinder
query_router = RouteFinder(<your_route0x_model_path>)
route_obj = query_router.find_route(<your-query>)
| Dataset | Domain | Has OOS? | Type | Test Size |
|---|---|---|---|---|
| Curekart | E-commerce | Yes | Real-world TODS queries | 991 |
| Powerplay11 | Online Gaming | Yes | Real-world TODS queries | 983 |
| Sofmattress | Furniture | Yes | Real-world TODS queries | 397 |
| CLINC150 OOS - Banking | Banking | Yes | Expert / Synthetic | 1850 |
| CLINC150 OOS - Credit Cards | Credit Cards | Yes | Expert / Synthetic | 1850 |
| Banking 77 | Banking | Yes | Expert / Synthetic | 4076 |
| ACID | Insurance | No | Real-world TODS queries | 11042 |
| ROSTD | Personal Assistant | Yes | Expert written | 11711 |
Goal of this comparison: To show that Route0x beats SetFit + NA and hosted LLMs on OOS recall while offering best-in-class F1 scores across datasets, with minuscule latency needs.
P50 Latency: The p50 latency numbers in the Amazon paper are unclear on a couple of levels: 1.) they combine latencies of hosted LLMs and local SetFit inference, and for the latter the hardware specs are not mentioned; 2.) they use an average across datasets. Our latency numbers are CPU-based, run on a Mac M1 Pro 16" machine. While we acknowledge this is not directly comparable (the hardware specs could differ), our assertion is: if a dataset-specific route0x p50 latency is lower than a conservative estimate averaged across datasets, it should definitely be lower than the corresponding dataset-specific latency. This is shown just to give a ballpark.
Caveat: Route0x uses SetFit as a base, but not as-is: we tweaked how we use it and added a few more innovations on top (jump to "How it works"). Unlike the SetFit + NA option suggested in the paper, we do not perturb positive samples to create hard negatives, nor do we use any special prompting or CoT for intent/OOS detection. We employ a straightforward low-shot learning regime that needs only 2 samples per route from the real dataset, to simulate real-world query routing users (who might find it hard to offer more samples for each route of interest). We augment those 2 samples into 12 effective samples with LLMs (more on this later), and we train for only 100 steps. The LLMs used in the paper, as far as we can tell, are hosted APIs, which by design suffer high latency due to network I/O and incur $ costs; this makes them infeasible for query routing, which might touch many queries even if reserved for uncertain predictions.
Note: All numbers are based on CPU or MPS/CUDA GPU devices. Numbers can vary slightly based on device and seed. As the paper shows only the best numbers (without the variation usually denoted with ±), we also show the best numbers in this comparison.
Goal of this comparison: To show that Route0x offers robust in-scope accuracy in the presence of adversarial ID OOS and OOD OOS, while having minuscule latency needs.
Caveat: The Route0x base model is all-mpnet-base-v2, yet it is compared against purpose-built architectures like TOD-BERT, which are architected for and trained on TODS-style intent detection, and we still outperform them.
Note: All numbers are based on an MPS GPU device. Numbers can vary slightly based on device and seed. As the paper reports all numbers with uncertainty, we present numbers from 3 runs and denote the variation with ±.
Goal of this comparison: To show that Route0x beats a pure embedding similarity + threshold approach to query routing, while having minuscule latency needs.
Note: All the above numbers are with use_multivec_reranking = False (explore the sections below for more details).
- BGE-BASE - BAAI/bge-base-en-v1.5
- MPNET - sentence-transformers/all-mpnet-base-v2
Data: For synthetic data generation we leverage both open-weight and hosted LLMs. For open LLMs we integrate Ollama, and for hosted LLMs we use the respective providers' libraries.
Training: As the image suggests, we attach multiple heads to the same model: a SetFit model with the default logistic regression head, a calibrated head, a LOF (Local Outlier Factor) and IF (Isolation Forest) head, a FAISS index of nearest neighbours (NNs), and a NumPy compressed file of multi-vector representations of the NNs. The classifier heads take the first pass, the LOF/IF heads help with OOS detection, the NNs help with classifier uncertainty, and the multi-vector representations help rerank candidates.
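To make the LOF idea concrete, here is a minimal, illustrative sketch of an embedding-based outlier head using scikit-learn; this shows only the underlying technique, not route0x's exact implementation (the analogous route0x knob is nn_for_oos_detection):

# Illustrative only: an LOF-style OOS head over sentence embeddings.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import LocalOutlierFactor

encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Fit the outlier head on embeddings of the (augmented) in-scope route samples.
train_queries = ["track my order", "cancel my order", "refund status", "talk to support"]
lof = LocalOutlierFactor(n_neighbors=3, novelty=True)  # novelty=True enables predict() on unseen queries
lof.fit(encoder.encode(train_queries, normalize_embeddings=True))

# predict() returns -1 for outliers (treat as OOS) and 1 for inliers.
is_oos = lof.predict(encoder.encode(["How do I build an alarm clock?"], normalize_embeddings=True))[0] == -1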
So when you run find_route,
route_obj = query_router.find_route(query="How can we build an Alarm clock from scratch?")
the route0x system combines all of its heads and returns a route_obj:
{
 'is_oos': True, # LOF / IF head output
 'query': 'How can we build an Alarm clock from scratch?',
 'route_id': 1,
 'route_name': 'NO_NODES_DETECTED', # default label for OOS intents
 'prob': 0.56, # model confidence
 'majority_voted_route': 'NO_NODES_DETECTED', # majority-voted NN
 'mean_distance_from_majority_route': 1.1501572
}
The default flow is controlled by return_raw_scores; when it is False, the flow is:
route_obj = query_router.find_route(<your_query>, return_raw_scores=False)
prob = route_obj['prob']  # model confidence
if route_obj['is_oos']:
    # Outlier head flags OOS; trust it only when the classifier is unsure.
    if prob < model_confidence_threshold_for_using_outlier_head:
        predicted_route = oos_label  # e.g. 'NO_NODES_DETECTED'
else:
    # In-scope, but the classifier is uncertain: fall back to the NN majority vote.
    if prob <= model_uncertainity_threshold_for_using_nn:
        predicted_route = route_obj["majority_voted_route"]
You can add your own fallback flow in 2 ways:
a.) Add your own thresholds:
query_router = RouteFinder(<your_route0x_model_path>,
                           use_calibrated_head=True,
                           model_confidence_threshold_for_using_outlier_head=<your_value>,
                           model_uncertainity_threshold_for_using_nn=<your_value>)
b.) Add your custom logic: set return_raw_scores = True and do post-hoc routing, as sketched below. If you are wondering why this might be useful: there are a few datasets where outliers are not a concern, so you can entirely ignore the LOF/IF head's output and use only the classifier prediction + uncertainty routing based on NNs.
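A minimal sketch of such a post-hoc flow (the threshold value is illustrative; the route_obj fields are the ones shown above):

# Hedged sketch: ignore the LOF/IF head entirely and route purely on
# classifier confidence + NN majority vote. Threshold is illustrative.
route_obj = query_router.find_route("How can I improve my credit score?",
                                    return_raw_scores=True)

UNCERTAINTY_THRESHOLD = 0.5  # tune per dataset, e.g. via confidence_trend.png
if route_obj["prob"] > UNCERTAINTY_THRESHOLD:
    predicted_route = route_obj["route_name"]            # trust the classifier head
else:
    predicted_route = route_obj["majority_voted_route"]  # fall back to the NN vote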
How can you come up with sensible values for model_confidence_threshold_for_using_outlier_head and model_uncertainity_threshold_for_using_nn for your dataset/domain?
We have added an experimental feature that offers a confidence trend (confidence_trend.png in your route0x model folder), which can give you a good idea of what these values should be.
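If you prefer to derive a starting point programmatically, here is an illustrative sketch (not a built-in route0x API) that picks a threshold at the knee of the sorted confidence curve, echoing the experimental Elbow/Knee locator item on the roadmap; raw_route_objs is a hypothetical list of find_route outputs on held-out queries:

# Illustrative only: knee-based threshold pick; requires `pip install kneed`.
import numpy as np
from kneed import KneeLocator

confidences = np.sort([r['prob'] for r in raw_route_objs])[::-1]  # descending confidence curve
x = np.arange(len(confidences))
knee = KneeLocator(x, confidences, curve="convex", direction="decreasing").knee
threshold = confidences[knee] if knee is not None else 0.9  # fall back to the recommended default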
What is use_multivec_reranking and how does it work?
You can read more on this idea here, in this awesome work by Kacper Łukawski. We took the basic idea and repurposed it. It can lift your overall performance for a small tradeoff in latency.
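At its core, late interaction scores a query against a candidate by comparing token-level embeddings rather than one pooled vector. A minimal sketch of the MaxSim scoring idea (illustrative, not route0x's exact code; the candidate token embeddings would come from the stored token_embeddings.npz):

import numpy as np

def maxsim_score(query_tokens: np.ndarray, cand_tokens: np.ndarray) -> float:
    # query_tokens: (n_q, dim), cand_tokens: (n_c, dim), both L2-normalised.
    sims = query_tokens @ cand_tokens.T   # token-level cosine similarities, shape (n_q, n_c)
    return float(sims.max(axis=1).sum())  # best candidate token per query token, summed

# Candidates from the classifier/NN pass are then re-ordered by maxsim_score.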
image courtesy: qdrant website
Take all these numbers as sensible defaults which (surprisingly) work well across datasets, but be sure to experiment for your use-case, as model training is a subjective and nuanced endeavour. Small tweaks can make or break performance for your use-case.
'loss_funct_name': 'PairwiseArcFaceFocalLoss', # If you need a large angular margin + scale between your routes.
'llm_name': 'llama3.1', # Use local or hosted LLMs. For the hosted option, use provider-given model names as-is.
'min_samples': 12, # Recommended.
'samples_per_route': 30, # Recommended; trying 50 can help in some cases.
'expected_oos_proportion': 0.1, # Unless you have extra domain knowledge on OOS, don't touch this. If you don't expect any OOS, use 0.01.
'nn_for_oos_detection': 10, # Unless you have extra domain knowledge on OOS, don't touch this. If you don't expect any OOS, use 5.
'model_name': 'sentence-transformers/all-mpnet-base-v2', # Recommended for English use-cases.
'add_additional_invalid_routes': False, # Enable to stop specific intents like chitchat/profanity.
"invalid_routes": ["gibberish", "mojibake", "chitchat", "non-english", "profanity"],
'instruct_llm': '', # To send additional instructions for better synthetic data gen.
'use_calibrated_head': False, # Default is False; try both.
'return_raw_scores': False, # Default is False; try True for using a custom fallback flow.
'use_multivec_reranking': False, # Default is False; try True to squeeze extra fallback performance for a tiny extra latency.
'max_length': 24, # If use_multivec_reranking = False, 24 is good; else use a longer value based on your median query token size.
'model_confidence_threshold_for_using_outlier_head': 0.9, # Recommended (see "How it works" for more).
'model_uncertainity_threshold_for_using_nn': 0.5, # Recommended (see "How it works" for more).
{
'train_path': None,
'eval_path': None,
'model_name': 'sentence-transformers/all-mpnet-base-v2',
'epochs': 1,
'max_steps': 100,
'batch_size': 16,
'output_dir': 'output',
'seed': 1234,
'lr': 2e-05,
'loss_funct_name': 'PairwiseArcFaceFocalLoss',
'warmup_proportion': 0.05,
'min_samples': 12,
'samples_per_route': 50,
'routes': None,
'route_template': '{}',
'domain': 'personal assistant',
'llm_name': 'llama3.1',
'instruct_llm': '',
'enable_id_oos_gen': True,
'enable_synth_data_gen': True,
'max_query_len': 24,
'hf_dataset': None,
'label_column': 'label',
'text_column': 'text',
'oos_label': 'NO_NODES_DETECTED',
'add_additional_invalid_routes': False,
'invalid_routes': ['gibberish',
'mojibake',
'chitchat',
'non-english',
'profanity'],
'expected_oos_proportion': 0.1,
'nn_for_oos_detection': 10,
'skip_eval': False,
'only_oos_head': False,
'add_typo_robustness': False,
'fallback_samples_per_route': 30,
'device': 'cuda:0',
'do_quantise': False,
'local_llm_host': 'http://localhost:11434',
'HOSTED_LLM_FAMILIES': {'openai': ['gpt'],
'anthropic': ['claude'],
'google': ['gemini']}
}
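These build-time defaults feed the route0x[build] side. As a hedged illustration only (the class, method and argument names below are assumptions, not confirmed API; consult the Getting Started notebook for the exact entry point), a build might look like:

# Assumption: illustrative builder API; actual route0x[build] names may differ.
from route0x.route_builder import RouteBuilder

builder = RouteBuilder(
    routes=["order_status", "refund_request", "product_info"],  # your routes/intents (2 real samples each)
    domain="e-commerce",      # steers synthetic data generation
    llm_name="llama3.1",      # local via Ollama, or a hosted provider model name
    samples_per_route=50,
    max_steps=100,
    output_dir="output",
)
builder.build()  # trains the heads and writes the route0x model folder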
{
'pooling_strategy': 'mean',
'labels_dict': {},
'metadata_dict':{},
'multi_vec_embs': '/content/route0x_sof_model/token_embeddings.npz',
'oos_label': 'NO_NODES_DETECTED',
'max_length': 24,
'model_confidence_threshold_for_using_outlier_head': 0.9,
'model_uncertainity_threshold_for_using_nn': 0.5,
'nn_for_fallback': 5
}
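These inference-time knobs can be overridden when constructing the RouteFinder, following the earlier threshold example (a hedged sketch; the accepted kwargs may vary):

from route0x.route_finder import RouteFinder

# Hedged sketch: overriding inference-time knobs at load time.
query_router = RouteFinder(<your_route0x_model_path>,
                           use_multivec_reranking=True,  # tiny extra latency for extra fallback performance
                           max_length=64)                # go longer than the default 24 when reranking

route_obj = query_router.find_route("Where is my order?")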
- Integrate distilabel?
- Integrate DSPy or AdalFlow for streamlining LLM prompt integrations?
- Identify the best non-English base model and test on a few datasets (should be straightforward).
- Implement typo robustness for queries.
- Fill in other LLM providers.
- Experimental
- Identifying confidence and uncertainty thresholds for the fallback mechanism using an Elbow/Knee locator.
- Generating a test set to quickly check the route0x outputs.
On low-shot regime and domains
- The low-shot regime (+ 100 steps) is demonstrated to work well for closed-domain TODS/GODS systems at any granularity.
- But this WON'T work for open-domain datasets like ORCAS-I or similar web queries. Even upgrading from low-shot to few-shot won't help.
- For open-domain datasets you need to switch to vanilla supervised fine-tuning.
- Curious souls can compare ORCAS-I FS performance (100 samples/route) on Route0x vs Semantic Router in the benchmarks folder.
On LLMs
- For local LLMs, tested only on llama3.x.
- For hosted LLMs, tested only on OpenAI GPT-x.
- While theoretically any LLM should work, prompt faithfulness needs to be verified.
- Will be added shortly