Stars
ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
Input Method Engine (IME) for Mac OS X with built-in support for all Indic Languages
Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face.
Open Source Speech Inferencing Library for Indic Languages
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
State-of-the-Art Text Embeddings
Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.
Unsupervised text tokenizer for Neural Network-based text generation.
General technology for enabling AI capabilities w/ LLMs and MLLMs
High accuracy RAG for answering questions from scientific documents with citations
🦜🔗 Build context-aware reasoning applications 🦜🔗
A library for efficient similarity search and clustering of dense vectors.
An autoregressive character-level language model for making more things
Code for the paper: "Large Language Models as Corporate Lobbyists" (2023).
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…
🦜🔗 Build context-aware reasoning applications
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
PyTorch package for the discrete VAE used for DALL·E.
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
This repo includes ChatGPT prompt curation to use ChatGPT and other LLM tools better.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Kex is a Python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public datasets.