Releases: databricks/lilac
v0.1.1
Overview
- Embedding computation can now be larger-than-RAM! Computing lots of embeddings will iteratively write to a vector store.
- JSON and CSV sources are heavily optimized and go through duckdb for parsing.
- Clustering now supports semantic clustering with embeddings, using DBSCAN.
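The larger-than-RAM behaviour in the first bullet comes down to streaming: embeddings are computed in batches, and each batch is flushed to the store before the next one is produced. The following is a minimal sketch of that pattern in plain Python, not Lilac's implementation. `fake_embed`, `ListVectorStore`, and `write_embeddings_iteratively` are hypothetical names invented for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a real embedding model."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items without materializing the input."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


class ListVectorStore:
    """Hypothetical store; a real one would append to disk, not to a list."""

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []
        self.num_writes = 0

    def add(self, vectors: List[List[float]]) -> None:
        self.vectors.extend(vectors)
        self.num_writes += 1


def write_embeddings_iteratively(
    texts: Iterable[str], store: ListVectorStore, batch_size: int = 2
) -> int:
    """Embed one batch at a time and write it out before reading the next."""
    total = 0
    for batch in batched(texts, batch_size):
        store.add([fake_embed(text) for text in batch])
        total += len(batch)
    return total


store = ListVectorStore()
docs = (f"document {i}" for i in range(5))  # a generator, never held in full
written = write_embeddings_iteratively(docs, store, batch_size=2)
print(written, store.num_writes)  # prints "5 3"
```

Only one batch of inputs and one batch of vectors is held by the loop at any time, so peak memory depends on `batch_size` rather than on the dataset size.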
New features
- Add SQLite source and optimize the JSON and CSV sources by @dsmilkov in https://github.com/lilacai/lilac/pull/710
- Add a dict source and convert LangSmith source to use it by @dsmilkov in https://github.com/lilacai/lilac/pull/716
- Add clustering signal by @dsmilkov in https://github.com/lilacai/lilac/pull/711
Performance
- Use iterables for compute_signal and compute_embedding. by @nsthorat in https://github.com/lilacai/lilac/pull/706
- Write embeddings to the vector store iteratively by @nsthorat in https://github.com/lilacai/lilac/pull/709
- Add SQLite source and optimize the JSON and CSV sources by @dsmilkov in https://github.com/lilacai/lilac/pull/710
- Speed up the docker image build step by installing lilac from pip before installing the local wheel. by @nsthorat in https://github.com/lilacai/lilac/pull/714
- Improve perf of server by removing UUID sort by @dsmilkov in https://github.com/lilacai/lilac/pull/715
Bug fixes
- Fix semantic search on repeated by @dsmilkov in https://github.com/lilacai/lilac/pull/704
- Fix syntax error with keyword search by @dsmilkov in https://github.com/lilacai/lilac/pull/705
- Fix bug with span highlighting a repeated field by @nsthorat in https://github.com/lilacai/lilac/pull/713
- Change the bootup load to be during the new FastAPI lifecycle API. by @nsthorat in https://github.com/lilacai/lilac/pull/717
Full Changelog: https://github.com/lilacai/lilac/compare/v0.1.0...v0.1.1
v0.1.0
New Features
Lilac now supports labeling! For a detailed guide, see "Labeling a dataset".
Labels can be added for individual rows:
dataset.add_labels(
    'good',
    row_ids=['0003076800f1471f8f4c8a1b2deda742'])
Or for slices of the data:
dataset.add_labels(
    'short',
    filters=[
        (('text', 'text_statistics', 'num_characters'), 'less', 1000)
    ]
)
They can then be exported:
short_rows = list(
    dataset.select_rows(
        ['*', 'short'],
        filters=[
            (('short', 'label'), 'exists')
        ]
    )
)
# Print the first row.
print(short_rows[0])
Output:
{
    '__rowid__': '0003076800f1471f8f4c8a1b2deda742',
    'text': 'If you want to truly experience the magic (?) of Don Dohler, then check out "Alien Factor" or maybe "Fiend", but not this. Alien Factor is actually rather imaginative considering the low budget and it\'s fairly creepy, but "Nightbeast", which I guess is sort of an updating of Alien Factor, is just plain dumb. Actors sleepwalk through their roles, especially Mr. Monotone sheriff, and the monster is some dumb Halloween-mask kind of thing instead of the wildly imaginative (but kind of stupid) looking critters from Alien Factor. A spaceship crashes on Earth and there\'s a critter inside, of course, who runs around vaporizing people. And ripping off arms, etc. And he has a cool ray gun that he uses to vaporize people too, until it gets shot out of his hand. And that\'s really about it. "Alien Factor" beats this mess hands down, if you really want to see a good Don Dohler movie, check that out instead. And RIP Don Dohler, 12/2/06.',
    'label': 'neg',
    '__hfsplit__': 'test',
    'good': {
        'label': 'true',
        'created': datetime.datetime(2023, 9, 20, 10, 16, 15, 545277)
    }
}
Labels can also be added via the UI.
What's changed
- Make '.' the default project. by @nsthorat in https://github.com/lilacai/lilac/pull/701
Bug fixes
- Allow add_labels and remove_labels without selection by @dsmilkov in https://github.com/lilacai/lilac/pull/698
- Fix UI regression and empty lilac.yml (no datasets) by @dsmilkov in https://github.com/lilacai/lilac/pull/700
Full Changelog: https://github.com/lilacai/lilac/compare/v0.0.20...v0.1.0
v0.0.20
Features
- Add "More like this" button in the item viewer by @dsmilkov in https://github.com/lilacai/lilac/pull/676
- Add simple labeling functionality in the item viewer by @dsmilkov in https://github.com/lilacai/lilac/pull/679
- Add removing labels, and add row_ids to add labels. by @nsthorat in https://github.com/lilacai/lilac/pull/680
- Improving the label download by @dsmilkov in https://github.com/lilacai/lilac/pull/682
- Expose LangSmithSource to the public API and docs by @dsmilkov in https://github.com/lilacai/lilac/pull/684
- Add UI to clear labels. by @nsthorat in https://github.com/lilacai/lilac/pull/686
- Add a 'label all' button to label all results in view by @nsthorat in https://github.com/lilacai/lilac/pull/687
- Add docs for labeling. Fix some labeling issues. by @nsthorat in https://github.com/lilacai/lilac/pull/692
Bug fixes
- Tiny CSS fixes to make mobile not terrible by @nsthorat in https://github.com/lilacai/lilac/pull/677
- Fix REST API with new labels API. by @nsthorat in https://github.com/lilacai/lilac/pull/681
- Fix issue with overflow on text by @nsthorat in https://github.com/lilacai/lilac/pull/683
- Fix upload scripts so we can push to a staging directory without uploading data. by @nsthorat in https://github.com/lilacai/lilac/pull/689
- Add better error messaging when inferring schema by @dsmilkov in https://github.com/lilacai/lilac/pull/691
- Fix the huggingface deploy script. by @nsthorat in https://github.com/lilacai/lilac/pull/695
- Fix bug with UDFs after metadata separation by @nsthorat in https://github.com/lilacai/lilac/pull/696
Other
- Migrate to Pydantic V2 by @dsmilkov in https://github.com/lilacai/lilac/pull/685
Full Changelog: https://github.com/lilacai/lilac/compare/v0.0.19...v0.0.20
v0.0.19
What's Changed
New Features 🎉
- Improve the project API and documentation. by @nsthorat in https://github.com/lilacai/lilac/pull/668
- Add the python API for adding labels. by @nsthorat in https://github.com/lilacai/lilac/pull/667
- Add UI for viewing labels. by @nsthorat in https://github.com/lilacai/lilac/pull/670
Other Changes
- Update homepage with short 10sec videos by @dsmilkov in https://github.com/lilacai/lilac/pull/663
- Optional Outputs by @hinthornw in https://github.com/lilacai/lilac/pull/666
- Fix bug with the selectRows cache not being cleared when labeling concepts in HF. by @nsthorat in https://github.com/lilacai/lilac/pull/671
- Add a large guide for querying datasets. by @nsthorat in https://github.com/lilacai/lilac/pull/669
- Re-design the items by @dsmilkov in https://github.com/lilacai/lilac/pull/674
New Contributors
- @hinthornw made their first contribution in https://github.com/lilacai/lilac/pull/666
Full Changelog: https://github.com/lilacai/lilac/compare/v0.0.18...v0.0.19
v0.0.18
New Features
- Add first version for Dataset Insights by @dsmilkov in https://github.com/lilacai/lilac/pull/641
- Add a compute concept modal. by @nsthorat in https://github.com/lilacai/lilac/pull/657
- Add expandable metadata by @dsmilkov in https://github.com/lilacai/lilac/pull/644
- Expand parts of metadata according to the search context by @dsmilkov in https://github.com/lilacai/lilac/pull/659
Other Changes
- Fix the huggingface deploy script. by @nsthorat in https://github.com/lilacai/lilac/pull/638
- Fix bug with concept labeler not returning refreshed results. by @nsthorat in https://github.com/lilacai/lilac/pull/639
- Improve documentation around GCS paths. by @nsthorat in https://github.com/lilacai/lilac/pull/647
- When merging floats, check for closeness to avoid precision issues. Pin pandas version. by @nsthorat in https://github.com/lilacai/lilac/pull/655
- Fix RuntimeError in HNSW index by @dsmilkov in https://github.com/lilacai/lilac/pull/656
- Fix negative-sentiment and legal-terminal concepts due to missing top-level version field by @dsmilkov in https://github.com/lilacai/lilac/pull/658
- Fix italics for N/A by @nsthorat in https://github.com/lilacai/lilac/pull/662
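The float-merging fix listed above addresses a common pitfall: two values that should be equal can differ in their last bits after arithmetic or serialization, so an exact `==` check reports a spurious conflict. Below is a minimal sketch of a tolerance-based merge using the standard library's `math.isclose`. `merge_floats` and its tolerances are hypothetical and not taken from the Lilac code.

```python
import math


def merge_floats(a: float, b: float, rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> float:
    """Return `a` if both values agree within tolerance, else raise.

    Hypothetical helper: the tolerances are illustrative defaults.
    """
    if math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
        return a
    raise ValueError(f"Conflicting float values: {a!r} vs {b!r}")


x = 0.1 + 0.2
print(x == 0.3)  # prints "False": exact comparison sees a mismatch
merged = merge_floats(x, 0.3)  # succeeds: the values are close enough
print(math.isclose(merged, 0.3))  # prints "True"
```

The `abs_tol` argument matters when the values are near zero, where a purely relative tolerance would reject almost any difference.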
Full Changelog: https://github.com/lilacai/lilac/compare/v0.0.17...v0.0.18
v0.0.17
What's Changed
- Fix bug in load script where we try to use the task manager when none is passed. by @nsthorat in https://github.com/lilacai/lilac/pull/627
- Various bug fixes by @dsmilkov in https://github.com/lilacai/lilac/pull/629
- Fix the async bug when starting the server by @dsmilkov in https://github.com/lilacai/lilac/pull/636
- Fix bug with non-serializable schema in the concept labeler. by @nsthorat in https://github.com/lilacai/lilac/pull/632
- Update the global project config during changes. by @nsthorat in https://github.com/lilacai/lilac/pull/631
- Remove the explicit cache directory for sentence transformers. by @nsthorat in https://github.com/lilacai/lilac/pull/637
Full Changelog: https://github.com/lilacai/lilac/compare/v0.0.16...v0.0.17
v0.0.16
New Features
- Add LangSmith source by @dsmilkov in https://github.com/lilacai/lilac/pull/626
Other Changes
- Improve memory usage of lilac load to unblock mosaic datasets by @dsmilkov in https://github.com/lilacai/lilac/pull/620
- Add a project_path to lilac_start. by @nsthorat in https://github.com/lilacai/lilac/pull/621
- Allow tanstack query result to contain non-serializable data by @dsmilkov in https://github.com/lilacai/lilac/pull/625
- Fix auth bugs with concepts. Pip install lilac[all] in the dockerfile. by @nsthorat in https://github.com/lilacai/lilac/pull/622
- Add ability to make concepts public. by @nsthorat in https://github.com/lilacai/lilac/pull/624
Full Changelog: https://github.com/lilacai/lilac/compare/v0.0.15...v0.0.16
v0.0.15
What's Changed
- Fix paths on windows by @dsmilkov in https://github.com/lilacai/lilac/pull/613
- Make db.select_groups work with python 3.11 by @dsmilkov in https://github.com/lilacai/lilac/pull/612
Full Changelog: https://github.com/lilacai/lilac/compare/v0.0.14...v0.0.15
v0.0.14
What's Changed
- updated from HuggingFaceDataset to HuggingFaceSource by @Contributorrandom in https://github.com/lilacai/lilac/pull/611
- Use pip for the HuggingFace demos. by @nsthorat in https://github.com/lilacai/lilac/pull/609
A bug with JavaScript not getting built for the pip package was fixed and released with this version. This includes the change to the searchbox: https://github.com/lilacai/lilac/pull/603
New Contributors
- @Contributorrandom made their first contribution in https://github.com/lilacai/lilac/pull/611
Full Changelog: https://github.com/lilacai/lilac/compare/v0.0.13...v0.0.14
v0.0.13
What's Changed
- Simplify the search bar by @dsmilkov in https://github.com/lilacai/lilac/pull/603
- Update deps versions (including pyarrow) to latest by @dsmilkov in https://github.com/lilacai/lilac/pull/605
- Update to 0.0.13. by @nsthorat in https://github.com/lilacai/lilac/pull/606
Full Changelog: https://github.com/lilacai/lilac/compare/v0.0.12...v0.0.13