Releases: instructlab/sdg
v0.3.1
What's Changed
- Add more tests for golden/distractor context picking by @bbrowning in #256
- Document dataset formats by @markmc in #236
- ci: move E2E runner from github to AWS by @nathan-weinberg in #260
- ci: add AWS tag to show github PR number for all jobs by @nathan-weinberg in #264
- build(deps): bump pypa/gh-action-pypi-publish from 1.9.0 to 1.10.0 by @dependabot in #263
- ci: add GitHubRef to AWS labels as well by @nathan-weinberg in #265
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.0 to 1.10.1 by @dependabot in #266
- chore: replace platformdirs with xdg-base-dirs by @jaideepr97 in #269
- chore: add auto-merging policy for SDG by @khaledsulayman in #262
- ci: update lint workflow by @nathan-weinberg in #278
- build(deps): bump step-security/harden-runner from 2.9.1 to 2.10.1 by @dependabot in #274
- build(deps): bump hynek/build-and-inspect-python-package from 2.8.0 to 2.9.0 by @dependabot in #268
- build(deps): bump actions/checkout from 4.1.6 to 4.1.7 by @dependabot in #280
- build(deps): bump actions/setup-python from 5.1.0 to 5.2.0 by @dependabot in #279
- build(deps): bump rhysd/actionlint from 1.7.1 to 1.7.2 in /.github/workflows by @dependabot in #285
- build(deps): bump rojopolis/spellcheck-github-actions from 0.41.0 to 0.42.0 by @dependabot in #283
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.1 to 1.10.2 by @dependabot in #282
- build(deps): bump DavidAnson/markdownlint-cli2-action from 16.0.0 to 17.0.0 by @dependabot in #275
- ci: add additional autolabeling rules by @nathan-weinberg in #286
- github: add stale bot to sdg repo by @nathan-weinberg in #287
- ci: fix lint action by @nathan-weinberg in #288
- build(deps): bump actions/checkout from 4.1.7 to 4.2.0 by @dependabot in #289
- Handle empty dataset from output of sdg leaf node without raising error by @relyt0925 in #272
New Contributors
- @jaideepr97 made their first contribution in #269
- @khaledsulayman made their first contribution in #262
- @relyt0925 made their first contribution in #272
Full Changelog: v0.3.0...v0.3.1
v0.3.0
⚠️ Removal of unused arguments for initializing the OpenAI client in the generate() API ⚠️
A valid OpenAI client must now be passed to the API, because the API no longer initializes one itself.
The removal of the unused arguments for initializing the OpenAI client was driven from the CLI; please refer to the PR on GitHub.
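The change described above amounts to the caller building the client and handing it to the API. The sketch below illustrates that pattern only. `StubClient`, `generate_sketch`, and the `num_samples` option are hypothetical names invented for this example and are not the library's real signature; the v0.2.1 notes mention only that generate_data gained a client argument.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StubClient:
    """Hypothetical stand-in for an already-configured OpenAI client."""
    base_url: str
    api_key: str


def generate_sketch(client: Optional[StubClient], **options: Any) -> dict:
    """Hypothetical stand-in for the generate API after this change.

    It takes a ready-made client and accepts no endpoint or key settings
    for building one itself.
    """
    if client is None:
        raise ValueError("pass a configured client; none is created here")
    # A real implementation would send requests through the client here.
    return {"endpoint": client.base_url, "options": options}


# The caller (for example, the CLI) owns client construction.
client = StubClient(base_url="http://localhost:8000/v1", api_key="placeholder")
result = generate_sketch(client, num_samples=5)
print(result["endpoint"])
```

In real use the stub would presumably be replaced by a client object from the openai package, configured by the caller before the call.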
What's Changed
- tests: Add test to validate that generate_data() is generating the files expected by @hickeyma in #226
- Use instructlab-schema package to parse qna.yaml files by @bjhargrave in #62
- remove instructlab from requirements-dev by @makelinux in #249
- Bump rojopolis/spellcheck-github-actions from 0.40.0 to 0.41.0 by @dependabot in #241
- generate_data: remove 6 unused arguments by @makelinux in #248
- Fix selection logic for distractor documents by @aakankshaduggal in #252
- tests: Remove custom yaml rules by @bjhargrave in #253
Full Changelog: v0.2.6...v0.3.0
v0.2.7
v0.2.6
v0.2.5
What's Changed
- Don't write empty checkpoint datasets by @bbrowning in #239
- Document how to mix in pregenerated skills dataset by @bbrowning in #237
Full Changelog: v0.2.4...v0.2.5
v0.2.4
What's Changed
- Separate checkpoints by leaf nodes by @danmcp, @shivchander in #231
Full Changelog: v0.2.3...v0.2.4
v0.2.3
What's Changed
- Add support for auxiliary dataset generation by @shivchander, @khaledsulayman, @abhi1092, @aakankshaduggal, @bbrowning, @markmc in #204
- Add missing license identifiers by @danmcp in #229
- Fixing typos by @danmcp in #230
- Remove the unnecessary SDG class by @markmc in #64
New Contributors
Full Changelog: v0.2.2...v0.2.3
v0.2.2
What's Changed
- Add data checkpointing capability by @shivchander, @derekhiggins, @markmc in #222
- Remove calls to logging.basicConfig on import by @tiran, @markmc in #194
- Add gen_kwargs support for ConditionalLLMBlock by @derekhiggins in #221
- tests: Add unit tests for taxonomy and model family by @hickeyma in #188
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
Full Changelog: v0.2.0...v0.2.1
New Features ✨
- Introduce a way to mix generated datasets before sending to training by @shivchander @khaledsulayman @abhi1092 @aakankshaduggal @bbrowning @markmc in #163 #215
- Introduce data mixing recipe yaml files by @shivchander @khaledsulayman @abhi1092 @aakankshaduggal @bbrowning @markmc in #203
- Add 4 new pipeline blocks by @abhi1092 @shivchander @derekhiggins @markmc in #182
- Generate data for model evaluation using the MMLU benchmark by @shivchander @abhi1092 @aakankshaduggal @derekhiggins @markmc in #180 #212 #209 #193
Fixes 🐛
- Remove temporary e2e hack to use knowledge v3 PR by @markmc in #187
- Remove sys_prompt from contexts.yaml by @shivchander @derekhiggins in #189
- Move Block._validate to llmblock by @abhi1092 @derekhiggins in #191
- generate_data: introduce argument client to replace 6 others by @makelinux @tiran in #114
- Fix logging string formatting by @derekhiggins in #197
- Add utility function to convert from Pandas dataframe to Hugging Face dataset by @hickeyma in #199
- Update ConditionalLLMBlock's config_paths schema by @derekhiggins in #211
- Move system pipelines to /usr/share/instructlab/sdg/pipelines by @markmc in #214
New Contributors
- @bbrowning made their first contribution in #163
- @hickeyma made their first contribution in #199
v0.2.0
⚠️ Introducing v3 knowledge format - no backwards compat for v1/v2 ⚠️
The newly introduced v3 knowledge format is incompatible with the previous v1 and v2 formats. As a result, all existing knowledge contributions must be re-formatted to comply with the v3 specifications.
For detailed information and guidelines on how to re-format your contributions, please refer to the issue discussion on GitHub.
What's Changed
- Add v3 knowledge schema support by @abhi1092, @shivchander, @aakankshaduggal, @russellb, @markmc, @derekhiggins in #161
Full Changelog: v0.1.3...v0.2.0