
Commit 405ff22

Merge pull request #495 from bbrowning/release-070-changelog
Update release notes for v0.7.0
2 parents 4134575 + 2589ea2 commit 405ff22

2 files changed: +15 -1 lines changed

.spellcheck-en-custom.txt

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ ICL
 icl
 ie
 instructlab
+IterBlock
 Jinja
 JSON
 Langchain's

CHANGELOG.md

Lines changed: 14 additions & 1 deletion
@@ -1,4 +1,4 @@
-## Unreleased 0.7.x
+## v0.7.0
 
 ### Features
 
@@ -10,6 +10,18 @@ See the `tests/testdata/custom_block.py` and `tests/testdata/custom_block_pipeli
 
 See the `tests/testdata/custom_prompt.py` file in this repository for an example of how to register custom chat templates used when formatting prompts.
 
+### New Blocks - IterBlock and LLMMessagesBlock
+
+We have two new Block types available for pipelines in this release: `IterBlock` and `LLMMessagesBlock`. `IterBlock` runs another `Block` a configured number of times. `LLMMessagesBlock` is like `LLMBlock` but uses the newer chat/completions API of OpenAI-compatible servers instead of the legacy completions API.
+
+### Consolidated PDF and Markdown ingestion and chunking implementations
+
+Instead of sending PDF input documents through Docling and using a custom path for Markdown, we now send both document types through Docling and share a single chunking implementation across both. This may produce different chunks for Markdown content than previous releases did.
+
+### Added a new `instructlab.sdg.mix_datasets` Python API
+
+We've added a new Python API for advanced users who need to re-mix our generated outputs, for example to weight one taxonomy leaf node over others or to include more than our default of 30 skill samples per leaf node in the final mixed output. See `docs/examples/mix_datasets/` for example Python code and Recipe YAML files.
+
 ### Breaking Changes
 
 #### Pipeline configs and Prompt templates switched to Jinja
@@ -23,6 +35,7 @@ Any users that were specifying custom pipeline configs (instead of using the def
 ### Fixes
 
 * The PyTorch dependency is removed, because SDG doesn't directly use PyTorch. The test suite still depends on `instructlab` core, which depends on PyTorch.
+* The `batch_size` parameter is now respected on every inference call made from an `LLMBlock`. Previously we only batched the initial input and did not account for Blocks that emit more output samples than input samples, so the batches actually sent to vLLM could exceed the configured `batch_size` (which defaults to `8` on most hardware profiles). That consumed more memory than expected and could overload inference servers with batches containing hundreds of completion requests.
 
 ## v0.6.3
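
The `IterBlock` / `LLMMessagesBlock` entry above hinges on the difference between the legacy completions API and the chat/completions API exposed by OpenAI-compatible servers. The sketch below shows that server-side difference directly with the `openai` client; it is not the `instructlab.sdg` block API, and the base URL and model name are placeholders for whatever server (e.g. vLLM) you run.

```python
# Illustration of the API difference behind LLMMessagesBlock vs. LLMBlock.
# Base URL and model name are placeholders, not values from this repository.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Legacy completions API (what LLMBlock targets): a single prompt string.
legacy = client.completions.create(
    model="my-model",
    prompt="Generate one question about world geography.",
    max_tokens=128,
)
print(legacy.choices[0].text)

# Chat completions API (what LLMMessagesBlock targets): structured messages.
chat = client.chat.completions.create(
    model="my-model",
    messages=[
        {"role": "system", "content": "You are a helpful data generator."},
        {"role": "user", "content": "Generate one question about world geography."},
    ],
    max_tokens=128,
)
print(chat.choices[0].message.content)
```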

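For the `instructlab.sdg.mix_datasets` entry, the changelog defers to `docs/examples/mix_datasets/` for the real recipe-driven API. Purely as a conceptual illustration of what "weighting one taxonomy leaf node over others" means, here is a sketch using the Hugging Face `datasets` library; the file names are placeholders and this is not the `mix_datasets` API itself.

```python
# Conceptual sketch of re-mixing generated outputs: repeat one leaf node's
# samples before concatenation to weight it more heavily in the final mix.
# NOT the instructlab.sdg.mix_datasets API -- see docs/examples/mix_datasets/.
from datasets import load_dataset, concatenate_datasets

leaf_a = load_dataset("json", data_files="leaf_node_a.jsonl", split="train")
leaf_b = load_dataset("json", data_files="leaf_node_b.jsonl", split="train")

# Give leaf_a three times the weight of leaf_b in the mixed output.
weighted_a = concatenate_datasets([leaf_a] * 3)
mixed = concatenate_datasets([weighted_a, leaf_b]).shuffle(seed=42)
mixed.to_json("mixed_output.jsonl")
```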
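
The `batch_size` fix describes re-chunking requests right before each inference call, because a Block can emit more samples than it received as input. A minimal, library-agnostic sketch of that re-batching step (not `instructlab.sdg` internals):

```python
# Re-chunk completion requests into groups of at most batch_size immediately
# before each inference call, rather than batching only the initial input.
from typing import Iterator, List


def batched(requests: List[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Yield successive chunks of at most batch_size requests."""
    for start in range(0, len(requests), batch_size):
        yield requests[start:start + batch_size]


# A block that received 4 input samples but emitted 100 completion requests:
requests = [f"prompt-{i}" for i in range(100)]

for batch in batched(requests, batch_size=8):
    # Each call to the inference server now sees at most 8 requests.
    print(len(batch))
```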