Skip to content

Commit 4b46874

Browse files
authored
feat(flotilla): Use max sources config to determine files per partition (#4535)
## Changes Made Allow the number of files per partition in swordfish to be configurable by config. For now. Ideally the scheduler or optimizer will be able to size them better ## Related Issues <!-- Link to related GitHub issues, e.g., "Closes #123" --> ## Checklist - [ ] Documented in API Docs (if applicable) - [ ] Documented in User Guide (if applicable) - [ ] If adding a new documentation page, doc is added to `docs/mkdocs.yml` navigation - [ ] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)
1 parent 5f7fefa commit 4b46874

File tree

1 file changed

+3
-2
lines changed

1 file changed

+3
-2
lines changed

src/daft-distributed/src/pipeline_node/scan_source.rs

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -66,8 +66,9 @@ impl ScanSourceNode {
6666
return Ok(());
6767
}
6868

69-
for scan_task in self.scan_tasks.iter() {
70-
let task = self.make_source_tasks(vec![scan_task.clone()].into())?;
69+
let max_sources_per_scan_task = self.config.max_sources_per_scan_task;
70+
for scan_tasks in self.scan_tasks.chunks(max_sources_per_scan_task) {
71+
let task = self.make_source_tasks(scan_tasks.to_vec().into())?;
7172
if result_tx
7273
.send(PipelineOutput::Task(SubmittableTask::new(task)))
7374
.await

0 commit comments

Comments
 (0)