Pull requests: allenai/dolma

Pull requests list

Docfix, material to running the c4 replication script
#270 opened Jul 3, 2025 by segyges
Improve WARC processing
#260 opened Apr 15, 2025 by soldni Draft
first
#240 opened Feb 14, 2025 by Whattabatt Draft
[WIP DO NOT MERGE] Learn2Code Feature Branch
#233 opened Feb 13, 2025 by cmwilhelm
simpler logic for calculating code taggers
#229 opened Feb 12, 2025 by kyleclo
Bump openssl from 0.10.66 to 0.10.70 in the cargo group (labels: dependencies, rust)
#228 opened Feb 3, 2025 by dependabot bot
New language ID
#223 opened Dec 30, 2024 by soldni
DCLM Style Deduplications
#214 opened Sep 30, 2024 by revbucket
Mattj/requirements
#212 opened Sep 26, 2024 by revbucket
DNM: Patch FT Tagger
#210 opened Sep 25, 2024 by undfined Draft
New Progress Bar, Backoff, Batching
#165 opened May 23, 2024 by soldni
Warc Backoff
#160 opened May 10, 2024 by soldni
Baseline data
#61 opened Oct 20, 2023 by IanMagnusson Draft
Text modification config
#60 opened Oct 19, 2023 by rodneykinney