Releases: replicate/cog
v0.14.0
Support for concurrent predictions
This release introduces support for concurrent processing of predictions through the use of an async predict function.
To enable the feature, add the new `concurrency.max` entry to your `cog.yaml` file:

```yaml
concurrency:
  max: 32
```
And update your predictor to use the `async def predict` syntax:

```python
from cog import BasePredictor


class Predictor(BasePredictor):
    async def setup(self) -> None:
        print("async setup is also supported...")

    async def predict(self) -> str:
        print("async predict")
        return "hello world"
```
Cog will now process up to 32 predictions simultaneously. Once at capacity, subsequent predictions will receive a 409 HTTP response.
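When the server is at capacity, clients can back off and retry. A minimal client-side sketch, assuming the model is served locally on Cog's default port; the `predict_with_backoff` helper is hypothetical, not part of Cog:

```python
import time

import requests


def predict_with_backoff(payload, url="http://localhost:5000/predictions", retries=5):
    """Submit a prediction, backing off briefly while the server is at capacity."""
    for attempt in range(retries):
        resp = requests.post(url, json={"input": payload}, timeout=(5, 300))
        if resp.status_code != 409:  # 409 signals concurrency.max has been reached
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt)  # simple exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError("server still at capacity after retries")
```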
Iterators
If your model is currently using `Iterator` or `ConcatenateIterator`, it will need to be updated to use `AsyncIterator` or `AsyncConcatenateIterator` respectively.
```python
from cog import AsyncConcatenateIterator, BasePredictor


class Predict(BasePredictor):
    async def predict(self) -> AsyncConcatenateIterator[str]:
        for fruit in ["apple", "banana", "orange"]:
            yield fruit
```
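For reference, the synchronous version being migrated away from would have looked roughly like this (a sketch using the pre-async `ConcatenateIterator` type):

```python
from cog import BasePredictor, ConcatenateIterator


class Predict(BasePredictor):
    # Old style: a sync predict yielding through ConcatenateIterator.
    # Migrate by switching to AsyncConcatenateIterator and async def predict,
    # as shown above.
    def predict(self) -> ConcatenateIterator[str]:
        for fruit in ["apple", "banana", "orange"]:
            yield fruit
```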
Migrating from 0.10.0a
An earlier fork of cog with concurrency support was published under the 0.10.0a release channel. That fork is now unsupported and will receive no further updates. Some breaking API changes will be introduced with the release of the 0.14.0 beta; this release is backwards compatible, and you will see deprecation warnings when calling the deprecated functions.
- `emit_metric(name, value)` has been replaced by `current_scope().record_metric(name, value)`
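For illustration, the replacement call inside a predictor might look like this (a sketch; the metric name is made up):

```python
from cog import BasePredictor, current_scope


class Predictor(BasePredictor):
    async def predict(self) -> str:
        # Old 0.10.0a style (deprecated, still works here with a warning):
        #   emit_metric("output_tokens", 42)
        # New style:
        current_scope().record_metric("output_tokens", 42)
        return "hello world"
```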
Note
The use of `current_scope` is still experimental and will output warnings to the console. To suppress these, you can ignore the `ExperimentalFeatureWarning`:
```python
import warnings

from cog import ExperimentalFeatureWarning

warnings.filterwarnings("ignore", category=ExperimentalFeatureWarning)
```
Known limitations
- An async setup method cannot be used without an async predict method. Supported combinations are: sync setup/sync predict, async setup/async predict, and sync setup/async predict (see the sketch after this list).
- File uploads will block the event loop. If your model outputs `File` or `Path` types, these will currently block the event loop. This may be an issue for large file outputs and will be fixed in a future release.
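As a sketch of the supported mixed combination from the first limitation above, a synchronous setup paired with an async predict:

```python
from cog import BasePredictor


class Predictor(BasePredictor):
    def setup(self) -> None:
        # Synchronous setup is allowed alongside an async predict.
        self.greeting = "hello"

    async def predict(self) -> str:
        return self.greeting
```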
Changelog
- efad169 Add a simple explanation for the standard predictor (#2187)
- 7b97da5 Add fast pusher for fast builds (#2114)
- 8a6fa4c Add multi progress bar to uploads (#2134)
- ecbe178 Add upload and verification to the fast pusher (#2128)
- 663c375 Build framework for file challenge (#2175)
- f99ef73 Bump github.com/golangci/golangci-lint from 1.62.2 to 1.64.2
- 6f55fdd Call the web when layers have been pushed (#2139)
- 18a3e90 Capture standard output when loading the predictor
- 85658ae Change ffmpeg int test to 3.12 (#2131)
- 9fe30d7 Check version formats in fast_generator (#2129)
- c95774f Create multipart uploader for fast push (#2135)
- 59545cc Enforce usage of r8.im image name (#2146)
- b3713a6 Fail python_packages on --x-fast (#2162)
- 552a0cc Fast Push Fixes (#2133)
- 3f0d7c9 Fix fast tarball tmp dir (#2194)
- 0e36b61 Fix file too long on tmp file create (#2132)
- 1247b48 Fix monobeam client not authenticating properly (#2148)
- 018aca5 Fix path for weights in symlink (#2172)
- c245552 Freeze user layer only for fast builds (#2165)
- e7e042c Handle URL `Path` in `cog predict`
- 61d9b8d Handle weights moved to subdir (#2191)
- e12f237 Include `sha256:` prefix in generated runtime config files (#2153)
- 80d2a27 Move fast push test to cog-runtime (#2181)
- 3ab642a Propagate R8_COG_VERSION on fast push (#2166)
- e0e8b67 Pull monobase:latest before building fast model
- 425715c Remove COG_EAGER_IMPORTS from fast path
- c417cec Remove COG_PYENV_PATH from fast path
- 2a0763a Remove deprecated interfaces Predictor.log and emit_metric
- 94bd7be Require a test to be added for fast push (#2149)
- a085909 Revert "Handle URL `Path` in `cog predict`"
- 0c6c0be Send push timings to the server (#2152)
- e27d743 Separate tmp mounts for fast layers (#2182)
- bc4b95e Support Python requirements package[extra,...]==version (#2160)
- a27eb98 Tweak fast build cache and file exposure (#2188)
- cca8874 Update API endpoint in accordance with new name (#2151)
- b49e405 Update CDN URL (#2161)
- 0601688 Update ruff config so lint works again (#2173)
- 8cfd9e1 Update training.md
- aed8fe0 Use coglet in fast_generator (#2130)
- d13973b Validate config by checking the run commands (#2147)
- 1a1853f add llms.txt (#2120)
- 637614c simplify llms.txt generation (#2123)
- 9f74e7d update token config in `cog init` actions workflow template (#2192)
v0.13.7
Changelog
- 6eb2d2e Add a cog integration test for apt-packages (#2104)
- 9efb306 Add fast generator for cog build (#2108)
- 60017a9 Add test for ffmpeg in base images (#2122)
- 2f12ead Avoid warnings
- 8dac405 Be explicit about the Python version we're expecting in tests
- b7aa7c3 Fix pydantic2 cog builds (#2115)
- 97d749f Increase `nofile` limit for tests
- 1007849 Move _tag_var to Scope
- cb78a0f Officially mark Cog as supporting Python 3.13
- ba1d4c2 Only add webp to mimetypes on old Pythons
- e2ad2a4 Pin ruff to 0.9.1 and reformat
- b1c9188 Update color of dark mode website
- a36b42f Update fastapi requirement from <0.99.0,>=0.75.2 to >=0.75.2,<0.116.0 (#1966)
- 85b85bf chore: fix some comments
- d04f127 convert Scope to attrs.frozen
v0.14.0-alpha1
Support for concurrent predictions
This release introduces support for concurrent processing of predictions through the use of an async predict function.
To enable the feature, add the new `concurrency.max` entry to your `cog.yaml` file:

```yaml
concurrency:
  max: 32
```
And update your predictor to use the `async def predict` syntax:

```python
from cog import BasePredictor


class Predictor(BasePredictor):
    async def setup(self) -> None:
        print("async setup is also supported...")

    async def predict(self) -> str:
        print("async predict")
        return "hello world"
```
Cog will now process up to 32 predictions simultaneously. Once at capacity, subsequent predictions will receive a 409 HTTP response.
Iterators
If your model is currently using `Iterator` or `ConcatenateIterator`, it will need to be updated to use `AsyncIterator` or `AsyncConcatenateIterator` respectively.
```python
from cog import AsyncConcatenateIterator, BasePredictor


class Predict(BasePredictor):
    async def predict(self) -> AsyncConcatenateIterator[str]:
        for fruit in ["apple", "banana", "orange"]:
            yield fruit
```
Migrating from 0.10.0a
An earlier fork of cog with concurrency support was published under the 0.10.0a release channel. That fork is now unsupported and will receive no further updates. Some breaking API changes will be introduced with the release of the 0.14.0 beta; this alpha release is backwards compatible, and you will see deprecation warnings when calling the deprecated functions.
- `emit_metric(name, value)` has been replaced by `current_scope().record_metric(name, value)`
Note
The use of `current_scope` is still experimental and will output warnings to the console. To suppress these, you can ignore the `ExperimentalFeatureWarning`:
```python
import warnings

from cog import ExperimentalFeatureWarning

warnings.filterwarnings("ignore", category=ExperimentalFeatureWarning)
```
Known limitations
- An async setup method cannot be used without an async predict method. Supported combinations are: sync setup/sync predict, async setup/async predict and sync setup/async predict.
- File uploads will block the event loop. If your model outputs `File` or `Path` types, these will currently block the event loop. This may be an issue for large file outputs and will be fixed in a future release.
Other Changes
- Change torch vision to 0.20.0 for torch 2.5.0 cpu by @8W9aG in #2074
- Ignore files within a .git directory by @8W9aG in #2087
- Add fast build flag to cog by @8W9aG in #2086
- Make dockerfile generators abstract by @8W9aG in #2088
- Do not run a separate python install stage by @8W9aG in #2094
Full Changelog: v0.13.6...v0.14.0-alpha1
v0.10.0-alpha27
Changelog
- 32c7408 fix GHA
v0.13.6
v0.13.3
This release includes an important bug fix that ensures usage of `requests` includes explicit connection timeouts. Other changes include tidying related to the removal of Python 3.7 support, adding the output of `pip freeze` as a Docker image label, and some groundwork towards supporting concurrent predictions.
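Illustratively, the kind of change the timeout fix refers to (a sketch, not the actual diff; the URL is a placeholder):

```python
import requests

# Without a timeout, a request to a hung endpoint can block indefinitely:
#   requests.get(url)
# An explicit (connect, read) timeout makes failures surface promptly instead:
resp = requests.get("https://example.com/health", timeout=(5, 30))
resp.raise_for_status()
```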
Changelog
- 8e1091f Add lock to subscribers dictionary
- 3d9c298 Add pip freeze to docker label (#2062)
- 3e56e59 Always set timeout on requests (#2064)
- 746ec53 Fix flake on test_path_temporary_files_are_removed (#2059)
- 2bc4710 Make TestWorkerState aware of prediction tags
- 425d5a2 More python 3.7 tidying (#2063)
- 8630036 PR feedback
- 9c894d6 Update Worker to support concurrent predictions
- cf0f8b2 Update python/cog/server/worker.py
- db1cbef make clear why we read the PredictionInput childworker event
- 5f6a742 update TestWorkerState to support concurrent subscribers