models deci decidiffusion v1 0
DeciDiffusion 1.0 is an 820 million parameter latent diffusion model designed for text-to-image generation. It was trained initially on the LAION-v2 dataset and then fine-tuned on the LAION-ART dataset. Training used advanced techniques to improve training speed and performance and to achieve better inference quality.
DeciDiffusion 1.0 retains key elements from Stable Diffusion, such as the Variational Autoencoder (VAE) and CLIP's pre-trained Text Encoder, while introducing notable improvements. The main change is that the U-Net is replaced with U-Net-NAS, a more efficient architecture developed by Deci. This component has fewer parameters, which improves computational efficiency.
For more details, review the blog.
This model was trained in 4 phases:
- Phase 1: The model was trained from scratch for 1.28 million steps at a resolution of 256x256 using 320 million samples from LAION-v2.
- Phase 2: The model was trained for 870k steps at a higher resolution of 512x512 on the same dataset to capture finer detail.
- Phase 3: The model was trained for 65k steps with EMA (exponential moving average), a different learning rate scheduler, and higher-quality data.
- Phase 4: The model was fine-tuned on a 2 million sample subset of the LAION-ART dataset.
In phase 1, training used 8 x 8 x A100 GPUs and the AdamW optimizer with a batch size of 8192 and a learning rate of 1e-4. In phases 2-4, training used 8 x 8 x H100 GPUs and the LAMB optimizer with a batch size of 6144 and a learning rate of 5e-3.
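To make the hardware and batch-size figures easier to compare, the following sketch works out the per-GPU batch for each setup. It rests on two assumptions that the source does not state: that "8 x 8" means 8 nodes with 8 GPUs each (64 GPUs in total), and that the quoted batch size is the global batch with no gradient accumulation. The `per_gpu_batch` helper and the `phases` dictionary are illustrative names, not part of any training code.

```python
# Assumption: "8 x 8" means 8 nodes with 8 GPUs each, i.e. 64 GPUs in total.
NODES = 8
GPUS_PER_NODE = 8

# Figures copied from the training description above.
phases = {
    "phase 1": {"gpu": "A100", "optimizer": "AdamW", "batch_size": 8192, "lr": 1e-4},
    "phases 2-4": {"gpu": "H100", "optimizer": "LAMB", "batch_size": 6144, "lr": 5e-3},
}


def per_gpu_batch(global_batch, nodes=NODES, gpus_per_node=GPUS_PER_NODE):
    """Split a global batch evenly across all GPUs (assumes no gradient accumulation)."""
    total_gpus = nodes * gpus_per_node
    if global_batch % total_gpus != 0:
        raise ValueError("global batch does not divide evenly across GPUs")
    return global_batch // total_gpus


for name, cfg in phases.items():
    print(f"{name}: {per_gpu_batch(cfg['batch_size'])} samples per GPU per step")
```

Under these assumptions the split is 128 samples per GPU in phase 1 and 96 in phases 2-4.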
The model has limitations and may not perform optimally in various scenarios. It doesn't generate entirely photorealistic images. Rendering legible text is beyond its capability. The generation of faces and human figures may lack precision. The model is primarily optimized for English captions and may not be as effective with other languages. The auto-encoding component of the model is lossy.
DeciDiffusion was trained primarily on subsets of LAION-v2, with a focus on English descriptions. As a result, non-English communities and cultures may be underrepresented, potentially introducing bias towards white and western norms. Outputs from non-English prompts are notably less accurate. Considering these biases, users are advised to exercise caution when using DeciDiffusion, irrespective of the input provided.
creativeml-openrail++-m
Inference type | Python sample (Notebook) | CLI with YAML |
---|---|---|
Real time | text-to-image-online-endpoint.ipynb | text-to-image-online-endpoint.sh |
Batch | text-to-image-batch-endpoint.ipynb | text-to-image-batch-endpoint.sh |
Inference with Azure AI Content Safety (AACS) Samples
Inference type | Python sample (Notebook) |
---|---|
Real time | safe-text-to-image-online-deployment.ipynb |
Batch | safe-text-to-image-batch-endpoint.ipynb |
{
"input_data": {
"columns": ["prompt"],
"data": ["A photo of an astronaut riding a horse on Mars"],
"index": [0]
}
}
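As a minimal sketch, the request body above can be assembled in Python with the standard library. The `build_request` helper is a hypothetical name, and extending the layout to several prompts (one `data` entry and one `index` entry per prompt) is an assumption based on the single-prompt sample, not something the source confirms.

```python
import json


def build_request(prompts):
    """Build a request body in the columns/data/index layout of the sample input."""
    if isinstance(prompts, str):
        prompts = [prompts]
    return {
        "input_data": {
            "columns": ["prompt"],
            "data": list(prompts),
            # Assumption: index runs 0..n-1, one entry per prompt.
            "index": list(range(len(prompts))),
        }
    }


payload = build_request("A photo of an astronaut riding a horse on Mars")
body = json.dumps(payload)  # JSON string to send as the request body
```

The resulting `body` string can be saved to a file or passed to whichever client is used to call the deployed endpoint.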
[
{
"prompt": "A photo of an astronaut riding a horse on Mars",
"generated_image": "image",
"nsfw_content_detected": null
}
]
Note:
- The "image" string is in base64 format.
- The `deci-decidiffusion-v1-0` model checks for NSFW content in the generated image. We highly recommend using the model with Azure AI Content Safety (AACS). Please refer to the sample online and batch notebooks for AACS-integrated deployments.
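Because the generated image arrives as a base64 string, it has to be decoded before it can be saved or displayed. The sketch below shows one way to do that with the standard library, assuming the response is a JSON list shaped like the sample output. The `decode_images` helper is a hypothetical name, and the response used here is a stand-in built locally with placeholder bytes, not real endpoint output.

```python
import base64
import json


def decode_images(response_text):
    """Return a list of (prompt, image_bytes, nsfw_flag) tuples from a JSON response."""
    results = []
    for item in json.loads(response_text):
        image_bytes = base64.b64decode(item["generated_image"])
        results.append((item["prompt"], image_bytes, item.get("nsfw_content_detected")))
    return results


# Stand-in response: these bytes are a placeholder, not a real image.
fake_image = b"\x89PNG placeholder"
sample_response = json.dumps([
    {
        "prompt": "A photo of an astronaut riding a horse on Mars",
        "generated_image": base64.b64encode(fake_image).decode("ascii"),
        "nsfw_content_detected": None,
    }
])

decoded = decode_images(sample_response)
prompt, image_bytes, nsfw_flag = decoded[0]
```

The decoded bytes can then be written to disk with `open(path, "wb")`. The source does not state the image format, so the file extension is left to the caller.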
Visualization of inference result for a sample prompt - "a photograph of an astronaut riding a horse"
Version: 7
SharedComputeCapacityEnabled
license : creativeml-openrail++-m
task : text-to-image
hiddenlayerscanned
author : Deci AI
huggingface_model_id : Deci/DeciDiffusion-v1-0
inference_compute_allow_list : ['Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_NC16as_T4_v3', 'Standard_NC4as_T4_v3', 'Standard_NC64as_T4_v3', 'Standard_NC8as_T4_v3', 'Standard_NC96ads_A100_v4', 'Standard_ND40rs_v2', 'Standard_ND96amsr_A100_v4', 'Standard_ND96asr_v4']
View in Studio: https://ml.azure.com/registries/azureml/models/deci-decidiffusion-v1-0/version/7
License: creativeml-openrail++-m
SharedComputeCapacityEnabled: True
SHA: 10c31ce02b006ae5ceb774dcafdc3a0bcba992ef
datasets: LAION-v2
inference-min-sku-spec: 4|1|28|176
inference-recommended-sku: Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC16as_T4_v3, Standard_NC4as_T4_v3, Standard_NC64as_T4_v3, Standard_NC8as_T4_v3, Standard_NC96ads_A100_v4, Standard_ND40rs_v2, Standard_ND96amsr_A100_v4, Standard_ND96asr_v4