
The Economics: Skeletons for the People


Kimimaro is designed to mass produce skeletons at low compute cost. Here's an example calculation of the cost of running Kimimaro on a petavoxel of connectomics data at MIP 3 on Google Compute Engine + Google Cloud Storage. Prices for AWS would be similar but are less predictable because of spot pricing.

1 Petavoxel = 200,000 x 200,000 x 25,000 voxels at 4x4x40 nm resolution  
MIP 3 (typically 32x32x40 nm) =  25,000 x 25,000 x 25,000 voxels = 15.6 TVx  
Using 512x512x512 voxel tasks = 116,416 tasks  
  
Cloud computing using preemptible instances = $0.01 to $0.02 per vCPU/hr  
Typical task time = 30 min (data dependent, can range from seconds to hours, make sure you test!)  

Compute Time Cost = (116,416 tasks) * (30 min / 60 min/hr) * (0.01 to 0.02 $/hr)   
                  = $582 to $1164
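
If you'd like to plug in your own numbers, here's a minimal Python sketch of the arithmetic above. The task size, task time, and rates are just the example figures from this page, not guarantees:

```python
from math import ceil

# Example figures from this page -- substitute your own measurements.
mip3_vx = 25_000 ** 3                      # 15.6 TVx at MIP 3
tasks = ceil(mip3_vx / 512 ** 3)           # 512^3-voxel tasks -> 116,416

hours_per_task = 0.5                       # 30 min typical; test on your data!
rate_low, rate_high = 0.01, 0.02           # $ per vCPU-hr, preemptible

low = tasks * hours_per_task * rate_low    # ~$582
high = tasks * hours_per_task * rate_high  # ~$1,164
print(f"compute: ${low:,.0f} to ${high:,.0f}")
```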

Assuming image chunks are stored as 128x128x64 voxels:

15.6 TVx / (128x128x64 voxels/file) = 14.9 million files
14.9e6 files * ($4 per ten million files) = $5.96

Assume segmentation labels are fractured into about 2.3 billion fragments after chunking:
2.3B PUT requests * $5 per million PUTs = $11,500

First pass approximate cost: $12.5k, or 80 GVx/$
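
The request arithmetic, sketched the same way. The GET and PUT rates here are this page's example figures; check current GCS pricing before relying on them:

```python
mip3_vx = 25_000 ** 3
chunk_vx = 128 * 128 * 64

files = mip3_vx / chunk_vx                  # ~14.9 million image chunks
read_cost = files * 4.00 / 10_000_000       # $4 per ten million GETs -> ~$5.96

put_cost = 2.3e9 * 5.00 / 1_000_000         # $5 per million PUTs -> ~$11,500

first_pass = 1_164 + read_cost + put_cost   # upper compute estimate -> ~$12.7k
print(f"first pass: ~${first_pass:,.0f}")   # 1 PVx / first_pass ~ 80 GVx/$
```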

Post-processing follows and is also dominated by file creation. Since you'll create one
file per label, the cost is somewhat less than 2x the first pass, but we'll use 2x to be safe.

Total should be < $25k, but job misconfigurations can balloon that.

Of course, your mileage may vary. Make sure you perform experiments on your own hardware and use the prices applicable to your setup. Obviously, the biggest cost is file creation, and we're working on ways to reduce it soon. Stay tuned.

If you think these numbers are too steep "for the people", you should have seen what they used to be: >$129k to >$500k.

And of course, these are example calculations, not any kind of guarantee.

Process petascale datasets at your own risk!

An Update Regarding the Sharded Format (2019-12-06)

Recently, I was able to process a 73.5 TVx dataset using the new Neuroglancer sharded format. This technology is still in development, so the postprocessing was executed on a single core of a 500 GB RAM machine over three days. We own the machine, but I'll price it using a comparable GCP instance. The updated cost projections:

73.5 TVx = 248k x 297k x 999 voxels at 4x4x40 nm resolution
MIP 3 (typically 32x32x40 nm) = 31k x 37k x 999 = 1.1 TVx
Using 512x512x512 voxel tasks = 8,566 tasks  
  
Cloud computing using preemptible instances = $0.01 to $0.02 per vCPU/hr  
Typical task time = 30 min (data dependent, can range from seconds to hours, make sure you test!)  
Compute Time Cost = (8,566 tasks) * (30 min / 60 min/hr) * (0.01 to 0.02 $/hr)   
                  = $42 to $86
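
The same sketch as before with the updated volume. Note that the rounded dimensions here give a slightly different task count than the 8,566 the actual run produced:

```python
from math import ceil

mip3_vx = 31_000 * 37_000 * 999     # ~1.1 TVx
tasks = ceil(mip3_vx / 512 ** 3)    # ~8,538 from rounded dims; run used 8,566
low = tasks * 0.5 * 0.01            # ~$43
high = tasks * 0.5 * 0.02           # ~$85
```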

This generates 8,566 PUT requests during the first phase:

8,566 files * $4 per million = $0.03 

The skeletons were processed into about 1024 "shard" files:

1024 * $4 per million = $0.004
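
In sketch form, using the $4 per million PUT rate quoted above (again an example figure, not a quote):

```python
put_rate = 4.00 / 1_000_000    # $ per PUT at this page's example rate
phase_one = 8_566 * put_rate   # ~$0.03
shards = 1_024 * put_rate      # ~$0.004
```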

The processing time was about 72 hrs on a 500 GB instance with 56 cores (only one was used).
A similar instance on GCP would be an n1-highmem-64 with 64 vCPUs and 416 GB of memory (we
used about 350 GB at peak), which goes for $2.653 per hour. Preemptible instances cannot be
used for such long-running jobs.

72 hrs * $2.653 / hr = $191.0

Total = $277.05 (using the upper compute estimate), or about 265 GVx/$
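
The totals, again as a sketch (the $86 compute figure is the upper estimate from above):

```python
instance = 72 * 2.653                    # on-demand n1-highmem-64 -> ~$191.02
total = 86 + 0.03 + 0.004 + instance     # ~$277
gvx_per_dollar = 73.5e12 / total / 1e9   # ~265 GVx/$
```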

Using the older method, each of those 14.5M skeletons would have incurred at least two PUTs:
14.5M x 2 x $4 per million = $116
compute time, roughly similar to the first pass = $42
(postprocessing compute has since been vastly improved, so I conservatively
 used the lower figure)

Comparison = $244 or 301 GVx/$
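
And the hypothetical unsharded comparison, using the same example rates:

```python
old_puts = 14.5e6 * 2 * 4.00 / 1e6      # ~$116 in skeleton PUTs
old_total = 86 + 0.03 + old_puts + 42   # ~$244 -> ~301 GVx/$
```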

The bulk of the cost of this procedure resided in compute rather than file creation. With further development, it should be possible to utilize compute more efficiently, which will improve the cost ratio considerably.
