An event recorded in the default stream (stream zero) with `cuplaEventRecord(event, 0)` does not block the other streams. This behavior differs from CUDA, where the legacy default stream synchronizes implicitly with other streams. Note: since CUDA 7 the behavior of the default stream can be changed so that each host thread has its own default stream, see https://devblogs.nvidia.com/parallelforall/gpu-pro-tip-cuda-7-streams-simplify-concurrency/. This is not relevant for cupla, because cupla is not thread safe :-)
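
Code that relied on the implicit CUDA ordering may need to state the dependency explicitly. The following is a minimal sketch, not an official cupla recipe: it assumes a working cupla build setup, that `cuplaStreamWaitEvent` mirrors the semantics of `cudaStreamWaitEvent`, and it omits error checking. The stream name `worker` is hypothetical.

```cpp
// Sketch only: requires a cupla build environment; error checking omitted.
#include <cuda_to_cupla.hpp>

int main()
{
    cuplaStream_t worker; // hypothetical second stream
    cuplaEvent_t event;
    cuplaStreamCreate(&worker);
    cuplaEventCreate(&event);

    // ... enqueue work in the default stream here ...

    // Record the event in the default stream (stream 0).
    // In cupla this does not block `worker`.
    cuplaEventRecord(event, 0);

    // Explicit dependency: work enqueued in `worker` after this call
    // waits until `event` has completed.
    cuplaStreamWaitEvent(worker, event, 0);

    // ... enqueue work in `worker` here ...

    cuplaStreamSynchronize(worker);
    cuplaEventDestroy(event);
    cuplaStreamDestroy(worker);
    return 0;
}
```

Making the dependency explicit keeps the ordering the same regardless of how the default stream behaves on a given backend.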