
Discrepancy in AOT Module Size and Runtime Efficiency Based on Kernel Execution State #8564

Open
Roushelfy opened this issue Jul 15, 2024 · 0 comments
Labels
question Question on using Taichi

Comments

Issue Description

Summary

When generating an AOT module with Taichi, I observed that the size of the generated module.tcm file differs depending on whether the kernel function was executed before archiving. This discrepancy also affects runtime efficiency when the module is loaded and launched from C++/C#.

Minimal Sample Code to Reproduce

import taichi as ti

def compile_aot(run=False):
    ti.init(arch=ti.vulkan)
    if ti.lang.impl.current_cfg().arch != ti.vulkan:
        raise RuntimeError("Vulkan is not available.")
    
    @ti.kernel
    def paint(pixels: ti.types.ndarray(dtype=ti.f32, ndim=2), n: ti.u32, t: ti.f32):
        for i, j in pixels:  # Parallelized over all pixels
            c = ti.Vector([-0.8, ti.cos(t) * 0.2])
            z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
            iterations = 0
            while z.norm() < 20 and iterations < 50:
                z = ti.Vector([z[0]**2 - z[1]**2, z[1] * z[0] * 2]) + c
                iterations += 1
            pixels[i, j] = 1 - iterations * 0.02

    n = 1024
    t = 0
    pixels = ti.ndarray(shape=(n * 2, n), dtype=ti.f32)
    if run:
        gui = ti.GUI('Julia Set', (n * 2, n))

        while gui.running:
            t += 1
            paint(pixels, n, t * 0.03)
            pixel = pixels.to_numpy()
            gui.set_image(pixel)
            gui.show()
    
    mod = ti.aot.Module(ti.vulkan)
    mod.add_kernel(paint, template_args={'pixels': pixels})
    mod.archive("build/module.tcm")
    print("Module archived to 'build/module.tcm'")

if __name__ == '__main__':
    compile_aot(run=False)
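One way to narrow down what changed between the two builds is to list the entries inside each archive. The sketch below is a hypothetical diagnostic, assuming the .tcm file is a standard zip archive (as it is in recent Taichi releases); the paths `build/module_norun.tcm` and `build/module_run.tcm` are placeholder names for the two variants, not files produced by the script above.

```python
import zipfile

def list_archive(path):
    """Print each entry in a .tcm archive with its uncompressed size,
    and return a {filename: size} mapping for easy diffing."""
    with zipfile.ZipFile(path) as zf:
        entries = {}
        for info in zf.infolist():
            entries[info.filename] = info.file_size
            print(f"{info.filename}: {info.file_size} bytes")
        return entries

# Hypothetical usage: archive the module twice (run=False and run=True),
# then diff the entry lists to see which files grew or were added.
# a = list_archive("build/module_norun.tcm")
# b = list_archive("build/module_run.tcm")
# print("new entries:", set(b) - set(a))
# print("size changes:", {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]})
```

Diffing the entry lists this way would show whether the 1 KB difference comes from extra files or from one entry growing.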

Observations

When run=False, the generated module.tcm file is 7 KB.
When run=True, the generated module.tcm file is 8 KB.

The runtime efficiency when the module is called from C++/C# differs between the two cases: if the kernel function was executed before archiving, the module runs faster.
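To make the "runs faster" observation concrete, it would help to attach numbers. Below is a minimal wall-clock timing sketch in Python (the `warmup`/`iters` split is my own convention for separating first-launch compilation cost from steady-state launch cost; `paint`, `pixels`, and `n` refer to the repro script above):

```python
import time

def time_kernel(kernel, *args, warmup=1, iters=100):
    """Return average wall-clock seconds per call, after `warmup` untimed calls."""
    for _ in range(warmup):  # untimed calls absorb any first-launch compilation
        kernel(*args)
    start = time.perf_counter()
    for _ in range(iters):
        kernel(*args)
        # Note: GPU launches are asynchronous; for accurate per-launch numbers
        # a ti.sync() would be needed here when timing a real Taichi kernel.
    return (time.perf_counter() - start) / iters

# Hypothetical usage with the repro script's kernel:
# avg = time_kernel(paint, pixels, n, 0.03)
# print(f"avg launch time: {avg * 1e3:.3f} ms")
```

Reporting steady-state numbers from both the run=False and run=True modules would make the efficiency gap easier for maintainers to reproduce.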

Questions

What causes the difference in the size of the AOT module depending on whether the kernel function is executed before archiving?
Why does this difference impact the runtime efficiency when the module is called from C++?

System Information

Taichi version: 1.8.0
OS: Windows 11
Thank you for your help in understanding this issue.

@Roushelfy Roushelfy added the question Question on using Taichi label Jul 15, 2024
@Roushelfy Roushelfy reopened this Jul 19, 2024