|
26 | 26 | As an extended goal, we aim to develop a new plugin for GPU execution, |
27 | 27 | leveraging CUDA or OpenMP to support high-performance computing workflows |
28 | 28 | within Jupyter. |
29 | | -
|
30 | 29 | tasks: | |
31 | 30 | * Move the currently implemented magics and reframe them using xplugin
32 | 31 | * Complete the ongoing work on the Python interoperability magic
33 | 32 | * Implement a test suite for the plugins
34 | 33 | * Extended: Enable execution on GPUs using CUDA or OpenMP
35 | 34 | * Optional: Extend the magics for the wasm use case (xeus-cpp-lite) |
36 | 35 | * Present the work at the relevant meetings and conferences |
| 36 | + tags: ["xeus", "xeus-cpp", "clang", "clang-repl", "jupyter", "gpu", "cuda", "python", "plugins"] |
37 | 37 |
|
38 | 38 | - name: "Enhancing LLM Training with Clad for efficient differentiation" |
39 | 39 | description: | |
|
61 | 61 | has the potential to enhance both research and production-level AI |
62 | 62 | applications, aligning with compiler-research.org's broader goal of |
63 | 63 | advancing computational techniques for scientific discovery. |
64 | | - |
65 | 64 | tasks: | |
66 | 65 | * Develop a simplified LLM setup in C++ |
67 | 66 | * Apply Clad to compute gradients for selected layers and loss functions (see the sketch after this entry)
|
71 | 70 | * Develop tests to ensure correctness, numerical stability, and efficiency |
72 | 71 | * Document the approach, implementation details, and performance gains |
73 | 72 | * Present progress and findings at relevant meetings and conferences |
| 73 | + tags: ["clad", "llm", "ai", "machine-learning", "automatic-differentiation", "cpp", "optimization"] |
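
A minimal sketch of what the first two tasks could look like, assuming Clad is loaded as a Clang plugin; the single-neuron loss and sample values are illustrative stand-ins for real LLM layers:

```cpp
#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

// Toy stand-in for a "layer": one weight, one bias, squared error on one sample.
double loss(double w, double b) {
  const double x = 0.5, target = 1.0; // illustrative training sample
  double pred = w * x + b;            // forward pass of a linear unit
  double diff = pred - target;
  return diff * diff;                 // squared-error loss
}

int main() {
  // clad::gradient generates loss_grad(w, b, _d_w, _d_b) at compile time.
  auto grad = clad::gradient(loss);
  double w = 0.3, b = 0.1, dw = 0, db = 0;
  grad.execute(w, b, &dw, &db);
  std::printf("dL/dw = %f, dL/db = %f\n", dw, db); // inputs to an SGD update
}
```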
74 | 74 |
|
75 | 75 | - name: "Integrate Clad in PyTorch and compare the gradient execution times" |
76 | 76 | description: | |
|
85 | 85 | computed by PyTorch's native autograd system. Special attention will be |
86 | 86 | given to CUDA-enabled gradient computations, as PyTorch also offers GPU |
87 | 87 | acceleration capabilities. |
88 | | - |
89 | 88 | tasks: | |
90 | 89 | * Incorporate Clad's API components (such as `clad::array` and `clad::tape`) |
91 | 90 | into PyTorch using its C++ API (see the sketch after this entry)
|
96 | 95 | * Thoroughly document the integration process and the benchmark results, and identify
97 | 96 | potential bottlenecks in Clad's execution
98 | 97 | * Present the work at the relevant meetings and conferences. |
| 98 | + tags: ["clad", "pytorch", "python", "cuda", "benchmarking", "automatic-differentiation", "gpu"] |
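
A hedged sketch of the core comparison, assuming both Clad (as a Clang plugin) and libtorch are available in one translation unit; the scalar function is illustrative, and a real benchmark would time larger kernels, including CUDA variants:

```cpp
#include "clad/Differentiator/Differentiator.h"
#include <torch/torch.h>
#include <cstdio>

double f(double x) { return x * x + 3 * x; } // df/dx = 2x + 3

int main() {
  // Clad: compile-time source transformation, no runtime tape for this case.
  auto df = clad::differentiate(f, "x");
  double clad_grad = df.execute(2.0);

  // PyTorch: dynamic tape-based autograd on a scalar tensor.
  auto x = torch::tensor(2.0, torch::requires_grad());
  auto y = x * x + 3 * x;
  y.backward();
  double torch_grad = x.grad().item<double>();

  std::printf("clad: %f, torch: %f\n", clad_grad, torch_grad); // both 7.0
}
```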
99 | 99 |
|
100 | 100 | - name: "Enable automatic differentiation of C++ STL concurrency primitives in Clad" |
101 | 101 | description: | |
|
104 | 104 | derivatives of the function. This project focuses on enabling automatic differentiation |
105 | 105 | of code that utilizes C++ concurrency features such as `std::thread`, `std::mutex`,
106 | 106 | atomic operations and more. This will allow users to fully utilize their CPU resources. (A sketch of such target code follows this entry.)
107 | | - |
108 | 107 | tasks: | |
109 | 108 | * Explore C++ concurrency primitives and prepare a report detailing the associated challenges |
110 | 109 | and the features that can feasibly be supported within the given timeframe.
111 | 110 | * Add concurrency primitives support in Clad's forward-mode automatic differentiation. |
112 | 111 | * Add concurrency primitives support in Clad's reverse-mode automatic differentiation. |
113 | 112 | * Add proper tests and documentation. |
114 | 113 | * Present the work at the relevant meetings and conferences. |
| 114 | + tags: ["clad", "cpp", "stl", "concurrency", "multithreading", "automatic-differentiation"] |
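
The kind of target code referenced above, as a sketch: a dot product split between the main thread and a worker. The commented clad::gradient call marks the project's goal, not current Clad functionality:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// A reduction whose work is split between the main thread and one worker.
double parallel_dot(const std::vector<double>& a, const std::vector<double>& b) {
  double lo = 0, hi = 0;
  const std::size_t mid = a.size() / 2;
  std::thread t([&] {
    for (std::size_t i = 0; i < mid; ++i) lo += a[i] * b[i]; // first half
  });
  for (std::size_t i = mid; i < a.size(); ++i) hi += a[i] * b[i]; // second half
  t.join();
  return lo + hi;
}

// Goal of the project (not expected to compile with Clad today):
// auto grad = clad::gradient(parallel_dot); // d(result)/da[i] = b[i], etc.
```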
115 | 115 |
|
116 | 116 | - name: "Interactive Differential Debugging - Intelligent Auto-Stepping and Tab-Completion" |
117 | 117 | description: | |
|
131 | 131 | of the system, then drop to the debugger. This may be achieved by introducing new IDD-specific |
132 | 132 | commands. IDD should be able to tab-complete the underlying GDB/LLDB commands. The contributor
133 | 133 | is also expected to set up the necessary CI infrastructure to automate the testing process of IDD. |
134 | | -
|
135 | | - |
136 | 134 | tasks: | |
137 | 135 | * Enable stream capture |
138 | 136 | * Enable IDD-specific commands to execute until diverging stack or variable value. |
139 | 137 | * Enable tab completion of commands. |
140 | 138 | * Set up CI infrastructure to automate testing IDD. |
141 | 139 | * Present the work at the relevant meetings and conferences. |
| 140 | + tags: ["debugging", "idd", "gdb", "lldb", "regression", "tooling", "ci"] |
142 | 141 |
|
143 | 142 | - name: "Implement CppInterOp API exposing memory, ownership and thread safety information"
144 | 143 | description: | |
|
166 | 165 | cppyy (Python-C++ language bindings) as an exemplar. If time permits, extend |
167 | 166 | the work to persistify this information across translation units and use it on |
168 | 167 | code compiled with Clang. |
169 | | -
|
170 | 168 | tasks: | |
171 | 169 | * Collect and categorize the kinds of interop information that can be exposed
172 | 170 | * Write one or more facilities to extract necessary implementation details |
173 | 171 | * Design a language-independent interface to expose this information (a hypothetical sketch follows this entry)
174 | 172 | * Integrate the work in clang-repl and Cling |
175 | 173 | * Implement and demonstrate its use in cppyy as an exemplar |
176 | 174 | * Present the work at the relevant meetings and conferences. |
| 175 | + tags: ["cppinterop", "cppyy", "clang-repl", "cling", "interoperability", "ast", "jit"] |
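
A purely hypothetical sketch of what the language-independent interface could look like, modeled on CppInterOp's opaque-handle style; none of these names, types, or signatures exist in CppInterOp today:

```cpp
namespace Cpp {
// Opaque function handle, following the style of CppInterOp's existing API.
using TCppFunction_t = void*;

// Hypothetical classification of who is responsible for returned memory.
enum class Ownership { Unknown, Caller, Callee, Shared };

// Hypothetical query: ownership of memory returned by a function, derived
// from return types (raw vs. smart pointers), attributes, or AST heuristics.
Ownership GetReturnOwnership(TCppFunction_t fn);

// Hypothetical query: whether a function is documented or inferred to be
// safe to call concurrently, for use by bindings such as cppyy.
bool IsThreadSafe(TCppFunction_t fn);
} // namespace Cpp
```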
177 | 176 |
|
178 | 177 | - name: "Implement and improve an efficient, layered tape with prefetching capabilities" |
179 | 178 | description: | |
|
205 | 204 |
|
206 | 205 | This project aims to improve the efficiency of the Clad tape and generalize
207 | 206 | it into a tool-agnostic facility that could be used outside of Clad as well.
208 | | -
|
209 | 207 | tasks: | |
210 | 208 | * Optimize the current tape by avoiding reallocation on resize in favor of linked slabs of arrays, as sketched below
211 | 209 | * Enhance existing benchmarks demonstrating the efficiency of the new tape |
|
214 | 212 | * [Stretch goal] Support CPU-GPU transfer of the tape
215 | 213 | * [Stretch goal] Add infrastructure to enable checkpointing offload to the new tape |
216 | 214 | * [Stretch goal] Performance benchmarks |
| 215 | + tags: ["clad", "data-structures", "performance", "memory-management", "gpu", "hpc"] |
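
A minimal sketch of the slab idea from the first task above, assuming the tape only needs LIFO push/pop: full slabs stay in place and are linked together, so growth never copies existing entries:

```cpp
#include <cstddef>
#include <memory>
#include <utility>

template <typename T, std::size_t SlabSize = 1024>
class SlabTape {
  struct Slab {
    T data[SlabSize];
    std::unique_ptr<Slab> prev; // previously filled slab, if any
  };
  std::unique_ptr<Slab> top_ = std::make_unique<Slab>();
  std::size_t used_ = 0; // entries used in the top slab

public:
  void push(const T& v) {
    if (used_ == SlabSize) { // top slab full: link a new one, copy nothing
      auto s = std::make_unique<Slab>();
      s->prev = std::move(top_);
      top_ = std::move(s);
      used_ = 0;
    }
    top_->data[used_++] = v;
  }

  // Precondition: the tape is non-empty (as in a well-formed reverse pass).
  T pop() {
    if (used_ == 0) { // top slab drained: drop it, resume in the previous one
      top_ = std::move(top_->prev);
      used_ = SlabSize;
    }
    return top_->data[--used_];
  }
};
```

Prefetching, layered storage, and the CPU-GPU stretch goals would build on top of such a structure.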
217 | 216 |
|
218 | 217 | - name: "Enabling CUDA compilation on Cppyy-Numba generated IR" |
219 | 218 | description: | |
|
254 | 253 | the Numba extension. |
255 | 254 | * Design and develop a CUDA compilation and execution mechanism. |
256 | 255 | * Prepare proper tests and documentation. |
257 | | -
|
| 256 | + tags: ["cppyy", "numba", "cuda", "llvm", "ir", "gpu", "python"] |
258 | 257 |
|
259 | 258 | - name: "Cppyy STL/Eigen - Automatic conversion and plugins for Python based ML-backends" |
260 | 259 | description: | |
|
278 | 277 | techniques in ML tools like JAX/CUTLASS. This project allows the C++ |
279 | 278 | infrastructure to be plugged in, serving users who seek
280 | 279 | high-performance library primitives that are unavailable in Python. |
281 | | - |
282 | 280 | tasks: | |
283 | 281 | * Extend STL support for std::vector containers of arbitrary dimensions
284 | 282 | * Improve the initialization approach for Eigen classes |
|
288 | 286 | operations in frameworks like JAX |
289 | 287 | * Work on integrating these plugins with toolkits like CUTLASS that |
290 | 288 | utilize the bindings to provide a Python API
| 289 | + tags: ["cppyy", "stl", "eigen", "jax", "cutlass", "numpy", "machine-learning"] |
291 | 290 |
|
292 | 291 | - name: "On Demand Parsing in Clang" |
293 | 292 | description: | |
|
401 | 400 | An alternative and more efficient implementation could be to make the lookup tables
402 | 401 | range-based, but we do not yet have even a prototype proving this could be a feasible
403 | 402 | approach. |
404 | | -
|
405 | 403 | tasks: | |
406 | 404 | * Design and implementation of on-demand compilation for non-templated functions |
407 | 405 | * Support non-templated structs and classes |
|
413 | 411 | meetings, deliver presentations, and contribute blog posts as requested. |
414 | 412 | Additionally, they should demonstrate the ability to navigate the |
415 | 413 | community process with patience and understanding. |
| 414 | + tags: ["clang", "compiler", "parsing", "performance", "memory-optimization", "cling"] |
416 | 415 |
|
417 | 416 | - name: "Broaden the Scope for the Floating-Point Error Estimation Framework in Clad" |
418 | | - |
419 | 417 | description: | |
420 | 418 | In mathematics and computer algebra, automatic differentiation (AD) is |
421 | 419 | a set of techniques to numerically evaluate the derivative of a function |
|
473 | 471 | the capabilities of the framework. |
474 | 472 | * Solve any general-purpose issues that come up with Clad during the process. |
475 | 473 | * Prepare demos and carry out development needed for lossy compression. |
476 | | -
|
| 474 | + tags: ["clad", "floating-point", "numerical-stability", "benchmarking", "error-estimation"] |
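
A hedged usage sketch, assuming Clad's existing clad::estimate_error entry point; the kernel and inputs are illustrative. The generated derivative accumulates a floating-point error estimate alongside the adjoints:

```cpp
#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

float func(float x, float y) { return x * y + y; } // illustrative kernel

int main() {
  // Generates func_grad(x, y, _d_x, _d_y, _final_error).
  auto df = clad::estimate_error(func);
  float dx = 0, dy = 0;
  double max_error = 0; // accumulated floating-point error estimate
  df.execute(2.0f, 3.0f, &dx, &dy, max_error);
  std::printf("dx=%f dy=%f est. error=%e\n", dx, dy, max_error);
}
```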
477 | 475 |
|
478 | 476 | - name: "Improve robustness of dictionary to module lookups in ROOT" |
479 | 477 | description: | |
|
523 | 521 | implementation works. |
524 | 522 | * Develop tutorials and documentation. |
525 | 523 | * Present the work at the relevant meetings and conferences. |
| 524 | + tags: ["root", "cern", "cpp-modules", "cmssw", "dictionary", "io"] |
526 | 525 |
|
527 | 526 | - name: "Enhance the incremental compilation error recovery in clang and clang-repl" |
528 | 527 | description: | |
|
554 | 553 | * Find and fix cases where there are bugs |
555 | 554 | * Implement template instantiation error recovery support |
556 | 555 | * Implement argument-dependent lookup (ADL) recovery support |
| 556 | + tags: ["clang", "clang-repl", "incremental-compilation", "error-recovery", "jit"] |
557 | 557 |
|
558 | 558 | ################################################################################ |
559 | 559 | # 2025 # |
|