alicevision · simogasp · Aug 26, 2025 · Aug 27, 2025 · Aug 29, 2025 · Sep 1, 2025
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1 @@
+See [AI_DEVELOPMENT_GUIDE.md](../AI_DEVELOPMENT_GUIDE.md) for full coding conventions.
diff --git a/.markdownlint.json b/.markdownlint.json
@@ -0,0 +1,6 @@
+{
+  "default": true,
+  "MD013": false,
+  "MD024": false,
+  "MD033": false
+}
diff --git a/AI_DEVELOPMENT_GUIDE.md b/AI_DEVELOPMENT_GUIDE.md
@@ -0,0 +1,125 @@
+# AI Development Guide for PopSift
+
+This guide defines how AI-assisted code generation should be done in this repository.  
+It ensures that contributions (from GitHub Copilot, ChatGPT, Claude, etc.) follow a **consistent, modern, and maintainable style**.
+
+---
+
+## General Principles
+
+- Always prioritize **readability** and **clarity** over micro-optimizations.
+- Follow **modern C++17 best practices**.
+- Keep device-side `__global__` functions in the same source file as the host-side C++ code that launches them.
+- Always compile `__device__` functions in the same translation unit as the functions that call them. Preferably declare them `static inline`.
+- Prefer **modularity**: each class or major component should live in its own file.
+- Code should be **self-documenting** whenever possible, with clear naming and structure.
+
+---
+
+## C++ Guidelines
+
+- **Standard**:
+  - Use **C++17**. Prefer `constexpr`, `auto` and `enum class`.
+  - Use range-based for loops on the host side.
+  - Use smart pointers (`std::unique_ptr`, `std::shared_ptr`) on the host side.
+  - Dynamic memory allocation on the device side is strongly discouraged.
+  - Never pass smart pointers as parameters to `__global__` functions.
+- **Memory Management**:
+  - Use RAII on the host side.
+  - Avoid all dynamic memory allocation on the device side.
+  - Understand that reference-counting smart pointers cannot be kept consistent between
+    host and device, and that kernels run asynchronously from host code.
+- **Error Handling**:
+  - Use exceptions in host C++ code.  
+  - In CUDA, check and propagate error codes using helper utilities/macros. Never ignore errors.
+- **Namespaces**: Group related functions/classes logically. Avoid polluting the global namespace.
+- **Headers**:
+  - Keep headers minimal; forward declare instead of including heavy dependencies.
+    However, small helper functions declared `static inline __device__` that are used in several places should be
+    placed in a header and included instead of copying the code.
+  - Each header should be guarded with `#pragma once`. Use `#ifndef`/`#endif` include guards only in special
+    circumstances.
+- **Style**:
+  - `snake_case` for variables and functions.  
+  - `CamelCase` for class and struct names.  
+  - `ALL_CAPS` for macros and compile-time constants.
+
+---
+
+## CUDA Guidelines
+
+- Separate **kernels** (`__global__` functions) from host orchestration code, but keep
+  them in the same module as the host code that launches them.
+- Name kernels descriptively, e.g. `compute_gradient_kernel`.
+- Document assumptions about:
+  - Thread/block layout
+  - Shared memory usage
+  - Synchronization requirements
+- Use `__restrict__` and `constexpr` where appropriate for performance and clarity.
+- Avoid writing kernels that use local memory; keep variables in registers and shared
+  memory as much as possible. To achieve this, prefer focused kernels over complex ones.
+- To structure larger kernels, use helper functions declared
+  `static inline __device__`. Ensure that callers and helpers are compiled in the same translation unit.
+- Avoid dynamic parallelism.
+- Always validate CUDA API calls.
+
+---
+
+## Threading Guidelines
+
+- **Host Threading**: Use `std::thread` and synchronization primitives from `<mutex>`.
+- **CUDA Streams**: Use multiple streams for concurrent kernel execution.
+- **Thread Safety**: Document thread safety guarantees for all public APIs.
+- **Avoid**: Raw pthreads or platform-specific threading APIs.
+
+---
+
+## Modularity and Organization
+
+- Keep code **organized by functionality** (e.g., detection, description, GPU utilities).
+- Avoid very long functions (>50 lines); refactor into helpers when possible.
+- Prefer **free functions** in namespaces over singletons or unnecessary wrapper classes.
+- Keep algorithms and data structures reusable when possible.
+
+---
+
+## Performance Guidelines
+
+- **Memory Access Patterns**: Prefer coalesced memory access in CUDA kernels. Document stride patterns.
+- **Shared Memory**: Use shared memory for data reuse within thread blocks. Document any expected bank conflicts.
+- **Register Usage**: Monitor register pressure in kernels. Aim for high occupancy.
+- **Asynchronous Operations**: Use CUDA streams for overlapping computation and memory transfers.
+- **Profiling**: Profile with Nsight Systems or Nsight Compute (`nvprof` only on older GPUs) before optimizing. Document performance assumptions.
+- **Memory Bandwidth**: Consider memory bandwidth as the primary bottleneck for most kernels.
+
+---
+
+## Documentation
+
+- Use **Doxygen-style comments** for public APIs, classes, and CUDA kernels.
+- Document algorithm choices and any CUDA-specific design tradeoffs.
+- Update examples and README when new features are introduced.
+- With each change, update the changelog following the [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format:
+  - For each new feature, bug fix, or breaking change, add a corresponding entry in the changelog.
+  - Keep the description short but informative, followed by the relevant PR link.
+
+---
+
+## Git Guidelines
+
+- **Branch Names**: `feature/description`, `fix/issue-number`, `refactor/component`.
+- **Commit Messages**: Prefix each message with a Conventional Commits type in bracketed form: `[feat]`, `[fix]`, `[refactor]`, `[doc]`, etc.
+- **File Organization**: Keep related files in logical directories.
+- **Ignore Patterns**: Update `.gitignore` for build artifacts and IDE files.
+
+---
+
+## Commit & PR Guidelines
+
+- Keep commits small and focused (one feature or fix per commit).
+- Do not commit untracked files that are not relevant.
+- PRs should include:
+  - Clear description of changes
+  - Explanations for algorithmic choices or CUDA-specific design decisions
+  - Updated tests or examples if applicable
+- Code must pass existing CI checks before merging.
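The **Error Handling** rules (exceptions on the host, checked and propagated CUDA error codes) and the CUDA rule to always validate API calls could be combined in one small checking helper. The following is a minimal host-only sketch, assuming a hypothetical `Status` enum and `fake_alloc()` call as stand-ins for `cudaError_t` and a real CUDA API call so that it compiles without the CUDA toolkit; `POPSIFT_CHECK`, `throw_on_error` and the `sketch` namespace are illustrative names, not existing PopSift code:

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>

namespace sketch {

// Hypothetical status type standing in for cudaError_t. In CUDA code the
// success value would be cudaSuccess and the message would come from
// cudaGetErrorString().
enum class Status { SUCCESS, OUT_OF_MEMORY, INVALID_VALUE };

// Hypothetical stand-in for cudaGetErrorString().
inline const char* status_string(Status status)
{
    switch (status) {
    case Status::SUCCESS: return "no error";
    case Status::OUT_OF_MEMORY: return "out of memory";
    case Status::INVALID_VALUE: return "invalid value";
    }
    return "unknown error";
}

// Turns a failed status into a host-side exception that records the call site.
inline void throw_on_error(Status status, const char* expr, const char* file, int line)
{
    if (status == Status::SUCCESS) {
        return;
    }
    std::ostringstream msg;
    msg << file << ':' << line << ": " << expr << " failed: " << status_string(status);
    throw std::runtime_error(msg.str());
}

// Hypothetical API call used only to exercise the check.
inline Status fake_alloc(std::size_t bytes)
{
    return bytes > 1024 ? Status::OUT_OF_MEMORY : Status::SUCCESS;
}

}  // namespace sketch

// Hypothetical macro name. Wrapping every API call keeps errors from being ignored.
#define POPSIFT_CHECK(call) ::sketch::throw_on_error((call), #call, __FILE__, __LINE__)
```

With the CUDA runtime, the same macro would compare against `cudaSuccess` and build the message from `cudaGetErrorString()`.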
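The **Memory Management** rules (RAII and smart pointers on the host, no smart pointers passed to `__global__` functions) can be illustrated with `std::unique_ptr` and a custom deleter. This is a sketch under stated assumptions: `fake_device_alloc()` and `fake_device_free()` are hypothetical host stand-ins for `cudaMalloc()`/`cudaFree()`, and `fill_stand_in()` is a host function playing the role of a kernel that receives only the raw pointer:

```cpp
#include <cstddef>
#include <memory>

namespace sketch {

// Hypothetical allocator pair standing in for cudaMalloc()/cudaFree() so the
// sketch runs on the host; live_buffers counts outstanding allocations.
inline int live_buffers = 0;

inline float* fake_device_alloc(std::size_t count)
{
    float* ptr = new float[count];
    ++live_buffers;  // counted only after the allocation succeeded
    return ptr;
}

inline void fake_device_free(float* ptr) noexcept
{
    --live_buffers;
    delete[] ptr;
}

// Stateless deleter: the buffer is released exactly once, including when an
// exception unwinds the stack.
struct DeviceDeleter {
    void operator()(float* ptr) const noexcept { fake_device_free(ptr); }
};

using DeviceBuffer = std::unique_ptr<float, DeviceDeleter>;

inline DeviceBuffer make_device_buffer(std::size_t count)
{
    return DeviceBuffer(fake_device_alloc(count));
}

// Host stand-in for a kernel launch: kernels take the raw pointer from get(),
// never the smart pointer itself.
inline void fill_stand_in(float* data, std::size_t count, float value)
{
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = value;
    }
}

}  // namespace sketch
```

Using `std::unique_ptr` rather than `std::shared_ptr` keeps ownership with a single host object, which sidesteps the host/device reference-count problem the guide mentions.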
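The **Style** and **Namespaces** bullets, together with the preference for `constexpr` and `enum class`, might look like this in practice. All names here (`sketch`, `MAX_OCTAVES`, `BorderMode`, `ImageSize`, `octave_size`, `resolve_coordinate`) are hypothetical, and writing `enum class` enumerators in ALL_CAPS is one reading of the "compile-time constants" rule rather than something the guide states explicitly:

```cpp
// In a real header, this file would begin with #pragma once.
namespace sketch {

// ALL_CAPS for compile-time constants (hypothetical limit).
constexpr int MAX_OCTAVES = 8;

// CamelCase type names; enum class keeps enumerators scoped.
enum class BorderMode { CLAMP, WRAP };

struct ImageSize {
    int width;
    int height;
};

// snake_case free function instead of a singleton: each octave halves both
// dimensions (integer division), capped at MAX_OCTAVES halvings.
constexpr ImageSize octave_size(ImageSize base, int octave)
{
    for (int i = 0; i < octave && i < MAX_OCTAVES; ++i) {
        base.width /= 2;
        base.height /= 2;
    }
    return base;
}

// Maps a possibly out-of-range coordinate into [0, size); requires size > 0.
constexpr int resolve_coordinate(int x, int size, BorderMode mode)
{
    if (mode == BorderMode::CLAMP) {
        return x < 0 ? 0 : (x >= size ? size - 1 : x);
    }
    const int wrapped = x % size;
    return wrapped < 0 ? wrapped + size : wrapped;
}

}  // namespace sketch
```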
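For the **Threading Guidelines**, a minimal sketch of `std::thread` workers sharing a `std::mutex`-protected accumulator, with its thread-safety guarantee stated in the Doxygen comment as the guide asks for public APIs. `ResultAccumulator` and `parallel_sum` are hypothetical names chosen for illustration:

```cpp
#include <mutex>
#include <thread>
#include <vector>

namespace sketch {

/// Accumulates partial results produced by worker threads.
/// Thread safety: add() and total() may be called concurrently from any thread.
class ResultAccumulator {
public:
    void add(long long value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        total_ += value;
    }

    long long total() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return total_;
    }

private:
    mutable std::mutex mutex_;
    long long total_ = 0;
};

/// Sums 0..n-1 by striding across thread_count workers (thread_count > 0).
inline long long parallel_sum(long long n, int thread_count)
{
    ResultAccumulator accumulator;
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&accumulator, n, t, thread_count] {
            long long partial = 0;
            for (long long i = t; i < n; i += thread_count) {
                partial += i;
            }
            accumulator.add(partial);  // one locked update per thread
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return accumulator.total();
}

}  // namespace sketch
```

Each worker takes the lock once, after computing its partial sum, so contention stays low; locking per element would also be correct but would serialize most of the work.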