
Commit 5c47dfa

chore: update confs
1 parent b8f6d20 commit 5c47dfa

File tree

1 file changed (+28, -0)


arxiv.json

Lines changed: 28 additions & 0 deletions
@@ -29839,5 +29839,33 @@
 "pub_date": "2025-04-25",
 "summary": "Deep neural networks (DNNs) have proven to be successful in various computer\nvision applications such that models even infer in safety-critical situations.\nTherefore, vision models have to behave in a robust way to disturbances such as\nnoise or blur. While seminal benchmarks exist to evaluate model robustness to\ndiverse corruptions, blur is often approximated in an overly simplistic way to\nmodel defocus, while ignoring the different blur kernel shapes that result from\noptical systems. To study model robustness against realistic optical blur\neffects, this paper proposes two datasets of blur corruptions, which we denote\nOpticsBench and LensCorruptions. OpticsBench examines primary aberrations such\nas coma, defocus, and astigmatism, i.e. aberrations that can be represented by\nvarying a single parameter of Zernike polynomials. To go beyond the principled\nbut synthetic setting of primary aberrations, LensCorruptions samples linear\ncombinations in the vector space spanned by Zernike polynomials, corresponding\nto 100 real lenses. Evaluations for image classification and object detection\non ImageNet and MSCOCO show that for a variety of different pre-trained models,\nthe performance on OpticsBench and LensCorruptions varies significantly,\nindicating the need to consider realistic image corruptions to evaluate a\nmodel's robustness against blur.",
 "translated": ""
+},
+{
+"title": "CompleteMe: Reference-based Human Image Completion",
+"url": "http://arxiv.org/abs/2504.20042v1",
+"pub_date": "2025-04-28",
+"summary": "Recent methods for human image completion can reconstruct plausible body\nshapes but often fail to preserve unique details, such as specific clothing\npatterns or distinctive accessories, without explicit reference images. Even\nstate-of-the-art reference-based inpainting approaches struggle to accurately\ncapture and integrate fine-grained details from reference images. To address\nthis limitation, we propose CompleteMe, a novel reference-based human image\ncompletion framework. CompleteMe employs a dual U-Net architecture combined\nwith a Region-focused Attention (RFA) Block, which explicitly guides the\nmodel's attention toward relevant regions in reference images. This approach\neffectively captures fine details and ensures accurate semantic correspondence,\nsignificantly improving the fidelity and consistency of completed images.\nAdditionally, we introduce a challenging benchmark specifically designed for\nevaluating reference-based human image completion tasks. Extensive experiments\ndemonstrate that our proposed method achieves superior visual quality and\nsemantic consistency compared to existing techniques. Project page:\nhttps://liagm.github.io/CompleteMe/",
+"translated": ""
+},
+{
+"title": "Learning Streaming Video Representation via Multitask Training",
+"url": "http://arxiv.org/abs/2504.20041v1",
+"pub_date": "2025-04-28",
+"summary": "Understanding continuous video streams plays a fundamental role in real-time\napplications including embodied AI and autonomous driving. Unlike offline video\nunderstanding, streaming video understanding requires the ability to process\nvideo streams frame by frame, preserve historical information, and make\nlow-latency decisions. To address these challenges, our main contributions are\nthree-fold. (i) We develop a novel streaming video backbone, termed as\nStreamFormer, by incorporating causal temporal attention into a pre-trained\nvision transformer. This enables efficient streaming video processing while\nmaintaining image representation capability. (ii) To train StreamFormer, we\npropose to unify diverse spatial-temporal video understanding tasks within a\nmultitask visual-language alignment framework. Hence, StreamFormer learns\nglobal semantics, temporal dynamics, and fine-grained spatial relationships\nsimultaneously. (iii) We conduct extensive experiments on online action\ndetection, online video instance segmentation, and video question answering.\nStreamFormer achieves competitive results while maintaining efficiency,\ndemonstrating its potential for real-time applications.",
+"translated": ""
+},
+{
+"title": "MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion",
+"url": "http://arxiv.org/abs/2504.20040v1",
+"pub_date": "2025-04-28",
+"summary": "While Structure-from-Motion (SfM) has seen much progress over the years,\nstate-of-the-art systems are prone to failure when facing extreme viewpoint\nchanges in low-overlap, low-parallax or high-symmetry scenarios. Because\ncapturing images that avoid these pitfalls is challenging, this severely limits\nthe wider use of SfM, especially by non-expert users. We overcome these\nlimitations by augmenting the classical SfM paradigm with monocular depth and\nnormal priors inferred by deep neural networks. Thanks to a tight integration\nof monocular and multi-view constraints, our approach significantly outperforms\nexisting ones under extreme viewpoint changes, while maintaining strong\nperformance in standard conditions. We also show that monocular priors can help\nreject faulty associations due to symmetries, which is a long-standing problem\nfor SfM. This makes our approach the first capable of reliably reconstructing\nchallenging indoor environments from few images. Through principled uncertainty\npropagation, it is robust to errors in the priors, can handle priors inferred\nby different models with little tuning, and will thus easily benefit from\nfuture progress in monocular depth and normal estimation. Our code is publicly\navailable at https://github.com/cvg/mpsfm.",
+"translated": ""
+},
+{
+"title": "Mitigating Catastrophic Forgetting in the Incremental Learning of\n Medical Images",
+"url": "http://arxiv.org/abs/2504.20033v1",
+"pub_date": "2025-04-28",
+"summary": "This paper proposes an Incremental Learning (IL) approach to enhance the\naccuracy and efficiency of deep learning models in analyzing T2-weighted (T2w)\nMRI medical images for prostate cancer detection using the PI-CAI dataset. We used\nmultiple health centers' artificial intelligence and radiology data, focused on\ndifferent tasks that looked at prostate cancer detection using MRI (PI-CAI). We\nutilized Knowledge Distillation (KD), as it employs generated images from past\ntasks to guide the training of models for subsequent tasks. The approach\nyielded improved performance and faster convergence of the models. To\ndemonstrate the versatility and robustness of our approach, we evaluated it on\nthe PI-CAI dataset, a diverse set of medical imaging modalities including OCT\nand PathMNIST, and the benchmark continual learning dataset CIFAR-10. Our\nresults indicate that KD can be a promising technique for IL in medical image\nanalysis in which data is sourced from individual health centers and the\nstorage of large datasets is not feasible. By using generated images from prior\ntasks, our method enables the model to retain and apply previously acquired\nknowledge without direct access to the original data.",
+"translated": ""
 }
 ]
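Each record added by this commit uses the same fields as the existing entries in arxiv.json: title, url, pub_date, summary, and translated. As a minimal, hypothetical sketch (the reader script below is not part of this repository; the file path and the filtering step are assumptions), entries of this shape could be loaded and filtered by publication date like so:

```python
import json
from datetime import date

# Hypothetical consumer of arxiv.json -- the field names match the records in
# this diff, but the path and the date filter are illustrative assumptions.
def load_entries(path="arxiv.json"):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def entries_on(entries, day):
    # pub_date is stored as an ISO "YYYY-MM-DD" string in each record.
    return [e for e in entries if e.get("pub_date") == day.isoformat()]

if __name__ == "__main__":
    for e in entries_on(load_entries(), date(2025, 4, 28)):
        print(e["title"], "-", e["url"])
```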
