Fine-tuning for Object Detection and Classify Scenes in Fixed-Camera Settings #1079

n-kato-zh · 2025-05-15T04:24:42Z

Thank you for your excellent research and for sharing your work.

I am currently working on an approach in which I detect objects in footage from a fixed camera and then classify the bounding-box crops with CLIP. While the base model provides a reasonable level of accuracy, I would like to fine-tune it so that it adapts better to the characteristics of the objects appearing in this specific camera feed.

Currently, I apply a CNN classifier with categories such as "person" and "other." However, the "other" class covers a variety of miscellaneous elements, such as shadows and structures. As a first step, I tried fine-tuning the model with all of these grouped under a single "scenery" label.

However, perhaps because "scenery" is so broad and ambiguous, the fine-tuned model predicted the "scenery" class more often than the base model did. As a result, the precision for the "scenery" class decreased, and the recall for the "person" class also dropped.
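To make the comparison above concrete, here is a minimal sketch of how per-class precision and recall could be computed for the base and fine-tuned models. The function name `per_class_precision_recall` and the toy labels are assumptions for illustration only; they are not results from the actual camera feed.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted, but the true label was different
            fn[t] += 1  # a true t was missed
    metrics = {}
    for c in sorted(set(y_true) | set(y_pred)):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        metrics[c] = (precision, recall)
    return metrics


# Hypothetical toy data: the model over-predicts "scenery".
y_true = ["person", "person", "person", "scenery", "scenery", "scenery"]
y_pred = ["person", "scenery", "scenery", "scenery", "scenery", "scenery"]
metrics = per_class_precision_recall(y_true, y_pred)
```

In this toy case every extra "scenery" prediction on a person crop counts both as a false positive for "scenery" and as a false negative for "person", which is the same pattern as the one described: lower "scenery" precision together with lower "person" recall.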

Based on this observation, I am now considering a different approach: instead of lumping everything into a single "scenery" class, I plan to assign more specific positive class labels like "shadow" and "pipe," and use these as textual labels for training.
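One way the fine-grained labels could still feed the original "person" versus "scenery" decision is to score each crop against every specific text label and then pool the probabilities per coarse class. The sketch below assumes the image-text similarity logits have already been computed by the model; the mapping `FINE_TO_COARSE`, the function names, and the example scores are hypothetical.

```python
import math

# Hypothetical mapping from specific text labels to coarse classes.
FINE_TO_COARSE = {
    "person": "person",
    "shadow": "scenery",
    "pipe": "scenery",
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_probabilities(similarities, fine_to_coarse=FINE_TO_COARSE):
    """Pool fine-label probabilities into coarse classes.

    similarities: {fine_label: image-text similarity logit} for one crop.
    """
    labels = list(similarities)
    probs = softmax([similarities[label] for label in labels])
    coarse = {}
    for label, p in zip(labels, probs):
        key = fine_to_coarse[label]
        coarse[key] = coarse.get(key, 0.0) + p
    return coarse


# Made-up logits for a single crop, for illustration only.
example = coarse_probabilities({"person": 2.0, "shadow": 2.0, "pipe": 0.0})
```

Note that summing probabilities favors whichever coarse class has more fine-grained labels, so adding many "scenery" sub-labels can reintroduce the bias toward "scenery". Taking the maximum per coarse class instead of the sum is one alternative worth comparing.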

If you have any suggestions or alternative ideas, I would greatly appreciate your advice.

Replies: 0 comments
