RIPE:
Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction
ICCV 2025
Johannes KΓΌnzel Β· Anna Hilsmann Β· Peter Eisert
RIPE demonstrates that keypoint detection and description can be learned from image pairs only - no depth, no pose, no artificial augmentation required.
💡 Alternative 💡 Install nothing locally and try our Hugging Face demo: 🤗 Demo 🤗
- Install mamba by following the instructions given here: Mamba Installation
- Create a new environment with:
mamba create -f conda_env.yml
mamba activate ripe-env
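If the environment was created successfully, the following quick check (a minimal sketch that only uses names appearing in demo.py below) should run without errors:

# Quick environment check: confirm that PyTorch and the ripe package import,
# and that the pretrained model can be constructed (CUDA is optional).
import torch
from ripe import vgg_hyper

print("CUDA available:", torch.cuda.is_available())
model = vgg_hyper()  # loads the default pretrained model, as in demo.py
print(type(model).__name__)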
Or just check demo.py:
import cv2
import kornia.feature as KF
import kornia.geometry as KG
import matplotlib.pyplot as plt
import numpy as np
import torch
from torchvision.io import decode_image
from ripe import vgg_hyper
from ripe.utils.utils import cv2_matches_from_kornia, resize_image, to_cv_kpts
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = vgg_hyper().to(dev)
model.eval()
image1 = resize_image(decode_image("assets/all_souls_000013.jpg").float().to(dev) / 255.0)
image2 = resize_image(decode_image("assets/all_souls_000055.jpg").float().to(dev) / 255.0)
kpts_1, desc_1, score_1 = model.detectAndCompute(image1, threshold=0.5, top_k=2048)
kpts_2, desc_2, score_2 = model.detectAndCompute(image2, threshold=0.5, top_k=2048)
matcher = KF.DescriptorMatcher("mnn") # threshold is not used with mnn
match_dists, match_idxs = matcher(desc_1, desc_2)
matched_pts_1 = kpts_1[match_idxs[:, 0]]
matched_pts_2 = kpts_2[match_idxs[:, 1]]
H, mask = KG.ransac.RANSAC(model_type="fundamental", inl_th=1.0)(matched_pts_1, matched_pts_2)
matchesMask = mask.int().ravel().tolist()
result_ransac = cv2.drawMatches(
(image1.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
to_cv_kpts(kpts_1, score_1),
(image2.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
to_cv_kpts(kpts_2, score_2),
cv2_matches_from_kornia(match_dists, match_idxs),
None,
matchColor=(0, 255, 0),
matchesMask=matchesMask,
# matchesMask=None, # without RANSAC filtering
singlePointColor=(0, 0, 255),
flags=cv2.DrawMatchesFlags_DEFAULT,
)
plt.imshow(result_ransac)
plt.axis("off")
plt.tight_layout()
plt.show()
# plt.savefig("result_ransac.png")
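To keep only the geometrically verified correspondences for downstream use, the RANSAC inlier mask can be applied directly to the matched keypoints. A minimal sketch, reusing the variables from the demo above:

# Keep only RANSAC inliers and convert them to NumPy, e.g. for saving or pose estimation.
inliers = mask.ravel().bool()
inlier_pts_1 = matched_pts_1[inliers].cpu().numpy()
inlier_pts_2 = matched_pts_2[inliers].cpu().numpy()
print(f"{int(inliers.sum())} / {len(inliers)} matches survived RANSAC")
np.save("inlier_pts_1.npy", inlier_pts_1)
np.save("inlier_pts_2.npy", inlier_pts_2)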
- Download and install Glue Factory
- Add this repo as a submodule to Glue Factory:
cd glue-factory
git submodule add https://github.com/fraunhoferhhi/RIPE.git thirdparty/ripe
- Create the new file ripe.py under gluefactory/models/extractors/ with the following content:
ripe.py
import sys
from pathlib import Path

import torch
import torchvision.transforms as transforms

from ..base_model import BaseModel

ripe_path = Path(__file__).parent / "../../../thirdparty/ripe"

print(f"RIPE Path: {ripe_path.resolve()}")
# check if the path exists
if not ripe_path.exists():
    raise RuntimeError(f"RIPE path not found: {ripe_path}")

sys.path.append(str(ripe_path))

from ripe import vgg_hyper


class RIPE(BaseModel):
    default_conf = {
        "name": "RIPE",
        "model_path": None,
        "chunk": 4,
        "dense_outputs": False,
        "threshold": 1.0,
        "top_k": 2048,
    }

    required_data_keys = ["image"]

    # Initialize the RIPE extractor
    def _init(self, conf):
        self.normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        self.model = vgg_hyper(model_path=conf.model_path)
        self.model.eval()

        self.set_initialized()

    def _forward(self, data):
        image = data["image"]

        keypoints, scores, descriptors = [], [], []

        chunk = self.conf.chunk
        for i in range(0, image.shape[0], chunk):
            if self.conf.dense_outputs:
                raise NotImplementedError("Dense outputs are not supported")
            else:
                im = image[i : min(image.shape[0], i + chunk)]
                im = self.normalizer(im)

                H, W = im.shape[-2:]

                kpt, desc, score = self.model.detectAndCompute(
                    im,
                    threshold=self.conf.threshold,
                    top_k=self.conf.top_k,
                )
            keypoints += [kpt.squeeze(0)]
            scores += [score.squeeze(0)]
            descriptors += [desc.squeeze(0)]

            del kpt
            del desc
            del score

        keypoints = torch.stack(keypoints, 0)
        scores = torch.stack(scores, 0)
        descriptors = torch.stack(descriptors, 0)

        pred = {
            # "keypoints": keypoints.to(image) + 0.5,
            "keypoints": keypoints.to(image),
            "keypoint_scores": scores.to(image),
            "descriptors": descriptors.to(image),
        }

        return pred

    def loss(self, pred, data):
        raise NotImplementedError
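Optionally, the wrapper can be smoke-tested outside of a full evaluation run. The following sketch assumes Glue Factory's BaseModel accepts a plain config dict and dispatches calls to _forward, as the other extractors in gluefactory/models/extractors/ do:

# Hypothetical smoke test for the RIPE wrapper (run from the glue-factory root).
import torch
from gluefactory.models.extractors.ripe import RIPE

extractor = RIPE({"threshold": 0.5, "top_k": 512}).eval()
data = {"image": torch.rand(1, 3, 480, 640)}  # dummy RGB batch in [0, 1]
with torch.no_grad():
    pred = extractor(data)
print(pred["keypoints"].shape, pred["descriptors"].shape)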
- Create ripe+NN.yaml in gluefactory/configs with the following content:
ripe+NN.yaml
model:
  name: two_view_pipeline
  extractor:
    name: extractors.ripe
    threshold: 1.0
    top_k: 2048
  matcher:
    name: matchers.nearest_neighbor_matcher
benchmarks:
  megadepth1500:
    data:
      preprocessing:
        side: long
        resize: 1600
    eval:
      estimator: poselib
      ransac_th: 0.5
  hpatches:
    eval:
      estimator: poselib
      ransac_th: 0.5
    model:
      extractor:
        top_k: 1024  # overwrite config above
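Note that top_k: 1024 under the hpatches benchmark overrides the top_k: 2048 set on the extractor at the top of the file. A minimal sketch of this override behaviour, assuming the usual OmegaConf merge semantics that Glue Factory builds on:

# Later merges win: the benchmark-specific value replaces the base extractor value.
from omegaconf import OmegaConf

base = OmegaConf.create({"extractor": {"name": "extractors.ripe", "threshold": 1.0, "top_k": 2048}})
hpatches_override = OmegaConf.create({"extractor": {"top_k": 1024}})
merged = OmegaConf.merge(base, hpatches_override)
print(merged.extractor.top_k)  # 1024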
- Run the MegaDepth 1500 evaluation script:
python -m gluefactory.eval.megadepth1500 --conf ripe+NN # for MegaDepth 1500
Should result in:
'rel_pose_error@10Β°': 0.6834,
'rel_pose_error@20Β°': 0.7803,
'rel_pose_error@5Β°': 0.5511,
- Run the HPatches evaluation script:
python -m gluefactory.eval.hpatches --conf ripe+NN # for HPatches
Should result in:
'H_error_ransac@1px': 0.3793,
'H_error_ransac@3px': 0.5893,
'H_error_ransac@5px': 0.692,
- Create a .env file with the following content:
OUTPUT_DIR="/output"
DATA_DIR="/data"
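These two variables tell the code where to write results and where to find the datasets. A minimal sketch of how such a .env file is typically consumed (assuming python-dotenv; the exact loading mechanism in this repository may differ):

# Hypothetical example of reading the .env values with python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
output_dir = os.environ["OUTPUT_DIR"]  # e.g. /output
data_dir = os.environ["DATA_DIR"]      # e.g. /data
print(output_dir, data_dir)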
- Download the required datasets:
DISK Megadepth subset
To download the dataset used by DISK execute the following commands:
cd data
bash download_disk_data.sh
Tokyo 24/7
⚠️ Optional: Only if you are interested in the model used in Section 4.6 of the paper!
- Download the Tokyo 24/7 query images from the official website: Tokyo 24/7 Query Images V3
- Extract them into data/Tokyo_Query_V3
Tokyo_Query_V3/
├── 00001.csv
├── 00001.jpg
├── 00002.csv
├── 00002.jpg
├── ...
├── 01125.csv
├── 01125.jpg
├── Readme.txt
└── Readme.txt~
ACDC
⚠️ Optional: Only if you are interested in the model used in Section 6.1 (supplementary) of the paper!
- Download the RGB images from here: ACDC RGB Images
- Extract them into data/ACDC
ACDC/
rgb_anon
├── fog
│   ├── test
│   │   ├── GOPR0475
│   │   └── GOPR0477
│   ├── test_ref
│   │   ├── GOPR0475
│   │   └── GOPR0477
│   └── train
│       ├── GOPR0475
│       └── GOPR0476
└── night
- Run the training script:
python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline
You can also easily switch settings from the command line, e.g. to additionally train on the Tokyo 24/7 dataset:
python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline data=megadepth+tokyo
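Since ripe/train.py is invoked with Hydra-style key=value overrides, the fully composed configuration can be printed without starting a run by appending Hydra's --cfg job flag (a hedged suggestion, assuming the script is a standard Hydra application):

python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline data=megadepth+tokyo --cfg job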
Our code is partly based on the following repositories:
Our evaluation was based on the following repositories:
We would like to thank the authors of these repositories for their great work and for making their code available.
Our project webpage is based on the Academic Project Page Template by Eliahu Horwitz.
@article{ripe2025,
year = {2025},
title = {{RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction}},
author = {KΓΌnzel, Johannes and Hilsmann, Anna and Eisert, Peter},
journal = {arXiv},
eprint = {2507.04839},
}