Integrate Kalman Filter-based Torrent Health Estimation #8188

grimadas · 2024-10-03T11:10:14Z

The problem

As highlighted in this comment, relying solely on self-assessments isn’t scalable. Navigating through a sea of misleading or fake health signals is challenging. We need a mechanism to (1) filter out spam and irrelevant information and (2) reliably rank popularity and emerging trends.

Solution

Why not apply some tried-and-true signal processing techniques to see if they can cut through the noise?

My plan is to integrate a Kalman Filter-based algorithm into Tribler to estimate torrent health and filter out dead torrents based on seeder reports. At the moment, I have a prototype built on the filterpy library, specifically its Unscented Kalman Filter (UKF) implementation. The algorithm lets us combine seeder reports from various peers while accounting for measurement noise and weighting each source by its reliability score. It is also fast to run.
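To illustrate the fusion idea, here is a minimal sketch, not the prototype itself. It uses a plain scalar (linear) Kalman update instead of the UKF from filterpy, and the names `ScalarKalman`, `fuse_reports`, and `base_noise`, as well as the rule "measurement variance = base_noise / reliability", are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter state: seeder estimate and its variance."""
    x: float  # estimated number of seeders
    p: float  # variance of that estimate

    def update(self, z: float, r: float) -> None:
        """Fold in one measurement z with measurement variance r."""
        k = self.p / (self.p + r)   # Kalman gain: how much to trust z
        self.x += k * (z - self.x)  # move the estimate towards z
        self.p *= (1.0 - k)         # uncertainty shrinks after each update


def fuse_reports(prior_mean, prior_var, reports, base_noise=25.0):
    """Fuse (seeders, reliability) reports into one estimate.

    Assumption for this sketch: a report from a source with reliability
    in (0, 1] gets measurement variance base_noise / reliability, so
    unreliable sources barely move the estimate.
    """
    kf = ScalarKalman(prior_mean, prior_var)
    for seeders, reliability in reports:
        r = base_noise / max(reliability, 1e-6)
        kf.update(seeders, r)
    return kf.x, kf.p


# A trusted report of 100 seeders and a barely trusted report of 10.
mean, var = fuse_reports(50.0, 1e4, [(100, 1.0), (10, 0.01)])
print(f"estimated seeders: {mean:.1f}, variance: {var:.1f}")
```

The low-reliability report pulls the estimate down only slightly, which is the behaviour we want when combining self-reported values of unknown quality.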

To adapt to the dynamic nature of torrent networks, I have made a few adjustments:

Torrent health checks are performed at different time intervals, so each one is considered reliable only to a certain degree. The model estimates how likely the torrent's health is to have changed since the last check.
Outliers in health reports are defined as values lying outside a 95-99% confidence interval.
If a peer consistently provides unreliable reports, its reputation is decreased drastically. If a report seems valid, the peer's reputation score is slightly increased.
These reputation scores are then used as weights in the predict_health function, which computes the current best estimate of torrent health at a given timestamp.
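The adjustments above can be sketched roughly as follows. This is a simplified scalar stand-in, not the prototype's code: only the name predict_health comes from the description above, and its signature here is a guess. The `HealthEstimator` class, the `drift` rate, the starting reputation of 0.5, and the penalty and reward sizes (halve on an outlier, add 0.05 on a valid report) are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass, field

GATE_Z = 2.576  # two-sided 99% bound of a normal distribution (1.96 gives 95%)


@dataclass
class HealthEstimator:
    x: float                  # estimated seeders at time t
    p: float                  # variance of that estimate
    t: float = 0.0            # timestamp of the last accepted report (seconds)
    drift: float = 0.01       # assumed variance growth per second without data
    base_noise: float = 25.0  # assumed variance of a fully trusted report
    reputation: dict = field(default_factory=dict)

    def predict_health(self, now):
        """Best estimate at `now`; uncertainty grows with time since last update."""
        dt = max(now - self.t, 0.0)
        return self.x, self.p + self.drift * dt

    def report(self, peer, seeders, now):
        """Process one report; return True if accepted, False if an outlier."""
        mean, var = self.predict_health(now)
        rep = self.reputation.get(peer, 0.5)
        innovation = seeders - mean
        # Gate with the unweighted noise so that a low reputation does not
        # widen the acceptance interval for that peer.
        if abs(innovation) > GATE_Z * math.sqrt(var + self.base_noise):
            self.reputation[peer] = rep * 0.5        # drastic decrease
            return False
        self.reputation[peer] = min(1.0, rep + 0.05)  # slight increase
        r = self.base_noise / max(rep, 1e-3)          # reputation as weight
        k = var / (var + r)
        self.x = mean + k * innovation
        self.p = (1.0 - k) * var
        self.t = now
        return True


est = HealthEstimator(x=100.0, p=100.0)
print(est.report("peer_a", 105, now=60))    # plausible report
print(est.report("peer_b", 5000, now=120))  # far outside the interval
print(est.predict_health(now=3600), est.reputation)
```

Rejected reports leave the estimate and its timestamp untouched, so the uncertainty keeps growing until a credible report arrives.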

Development plan:

Integrate the current prototype into the Tribler client and run it locally against real network health checks to evaluate how well the algorithm performs.
Numerical examples with real data, plus a performance analysis.
Refactor the Kalman Filter to use only numpy, removing the scipy dependency, which is too heavy for a lightweight solution.
Experimental release.

adlai · 2024-10-11T09:47:13Z

Why are both scipy and numpy together considered too much, if numpy alone is not?

qstokkink added type: enhancement component: content discovery labels Oct 4, 2024
