The problem
As highlighted in this comment, relying solely on self-assessments isn’t scalable. Navigating through a sea of misleading or fake health signals is challenging. We need a mechanism to (1) filter out spam and irrelevant information and (2) reliably rank popularity and emerging trends.
Solution
Why not apply some tried-and-true signal processing techniques to see if they can cut through the noise?
My plan is to integrate a Kalman Filter-based algorithm into Tribler to estimate torrent health and filter out dead torrents based on seeder reports. At the moment, I have a prototype that uses the filterpy library, specifically its Unscented Kalman Filter (UKF) implementation. This algorithm lets us combine seeder reports from various peers while accounting for measurement noise and adjusting for the reliability scores of different sources. It is also pretty fast to run.
To adapt to the dynamic nature of torrent networks, I have made a few adjustments:
Torrent health checks, performed at different time intervals, are considered reliable only to a certain degree, and the model includes mechanisms to estimate how likely a torrent's health is to have changed over time.
Outliers in health reports are defined as values lying outside a 95-99% confidence interval.
If a peer consistently provides unreliable reports, its reputation is decreased drastically. If a report seems valid, the peer's reputation score is slightly increased.
These reputation scores are then incorporated as weights in the predict_health function, which computes the current best estimate of torrent health at a given timestamp.
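The adjustments above can be sketched in a few lines. This is not the prototype's code: it swaps the UKF for a scalar Kalman filter to stay short, and every name and constant except predict_health (HealthEstimator, report, drift_per_sec, base_noise, gate_z, the 0.5 and 0.05 reputation steps) is a hypothetical choice made for illustration. It also penalizes every single outlier, a simplification of the "consistently unreliable" rule.

```python
import math
from dataclasses import dataclass, field


@dataclass
class HealthEstimator:
    """Scalar Kalman filter over a torrent's seeder count (illustrative only)."""
    seeders: float = 0.0          # current estimate
    variance: float = 100.0       # uncertainty of the estimate
    last_ts: float = 0.0          # timestamp of the last accepted report (seconds)
    drift_per_sec: float = 0.01   # assumed variance growth per second
    base_noise: float = 4.0       # assumed measurement variance of a fully trusted peer
    gate_z: float = 1.96          # roughly a 95% two-sided confidence interval
    reputation: dict = field(default_factory=dict)

    def predict_health(self, ts):
        """Return (estimate, variance) at time ts; stale estimates are less certain."""
        elapsed = max(0.0, ts - self.last_ts)
        return self.seeders, self.variance + self.drift_per_sec * elapsed

    def report(self, peer, ts, seeders):
        """Fuse one seeder report; return True if it was accepted."""
        rep = self.reputation.get(peer, 0.5)
        mean, var = self.predict_health(ts)
        # Reputation acts as a weight: a low score inflates the measurement noise.
        noise = self.base_noise / max(rep, 0.05)
        innovation = seeders - mean
        z = abs(innovation) / math.sqrt(var + noise)
        if z > self.gate_z:
            # Outlier: reject the report and cut the reputation drastically.
            self.reputation[peer] = rep * 0.5
            return False
        gain = var / (var + noise)
        self.seeders = mean + gain * innovation
        self.variance = (1.0 - gain) * var
        self.last_ts = ts
        # Plausible report: raise the reputation slightly.
        self.reputation[peer] = min(1.0, rep + 0.05)
        return True


est = HealthEstimator()
est.report("peer_a", 10.0, 12)
est.report("peer_b", 20.0, 11)
est.report("spammer", 30.0, 5000)   # far outside the gate, so it is rejected
print(est.predict_health(60.0), est.reputation)
```

Dividing the measurement noise by the reputation is only one way to turn a score into a weight; the prototype may well do this differently.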
Development plan:
Integrate the current prototype into the Tribler client and run it locally to test its effectiveness using real network health checks. Evaluate how adequate the algorithm is.
Numerical examples with real data. Performance analysis.
Refactor the Kalman Filter to use only numpy, removing the scipy dependency to keep the solution lightweight (a scipy dependency is too much).
Experimental release
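For the numpy-only refactoring step, a rough idea of how small such a filter can be. This is a plain linear Kalman filter, not a port of the UKF from filterpy, and the class name NumpyKalman, the two-component state (seeder count and its rate of change), and the noise values q and r are all assumptions made for this sketch.

```python
import numpy as np


class NumpyKalman:
    """Linear Kalman filter, numpy only. State: [seeders, seeders_per_second]."""

    def __init__(self, q=0.01, r=4.0):
        self.x = np.zeros(2)                 # state estimate
        self.P = np.eye(2) * 100.0           # state covariance (vague prior)
        self.q = q                           # assumed process-noise intensity
        self.r = r                           # assumed default measurement variance
        self.H = np.array([[1.0, 0.0]])      # only the seeder count is observed

    def predict(self, dt):
        """Propagate the state dt seconds forward; uncertainty grows with dt."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        Q = self.q * dt * np.eye(2)
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q

    def update(self, z, r=None):
        """Fuse one seeder report z; r optionally overrides the measurement variance."""
        R = np.array([[self.r if r is None else r]])
        y = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + R           # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ y
        # Joseph form keeps P symmetric and positive semi-definite.
        I_KH = np.eye(2) - K @ self.H
        self.P = I_KH @ self.P @ I_KH.T + K @ R @ K.T


kf = NumpyKalman()
for report in [10, 11, 9, 10, 10]:   # one report per second
    kf.predict(1.0)
    kf.update(report)
print(kf.x, kf.P)
```

The per-call r argument is where a per-peer reliability weight could plug in, for example by passing a larger variance for low-reputation peers.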