Ban socket addresses not sending a valid connection ID #1096
Comments
Here's a comparison of three Rust crates that implement Counting Bloom Filters:
Notes:
Data is based on available information as of December 2024.
Hi @da2ce7, I think I'm going to implement it in two phases: first the IPs and then the subnets. NOTES/QUESTIONS:
…imit If the client sends a wrong connection ID more than 10 times, it is banned. In this first implementation, banned clients are only unbanned when the tracker is restarted.
Hi @da2ce7, I've implemented the minimal solution here. When should we unban an IP? I think we can unban all IPs every 24 hours. (I have more questions in my previous comment ☝🏼.)
…tured Instead of capturing the mapped error in the caller function, when the error has already been converted into a UDP error response. This avoids parsing the error message to filter the errors we are interested in.
Running the cleaner check on each iteration decreased the UDP tracker performance.
Today, we discussed this issue in our weekly meeting. @da2ce7 said the false-positive rate is too high. @da2ce7, I forgot to mention that you can set the rate: The frequency of false positives can be precisely bounded by setting the size of the filter, and is called the False Positive Rate. See https://docs.rs/bloom/latest/bloom/index.html#bloom-filters. @da2ce7 also commented that there has been an open issue for a long time about the high false-positive rate: We were also considering another crate, which I discarded because it does not have an implementation of a Counting Bloom Filter. The crate: https://github.com/tomtomwombat/fastbloom NOTE: It has the same name as the other one I was analyzing, but they are actually two different packages:
I've opened a new issue to ask them for their plans to add a Counting Bloom Filter feature. We also discussed alternative implementations to remove false positives. I will describe the solution in a new comment.
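To make the relationship between filter size and false-positive rate concrete, here is a minimal sketch using the textbook sizing formulas for a plain Bloom filter. The function names are hypothetical and this is not the `bloom` crate's API; a counting filter also spends several bits per slot, so its real memory use is a multiple of the slot count shown here.

```rust
use std::f64::consts::LN_2;

/// Textbook estimate of the number of slots needed to hold `expected_items`
/// at the target false-positive rate: m = -n * ln(p) / (ln 2)^2.
fn optimal_slots(expected_items: u64, false_positive_rate: f64) -> u64 {
    let n = expected_items as f64;
    (-n * false_positive_rate.ln() / (LN_2 * LN_2)).ceil() as u64
}

/// Textbook estimate of the number of hash functions: k = (m / n) * ln 2.
/// Always returns at least one hash function.
fn optimal_hashes(slots: u64, expected_items: u64) -> u32 {
    ((slots as f64 / expected_items as f64) * LN_2).round().max(1.0) as u32
}
```

Lowering the target rate or raising the expected number of items both grow the filter, which is the trade-off discussed in the meeting.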
1. Alternative Implementation: Two-Tiered Approach
The basic idea is to use the CBF just as a fast filter to detect potential bad actors. If the counter for an IP goes over a threshold, we don't ban the IP directly. Instead, we add the IP to a reliable secondary list with a HashMap.
Counting Bloom Filter as a Fast Filter:
Reliable Backend Structure:
False Positive Handling:
Unban Handling:
Pros
Cons
Questions
In the new implementation, add a new metric to the tracker stats for the number of banned IPs.
We are using a Counting Bloom Filter (CBF) to count IPs sending wrong connection IDs. IPs are banned after sending 10 wrong connection IDs. CBFs are fast and use little memory, but they are also inaccurate. They have false positives, meaning some IPs would be banned only because of bucket collisions (IPs sharing the same counter). To avoid banning IPs incorrectly, we decided to introduce a second counter, a HashMap that counts errors exactly. IPs are only banned when this counter reaches the limit. We keep the CBF as a first-level filter: it's a fast check to filter IPs without affecting the tracker's performance. When an IP is banned according to the first filter, we start a counter for that IP in the second, exact counter. This solution should be good if the number of IPs is low. We have to find another solution anyway for IPv6, where it is cheaper to own a range of IPs.
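The two-level idea described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the tracker's actual code: `ApproxCounter` is a hypothetical, simplified stand-in for a counting Bloom filter built only on the standard library (it is not the `bloom` crate), and `TwoTierBanList`, the 1024-slot size, and the 4 hash functions are made-up names and values.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::net::IpAddr;

/// Hypothetical stand-in for a counting Bloom filter: an array of saturating
/// counters addressed by several seeded hashes. It may overestimate a count
/// (collisions) but never underestimates it below the u8 saturation point.
struct ApproxCounter {
    counters: Vec<u8>,
    num_hashes: u64,
}

impl ApproxCounter {
    /// `size` must be greater than zero.
    fn new(size: usize, num_hashes: u64) -> Self {
        Self { counters: vec![0; size], num_hashes }
    }

    /// Distinct counter slots this IP maps to.
    fn slots(&self, ip: &IpAddr) -> Vec<usize> {
        let mut slots: Vec<usize> = (0..self.num_hashes)
            .map(|seed| {
                let mut hasher = DefaultHasher::new();
                seed.hash(&mut hasher);
                ip.hash(&mut hasher);
                (hasher.finish() % self.counters.len() as u64) as usize
            })
            .collect();
        slots.sort_unstable();
        slots.dedup();
        slots
    }

    fn increment(&mut self, ip: &IpAddr) {
        for slot in self.slots(ip) {
            self.counters[slot] = self.counters[slot].saturating_add(1);
        }
    }

    /// The smallest counter among the IP's slots is the estimate.
    fn estimate(&self, ip: &IpAddr) -> u8 {
        self.slots(ip).into_iter().map(|slot| self.counters[slot]).min().unwrap_or(0)
    }
}

/// First level: fuzzy counter. Second level: exact HashMap counter that is
/// only started once the fuzzy counter flags the IP.
struct TwoTierBanList {
    max_errors: u32, // must stay below 255 because the fuzzy counters are u8
    fuzzy: ApproxCounter,
    exact: HashMap<IpAddr, u32>,
}

impl TwoTierBanList {
    fn new(max_errors: u32) -> Self {
        Self { max_errors, fuzzy: ApproxCounter::new(1024, 4), exact: HashMap::new() }
    }

    /// Fast check: is the IP over the limit according to the fuzzy counter?
    fn flagged(&self, ip: &IpAddr) -> bool {
        u32::from(self.fuzzy.estimate(ip)) > self.max_errors
    }

    fn record_error(&mut self, ip: IpAddr) {
        self.fuzzy.increment(&ip);
        if self.flagged(&ip) {
            *self.exact.entry(ip).or_insert(0) += 1;
        }
    }

    /// An IP is banned only when both levels agree.
    fn is_banned(&self, ip: &IpAddr) -> bool {
        self.flagged(ip) && self.exact.get(ip).map_or(false, |count| *count > self.max_errors)
    }
}
```

With this lazy start, an IP that merely collides with a misbehaving one is flagged by the first level but has no entry, or only a low count, in the exact map, so it is not banned. The cost is that a real offender gets more attempts before the ban applies.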
Since the new solution with a HashMap consumes more memory, we should keep the banning list short. The drawback is that clients will be allowed to send more wrong connection IDs. However, sending 10 requests with wrong connection IDs every 2 minutes should not affect performance much, unless we have many IPs, and in that case we would have a problem with memory anyway. In the future, the sysadmin could inject this via a setting value.
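One way to keep the exact counter short is to clear it whenever a fixed window elapses. The sketch below assumes a 2-minute window passed in by the caller; `ExpiringErrorCounter` and its methods are hypothetical names, and the current time is injected as a parameter so the behaviour can be checked without sleeping.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Exact per-IP error counter that is wiped once per window.
struct ExpiringErrorCounter {
    window: Duration,
    window_start: Instant,
    errors: HashMap<IpAddr, u32>,
}

impl ExpiringErrorCounter {
    fn new(window: Duration, now: Instant) -> Self {
        Self { window, window_start: now, errors: HashMap::new() }
    }

    /// Clears all counters if the window has elapsed. Returns true on reset.
    fn reset_if_expired(&mut self, now: Instant) -> bool {
        if now.duration_since(self.window_start) >= self.window {
            self.errors.clear();
            self.window_start = now;
            true
        } else {
            false
        }
    }

    /// Records one error and returns the IP's count in the current window.
    fn record_error(&mut self, ip: IpAddr) -> u32 {
        let count = self.errors.entry(ip).or_insert(0);
        *count += 1;
        *count
    }

    fn count(&self, ip: &IpAddr) -> u32 {
        self.errors.get(ip).copied().unwrap_or(0)
    }
}
```

Because running the cleaner check on each iteration decreased the UDP tracker's performance in this thread, `reset_if_expired` would more plausibly be called from a periodic task than from the request path.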
…limit The live demo tracker is receiving many UDP requests with wrong connection IDs. Errors are logged (written to disk) and that decreases the tracker's performance. This counts errors and bans IPs for 2 minutes after 10 errors. We use two levels of counters:
1. First level: a Counting Bloom Filter, fast and with low memory consumption but inaccurate (false positives).
2. Second level: a HashMap, an exact counter for IPs.
CBFs are fast and use little memory, but they are also inaccurate. They have false positives, meaning some IPs would be banned only because of bucket collisions (IPs sharing the same counter). To avoid banning IPs incorrectly, we decided to introduce a second counter, a HashMap that counts errors precisely. IPs are only banned when this counter reaches the limit (over 10 errors). We keep the CBF as a first-level filter: it's a fast IP check that does not affect the tracker's performance. When an IP is banned according to the first filter, we double-check in the HashMap. The CBF is faster than always checking for banned IPs against the HashMap. This solution should be good if the number of IPs is low. We have to find another solution anyway for IPv6, where it is cheaper to own a range of IPs.
29e506d feat: use default aquatic udp port for benchmarking (Jose Celano)
10f9bda feat: [#1096] ban client IP when exceeds connection ID errors limit (Jose Celano)
87401e8 chore(deps): add dependency bloom (Jose Celano)

Pull request description:

This PR uses a [Counting Bloom Filter](https://docs.rs/bloom/latest/bloom/#counting-bloom-filters) to count IP sending UDP requests with wrong connection IDs. The IP is banned when the tracker receives more than 10 requests from a given IP with a bad connection ID. Bad connection IDs are cookie values that have expired or are from the future.

With the current `CountingBloomFilter` configuration (0.01 rate), we would have a **False Positive** for every 10000 IPs, meaning when two IPs have a collision, and one of them is misbehaving, the other one would also be banned.

To avoid false positives, we introduced a second counter with a HashMap. This consumes more memory, but it's reset every 120 seconds. The HashMap is only used when the CBF detects a potential bad client.

### TODO

- [x] Straightforward implementation
- [x] Benchmarking (how much this new feature affects performance)
- [x] Add an E2E test
- [x] Remove IPs from the banned list every hour
- [x] Review filter settings `CountingBloomFilter::with_rate(4, 0.01, 100)`
- [x] Refactor: extract the IP ban service from the main loop
- [x] Benchmarking after extracting `BanService`

### Questions

- [ ] Should we add a configuration option for the maximum number of errors allowed?

### Future PR

- [ ] Add a metric to tracker stats for the number of banned IPs.
- [ ] Ban subnets

ACKs for top commit:
josecelano: ACK 29e506d

Tree-SHA512: 004959e00eced1b9c1de39de81f8f9f1d8da1b46f5ee38b3b0679e77cc40448525ac197145ace5dd62017c39a72f7175b06f556e6a7eb8cffbdc57f67052a856
From: torrust/torrust-demo#14
Relates to: #1033
We are having problems with the tracker demo. The logs contain many errors validating the connection ID. It looks like the client doesn't implement the protocol correctly because it's not sending the connection ID received from the `connect` request. Since the client is making many requests, this produces a lot of new ERROR records in the logs, ultimately degrading tracker performance.
Solution Overview:
Hierarchical Counting Bloom Filters:
Individual IP Layer: Use one CBF to track individual IP addresses. This will allow for fine-grained detection of misbehaving IPs.
Subnet Layer: Implement another level of CBFs where each filter covers a subnet range. This allows for detecting patterns of misbehavior across a subnet without penalizing neighboring IPs inappropriately.
Implementation Details:
Individual IP CBF:
Subnet CBF:
Advantages:
Granularity: This approach gives you both detailed control over individual IPs and the ability to detect broader patterns of misbehavior within subnets.
Performance: CBFs are efficient in terms of memory usage and speed, allowing for quick lookups and updates even with large datasets.
Flexibility: You can adjust the thresholds for individual IPs and subnets separately, allowing for different levels of tolerance based on your policy or observed behavior patterns.
False Positives: While CBFs have a small chance of false positives, by using multiple levels (individual and subnet), you can mitigate the impact. For example, if an IP is flagged at both levels, it's more likely to be a true positive.
Challenges:
Configuration: You need to decide on the size of the CBFs, the number of hash functions, and the error thresholds for both individual IPs and subnets. This requires some experimentation or simulation to find the right balance between false positives, memory usage, and effectiveness.
Complexity: Managing two layers of CBFs introduces additional complexity in terms of implementation and maintenance.
False Positives at Subnet Level: If a subnet contains both misbehaving and well-behaving IPs, the well-behaving ones might suffer from the actions taken against the subnet.
Implementation Steps:
Decide on the Subnet Size: Determine what constitutes a subnet for your purposes (e.g., /24, /16, etc.).
Initialize CBFs:
Error Handling Logic:
Action Protocol:
Decay Mechanism: Implement a decay or aging process for counts to ensure that past behavior doesn't indefinitely affect current interactions unless the behavior persists.
By employing this hierarchical approach with Counting Bloom Filters, you can effectively manage IP-based errors at different levels of granularity, protecting your network's performance while minimizing the impact on innocent IPs.
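As a rough illustration of the hierarchical proposal, the sketch below counts errors per IP and per subnet and halves the counters as a simple decay step. Everything in it is an assumption for the example: the /24 (IPv4) and /64 (IPv6) prefix lengths, the thresholds, and the names `subnet_of`, `HierarchicalCounters`, and `Verdict` are hypothetical, and plain HashMaps stand in for the two CBF layers to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Masks an address down to its subnet prefix
/// (/24 for IPv4, /64 for IPv6; both are assumptions for this sketch).
fn subnet_of(ip: IpAddr) -> IpAddr {
    match ip {
        IpAddr::V4(v4) => {
            let [a, b, c, _] = v4.octets();
            IpAddr::V4(Ipv4Addr::new(a, b, c, 0))
        }
        IpAddr::V6(v6) => {
            let s = v6.segments();
            IpAddr::V6(Ipv6Addr::new(s[0], s[1], s[2], s[3], 0, 0, 0, 0))
        }
    }
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    BanIp,
    BanSubnet,
}

/// Two layers of counters: one per IP, one per subnet.
struct HierarchicalCounters {
    ip_threshold: u32,
    subnet_threshold: u32,
    per_ip: HashMap<IpAddr, u32>,
    per_subnet: HashMap<IpAddr, u32>,
}

impl HierarchicalCounters {
    fn new(ip_threshold: u32, subnet_threshold: u32) -> Self {
        Self { ip_threshold, subnet_threshold, per_ip: HashMap::new(), per_subnet: HashMap::new() }
    }

    /// Counts the error at both layers; the subnet verdict takes precedence.
    fn record_error(&mut self, ip: IpAddr) -> Verdict {
        let ip_count = self.per_ip.entry(ip).or_insert(0);
        *ip_count += 1;
        let ip_over = *ip_count > self.ip_threshold;

        let subnet_count = self.per_subnet.entry(subnet_of(ip)).or_insert(0);
        *subnet_count += 1;

        if *subnet_count > self.subnet_threshold {
            Verdict::BanSubnet
        } else if ip_over {
            Verdict::BanIp
        } else {
            Verdict::Allow
        }
    }

    /// One possible decay: halve every counter and drop those that reach zero.
    fn decay(&mut self) {
        self.per_ip.retain(|_, count| {
            *count /= 2;
            *count > 0
        });
        self.per_subnet.retain(|_, count| {
            *count /= 2;
            *count > 0
        });
    }
}
```

Keeping the two thresholds separate reflects the flexibility point above: the subnet limit can be set well above the per-IP limit so that one noisy host does not immediately penalize its neighbours.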
Originally posted by @da2ce7 in torrust/torrust-demo#14 (comment)