-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter IPv6 addresses #4
Comments
Thinking on this more, could the IP regex be delegated to stdlib's [7] pry(main)> Resolv::AddressRegex
=> /(?:(?-mix:\A((?x-mi:0
|1(?:[0-9][0-9]?)?
|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?
|[3-9][0-9]?))\.((?x-mi:0
|1(?:[0-9][0-9]?)?
|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?
|[3-9][0-9]?))\.((?x-mi:0
|1(?:[0-9][0-9]?)?
|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?
|[3-9][0-9]?))\.((?x-mi:0
|1(?:[0-9][0-9]?)?
|2(?:[0-4][0-9]?|5[0-5]?|[6-9])?
|[3-9][0-9]?))\z))|(?:(?x-mi:
(?:(?x-mi:\A
(?:[0-9A-Fa-f]{1,4}:){7}
[0-9A-Fa-f]{1,4}
\z)) |
(?:(?x-mi:\A
((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?) ::
((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)
\z)) |
(?:(?x-mi:\A
((?:[0-9A-Fa-f]{1,4}:){6,6})
(\d+)\.(\d+)\.(\d+)\.(\d+)
\z)) |
(?:(?x-mi:\A
((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?) ::
((?:[0-9A-Fa-f]{1,4}:)*)
(\d+)\.(\d+)\.(\d+)\.(\d+)
\z))))/ EDIT: Yes, there is a good reason. The regex is anchored so wouldn't match anything in a logging context. |
Hey @bjeanes, great suggestion. I created an |
Hah! Regexp.new(Resolv::AddressRegex.source.gsub('\A', '\b').gsub('\z', '\b')) ^^ that is exactly what I have just deployed to production already! |
I've deployed this and the slowdown is noticeable in web RPMs too. 🤔
I run on Heroku so my logs go via STDOUT. I do wonder if I'm better off with a solution that filters the logs outside of the core Ruby process, even if backed by this gem. That should, at least in theory, allow for some better use of multiple cores... |
I'm surprised it's noticeable at the RPM level. What difference are you seeing there? |
I'd say around a 10% slowdown, but I can't rule out it being something else
in that deploy. I haven't looked more deeply into it. I can try reverting
and seeing. I'm on Heroku which I think only has an IPv4 stack anyway. It's
probably not important for us, currently, but I'd rather future proof it...
… |
Just ran benchmarks with Ruby 2.7 and latest code: Warming up --------------------------------------
no ipv6 3.126k i/100ms
ipv6 1.938k i/100ms
Calculating -------------------------------------
no ipv6 34.599k (± 3.5%) i/s - 175.056k in 5.065833s
ipv6 19.874k (± 3.4%) i/s - 100.776k in 5.076570s It still appears to reduce throughput significantly, but 20k iterations per second is still pretty fast and most of the time spent in an application is not in logging. Will plan to merge once IPv6 has more adoption. Edit: another approach could be to use a less complex regex if common sources of IPs (like |
Yeah, or use a less complex regex regardless. It probably is better (from GDPR etc standpoint) to accidentally filter non-IPs but catch all actual IPs than to only catch IPs but take a large hit in performance. This could even be a configuration option, that swaps in a more accurate regex when the user opts into the trade-off of slower perf. |
I am looking through the code and noticed the regex only seems to match IPv4 addresses.
Should IPv6 also be matched?
I'm hoping to do this so that I can add a custom scrubber which used IpAnonymizer and lean on the gem-maintained Regex.
Thoughts?
The text was updated successfully, but these errors were encountered: