-
-
Notifications
You must be signed in to change notification settings - Fork 157
[db/backends] Extract InfluxQL base client #732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[db/backends] Extract InfluxQL base client #732
Conversation
📝 WalkthroughWalkthroughThis PR extracts the InfluxQL-compatible TSDB client logic into a new reusable class Sequence Diagram(s)(omitted) Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@openwisp_monitoring/db/backends/influx_base.py`:
- Around line 491-507: The block that handles existing "GROUP BY" assumes
re.search(self._group_by_time_tag_regex, query) returns a match and directly
uses matches.group(1), which will raise when the query has GROUP BY with tags
only (no time()); update the logic in the method containing this block to check
whether matches is not None before calling matches.group(1) and handle the
tag-only case (e.g., skip stripping time(), treat group_by_fields as empty, or
proceed without replacing the time function) so that when strip=True and there's
a GROUP BY without time() the code does not attempt to access group 1; refer to
the variables/methods _group_by_time_tag_regex, matches, group_by_fields and the
surrounding GROUP BY handling branch to locate where to add the guard and
alternative flow.
- Around line 120-166: The dbs cached_property currently reads udp_port from
self.options and performs arithmetic without validating type, causing a
TypeError if a string is provided; update the dbs logic (and any other place
reading udp_port such as use_udp if relevant) to normalize and validate udp_port
once by converting self.options.get("udp_port", 8089) to int (e.g., port =
int(self.options.get("udp_port", 8089))) and explicitly catch/raise a clear
ValueError on invalid input so the two InfluxDBClient instances use the
validated integer port (use that port and port + 1 for the short client) and
fail fast with a helpful message if conversion fails.
- Around line 295-381: The read() and _get_where_query() code interpolate
identifiers and string literals directly into InfluxQL; fix by adding a small
escaping helper (e.g., _escape_influx_value or reuse/extend _clean_value) that
backslash-escapes single quotes in string literals and a helper to quote/escape
identifiers (measurement/retention/field/tag keys) with double quotes; then
update read(): wrap retention policy and measurement in the identifier-quoting
helper when building from_clause, ensure field names inserted into fields list
are quoted with the identifier helper (while keeping DISTINCT()/COUNT() wrappers
intact), and pass tag values through the string-escaping helper when
constructing conditions; also update _get_where_query() to use the same helpers
for tag values and identifiers so all interpolated values are safely escaped.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
openwisp_monitoring/db/backends/influx_base.pyopenwisp_monitoring/db/backends/influxdb/client.py
🧰 Additional context used
🧬 Code graph analysis (2)
openwisp_monitoring/db/backends/influxdb/client.py (1)
openwisp_monitoring/db/backends/influx_base.py (1)
BaseInfluxDatabaseClient(31-584)
openwisp_monitoring/db/backends/influx_base.py (2)
openwisp_monitoring/utils.py (1)
retry(17-35)openwisp_monitoring/db/exceptions.py (1)
TimeseriesWriteException(1-2)
🪛 Ruff (0.14.13)
openwisp_monitoring/db/backends/influx_base.py
42-70: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
71-71: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
73-82: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
170-173: Avoid specifying long messages outside the exception class
(TRY003)
240-240: Do not catch blind exception: Exception
(BLE001)
247-247: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
319-321: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
319-321: Avoid specifying long messages outside the exception class
(TRY003)
334-336: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
334-336: Avoid specifying long messages outside the exception class
(TRY003)
349-349: Possible SQL injection vector through string-based query construction
(S608)
374-377: Avoid specifying long messages outside the exception class
(TRY003)
389-389: Loop control variable measurement not used within loop body
Rename unused measurement to _measurement
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Python==3.12 | django~=5.2.0
- GitHub Check: Python==3.13 | django~=5.2.0
- GitHub Check: Python==3.12 | django~=5.1.0
- GitHub Check: Python==3.12 | django~=4.2.0
- GitHub Check: Python==3.10 | django~=4.2.0
- GitHub Check: Python==3.13 | django~=5.1.0
- GitHub Check: Python==3.11 | django~=4.2.0
- GitHub Check: Python==3.10 | django~=5.2.0
- GitHub Check: Python==3.11 | django~=5.2.0
- GitHub Check: Python==3.11 | django~=5.1.0
- GitHub Check: Python==3.10 | django~=5.1.0
🔇 Additional comments (9)
openwisp_monitoring/db/backends/influx_base.py (8)
42-105: Config override precedence is clear.
The explicitdb_name/options→config→TIMESERIES_DBfallback order is easy to follow and preserves backward compatibility.
106-119: Create/drop database wrappers look good.
The retry-wrapped helpers are simple and consistent.
167-294: Core query/write helpers look consistent.
Operator validation, retention-policy handling, and batching logic are cohesive and readable.
383-421: List/retention/delete helpers look solid.
Result reshaping and retention-policy helpers are straightforward.
424-446: Validation + param normalization looks fine.
The forbidden-word check and parameter cleanup are clear and contained.
456-474: Chart query assembly is straightforward.
The composition of fields, grouping, and timezone suffixing is easy to follow.
517-553: Field substitution helpers look good.
The placeholder expansion and field aliasing logic are clean.
554-583: Top-fields selection looks fine.
The aggregation and ordering steps are straightforward.openwisp_monitoring/db/backends/influxdb/client.py (1)
1-5: Thin subclass refactor looks good.
The backend stays minimal while delegating logic to the shared base.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| @cached_property | ||
| def db(self): | ||
| """Returns an ``InfluxDBClient`` instance.""" | ||
| return self.dbs["default"] | ||
|
|
||
| @cached_property | ||
| def dbs(self): | ||
| dbs = { | ||
| "default": InfluxDBClient( | ||
| self.host, | ||
| self.port, | ||
| self.user, | ||
| self.password, | ||
| self.db_name, | ||
| use_udp=self.options.get("udp_writes", False), | ||
| udp_port=self.options.get("udp_port", 8089), | ||
| ), | ||
| } | ||
| if self.options.get("udp_writes", False): | ||
| # When using UDP, InfluxDB allows only using one retention policy | ||
| # per port. Therefore, we need to have different instances of | ||
| # InfluxDBClient. | ||
| dbs["short"] = InfluxDBClient( | ||
| self.host, | ||
| self.port, | ||
| self.user, | ||
| self.password, | ||
| self.db_name, | ||
| use_udp=self.options.get("udp_writes", False), | ||
| udp_port=self.options.get("udp_port", 8089) + 1, | ||
| ) | ||
| dbs["__all__"] = InfluxDBClient( | ||
| self.host, | ||
| self.port, | ||
| self.user, | ||
| self.password, | ||
| self.db_name, | ||
| ) | ||
| else: | ||
| dbs["short"] = dbs["default"] | ||
| dbs["__all__"] = dbs["default"] | ||
| return dbs | ||
|
|
||
| @cached_property | ||
| def use_udp(self): | ||
| return self.options.get("udp_writes", False) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate udp_port before arithmetic.
When udp_writes is enabled, udp_port is incremented (+ 1). If overrides pass it as a string (common from env/config), this will raise a TypeError at runtime. Normalize to int once and fail fast with a clearer error.
🛠️ Suggested fix
`@cached_property`
def dbs(self):
+ udp_port = self.options.get("udp_port", 8089)
+ try:
+ udp_port = int(udp_port)
+ except (TypeError, ValueError):
+ raise ValueError("udp_port must be an integer")
dbs = {
"default": InfluxDBClient(
self.host,
self.port,
self.user,
self.password,
self.db_name,
use_udp=self.options.get("udp_writes", False),
- udp_port=self.options.get("udp_port", 8089),
+ udp_port=udp_port,
),
}
if self.options.get("udp_writes", False):
# ...
dbs["short"] = InfluxDBClient(
self.host,
self.port,
self.user,
self.password,
self.db_name,
use_udp=self.options.get("udp_writes", False),
- udp_port=self.options.get("udp_port", 8089) + 1,
+ udp_port=udp_port + 1,
)🤖 Prompt for AI Agents
In `@openwisp_monitoring/db/backends/influx_base.py` around lines 120 - 166, The
dbs cached_property currently reads udp_port from self.options and performs
arithmetic without validating type, causing a TypeError if a string is provided;
update the dbs logic (and any other place reading udp_port such as use_udp if
relevant) to normalize and validate udp_port once by converting
self.options.get("udp_port", 8089) to int (e.g., port =
int(self.options.get("udp_port", 8089))) and explicitly catch/raise a clear
ValueError on invalid input so the two InfluxDBClient instances use the
validated integer port (use that port and port + 1 for the short client) and
fail fast with a helpful message if conversion fails.
| def read(self, key, fields, tags, **kwargs): | ||
| extra_fields = kwargs.get("extra_fields") | ||
| since = kwargs.get("since") | ||
| order = kwargs.get("order") | ||
| limit = kwargs.get("limit") | ||
| distinct_fields = kwargs.get("distinct_fields", []) | ||
| count_fields = kwargs.get("count_fields", []) | ||
| rp = kwargs.get("retention_policy") | ||
| where = kwargs.get("where", []) | ||
| precision = kwargs.get("precision", "s") | ||
|
|
||
| # Ensure fields is a list (in case it's passed as a string) | ||
| if isinstance(fields, str): | ||
| fields = [fields] | ||
| else: | ||
| fields = list(fields) | ||
|
|
||
| # Apply DISTINCT to fields | ||
| if distinct_fields: | ||
| for field in distinct_fields: | ||
| try: | ||
| field_index = fields.index(field) | ||
| fields[field_index] = f"DISTINCT({field})" | ||
| except ValueError: | ||
| raise self.client_error( | ||
| f'Cannot perform DISTINCT on "{field}" as it is not present in the fields list.' | ||
| ) | ||
|
|
||
| # Ensure COUNT(DISTINCT) if both DISTINCT and COUNT are applied | ||
| if field in count_fields: | ||
| count_fields[count_fields.index(field)] = f"DISTINCT({field})" | ||
|
|
||
| # Apply COUNT to fields | ||
| if count_fields: | ||
| for field in count_fields: | ||
| try: | ||
| field_index = fields.index(field) | ||
| fields[field_index] = f"COUNT({field})" | ||
| except ValueError: | ||
| raise self.client_error( | ||
| f'Cannot perform COUNT on "{field}" as it is not present in the fields list.' | ||
| ) | ||
|
|
||
| # Handle extra fields and ensure proper formatting | ||
| if extra_fields and extra_fields != "*": | ||
| fields = fields + extra_fields | ||
| elif extra_fields == "*": | ||
| fields = ["*"] | ||
|
|
||
| # Construct SELECT clause | ||
| fields_clause = ", ".join(fields) | ||
|
|
||
| # Construct the FROM clause with retention policy if provided | ||
| from_clause = f"{rp}.{key}" if rp else key | ||
| q = f"SELECT {fields_clause} FROM {from_clause}" | ||
|
|
||
| conditions = [] | ||
| for field, op, value in where: | ||
| op = self._clean_operator(op) | ||
| value = self._clean_value(value) | ||
| conditions.append(f"{field} {op} {value}") | ||
| # Add conditions (time and tags) | ||
| if since: | ||
| timestamp = self._get_timestamp(since) | ||
| conditions.append(f"time >= '{timestamp}'") | ||
| if tags: | ||
| conditions.append( | ||
| " AND ".join(["{0} = '{1}'".format(*tag) for tag in tags.items()]) | ||
| ) | ||
| if conditions: | ||
| conditions = "WHERE %s" % " AND ".join(conditions) | ||
| q = f"{q} {conditions}" | ||
| if order: | ||
| # InfluxDB only allows ordering results by time | ||
| if order == "time": | ||
| order = "time ASC" | ||
| elif order == "-time": | ||
| order = "time DESC" | ||
| else: | ||
| raise self.client_error( | ||
| f'Invalid order "{order}" passed.\nYou may pass "time" / "-time" to get ' | ||
| "result sorted in ascending /descending order respectively." | ||
| ) | ||
| q = f"{q} ORDER BY {order}" | ||
| if limit: | ||
| q = f"{q} LIMIT {limit}" | ||
| return list(self.query(q, precision=precision).get_points()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, find call sites of .read() method (excluding tests)
echo "=== Call sites of .read() method ==="
rg -nP -C3 '\.read\s*\(' --type=py -g '!**/tests/**' | head -100
echo ""
echo "=== Examining the influx_base.py file structure ==="
wc -l openwisp_monitoring/db/backends/influx_base.py
echo ""
echo "=== Lines around 448-455 (likely _get_where_query) ==="
sed -n '445,460p' openwisp_monitoring/db/backends/influx_base.pyRepository: openwisp/openwisp-monitoring
Length of output: 9021
🏁 Script executed:
#!/bin/bash
# Check if there's input validation/escaping functions
echo "=== Searching for escape/validation functions ==="
rg -nP 'def.*escape|def.*clean|def.*validate|def.*sanitize' openwisp_monitoring/db/backends/influx_base.py
echo ""
echo "=== Full read method to understand parameter handling ==="
sed -n '295,381p' openwisp_monitoring/db/backends/influx_base.py | head -100Repository: openwisp/openwisp-monitoring
Length of output: 3810
🏁 Script executed:
#!/bin/bash
# Look for _get_where_query method
echo "=== Searching for _get_where_query method ==="
rg -nP 'def _get_where_query' openwisp_monitoring/db/backends/influx_base.py -A 10
echo ""
echo "=== Check for any cleaning/escaping of values ==="
rg -nP '_clean_' openwisp_monitoring/db/backends/influx_base.pyRepository: openwisp/openwisp-monitoring
Length of output: 781
🏁 Script executed:
#!/bin/bash
# Get the implementation of _clean_value and _clean_operator
echo "=== _clean_value and _clean_operator implementations ==="
sed -n '167,195p' openwisp_monitoring/db/backends/influx_base.py
echo ""
echo "=== views.py context around chart.read() call ==="
sed -n '85,110p' openwisp_monitoring/views.py
echo ""
echo "=== How tags are passed in models.py ==="
sed -n '495,510p' openwisp_monitoring/monitoring/base/models.pyRepository: openwisp/openwisp-monitoring
Length of output: 2984
🏁 Script executed:
#!/bin/bash
# Check for any InfluxQL escaping or validation utilities
echo "=== Search for escaping functions in the codebase ==="
rg -i 'escape|quote' openwisp_monitoring/db/backends/influx_base.py
echo ""
echo "=== Search for input validation patterns ==="
sed -n '424,445p' openwisp_monitoring/db/backends/influx_base.pyRepository: openwisp/openwisp-monitoring
Length of output: 1047
🏁 Script executed:
#!/bin/bash
# Check where metric.key and tags come from - are they user-controlled?
echo "=== Check what metric.key and tags are (from models) ==="
rg -nP 'self\.key\s*=' openwisp_monitoring/monitoring/base/models.py | head -20
echo ""
echo "=== Check if tags are from user input ==="
sed -n '455,580p' openwisp_monitoring/db/backends/influx_base.py | grep -A5 -B5 'tags'Repository: openwisp/openwisp-monitoring
Length of output: 310
🏁 Script executed:
#!/bin/bash
# Check where metric.tags come from
echo "=== Check metric.tags definition ==="
rg -nP 'def tags|self\.tags' openwisp_monitoring/monitoring/base/models.py | head -30
echo ""
echo "=== Check Metric model for tag handling ==="
sed -n '1,100p' openwisp_monitoring/monitoring/base/models.py | grep -A10 'class.*Metric'
echo ""
echo "=== Check if views.py filters are user-controlled ==="
sed -n '1,100p' openwisp_monitoring/views.py | head -50Repository: openwisp/openwisp-monitoring
Length of output: 2577
🏁 Script executed:
#!/bin/bash
# Look for where inputs come from - check query parameters
echo "=== Check view parameters and inputs ==="
rg -nP 'request\.(GET|POST|query_params)' openwisp_monitoring/views.py -B2 -A2
echo ""
echo "=== Check if key/metric name can be user input ==="
rg -nP 'key.*=.*request' openwisp_monitoring/ -r py
echo ""
echo "=== Check _OPERATORS to understand what's validated ==="
rg -nP '_OPERATORS\s*=' openwisp_monitoring/db/backends/influx_base.py -A10Repository: openwisp/openwisp-monitoring
Length of output: 1489
🏁 Script executed:
#!/bin/bash
# Check the full path from view to database - is there any input sanitization?
echo "=== Check what parameters are passed through chart.read ==="
sed -n '456,500p' openwisp_monitoring/db/backends/influx_base.py
echo ""
echo "=== Check if chart configuration is user-defined ==="
rg -nP 'class.*Chart' openwisp_monitoring/monitoring/base/models.py | head -5Repository: openwisp/openwisp-monitoring
Length of output: 1983
🏁 Script executed:
#!/bin/bash
# Check where metric.tags comes from - implementation details
echo "=== Full tags property implementation ==="
sed -n '260,280p' openwisp_monitoring/monitoring/base/models.py
echo ""
echo "=== Check if Chart configuration is user-editable ==="
rg -nP 'class AbstractChart' openwisp_monitoring/monitoring/base/models.py -A 30 | head -50
echo ""
echo "=== Check if tags can be user input ==="
rg -nP 'tags.*=' openwisp_monitoring/monitoring/base/models.py | head -20Repository: openwisp/openwisp-monitoring
Length of output: 2566
🏁 Script executed:
#!/bin/bash
# Check for any API endpoints that allow custom keys or metric names
echo "=== Check device API for chart/metric endpoints ==="
sed -n '260,280p' openwisp_monitoring/device/api/views.py
echo ""
echo "=== Look for any user-controllable metric creation ==="
rg -nP 'key.*request|tags.*request' openwisp_monitoring/ -t pyRepository: openwisp/openwisp-monitoring
Length of output: 1187
🏁 Script executed:
#!/bin/bash
# Let's trace what happens with the 'where' parameter in read()
echo "=== Check how 'where' parameter is passed to read() ==="
rg -nP '\.read\(' openwisp_monitoring/monitoring/base/models.py -B3 -A3
echo ""
echo "=== Check chart.read() to see where parameters come from ==="
rg -nP 'def read' openwisp_monitoring/monitoring/base/models.py -A 5 | head -40Repository: openwisp/openwisp-monitoring
Length of output: 1864
🏁 Script executed:
#!/bin/bash
# Final check: look at the actual SQL queries being constructed
echo "=== Check if any direct query injection is possible ==="
rg -nP 'where.*=' openwisp_monitoring/monitoring/base/models.py | head -15
echo ""
echo "=== Check the key constraint more carefully ==="
sed -n '60,120p' openwisp_monitoring/monitoring/base/models.pyRepository: openwisp/openwisp-monitoring
Length of output: 2412
🏁 Script executed:
#!/bin/bash
# Check where retention_policy comes from in read() calls
echo "=== Check retention_policy usage ==="
rg -nP 'retention_policy' openwisp_monitoring/monitoring/base/models.py -B2 -A2 | head -40
echo ""
echo "=== Check if retention_policy is user input ==="
rg -nP 'retention_policy.*request' openwisp_monitoring/ -t py
echo ""
echo "=== Final check: look at _clean_value to confirm it doesn't escape quotes ==="
sed -n '176,186p' openwisp_monitoring/db/backends/influx_base.pyRepository: openwisp/openwisp-monitoring
Length of output: 2173
🏁 Script executed:
#!/bin/bash
# Check if there's any quote escaping mechanism anywhere
echo "=== Search for any quote escaping ==="
rg -i "escape.*quote|quote.*escape|replace.*quote|sanitize" openwisp_monitoring/db/backends/influx_base.py
echo ""
echo "=== Check if field names from where clause are safe ==="
sed -n '1000,1010p' openwisp_monitoring/monitoring/base/models.py
echo ""
echo "=== Summary check: are tag values ever directly from user input? ==="
rg -nP 'tags\s*=.*request' openwisp_monitoring/ -t pyRepository: openwisp/openwisp-monitoring
Length of output: 691
🏁 Script executed:
#!/bin/bash
# Check if tag values can contain special characters by examining how they're set
echo "=== Check how main_tags and extra_tags are set ==="
rg -nP 'main_tags\s*=' openwisp_monitoring/ -t py -B2 -A2 | head -50
echo ""
echo "=== Check if there's admin interface for editing tags ==="
fd -t f admin.py openwisp_monitoring/ -x grep -l 'main_tags\|extra_tags' {}
echo ""
echo "=== Check API endpoints for metric creation/update ==="
rg -nP 'class.*Metric.*ViewSet|class.*Metric.*APIView' openwisp_monitoring/ -t pyRepository: openwisp/openwisp-monitoring
Length of output: 4064
🏁 Script executed:
#!/bin/bash
# Verify if JSON values could contain quotes
echo "=== Verify that JSONField values are not validated against special chars ==="
python3 << 'EOF'
import json
# JSONField stores arbitrary JSON, which can include quotes in string values
test_tags = {
"device": "router'1", # single quote
"interface": 'eth"0', # double quote
"location": "'; DROP TABLE metrics; --"
}
print("Example dangerous tag values JSON can hold:")
print(json.dumps(test_tags, indent=2))
print("\nWhen interpolated without escaping:")
for k, v in test_tags.items():
print(f"{k} = '{v}'") # This would create invalid InfluxQL
EOF
echo ""
echo "=== Check the actual vulnerability in _get_where_query ==="
sed -n '448,454p' openwisp_monitoring/db/backends/influx_base.pyRepository: openwisp/openwisp-monitoring
Length of output: 158
🏁 Script executed:
#!/bin/bash
# Check if there's an API to modify tags
echo "=== Check for API endpoints that could modify metrics ==="
fd -t f 'serializers.py' openwisp_monitoring/ -x grep -l 'Metric'
echo ""
echo "=== Look for admin.py that exposes metric editing ==="
fd -t f 'admin.py' openwisp_monitoring/ | head -5
echo ""
echo "=== Final verification: does _clean_value escape quotes? ==="
sed -n '167,186p' openwisp_monitoring/db/backends/influx_base.pyRepository: openwisp/openwisp-monitoring
Length of output: 1191
🏁 Script executed:
#!/bin/bash
# Comprehensive analysis of the vulnerability
python3 << 'EOF'
# Analyze the vulnerability
print("=== VULNERABILITY ANALYSIS ===\n")
print("1. INTERPOLATION WITHOUT ESCAPING:")
print(" Line 348: f\"{rp}.{key}\" - rp and key directly interpolated")
print(" Line 366: \"{0} = '{1}'\".format(*tag) - tag values directly interpolated")
print(" Line 453: f\"{field} = '{item}'\" - items directly interpolated")
print()
print("2. _clean_value() DOES NOT ESCAPE:")
print(" Only handles datetime conversion, returns non-datetime values as-is")
print(" Does NOT add quotes or escape special characters")
print()
print("3. _clean_operator() ONLY VALIDATES AGAINST ALLOWLIST:")
print(" Validates from _OPERATORS list")
print(" Does NOT escape field names")
print()
print("4. TAG SOURCES:")
print(" main_tags: JSONField from database model")
print(" extra_tags: JSONField from database model")
print(" object_id: CharField from database")
print(" These can theoretically contain single quotes if set via admin or API")
print()
print("5. EXAMPLE INJECTION:")
tag_with_quote = "device'; DROP TABLE *.* FROM *; --"
dangerous_query = f"SELECT value FROM traffic WHERE device = '{tag_with_quote}'"
print(f" If tag contains: {tag_with_quote}")
print(f" Query becomes: {dangerous_query}")
print(f" This breaks the InfluxQL query syntax")
print()
print("6. RISK LEVEL:")
print(" - Current call sites appear safe (metrics/tags from database)")
print(" - But NO escaping means future code changes could introduce vulnerability")
print(" - Defense-in-depth: code should always escape values")
EOFRepository: openwisp/openwisp-monitoring
Length of output: 1266
Add escaping for InfluxQL field/tag values and retention policy names.
The read() method (lines 295–381) and _get_where_query() method (lines 448–454) interpolate measurement names, field names, tag values, and retention policy names directly into InfluxQL without escaping. While _clean_value() validates operators and converts datetimes, it does not escape special characters like single quotes. Tag values originate from JSONField columns (main_tags, extra_tags) which can store arbitrary strings, and the admin interface allows modifying these values.
Even though current call sites retrieve metrics from the database (appearing trusted), the absence of escaping violates defense-in-depth principles. If tag values contain single quotes (e.g., via admin edits), queries will break. To prevent future vulnerabilities, apply proper InfluxQL escaping: backslash-escape single quotes in tag/field values, or use parameterized query methods if the InfluxDB client supports them.
Also applies to: lines 448–454 (_get_where_query())
🧰 Tools
🪛 Ruff (0.14.13)
319-321: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
319-321: Avoid specifying long messages outside the exception class
(TRY003)
334-336: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
334-336: Avoid specifying long messages outside the exception class
(TRY003)
349-349: Possible SQL injection vector through string-based query construction
(S608)
374-377: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@openwisp_monitoring/db/backends/influx_base.py` around lines 295 - 381, The
read() and _get_where_query() code interpolate identifiers and string literals
directly into InfluxQL; fix by adding a small escaping helper (e.g.,
_escape_influx_value or reuse/extend _clean_value) that backslash-escapes single
quotes in string literals and a helper to quote/escape identifiers
(measurement/retention/field/tag keys) with double quotes; then update read():
wrap retention policy and measurement in the identifier-quoting helper when
building from_clause, ensure field names inserted into fields list are quoted
with the identifier helper (while keeping DISTINCT()/COUNT() wrappers intact),
and pass tag values through the string-escaping helper when constructing
conditions; also update _get_where_query() to use the same helpers for tag
values and identifiers so all interpolated values are safely escaped.
| if "GROUP BY" not in query.upper(): | ||
| query = f"{query} {group_by}" | ||
| else: | ||
| # The query could have GROUP BY clause for a TAG | ||
| if group_by: | ||
| # The query already contains "GROUP BY", therefore | ||
| # we remove it from the "group_by" to avoid duplicating | ||
| # "GROUP BY" | ||
| group_by = group_by.replace("GROUP BY ", "") | ||
| # We only need to substitute the time function. | ||
| # The resulting query would be "GROUP BY time(<group_by>), <tag>" | ||
| query = re.sub(self._time_regex, group_by, query) | ||
| else: | ||
| # The query should not include the "GROUP by time()" | ||
| matches = re.search(self._group_by_time_tag_regex, query) | ||
| group_by_fields = matches.group(1) | ||
| if len(group_by_fields.split(",")) > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against GROUP BY without time() when stripping.
When strip=True and the query has GROUP BY without a time() clause, re.search(...) returns None, and matches.group(1) will raise. A small guard avoids runtime errors for tag-only grouping.
🛡️ Suggested guard
- matches = re.search(self._group_by_time_tag_regex, query)
- group_by_fields = matches.group(1)
+ matches = re.search(self._group_by_time_tag_regex, query)
+ if not matches:
+ return query
+ group_by_fields = matches.group(1)🤖 Prompt for AI Agents
In `@openwisp_monitoring/db/backends/influx_base.py` around lines 491 - 507, The
block that handles existing "GROUP BY" assumes
re.search(self._group_by_time_tag_regex, query) returns a match and directly
uses matches.group(1), which will raise when the query has GROUP BY with tags
only (no time()); update the logic in the method containing this block to check
whether matches is not None before calling matches.group(1) and handle the
tag-only case (e.g., skip stripping time(), treat group_by_fields as empty, or
proceed without replacing the time function) so that when strip=True and there's
a GROUP BY without time() the code does not attempt to access group 1; refer to
the variables/methods _group_by_time_tag_regex, matches, group_by_fields and the
surrounding GROUP BY handling branch to locate where to add the guard and
alternative flow.
Move the InfluxQL DatabaseClient implementation into a shared base class and make the InfluxDB backend client a thin subclass. This refactor paves the way for adding an InfluxDB 2.x backend in InfluxQL compatibility mode without duplicating the client implementation. No behavior changes expected; tests and ./run-qa-checks pass.
8594af9 to
5db9f66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@openwisp_monitoring/db/backends/influx_base.py`:
- Around line 92-128: The dbs cached_property currently reads configuration from
the global TIMESERIES_DB causing instance-level overrides to be ignored; update
the dbs method to use instance attributes (e.g., self.host, self.port,
self.user, self.password, self.options) instead of TIMESERIES_DB and derive
udp_writes/udp_port from self.options, while still constructing InfluxDBClient
instances (default, short, __all__) and using self.db_name so per-instance
configs passed via __init__ take effect.
- Around line 68-71: Update the InfluxBase backend to accept instance-level
overrides by changing the __init__ signature to __init__(self, db_name=None,
options=None, **config) and store instance attributes (e.g., self.host,
self.port, self.user, self.password, self.options) defaulting to TIMESERIES_DB
values when not provided; then modify the dbs property (property name: dbs) to
read from these instance attributes (self.host, self.port, self.user,
self.password, self.options) instead of directly accessing the global
TIMESERIES_DB so that per-instance configuration passed via __init__ is honored.
♻️ Duplicate comments (3)
openwisp_monitoring/db/backends/influx_base.py (3)
327-330: Tag values interpolated without escaping.Tag values from the
tagsdict are interpolated directly into InfluxQL using"{0} = '{1}'".format(*tag). If a tag value contains a single quote, this will break the query or potentially allow injection. This concern was raised in a prior review.
415-421:_get_where_queryalso lacks escaping foritemvalues.Similar to the
read()method, values initemsare interpolated directly without escaping single quotes, which could break queries or allow injection if values contain special characters.
470-481: Guard againstGROUP BYwithouttime()when stripping.When
strip=Trueand the query hasGROUP BYwithout atime()clause,re.search(...)returnsNone, andmatches.group(1)will raiseAttributeError. This was flagged in a prior review.
🧹 Nitpick comments (3)
openwisp_monitoring/db/backends/influx_base.py (3)
207-214: Chain exception for better traceability.When re-raising
TimeseriesWriteException, chain it to preserve the original stack trace.♻️ Suggested fix
- raise TimeseriesWriteException + raise TimeseriesWriteException from exception
354-357: Rename unused loop variablemeasurement.The
measurementvariable from tuple unpacking is not used in the loop body.♻️ Suggested fix
- for (measurement, tag), group_points in result.items(): + for (_measurement, tag), group_points in result.items():
282-303: Chain exceptions when re-raising in DISTINCT/COUNT handling.When catching
ValueErrorand raisingself.client_error, chain the exception to preserve context.♻️ Suggested fix
except ValueError: raise self.client_error( f'Cannot perform DISTINCT on "{field}" as it is not present in the fields list.' - ) + ) from NoneApply similarly to the COUNT block at lines 300-303.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
openwisp_monitoring/db/backends/influx_base.pyopenwisp_monitoring/db/backends/influxdb/client.py
🧰 Additional context used
🧬 Code graph analysis (2)
openwisp_monitoring/db/backends/influxdb/client.py (1)
openwisp_monitoring/db/backends/influx_base.py (2)
db(88-90)BaseInfluxDatabaseClient(25-551)
openwisp_monitoring/db/backends/influx_base.py (2)
openwisp_monitoring/utils.py (1)
retry(17-35)openwisp_monitoring/db/exceptions.py (1)
TimeseriesWriteException(1-2)
🪛 Ruff (0.14.13)
openwisp_monitoring/db/backends/influx_base.py
26-54: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
55-55: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
57-66: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
137-140: Avoid specifying long messages outside the exception class
(TRY003)
207-207: Do not catch blind exception: Exception
(BLE001)
214-214: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
286-288: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
286-288: Avoid specifying long messages outside the exception class
(TRY003)
301-303: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
301-303: Avoid specifying long messages outside the exception class
(TRY003)
316-316: Possible SQL injection vector through string-based query construction
(S608)
341-344: Avoid specifying long messages outside the exception class
(TRY003)
356-356: Loop control variable measurement not used within loop body
Rename unused measurement to _measurement
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Python==3.13 | django~=5.1.0
- GitHub Check: Python==3.12 | django~=5.2.0
- GitHub Check: Python==3.10 | django~=4.2.0
- GitHub Check: Python==3.10 | django~=5.1.0
- GitHub Check: Python==3.10 | django~=5.2.0
- GitHub Check: Python==3.11 | django~=4.2.0
- GitHub Check: Python==3.12 | django~=4.2.0
- GitHub Check: Python==3.13 | django~=5.2.0
- GitHub Check: Python==3.11 | django~=5.1.0
- GitHub Check: Python==3.12 | django~=5.1.0
- GitHub Check: Python==3.11 | django~=5.2.0
🔇 Additional comments (4)
openwisp_monitoring/db/backends/influx_base.py (3)
1-23: Imports and module setup look good.Standard library and Django imports are well-organized. The
@retrydecorator import fromopenwisp_monitoring.utilsintegrates with the existing retry infrastructure.
216-260: Write methods are well-structured.The
_get_timestamp,write, andbatch_writemethods properly handle timestamp formatting and organize data points by database and retention policy before writing.
521-550:_get_top_fieldsimplementation is correct.The method properly queries for summed fields, filters out None values, sorts by value, and returns the top N field names with the
sum_prefix removed.openwisp_monitoring/db/backends/influxdb/client.py (1)
1-11: Clean minimal subclass for backward compatibility.The
DatabaseClientclass correctly inherits fromBaseInfluxDatabaseClientand serves as a thin wrapper to maintain backward compatibility with existing imports and the backend loader. The docstring clearly explains its purpose.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| def __init__(self, db_name=None): | ||
| self._db = None | ||
| self.db_name = db_name or TIMESERIES_DB["NAME"] | ||
| self.client_error = InfluxDBClientError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__init__ signature doesn't match stated PR objectives.
The PR description and linked issue #731 explicitly state the goal is to support "instance-level configuration overrides via an __init__(self, db_name=None, options=None, **config) signature." However, the current implementation only accepts db_name.
Combined with the dbs property (lines 92-128) reading directly from the global TIMESERIES_DB, this means instance-level overrides for host, port, user, password, and OPTIONS aren't actually supported yet.
🛠️ Suggested signature to align with PR objectives
- def __init__(self, db_name=None):
- self._db = None
- self.db_name = db_name or TIMESERIES_DB["NAME"]
- self.client_error = InfluxDBClientError
+ def __init__(self, db_name=None, options=None, **config):
+ self._db = None
+ self.db_name = db_name or TIMESERIES_DB["NAME"]
+ self.host = config.get("host") or TIMESERIES_DB["HOST"]
+ self.port = config.get("port") or TIMESERIES_DB["PORT"]
+ self.user = config.get("user") or TIMESERIES_DB["USER"]
+ self.password = config.get("password") or TIMESERIES_DB["PASSWORD"]
+ self.options = options if options is not None else TIMESERIES_DB.get("OPTIONS", {})
+ self.client_error = InfluxDBClientErrorThen update dbs to use self.host, self.port, etc. instead of TIMESERIES_DB[...].
🤖 Prompt for AI Agents
In `@openwisp_monitoring/db/backends/influx_base.py` around lines 68 - 71, Update
the InfluxBase backend to accept instance-level overrides by changing the
__init__ signature to __init__(self, db_name=None, options=None, **config) and
store instance attributes (e.g., self.host, self.port, self.user, self.password,
self.options) defaulting to TIMESERIES_DB values when not provided; then modify
the dbs property (property name: dbs) to read from these instance attributes
(self.host, self.port, self.user, self.password, self.options) instead of
directly accessing the global TIMESERIES_DB so that per-instance configuration
passed via __init__ is honored.
| @cached_property | ||
| def dbs(self): | ||
| dbs = { | ||
| "default": InfluxDBClient( | ||
| TIMESERIES_DB["HOST"], | ||
| TIMESERIES_DB["PORT"], | ||
| TIMESERIES_DB["USER"], | ||
| TIMESERIES_DB["PASSWORD"], | ||
| self.db_name, | ||
| use_udp=TIMESERIES_DB.get("OPTIONS", {}).get("udp_writes", False), | ||
| udp_port=TIMESERIES_DB.get("OPTIONS", {}).get("udp_port", 8089), | ||
| ), | ||
| } | ||
| if TIMESERIES_DB.get("OPTIONS", {}).get("udp_writes", False): | ||
| # When using UDP, InfluxDB allows only using one retention policy | ||
| # per port. Therefore, we need to have different instances of | ||
| # InfluxDBClient. | ||
| dbs["short"] = InfluxDBClient( | ||
| TIMESERIES_DB["HOST"], | ||
| TIMESERIES_DB["PORT"], | ||
| TIMESERIES_DB["USER"], | ||
| TIMESERIES_DB["PASSWORD"], | ||
| self.db_name, | ||
| use_udp=TIMESERIES_DB.get("OPTIONS", {}).get("udp_writes", False), | ||
| udp_port=TIMESERIES_DB.get("OPTIONS", {}).get("udp_port", 8089) + 1, | ||
| ) | ||
| dbs["__all__"] = InfluxDBClient( | ||
| TIMESERIES_DB["HOST"], | ||
| TIMESERIES_DB["PORT"], | ||
| TIMESERIES_DB["USER"], | ||
| TIMESERIES_DB["PASSWORD"], | ||
| self.db_name, | ||
| ) | ||
| else: | ||
| dbs["short"] = dbs["default"] | ||
| dbs["__all__"] = dbs["default"] | ||
| return dbs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dbs reads from global state, preventing instance-level overrides.
This property hardcodes reads from TIMESERIES_DB global rather than instance attributes. If __init__ is updated to accept config overrides, this property should use self.host, self.port, self.user, self.password, and self.options instead.
🤖 Prompt for AI Agents
In `@openwisp_monitoring/db/backends/influx_base.py` around lines 92 - 128, The
dbs cached_property currently reads configuration from the global TIMESERIES_DB
causing instance-level overrides to be ignored; update the dbs method to use
instance attributes (e.g., self.host, self.port, self.user, self.password,
self.options) instead of TIMESERIES_DB and derive udp_writes/udp_port from
self.options, while still constructing InfluxDBClient instances (default, short,
__all__) and using self.db_name so per-instance configs passed via __init__ take
effect.
What
Why
The existing client logic is tightly coupled to the global TIMESERIES_DB configuration dict (import-time global) openwisp_monitoring/db/backends/init.py.
That makes future backends (e.g. InfluxDB 2.x in InfluxQL compatibility mode) awkward because they would either:
duplicate the entire client implementation, or rely on mutating the global dict (hard to test and error-prone).
This refactor keeps current behavior but makes the client reusable by allowing instance-level configuration overrides via:
This enables:
Checklist
(refactor-only; existing test suite covers behavior)
Reference to Existing Issue
Closes #731
Description of Changes
This PR is a refactor-only change: it moves the InfluxQL client logic into a shared base class and keeps the InfluxDB backend client as a thin subclass. The goal is to reduce coupling to the global TIMESERIES_DB settings and avoid duplication when introducing an InfluxDB 2.x backend in InfluxQL compatibility mode.
Screenshot