Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-Tenancy #7060

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Multi-Tenancy #7060

wants to merge 1 commit into from

Conversation

mgilham
Copy link

@mgilham mgilham commented Feb 6, 2025

Summary

This PR offers shared-instance multi-tenancy for SigNoz. It is not ready to be merged in its current form, but it is functional, and it meets my company's needs for multi-tenancy in our environment.

I am opening the PR to get feedback from owners and see if this is a direction the project wants to go, and hopefully find alignment before doing more preparation for merging. I'm not picky about or attached to the design. If there are aspects of design you'd like to see changed, I can spend some amount of time to make changes. I'm on #contributing in Slack.

Related Issues / PR's

Design Sketch

The scope of changes include:

  • identifying the tenant associated with a request
  • authorizing and isolating access to telemetry data in ClickHouse for the tenant
  • authorizing and isolating access to user data in sqlite by tenant
Tenant identity
  • We assume there is exactly one tenant associated with a given request for a given user. The tenant is identified by a string.
  • This current implementation assumes the tenant is the Organization.
  • The tenant is plumbed through the system where needed, in some places as a direct function parameter, and in others as a Context value.
  • The tenant is sent through Context to the custom Prometheus ClickHouse metrics reader and applied in the SQL there as well.
  • Tenant is also applied as a key to the query cache lookups.
Telemetry data
  • We define ClickHouse parameterized views to "protect" the base tables. The views take the tenant as a parameter.
    • The views are used as drop-in replacements for the base tables in many places in the query service.
    • Some of the views also take the time window as parameters for performance reasons. In theory this might not be necessary as the ClickHouse query optimizer could push the predicates down, but when I checked for that I was not seeing reassuring query plans.
    • As an alternative to parameterized views, we could include the tenant filter in the dynamic SQL built inside the query service. There are pros and cons to both approaches we could discuss further (security, coupling, flexibility, performance, etc.)
  • The views can be defined differently on the back end to implement tenancy in different ways. This may or may not be an advantage depending on the direction the project wants to take.
  • In some cases we add columns in or on top of the existing base tables to materialize the tenant identifier.
  • As an alternative, we could simplify the design and view logic by establishing a common standard column for all relevant tables. This would require more table changes, but the simplicity could be worthwhile. For the telemetry intake, I would recommend considering the otel service.namespace, which would help in de-duplicating service.name across tenants.
User data in sqllite
  • Filters are added to restrict all CRUD operations to the organization of the user making the request.

Open Items

  • Custom user ClickHouse queries are not yet isolated.
    • This is mostly implemented in this PR, but the version of ClickHouse would need to upgrade to support SQL security settings to allow users to query the views without access to the base tables.
    • Custom queries are parsed to ensure the tenant parameter to the views match the authenticated tenant for the request. This parsing could be improved upon for generality, or discarded in favor of an alternate solution, e.g. custom per-tenant views.
  • Tests still need to be updated
  • Infra metrics
  • SigNoz internal telemetry for multiple orgs?
  • New Role for a meta-admin that can access data for all tenants?

Important

Adds multi-tenancy support to SigNoz by introducing tenant-specific data isolation and access control mechanisms.

  • Behavior:
    • Introduces multi-tenancy by associating requests with a tenant, identified by a string, assumed to be the Organization.
    • Tenant information is passed through context and used in SQL queries for data isolation.
    • Adds tenant-specific views in ClickHouse for telemetry data isolation.
  • Database:
    • Modifies SQL queries in rules/db.go and app/dashboards/model.go to include tenant filtering using common.TenantSqlPredicate().
    • Updates CreateRuleTx, EditRuleTx, and DeleteRuleTx in rules/db.go to handle tenant-specific operations.
  • Configuration:
    • Adds TenantDSN to ClickHouseConfig in config.go for tenant-specific ClickHouse connections.
    • Deprecates TenantClickHouseUrl environment variable.
  • Context Management:
    • Adds ContextTenantKey in constants.go for tenant identification in context.
    • Updates context handling in server.go and rules/manager.go to include tenant information.
  • Misc:
    • Updates provider.go to handle tenant-specific ClickHouse connections.
    • Adds tenant handling in PrepareTracesQuery in traces/v4/query_builder.go.

This description was created by Ellipsis for 1931180. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 1931180 in 2 minutes and 20 seconds

More details
  • Looked at 2796 lines of code in 47 files
  • Skipped 0 files when reviewing.
  • Skipped posting 9 drafted comments based on config settings.
1. pkg/rules/db.go:156
  • Draft comment:
    Good: SQL statements now include a tenant filter by appending common.TenantSqlPredicate(ctx). This is important for multi-tenancy isolation.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
2. pkg/rules/manager.go:344
  • Draft comment:
    Minor: The tenant is retrieved from context in EditRule; ensure that the tenant value is set correctly upstream in auth middleware.
  • Reason this comment was not posted:
    Confidence changes required: 30% <= threshold 50%
    None
3. pkg/telemetrystore/telemetrystore.go:90
  • Draft comment:
    Good: The provider wraps query execution with hook calls (WrapBeforeQuery and WrapAfterQuery) which is a clear design for extensible behavior.
  • Reason this comment was not posted:
    Confidence changes required: 0% <= threshold 50%
    None
4. pkg/signoz/config.go:108
  • Draft comment:
    Suggestion: Consider logging a warning when deprecated environment variables are used, so users can transition to new ones.
  • Reason this comment was not posted:
    Confidence changes required: 30% <= threshold 50%
    None
5. pkg/common/user.go:19
  • Draft comment:
    TenantSqlPredicate builds a SQL predicate by interpolating the tenant directly. For safety against SQL injection, consider using parameterized queries or sanitizing the tenant value thoroughly.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. pkg/telemetrystore/clickhousetelemetrystore/provider.go:50
  • Draft comment:
    When TenantDSN is set, the DSN passed to clickhouse.ParseDSN is still config.ClickHouse.DSN. Likely the intended behavior is to parse config.ClickHouse.TenantDSN instead, so the tenant connection uses the correct DSN.
  • Reason this comment was not posted:
    Marked as duplicate.
7. pkg/rules/manager.go:344
  • Draft comment:
    The code directly type-asserts the tenant from the context (ctx.Value(constants.ContextTenantKey).(string)). Consider using the comma-ok idiom to avoid potential panics if the tenant is missing.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. pkg/rules/db.go:229
  • Draft comment:
    Building SQL queries by concatenating strings with tenant predicates (via common.TenantSqlPredicate(ctx)) can be error-prone. Consider using parameterized queries for improved security and maintainability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. pkg/telemetrystore/telemetrystore.go:20
  • Draft comment:
    The TelemetryStore interface now requires both ClickHouseDB() and TenantClickHouseDB(). Ensure that callers handle the case when TenantClickHouseDB() may be nil.
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50%
    None

Workflow ID: wflow_yiUpZ75qVOtTVTRb


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@@ -46,17 +47,38 @@ func New(ctx context.Context, providerSettings factory.ProviderSettings, config
return nil, err
}

var tenantConn driver.Conn
if config.ClickHouse.TenantDSN != "" {
options, err := clickhouse.ParseDSN(config.ClickHouse.DSN)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUG: When TenantDSN is provided, the DSN parsed is still config.ClickHouse.DSN. It should instead use config.ClickHouse.TenantDSN to correctly initialize the tenant connection.

@@ -32,7 +32,8 @@ type ClickHouseQuerySettings struct {
}

type ClickHouseConfig struct {
DSN string `mapstructure:"dsn"`
DSN string `mapstructure:"dsn"`
TenantDSN string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Consider adding a mapstructure tag for TenantDSN to allow configuring it via config files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant