Connector: MongoDB source
The MongoDB source connector currently builds a change stream pipeline in which:
- `$project` is always applied
- `$match` on collections is optional and only enabled when `exclusiveCollectionFilter` is set
As a result, MongoDB:
- Reads all change events from the oplog
- Serializes and sends events that the connector later discards
- Performs unnecessary CPU, memory, and network work
Even when downstream bindings are scoped to specific collections, the server still processes:
- Unrelated collections
- Unused change events
Proposal: insert a `$match` stage immediately after `$changeStream`, derived from the known bindings (database + collection), and apply it by default unless explicitly disabled.
Conceptually
Always restrict the change stream to bound collections unless explicitly disabled:
if !allowAllCollections {
    $match on ns.db + ns.coll
}
Pipeline order:
`$changeStream` → `$match` (NEW, early) → `$project`
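A minimal Go sketch of the proposed pipeline construction, assuming a hypothetical `Binding` type for the connector's binding metadata and hypothetical helpers `collectionMatch` and `buildPipeline`. Stages are modeled as plain maps instead of the driver's `bson` types so the example has no external dependencies, and the `$project` contents are a placeholder, not the connector's real projection. `$changeStream` itself is not built here because the driver's `Watch` call prepends it to the supplied pipeline.

```go
package main

import "fmt"

// Binding is a hypothetical stand-in for the connector's binding metadata:
// the database and collection that one binding captures.
type Binding struct {
	Database   string
	Collection string
}

// Stage models one aggregation stage as a plain map so this sketch has no
// driver dependency; real code would more likely use bson.D / bson.M.
type Stage = map[string]any

// collectionMatch builds a $match stage that keeps only change events whose
// namespace equals one of the bound (ns.db, ns.coll) pairs. Each pair is one
// $or clause, so databases and collections are never cross-matched.
// Note: events without ns.coll (e.g. dropDatabase) will not pass this filter.
func collectionMatch(bindings []Binding) Stage {
	clauses := make([]any, 0, len(bindings))
	for _, b := range bindings {
		clauses = append(clauses, map[string]any{
			"ns.db":   b.Database,
			"ns.coll": b.Collection,
		})
	}
	return Stage{"$match": map[string]any{"$or": clauses}}
}

// buildPipeline returns the stages handed to the driver's Watch call, which
// prepends $changeStream itself. The $match comes first, before the existing
// $project, unless allowAllCollections is set. MongoDB rejects an empty $or,
// so the filter is also skipped when there are no bindings.
func buildPipeline(bindings []Binding, project Stage, allowAllCollections bool) []Stage {
	var pipeline []Stage
	if !allowAllCollections && len(bindings) > 0 {
		pipeline = append(pipeline, collectionMatch(bindings))
	}
	return append(pipeline, project)
}

func main() {
	bindings := []Binding{{"shop", "orders"}, {"shop", "customers"}}
	// Placeholder for the connector's existing $project stage (hypothetical contents).
	project := Stage{"$project": map[string]any{"updateDescription": 0}}
	for i, stage := range buildPipeline(bindings, project, false) {
		fmt.Println(i, stage)
	}
}
```

Matching each (database, collection) pair as a single clause, rather than separate `$in` lists for databases and collections, avoids over-matching: bindings on `shop.customers` and `crm.orders` would otherwise also admit `shop.orders`.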
Why this is safe
- Does not change connector semantics
- Only removes events that would be ignored downstream anyway
- Uses existing binding metadata already known to the connector
- Matches CDC best practices (server-side filtering first)
Benefits
- Reduced oplog scanning work
- Lower MongoDB CPU and memory usage
- Reduced network traffic
- Lower connector-side processing cost