Improve performance of the structured data generator #124

@hariso

Feature description

The pipeline below produces only 6-7k msg/s, both on my machine and on a c7i.xlarge EC2 instance. That is slow compared to real data sources such as Postgres, which can achieve roughly 10x that throughput.

One option is to generate a single record up front and always return that same record. That's what we already do with the file generator.

Example pipeline:

version: "2.2"
pipelines:
  - id: generator-kafka
    status: running
    connectors:
      - id: generator
        type: source
        plugin: builtin:generator
        settings:
          collections.users.format.type: structured
          collections.users.format.options.id: int
          collections.users.format.options.name: string
          collections.users.format.options.email: string
          collections.users.format.options.position: string
          collections.users.format.options.salary: int
          collections.users.format.options.full_time: bool
          collections.users.format.options.hire_date: time
          collections.users.format.options.created_at: time
          collections.users.format.options.updated_at: time
          sdk.schema.extract.key.enabled: false
          sdk.schema.extract.payload.enabled: false
      - id: kafka-destination
        type: destination
        plugin: builtin:kafka
        name: kafka-destination
        settings:
          servers: "benchi-kafka:9092"
          topic: "generator.to.kafka"
          compression: "none"
