pipelinewise-tap-mongodb

This is a Singer tap that extracts data from a MongoDB source and produces JSON-formatted output following the Singer spec.

Custom setup

Install with meltano add --custom extractor tap-mongodb and answer the prompts as follows:
- namespace: tap-mongodb
- pip_url: git+https://github.com/JamieSplitit/pipelinewise-tap-mongodb.git
- executable name: tap-mongodb
- capabilities: state,catalog,discover,log-based
- settings: password,user,host,auth_database,database,srv,port:integer,replica_set,ssl,verify_mode,include_schemas_in_destination_stream_name,update_buffer_size:integer,await_time_ms:integer

Set up local dev environment:

make setup

Activate virtual environment

. venv/bin/activate

Set up Config file

Create a JSON file called config.json with the following contents:

{
  "password": "<password>",
  "user": "<username>",
  "host": "<host ip address>",
  "auth_database": "<database name to authenticate on>",
  "database": "<database name to sync from>"
}
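
As a minimal sketch, the required keys shown above could be checked before the file is written. The helper names (missing_keys, write_config) are hypothetical and are not part of the tap; only the five key names come from the example above.

```python
import json

# Keys shown as required in the config.json example above.
REQUIRED_KEYS = ("password", "user", "host", "auth_database", "database")


def missing_keys(config: dict) -> list:
    """Return the required keys that are absent or empty in config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def write_config(config: dict, path: str = "config.json") -> None:
    """Write config as JSON, refusing to write if required keys are missing."""
    missing = missing_keys(config)
    if missing:
        raise ValueError(f"missing required config keys: {missing}")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
```

Calling write_config with a dict that lacks, say, "host" raises a ValueError instead of producing a file the tap cannot use.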

The following parameters are optional for your config file:

Name	Type	Default value	Description
`srv`	Boolean	false	uses a `mongodb+srv` protocol to connect. Disables the usage of `port` argument if set to `True`
`port`	Integer	false	Connection port. Required if a non-srv connection is being used.
`replica_set`	string	null	name of replica set
`ssl`	Boolean	false	can be set to true to connect using ssl
`verify_mode`	Boolean	true	Default SSL verify mode
`include_schemas_in_destination_stream_name`	Boolean	false	forces the stream names to take the form `<database_name>-<collection_name>` instead of `<collection_name>`
`update_buffer_size`	int	1	[LOG_BASED] The size of the buffer that holds detected update operations in memory, the buffer is flushed once the size is reached
`await_time_ms`	int	1000	[LOG_BASED] The maximum amount of time in milliseconds the LOG_BASED method waits for new data changes before exiting.

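The attributes in the config.json example above are required by the tap to connect to your MongoDB instance; the parameters in the table are optional. A sample configuration file is available as sample_config.json in the repository.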
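
To illustrate how srv and port interact, here is a rough sketch of assembling a standard MongoDB connection URI from such a config. This is not the tap's own connection code: build_uri is a hypothetical helper, and using auth_database as the URI path is an assumption made for illustration. The one firm rule it shows is that a mongodb+srv URI carries no port.

```python
from urllib.parse import quote_plus


def build_uri(config: dict) -> str:
    """Assemble a MongoDB connection URI from a tap-style config (illustrative only)."""
    # Percent-encode credentials so characters like '@' or '/' stay valid in the URI.
    credentials = f"{quote_plus(config['user'])}:{quote_plus(config['password'])}"
    host = config["host"]
    auth_db = config["auth_database"]
    if config.get("srv"):
        # mongodb+srv resolves hosts through DNS and must not include a port.
        return f"mongodb+srv://{credentials}@{host}/{auth_db}"
    # Non-srv connections need an explicit port.
    return f"mongodb://{credentials}@{host}:{config['port']}/{auth_db}"
```

With srv unset, a missing port raises a KeyError, which mirrors the note in the table that port is required for non-srv connections.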

Run in discovery mode

Run the following command and redirect the output into the catalog file:

tap-mongodb --config ~/config.json --discover > ~/catalog.json

Your catalog file should now look like this:

{
  "streams": [
    {
      "table_name": "<table name>",
      "tap_stream_id": "<tap_stream_id>",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "row-count": <int>,
            "is-view": <bool>,
            "database-name": "<database name>",
            "table-key-properties": [
              "_id"
            ],
            "valid-replication-keys": [
              "_id"
            ]
          }
        }
      ],
      "stream": "<stream name>",
      "schema": {
        "type": "object"
      }
    }
  ]
}
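
A quick way to see what discovery found is to summarize the catalog. The sketch below assumes the catalog has the shape shown above; list_streams is a hypothetical helper, not something the tap provides.

```python
def list_streams(catalog: dict) -> list:
    """Return (tap_stream_id, database-name, row-count) for each discovered stream."""
    rows = []
    for stream in catalog.get("streams", []):
        # Stream-level metadata is the entry whose breadcrumb is an empty list.
        top_level = next(
            (entry.get("metadata", {})
             for entry in stream.get("metadata", [])
             if entry.get("breadcrumb") == []),
            {},
        )
        rows.append((
            stream.get("tap_stream_id"),
            top_level.get("database-name"),
            top_level.get("row-count"),
        ))
    return rows
```

Loading catalog.json with json.load and passing the result to list_streams gives one tuple per collection.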

Edit Catalog file

Using valid JSON, edit the catalog.json file.

To select a stream, enter the following to the stream's metadata:

"selected": true,
"replication-method": "<replication method>",

<replication-method> must be one of FULL_TABLE, INCREMENTAL or LOG_BASED. If it is INCREMENTAL, make sure to also add a "replication-key".

For example, after editing the example stream to select it and set a replication method, catalog.json should look like this:

{
  "streams": [
    {
      "table_name": "<table name>",
      "tap_stream_id": "<tap_stream_id>",
      "metadata": [
        {
          "breadcrumb": [],
          "metadata": {
            "row-count": <int>,
            "is-view": <bool>,
            "database-name": "<database name>",
            "table-key-properties": [
              "_id"
            ],
            "valid-replication-keys": [
              "_id"
            ],
            "selected": true,
            "replication-method": "<replication method>"
          }
        }
      ],
      "stream": "<stream name>",
      "schema": {
        "type": "object"
      }
    }
  ]
}
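
The same edit can be scripted rather than made by hand. The following is a minimal sketch, assuming the catalog layout shown above; select_stream is a hypothetical helper and the validation rules simply restate the constraints described in this section.

```python
VALID_METHODS = {"FULL_TABLE", "INCREMENTAL", "LOG_BASED"}


def select_stream(catalog: dict, tap_stream_id: str, method: str,
                  replication_key: str = None) -> dict:
    """Mark one stream as selected and set its replication method in place."""
    if method not in VALID_METHODS:
        raise ValueError(f"unknown replication method: {method}")
    if method == "INCREMENTAL" and not replication_key:
        raise ValueError("INCREMENTAL replication needs a replication-key")
    for stream in catalog["streams"]:
        if stream["tap_stream_id"] != tap_stream_id:
            continue
        for entry in stream["metadata"]:
            # Only the stream-level entry (empty breadcrumb) carries selection.
            if entry["breadcrumb"] == []:
                entry["metadata"]["selected"] = True
                entry["metadata"]["replication-method"] = method
                if replication_key:
                    entry["metadata"]["replication-key"] = replication_key
                return catalog
    raise KeyError(f"stream not found: {tap_stream_id}")
```

A typical use would be to json.load catalog.json, call select_stream for each collection to sync, and json.dump the result back to the same file.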

Run in sync mode:

tap-mongodb --config ~/config.json --catalog ~/catalog.json

The tap will write bookmarks to stdout which can be captured and passed as an optional --state state.json parameter to the tap for the next sync.
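
One way to capture the bookmark is to keep the value of the last STATE message in the tap's output. The sketch below relies on the Singer message format, where each output line is a JSON object and state messages have "type": "STATE" with the bookmark under "value". The helper name last_state is hypothetical.

```python
import json


def last_state(lines):
    """Return the value of the last STATE message in an iterable of output lines."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not a JSON message (e.g. stray log output).
            continue
        if isinstance(message, dict) and message.get("type") == "STATE":
            state = message.get("value")
    return state
```

If the tap's stdout was saved to a file, passing the open file object to last_state and dumping the result with json.dump produces a state.json suitable for the next run's --state parameter.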

Logging configuration

The tap uses a predefined logging config if none is provided. To use your own, set the environment variable LOGGING_CONFIG_FILE to the path of your logging config file. A sample config is available as sample_logging.conf in the repository.
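
The general pattern of "use the file named by an environment variable, otherwise fall back to a default" can be sketched as follows. This is not the tap's own code: configure_logging is a hypothetical helper, and it assumes the config is an INI-style file of the kind Python's logging.config.fileConfig reads.

```python
import logging
import logging.config
import os


def configure_logging(default_level=logging.INFO):
    """Load logging config from LOGGING_CONFIG_FILE if set, else use a basic default.

    Returns the path that was loaded, or None when the default was used.
    """
    path = os.environ.get("LOGGING_CONFIG_FILE")
    if path and os.path.isfile(path):
        # Keep loggers created before this call enabled.
        logging.config.fileConfig(path, disable_existing_loggers=False)
        return path
    logging.basicConfig(level=default_level)
    return None
```

Exporting LOGGING_CONFIG_FILE before starting the process is enough to switch from the fallback to the custom configuration.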
