Embulk file input plugin that reads files stored on Microsoft Azure Blob Storage.

embulk-input-azure_blob_storage v0.2.0+ requires Embulk v0.9.12+.
- Plugin type: file input
- Resume supported: no
- Cleanup supported: yes
First, create an Azure Storage Account.
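Then install the plugin with Embulk's standard gem installer; assuming the published gem name matches the plugin name above, that is `embulk gem install embulk-input-azure_blob_storage`.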
- account_name: storage account name (string, required)
- account_key: primary access key (string, required)
- container: name of the container where the data is stored (string, required)
- path_prefix: prefix of target keys (string, required)
- incremental: enables incremental loading (boolean, optional, default: true). If incremental loading is enabled, the config diff for the next execution will include a last_path parameter so that the next execution skips files before that path. Otherwise, last_path will not be included. See the config diff sketch after this list.
- path_match_pattern: regexp to match file paths. If a file path doesn't match this pattern, the file will be skipped (regexp string, optional)
- total_file_count_limit: maximum number of files to read (integer, optional)
- proxy:
  - type: proxy type (string, required, default: null)
    - http: use HTTP proxy
  - host: proxy host (string, required)
  - port: proxy port (int, optional, default: 8080)
  - user: proxy user name (string, optional)
  - password: proxy password (string, optional)
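As a sketch of the incremental loading behavior (the path below is a made-up placeholder), a config diff saved with Embulk's `-c`/`--config-diff` option records the last processed path:

```yaml
# diff.yml, written by e.g.: embulk run config.yml -c diff.yml
in:
  last_path: logs/csv-20150127.csv.gz
out: {}
```

On the next run with this diff, files whose paths sort at or before `last_path` are skipped.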
Minimal example:

```yaml
in:
  type: azure_blob_storage
  account_name: myaccount
  account_key: myaccount_key
  container: my-container
  path_prefix: logs/csv-
```
Example for "sample_01.csv.gz", generated by `embulk example`:
```yaml
in:
  type: azure_blob_storage
  account_name: myaccount
  account_key: myaccount_key
  container: my-container
  path_prefix: logs/csv-
  decoders:
  - {type: gzip}
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    header_line: true
    columns:
    - {name: id, type: long}
    - {name: account, type: long}
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: purchase, type: timestamp, format: '%Y%m%d'}
    - {name: comment, type: string}
out: {type: stdout}
```
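With a config file like the one above (e.g. saved as `config.yml`), the usual Embulk commands apply: `embulk preview config.yml` to check the parsed records and `embulk run config.yml` to run the load.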
To filter files using regexp:
```yaml
in:
  type: azure_blob_storage
  path_prefix: logs/csv-
  ...
  path_match_pattern: \.csv$   # a file will be skipped if its path doesn't match this pattern

  ## some examples of regexp:
  #path_match_pattern: /archive/            # match files in an .../archive/... directory
  #path_match_pattern: /data1/|/data2/      # match files in .../data1/... or .../data2/...
  #path_match_pattern: \.csv$|\.csv\.gz$    # match files whose suffix is .csv or .csv.gz
```
With proxy:

```yaml
in:
  type: azure_blob_storage
  ...
  proxy:
    type: http
    host: proxy_host
    port: 8080
    user: proxy_user
    password: proxy_secret_pass
```
To build the gem and run the tests:

```
$ ./gradlew gem   # -t to watch change of files and rebuild continuously
$ ./gradlew test  # -t to watch change of files and rebuild continuously
```
To run the unit tests, the following environment variables need to be configured, and test fixture files need to be uploaded to an existing Azure Blob Storage container beforehand. When the environment variables are not set, some test cases are skipped.
```
AZURE_ACCOUNT_NAME
AZURE_ACCOUNT_KEY
AZURE_CONTAINER
AZURE_CONTAINER_IMPORT_DIRECTORY (optional, if needed)
```
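In a plain shell session, exporting the variables before running the tests is enough (the values below are placeholders; replace them with your own credentials):

```
$ export AZURE_ACCOUNT_NAME=my-account-name
$ export AZURE_ACCOUNT_KEY=my-account-key
$ export AZURE_CONTAINER=my-container
$ export AZURE_CONTAINER_IMPORT_DIRECTORY=unittests
$ ./gradlew test
```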
If you're using Mac OS X El Capitan and GUI applications (such as an IDE), you can set the variables via a launchd agent, as follows.
```
$ vi ~/Library/LaunchAgents/environment.plist
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>my.startup</string>
  <key>ProgramArguments</key>
  <array>
    <string>sh</string>
    <string>-c</string>
    <string>
      launchctl setenv AZURE_ACCOUNT_NAME my-account-name
      launchctl setenv AZURE_ACCOUNT_KEY my-account-key
      launchctl setenv AZURE_CONTAINER my-container
      launchctl setenv AZURE_CONTAINER_IMPORT_DIRECTORY unittests
    </string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
```
```
$ launchctl load ~/Library/LaunchAgents/environment.plist
$ launchctl getenv AZURE_ACCOUNT_NAME   # check that the value has been set
```
Then start your applications.