This role installs HDFS on Ubuntu/Debian Linux servers.
- The role requires Java and ZooKeeper to be installed, configured, and running.
```yaml
- hosts: hadoop_hosts
  become: True
  roles:
    - hdfs
```
For an example inventory, please check the inventory file.
If `hdfs_ssh_fence` is set to `true`, the playbook has to be run with the `-K` option of `ansible-playbook`!
This role supports two different modes of installation:
- Single Namenode with Secondary Namenode
- Two Namenodes in HA mode
The number of namenodes determines the mode: if two namenodes are specified, HDFS will be installed in an HA fashion.
For documentation details of HDFS please refer to the official Hadoop documentation.
This role makes use of groups to figure out which server needs which installation (see the inventory sketch after this list). The groups are listed below:
- namenodes
- datanodes
- secondarynamenode (Single NN setup only)
- zookeeper_hosts (High availability mode only)
- journalnodes (High availability mode only)
Alternatively, variables like `hdfs_namenodes` can be overwritten (see defaults/main.yml).
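A minimal inventory sketch for an HA setup might look like this; all hostnames are placeholders, and the repository's inventory file remains the authoritative example:

```yaml
# Hypothetical HA inventory in YAML format -- all hostnames are placeholders.
all:
  children:
    namenodes:
      hosts:
        nn1.example.com:
        nn2.example.com:
    datanodes:
      hosts:
        dn1.example.com:
        dn2.example.com:
        dn3.example.com:
    journalnodes:
      hosts:
        nn1.example.com:
        nn2.example.com:
        dn1.example.com:
    zookeeper_hosts:
      hosts:
        zk1.example.com:
        zk2.example.com:
        zk3.example.com:
```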
The following gives a list of important variables that have to be set for a specific deployment. Most variables can be set in group_vars or host_vars.
- `hdfs_cluster_name`: Name of your cluster
- `hdfs_parent_dir`: Where to install HDFS to
- `hdfs_version`: Hadoop version to use
- `hdfs_tmpdir`: Where to write HDFS tmp files
- `hdfs_namenode_dir_list`: Directories for namenode files
- `hdfs_datanode_dir_list`: Directories for datanode files
- `hdfs_namenode_checkpoint_dir_list`: Directories for secondary namenode files
- `hdfs_distribution_method`: How the tar.gz should be installed: 'downloaded', 'local_file', or 'compile'
- `hdfs_bootstrap`: Should the cluster be formatted? (Not recommended if you have an already existing installation)
- `hdfs_host_domain_name`: Only set this variable if your host entries are not FQDNs. E.g. value: "node.dns.example.com"
- `hdfs_upgrade`: Only set this variable to perform an upgrade (given that hdfs_version has changed)
- `hdfs_upgrade_force`: Only set this variable to force an upgrade (the playbook will run even if the version hasn't changed; useful when something went wrong and a node has already been upgraded)
For more configuration variables see the documentation in defaults/main.yml.
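For illustration, a group_vars sketch might look like the following; all values are placeholders, and defaults/main.yml remains the authoritative reference:

```yaml
# group_vars/hadoop_hosts.yml -- placeholder values, adapt to your deployment.
hdfs_cluster_name: mycluster
hdfs_version: "2.7.2"
hdfs_parent_dir: /opt
hdfs_namenode_dir_list:
  - /data/hdfs/namenode
hdfs_datanode_dir_list:
  - /data/hdfs/datanode
hdfs_namenode_checkpoint_dir_list:
  - /data/hdfs/checkpoint
hdfs_bootstrap: true
```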
If `hdfs_upgrade` is set to `true`, the playbook will assume an upgrade is taking place, and some input from the user might be required.
Additional configuration for hdfs-site.xml and core-site.xml can be added by overwriting the following variables:
- `hdfs_site_additional_properties`
- `core_site_additional_properties`
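As a sketch, such overrides might look like the following; the property names are standard Hadoop settings, but the exact structure the role expects should be verified against its templates:

```yaml
# Assumed simple name-to-value mapping -- verify against the role's templates.
hdfs_site_additional_properties:
  dfs.replication: 3
  dfs.blocksize: 134217728
core_site_additional_properties:
  io.file.buffer.size: 65536
```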
This section gives a brief description of what each playbook does.
CURRENTLY ONLY WORKS WITH Ubuntu 14.04 (16.04 ships a newer protobuf version and compilation fails).
This playbook will compile Hadoop on the server `hdfs_compile_node` to enable the Hadoop native libraries (compression codecs and HDFS Short-Circuit Local Reads). It will install the development tools necessary to compile Hadoop. Download and compilation may take a while (10-20 min), depending on your internet connection and server power.
To activate this playbook, set `hdfs_distribution_method` to `compile` (see the configuration sketch after the options list below).
Known issues:
- Sometimes the git download fails for the first time. Just run it again.
Options:
- `hdfs_compile_node`: Server to compile on
- `hdfs_compile_from_git`: True if it should download the latest version from github.com
- `hdfs_compile_version`: Version to download from GitHub (tags usable, e.g. 'tags/rel/release-2.7.2' or 'HEAD')
- `hdfs_fetch_folder`: Local folder to download the compiled tar to
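Put together, a compile-based install might be configured like this; the hostname and path are placeholders:

```yaml
# Placeholder values for a compile-based install.
hdfs_distribution_method: compile
hdfs_compile_node: build1.example.com
hdfs_compile_from_git: true
hdfs_compile_version: tags/rel/release-2.7.2
hdfs_fetch_folder: /tmp/hadoop_dist
```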
This playbook installs the hadoop binaries and creates links for easy usage.
This playbook writes the configuration files.
This playbook upgrades HDFS in a controlled way (applicable only to the HA mode). It follows a zero-downtime procedure that can be summarized as follows:
- Prepare rolling upgrade, wait for "Proceed with rolling upgrade"
  1. Perform upgrade of the active namenode (by means of failover to the standby)
  2. Failover to the newly upgraded namenode, upgrade the second namenode
- Perform upgrade of the datanodes in a rolling fashion
  1. Stop the running datanode (check if running)
  2. Install the new version
  3. Restart it with the new program version (check if running)
- Finalize the rolling upgrade
Be prepared to react to prompts from the playbook, especially when services are being started and stopped.
If anything goes wrong and some nodes were already upgraded, run the playbook again with `hdfs_upgrade_force` set to `True`. This process is idempotent.
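For example, an upgrade run might set the following (a sketch; the version number is a placeholder):

```yaml
# Placeholder version -- hdfs_version must differ from the installed version.
hdfs_version: "2.7.3"
hdfs_upgrade: true
```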
This playbook will create the user `hdfs_user`, generate an ssh-key for it, distribute the key, and register all servers in each other's known_hosts file.
This playbook sets up SSH access for the `hdfs_user` between the namenode servers. Use it if SSH fencing is the preferred fencing method. (See the HA documentation.)
This playbook writes configuration files needed only by the namenode, creates folders, and sets up services for the namenode and zkfc.
This playbook creates the folders specified in `hdfs_datanode_dir_list` and registers the hdfs-datanode service.
This playbook will install the journal node service.
This playbook will install and register the hdfs-secondarynamenode service.
This playbook bootstraps a cluster in HA mode.
This playbook bootstraps a cluster in SPOF mode. (One namenode and one secondary namenode)
The tests are run using molecule and a docker container.
- Docker
- molecule (pip module)
- docker-py (pip module)
From the root folder, run `molecule test`.
Apache 2.0
- Bertrand Bossy
- Florian Froese
- Laurent Hoss