Configure Hadoop and Start the Cluster Services Using Ansible

Jatin Lodhi
Dec 14, 2020


So guys, let's start with the task description.

Task Description📄

🔰 Configure Hadoop and start cluster services using Ansible Playbook

In this article, you will see how Ansible helps you configure Hadoop and start the cluster services. Before that, I will give some information about Red Hat Ansible and Hadoop.

Red Hat Ansible

Ansible is a simple, open-source IT automation engine that automates application deployment, intra-service orchestration, cloud provisioning, and many other IT needs.

Ansible is easy to deploy because it does not use any agents or custom security infrastructure.

Ansible uses playbooks to describe automation jobs, and playbooks are written in a very simple language: YAML, a human-readable data serialization language commonly used for configuration files, though it can be used anywhere data is stored. Because YAML is so easy to read and write, even IT infrastructure support staff can read, understand, and debug a playbook if needed.
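For example, a playbook is just a YAML list of plays; a minimal, purely illustrative skeleton looks like this:

- hosts: all
  tasks:
    - name: "Print a message"
      debug:
        msg: "Hello from Ansible"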

Ansible is designed for multi-tier deployment. It does not manage one system at a time; it models IT infrastructure by describing how all of your systems are interrelated. Ansible is completely agentless, which means it works by connecting to your nodes over SSH (by default). If you want another connection method, such as Kerberos, Ansible gives you that option.

After connecting to your nodes, Ansible pushes small programs called "Ansible modules". It runs those modules on your nodes and removes them when finished. Ansible manages your inventory in simple text files (the hosts file), where you can group hosts and then control the actions on a specific group from the playbooks.

HADOOP

Hadoop is an Apache open-source framework, written in Java, that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.

So, now you have a basic idea of Hadoop and Red Hat Ansible, and of how Ansible works.

Before doing the Ansible practical, you should first have a clear manual approach; knowing the manual steps is what helps you automate them.

Let's start building the Hadoop cluster using an Ansible playbook.

  • Prerequisites to configure the Hadoop cluster using Ansible

→ Ansible is written in Python, so it can be installed with pip. The installation command is mentioned below:

pip3 install ansible

The above command installs Ansible on your system. To check whether Ansible was installed or not, use the command mentioned below:

ansible --version

→ Configure the inventory file, listing all the managed node information. See the inventory example below for clarification.

Note: run the above commands on the controller node.

MANAGED NODE

Now jump to the managed node. Here you can see that no Java or Hadoop files are present on the managed node yet.

Controller node

Configure the Ansible configuration file by mentioning the inventory file location in it. The Ansible configuration file is located at /etc/ansible/ansible.cfg.
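A minimal sketch of the relevant part of /etc/ansible/ansible.cfg, assuming the inventory file is kept at /root/ip.txt (the path is an assumption; use wherever you saved yours):

[defaults]
# location of the inventory file (assumed path)
inventory = /root/ip.txt
# convenient in a lab setup: skip SSH host key prompts
host_key_checking = False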

The ip.txt file is like a key for Ansible: it contains the IP address, username, and password of each managed node, which is what lets Ansible connect to the system at the given IP. This is the inventory file, ip.txt.
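A sketch of what ip.txt can look like, with group names matching the hosts: lines used in the playbooks below (the IPs and password here are placeholders, not actual values):

[namenode]
192.168.1.10 ansible_user=root ansible_ssh_pass=redhat

[datanode]
192.168.1.11 ansible_user=root ansible_ssh_pass=redhat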

To check the availability of the managed nodes, use the command mentioned below:

ansible all --list-hosts
Here, the IP above is my managed node IP, and the other one is the controller node IP.

Now check the connectivity between the managed node and the controller node. For this, use the command mentioned below:

ansible all -m ping
This shows the connectivity between the managed node (MN) and the controller node (CN).

Now our controller node is connected to the managed node. You can see "ping": "pong" at the end of the output, which means everything is fine.

Before running the Ansible playbooks, see the playbook for the namenode:

- hosts: namenode
  tasks:
    - name: "Copying the hadoop Software File"
      copy:
        src: "hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/"

    - name: "Copying the Jdk Software File"
      copy:
        src: "jdk-8u171-linux-x64.rpm"
        dest: "/root/"

    - name: "Installing Jdk"
      shell: "rpm -ivh jdk-8u171-linux-x64.rpm"
      register: Java
      ignore_errors: yes

    - name: "Installing Hadoop"
      shell: "rpm -ivh hadoop-1.2.1-1.x86_64.rpm --force"
      register: Hadoop
      ignore_errors: yes

    - name: "Copying the core-site.xml file"
      copy:
        src: "core-site.xml"
        dest: "/etc/hadoop/core-site.xml"

    - name: "Copying the hdfs-site.xml file"
      copy:
        src: "namenode-hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - name: "Creating a directory"
      file:
        state: directory
        path: "nn"

    - name: "Formatting Namenode"
      shell: "echo Y | hadoop namenode -format"
      register: format

    - name: "Starting the namenode"
      shell: "hadoop-daemon.sh start namenode"
      ignore_errors: yes
      register: hadoop_started

    - name: "checking status of namenode"
      shell: "jps"
      register: jps

In this playbook, we copied the JDK and Hadoop software files and installed them, configured the hdfs-site.xml and core-site.xml files by copying them from the controller node, created a directory, formatted the namenode, and started the namenode service.
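The playbook copies core-site.xml and namenode-hdfs-site.xml from the controller node; their contents are not shown above, but for Hadoop 1.x they would look roughly like the sketch below (the namenode IP, port, and directory path are assumptions for illustration):

core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.10:9001</value>
  </property>
</configuration>

namenode-hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/root/nn</value>
  </property>
</configuration>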

Playbook for the datanode configuration:

- hosts: datanode
  tasks:
    - name: "Copying the hadoop Software File"
      copy:
        src: "hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/"

    - name: "Copying the Jdk Software File"
      copy:
        src: "jdk-8u171-linux-x64.rpm"
        dest: "/root/"

    - name: "Installing Jdk"
      shell: "rpm -ivh jdk-8u171-linux-x64.rpm"
      register: Java
      ignore_errors: yes

    - name: "Installing Hadoop"
      shell: "rpm -ivh hadoop-1.2.1-1.x86_64.rpm --force"
      register: Hadoop
      ignore_errors: yes

    - name: "Copying the core-site.xml file"
      copy:
        src: "core-site.xml"
        dest: "/etc/hadoop/core-site.xml"

    - name: "Copying the hdfs-site.xml file"
      copy:
        src: "datanode-hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"

    - name: "Creating a directory"
      file:
        state: directory
        path: "dn"

    - name: "Starting the datanode"
      shell: "hadoop-daemon.sh start datanode"
      ignore_errors: yes
      register: hadoop_started

    - name: "checking status of datanode"
      shell: "jps"
      register: jps

In this playbook, we copied the Hadoop and JDK software files and installed them, configured the hdfs-site.xml and core-site.xml files by copying them from the controller node, created a directory, and started the datanode service.
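Similarly, datanode-hdfs-site.xml points the datanode at its own storage directory; a minimal sketch (the directory path is an assumption):

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/root/dn</value>
  </property>
</configuration>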

Now run the playbook of the namenode on the controller node as ansible-playbook namenode.yml, and then the playbook of the datanode. The general syntax is:

ansible-playbook <playbook_name>
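So, with the namenode playbook saved as namenode.yml (as above) and the datanode playbook saved as, say, datanode.yml (that filename is my assumption), the runs look like:

ansible-playbook namenode.yml
ansible-playbook datanode.yml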
Running the playbooks performs the following steps:

  • copying the JDK and Hadoop software and installing the JDK
  • installing Hadoop
  • copying the hdfs-site.xml and core-site.xml files to the namenode and starting the namenode service
  • copying the hdfs-site.xml and core-site.xml files to the datanode
  • and, at last, starting the datanode service

You can see that the namenode service has started. On my datanode, you can see that the JDK and Hadoop software are installed and the datanode service is on.

Now, at last, the datanode gives its storage to the namenode. That's all: my datanode is connected to the namenode and provides approximately 50 GB of storage.
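One way to verify this from the namenode is the standard Hadoop 1.x report command, which lists each datanode along with its configured capacity:

hadoop dfsadmin -report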

Hopefully, you learned something new from this article and enjoyed it. I tried to explain as much as possible. Feel free to check out my LinkedIn profile mentioned below, and of course feel free to comment. And don't forget to give feedback.

LinkedIn profile: https://www.linkedin.com/in/jatin-lodhi-9230571a7/

Thanks, everyone, for reading. That's all… Signing off… 😊
