Configure Hadoop and Start the Cluster Services Using Ansible
So guys, let's start with the task description.
Task Description📄
🔰 Configure Hadoop and start cluster services using Ansible Playbook
In this article, you will see how Ansible helps you configure Hadoop and start the cluster services. Before that, I will give some background on Red Hat Ansible and Hadoop.
Red Hat Ansible
Ansible is a simple open-source IT automation engine that automates application deployment, intra-service orchestration, cloud provisioning, and many other IT processes.
Ansible is easy to deploy because it does not use any agents or custom security infrastructure.
Ansible uses playbooks to describe automation jobs, and playbooks are written in YAML, a human-readable data-serialization language commonly used for configuration files (though it can be used anywhere data is stored). Because YAML is so easy to read and write, even IT infrastructure support staff can understand a playbook and debug it if needed.
Ansible is designed for multi-tier deployment. It does not manage one system at a time; it models IT infrastructure by describing how all of your systems are interrelated. Ansible is completely agentless, which means it works by connecting to your nodes over SSH (by default). If you want another connection method, such as Kerberos, Ansible gives you that option.
After connecting to your nodes, Ansible pushes small programs called "Ansible modules". It runs those modules on your nodes and removes them when finished. Ansible manages your inventory in simple text files (the hosts files); in these files you can group hosts and then control the actions on a specific group from your playbooks.
HADOOP
Hadoop is an Apache open-source framework, written in Java, that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
So now you have a basic idea of Hadoop and Red Hat Ansible, and of how Ansible works.
Before automating anything with Ansible, you should first be clear about the manual approach; that clarity is what makes the automation straightforward.
Let's start configuring the Hadoop cluster using an Ansible playbook.
- Prerequisites to configure the Hadoop cluster using Ansible
→ Ansible is written in Python; the command to install Ansible is mentioned below.
pip3 install ansible
The above command installs Ansible on your system. To check whether Ansible installed correctly, use the command below.
ansible --version
→ Configure the inventory file, listing all the managed-node information (a sample inventory is shown a little further below).
Note: run the above commands on the Controller Node.
Managed Node
Now jump to the managed node. Here you can see that no Java or Hadoop files are present on the managed node yet.
Controller Node
ip.txt is like a key for Ansible: it contains the IP address, username, and password that Ansible needs to connect to each system.
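For reference, a minimal inventory could look like the sketch below. The group names match the hosts values used in the playbooks later on, but the IP addresses, user, and password are placeholders you must replace with your own (using ansible_ssh_pass also assumes sshpass is installed on the controller node):

[namenode]
192.168.1.10 ansible_user=root ansible_ssh_pass=redhat

[datanode]
192.168.1.11 ansible_user=root ansible_ssh_pass=redhat

Then point Ansible at this file, for example in /etc/ansible/ansible.cfg (the inventory path here is an assumption):

[defaults]
inventory = /root/ip.txt
host_key_checking = False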
To check the availability of the managed nodes, use the command below.
ansible all --list-hosts
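If the inventory is read correctly, the output lists your nodes, roughly like this (with the placeholder IPs from the sample inventory above):

  hosts (2):
    192.168.1.10
    192.168.1.11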
Now check the connectivity between the managed nodes and the controller node with the command below.
ansible all -m ping
Now our controller node (CN) is connected to the managed nodes (MN). If you see "ping": "pong" at the end of the output, everything is fine.
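For each reachable node, the output looks roughly like this (the IP is again a placeholder):

192.168.1.10 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}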
Before we play the Ansible playbooks, let's look at them. First, the playbook for the namenode:
- hosts: namenode
  tasks:
    - name: "Copying the hadoop Software File"
      copy:
        src: "hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/"
    - name: "Copying the Jdk Software File"
      copy:
        src: "jdk-8u171-linux-x64.rpm"
        dest: "/root/"
    - name: "Installing Jdk"
      shell: "rpm -ivh jdk-8u171-linux-x64.rpm"
      register: Java
      ignore_errors: yes
    - name: "Installing Hadoop"
      shell: "rpm -ivh hadoop-1.2.1-1.x86_64.rpm --force"
      register: Hadoop
      ignore_errors: yes
    - name: "Copying the core-site.xml file"
      copy:
        src: "core-site.xml"
        dest: "/etc/hadoop/core-site.xml"
    - name: "Copying the hdfs-site.xml file"
      copy:
        src: "namenode-hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"
    - name: "Creating a directory"
      file:
        state: directory
        path: "nn"
    - name: "Formatting Namenode"
      shell: "echo Y | hadoop namenode -format"
      register: format
    - name: "Starting the namenode"
      shell: "hadoop-daemon.sh start namenode"
      ignore_errors: yes
      register: hadoop_started
    - name: "checking status of namenode"
      shell: "jps"
      register: jps
In this playbook we copy the JDK and Hadoop software from the controller node and install them, configure core-site.xml and hdfs-site.xml by copying the files from the controller node, create a directory for the namenode, format the namenode, and start the namenode service.
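The configuration files copied above are not shown here, so here is a minimal sketch of what they typically contain for Hadoop 1.2.1. The namenode IP, port, and directory path are assumptions; in particular, the directory must match the one the playbook creates (a relative path like "nn" ends up under the SSH user's home, i.e. /root/nn for root).

core-site.xml (the same file is used on both nodes):

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.1.10:9001</value>
    </property>
</configuration>

namenode-hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/root/nn</value>
    </property>
</configuration>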
Now the playbook for the datanode configuration:
- hosts: datanode
  tasks:
    - name: "Copying the hadoop Software File"
      copy:
        src: "hadoop-1.2.1-1.x86_64.rpm"
        dest: "/root/"
    - name: "Copying the Jdk Software File"
      copy:
        src: "jdk-8u171-linux-x64.rpm"
        dest: "/root/"
    - name: "Installing Jdk"
      shell: "rpm -ivh jdk-8u171-linux-x64.rpm"
      register: Java
      ignore_errors: yes
    - name: "Installing Hadoop"
      shell: "rpm -ivh hadoop-1.2.1-1.x86_64.rpm --force"
      register: Hadoop
      ignore_errors: yes
    - name: "Copying the core-site.xml file"
      copy:
        src: "core-site.xml"
        dest: "/etc/hadoop/core-site.xml"
    - name: "Copying the hdfs-site.xml file"
      copy:
        src: "datanode-hdfs-site.xml"
        dest: "/etc/hadoop/hdfs-site.xml"
    - name: "Creating a directory"
      file:
        state: directory
        path: "dn"
    - name: "Starting the datanode"
      shell: "hadoop-daemon.sh start datanode"
      ignore_errors: yes
      register: hadoop_started
    - name: "checking status of datanode"
      shell: "jps"
      register: jps
In this playbook we copy the Hadoop and JDK software files and install them, configure core-site.xml and hdfs-site.xml by copying the files from the controller node, create a directory for the datanode, and start the datanode service.
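As with the namenode, here is a hedged sketch of the datanode-hdfs-site.xml copied above; the path is an assumption matching the relative "dn" directory the playbook creates under root's home:

<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>/root/dn</value>
    </property>
</configuration>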
- Now run the playbooks from the controller node:
ansible-playbook <playbook_name>
For example, run the namenode playbook as ansible-playbook namenode.yml, and then run the datanode playbook the same way.
And that's all. At last, my datanode is connected to the namenode and contributes approximately 50 GB of storage to the cluster.
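If you want to verify this yourself, Hadoop 1.x ships a report command you can run on the namenode:

hadoop dfsadmin -report

It lists every live datanode along with its configured and remaining capacity, which is where the roughly 50 GB figure above comes from.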
Hopefully you learned something new from this article and enjoyed it.
I tried to explain as much as possible. Feel free to check out my LinkedIn profile mentioned below, and of course feel free to comment. And don't forget to give feedback.
Linkedin profile :- https://www.linkedin.com/in/jatin-lodhi-9230571a7/