Hadoop HDFS in distributed mode on GCP

Let's first summarize the main concepts used in the creation of the cluster:

  • Hadoop: allows processing of data sets across several computers; the components that make this possible are HDFS and MapReduce.
  • HDFS (Hadoop Distributed File System): stores the data and keeps replicated copies of it on the worker nodes.
  • NameNode: manages the file system namespace and stores the metadata of the data set.
  • DataNode: manages the node's storage, handling block creation and replication.
  • core-site.xml: the file where we set the address at which HDFS is running.
  • hdfs-site.xml: the file where we set the directories for the data and metadata.

For demo purposes, we will use Google Cloud Platform (GCP), where I have a free account, to create the Hadoop cluster.

ARCHITECTURE

Within the Google Cloud console we will create a project called hadoop-hdfs-demo:

2019-03-02-222107_474x430_scrot.png

Image01: The HDFS cluster architecture
  • Platform: Google Cloud Platform
  • OS: Ubuntu 16.04
  • 1 vCPU + 3.75 GB memory
  • 10 GB standard persistent disk

vms-hadoop-gcp.png

Image02: The HDFS cluster in Google Cloud Platform

Note: these are basic specs for a demo; for production you will need more resources, sized according to your data.

INSTALLATION AND CONFIGURATION

Before you start, I suggest using a terminal emulator that lets you run commands in several terminals at once; "terminator" works well for me. Let's start!

These steps are executed on all nodes (master and workers):

  • Create a hadoop user and its home directory

sudo useradd -m -d /home/hadoop hadoop

sudo passwd hadoop

  • Make sure "hadoop" has root privileges

sudo visudo

  • Add the hadoop line below the root entry

# User privilege specification
root    ALL=(ALL:ALL) ALL
hadoop  ALL=(ALL:ALL) ALL

  • Update the versions of packages in ubuntu

sudo apt-get update

  • Install Java:

sudo apt-get install openjdk-8-jdk

  • Add the JAVA_HOME environment variable

sudo nano /etc/environment

# Add the line below at the end of the file

JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
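
To load the new variable in your current session and confirm Java is in place, a quick check:

source /etc/environment
echo $JAVA_HOME
java -version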

  • Download hadoop

Use this link to see other available versions: http://www-us.apache.org/dist/hadoop/common/

wget http://www-us.apache.org/dist/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz

  • Extract the archive

tar -xvf hadoop-3.1.1.tar.gz

 

  • Now we edit core-site.xml and hdfs-site.xml

cd /home/hadoop/hadoop-3.1.1/etc/hadoop

nano core-site.xml (this file is the same on the master and the workers)

2019-02-09-232833_802x537_scrot.png
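
In case the screenshot is hard to read, here is a minimal sketch of what core-site.xml can contain; the host and port (the NameNode on hadoop-master, port 9000) are assumptions, so adjust them to your own setup:

cat > core-site.xml <<'EOF'
<configuration>
  <property>
    <!-- address that clients and DataNodes use to reach the NameNode -->
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
</configuration>
EOF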

nano hdfs-site.xml (only master)

2019-02-09-233221_697x587_scrot.png
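
A sketch of the master's hdfs-site.xml, assuming the metadata directory is /home/hadoop/name (consistent with the troubleshooting note at the end of this post) and a replication factor of 2 for the two workers:

cat > hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <!-- where the NameNode stores the file system metadata -->
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/name</value>
  </property>
  <property>
    <!-- keep a copy of each block on both workers -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
EOF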

nano hdfs-site.xml (only workers)

2019-02-09-233319_682x576_scrot.png
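
And a sketch of the workers' hdfs-site.xml, assuming the data directory is /home/hadoop/data:

cat > hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <!-- where each DataNode stores the HDFS blocks -->
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data</value>
  </property>
</configuration>
EOF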

  • Move hadoop directory to /usr/local/hadoop-3.1.1

sudo mv /home/hadoop/hadoop-3.1.1 /usr/local/hadoop-3.1.1

  • Add /usr/local/hadoop-3.1.1 to environment variables

sudo nano /etc/environment

2019-02-09-235111_976x144_scrot.png
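
A sketch of what the line in /etc/environment can look like, assuming we append the Hadoop bin and sbin directories to PATH (the existing part of PATH may differ on your system):

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/hadoop-3.1.1/bin:/usr/local/hadoop-3.1.1/sbin"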

  • Set up the connection between master and workers; first, generate the SSH key

# master and worker

ssh-keygen

  • On the master, copy the id_rsa.pub key and paste it into authorized_keys on the master itself and on the workers

# print the value

cat ~/.ssh/id_rsa.pub

# paste the value in authorized_keys (if this file doesn't exist, create it)

sudo nano ~/.ssh/authorized_keys

  • Do the same from the workers to the master
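
If password authentication is enabled on the VMs, ssh-copy-id does the same copy in one step (an alternative to pasting the key by hand):

# from the master, as the hadoop user
ssh-copy-id hadoop@hadoop-worker01
ssh-copy-id hadoop@hadoop-worker02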

  • After that, you should be able to connect to the virtual machines

Master

ssh hadoop@hadoop-worker01

ssh hadoop@hadoop-worker02

Workers

ssh hadoop@hadoop-master

  • Modify the hosts file, using the private IPs from Google Cloud Platform:

In master

hadoop@hadoop-master:~/.ssh$ cat /etc/hosts

127.0.0.1 localhost
10.142.0.6 hadoop-master
10.142.0.3 hadoop-worker01
10.142.0.4 hadoop-worker02

In worker01

hadoop@hadoop-worker01:~/.ssh$ cat /etc/hosts
127.0.0.1 localhost

In worker02

hadoop@hadoop-worker02:~/.ssh$ cat /etc/hosts
127.0.0.1 localhost
  • Configure workers file (master and workers)

sudo nano /usr/local/hadoop-3.1.1/etc/hadoop/workers

In master

hadoop@hadoop-master:~$ cat /usr/local/hadoop-3.1.1/etc/hadoop/workers
hadoop-worker01
hadoop-worker02

In worker01

hadoop@hadoop-worker01:~$ cat /usr/local/hadoop-3.1.1/etc/hadoop/workers
hadoop-worker01
hadoop-worker02

In worker02

hadoop@hadoop-worker02:~$ cat /usr/local/hadoop-3.1.1/etc/hadoop/workers
hadoop-worker01
hadoop-worker02
  • Format HDFS (master only)

cd /usr/local/hadoop-3.1.1/bin

yes | ./hdfs namenode -format

  • Start HDFS (master only)

cd /usr/local/hadoop-3.1.1/sbin

./start-all.sh

2019-03-02-232348_1196x662_scrot.png

If you want to stop the HDFS services, use ./stop-all.sh
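
To confirm the daemons are up, jps (included with the JDK) should list a NameNode and SecondaryNameNode on the master and a DataNode on each worker, among the running Java processes:

# run on each node
jps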

  • Test HDFS (the hdfs binary lives in the bin directory)

cd /usr/local/hadoop-3.1.1/bin

./hdfs dfs -ls /

./hdfs dfs -mkdir /demo

nano demo.txt

./hdfs dfs -copyFromLocal demo.txt /demo

./hdfs dfs -cat /demo/demo.txt

2019-03-02-232632_1151x580_scrot.png
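
You can also confirm that both workers registered as DataNodes (an optional check):

./hdfs dfsadmin -report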

If this doesn't work:

Delete the name, data, and tmp directories in /home/hadoop (on master and workers); they are generated automatically. Then run the format again.

If you have difficulties, please do not hesitate to write! Good luck!

Katsuhi 2018

This year the event had a larger turnout; I saw that the number of children exceeded the previous year, so I bought many more books. I added dynamic games that helped me build trust with the children. I also received support from some volunteers, whom I thank very much, above all my family. Finally, I coordinated with the children's parents.

Here are some photos of the activities, in which we had a lot of fun.

40683800_308447523042938_371477214694211584_n.jpg

It was really wonderful!

Waiting for Katsuhi 2019…

 

Katsuhi 2017

The first time I ran the Katsuhi social program, I wrote a basic, light structure that would let me execute it in a short timeframe. Then I gathered second-hand materials that did not cost much and repaired them.

21017761_10211728557723775_206944713_o

Here are some photos of the first event: children with an interest in and a desire to read and share.

21268457_10211819639200755_1885161137_o.jpg

21291165_10211819622440336_2084184142_n.jpg

It was a wonderful experience!

Katsuhi Social

What is the Katsuhi social program?

Katsuhi is a social program whose purpose is to bring recreation and culture to the children of different communities in our country. We started in our region of Ayacucho, specifically in the Yanamilla community (with the consent of the community's representatives).

Our goals:

  1. Frequent practice of values
  2. Critical thinking
  3. Love for reading
  4. Development

Activities

  1. Presentation
  2. Recharge energy
  3. A personal greeting
  4. Greeting in couples
  5. Reading and theater
  6. Feedback
  7. The scream: snowball

This program has no political, religious, or economic interest. Our only interest is the children's learning, because we know that they are our future, and as individuals we are aware that we can help now to make a better country.

Three quick ways to start with Kubernetes

2019-02-16-182143_819x482_scrot.png

There are many ways to start with Kubernetes; I will describe three of them, with which you can get your first project running quickly.

1. Minikube:

Minikube is a simple way to create a single-node Kubernetes cluster. We first need to install the following:

  • Install VirtualBox

    Minikube runs the cluster inside a VirtualBox VM. We will use these commands to install it:

sudo apt-get update && sudo apt-get dist-upgrade && sudo apt-get autoremove
sudo apt-get -y install gcc make linux-headers-$(uname -r) dkms
wget -q https://www.virtualbox.org/download/oracle_vbox_2016.asc -O- | sudo apt-key add -
wget -q https://www.virtualbox.org/download/oracle_vbox.asc -O- | sudo apt-key add -
sudo sh -c 'echo "deb http://download.virtualbox.org/virtualbox/debian $(lsb_release -sc) contrib" >> /etc/apt/sources.list'
sudo apt-get update
sudo apt-get install virtualbox-5.2

Check that VirtualBox is installed: VBoxManage -v

  • Install kubectl

    kubectl is the command-line tool used to run Kubernetes commands:

sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl

Check if Kubectl is installed: kubectl version

  • Install Minikube

curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 \
&& chmod +x minikube

sudo cp minikube /usr/local/bin && rm minikube

Check if minikube is installed: minikube version

  • Start Minikube

minikube start

  • Create a hello-minikube deployment

kubectl run hello-minikube --image=gcr.io/google-samples/node-hello:1.0 --port=8080

  • Expose the service

kubectl expose deployment hello-minikube --type=NodePort

  • We can see the pod status

kubectl get pod

2019-02-16-011113_746x65_scrot

  • We can get the service URL

minikube service hello-minikube --url
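
To see the application respond, you can curl the URL that command prints; it should return a short hello message:

curl $(minikube service hello-minikube --url)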

  • Delete the pod

kubectl delete deployment hello-minikube

  • Delete the service

kubectl delete service hello-minikube

2. Kubernetes on Google Cloud Platform:

We need a Google Cloud Platform account; if you have one, go to the console: https://console.cloud.google.com

Select Kubernetes clusters in the left menu:

2019-02-16-025616_307x425_scrot

Click on Create cluster:

2019-02-16-011535_529x263_scrot

On GCP we will create 3 workers; the master is created by default.

2019-02-16-011647_925x847_scrot.png

2019-02-16-032857_1165x325_scrot.png

Click on "Connect", then click on "Run in Cloud Shell".

2019-02-16-032647_942x564_scrot.png
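
If you prefer your local terminal, the "Connect" dialog gives you a gcloud command similar to this one (the cluster name, zone, and project here are placeholders):

gcloud container clusters get-credentials your-cluster --zone us-east1-b --project your-project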

Test: kubectl get nodes

2019-02-16-032822_1714x318_scrot.png

We now have a Kubernetes cluster running on GCP.

3. Play with Kubernetes labs:

Play with Kubernetes lets you run a K8s cluster in seconds: https://labs.play-with-k8s.com

Click on "Add new instance" to start a node. Enter the console and you can follow the guide to start the cluster. (If the terminal doesn't show, refresh the page.)

2019-02-16-183639_1547x801_scrot
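
The on-screen guide bootstraps the cluster with kubeadm; the steps look roughly like this (a sketch, not the lab's exact text; the placeholders come from the output of kubeadm init):

# on the first instance: initialize the control plane
kubeadm init --apiserver-advertise-address $(hostname -i)

# on each additional instance: join the cluster with the token printed by kubeadm init
kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>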

 

With these three tools, we can start playing with Kubernetes. If you have questions, do not hesitate to write.

Thank you!

 

 

Welcome to Katsuhi

 

Katsuhi is a friendly space where I share my knowledge about the use of technology. These writings may be useful for you, perhaps because you are in an implementation stage and have a strong desire to keep learning. Just as other spaces on the internet helped me solve problems, I love that information is shared and used in a good way.

This blog shares things that have caught my attention and things I have learned. Some of them gave me a hard time, and I will keep adding them as an interesting and useful collection.

I also use Katsuhi to write about some personal projects that I am doing with the purpose of contributing to the development of my country.

Now, a little about me. My name is Edith Puclla. I am from Ayacucho, Perú, and I am passionate about learning and sharing knowledge related to technology; I love how computers work, the growth of technology, and playing with it. I am one of the people who believe we should always dream, and try and try to achieve it.