Hadoop: Getting Started with YARN

Hadoop introduced YARN in version 2.x. When I started working with big data, my company was still running the original JobTracker and TaskTracker architecture. What stuck with me was that, back then, any praise of YARN came bundled with criticism of how bad things had been before it. It really was revolutionary. In fairness, most companies' workloads at the time were modest, especially at small and medium-sized businesses that were still exploring Hadoop on small clusters, so the problems weren't as dramatic as people made them sound.

Figure 1: How things have evolved over the years

During the rapid iteration of the big data ecosystem over the past few years, new features have kept arriving, old problems have kept getting fixed, and we have looked forward to every new release. YARN is a newly introduced platform whose main job is coordinating resources in a distributed environment. Its name, Yet Another Resource Negotiator, already says as much: YARN manages resources. That's true, as far as it goes.

A few colleagues from my team borrowed my YARN book for a week, and then over dinner asked me, "Uncle Min, what exactly is YARN?" I said, "YARN is like a school building: a teacher has to get a classroom before holding a class, but what gets taught in it is up to the teacher." These days I think of it more like running a shop. The landlord owns the property; we rent a unit and pay rent every month, and what business we run in it is our own affair. In YARN, the resource-management role (the landlord) is called the ResourceManager, and the space of an individual shop is managed by the NodeManager. Running an application and obtaining resources for it takes several rounds of communication between the two. The official website provides a brief diagram:


Figure 2: YARN architecture diagram from the official website



The ResourceManager and the NodeManagers form a master-slave relationship in our cluster. Next, we'll set up this structure.
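To make the landlord/shop analogy concrete, here is a toy Python model. It is not Hadoop code: each NodeManager advertises a fixed amount of memory and vcores, and the ResourceManager's view of the cluster is simply the sum of what the registered nodes report. The class names mirror the YARN daemons, but the fields and methods are hypothetical simplifications. The real daemons register over RPC and then exchange periodic heartbeats.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Hypothetical stand-in for the NodeManager on one slave node."""
    host: str
    memory_mb: int  # what the node advertises (cf. yarn.nodemanager.resource.memory-mb)
    vcores: int     # what the node advertises (cf. yarn.nodemanager.resource.cpu-vcores)


@dataclass
class ResourceManager:
    """Hypothetical stand-in for the ResourceManager on the master."""
    nodes: dict = field(default_factory=dict)

    def register(self, nm: NodeManager) -> None:
        # Re-registering the same host replaces its previous report.
        self.nodes[nm.host] = nm

    def cluster_totals(self) -> tuple[int, int]:
        # Cluster capacity is the sum of what the NodeManagers advertise,
        # not something the ResourceManager measures on the machines.
        memory = sum(n.memory_mb for n in self.nodes.values())
        cores = sum(n.vcores for n in self.nodes.values())
        return memory, cores


if __name__ == "__main__":
    rm = ResourceManager()
    # Hypothetical host names, each advertising the same resources.
    for host in ["node02", "node03", "node04"]:
        rm.register(NodeManager(host, 8192, 8))
    print(rm.cluster_totals())
```

The point of the model is that if only one NodeManager manages to register, the ResourceManager reports one node's worth of resources, whatever hardware the other machines have.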

Thanks to the setup from the earlier articles, we can more or less start YARN's master-slave structure right away. The directory /usr/local/svr/hadoop/sbin already contains several scripts with "yarn" in their names. Following the same naming convention as the script we used to start HDFS, we find start-yarn.sh, run it, and then fill in whatever turns out to be missing.


Figure 3: The YARN scripts
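In Hadoop 2.x, start-yarn.sh starts the ResourceManager on the local machine and then starts a NodeManager on every host listed in etc/hadoop/slaves (the file is called workers in Hadoop 3). The sketch below roughly mirrors how slaves.sh reads that file, dropping comments and blank lines, so you can check which hosts will get a NodeManager. The path under /usr/local/svr/hadoop is an assumption based on this article's install directory.

```python
from __future__ import annotations

from pathlib import Path


def nodemanager_hosts(slaves_text: str) -> list[str]:
    """Return the host names listed in a slaves file, one per line."""
    hosts = []
    for line in slaves_text.splitlines():
        # Strip '#' comments and surrounding whitespace, then skip empty lines.
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


if __name__ == "__main__":
    # Assumed location for a Hadoop 2.x install under /usr/local/svr/hadoop.
    slaves = Path("/usr/local/svr/hadoop/etc/hadoop/slaves")
    if slaves.exists():
        print(nodemanager_hosts(slaves.read_text()))
```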

We stop HDFS and everything else so we can focus on YARN.

Figure 4: Stop the HDFS cluster

Let's start YARN and check the processes:




Figure 5: Process structure

Then check the processes on a slave node:


Figure 6: Process structure on the slave node
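Checking the processes by eye works for a handful of nodes. A small helper can run the same check on jps output, whose lines have the form `<pid> <MainClassName>`. This is a hypothetical sketch: the role-to-daemon mapping covers only the two YARN daemons discussed here.

```python
from __future__ import annotations

import shutil
import subprocess

# Hypothetical mapping from node role to the YARN daemons it should run.
EXPECTED = {
    "master": {"ResourceManager"},
    "slave": {"NodeManager"},
}


def running_daemons(jps_output: str) -> set[str]:
    """Collect the main-class names from `jps` output lines."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str, role: str) -> set[str]:
    """Daemons expected for `role` that do not appear in the jps output."""
    return EXPECTED[role] - running_daemons(jps_output)


if __name__ == "__main__":
    # Only runs jps if it is on PATH; pass "slave" when run on a slave node.
    if shutil.which("jps"):
        out = subprocess.run(["jps"], capture_output=True, text=True).stdout
        print("missing:", missing_daemons(out, "master") or "none")
```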



The expected processes are all there. Now let's open the web UI, which listens on port 8088 by default:

Figure 7: Cluster information in the web UI

As you can see, only one node is listed, and the resources shown are 8 GB of memory and 8 CPU cores.
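The 8 GB and 8 cores are not measured from the hardware. They match the Hadoop 2.x defaults for yarn.nodemanager.resource.memory-mb (8192) and yarn.nodemanager.resource.cpu-vcores (8). The same totals the UI shows can also be read from the ResourceManager's REST endpoint /ws/v1/cluster/metrics. Here is a minimal sketch, assuming the web UI is on the default port 8088. The host name you pass in is your own.

```python
from __future__ import annotations

import json
import urllib.request


def fetch_metrics(rm_host: str, port: int = 8088) -> dict:
    """GET the ResourceManager's cluster metrics as parsed JSON."""
    url = f"http://{rm_host}:{port}/ws/v1/cluster/metrics"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize(metrics: dict) -> dict:
    """Keep only the node count and total capacity from the response."""
    m = metrics["clusterMetrics"]
    return {
        "activeNodes": m["activeNodes"],
        "totalMB": m["totalMB"],
        "totalVirtualCores": m["totalVirtualCores"],
    }
```

For example, `summarize(fetch_metrics("node01"))`, where node01 is a hypothetical master host name, should report the same node count, memory and vcores that appear in the UI.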

The first problem is the node count: the NodeManagers on the other slave nodes don't show up. Let's check the logs on a slave node:

Figure 8: The error message in the slave node's log

From this we can tell that the slave nodes have not been configured with the master's (the ResourceManager's) address, so they fall back to the default address, 0.0.0.0. So we add the configuration:

To make editing the configuration easier, I added a symbolic link under the /root directory:

Then we modify the configuration file:

Figure 9: Adding the master's address for the slave nodes
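The property that tells a NodeManager where its ResourceManager lives is yarn.resourcemanager.hostname. By default, the other ResourceManager addresses, such as the resource tracker on port 8031, are derived from it. Here is a hedged sketch that renders that property in yarn-site.xml's format using only the standard library. node01 is a hypothetical host name for the master.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def yarn_site(properties: dict[str, str]) -> str:
    """Render name/value pairs as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "node01" is a hypothetical master host name; use your ResourceManager host.
    print(yarn_site({"yarn.resourcemanager.hostname": "node01"}))
```

The same property has to be present on every slave node, which is why the configuration is synced before restarting.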

Sync the configuration to all nodes and restart YARN:

Figure 10: Restart

Check YARN again:

Figure 11: The result in the web UI

The next problem concerns memory and CPU. We haven't configured anything else, and the values shown don't match our machines: each machine has a dual-core CPU and 4 GB of memory. So these figures are not the machine's real CPU and memory but parameters we have to set ourselves. We looked up the relevant settings on the official website and changed our configuration:

Figure 12: Changing the resource configuration
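The two settings that control what each NodeManager advertises are yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores. The sketch below derives values for our dual-core, 4 GB machines. Holding back 1 GB for the operating system and the Hadoop daemons is an assumption of this sketch, not a value taken from the article's configuration.

```python
from __future__ import annotations


def nodemanager_resources(physical_mb: int, physical_cores: int,
                          reserved_mb: int = 1024) -> dict[str, str]:
    """Suggest NodeManager resource settings for one machine.

    reserved_mb is a hypothetical headroom for the OS and daemons.
    """
    usable_mb = physical_mb - reserved_mb
    if usable_mb <= 0:
        raise ValueError("reserved_mb leaves no memory for containers")
    return {
        "yarn.nodemanager.resource.memory-mb": str(usable_mb),
        "yarn.nodemanager.resource.cpu-vcores": str(physical_cores),
    }


if __name__ == "__main__":
    # Dual-core machines with 4 GB of memory, as described above.
    for name, value in nodemanager_resources(4096, 2).items():
        print(f"{name} = {value}")
```

The resulting name/value pairs go into yarn-site.xml on each slave node, followed by a restart of YARN.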

Finally, we can see the result:

Figure 13: The result in the web UI


At this point our mini YARN cluster is fully built and ready to open for business. The Hadoop developers anticipated this long ago and prepared a "Hello World" YARN program for us, so let's give it a run. The program is called Hadoop YARN Applications DistributedShell, and it spares us the awkwardness of being excited about a finished build without knowing what to do with it ^^. One note: HDFS was stopped earlier only to make the screenshots clearer. YARN itself does not depend on HDFS, but if a program uses HDFS, HDFS must be running. Execute the following commands in sequence:




Figure 14: Starting HDFS and running the YARN program
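A DistributedShell run is an ordinary `hadoop jar` invocation of the client class org.apache.hadoop.yarn.applications.distributedshell.Client. Its -shell_command and -num_containers options are presumably the two parameters the program is given here. The sketch only assembles the command line. The jar path is hypothetical, so fill in your version. Running `date` in 2 containers is an assumption inferred from the output shown later.

```python
from __future__ import annotations

import shlex

DSHELL_CLIENT = "org.apache.hadoop.yarn.applications.distributedshell.Client"


def distributed_shell_cmd(jar: str, shell_command: str, num_containers: int) -> list[str]:
    """Build the argv for a DistributedShell run."""
    if num_containers < 1:
        raise ValueError("num_containers must be at least 1")
    return [
        "hadoop", "jar", jar, DSHELL_CLIENT,
        "-jar", jar,                      # jar containing the ApplicationMaster
        "-shell_command", shell_command,  # command each container executes
        "-num_containers", str(num_containers),
    ]


if __name__ == "__main__":
    # Hypothetical jar path; substitute your Hadoop version for <version>.
    jar = ("/usr/local/svr/hadoop/share/hadoop/yarn/"
           "hadoop-yarn-applications-distributedshell-<version>.jar")
    print(shlex.join(distributed_shell_cmd(jar, "date", 2)))
```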


We can see how the program ran in the web UI:


Figure 15: The run as recorded in the web UI

Here is what this program does: once it runs, it executes the shell command it was given in each allocated container. Two parameters matter here: one specifies the command to run, and the other specifies how many containers to run it in. Let's follow this YARN program's run. In the web UI, click History -> Logs, and the logs of each container allocated to the application are displayed:



Figure 16: History



Figure 17: Logs

Figure 18: Container logs

The logs show that two containers were allocated, on nodes 02 and 04. Based on this, we log into those nodes directly to see what was executed:

Figure 19: Output of the date command

And that is the final result of running our program!
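To find a container's output by hand, it helps to know how the IDs relate. A container ID embeds its application ID, and on the node the NodeManager writes each container's stdout and stderr under `<log-dir>/<application id>/<container id>/`. The sketch below derives both from a container ID. The log root is an assumption: Hadoop 2.x defaults yarn.nodemanager.log-dirs to ${yarn.log.dir}/userlogs. The sample ID in the usage is made up.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
_CONTAINER_RE = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_(\d+)_(\d+)$")


def application_id(container_id: str) -> str:
    """Derive the application ID embedded in a container ID."""
    m = _CONTAINER_RE.match(container_id)
    if not m:
        raise ValueError(f"not a container id: {container_id!r}")
    cluster_ts, app_seq = m.group(1), m.group(2)
    return f"application_{cluster_ts}_{app_seq}"


def container_log_dir(log_root: str, container_id: str) -> PurePosixPath:
    """Local directory holding a container's stdout/stderr on its node."""
    return PurePosixPath(log_root) / application_id(container_id) / container_id


if __name__ == "__main__":
    # Made-up ID and assumed log root, for illustration only.
    print(container_log_dir("/usr/local/svr/hadoop/logs/userlogs",
                            "container_1410901177871_0001_01_000002"))
```

If log aggregation (yarn.log-aggregation-enable) is turned on, the local copies are removed once the application finishes, and `yarn logs -applicationId <id>` is the way to read them instead.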

In the next article, we'll dig into how this YARN program works.
