Hadoop Tutorial: Big Data & Hadoop – Restaurant Analogy
Let us take the analogy of a restaurant to understand the problems associated with Big Data and how Hadoop solved them.
Bob is a businessman who has opened a small restaurant. Initially, he used to receive two orders per hour, and the one chef and one food shelf in his restaurant were sufficient to handle all the orders.
Now let us compare the restaurant example with the traditional scenario, where data was generated at a steady rate and traditional systems like an RDBMS were capable enough to handle it, just like Bob's chef. Here, you can relate the data storage to the restaurant's food shelf and the traditional processing unit to the chef, as shown in the figure above.
After a few months, Bob thought of expanding his business, so he started taking online orders and added a few more cuisines to the restaurant's menu in order to reach a larger audience. Because of this transition, the rate at which orders arrived rose to an alarming figure of 10 orders per hour, and it became quite difficult for a single cook to cope with the new situation. Aware of the growing backlog of orders, Bob started thinking about a solution.
Similarly, in the Big Data scenario, data started being generated at an alarming rate because of the introduction of various data growth drivers such as social media, smartphones, etc.
Now, the traditional system, just like the cook in Bob's restaurant, was no longer efficient enough to handle this sudden change. Thus, there was a need for a different kind of solution to cope with this problem.
After a lot of research, Bob came up with a solution: he hired four more chefs to tackle the huge rate of incoming orders. Everything was going quite well, but this solution led to one more problem. Since all the chefs were sharing the same food shelf, the food shelf itself became the bottleneck of the whole process. Hence, the solution was not as efficient as Bob had thought.
Similarly, to tackle the problem of processing huge data sets, multiple processing units were installed to process the data in parallel (just as Bob hired four more chefs). But even in this case, bringing in multiple processing units was not an effective solution, because the centralized storage unit became the bottleneck.
In other words, the performance of the whole system is driven by the performance of the central storage unit. Therefore, the moment the central storage goes down, the whole system is compromised. Hence, again there was a need to resolve this single point of failure.
Bob came up with another efficient solution: he divided the chefs into two hierarchies, junior chefs and a head chef, and assigned each junior chef his own food shelf. Let us assume that the dish is meat sauce. Now, according to Bob's plan, one junior chef will prepare the meat and the other junior chef will prepare the sauce. They will then hand over both the meat and the sauce to the head chef, who will combine the two ingredients to prepare the meat sauce, which is then delivered as the final order.
Hadoop functions in a similar fashion to Bob's restaurant. Just as the food shelves are distributed in Bob's restaurant, in Hadoop the data is stored in a distributed fashion, with replication to provide fault tolerance. For parallel processing, the data is first processed by the slave nodes, where it is stored, to produce intermediate results, and those intermediate results are then merged by the master node to produce the final result.
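To make this junior-chef/head-chef flow concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API, which implements exactly this pattern. The class and method names that are not part of the Hadoop API itself (WordCount, TokenMapper, SumReducer) are illustrative choices, not anything prescribed by Hadoop:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Mapper: like a junior chef, it runs on the node that stores its
    // block of data and emits intermediate (word, 1) pairs.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE); // an intermediate result
                }
            }
        }
    }

    // Reducer: like the head chef, it merges the intermediate results
    // for each word into the final count.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }
}
```

The key design point mirrored from the analogy is data locality: each mapper works only on the data block stored on its own node, so no central shelf is ever the bottleneck.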
By now, you should have an idea of why Big Data is a problem statement and how Hadoop solves it.
As we just discussed above, there were three major challenges with Big Data:
The first problem is storing the colossal amount of data
Storing this huge data in a traditional system is not possible. The reason is obvious: the storage is limited to one system, while the data is increasing at a tremendous rate.
The second problem is storing heterogeneous data
Now we know that storage is a problem, but let me tell you, it is just one part of the problem. The data is not only huge, but it is also present in various formats, i.e. unstructured, semi-structured, and structured. So, you need to make sure that you have a system to store these different types of data, generated from various sources.
Finally, let's focus on the third problem, which is processing speed
The time taken to process this huge amount of data is quite high, as the data to be processed is simply too large.
To solve the storage issue and the processing issue, two core components were created in Hadoop: HDFS and YARN. HDFS solves the storage issue, as it stores the data in a distributed fashion and is easily scalable. YARN solves the processing issue by reducing the processing time drastically.
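As a rough illustration of how a client hands data to HDFS, here is a minimal sketch using Hadoop's Java FileSystem API. The NameNode address (hdfs://namenode-host:9000) and the file path are placeholders, and the replication factor of 3 is simply HDFS's common default, not a requirement:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points the client at the NameNode (placeholder host and port)
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");
        // Keep 3 copies of every block for fault tolerance (the usual default)
        conf.set("dfs.replication", "3");

        try (FileSystem fs = FileSystem.get(conf)) {
            // Illustrative path; HDFS splits the file into blocks and
            // spreads the replicas across DataNodes automatically.
            Path path = new Path("/data/orders.txt");
            try (FSDataOutputStream out = fs.create(path)) {
                out.writeUTF("one sample order record");
            }
        }
    }
}
```

Notice that the client never decides which machine holds the data; HDFS places the blocks and their replicas itself, which is what makes the storage both scalable and fault tolerant. Moving ahead, let us understand what Hadoop is.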