Hadoop Tutorial: Big Data & Hadoop – Restaurant Analogy
Let us take the analogy of a restaurant to understand the problems associated with Big Data and how Hadoop solved them.
Bob is a businessman who has opened a small restaurant. Initially, he used to receive two orders per hour, and the one chef and one food shelf in his restaurant were sufficient to handle all the orders.
Now let us compare the restaurant example with the traditional scenario, where data was generated at a steady rate and traditional systems like an RDBMS were capable enough to handle it, just like Bob's chef. Here, you can relate the data storage to the restaurant's food shelf and the traditional processing unit to the chef, as shown in the figure above.
After a few months, Bob thought of expanding his business, so he started taking online orders and added a few more cuisines to the restaurant's menu in order to reach a larger audience. Because of this transition, the rate at which orders arrived rose to an alarming figure of 10 orders per hour, and it became quite difficult for a single cook to cope with the new situation. Aware of the growing backlog of orders, Bob started thinking about a solution.
Similarly, in the Big Data scenario, data started being generated at an alarming rate because of the introduction of various data growth drivers such as social media, smartphones, etc.
Now, the traditional system, just like the cook in Bob's restaurant, was no longer efficient enough to handle this sudden change. Thus, there was a need for a different kind of solution to cope with this problem.
After a lot of research, Bob came up with a solution: he hired four more chefs to tackle the huge rate of incoming orders. Everything was going quite well, but this solution led to one more problem. Since all the chefs were sharing the same food shelf, the food shelf itself became the bottleneck of the whole process. Hence, the solution was not as efficient as Bob had thought.
Similarly, to tackle the problem of processing huge data sets, multiple processing units were installed to process the data in parallel (just as Bob hired four more chefs). But even in this case, bringing in multiple processing units was not an effective solution, because the centralized storage unit became the bottleneck.
In other words, the performance of the whole system is driven by the performance of the central storage unit. Therefore, the moment the central storage goes down, the whole system is compromised. Hence, again there was a need to resolve this single point of failure.
Bob came up with another efficient solution: he divided the chefs into two hierarchies, junior chefs and a head chef, and assigned each junior chef his own food shelf. Let us assume that the dish is meat sauce. Now, according to Bob's plan, one junior chef will prepare the meat and the other junior chef will prepare the sauce. They will then hand over both the meat and the sauce to the head chef, who will combine the two ingredients to prepare the meat sauce, which is then delivered as the final order.
Hadoop functions in a similar fashion to Bob's restaurant. Just as the food shelves are distributed in Bob's restaurant, in Hadoop the data is stored in a distributed fashion, with replication to provide fault tolerance. For parallel processing, the data is first processed by the slave nodes, where it is stored, to produce intermediate results, and those intermediate results are then merged by the master node to produce the final result.
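To make this junior-chef/head-chef flow concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API, which implements exactly this pattern. The class and method names that are not part of the Hadoop API itself (WordCount, TokenMapper, SumReducer) are illustrative choices, not anything prescribed by Hadoop:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Mapper: like a junior chef, it runs on the node that stores its
    // block of data and emits intermediate (word, 1) pairs.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE); // an intermediate result
                }
            }
        }
    }

    // Reducer: like the head chef, it merges the intermediate results
    // for each word into the final count.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }
}
```

The key design point mirrored from the analogy is data locality: each mapper works only on the data block stored on its own node, so no central shelf is ever the bottleneck.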
By now, you should have an idea of why Big Data is a problem statement and how Hadoop solves it.
As we just discussed above, there were three major challenges with Big Data:
The first problem is storing the colossal amount of data
Storing this huge data in a traditional system is not possible. The reason is obvious: the storage is limited to one system, while the data is increasing at a tremendous rate.
The second problem is storing heterogeneous data
Now we know that storage is a problem, but let me tell you, it is just one part of the problem. The data is not only huge, but it is also present in various formats, i.e. unstructured, semi-structured, and structured. So, you need to make sure that you have a system to store these different types of data, generated from various sources.
Finally, let's focus on the third problem, which is processing speed
The time taken to process this huge amount of data is quite high, as the data to be processed is simply too large.
To solve the storage issue and the processing issue, two core components were created in Hadoop: HDFS and YARN. HDFS solves the storage issue, as it stores the data in a distributed fashion and is easily scalable. YARN solves the processing issue by reducing the processing time drastically.
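As a rough illustration of how a client hands data to HDFS, here is a minimal sketch using Hadoop's Java FileSystem API. The NameNode address (hdfs://namenode-host:9000) and the file path are placeholders, and the replication factor of 3 is simply HDFS's common default, not a requirement:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points the client at the NameNode (placeholder host and port)
        conf.set("fs.defaultFS", "hdfs://namenode-host:9000");
        // Keep 3 copies of every block for fault tolerance (the usual default)
        conf.set("dfs.replication", "3");

        try (FileSystem fs = FileSystem.get(conf)) {
            // Illustrative path; HDFS splits the file into blocks and
            // spreads the replicas across DataNodes automatically.
            Path path = new Path("/data/orders.txt");
            try (FSDataOutputStream out = fs.create(path)) {
                out.writeUTF("one sample order record");
            }
        }
    }
}
```

Notice that the client never decides which machine holds the data; HDFS places the blocks and their replicas itself, which is what makes the storage both scalable and fault tolerant. Moving ahead, let us understand what Hadoop is.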