Components of Hadoop


Introduction


Before Hadoop became widely adopted, storing and processing big data was a major challenge. Now that Hadoop is available, companies have recognized the business impact of Big Data and how understanding this data can drive growth. Big Data has many useful and insightful applications.
Hadoop is a direct answer to the problem of processing Big Data, and the Hadoop ecosystem is a combination of technologies well suited to solving business problems.
Let us look at the components of the Hadoop ecosystem.

Before that: in my previous blog post I shared my understanding of Hadoop. Don’t miss it if you have not read it yet: Click here to read it.

Core Hadoop:

HDFS:


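HDFS stands for Hadoop Distributed File System. It manages big data sets characterized by high Volume, Velocity, and Variety. It implements a master-slave architecture in which the master is the NameNode and the slaves are the DataNodes: the NameNode keeps the file system metadata, while the DataNodes store the actual data blocks. HDFS is the storage layer of Hadoop and is well known for Big Data storage.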
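To make the NameNode/DataNode split concrete, here is a minimal toy model in plain Python. It is only a sketch under simplifying assumptions, not how HDFS is implemented: the class names, the 4-byte block size, and the round-robin placement are all made up for illustration (real HDFS uses much larger blocks, 128 MB by default in recent versions, and rack-aware placement), and in real HDFS the client streams bytes to the DataNodes directly rather than through the NameNode.

```python
BLOCK_SIZE = 4    # toy value in bytes; real HDFS blocks are far larger
REPLICATION = 3   # number of copies kept of every block


class DataNode:
    """Slave: stores the raw bytes of blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Master: keeps only metadata (which blocks form a file, and where they live)."""

    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.files = {}      # path -> ordered list of block ids
        self.locations = {}  # block_id -> names of DataNodes holding a replica
        self._next = 0       # round-robin cursor (assumption; HDFS is rack-aware)

    def _pick_nodes(self):
        count = min(REPLICATION, len(self.datanodes))
        picked = [self.datanodes[(self._next + i) % len(self.datanodes)]
                  for i in range(count)]
        self._next = (self._next + 1) % len(self.datanodes)
        return picked

    def write(self, path, data):
        """Split data into blocks and store each block on several DataNodes."""
        block_ids = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = f"{path}#{offset // BLOCK_SIZE}"
            targets = self._pick_nodes()
            for node in targets:
                node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            self.locations[block_id] = [node.name for node in targets]
            block_ids.append(block_id)
        self.files[path] = block_ids

    def read(self, path):
        """Reassemble a file by fetching each block from its first replica."""
        by_name = {node.name: node for node in self.datanodes}
        return b"".join(by_name[self.locations[b][0]].blocks[b]
                        for b in self.files[path])


cluster = NameNode([DataNode(f"dn{i}") for i in range(4)])
cluster.write("/logs/a.txt", b"hello hadoop")
print(cluster.locations)           # which DataNodes hold each block
print(cluster.read("/logs/a.txt"))
```

The point of the sketch is the division of labour: the master answers "where is the data?", and the slaves answer "what is the data?".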

MapReduce:

MapReduce is a programming model designed to process high volumes of distributed data. Hadoop's MapReduce framework is written in Java. In Hadoop 1 (MRv1), MapReduce runs as two daemons: the JobTracker and the TaskTracker.
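The programming model itself is easiest to see with the classic word count. The sketch below runs on a single machine in plain Python and does not use the Hadoop API; the function names are my own, chosen to mirror the map, shuffle, and reduce steps that Hadoop distributes across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values)
                for key, values in sorted(grouped.items()))


print(word_count(["big data", "Big Hadoop data"]))
```

In a real job, many mappers and reducers run in parallel on different nodes, and the framework takes care of the shuffle between them.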

YARN:

YARN stands for Yet Another Resource Negotiator and is also called MapReduce 2 (MRv2). It splits the two major responsibilities of the MRv1 JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager and a per-application ApplicationMaster, with a NodeManager running on each worker node.
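The division of work between the three daemons can be pictured with a small toy model. This is a sketch under heavy assumptions, not the YARN API: the classes below are hypothetical, memory is the only resource, and the "first node that fits" rule stands in for YARN's real pluggable schedulers.

```python
class NodeManager:
    """Per-node agent: here it only tracks how much memory is still free."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb


class ResourceManager:
    """Cluster-wide resource management: hands out containers."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb):
        """Return the name of the node hosting a new container, or None."""
        for node in self.nodes:  # first fit (assumption, not YARN's policy)
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                return node.name
        return None


class ApplicationMaster:
    """Per-application scheduling: asks for one container per task."""

    def __init__(self, resource_manager, tasks, memory_per_task_mb):
        self.rm = resource_manager
        self.tasks = tasks
        self.memory_per_task_mb = memory_per_task_mb

    def run(self):
        placed, waiting = {}, []
        for task in self.tasks:
            node = self.rm.allocate(self.memory_per_task_mb)
            if node is None:
                waiting.append(task)  # no capacity right now
            else:
                placed[task] = node
        return placed, waiting


rm = ResourceManager([NodeManager("nm1", 2048), NodeManager("nm2", 1024)])
am = ApplicationMaster(rm, ["map-0", "map-1", "map-2", "reduce-0"], 1024)
print(am.run())
```

Because each application brings its own ApplicationMaster, the ResourceManager no longer has to track every task of every job, which is the bottleneck the MRv1 JobTracker had.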

Pig:

Apache Pig is a high-level platform built on top of MapReduce for analyzing large datasets with simple ad-hoc data analysis programs. Its language, Pig Latin, is known as a data flow language. Pig scripts are converted internally into MapReduce programs.
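"Data flow" means a script reads as a sequence of named steps, each one transforming the output of the previous step. The plain-Python sketch below only mimics that style; it is not Pig Latin, and the sample records and function names are invented for illustration. The comments name the Pig Latin operators (LOAD, FILTER, GROUP, FOREACH ... GENERATE) that each step loosely corresponds to.

```python
from collections import defaultdict

# Hypothetical input records: (user, page, seconds_on_page).
# In Pig, a LOAD statement would read these from storage.
visits = [
    ("alice", "/home", 30),
    ("bob", "/home", 5),
    ("carol", "/docs", 45),
    ("dave", "/home", 12),
]


def filter_step(rows, min_seconds):
    """Like FILTER: keep only visits that lasted long enough."""
    return [row for row in rows if row[2] >= min_seconds]


def group_step(rows):
    """Like GROUP ... BY page: collect the rows of each page together."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return groups


def count_step(groups):
    """Like FOREACH ... GENERATE with COUNT: one number per group."""
    return {page: len(rows) for page, rows in groups.items()}


long_visits = filter_step(visits, min_seconds=10)
by_page = group_step(long_visits)
visits_per_page = count_step(by_page)
print(visits_per_page)
```

Pig compiles a chain like this into one or more MapReduce jobs, so the author of the script does not have to write mappers and reducers by hand.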

Hive:

Apache Hive is a data warehouse infrastructure built on top of Hadoop. It offers a high-level, SQL-like query language (HiveQL) and provides data summarization, query, and analysis.

Mahout:

Apache Mahout is a scalable machine learning library designed for building predictive analytics on Big Data.

Apache Sqoop:

Sqoop is a tool designed for bulk data transfers between relational databases and Hadoop.
What does it do? It imports data into HDFS and exports data from HDFS. It imports data into Hive and exports data from Hive. It imports data into HBase.
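As a rough illustration of those transfer directions, the sketch below builds Sqoop command lines in Python without running them. The flags used (`--connect`, `--table`, `--target-dir`, `--hive-import`, `--hive-table`, `--hbase-table`, `--column-family`, `--export-dir`) come from the Sqoop 1 command-line tool; the helper functions, JDBC URL, table names, and paths are hypothetical, and a real command would also need credentials.

```python
def sqoop_import_args(jdbc_url, table, destination, name, column_family="cf"):
    """Build the argument list for a `sqoop import` into HDFS, Hive, or HBase."""
    args = ["sqoop", "import", "--connect", jdbc_url, "--table", table]
    if destination == "hdfs":
        args += ["--target-dir", name]                 # name = HDFS directory
    elif destination == "hive":
        args += ["--hive-import", "--hive-table", name]
    elif destination == "hbase":
        args += ["--hbase-table", name, "--column-family", column_family]
    else:
        raise ValueError(f"unknown destination: {destination}")
    return args


def sqoop_export_args(jdbc_url, table, export_dir):
    """Build the argument list for a `sqoop export` from an HDFS directory."""
    return ["sqoop", "export", "--connect", jdbc_url,
            "--table", table, "--export-dir", export_dir]


# Hypothetical database and paths, for illustration only.
url = "jdbc:mysql://db.example.com/sales"
print(" ".join(sqoop_import_args(url, "orders", "hdfs", "/data/orders")))
print(" ".join(sqoop_import_args(url, "orders", "hive", "orders")))
print(" ".join(sqoop_export_args(url, "order_totals", "/data/order_totals")))
```

Each list could be passed to `subprocess.run` on a machine where Sqoop is installed.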

Do you know why you might prefer Spark over MapReduce? Know more

About the Author


Priyabrat Bishwal is a Data Engineer at Societe Generale Global Solution Centre. He is a big data enthusiast, passionate about data science and machine learning. He is currently pursuing an M.Tech programme in Data Science at BITS Pilani. He likes writing blogs and is always eager to help students from a science background. You can reach out to Priyabrat at [email protected]. For more detail, follow him on his LinkedIn page.