Hadoop

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment, typically in a Big Data context. At it's core, Apache Hadoop is
open-source software framework for distributed storage (Hadoop Distributed File System or HDFS) and distributed processing (MapReduce). It was launched in 2011 as open-source software under the Apache Software Foundation.

Hadoop is a buzzword and thus often revers to the ecosystem around big data technologies. As such, it can refer not only to Apache Hadoop (which includes Hadoop HDFS, Hadoop, MapReduce, Hadoop Yarn and Hadoop Common) but also Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper, Cloudera Impala, Apache Flume, Apache Sqoop, Apache Oozie and Apache Storm. In a management context, Hadoop often revers to the general ability to store and use big datasets.

Important to know is that Apacha Hadoop is used through the use of shell scripts (and thus not a visual or interactive software). Over time, main vendors such as IBM, Microsoft, Google, SAP, HortonWorks have simplified the approach and commands towards Hadoop or even abstracted it into a managed solution readily available to write and read data (e.g. Azure HDInsights).

Published on