Is Hadoop real-time?

Hadoop

Apache Hadoop is a framework from the Apache Software Foundation (ASF) for scalable, distributed software. Originally created by open source developer Doug Cutting, Hadoop became a top-level project of the ASF in January 2008, and Cutting was elected chairman of the ASF in September 2010.


Hadoop enables compute-intensive processing of very large data sets, up to the petabyte range. Written in Java, Hadoop implements the MapReduce algorithm introduced by Google, with configurable classes for the Map, Reduce and Combine phases. Hadoop is used on the web platforms of Facebook, IBM and Yahoo, among others. Companies that support Hadoop include EMC, Microsoft, SAP, and Teradata.
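The interplay of the Map, Combine and Reduce phases can be illustrated with a word count. The following is a minimal in-memory sketch in Python, not Hadoop's actual Java API: the function names (map_phase, combine_phase, shuffle, reduce_phase, run_job) and the sample input are assumptions made for illustration, and the grouping step stands in for the shuffle that the framework would perform between mappers and reducers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine_phase(pairs):
    """Combine: pre-aggregate the pairs produced for a single input split."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(pairs):
    """Group all values by key (hypothetical stand-in for the framework's shuffle)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a final count."""
    return key, sum(values)


def run_job(splits):
    """Run map and combine per input split, then shuffle and reduce globally."""
    combined = []
    for split in splits:
        mapped = chain.from_iterable(map_phase(line) for line in split)
        combined.extend(combine_phase(mapped))
    return dict(reduce_phase(k, v) for k, v in shuffle(combined).items())


if __name__ == "__main__":
    # Two input splits, as if they were processed on two different nodes.
    splits = [
        ["big data big cluster"],
        ["data node data block"],
    ]
    print(run_job(splits))
```

The combine step is optional in this sketch: it only reduces the number of pairs that have to be moved before the final reduce, which is why it works for an associative operation such as summing counts.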

The Hadoop Distributed File System (HDFS) is a distributed file system with high tolerance of hardware failures. It can store several hundred million files across multiple storage components and servers. It divides files into data blocks of fixed length and distributes them across the connected hardware. A master node (the NameNode) handles incoming requests, organizes the placement of files on the worker nodes (DataNodes), and stores the resulting metadata.
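The block-splitting idea described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the HDFS implementation: the class ToyNameNode, the 4-byte block size, the node names and the round-robin placement are all hypothetical, and the replication of blocks that real HDFS uses for fault tolerance is left out.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny for illustration (assumed value)


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-length blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyNameNode:
    """Hypothetical master node: places blocks and remembers where they went."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # Stand-in for the storage of the worker nodes: node -> {block_id: bytes}
        self.storage = {name: {} for name in self.node_names}
        # Metadata kept by the master: file name -> [(block_id, node_name), ...]
        self.metadata = {}

    def put(self, file_name, data):
        """Split a file into blocks and distribute them round-robin (assumed policy)."""
        placements = []
        for index, block in enumerate(split_into_blocks(data)):
            block_id = f"{file_name}#{index}"
            node = self.node_names[index % len(self.node_names)]
            self.storage[node][block_id] = block
            placements.append((block_id, node))
        self.metadata[file_name] = placements

    def get(self, file_name):
        """Reassemble a file by looking up each block in the metadata."""
        return b"".join(
            self.storage[node][block_id]
            for block_id, node in self.metadata[file_name]
        )


if __name__ == "__main__":
    name_node = ToyNameNode(["node-a", "node-b"])
    name_node.put("log.txt", b"0123456789")
    print(name_node.metadata["log.txt"])
    print(name_node.get("log.txt"))
```

The point of the sketch is the division of labor: the file contents live only on the storage nodes, while the master holds just the mapping needed to find them again.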

Hadoop has several extensions. HBase is a free implementation of Google BigTable for managing billions of rows within a Hadoop cluster. Hive adds data warehousing to Hadoop; its query language, HiveQL, has a SQL-based syntax, and Hive was originally developed by Facebook. Further extensions are Pig, for developing MapReduce programs in the Pig Latin language; ZooKeeper, for the configuration of distributed systems; and Chukwa, for monitoring distributed systems in real time.

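With Apache Spark, the Apache Software Foundation also hosts a newer open source framework that is considerably faster than Hadoop MapReduce for many workloads, largely because it keeps intermediate data in memory.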