Motivation
No lengthy commentary is needed to communicate the growing importance of big data technologies. Look no further than the rounds of funding [1][2][3] that companies like Cloudera, Hortonworks and MapR have attracted in recent months. It is widely expected that the market for Hadoop will likely grow to $20 Billion by 2018.
The key motivations for the growth of big data technologies includes:
- The growing need to process ever increasing volumes of data. This growth in data is not limited to web scale companies alone. Businesses of all sizes are seeing this growth.
- Not all data conforms to a well-defined structure/schema, so there is a need to supplement (if not replace) the traditional data processing and analysis tools such as EDWs.
- Ability to take advantage of deep compute analytics using massively parallel, commodity based clusters. We will see examples of deep compute analysis a little bit later but this is a growing area of deriving knowledge from the data.
- Overall simplicity (from the standpoint of the analyst/ developer authoring the query) that hides the non-trivial complexity of the underlying infrastructure.
- Price-performance benefit accorded due to the commodity based clusters and fault tolerance.
- The ability to tap into fast paced innovation taking place within the “Hadoop” ecosystem. Consider that Map Reduce, which has been the underpinning of Hadoop ecosystem for years, is being replaced by projects such as Yarn in recent months. Read More…