GCN : May 2014
HADOOP IS A disruptive force in the traditional data management space. However, there are both good and bad sides to the disruption, as well as some ugly marketing hype fueling it.

The good side of Hadoop's disruption is in the realm of big data. Hadoop is an open source, Java-based ecosystem of technologies that exploits many low-cost, commodity machines to process huge amounts of data in a reasonable time. The bottom line is that Hadoop works for big data, functions well at a low cost and is improving every day. A recent Forrester report called Hadoop's momentum "unstoppable." Currently there are hundreds, even thousands, of contributors to the Hadoop community, including dozens of large companies like Microsoft, IBM, Teradata, Intel and many others. Hadoop has proven a robust way to process big data, and its ecosystem of complementary technologies is growing every day.

But there is a bad side to Hadoop's disruption. First, its very success is causing many players to jump in, which increases the confusion and pace of change in the technology. The current state of Hadoop is in radical flux. Every part of the ecosystem is undergoing both rapid acceleration and experimentation. Furthermore, parts of the ecosystem are extremely immature. When I tech-edited the book "Professional Hadoop Solutions," I saw firsthand how some newer technologies like Oozie had schemas for configuration files that were very immature and will undergo significant change as they mature. Hadoop 2.0 only came out in 2013, with a new foundational layer called YARN. Now there is Apache Spark, a more general-purpose parallel computation approach that is faster than, and competes with, Hadoop MapReduce. It is not unrealistic to say that the technology is experiencing both extreme success and extreme churn simultaneously. Second, there is immaturity in terms of features, which increases the risk of adoption.
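For readers new to the model the column keeps referencing, the MapReduce paradigm that Hadoop popularized can be sketched in a few lines of plain Python, no cluster required. The function names here (`map_phase`, `shuffle`, `reduce_phase`) are illustrative labels for the three conceptual stages, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record.
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big cost", "big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

Hadoop's contribution is not this logic but running it fault-tolerantly across thousands of machines; Spark competes by keeping intermediate results in memory rather than writing them to disk between stages.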
As with any other emerging technology, it is best to keep away from the bleeding edge and stick to the more stable core components.

The ugly side of Hadoop's disruption is the technology overreach fueled by the marketing departments of numerous new entrants to the Hadoop/big data space. Hortonworks Inc., which focuses on the support of Hadoop and just received a $100 million investment, recently published a whitepaper titled "A Modern Data Architecture with Apache Hadoop: The Journey to a Data Lake." The paper makes the case for augmenting your current enterprise data warehouse and data management architecture with a Hadoop installation to create a "data lake." Of course, data lake is a newly minted term that basically promises a single place to store all your data, where it can be analyzed by numerous applications at any time. It's a play for "Hadoop Everywhere and Hadoop for ALL DATA." To say this is a bold statement by Hortonworks is being kind.

The vision of a data lake is not a bad vision -- a store-everything approach is worthwhile. However, it is wildly unrealistic to say that Hadoop can get you there today. Executing successfully on that vision is a minimum of five years out. On the positive side, let me add that I do believe Hadoop can achieve this vision if it continues on its current trajectory -- it is just not there today. For example, the Hadoop File System is geared toward extremely large files and copes poorly with the masses of small files a store-everything approach would bring. Additionally, Hadoop's analysis features are geared to processing homogeneous data like Web logs, sensor data and clickstream data, which is at odds with the vision of storing everything, including a wide variety of heterogeneous formats.
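The large-file bias follows from how HDFS does its bookkeeping: the NameNode keeps an in-memory record for every file and every block, so metadata cost scales with the number of objects, not the volume of data. A back-of-the-envelope sketch, using the commonly cited rough figure of 150 bytes per record (an assumption, not a measured value):

```python
BYTES_PER_OBJECT = 150  # rough NameNode cost per file/block record (assumption)

def namenode_metadata_bytes(num_files, blocks_per_file=1):
    # Each file costs one record, plus one record per block it occupies.
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# The same 1 TiB of data stored two ways:
small = namenode_metadata_bytes(2**30)  # 2^30 files of 1 KiB each
large = namenode_metadata_bytes(2**13)  # 2^13 files of 128 MiB (one block each)

print(small // 2**30)  # 300  -> ~300 GiB of NameNode heap just for metadata
print(large // 2**20)  # 2    -> ~2 MiB for the same data in large files
```

With identical data volume, the small-file layout needs roughly five orders of magnitude more NameNode memory, which is why a literal store-everything lake of small heterogeneous files strains HDFS as it stood in 2014.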
A reality check comparing Hadoop's current status for handling data management tasks (outside of its big data realm) to mature data management technologies like ETL and data warehouses can only conclude that hyperbole like the Hortonworks whitepaper is a classic case of technology overreach.

So, government IT managers should be wary of hyperbole and focus on known success areas to use the right tools for the right challenge. For right now, Hadoop successfully tackles big data. For any other use of Hadoop at this time, your mantra is caveat emptor. •

--- Michael C. Daconta is vice president of advanced technology at InCadence Strategic Solutions and the former metadata program manager for the Homeland Security Department.