GCN, December 2013
Big data. It's massive. It comes in all types of formats. It's dynamic, ever changing. Government managers are looking to derive value from the mountains of data collected by their agencies to tackle a whole host of issues, including cybersecurity, fraud detection, crime prevention, medical research, weather modeling, intellectual property protection, operational efficiency and situational awareness.

A growing challenge is choosing the right technology to aid in collecting, processing, analyzing and storing massive amounts of data, especially since the pool of big data tools keeps expanding. Unfortunately, there are no "must have" tool sets, since an initial big data deployment will be driven by an individual agency's business requirements, according to the TechAmerica Foundation's report, "Demystifying Big Data."

Tools that ingest and extract data, index it, translate it and then clean it up for analysis and presentation are all part of the ecosystem, said Barbara Toohill, vice president and director of Mitre's Homeland Security Systems Engineering and Development Institute. Agency managers need to understand the problem they are trying to solve, and then determine what data is needed to solve it, before investing heavily in tools, Toohill advised an audience of government and industry representatives at a recent FCW Executive Briefing on Big Data in Washington, D.C. "One of the challenges is that tools sound great in PowerPoint presentations, but are much more challenging when people start using them," she said.

Still, there are core technologies, available in both open-source and proprietary solutions, that can support any agency's big data portfolio.

1. A SOUND IT FOUNDATION

If there is a prerequisite for the successful implementation of big data analytics, it is an IT infrastructure with the necessary bandwidth and storage.

The National Oceanic and Atmospheric Administration, for example, built a high-speed gigabit network to give researchers secure access to large volumes of complex, high-resolution climate and weather images. N-Wave, a 10-gigabit Ethernet wide-area network, connects NOAA's high-performance computing sites, data archives and researchers, letting scientists collaborate and transfer data without network constraints. Before N-Wave, NOAA scientists often had to ship hard drives to one another to share data. NOAA now has the ability to scale to 100-gigabit Ethernet and beyond as research demands increase and next-generation services are added to the network. N-Wave relies on Cisco's Carrier Routing System, a self-healing mesh network with redundant routers that can reroute traffic if a communications link goes down.

For agencies that don't want to invest in hardware and other systems for large data workloads, the cloud is an option, said Mark Ryland, chief solutions architect for Amazon Web Services. In fact, the cloud is better suited to the dynamic aspects of big data, he said. The Securities and Exchange Commission's Market Information Data Analytics System (MIDAS) runs in the AWS cloud. MIDAS, an internal system that gives the SEC information about security orders, scoops up 1 billion records every day, time-stamped to the microsecond, including tapes and proprietary feeds from each stock exchange, all posted orders and quotes, and trades both on- and off-exchange, according to the SEC.

With MIDAS in the cloud, "there is no hardware to support, no software upgrades to maintain, no data feeds to handle, and hence no SEC resources are required for these tasks," Gregg Berman, the SEC's associate director in the Office of Analytics and Research, Division of Trading, said in an address to the SIFMA Tech conference. "If we need to perform a very large analysis, we employ multiple servers and invoke parallel jobs. Access to processing power is just not an issue, at least not at present," he said.

In the emerging big data ecosystem, storage providers offer the enterprise-ready infrastructure on which all the analytic tools run, said Dale Wickizer, chief technology officer of NetApp U.S. Public Sector. NetApp provides storage for the Energy Department's Sequoia supercomputer, which generates more than 50 petabytes of data.

2. TOOLS FOR DATA INGESTION

More than simply gathering and processing data for later use or storage, data ingestion often involves altering individual files, editing their content or reformatting them to fit into a larger document, so that they can be quickly accessed. The complexity increases when dealing with streaming data, multiple data sources and formats, and millions of records.

Apache Flume is a distributed system for collecting, aggregating and moving log data from multiple sources and writing it to a centralized data store such as the Hadoop Distributed File System (HDFS). According to a Dr. Dobb's article, Flume is becoming a de facto standard for directing data streams into Hadoop because it is robust and easy to configure.
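Flume pipelines are normally wired together in an agent's configuration file, but the project also ships a client SDK for pushing events into an agent from application code. What follows is a minimal Java sketch against Flume's RPC client API; the hostname, port and log line are placeholder assumptions, and the agent is assumed to expose an Avro source at that address.

    import java.nio.charset.StandardCharsets;

    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class FlumeClientSketch {
        public static void main(String[] args) throws EventDeliveryException {
            // Placeholder host and port: the Flume agent must be configured
            // with an Avro source listening at this address.
            RpcClient client =
                RpcClientFactory.getDefaultInstance("flume-agent.example.gov", 41414);
            try {
                // Wrap one log line in a Flume event and hand it to the agent,
                // which buffers it in a channel and drains it to its sink.
                Event event = EventBuilder.withBody("sample log line",
                        StandardCharsets.UTF_8);
                client.append(event);
            } finally {
                client.close();
            }
        }
    }

Where the event ultimately lands is decided by the agent's own configuration; pointing the agent's sink at HDFS yields the Flume-to-Hadoop flow described above.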
Apache Sqoop is designed to transfer data between Hadoop and relational databases. Sqoop automates the import and export of data from relational databases, enterprise data warehouses and NoSQL systems, according to the Apache Software Foundation. It uses MapReduce to import and export data, which provides parallel operation as well as fault tolerance.
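Sqoop jobs are launched from the command line rather than written by hand, but the MapReduce model that gives them their parallelism and fault tolerance is easy to see in miniature. The sketch below is the canonical word-count job written against the Hadoop MapReduce API; it illustrates the model, not Sqoop's internals, and the HDFS input and output paths are supplied by the caller as arguments.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map phase: emit (word, 1) for every token in the input split.
        // Hadoop runs many mapper instances in parallel, one per split.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reduce phase: sum the counts emitted for each distinct word.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // input in HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // output in HDFS
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Hadoop splits the input across many mapper tasks that run in parallel, and a task that fails is simply rerun on another node; that is the parallel operation and fault tolerance Sqoop inherits when it moves data in and out of Hadoop.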