GCN : December 2013
increasingly support integration with Hadoop. Talend provides traditional ETL capabilities but also simplifies big data integration. The company's Open Studio for Big Data offers a unified open-source environment that simplifies the loading, extraction, transformation and processing of large and diverse data sets. Pentaho's enterprise Kettle ETL engine, called Pentaho Data Integration, consists of a core data integration engine and GUI applications that allow the user to define data integration jobs and transformations.

Universal information access is an emerging area of big data that combines elements of database and search technologies, giving users a single point of access to all data, regardless of source, format or location. UIA offers the reporting and visualization features commonly found in business intelligence applications.

Attivio's Active Intelligence Engine reportedly unifies disconnected systems, combining enterprise search, business intelligence and big data technologies. AIE ingests all types of structured and unstructured content and builds a schema-less index that can be accessed with a single query. Cambridge Semantics' Anzo Unstructured combines data from databases, spreadsheets and documents from any source across the enterprise and automatically discovers new relationships between data.

It has been said that a picture is worth a thousand words, but when it comes to data analytics, just giving users pretty graphs or charts is not enough, according to Francois Ajenstat, director of product management for Tableau Software. Ultimately, users need the data to answer questions and solve problems, he said. Tableau is a self-service business intelligence tool that lets people of any skill level create data visualizations, reports or dashboards from databases, spreadsheets and big data sources, according to company officials.
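The extract-transform-load pattern that tools such as Open Studio and Pentaho Data Integration automate can be reduced to three steps. A minimal sketch in plain Python, with invented column names and sample data standing in for a real source system:

```python
import csv
import io

# Extract: read raw records from a source. An in-memory CSV stands in
# for a database table or flat file; the columns are hypothetical.
raw = io.StringIO("agency,records\nNIH,1200\nNCI,800\nTPWD,450\n")
rows = list(csv.DictReader(raw))

# Transform: normalize types and filter rows, as an ETL job step would.
cleaned = [
    {"agency": r["agency"].strip(), "records": int(r["records"])}
    for r in rows
    if int(r["records"]) >= 500
]

# Load: write the transformed records into a target store
# (a plain dict here, standing in for a data warehouse table).
warehouse = {r["agency"]: r["records"] for r in cleaned}
print(warehouse)  # {'NIH': 1200, 'NCI': 800}
```

Commercial ETL tools wrap each of these steps in reusable, GUI-configurable components and add scheduling, logging and connectors, but the underlying flow is the same.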
The Florida Department of Juvenile Justice is using Tableau to present a clearer picture of children in the justice system and the effectiveness of the state's innovative reform efforts.

Sometimes a mix of data and geospatial analytics can help bring data to life. Using analytics and customer relationship management software, analysts at the Texas Parks and Wildlife Department can pinpoint trends in leisure activities, parks utilization and purchasing patterns, all the way down to individual neighborhoods, according to TPWD officials. Using Business Analyst, a part of Esri's ArcGIS geospatial data analysis tool, TPWD analysts mined Census and ZIP code information and used probability matching to get a better handle on their customers. Using geographic data and SAS Analytics, TPWD has even been able to stock fish in lakes closer to where anglers live and promote special hunts to hunters.

The amount of data will only increase with the expansion of the Internet of Things, data-driven scientific discovery and the explosive growth of video, and agencies are already struggling to keep up. Current successes come from adding more or faster hardware or using tools that scale to high-performance computing datasets. But the cost of processing and storage may become prohibitive.

One solution may come from the National Institutes of Health. The agency's Big Data to Knowledge initiative aims to advance the science and utility of big data in biomedical and behavioral research and to create innovative approaches, methods, software and tools for big data. Meanwhile, the National Cancer Institute this year set up pilot projects to test the feasibility of a "cancer knowledge cloud" that would combine storage repositories and computing power in the cloud.

But the real next big thing in big data could be an open-source cluster computing system called Spark. It speeds programming and can run up to 100 times faster than Hadoop MapReduce, according to its developers.
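The word count job that is the standard teaching example for Hadoop MapReduce illustrates the model both engines share: a map phase that emits key-value pairs, and a reduce phase that aggregates them by key. A rough plain-Python sketch of those two phases (no Hadoop or Spark installation assumed; the sample lines are invented):

```python
from collections import Counter

# Toy input standing in for lines of a distributed dataset
# (what Spark would hold as an RDD across the cluster).
lines = [
    "spark keeps working sets in memory",
    "hadoop mapreduce writes to disk",
    "spark can run jobs faster",
]

# Map phase: emit a (word, 1) pair for every word in every line.
pairs = [(word, 1) for line in lines for word in line.split()]

# Reduce phase: sum the counts per word, the role Spark's
# reduceByKey plays on a real cluster.
counts = Counter()
for word, n in pairs:
    counts[word] += n

print(counts["spark"])  # 2
```

On a cluster, the frameworks differ in where the intermediate pairs live: Hadoop MapReduce writes them to disk between phases, while Spark can keep them in memory, which is the source of much of its claimed speed advantage.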
Spark offers a general execution model that can optimize arbitrary operator graphs and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop. To make programming faster, Spark provides clean, concise APIs in Scala, Java and Python.

Spark was originally created at the University of California, Berkeley's AMPLab, and more than 25 companies have contributed code to Spark, making it the largest open-source big data development community, according to Silicon Angle. And it's moving into the mainstream. Recently, Cloudera announced direct support for Apache Spark, giving Cloudera users a way to perform rapid, resilient processing of in-memory datasets stored in Hadoop, as well as general data processing.

Although it is an exciting time for big data, agencies should be cautious about the tools they select, said NetApp's Wickizer. So many companies are popping up with big data offerings that agency managers have to consider whether they all will be around in five to 10 years.

"Agencies should always have a fallback plan, as well as a robust, underlying infrastructure that enables them to quickly checkpoint and restart when problems do arise," he said. "Agencies can also benefit from big data solutions that use an open ecosystem of partners that ensure a complete offering." •