… datasets stored in other locations and training biomedical scientists in big data techniques. In October last year it announced grants of nearly $32 million for fiscal 2014 to create 11 centers of excellence for big data computing, a consortium to develop a data discovery index and measures to boost data science training and workforce development. NIH hopes to invest a total of $656 million in these projects through 2020.

While physical infrastructure for computational biomedical research has been growing for many years, the NIH said, as data gets bigger and more widely distributed, “an appropriate virtual infrastructure becomes vital.”

FUNDAMENTAL CHALLENGES

There are significant challenges to applying big data to health care, especially with so many legacy datasets to be integrated and shared. Even the use of the term big data can cause confusion.

“Within agencies there are different definitions and types of big data,” said Tim Hayes, senior director for customer health solutions at Creative Computing Solutions, Inc., and a former HHS employee who worked on data analytics there. “You need to be sure, when mapping data from one database to another, that you can match the various labels that are used. Two different agencies might use the term ‘research,’ for example, but they may not be compatible.”

There are “very arcane differences” between what you would assume are fundamental and consistent definitions that turn out not to be consistent at all, Sivak agreed. “It’s a big problem for sure.”
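The mapping problem Hayes describes is essentially a vocabulary-alignment task: before two agencies' records can be merged, one agency's labels have to be translated into the other's. The sketch below is a minimal, hypothetical illustration of that idea in Python; the field names, code tables and the normalize_record helper are invented for demonstration and are not drawn from any HHS system.

```python
# Hypothetical crosswalk: maps Agency A's field names and category codes
# onto a shared target vocabulary before records are merged.
FIELD_MAP = {
    "proj_type": "activity_category",   # Agency A column -> shared column
    "inst_name": "organization",
}

VALUE_MAP = {
    # The same word can mean different things in different agencies:
    # here Agency A's "research" is narrowed to "basic_research".
    "activity_category": {"research": "basic_research", "clinical": "clinical_trial"},
}


def normalize_record(record: dict) -> dict:
    """Translate one Agency A record into the shared schema.

    Unmapped field names or values are kept but flagged, so a reviewer can
    extend the crosswalk instead of silently merging incompatible labels.
    """
    normalized, flags = {}, []
    for field, value in record.items():
        target = FIELD_MAP.get(field, field)
        if field not in FIELD_MAP:
            flags.append(f"unmapped field: {field}")
        mapped = VALUE_MAP.get(target, {}).get(value, value)
        if target in VALUE_MAP and value not in VALUE_MAP[target]:
            flags.append(f"unmapped value: {target}={value}")
        normalized[target] = mapped
    normalized["_crosswalk_flags"] = flags
    return normalized


print(normalize_record({"proj_type": "research", "inst_name": "Example University"}))
```

The point of the flags is the one Hayes raises: a term that looks shared, like “research,” only becomes comparable once someone has made the mapping explicit.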
Another barrier is the lack of data scientists capable of working with and understanding the needs of data analytics programs. The solution starts with recognizing that such people are not IT workers, but occupy a niche all their own. “A lot of what they do is not working with technology, but is in understanding …”

How a computing powerhouse delivers health care insights

Health datasets come in many orders of magnitude, but few are as large as the public health big data being gathered and analyzed by computers at the Energy Department’s Oak Ridge National Lab. About four years ago, ORNL decided to amass as much public health care data as it could and subject it to the analytics engines of its most powerful computers.

“We were in a unique position with our leadership computing resources and data science expertise, and we saw an opportunity to use health data to discover data-driven insights for better health care quality, integrity and policy,” said Sreenivas Sukumar, a researcher in ORNL’s computational sciences division.

To analyze the datasets, researchers used the lab’s multicore Titan, the second-most-powerful computer in the world; Apollo, an in-memory Urika graph computer built by Yarcdata; and distributed cloud computing-based machines. The lab also tapped some of the biggest producers of health-related data, including the Cancer Genome Atlas, clinicaltrials.gov, Semantic MEDLINE, openFDA, DocGraph and the National Plan and Provider Enumeration System.

In working with the data, the researchers initially encountered computing silos created by existing information architectures that did not scale to the analytics requirements of the large datasets. Consequently, the lab turned to graph computing, a scalable approach capable of uncovering relationships hidden in the data.

Graph computing almost immediately provided insights into some of the datasets, including feedback on understanding fraud, waste and abuse within the federal health care system, according to ORNL researchers. In one case, the lab was able to identify a health care provider using multiple identities to bill patients. Another case showed guilt-by-association patterns that highlighted the potential for fraud before the provider began billing. Georgia Tourassi, director of …

Vs of big data

According to the National Institute of Standards and Technology, big data consists of extensive datasets that require a scalable architecture for efficient storage, manipulation and analysis. Commonly known as the “V’s” of big data, the characteristics of data that force new architectures include:

VOLUME – the size of the dataset at rest, referring to both the data object size and the number of data objects. Although big data doesn’t refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data.

VELOCITY – the data in motion, or rate of flow, referring to both the acquisition rate and the update rate from real-time sensors, streaming video or financial systems.

VARIETY – data at rest from multiple repositories, domains or types (from unstructured text or images to highly structured databases).

VARIABILITY – the rate of change of the data from applications that generate a surge in the amount of data arriving in a given amount of time.

VERACITY – the completeness and accuracy of the data sources, the provenance of the data, its integrity and its governance.
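The multiple-identity and guilt-by-association cases described in the ORNL sidebar are the kinds of patterns a graph query surfaces naturally: provider identities and the attributes they share (addresses, bank accounts) become nodes, and shared attributes become links. The sketch below is a simplified, hypothetical illustration using the open-source NetworkX library rather than ORNL’s Urika workflow; the provider names, attributes and the two screening rules are assumptions made for demonstration.

```python
# Hypothetical graph-based fraud screening with NetworkX.
import networkx as nx

G = nx.Graph()

# (identity, shared attribute) edges -- all values are made up.
edges = [
    ("provider_A", "addr:12 Main St"),
    ("provider_B", "addr:12 Main St"),   # same address as A
    ("provider_B", "bank:111-222"),
    ("provider_C", "bank:111-222"),      # same bank account as B
    ("provider_D", "addr:99 Oak Ave"),   # unrelated provider
]
G.add_edges_from(edges)

# 1. Multiple-identity check: identities linked through shared attributes
#    fall into the same connected component.
for component in nx.connected_components(G):
    identities = sorted(n for n in component if n.startswith("provider_"))
    if len(identities) > 1:
        print("possible shared identity:", identities)

# 2. Guilt-by-association check: identities within two hops of a known
#    fraudulent identity (i.e., sharing an attribute with it) are flagged
#    before they ever submit a bill.
known_fraud = "provider_A"
near = nx.single_source_shortest_path_length(G, known_fraud, cutoff=2)
flagged = [n for n, dist in near.items()
           if n.startswith("provider_") and 0 < dist <= 2]
print("flag for review:", sorted(flagged))
```

At ORNL’s scale such traversals run on dedicated graph hardware like the Urika machine rather than an in-memory Python graph, but the shape of the query is the same: follow shared attributes outward and see which records collapse together.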