GCN: November 2012
… it be in the headlines certainly wouldn't help the government's position with the public."

FILTERING AND SORTING

While the processing power and fast storage retrieval that have driven other advances in big data management were required for text analytics to take off, the analytic tools themselves are strictly software. And what separates the products -- whether they are major offerings from IBM or SAS, or any of the dozens of applications tailored to serve narrower markets -- is how the algorithms are tuned to filter, sort and analyze massive amounts of unstructured text. The devil, as they say, is in the details.

"You need to look at each of the use cases and understand what the complexities are," said Popkin. "If you're just trying to identify entities within a document set, that can be handled fairly easily without too much tuning. If you're looking for nuanced sentiment around a highly technical set of questions, then you're either going to have to build the linguistic models and custom taxonomies to support those models, or you may have to do a lot of careful training of a machine learning algorithm on a set of documents to be able to get the results that can be trusted."

In general terms, text analytics involves structuring text from unstructured sources, using various techniques for parsing words or phrases and for detecting patterns and connections in the text. The algorithms, in short, contain rules for manipulating the input text. The rules may instruct the program in how to accomplish a variety of analytic tasks, including categorizing documents, creating summaries, detecting relevance between documents, extracting relationships and analyzing the sentiments of those who created the text. (A toy sketch of such rules follows this section.)

Analyzing unstructured text is a much more challenging feat than other types of data analytics because it is open-ended. With structured data, analysts know what to expect and can write rules accordingly: "If the number in column 10 is greater than 50, send the record to collections." Simple enough. But how does an analyst at DHS write a rule that can tell whether a tweet saying, "I bombed last night," is from a terrorist or a self-critical performer?

HOW IT WORKS

In general, there are two basic kinds of text analytics: natural language processing and statistical pattern analysis. With natural language processing, the software uses complex sets of "if-then" rules specifically written to analyze language as it is understood by humans. Increasingly, however, NLP is being supplemented by statistical approaches.
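To make those text-manipulation rules concrete, here is a minimal, hypothetical sketch -- not any vendor's product -- of a rule set that tags entities and assigns a category to a raw document. The patterns, category names and sample sentence are all invented for illustration.

```python
import re

# Toy rule set of the kind described above: regular expressions that
# impose structure on unstructured text. Real offerings from IBM, SAS
# and others use far richer linguistic models.
CATEGORY_RULES = {
    "public_health": re.compile(r"\b(outbreak|illness|poison|symptom)s?\b", re.I),
    "security":      re.compile(r"\b(bomb|threat|attack)s?\b", re.I),
}

# A crude stand-in for entity identification, the "easy case" Popkin
# mentions: runs of capitalized words.
ENTITY_PATTERN = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*)\b")

def analyze(document):
    """Turn one unstructured document into a small structured record."""
    categories = [name for name, rule in CATEGORY_RULES.items()
                  if rule.search(document)]
    entities = ENTITY_PATTERN.findall(document)
    return {"categories": categories, "entities": entities}

print(analyze("Poison center reports in North Carolina suggest an outbreak."))
# {'categories': ['public_health'], 'entities': ['Poison', 'North Carolina']}
```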
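The structured-versus-unstructured contrast is just as easy to sketch. The column-10 rule quoted above is one line of unambiguous logic (the field layout here is hypothetical), while an equally mechanical keyword rule on tweets fires on the comic and the attacker alike:

```python
# The structured-data rule: trivial to state and to trust.
def route_record(row):
    return "collections" if float(row[9]) > 50 else "normal"  # column 10

# A naive unstructured-text rule: fires on both readings of "bombed".
def naive_flag(tweet):
    return "bombed" in tweet.lower()

print(route_record(["", "", "", "", "", "", "", "", "", "72.5"]))  # collections
print(naive_flag("I bombed last night"))               # True -- self-critical comic
print(naive_flag("We bombed the checkpoint at dawn"))  # True -- genuine threat
```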
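The statistical route Popkin describes -- careful training of a machine learning algorithm on a set of documents -- can be pictured as a toy naive Bayes classifier. The four training documents below are invented, and a model anyone could trust would need vastly more data and tuning; the point is only that context words, not the keyword itself, drive the decision.

```python
from collections import Counter, defaultdict
import math

# Invented labeled documents standing in for a real training set.
TRAIN = [
    ("the show bombed and the crowd left early", "benign"),
    ("i totally bombed my set last night", "benign"),
    ("they bombed the checkpoint near the border", "threat"),
    ("plans to bomb the station were found", "threat"),
]

def train(examples):
    word_counts, label_counts = defaultdict(Counter), Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    vocab = {w for counts in word_counts.values() for w in counts}
    best, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for w in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing a class.
            score += math.log((counts[w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

wc, lc = train(TRAIN)
print(classify("I bombed last night", wc, lc))  # 'benign' -- context words decide
```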
CAN TEXT ANALYTICS DETECT BIO-HAZARDS?

One of the most ambitious attempts to bring the power of text analytics to bear in the interest of public safety is about to go into field testing. Funded by the Homeland Security Department, the National Collaborative for Bio-Preparedness (NCB-Prepared) is designed to monitor emergency medical services reports, poison center data and a wide array of other data sets, including social media, to detect signs of biological threats.

NCB-Prepared is in a demonstration phase of development among primary partners the University of North Carolina at Chapel Hill, North Carolina State University and the SAS Institute. "We also intersect with state agencies and other groups," said Dr. Charles Cairns, chair of emergency medicine at UNC and principal investigator at NCB-Prepared. "The overall theme is that we can get data early, data closer to the point of illness or injury, data that represents the earliest signals in a health threat."

Already, Cairns said, the project has demonstrated great promise. One of the first things the team did was to start looking at emergency medical service records, he said. "Using that approach we were able to detect a gastrointestinal outbreak a full two months before it was recognized by the standard reporting. So the power of this approach was demonstrated by using EMS records."

The NCB-Prepared analytic system employs SAS text analytics software running on North Carolina State University's cloud-based Virtual Computing Lab to scan rapidly expanding data sets for patterns that may indicate an emerging threat to public health. The system is designed to scale to accommodate both growing data sets and adoption by public health agencies across the country.

"We've already expanded to include information from South Carolina as well as North Carolina," Cairns said. "And we're looking beyond poison center data and EMS data, to take a look at population data and healthcare infrastructure data. We look at aspects of social media, and we now have some national data sets that start focusing on things like foodborne illness."

The project is expected to move into the field testing stage by the end of this year. Cairns said the hope is eventually to have veterinary data, wildlife data, pharmacy retail data and as many data sources as can provide insight into early recognition and better situational awareness.

"Currently we're looking at 13 million records, and that number is expanding rapidly," he said. "Frankly, it has been just extraordinarily successful. We've had another 20 data set owners contact us wanting to participate." •
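The pattern-scanning Cairns describes can be pictured with a toy example. This is not NCB-Prepared's actual method, which runs on SAS text analytics in the Virtual Computing Lab; it is a generic, hypothetical baseline-and-threshold check over invented daily counts of gastrointestinal-symptom mentions in EMS reports -- the kind of signal behind the outbreak the project caught two months early.

```python
import statistics

# Invented daily counts of GI-symptom mentions extracted from EMS reports.
daily_counts = [12, 9, 14, 11, 10, 13, 12, 11, 15, 12, 13, 30, 34, 41]

def flag_anomalies(counts, window=7, threshold=3.0):
    """Flag days that sit more than `threshold` standard deviations
    above a trailing baseline -- a crude stand-in for the far richer
    models used in real syndromic surveillance."""
    alerts = []
    for day in range(window, len(counts)):
        baseline = counts[day - window:day]
        mean = statistics.mean(baseline)
        stdev = statistics.stdev(baseline) or 1.0  # guard a flat baseline
        if counts[day] > mean + threshold * stdev:
            alerts.append((day, counts[day]))
    return alerts

print(flag_anomalies(daily_counts))  # [(11, 30)] -- the spike's first day
```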