GCN : November 2012
...ed with machine learning techniques that use statistical methods to analyze bodies of text. In fact, the most sophisticated version of natural language processing -- at least the version that is publicly acknowledged -- was the technique used by IBM's Watson supercomputer to easily defeat the top two all-time Jeopardy game champions in a highly publicized contest two years ago.

Watson's performance was impressive. Though the computer had to "understand" complex and often tricky questions before it could search its data stores for the right answer, the machine managed to earn more than three times as much in winnings as either of its two human competitors.

According to Frank Stein, director of IBM's Analytics Solution Center, the company is repurposing the same technology and refining it for use in other sectors.

"The first industry that we are going after is the healthcare industry," Stein said. "The great thing about using a system like Watson is that it has all the knowledge you put into it but it doesn't tend to be biased. There was a professor at Columbia medical school who was talking about how they would miss things because of their bias. They might assume that the person is living in New York when they diagnose a symptom and they don't realize that the person was recently in Mozambique or some other place."

"What's powerful about text analytics and sentiment analysis is that it is essentially saying to a computer system, 'go read these books for me and relay the ones that have this in it or that answer this question,'" said Popkin.

Popkin warns, however, that text analytics technologies aren't bulletproof. "If you have a pool of un-mined text, you need to start with a hypothesis," said Popkin. "What do you think you're going to find? What is it that you are specifically looking for in that text?"
In many cases, depending upon the type of text and whose text it is, even the hypothesis may be difficult to shape accurately. "Understanding the demographics of the audience that you are mining, understanding the language of the demographic, understanding any underlying biases, understanding the phrases -- there is a whole set of things you need to take into account when you're doing those analyses," said Popkin. "It's easy to do this wrong. I think you need to be careful in using these tools."

Accordingly, vendors and analysts agree that agencies will want to enlist information or library scientists, whether in-house or outside, in designing and refining their text analytics efforts.

NEXT STEPS

All those involved with text analytics agree that the need for tools to analyze unstructured text is only going to grow.

"The government is struggling in all organizations with how to harness big data," said SAS's McNeill. "You don't have to boil the ocean when you have text analytics. You can extract just what is relevant to begin with and then investigate that for the value."

Chris Biow, federal CTO at MarkLogic, a software company that helps organizations manage unstructured information and big data, agrees. "Any agency in the government that deals in any respect with the public should be using text analytics now," he said. "It's maybe only being used now in 20 percent of the cases where it should. It's as broad as treaty compliance versus watching public sentiment toward the United States overseas to predict a riot. All of that is out there."

Unfortunately, the reluctance of organizations to talk about their implementations of text analytics means that there are few case studies to guide those interested in possibly implementing it.

"I think a lot of the analytics is in a toolkit phase," said Gartner analyst Popkin. "Even for companies that are selling packages there is still development integration that is going to be requested for a while. I think we are in the early days. It is really a set of features and functions that I think over time get embedded as part of other systems."

Popkin advises agencies considering implementing text analytics to talk first to their existing vendors. "Much of this technology comes from vendors that you're already doing business with," he said. "Be careful about platform proliferation. You probably already have three or four vendors you're doing business with that all could offer you text analytics as part of their existing applications."

The most critical thing in initial forays into text analytics, warns Biow, is to keep in mind that the machines still aren't nearly as good at analyzing text as humans.

"The machine's advantage is that it can do all the text," he said. "You don't have enough human beings to read it all. The machines will make a pass-over and humans can then refine that. The machines are getting better in terms of the complexity and detail that they can extract, but not necessarily in terms of the quality."

And results can definitely be improved as your users, library scientists and text analytics vendors start working together.

"The best practice here," says Biow, "is setting reasonable expectations."

What is text analytics?

In general terms, text analytics involves the structuring of text from unstructured sources using various techniques for parsing words or phrases, and for detecting patterns and connections in the text. The algorithms contain rules for manipulating the input text. The rules may instruct the program in how to accomplish a variety of analytic tasks, including but not limited to categorizing documents, creating summaries, detecting relevance between documents, relationship extraction and analyzing the sentiments of those who created the text.
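To make the general description above concrete, here is a minimal Python sketch of two of the tasks it names: categorizing a document and scoring its sentiment. It is only a toy illustration, not how any vendor's product or IBM's Watson works. The keyword rules, the sentiment word list and every function name are hypothetical, invented for this example; real systems rely on far richer rules and statistical models.

```python
import re
from collections import Counter

# Hypothetical rule set: category name -> keywords that signal it.
CATEGORY_RULES = {
    "health": {"patient", "symptom", "diagnosis", "clinic"},
    "security": {"treaty", "riot", "threat", "border"},
}

# Hypothetical sentiment lexicon: word -> polarity weight.
SENTIMENT_LEXICON = {
    "good": 1, "great": 1, "helpful": 1,
    "bad": -1, "slow": -1, "angry": -1,
}


def tokenize(text):
    """Parse raw text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def categorize(tokens):
    """Return the category whose keywords occur most often, or None."""
    counts = Counter()
    for token in tokens:
        for category, keywords in CATEGORY_RULES.items():
            if token in keywords:
                counts[category] += 1
    return counts.most_common(1)[0][0] if counts else None


def sentiment(tokens):
    """Sum the lexicon weights; a negative total suggests a negative tone."""
    return sum(SENTIMENT_LEXICON.get(token, 0) for token in tokens)


def analyze(text):
    """Turn one unstructured document into a small structured record."""
    tokens = tokenize(text)
    return {
        "category": categorize(tokens),
        "sentiment": sentiment(tokens),
        "tokens": len(tokens),
    }
```

Calling `analyze("The clinic was slow and the patient was angry.")` files the sentence under "health" with a sentiment score of -2. A word list this crude misreads sarcasm, negation and jargon, which is one reason the analysts quoted here recommend starting with a hypothesis and keeping humans in the loop.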