GCN : February 2013
sift through millions of documents. So far, ORNL has licensed Piranha to two companies, Potok said. Pro2Serve, a Knoxville, Tenn.-based provider of technical and engineering services for critical infrastructure protection, will incorporate the software in the services it offers government agencies. TextOre, based in Fairfax, Va., is incorporating Piranha into the company's suite of business analytical software and services to help analyze text data with greater speed and accuracy.

Analysts using Piranha can select a document and quickly find other documents that are a close match. If they select an e-mail message of interest, clustering allows them to quickly find similar e-mails on other computers, thus potentially establishing a link.

Piranha also lets analysts perform document sampling. A set of documents typically will contain common themes or topics. Representative themes from these documents can be quickly found. A hard drive may store thousands of documents across many different topics, from finances to favorite restaurants. Ten or 20 representative documents from these themes can be found and used by an analyst to determine what they mean.

Piranha has a "recommender" capability that lets users filter documents related to the subject they are researching. These documents can form the basis for searching for related ones and help reduce the number of documents that must be sifted through, Potok said.

Then analysts can begin to determine how the documents are related. "If I have to put these documents into folders and group them, how would they be grouped?" he said. Users can start looking at the entities and words within the documents for connections.

"What we do is go from millions of documents down to very relevant case information or intelligence and say, 'This is what I can act on immediately,'" Potok said. "This whole process now takes a matter of days instead of months, which is typical."
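The article does not say how Piranha scores a "close match," so the following is only a generic sketch of the idea, not Piranha's method. It assumes a common text-similarity technique (TF-IDF weighting with cosine similarity). The function names and the four sample documents are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF, so a term found in every document keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + k)) + 1.0 for t, k in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_matches(docs, query_index, top_n=3):
    """Rank the other documents by similarity to docs[query_index].

    Returns a list of (document_index, score) pairs, best match first.
    """
    vecs = tfidf_vectors(docs)
    query = vecs[query_index]
    scored = [(cosine(query, v), i) for i, v in enumerate(vecs) if i != query_index]
    # Highest score first; ties are broken by document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_n]]


# Hypothetical sample documents, echoing the article's "finances to
# favorite restaurants" example.
docs = [
    "Quarterly budget review and finance report for the agency",
    "Dinner reservation at our favorite restaurant on Friday",
    "Finance team budget report: quarterly numbers attached",
    "New restaurant opening downtown, menu attached",
]

for index, score in closest_matches(docs, 0):
    print(index, round(score, 3), docs[index])
```

Selecting the first document ranks the other finance document highest, because it is the only one that shares any vocabulary with the query. A production system working across millions of documents would need indexing and distributed processing well beyond this sketch.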
Piranha was built to run on everything from a standalone PC to an entire cloud-based network housing millions of documents. To test the system, we used a single PC and the free trial version of Piranha available from the Oak Ridge website, which behaves just like the full version except that it is limited to 128 documents. For our purposes, we used articles in GCN and other publications written by the same author.

Loading the documents into Piranha was relatively quick. The program was recently updated to work with Microsoft Word files, XML files and most word processor and office document formats. It can also work with plain text files.

For the time being, Piranha can analyze only text that appears on the screen, that is, the information a user would see when he opens a file. However, Oak Ridge is preparing a forensics version of the software that could look at metadata contained within files, according to Dr. Robert M. Patton, of the Computational Data Analytics team at Oak Ridge National Laboratory. This metadata, such as who authored the documents, who edited them, and when they were modified or created, could prove invaluable for a researcher or investigator.

The current version of Piranha requires a user to enter search terms into the program before the analysis takes place. That means anyone who doesn't know what he's looking for could miss important evidence. Piranha is a work in progress, and the researchers at Oak Ridge have already begun to tackle that aspect of Piranha with a new assisting program called Raptor.

Already working in a test environment, Raptor will allow an investigator to ask, "What do all these documents have in common?" Raptor will then return answers in the form of suggested search terms and datasets of documents within the larger group.
It could learn, for example, that several of the documents seem to talk about a domestic terrorist attack, and several more contain facts about, say, Grand Central Station. At that point the investigator would be given suggested search terms that she could bring back to Piranha, but would also be directed to certain documents within the larger set that could be read for additional information. The connection between plans for a general attack and a location could be gleaned from their hiding spots within the huge datasets.

So instead of trying to sift through thousands or millions of documents, an impossible task for one person, an investigator might be directed to read 100 files with a common theme. That would allow a further refining of keywords that are pertinent to the investigation.

Piranha will ingest Raptor

Right now, Raptor is a separate program from Piranha. But Patton said that will soon change. Plans call for making Raptor into a module within Piranha, a move that would vastly improve the original program for use with large datasets.

As powerful as Piranha is, the current version is a work in progress. Successfully whittling searches to narrower and more useful results is a skill that needs to be honed, at least until Raptor officially comes along to make this process more intuitive.

Therefore, for our smaller set of documents, we had to come up with our own search terms. But since we were familiar with the documents in question, it wasn't hard to find keywords that the software could sink its teeth into. And we were surprised by some of the results. Piranha was able to pick up patterns within the documents that would not be immediately obvious.

For example, we found that the tech writer penned several stories about tablet computers, increasing in number each month to 30 articles from January to September.
But then in October, the reporter suddenly stopped writing about tablets altogether, and from October through December, the term appeared only once. An investigator could conclude that something major had happened in October. In a set of 1,000 documents, such a pattern would be difficult to discover without Piranha. With the software, it's as obvious as if it had been painted in bright red letters.

Where Piranha could really shine is on huge datasets residing on multiple servers or even on millions of potential documents. In that case, it would be humanly impossible to find connections without help. At the desktop level, a set of skilled queries using the software can save time. In the cloud, it would make the impossible actually possible.

-- John Breeden II