GCN : February 2014
NCI plans to fund three different architectures for its cancer research pilot project, with three years of funding for each team. The agency is hoping that these architectures will scale, given that it expects data sets as large as 20 to 50 petabytes by 2019.

"The pilot projects will allow us to evaluate, with a really big data set, what is and what isn't the most effective architecture for doing the kinds of analysis that scientists are interested in doing," said George Komatsoulis, director of NCI's Center for Biomedical Informatics. "It's our intention to test these clouds to make sure they meet our performance requirements, but also to throw them open to cancer researchers, who can vote with their feet."

Komatsoulis said a key challenge is creating an efficient API that preserves the security and integrity of the data. Experts said the data is likely to be encrypted both in transit and at rest, and that authentication and access controls must be applied to it.

"We're looking to lower the barrier to entry for scientists," Komatsoulis said. "One of the purposes of having a solid API is that it gives us the opportunity to embed the security best practices in all of the programs."

The compute and storage capabilities required by the NCI Cloud Initiative are available on the market today, said Jake Yue Chen, associate professor of bioinformatics at Indiana University. But he added that it will be a challenge to integrate these technologies in a massive shared repository that can meet the needs of the biomedical research community. Another issue is standardization of data and of the frameworks for analyzing it, as there is variability in the terminology that cancer researchers use.

"One challenge is usability and user friendliness," said Subha Madhavan, director of the Center of Biomedical Informatics at Georgetown University Medical Center.
"We've got to deliver the information through easy-to-use mobile health and Web-based platforms."

THE PATH TO THE CANCER CLOUD

Clinical projects produce data sets that are terabytes in scale, so the mindset is changing to bring the tools and expertise to the data. It's a shared computing model that's emerging.

Chen said the NCI Cloud Initiative represents a significant advancement for biomedical research. "This is very profound. It's like taking genomic information that's buried in the ivory towers of a few major research centers and putting it into the hands of every cancer researcher. The more people who can analyze this data, the more insights we are going to get into cancer," Chen said. "I would imagine that orders of magnitude more knowledge would be generated because more researchers will be able to analyze the data."

BIOINFORMATICS AS A SERVICE

The driver behind the NCI Cloud Initiative is the Cancer Genome Atlas, which will provide in-depth data on about 11,000 cancer patients, with an average of 500 gigabytes of data per patient.

"We're going to have the DNA sequence for their tumor and the matched normal control," Komatsoulis explained. "There is RNA sequencing, medical images and clinical data. In addition, there is epigenetics, which are modifications to the DNA itself that affect the way various genes that exist in these patients get turned on and turned off. By September 2014, we expect to have generated, if not fully received, 2.5 petabytes of data. This is the shape of things to come."

The NCI cancer clouds will provide a complete bioinformatics infrastructure as a service, with built-in compute, storage, security and analytics. NCI has not defined the cloud-based infrastructure that it wants; instead, it is looking for innovative architectures that will meet the needs of the biomedical research community.
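The patient count and per-patient average above imply the scale the pilot architectures must eventually handle. A back-of-envelope check (the 11,000-patient and 500 GB figures are from the article; the decimal gigabyte-to-petabyte conversion is my assumption):

```python
# Rough scale of the full Cancer Genome Atlas data set, using the
# figures quoted in the article: ~11,000 patients at ~500 GB each.
patients = 11_000
gb_per_patient = 500

total_gb = patients * gb_per_patient
total_pb = total_gb / 1_000_000  # assuming decimal units: 1 PB = 1,000,000 GB

print(f"{total_pb:.1f} PB at full scale")  # 5.5 PB
```

That 5.5 PB full-scale estimate sits between the 2.5 petabytes NCI expects to have generated by September 2014 and the 20 to 50 petabytes it projects by 2019.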
"We really don't know yet what is the best technology or what is the best way to structure the data so it can be computed on efficiently," Komatsoulis said. "What we're looking for is innovation. This is one of those cases where the government has the opportunity to enable the scientific community to innovate to solve an important problem."

The NCI Cloud Initiative is part of a trend in which biomedical research data is processed in public clouds. For example, the 1000 Genomes database is available via Amazon's Elastic Compute Cloud. Similarly, the Georgetown University Lombardi Cancer Center is using Amazon Web Services for gene sequencing related to breast and colorectal cancers.

"Our IT team is small. It would take years for us to set up an infrastructure to manage terabytes of data. But in a matter of weeks, we can set up our data on Amazon's cloud," Madhavan said. "The cloud is a game changer for researchers like ours who want to do big data analysis."

Industry analysts said they expect to see more government-sponsored big data projects adopt a cloud infrastructure for compute, storage and analytics.

"This sounds like a perfect example where cloud computing is a better arrangement, given the bandwidth limits associated with downloading large data sets," said Shawn McCarthy, research director at IDC Government Insights. "Putting data in a shared resource is becoming more popular because you can standardize the data. When everybody builds their own databases, you end up with different APIs and data name fields that are different. People spend more time normalizing the data than doing their analysis of it."
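Komatsoulis's point about embedding security best practices in the API, so that every program built on it inherits the same authentication and access controls, can be illustrated with a minimal sketch. Everything here is hypothetical: the names (`require_token`, `fetch_sequence`, `AUTHORIZED_TOKENS`) are illustrative and not part of any real NCI interface.

```python
# Minimal sketch of API-level access control: a decorator rejects any
# call that lacks a valid token, so security is enforced in one place
# rather than reimplemented by each downstream program.
import functools
import hmac

# Illustrative token registry; a real deployment would use an identity
# provider and short-lived credentials, not a hard-coded dict.
AUTHORIZED_TOKENS = {"researcher-42": "s3cret-token"}

def require_token(func):
    """Wrap an API function so it refuses unauthenticated callers."""
    @functools.wraps(func)
    def wrapper(user, token, *args, **kwargs):
        expected = AUTHORIZED_TOKENS.get(user)
        # hmac.compare_digest does a constant-time comparison, avoiding
        # timing side channels when checking the token.
        if expected is None or not hmac.compare_digest(expected, token):
            raise PermissionError("access denied")
        return func(user, token, *args, **kwargs)
    return wrapper

@require_token
def fetch_sequence(user, token, patient_id):
    # Placeholder for a query against the cloud-hosted data set.
    return f"<sequence data for {patient_id}>"
```

In this model, a researcher's tooling calls `fetch_sequence(...)` and never touches the raw store directly, which is one way an API can "lower the barrier to entry" while keeping access controls uniform.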