Lessons Learned from Magellan

Two years ago, the Department of Energy's Office of Advanced Scientific Computing Research launched the Magellan project, a research and development effort aimed at harnessing cloud computing for the most demanding information processing at the national labs. A distributed testbed infrastructure was deployed at the Argonne Leadership Computing Facility and the National Energy Research Scientific Computing Center, then benchmarked on some of the labs' most challenging applications, such as a particle physics experiment at the Large Hadron Collider and measurements of the expansion of the universe. The team also tested Hadoop, MapReduce, and the broader Hadoop ecosystem on massive scientific problems in the cloud. The final results, published in December, show both the potential and the current limitations of cloud computing for cutting-edge science.

The primary appeal of cloud computing for the national labs was flexibility and agility. Through virtualization, researchers could create whatever custom computing environment they needed, bring their own software stack, and try out new environments. Resources are also more flexible in the cloud, and researchers enjoyed being able to scale rapidly to fit a problem and to tap into economies of scale for massive data sets and workflows. Another benefit of the cloud for science was that it simplified collaboration, allowing researchers to share software and experiments with their peers. Hadoop and MapReduce showed promise for high-throughput data and very large workloads. Often, the High-Performance Computing (HPC) solutions in place at the national labs have scheduling policies that aren't compatible with this type of analysis.

The problems with moving deep science to the cloud, however, currently outweigh the benefits for most applications, so the national labs will not be switching over from HPC just yet. Adapting to the cloud, porting applications, and building up infrastructure took considerable time and skill, raising costs.
For most applications, which deal with truly massive workloads, have idiosyncratic needs, and are input/output intensive, traditional HPC currently performs better. The cloud worked best for applications that required minimal communication. The research team also had concerns about meeting the specific security and monitoring requirements of the national labs. Price was perhaps the biggest obstacle to implementing a cloud model: using a commercial cloud would cost between 3 and 7 times as much as the current computing centers, which already pool resources to cut costs. Even switching over to private clouds would exceed a lab's budget.

Cloud computing for deep research isn't doomed, however, as almost 40% of scientists would still want a cloud model even if performance suffered. There is also a lot of room for growth in this area. Even during the 2 years of the study, researchers noted dramatic improvements to the open source software powering the cloud, such as Hadoop. To move forward, researchers sought improvements to MapReduce to better fit scientific data and workflows, as well as ways to bring some of the benefits of the cloud to the traditional HPC platforms the national labs have spent decades perfecting.