OSCARS Extends JGI Network Capacity

In February 2010, the Joint Genome Institute (JGI) at Lawrence Berkeley National Laboratory (Berkeley Lab) had an immediate need for increased computing resources. Meanwhile, 20 miles away, the Magellan cloud computing cluster at the National Energy Research Scientific Computing Center (NERSC) had enough capacity to accommodate this urgent request. The Energy Sciences Network (ESnet) already connected the two facilities via high-speed circuits, but exactly how the DOE networking community could quickly supply resources to one of its members was not so obvious.

Brent Draney – the PI for the Magellan Project – spent a day at JGI brainstorming with system administrators and scientists on how NERSC and Magellan could best assist JGI. “We put everything on the table including the possibility of moving racks of nodes to JGI if it would help,” said Draney. “Openly discussing the immediate needs and constraints was critical to developing a workable plan. I don’t think I’ve ever seen three National User Facilities (NERSC, ESnet and JGI) come together so quickly and effectively to solve a problem before.”

Finding a Virtual Path Forward

The challenge quickly became apparent: How could NERSC give JGI access to available resources without JGI having to change to an unfamiliar computing platform? The easiest path forward was for NERSC to load the cluster on a truck and take it to JGI, while the fastest was for NERSC to create logins for JGI on its system. Neither solution was ideal. That’s when ESnet said it could bridge the gap by exporting JGI’s private network to NERSC. Using virtual circuit technology, ESnet would build a separate networking infrastructure that, for all intents and purposes, would appear to be part of JGI. The decision was quickly made and, despite being in the midst of acceptance testing, the Magellan team allocated 120 nodes to JGI.

Working against a tight deadline, technical staff at both centers collaborated with ESnet engineers to establish a dedicated 9 Gbps virtual circuit between JGI and NERSC's Magellan system over ESnet's Science Data Network (SDN). Using the ESnet-developed On-Demand Secure Circuits and Advance Reservation System (OSCARS), the virtual circuit was set up within an hour after the last details were finalized.

NERSC raided its closet spares for enough networking components to construct a JGI@NERSC local area network and migrated a block of Magellan cores over to JGI control. This allowed NERSC and JGI staff to spend the next 24 hours configuring hundreds of processor cores on the Magellan system to mimic the computing environment of JGI's local compute clusters.

“We created a virtual layer 2 circuit which put both the NERSC cluster compute resources and the JGI data resources in the same Ethernet broadcast domain,” said Eli Dart, ESnet network engineer. “This allowed the NERSC cluster nodes to boot from JGI’s boot servers and mount the JGI file systems directly. The OSCARS automation tool allowed us to quickly set things up, without us having to log in to a bunch of routers the way we used to.”

NERSC and JGI have now expanded their use of SDN to a total of three virtual circuits. These new network services allow JGI researchers around the world to access increased computational capacity without changing their software or workflow. JGI users still work as they always did: logging on to the Institute's network and submitting scientific computing jobs to batch queues managed by hardware located in Walnut Creek. But when the jobs reach the front of the queue, the information travels 20 miles directly to NERSC's Magellan system in Oakland, CA, on reserved bandwidth. Once a job concludes, the results are sent back to Walnut Creek to be saved on file systems at JGI.

“This solution would not be possible without a reliable, high-bandwidth research network like ESnet's SDN,” said Draney. “Although NERSC and JGI are miles apart, Magellan is linked to JGI's infrastructure on a single, secure, dedicated network connection. The information travels as if these systems are in the same room and connected by a local area network.”

Lessons Learned

“While JGI, NERSC and ESnet came together in a remarkable fashion to meet JGI’s needs, there were some things that we would want to do differently next time,” commented Draney. “It would be better if we could logically change the network at NERSC to import JGI’s private network instead of having to build a dedicated LAN. Also, having a common control/monitoring infrastructure that grants all parties visibility into the network would help in resolving the final details and tuning the network. This is not how we would want to meet every challenge, but it gave us great insight into how we can adapt to serve our scientific customers better and how cloud computing and the network can be leveraged.”

This ability to provision computing resources as needed can be used repeatedly to meet the on-demand computing needs of the research community. “We want people to start thinking about what the network can do for them,” said Dart. “The whole idea is that the network can now be a service interface. A service interface increases the network’s scientific utility.”