The Department of Energy is launching a $32 million program to study how scientific codes can make use of cloud technology.

Called Magellan, the program will be funded by the American Recovery and Reinvestment Act (ARRA), with the money to be split equally between the two DOE centers that will be conducting the work: the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory and the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory.

One of the major questions the study hopes to answer is how well the DOE’s mid-range scientific workloads match up with various cloud architectures and how those architectures could be optimized for HPC applications.

Sharing supercomputer data centers is not a new concept, of course:

  • TeraGrid is the world’s largest distributed cyberinfrastructure for open scientific research. It combines more than 250 teraflops of computing capability with more than 30 petabytes of online and archival data storage, and researchers can access more than 100 discipline-specific databases.
  • Open Science Grid (OSG) was created to facilitate data analysis from the Large Hadron Collider. It is composed of service and resource providers, researchers from universities and national laboratories, and computing centers across the United States. Members independently own and manage the resources that make up the distributed facility. OSG is used by scientists and researchers for problems that are too computationally intensive for a single data center or supercomputer.
  • Enabling Grids for E-sciencE (EGEE) is funded by the European Commission and connects more than 70 institutions in 27 European countries. Its multi-science computing grid infrastructure allows researchers to share computing resources.
  • FutureGrid is a test-bed of geographically distributed, heterogeneous computing systems that allows isolatable, secure experiments. The project partners will integrate existing open-source software packages to create an easy-to-use software environment that supports the instantiation, execution, and recording of grid and cloud computing experiments. FutureGrid is headquartered at Indiana University and had its first all-hands meeting on 2-3 October.

The entire range of DOE scientific codes will be reviewed under Magellan, including energy research, climate modeling, bioinformatics, physics, and other research. But the focus will be on codes that are typically run on HPC capacity clusters, which represent much of the computing infrastructure at DOE labs today.

In general, explains HPCwire, codes that require supercomputers like the Cray XT and the IBM Blue Gene are not considered candidates for cloud environments. This is mainly because large-scale supercomputing applications tend to be tightly coupled, relying on high-speed inter-node communication and a non-virtualized software stack for maximum performance.

Message Passing Interface (MPI) is a specification for an API that allows many computers to communicate with one another. MPI is the dominant programming model on all large-scale parallel machines, such as IBM Blue Gene/P and Cray XT5, as well as on Linux and Windows clusters of all sizes.
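To make that concrete, here is a minimal MPI sketch in C, assuming an MPI implementation such as MPICH or Open MPI is installed (build with mpicc, launch with mpirun). Each rank reports itself and the ranks then combine their ids with a collective reduction; real HPC codes lean heavily on exactly this kind of inter-node messaging, which is what makes them sensitive to cloud interconnects.

```c
/* Minimal MPI sketch: each rank reports itself, then all ranks
   combine a value with MPI_Reduce. Assumes an MPI implementation
   such as MPICH or Open MPI; build with mpicc, run with mpirun. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank = 0, size = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

    printf("Hello from rank %d of %d\n", rank, size);

    /* Simple collective: sum every rank's id on rank 0. */
    int local = rank, total = 0;
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of ranks 0..%d = %d\n", size - 1, total);

    MPI_Finalize();
    return 0;
}
```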

OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL is analogous to the open industry standards OpenGL and OpenAL, for 3D graphics and computer audio, respectively. OpenCL extends the power of the GPU beyond graphics.

OpenCL is managed by the non-profit technology consortium Khronos Group and is supported in Apple’s Mac OS X Snow Leopard (version 10.6). Apple submitted the initial proposal to the Khronos Group, and the cross-platform initiative was subsequently supported by AMD, IBM, Intel, and Nvidia, among others.
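As a rough illustration of the host-side API, the C sketch below simply enumerates the OpenCL platforms and devices visible on a machine and reports their compute units. It assumes an OpenCL runtime and headers are installed (link with -lOpenCL on Linux, or against the OpenCL framework on Mac OS X).

```c
/* Minimal OpenCL host-side sketch: list platforms and devices. */
#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;

    if (clGetPlatformIDs(8, platforms, &num_platforms) != CL_SUCCESS ||
        num_platforms == 0) {
        fprintf(stderr, "No OpenCL platforms found\n");
        return 1;
    }

    for (cl_uint p = 0; p < num_platforms && p < 8; p++) {
        char name[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,
                          sizeof(name), name, NULL);
        printf("Platform %u: %s\n", p, name);

        cl_device_id devices[8];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                       8, devices, &num_devices);

        for (cl_uint d = 0; d < num_devices && d < 8; d++) {
            char dev_name[256];
            cl_uint compute_units = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(dev_name), dev_name, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_MAX_COMPUTE_UNITS,
                            sizeof(compute_units), &compute_units, NULL);
            printf("  Device %u: %s (%u compute units)\n",
                   d, dev_name, compute_units);
        }
    }
    return 0;
}
```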

The Hybrid Multicore Consortium hopes to make hybrid supercomputing architectures as easy to program and use as monolithic platforms. Jeff Nichols, who heads scientific computing at Oak Ridge, announced the consortium at SC09 in Portland. IBM built the Los Alamos hybrid Opteron-Cell cluster, called Roadrunner, rated just above 1 petaflop on the latest TOP500 list.

All of the players in the Hybrid Multicore Consortium are betting that the ramp to exaflop computing, 1,000 times the power of Roadrunner, is going to require hybrid architectures. “The programming model to get performance on these machines is going to have to be different,” says Nichols. Oak Ridge is jumping on the GPU bandwagon and plans to build a supercomputer based on Nvidia’s Fermi GPU. By this time next year, ORNL will officially host three petascale platforms.

Petascale may be coming to a datacenter near you. Massive datacenters are being built along the Columbia River and across central Oregon and Washington: Amazon near Boardman and Google in The Dalles in Oregon, as well as Yahoo in Quincy and MSN in Wenatchee in central Washington.

These datacenters are fed by 10-megawatt substations and cooled with water from the Columbia River. They often house Hadoop clusters inside facilities that look like 18-wheeler loading docks. The world’s two largest data center operators, Digital Realty Trust and IBM, standardize on modular systems and repeatable designs.

Nvidia has been awarded a LEED Platinum certification for a new Silicon Valley data center. The facility will use 100 percent outside air for cooling.

At SC09, Nvidia seemed to be everywhere. The company says Tesla server clusters deliver 10 times the performance of CPU-based clusters while consuming less power. Its CUDA parallel computing architecture powers the 240 parallel processing cores in each Tesla processor. At the show, Nvidia announced Fermi, which features up to 512 CUDA cores, and its RealityServer, which brings complex 3D graphics to virtually any netbook or smartphone by crunching the numbers on a server.

RealityServer moves the heavy lifting to a back-end server and streams the results to virtually any device in real time. Nvidia’s server handles complex graphics such as fluid dynamics, architectural design, and 3D video games. The web services software runs on Nvidia’s Tesla, a high-powered GPU containing 240 cores that are programmed using the CUDA software toolkit.

OpenSimulator, often referred to as OpenSim, is an open source server platform for hosting virtual worlds and is best known for its compatibility with Second Life. It can also host alternative worlds with differing feature sets over multiple protocols, and it was talked up by Intel’s Justin Rattner.

As part of the Supercomputing Conference this year, ScienceSim made available a virtual world based on the open source OpenSim package: a virtual environment for collaborative visualization and experimentation. Unlike proprietary virtual world platforms such as Second Life, ScienceSim leverages open source building blocks (installation utilities, management tools, client viewers, etc.) based on the OpenSimulator (OpenSim) software. The Second Life client is available from Linden Lab, but today you’d be forgiven for asking if Second Life is still going, notes a skeptical BBC.

Intel will launch a new supercomputer-optimized version of its “Nehalem-EX” processor in the first half of 2010, and it announced that a beta program for its Ct technology will be available by the end of 2009. Intel claims Xeon is ideal for the cloud.

Intel says Ct makes parallel programming in the C and C++ languages easier by automatically parallelizing code across multi-core and many-core processors.
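Ct’s own API is not shown in the announcement, so as a rough point of comparison only, the C sketch below uses OpenMP, an existing directive-based way of spreading a plain C loop across multicore processors (build with, for example, gcc -fopenmp). It is not Ct, just an illustration of the kind of loop-level parallelism these tools target.

```c
/* NOT Ct: an OpenMP sketch of loop-level parallelism in plain C.
   The pragma asks the compiler/runtime to split the loop iterations
   across all available cores. Build with: gcc -fopenmp example.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {       /* set up some input data */
        a[i] = (double)i;
        b[i] = (double)(N - i);
    }

    #pragma omp parallel for
    for (int i = 0; i < N; i++)         /* runs across multiple cores */
        c[i] = a[i] * 2.0 + b[i];

    printf("c[0]=%.1f c[N-1]=%.1f, up to %d threads\n",
           c[0], c[N - 1], omp_get_max_threads());
    return 0;
}
```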

Intel would like to see its Xeon with a Larrabee co-processor become the dominant high-performance computing platform. Larrabee (up to 1 teraflop), essentially an x86 cluster on a chip, is expected to compete with Nvidia’s Tesla/Fermi and AMD’s Radeon (up to 2.72 teraflops) in raw co-processor power.

An eight-GPU Nvidia Fermi box would probably deliver something in the neighborhood of 15 teraflops of single-precision performance and half of that in double precision. A whole new wave of cheap, low-power FLOPS is now entering the market at commodity-like prices, notes HPCwire.

Synthetic aperture radar on the Global Hawk UAV tracks cars and missiles with an onboard teraflop computer from Mercury Computer Systems. Mercury is also developing cellular multi-user detection with adaptive beamforming. Next-generation SensorCraft will require teraflops of onboard processing.

Synthetic imaging uses multiple small spacecraft, operating cooperatively, to synthesize the optical qualities of a much larger single spacecraft, while head-mounted multispectral sensors will require handheld supercomputers.

Space agencies are exploring reconfigurable computing using FPGAs, such as the new Convey computer. The Center for High-Performance Reconfigurable Computing comprises more than 30 leading organizations in the field.

While interoperable supercomputer applications running in the cloud are still largely a pipe dream, the cloud seems destined to enable a new generation of mobile devices running wild with image recognition, voice transcription, medical and geoscience applications, or video games.

In his keynote at SC09 this week, Intel CTO Justin Rattner said computer games could be the driver that advances the entire HPC industry. Intel Software Network TV streamed many of the keynotes.

Gartner estimates that cloud services are currently a $46 billion market and will be worth $150 billion by 2013, although that figure includes Google AdWords. Perhaps the Supercomputer App Store is next.

Related Dailywireless articles include: Super Computer ‘09 News, World’s Most Expensive Computers, Supercomputer Clouds, Supercomputing Handhelds, Ocean Observatory Gets Funded, Ocean Observatories: The Ultimate Splash Page, Plug and Play Environmental Sensor Nets, Mobile Supercomputing, The Platform, Satellite RFID Tag, Shipboard AIS Gets a Satellite Swarm, Tracking Soldiers, Mapping Relief, Volcano Sensor Net, Remote Ocean Viewer, and Wireless Recon Airplanes.
