Monday, November 19, 2007

Interpreting the 4th fastest supercomputer in the world

In response to my blog entry (and the comments posted on it) about 'Eka' making rank 4 in the Top500 list of the world's supercomputers, I got this email from my friend Viral Shah, who knows a bit about the field:

Ajay, it seems that the ranking of Eka at number 4 on the Top500 list has resulted in quite a lot of excitement. Hats off to the folks at CRD Labs for achieving the feat of assembling such a large computer in a short amount of time. As some of your readers noted, Eka is a cluster. It is roughly 2,000 nodes, consisting of roughly 15,000 processors and connected by Infiniband. Some readers noted that the benchmark is not representative of real scientific applications. 
First, making a small cluster is quite easy. However, constructing and operating such a large cluster is no easy task. It requires serious skill to administer it, tune the hardware and software for performance, and run scientific applications on it. Second, the Top500 is an interesting benchmark. Sure, it is not representative of a realistic workload, but over the years, the bar has been set quite high. If a general purpose computer does not achieve a good LINPACK score (the Top500 benchmark), it is safe to conclude that something is terribly wrong. I am of course excluding special purpose computers that are built to solve specific problems, rather than to get a high LINPACK score.
That said, one needs to think this through clearly. Why was Eka built? To simply show that we can do it, and place a computer in the Top 10 supercomputers? To run specific scientific applications? I am guessing that the answer is "a bit of both". It is almost always safe to assume that the full supercomputer is never used to solve a single problem. What are the largest problems that will be run on Eka? What percentage of peak will they achieve? Would it have been a better idea to buy an "off the shelf" system such as the Cray XT4 or the SGI Altix and focus on programmer productivity, instead of getting a high LINPACK score?
Computers such as Eka achieve extremely high and unrealistic flop rates on the LINPACK benchmark. Typically, they can achieve over 70% of the peak flop rate (the number of floating point operations per second). However, real applications often run at below 5% of the peak flop rate. Let's examine some other possibilities.
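To make the gap between those two percentages concrete, here is a minimal sketch of the arithmetic. The 100 teraflop peak is a made-up round number chosen for illustration, not Eka's actual figure, and the function name is hypothetical; only the 70% and 5% fractions come from the text above.

```python
def fraction_of_peak(achieved_flops: float, peak_flops: float) -> float:
    """Return sustained performance as a fraction of theoretical peak."""
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return achieved_flops / peak_flops


# Assumption: a hypothetical machine with a 100 teraflop theoretical peak.
PEAK_FLOPS = 100e12

# Fractions quoted in the text: LINPACK at about 70% of peak,
# a real application at about 5% of peak.
linpack_flops = 0.70 * PEAK_FLOPS
application_flops = 0.05 * PEAK_FLOPS

# How many times faster the benchmark runs than the real application.
speed_gap = linpack_flops / application_flops

print(f"LINPACK:     {fraction_of_peak(linpack_flops, PEAK_FLOPS):.0%} of peak")
print(f"Application: {fraction_of_peak(application_flops, PEAK_FLOPS):.0%} of peak")
print(f"Gap:         roughly {speed_gap:.0f}x")
```

On these assumed numbers, the benchmark result overstates what a typical application sees by about a factor of 14, which is why the LINPACK score alone says little about usable performance.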
Note that the software industry has been one of India's strong points. It is becoming increasingly clear that software is the key. For example, Apple's success with the iPhone and iPod has as much to do with well designed software as with the hardware. If you ask me, the big event at Supercomputing'07 was not that Eka placed at No. 4 on the Top500 list. For me, the most exciting event was one that you will not hear about in the media - it has to do with the other part of the HPC Challenge, often called the beauty contest. Instead of asking "which computer can run LINPACK the fastest", it asks, "which programming language implements the benchmarks most elegantly".
The winners of the class II challenge this year were IBM's X10 and Interactive Supercomputing's Python Star-P. For me, the most surprising and coolest part was the revelation that some of the compiler work for X10 was done at IBM's research labs in India. This is cutting edge compiler technology, and the fact that part of the team was based in India is a strong statement about HPC innovation in India.
Not to belittle the effort that went into Eka, but we should be asking the hard questions. I think it is fantastic that we can afford to build a $30 million supercomputer. But how are we going to program it? What applications will run on these large computers? Will we be able to address some important problems such as better meteorological forecasts for our farmers, or better groundwater modeling to solve our long term water problems, or allow our companies to gain that extra edge in the international arena? Are we better off buying computers from those who know how best to make them, and focusing our skill sets on what we are good at - developing software? 
Our universities may indeed not be up to this challenge yet, but perhaps we don't need to wait till they catch up. A hungry young person can learn this game - the material is all online, after all, like these classes at MIT, at UC Berkeley and at UC Santa Barbara.
