
LIVEwire News Briefs:
Microway Announces Diagnostic Software For MPI Clusters
Microway showcased MPI Link-Checker at SC2004. This software is a diagnostic
tool that finds underperforming nodes in MPI-based clusters. Formerly
available as a beta release and free download on Microway's website, the new
commercial version contains enhancements that make it possible to detect
hard-to-find problems in large clusters, as well as to look for bad cables,
motherboards, NICs, switches, BIOSes and OSes in real time. A new data
collection facility makes it possible to probe the cluster offline and then
analyze the collected data at a later date.
Finding a bad node in a large cluster is not a trivial problem. MPI
Link-Checker can collect hundreds of megabytes of performance data on a
cluster over a four-day validation burn-in. Sifting through this information
quickly requires the right tool. The new release simplifies this task. It
makes it possible to drill down into the analysis grids generated by large
clusters, dynamically view plots of transfer time and bandwidth versus packet
size for all the nodes in an analysis matrix, reduce analysis time by breaking
large clusters into groups of nodes, and select the statistical method used to
view the data. This last feature, when combined with offline collection, makes
it possible to isolate intermittent problems that were previously impossible
to find.
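
To illustrate why the choice of statistical method matters for intermittent
faults, here is a minimal Python sketch. It is not Microway's implementation:
the function name, the statistic options, the 0.8 threshold and the sample
bandwidth figures are all assumptions made up for this example. It summarizes
repeated bandwidth measurements per link and flags links that fall well below
the cluster-wide median.

```python
from statistics import mean, median

# Hypothetical statistic choices; the product's actual options are not
# described in this brief.
STATISTICS = {"mean": mean, "median": median, "min": min}


def flag_slow_links(samples, statistic="median", threshold=0.8):
    """Return the links whose summarized bandwidth is below
    threshold * (median of all link summaries).

    samples maps (src_node, dst_node) -> list of bandwidths in MB/s.
    """
    summarize = STATISTICS[statistic]
    summary = {link: summarize(values) for link, values in samples.items()}
    baseline = median(summary.values())
    return sorted(link for link, value in summary.items()
                  if value < threshold * baseline)


# Made-up measurements: link (1, 2) is consistently slow, while link (0, 2)
# is healthy except for a single intermittent drop.
samples = {
    (0, 1): [110.0, 112.0, 111.0, 109.0],
    (0, 2): [111.0, 110.0, 20.0, 112.0],
    (1, 2): [60.0, 61.0, 59.0, 60.0],
    (1, 3): [110.0, 111.0, 112.0, 110.0],
    (2, 3): [109.0, 111.0, 110.0, 112.0],
}

print(flag_slow_links(samples, "median"))  # [(1, 2)]
print(flag_slow_links(samples, "min"))     # [(0, 2), (1, 2)]
```

With the median, the single drop on link (0, 2) is averaged away and only the
consistently slow link is reported. Switching to the minimum exposes the
intermittent link as well, which is the kind of problem that data collected
offline over a long burn-in can reveal.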
"MPI Link-Checker," commented Stephen Fried, president of Microway, "is the
first HPC product that makes it possible to diagnose really hard-to-find
cluster failures, like intermittent cables that are not properly seated, while
at the same time being able to spot problems in MPI itself that are the result
of inefficient device drivers or the wrong choice of parameters, such as the
transition point between the Eager and Rendezvous protocols. MPI Link-Checker
will become an essential tool for all MPI-based Linux clusters."