GOVERNMENT MUST OWN THE PROBLEM OF SUPERCOMPUTING
Commentary from the High-End Crusader
The long-awaited, comprehensive, and tightly argued report "Getting Up to
Speed: The Future of Supercomputing", issued in November by the National
Academies' National Research Council, deserves careful consideration. The
report demonstrates conclusively -- at least to this observer -- that the
government must take primary responsibility for the problem of supercomputing.
This proposal merits the close attention of the high-performance computing
community -- quite apart from the immense scope of what is at stake -- both
because it has the requisite boldness to make a difference to our desperate
plight and because there is sustained opposition to it from three broad
sources: 1) certain retrograde sectors of some federal agencies, notably
within both DOE and DoD, 2) certain computer vendors, who shall remain
nameless, and 3) the administration's Office of Management and Budget (OMB),
which recently emasculated the High-End Computing Revitalization Task Force (HECRTF).
In one sense, the central proposal of "Getting Up to Speed" is an immediate
corollary of the core conviction of what may be called the "progressive camp"
in high-end computing, which apparently held a slim voting majority within the
committee that wrote the report. High-end computing progressives share the
well-founded belief that supercomputing is in deep trouble. Three simple
situations illustrate this trouble: 1) supercomputer architectures stagnate as
PC-based clusters dominate, 2) parallel programming languages stagnate as MPI
reigns supreme, and 3) computational-engineering applications stagnate as
inappropriate platforms and programming difficulties cause industry to "think small".
The committee on the future of supercomputing concluded that strong U.S.
government leadership and bold new government policies are required to meet
obvious national needs for supercomputing given the inevitable technological
consequences of continuing with a status quo in which technology advances are
driven almost exclusively by commercial market forces. The committee noted:
"Several factors have led to the recent reexamination of the rationale for
federal investment in research and development in support of high-performance
computing, including 1) continuing changes in the various component
technologies and their markets, 2) the evolution of the computing market,
particularly the high-end supercomputing segment, 3) experience with several
systems using the clustered processor architecture, and 4) the evolution of
the problems, many of them mission driven, for which supercomputers are used".
The committee's overall recommendation is this: "To meet the current and
future needs of the United States, the government agencies that depend on
supercomputing, together with the U.S. Congress, need to take primary
responsibility for accelerating advances in supercomputing and ensuring that
there are multiple strong domestic suppliers of both hardware and software".
In simple language, the government must own the problem of supercomputing,
just the way, for example, it would need to own the proposed mission to Mars
or the way it currently _does_ own the war in Iraq. Such problems cannot be
left to the private sector.
There are two potential misconceptions here. First, to say that the
government owns, i.e., takes primary responsibility for, the problem of
supercomputing is _not_ to say that the government will fund all the research
and development efforts to advance supercomputing that would not have occurred
in the absence of the new government program. Given the right sort of
government leadership, possibly including tough new legislation, we may
anticipate combined investment by government _and_ industry in funding
supercomputing advances. This is feasible provided the government changes the
vendor incentive space so that computer-industry research and development
decisions are no longer driven quite so exclusively by _current_ commercial market forces.
Second, to say that the government sets research priorities is _not_ to say
that the government will back particular technological solutions, at least not
until their merit has been demonstrated by extensive test and evaluation.
Rather, the difficult task of drawing up a (constantly evolving) roadmap for
the future of supercomputing consists of identifying a (constantly evolving)
set of problems that need to be solved in order for supercomputing to advance,
without any a priori bias as to which technological solutions best solve the
identified problems. The government needs to articulate the major roadblocks
that are holding supercomputing back, not specify the detailed solutions. But
this leadership is not for the faint-hearted; the government will need to
repeatedly redefine the nation's supercomputing research priorities. A
muscular approach is required. Also, this is a _permanent_ activist role for the government.
"Getting Up to Speed" is a surprisingly comprehensive report that carefully
explains why the government must assume primary responsibility for the problem
of supercomputing, and then describes how it might go about doing this. In
this article, we will summarize the main ideas, correcting any mis-statements
that may have been slipped into the report by "retrograde forces". The most
glaring mis-statement is the unsubstantiated assertion, often repeated, that
the evolving supercomputer market will necessarily follow a particular
pessimal path. As a side benefit, correcting this flawed prophecy makes
government ownership of the problem of supercomputing infinitely more palatable.
Do We Understand Application Diversity?
The committee refers to the main problem early in the executive summary: "The
advances in mainstream computing brought about by improved processor
performance have enabled some former supercomputing needs to be addressed by
clusters of commodity processors. Yet important applications, some vital to
our nation's security, require technology that is only available in the most
advanced custom-built systems. We have been remiss in attending to the
conduct of long-term research and development and to the sustenance of the
industrial capabilities that will also be needed".
The familiar idea here is that some applications cannot be computed on even
large-scale configurations of some high-performance computer architectures
(generally speaking, on loosely-coupled systems). After much debate, there is
reasonable agreement nowadays about the existence of two broad classes of
high-performance computer architectures, which your correspondent refers to as
high-bandwidth and low-bandwidth systems, and also about the existence of at
least some important applications that require high-bandwidth systems.
What is totally absent from the report is any rational calculus to determine
the intrinsic relative weights of potential high-bandwidth and low-bandwidth
applications in general-purpose parallel computing. We all agree that any
casual survey of which applications are being run _today_ would show the clear
numerical dominance of low-bandwidth applications. But few people have asked
whether this dominance is something intrinsic to the nature of general-purpose
parallel computing or rather merely an artifact of the limited-capacity (i.e.,
low-bandwidth) machines that -- today at any rate -- dominate our shop floors.
You cannot deduce what a user community would like to compute from what it
happens to compute.
Statements that low-bandwidth systems, as a general rule, will always satisfy
the vast majority of parallel-computing applications are made repeatedly
throughout "Getting Up to Speed" without any discernible attempt at
substantiation. The authors even assume that the current preponderance of low-bandwidth applications among all supercomputer applications will necessarily grow!
In your correspondent's humble opinion, this characterization of the necessary
evolution of the supercomputing market is a dangerous myth. Certainly, it
can't just be taken for granted, as if its truth were manifest.
Supercomputing is in trouble for two reasons. First, a potentially significant fraction of parallel computing (namely, the set of high-bandwidth applications) risks not having the high-bandwidth systems it needs. Second, the current situation -- essentially, the current incentive space for vendors, together with some remarkably stable nonuniform performance-scaling trends -- in which technological advances are driven by _perceptions_ of what the commercial marketplace will reward, seems guaranteed to foreclose any possibility of meaningful innovation in supercomputing technology.
We need to understand the extent to which, in supercomputing, the attitude
"The market doesn't want it; I won't offer it" is a self-fulfilling prophecy.
If you only sell low-bandwidth systems, the users who make up the
supercomputer market will make do -- for a while. Indeed, the report shows
that, because of nonuniform performance scaling, the current situation is not
sustainable. So, assuming the possibility of government leadership, why not
work now to change the supercomputing market? Vendors might be _amazed_ to
learn what a broad still-to-be-educated market truly wants, if only it had a
better understanding of what genuine supercomputing is and/or thought there
was some chance of getting it.
We can easily explain why system 's' does not compute application 'a' to the
satisfaction of user community 'u' (typically, this occurs if 'a' suffers from
_latency disease_ when it runs on 's'). The latency to access local memory
through local interconnect is quite large when measured in processor cycles.
The latency to access global memory through system interconnect is
considerably larger (for decent-size configurations, anyway). Now, it may be
that 'a' cannot be localized on 's' with the result that there is significant
short-range or long-range communication. It may also be that the required
communication cannot be parallelized on 's' with the result that some critical
processing resources lie fallow while waiting for high-latency communication
operations to complete. This is latency disease.
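The arithmetic behind latency disease is worth making concrete. The toy model below is your correspondent's own back-of-the-envelope sketch; every number in it is an illustrative assumption, not a measurement of any particular machine. It estimates delivered efficiency when a processor that does only a couple of cycles of arithmetic per operand fetched can keep at most a fixed number of memory references in flight:

    # Back-of-the-envelope model of latency disease: unhidden memory latency
    # idles the functional units.  All parameters are illustrative assumptions.

    def efficiency(latency_cycles, work_cycles_per_ref, max_outstanding_refs):
        """Fraction of peak achieved when each reference costs latency_cycles
        and at most max_outstanding_refs references can overlap."""
        exposed_latency = latency_cycles / max_outstanding_refs
        return work_cycles_per_ref / (work_cycles_per_ref + exposed_latency)

    # A low-locality sweep: ~2 cycles of useful arithmetic per operand fetched.
    for refs in (1, 8, 64, 512):
        local = efficiency(200, 2, refs)    # assumed local-memory latency (cycles)
        remote = efficiency(2000, 2, refs)  # assumed global-memory latency (cycles)
        print(f"{refs:4d} outstanding refs: local {local:5.1%}, remote {remote:5.1%}")

With one outstanding reference the processor delivers roughly one percent of peak against local memory and a tenth of a percent against remote memory; only hundreds of concurrent references bring the local case back above eighty percent, and that concurrency is precisely the mechanism most commodity processors lack.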
Like communication, synchronization is another source of latency disease. This
is obvious. Efficient parallel computing requires that large numbers of
parallel activities can share data well, i.e., cheaply, and can also
synchronize well. (No machine-wide barrier synchronizations, please!). Given
sufficient task variability, load balancing can be another significant issue.
Bandwidth is the starting point for solving any of these problems.
Moreover, it may be that it is hard to program application 'a' on system 's'
with the result that considerable time is spent getting 'a' up and running.
Fragmented memory, i.e., the programming model used in MPI message passing, is
the commonest cause of programming-difficulty disease. Since _time to
solution_ is the sum of programming time and execution time, it may be that
the utility function of user community 'u' assigns little value to a solution
obtained after such a long time. It may even be that 'a' cannot be computed
at all on 's' (i.e., the utility function assigns a value of zero to the
solution). The last two are examples of time-to-solution disease.
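To make the time-to-solution point concrete, here is a trivial sketch; the two systems and all of the week counts are hypothetical, invented only to show how programming time can dominate the sum:

    # Time to solution = programming time + execution time.
    # Both systems and all week counts below are hypothetical.

    systems = {
        "low-bandwidth cluster, fragmented-memory (MPI) programming": (26.0, 8.0),
        "high-bandwidth system, global-address-space programming":    (6.0, 2.0),
    }

    for name, (t_program, t_execute) in systems.items():
        total = t_program + t_execute
        print(f"{name}: {total:.1f} weeks to solution")

A user community whose utility function decays over months may assign little or no value to the first solution, however cheap the cluster was to procure.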
Do Commodity Processors Have A Special Character?
Commodity processors, by definition, are designed for a broad market and are
manufactured in large numbers. At present, because of a particular reading of
the potential commercial market, commodity processors are optimized for
applications that exhibit significant spatial and temporal locality. As a
result, commodity processors have no good mechanisms for increasing the rate
at which operands can be transferred between the processor and either the
local or the global memory. Commodity processors have been optimized to work
well when locality does away with the need for such communication.
Similarly, at present, because of a historical reluctance to rely on locality
for performance, custom processors are optimized for applications where there
is significant local or global communication. Consider communication to local
memory through local interconnect. A system built from custom processors will
provide high bandwidth to local memory -- this is essential -- and will also
provide some parallelism mechanism that sustains high memory-reference
concurrency (many outstanding loads in every cycle) in the face of different
memory-access patterns. Concurrency is necessary to turn potential operand
bandwidth into actual operand bandwidth. Whether this concurrency is provided
by vector processors or multithreaded processors or streaming processors (or
something entirely new) is secondary.
Given an application's need for significant local communication, we must have
both high hardware bandwidth to local memory and high memory-reference
concurrency to local memory to sustain that hardware bandwidth. A system is
_locally balanced_ if it can sustain its local hardware bandwidth.
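Little's Law makes the balance requirement precise: to sustain B words per second of hardware bandwidth at a latency of L seconds, roughly B x L word references must be in flight at all times. A minimal sketch, using round-number assumptions rather than the specifications of any real system:

    # Little's Law for memory systems: concurrency = bandwidth * latency.
    # The bandwidth and latency figures are round-number assumptions.

    def words_in_flight(bandwidth_bytes_per_s, latency_s, word_bytes=8):
        """Word references that must be outstanding to sustain the bandwidth."""
        return bandwidth_bytes_per_s * latency_s / word_bytes

    local_refs  = words_in_flight(50e9, 100e-9)   # assumed local memory: 50 GB/s, 100 ns
    global_refs = words_in_flight(10e9, 2e-6)     # assumed global memory: 10 GB/s, 2 us
    print(f"local memory:  ~{local_refs:.0f} words must be in flight")
    print(f"global memory: ~{global_refs:.0f} words must be in flight")

On these assumed numbers, a locally balanced system needs hundreds of outstanding references per processor, from some combination of vectors, threads, or streams; a globally balanced one needs thousands.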
Now, consider communication to global memory through system interconnect.
Although the abstract performance problems of local and global communication
are identical, in practice the differences can be significant. First, we
would like to provide high bandwidth to global memory. This is certainly
possible in principle. High-speed electrical- and optical-signaling
technologies enable high raw bandwidth to be provided at a reasonable cost.
High-radix routers enable tens of thousands of nodes to be connected with just
a few hops. However, the cost and power of providing bandwidth is decreasing
more slowly than the cost and power of providing logic, with the result that
the total system cost and power budgets of a large high-bandwidth system may
easily be dominated by the cost and power of the system interconnect.
Also, the report claims that "it is prohibitively expensive to provide flat [network/] memory bandwidth across a supercomputer", and concludes that severe bandwidth taper must therefore be accepted as inevitable. This assertion
may be criticized as needing _several_ qualifications (e.g., with respect to
the targeted performance regime), but there can be no question that providing
affordable scalable _reasonably uniform_ exceptional global bandwidth in the
system interconnect is a formidable engineering challenge. It is likely that
this problem will always be with us as we move to larger systems and higher
performance regimes over time.
Second, given exceptional global bandwidth in the system interconnect, how can
it be sustained? Consider a large parallel system built using vector
processors. If the system has a global (possibly distributed) shared memory
(i.e., if we are dealing with a true vector multiprocessor such as the Cray
X1), then vector loads can provide the memory-reference concurrency to help
tolerate the latency of global communication. However, if there are vector
SMP nodes in the system that perform global (i.e., inter-node) communication
using MPI (i.e., if we are dealing with a vector multiprocessor multicomputer
such as the Earth Simulator), then the only appreciable memory-reference
concurrency comes from really large messages, if the application allows them.
Any application with significant global communication and small messages
running on such a system will suffer from latency disease.
In the same way, a scalar uniprocessor or multiprocessor multicomputer (i.e.,
a parallel system that performs global communication with MPI) -- even one with
exceptional global bandwidth -- still has some of the performance properties of
a classical MPP. For decent performance, we must still both localize the
computation as much as possible and keep inter-node messages either rare or
large. People seem to forget what made the Cray T3E special: it had decent
bandwidth (for its day) _and_ it had external logic (the E-registers) that
allowed the Alpha processor to have a reasonable number of outstanding loads
(compared to a conventional scalar processor).
Given an application's need for significant global communication, we must have
both high hardware bandwidth to global memory and high memory-reference
concurrency to global memory to sustain that hardware bandwidth. A system is
_globally balanced_ if it can sustain its global hardware bandwidth.
Processors differ in having or not having scalable latency-hiding mechanisms
to sustain reasonable performance on nonlocal applications. This doesn't
concern you if all your applications are completely local. But suppose the
supercomputing market, as a result of government intervention, evolves in a
positive direction, and processors with latency-hiding mechanisms become
attractive to a broader market and are manufactured in larger numbers. Broad
appeal and high volume are what make something a "commodity" component.
Logically speaking, the architectural feature of a processor's being able to
sustain high memory bandwidth is _orthogonal_ to whether that processor has
broad appeal and high volume. Assuming a necessary distinction between the
core execution models of custom and commodity processors makes sense only if
you have rigid (and pessimistic) views of how the supercomputing market must
necessarily evolve, even in the best-case scenario where the government
assumes primary responsibility for the problem of supercomputing.
Your correspondent suggests that a sustainable future for supercomputing will
be further enabled by vigorous efforts to break down the distinctions between
market-driven, low-bandwidth commercial supercomputing and government-driven,
high-bandwidth national-security supercomputing -- between what today we call
commodity and custom supercomputing. (Both scientific and industrial
supercomputing have a broad range of requirements spanning these two
extremes). In short, we seek a diverse supercomputing market, with a balanced
mix of low-bandwidth and high-bandwidth applications in _each_ community,
across the broadest possible range of user communities.
This goal is obtainable for two reasons: 1) forceful government action can
change the supercomputing market for the better, and 2) many scientific and
industrial (and even some traditional commercial) customers will find it
increasingly difficult to meet their supercomputing needs with conventional
systems. We can see this by projecting current technology trends. In short, the intense pain felt by the national-security community today will become much more widely shared as nonuniform technology scaling makes today's locality -- which will not scale as hoped -- an untrustworthy source of tomorrow's needed performance.
To repeat, the dichotomy between market-driven commercial computing and
government-driven national-security computing is not set in stone, but rather
is a function of the value placed on high-bandwidth computing by the broader
supercomputer market. This valuation itself is not set in stone or governed
by necessary rules, but rather can be modified over time by appropriate
government intervention, possibly including legislation that constrains what
computer vendors must do. Moreover, the current addiction to locality as the
only source of performance will be severely tested -- as we move forward -- by
nonuniform technology scaling, even by users who are complacent today ("If it
ain't broke, don't fix it").
What Do Current Trends Portend?
Most trends in high-performance computing are consequences of nonuniform
performance scaling among various components. The NRC report notes: "In
particular, the arithmetic performance increases much faster than the local
[or] global bandwidth of the system". Both local and global latency, "when
expressed in terms of the instructions [that could be] executed in the time it
takes to communicate to local [or global] memory", are increasing rapidly.
Nonuniform scaling of technology "poses a number of challenges for
supercomputer architecture, particularly for those applications that demand
high local or global bandwidth". In particular, what is tolerable today may
not be tolerable tomorrow. "For example, if processor speed increases but
[system] interconnect is not improved, then global communication may become a
bottleneck. At some point, parametric [i.e., just letting technology scaling happen] evolution breaks down and qualitative changes to hardware and
software are needed".
The trends are stark. "The divergence of memory speeds and computation speeds
... will ultimately force an innovation in architecture. By 2010, 170 loads
will need to be in flight at the same time to keep [local-]memory bandwidth
busy while waiting for memory latency, and 1,600 floating-point arithmetic
operations can be performed during this time. By 2020, 780 loads must be in
flight, and 94,000 arithmetic operations can be performed while waiting on
memory. These numbers are not sustainable". Indeed, "it is clear that
systems derived using simple parametric evolution are already greatly strained
and will break down completely by 2020". Your correspondent would modify one
quotation to read: "Changes in 1) processor and system architectures, and in
2) programming languages and systems, are required to hide large amounts of
latency with parallelism and _also_ to enhance the locality of computations".
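The report's figures are simply the product of projected bandwidth, latency, and arithmetic rate, and the calculation is easy to reproduce in form. The inputs below are your correspondent's illustrative assumptions, not the committee's actual projections; they are chosen only so that the shape of the arithmetic is visible:

    # Loads in flight = memory bandwidth (words/s) * memory latency;
    # flops during latency = arithmetic rate * memory latency.
    # All inputs are illustrative assumptions, not the committee's data.

    projections = [
        # (year, words/s to local memory, latency in seconds, flop/s)
        (2010, 2.0e9,  85e-9, 20e9),
        (2020, 6.0e9, 130e-9, 700e9),
    ]

    for year, bw_words, latency, flop_rate in projections:
        loads = bw_words * latency
        flops = flop_rate * latency
        print(f"{year}: ~{loads:.0f} loads in flight, ~{flops:,.0f} flops during one latency")

Only the two products matter: whatever the exact inputs, both grow relentlessly, and neither caches nor a handful of outstanding misses can absorb growth of that magnitude.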
But if the market drivers of technological innovation continue as they are, are commodity processors and interconnects likely to make great strides in either latency-hiding or locality-enhancing mechanisms? (Nota bene:
architectures underlie languages). If not, an even smaller fraction of all
scientific applications -- compared to the fraction that "gets by" today --
will find systems optimized for low-bandwidth commercial applications suited
to their needs. As the processor-memory performance gap scales, you need to
scale proportionally either the amount of exploitable locality or the ability
to tolerate latency. In the general case, i.e., for anything other than an embarrassingly localizable application, you need to do both. A general-purpose
parallel computer must be able to abstract performance from an appropriate mix
of both parallelism and locality to deal with applications of different types.
Scaling locality is the tougher problem. Since we need all the performance
help we can get, design of new parallel programming languages that allow
programmer specification of locality -- without falling into the MPI trap of
forcing the programmer to specify everything -- is required. But the
fundamental optimization decision underlying high-bandwidth systems is still
correct: You should make latency tolerance your performance workhorse and then
exploit whatever locality you can get your hands on.
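A minimal sketch of that optimization posture, written in mpi4py only because message passing is the lingua franca we are stuck with today (the principle is independent of the programming model, and the buffer sizes and "interior work" below are placeholders): issue the high-latency communication first, spend the latency on whatever already-localized work exists, and wait only at the end.

    # Latency tolerance by overlap: start communication, compute, then wait.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    left, right = (rank - 1) % size, (rank + 1) % size

    halo_out = np.full(4096, float(rank))
    halo_in  = np.empty(4096)

    # 1. Put the high-latency global communication in flight.
    requests = [comm.Isend(halo_out, dest=right), comm.Irecv(halo_in, source=left)]

    # 2. Hide the latency under local (already-localized) work.
    interior_result = np.sin(halo_out).sum()

    # 3. Only now pay for whatever latency was not hidden.
    MPI.Request.Waitall(requests)
    boundary_result = halo_in.sum()

Exactly the same pattern, with hardware rather than library support, is what vector loads, multithreading, and E-register-style mechanisms provide automatically.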
What The Government Must Do
Observing evolutionary trends that make the status quo unsustainable, the
committee wants the government to _force_ innovation so that scaling can
continue. "The growing gap between processor performance and global bandwidth
and latency is also expected to [require] innovation. By 2010, global
bandwidth will fall to 0.008 words/flop and latency will require 8,700 flops
to cover. These numbers are problematic for all but the most local of
applications. To overcome this global communication gap requires innovation
in architecture to provide more bandwidth and lower latency and in programming
[languages and] systems, and applications, to improve locality".
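The quoted global figures decode the same way. The per-processor arithmetic rate, global bandwidth, and global latency below are illustrative assumptions, chosen only to show what 0.008 words/flop and 8,700 flops of exposed latency amount to:

    # Decoding the projected global communication gap (illustrative inputs only).
    proc_flops_per_s  = 10e9    # assumed per-processor arithmetic rate
    global_bw_words_s = 80e6    # assumed per-processor global bandwidth, words/s
    global_latency_s  = 870e-9  # assumed end-to-end global latency

    words_per_flop = global_bw_words_s / proc_flops_per_s   # ~0.008
    flops_to_cover = proc_flops_per_s * global_latency_s    # ~8,700
    print(f"{words_per_flop:.3f} words/flop; "
          f"{flops_to_cover:,.0f} flops to cover one global access")

At that ratio, any application that needs even one remote word per hundred floating-point operations is already asking for more than the projected bandwidth can supply, latency aside.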
"Significant investments in both basic and applied research are needed now to
lay the groundwork for the innovations that will be required over the next 15
years to ensure the viability of high-end systems". Even low-end systems
"will eventually run out of steam without such investments".
"Given that leadership in supercomputing is essential to the government, that
supercomputing is expensive, and that market forces alone will not drive
progress in supercomputing-directed technologies, it is the role of government
to ensure that supercomputing appropriate to our needs is available both now
and in the future. That entails both having the necessary activities in place
in an ongoing fashion and providing the funding to support those activities".
"Progress in supercomputing depends critically on a sustained investment by
the government in basic research, in prototype development, in procurement,
and in ensuring the economic viability of suppliers".
All this goes without saying. The only thing that requires further thought is
how to implement a _supercomputing roadmap_. We quote the pertinent
recommendation: "The government agencies responsible for supercomputing should
underwrite a community effort to develop and maintain a roadmap that
identifies [the] key obstacles and synergies in all of supercomputing".
However, when the committee speculates on some possible outcomes of the
roadmap process, their thinking is dreadfully conventional. A clear roadmap
is required to anchor any integrated plan for federal investment. It must
identify the roadblocks rather than the solutions. Surprisingly, the
committee's roadmap speculations come dangerously close to specifying the solutions.
This is nonsense. The critical first sentence of the roadmap should read: "In
order to level the playing field, we hereby declare our interest in any
innovative technological solution to any of the following fundamental
problems, which we have identified as major roadblocks that threaten the
continued viability of supercomputing". For example, we are obligated to
articulate the basic communication and synchronization problems that must be
solved for supercomputing to advance. However, we must _not_ specify in the
roadmap any of the solutions, such as: "We probably need full-custom systems
for these applications". That may be true but it is not the purpose of a
roadmap. Rather, we identify and prioritize fundamental problems,
requirements, and objectives, all the while keeping an open mind as to what
the solutions might be. DARPA's HPCS program has been very good at doing just this.
A supercomputing roadmap for 2005 might include the following problems: How do
we increase local and global bandwidth? How do we increase processor and
system parallelism? How do we increase processor and system locality? How do
we design parallelism and locality mechanisms that do not interfere with one
another? How do we program our high-performance machines? What programming
model is required to increase programmer productivity? What is an integrated
solution to both latency disease -- which includes locality enhancement as a
subproblem -- and programming-difficulty disease? Of course, as the name
implies, a roadmap is more than just a list of problems. (See the full NRC
report for details).
When DARPA was funding basic research in parallel computing in the 1980s and
1990s, it certainly set research priorities (and dismissed some lines of
investigation as fruitless), but it also leveled the playing field in that it
allowed many different ideas to compete. We have to get DARPA back into the
game of supporting basic research in supercomputing. The HPCS program is
excellent, but its goal is not to provide broad support for basic research in
supercomputing. Of course, the current HPCS teams are engaged in at least
some research activity. Still, your correspondent wishes that _more_ basic
research could be funded within the confines of the HPCS program; it would
give us something now.
But we also need something like HECRTF's Joint Program Office -- to use the
original IHEC term -- to assume primary responsibility for developing and
maintaining the supercomputing roadmap. This may be a hard task but without a
set of fundamental supercomputing problems agreed upon by all the relevant
federal agencies, there can be no integrated federal investment plan to solve
them. The difficulty today is that, when people focus on supercomputing at
all, they address problems that are without interest -- for the most part.
The committee is basically correct in their recommendation that the government
must assume primary responsibility for the problem of supercomputing. Their
task in writing this report was essentially to marshal irrefutable arguments
to rebut those groups who -- either through regrettable ignorance or through
masked commercial interest -- deny that supercomputing is in deep trouble.
Breaking the impasse in this _ideological war_ is a key prerequisite for
getting both government leadership and sustained federal funding.
Reduced to its simplest terms, the committee's argument runs as follows: 1)
The government has an irreducible core of high-bandwidth national-security
applications that it must protect. 2) While the majority of computer vendors
offer low-bandwidth, locality-dependent supercomputers, only a few offer high-
bandwidth, parallelism-dependent supercomputers -- because current commercial
market forces drive most computer vendors' decisions _not_ to invest in
supercomputing-directed research and development. 3) Basic and applied
research in supercomputer-enabling technologies is at a historic low; this
research shortfall puts the long-term viability of both high-end and low-end
parallel systems at risk. High-bandwidth systems are (currently) at greatest
risk because the (current) market for such systems is too small. 4) Simple
extrapolation of the observed nonuniform performance scaling across distinct
component technologies demonstrates that the status quo is not sustainable:
absent significant innovations in processor architecture, system-interconnect
component technologies and topologies, system software, programming languages,
etc., etc., "parametric evolution of [conventional computer systems] is
unsustainable, and current machines have already moved into a problematic
region of the design space".
What needs to be added to the committee's report? Basically, the argument
from nonuniform performance scaling contradicts the committee's prophecy that
low-bandwidth applications will inevitably dominate the supercomputing market,
now and forever. Consider that national-security computing is similar to
emergency surgery while scientific and industrial computing is similar to
elective surgery. Elective supercomputing means that most user communities
downsize their computing goals and objectives to match the systems they can
afford to procure and program.
Now, consider a scientific application that is just "getting by". That is to
say, by careful programming, sufficient spatial and temporal locality has been
obtained so that the absence or weakness of latency-hiding mechanisms does not
cause the application to suffer overmuch from latency disease.
But let the so-called "processor-memory performance gap" (where is the
interconnect in this phrase?) grow sufficiently and suddenly the scientific
application is no longer getting by at all. In fact, either scaling of the
application or scaling of the technology tends to increase the relative
performance dependence on parallelism rather than on locality (because, in
general, heroic locality scaling is impossible).
What this means is that, over time, an application that was (barely) suited to
a low-bandwidth, locality-dependent system becomes _more suited_ to a high-
bandwidth, parallelism-dependent system (or at least to a machine that has
more of the attributes of such a system).
Your correspondent sees the possibility of a virtuous circle here. Government
leadership can ensure the availability of high-bandwidth machines, simply by
ensuring that sufficient research and development is done on their required
component technologies. This makes it possible in principle to increase the
supply of high-bandwidth systems. Because of nonuniform performance scaling,
low-bandwidth applications inevitably change over time to take on more of a
high-bandwidth character. Any increased supply of high-bandwidth machines
would cause at least some user communities to dust off their postponed
(possibly unwritten) high-bandwidth applications, which they had never dreamed
of being able to run. The rational economics here is not price/performance
but rather total cost of ownership measured against the increased utility of
faster solutions (or solutions obtained for the first time).
Low-bandwidth systems evolve to take on more of a high-bandwidth character as
the supercomputer market shifts. Finally, even some traditional commercial
user communities imagine and exploit high-bandwidth applications to add value
to their companies. What began as a federal initiative, with planning,
funding, and legislation, gains traction in the private sector as more and
more user communities see definite value in high-bandwidth systems.
Supercomputing for the few has become supercomputing for the many.
There is a third potential misconception here. If supercomputing isn't over
today, it won't be over in 15 years, or after _any_ finite interval of time.
Supercomputer design, indeed, all of computer architecture, is a never-ending
story, not of incremental improvements but of major innovations. There is no
point in achieving healthy supercomputer diversity -- in contrast to the
uniformly stagnant status quo -- by precious government intervention only to
sink back into a new unity (i.e., a new complacency). But what might easily
happen is that today's exceptional supercomputing becomes tomorrow's ordinary
supercomputing. It is foolish to prophesy that high-parallelism processors
will never move into mid-range systems (or even low-end systems). However, if
the distinction between (current) high-bandwidth systems and (current) low-
bandwidth systems begins to soften -- if this means anything more than raising the bar for what constitutes a high-bandwidth system -- then we will need to reinvent this distinction (or some other).
Enough of this idealism. Take home the minimal message: if the government
takes charge of the problem of supercomputing, it could do good things.
The High-End Crusader, a noted expert in high-performance computing and
communications, shall remain anonymous. He alone bears responsibility for
these commentaries. Replies are welcome and may be sent to HPCwire editor Tim
Curns at email@example.com.