
Features:
HEC ARTICLE OFFERS MORE QUESTIONS THAN ANSWERS
LETTER TO THE EDITOR
A reader writes in to ask the High-End Crusader some questions regarding the
article in the October 08 issue of HPCwire. (108516.html)
Dear HEC,
1) I am intrigued by your three-dimensional locality phase space with
dimensions of reference distance, temporal locality, and spatial locality. I'm
not certain I understand this taxonomy; is the first dimension dealing with
the distinction between operands being resident in local versus non-local
memory, cache versus non-cache memory, or some combination of these?
I'm not sure how to voice my objection to your dimensions. Maybe it's because,
as you say, they are not orthogonal. Perhaps an orthogonal set could be
arrived at (there's probably little hope for a minimum spanning set...). Maybe
it's because they seem to encourage the idea of (re)structuring programs and
arbitrarily positioning them in the phase space. I opine that given an
arbitrary amount of effort, anyone could get a program to run arbitrarily
close to peak on any architecture, regardless of the inherent characteristics
of the problem solved by the program. I don't believe this is what we should
be trying to achieve. It seems better to use metrics or categorizations that
are more absolute in some sense - perhaps more permanently definitive of a
program's or algorithm's "location" in some space.
2) I once heard of a different three-dimensional characterization of parallel
programs by way of their access patterns, using the dimensions (a) local
versus global, (b) regular versus irregular, and (c) static versus dynamic.
While they, too, are dependent on decomposition, they seem to be closer to
orthogonal than yours, at least to my weak understanding. Would you care to
compare or contrast these with yours?
3) There is a social or psychological tendency among the sorts of people that
become expert programmers to dwell on details, and this can lead them to never
become satisfied with a program. This is the categorical two-edged sword of
our field. There's always something that can be done to "improve" code or code
performance. If our tools (computers, languages) or our problems (programs,
algorithms) are complex, we can achieve deep expertise in them, and derive
great satisfaction from striving for optimization within the complex spaces
they inhabit. But this can easily become irrational. Why spend ten years
optimizing a program for an architecture if the architecture will become
obsolete in three and be replaced in six, if the program itself will be
rewritten to admit new science in eight, or if the program's behavior differs
so widely across the space of its data sets that performance improvement for
one problem degrades performance for another? It's great to have experts
handy, ready to jump on hard problems and solve them effectively, but it makes
sense to build better tools to apply to these solutions.
4) Whatever set of dimensions seems best, it would be good to have a set of
programs that exercise them, separately and in combination. This shouldn't
be too hard for totally synthetic benchmarks, especially if all we're looking
at is memory and interconnect behavior. Certainly, we should construct them
relevant to the dimensions we wish to measure, rather than rely on arguably
irrelevant ones merely because they already exist. I'm referring here to
something more "real" than LINPACK or the Livermore Loops (but more in the
spirit of the latter than the former), and less "real" than the NPB suite. In
other words, not real problems solved by real programs (because there's always
a way to tweak a program to solve its problem better on a given computer), but
rather something simpler that exposes and measures architectural
characteristics by easily and intuitively understandable code behaviors. I see
such dimensional benchmarks as being easier to debate and reason formally
about, and easier to achieve consensus agreement about.
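As a minimal sketch of what one such dimensional benchmark might look like,
assuming a single array walked either with a fixed stride (regular) or in a
shuffled order (irregular): the names make_indices, mean_jump, and timed_sum
are hypothetical, chosen only for this illustration, and the timing in an
interpreted language mostly reflects interpreter overhead, so a serious version
would likely be written in C or Fortran.

```python
import random
import time


def make_indices(n, stride=1, irregular=False, seed=0):
    """Return an order in which to visit the n slots of an array.

    Regular mode steps through the array with a fixed stride, wrapping
    around; every slot is visited exactly once when gcd(stride, n) == 1.
    Irregular mode visits every slot once in a seeded pseudo-random order.
    """
    if irregular:
        order = list(range(n))
        random.Random(seed).shuffle(order)
        return order
    return [(i * stride) % n for i in range(n)]


def mean_jump(indices):
    """Average absolute distance between consecutive accesses.

    A crude proxy for spatial locality: small values mean neighboring
    accesses, large values mean accesses scattered across the array.
    """
    pairs = zip(indices, indices[1:])
    return sum(abs(b - a) for a, b in pairs) / (len(indices) - 1)


def timed_sum(data, indices):
    """Sum data in the given visiting order; return (total, seconds)."""
    start = time.perf_counter()
    total = 0
    for i in indices:
        total += data[i]
    return total, time.perf_counter() - start


if __name__ == "__main__":
    n = 1 << 16
    data = [1] * n
    patterns = [
        ("unit stride", make_indices(n)),
        ("stride 4097", make_indices(n, stride=4097)),
        ("irregular", make_indices(n, irregular=True)),
    ]
    for label, idx in patterns:
        total, seconds = timed_sum(data, idx)
        print(label, "mean jump:", round(mean_jump(idx), 1),
              "seconds:", round(seconds, 4))
```

Each pattern performs the same arithmetic on the same data, so any difference
in measured time is attributable to the access order alone, which is the kind
of isolated, easily explained behavior the benchmarks described above are
meant to expose.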
Thank you for your efforts to spark discourse on these important matters.
T.M. DeBoni
Oakland, CA