
Features:
MAXIMIZING PERFORMANCE WITH INTER-PROCEDURAL OPTIMIZATION
by Dr. Fred Chow, Director of Compiler Engineering, PathScale
Developers are constantly seeking new ways to improve application performance.
While architecture and algorithms are important factors, development tools can
also have a significant impact on application performance. Traditional
compilers compile each program source file separately, so they must make
worst-case assumptions about code in other files, which rules out many
optimizations and hurts application performance. Developers need more
sophisticated tools that can use information about the whole program to apply
analysis and optimization techniques more effectively and create
high-performance applications.
The PathScale EKO Compiler Suite is a family of compilers for the AMD64
processor family that provides superior application performance for AMD
Opteron-based systems. To generate fast code, it exploits advanced processor
features, including complex addressing modes, the large register set,
efficient parameter passing, and SSE2 instructions.
The PathScale compilers provide 100 percent binary compatibility with the GNU
compilers, so libraries and object files compiled with GNU and PathScale
compilers can be mixed and linked together. The C and C++ front ends are source
compatible with the GNU compiler suite, while the Fortran 95 compiler supports
the most common Cray/SGI extensions. In addition, the PathScale EKO Compiler
Suite is available in installable Linux RPM format, and is verified on SuSE,
Red Hat, and Fedora Linux distributions.
Key features of the PathScale EKO Compiler Suite include:
- C, C++, and Fortran 77/90/95 compilers.
- Industry-leading optimizations.
- Complete support for 32-bit and 64-bit compilation.
- Code generation for the AMD64 ABI and AMD Opteron.
- Compatibility with GNU/gcc tool chain and debuggers.
- Tested with popular third-party debuggers.
Inter-Procedural Analysis
Software applications typically consist of multiple source files that are
compiled separately and linked together to create an executable program. This
traditional approach, known as separate compilation, limits what most
commercial compilers can do: because only part of the program is visible
during compilation, the compiler must make worst-case assumptions whenever code
accesses external data or calls external functions. The PathScale EKO
compilers perform whole-program optimization instead. By collecting
information over the entire program, they can make accurate decisions about
where and how to apply optimizations, and about whether each optimization is
applicable and safe.
When inter-procedural analysis is enabled, the PathScale EKO compilers link
all source files together early in the compilation process, before most
optimization and code generation is performed. Working on intermediate
representation files, the compilers perform inter-procedural analysis on the
entire program, invoke the back end to optimize and generate object code, and
finally invoke the standard linker to produce the final executable.
Analysis and Optimization
Inter-procedural analysis proceeds in two stages: analysis and optimization. In
the analysis stage, information about the whole program is collected at
several levels. In the optimization stage, that information is used to decide
how best to transform the program to achieve the best possible performance.
The first step in analyzing a program is understanding how its functions
relate to one another. To do this, the PathScale compilers first construct a
program call graph: a representation of all functions and their caller/callee
relationships. Using the call graph, the compiler determines, for each function
call, which program variables may be modified or referenced by that call,
including through any functions it calls in turn. Alias information is
generated for every variable whose address is taken, which lets pointer
accesses be optimized more aggressively and reduces their cost at run time.
All variable information gathered during the analysis stage is stored so that
the back end of the compiler can use it in further optimization steps.
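The "modified by a call" analysis can be illustrated with a small sketch. The C fragment below is a toy model under stated assumptions, not the PathScale implementation: the `FuncInfo` structure, the one-bit-per-global encoding, and `compute_mod_sets` are hypothetical names chosen for illustration. Each function records which globals it writes directly and which functions it calls; the sets are then propagated up the call graph until nothing changes, so recursive call cycles are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: up to 8 functions, up to 32 globals. */
#define MAX_FUNCS   8
#define MAX_CALLEES 4

typedef struct {
    uint32_t direct_mod;        /* bit g set: function writes global g itself */
    int callees[MAX_CALLEES];   /* indices of functions it calls */
    int ncallees;
} FuncInfo;

/* For every function, compute the globals it may modify directly or through
 * any chain of calls. Iterating to a fixed point handles recursion. */
void compute_mod_sets(const FuncInfo *f, int nfuncs, uint32_t *mod)
{
    assert(nfuncs <= MAX_FUNCS);
    for (int i = 0; i < nfuncs; i++)
        mod[i] = f[i].direct_mod;

    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < nfuncs; i++) {
            uint32_t acc = mod[i];
            for (int k = 0; k < f[i].ncallees; k++) {
                int c = f[i].callees[k];
                assert(c >= 0 && c < nfuncs);
                acc |= mod[c];   /* a call may modify whatever the callee does */
            }
            if (acc != mod[i]) {
                mod[i] = acc;
                changed = true;
            }
        }
    }
}
```

A "referenced" set can be computed the same way. With these sets, a compiler knows that a call whose set excludes a given global cannot change it, so that global's value may stay in a register across the call.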
Once the application source is analyzed, several types of optimizations are
performed automatically by the PathScale compilers:
Inlining. Perhaps the most important inter-procedural optimization, inlining
replaces calls to a function with the body of the function. Function call
overhead is eliminated, and the back-end phases of the compiler can work on
larger sections of code, which may enable optimizations that were impossible
when less code was visible. For example, inlining may form a loop nest that
enables aggressive loop transformations. The PathScale compilers inline with
great care to avoid degrading performance, because larger functions and a
larger program can increase instruction cache misses, raise register pressure
and cause more spills to memory, and slow down later stages of the compilation
process. Because the PathScale compilers take a whole-program approach, they
can inline any function into any other function, even when the two are in
different source files. As a result, they can inline more aggressively than
traditional compilers.
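The effect of cross-file inlining can be sketched in C. The names below are hypothetical: `scale_add` stands for a small function that would normally live in a different source file from its caller, and `axpy_inlined` shows, written by hand, roughly what the caller becomes once the compiler substitutes the callee's body.

```c
#include <assert.h>

/* Conceptually defined in another source file; under separate compilation
 * the caller below would see only its prototype. */
double scale_add(double a, double x, double y)
{
    return a * x + y;
}

/* Before inlining: one function call per loop iteration. */
void axpy_call(int n, double a, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = scale_add(a, x[i], y[i]);
}

/* After inlining: the call is replaced by the callee's body, leaving a
 * straight-line loop body that later loop optimizations can work on. */
void axpy_inlined(int n, double a, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

With the call gone, the loop is the kind of simple code that transformations such as unrolling or SSE2 vectorization can target.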
Constant propagation and function cloning. Many function calls pass
constants, including variable addresses, as parameters. Replacing a formal
parameter with a known constant value creates opportunities for optimization;
for example, portions of a function often become unreachable and can be
deleted as dead code. The PathScale compilers exploit constants in two ways.
First, if all calls to a function pass the same constant for a parameter,
constant propagation substitutes that constant into the function itself,
without increasing program size. Second, if different calls pass different
constants, function cloning creates specialized copies of the function, each
customized for one set of constant parameters.
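The following sketch shows function cloning on a hypothetical function `transform` whose `mode` parameter is a constant at every call site. The clones and the rewritten callers are written by hand here to show the kind of code a compiler could generate; all names are illustrative.

```c
#include <assert.h>

/* Original function: 'mode' selects the operation at run time. */
int transform(int v, int mode)
{
    if (mode == 0)
        return v + 1;
    else
        return v * 2;
}

/* One clone per constant seen at call sites: the test on 'mode' folds
 * away and the unreachable branch is deleted as dead code. */
static int transform_mode0(int v) { return v + 1; }
static int transform_mode1(int v) { return v * 2; }

/* Call sites before cloning ... */
int caller_a(int v) { return transform(v, 0); }
int caller_b(int v) { return transform(v, 1); }

/* ... and after, with each call redirected to its matching clone. */
int caller_a_cloned(int v) { return transform_mode0(v); }
int caller_b_cloned(int v) { return transform_mode1(v); }
```

If every call site passed the same constant (say, only `mode == 0`), no clone would be needed: the constant could be propagated into `transform` itself and the unused branch removed, which is the first case described above.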
Dead variable elimination. Dead variable elimination removes global variables
whose values are never used, along with the code that updates them, saving
both memory and execution time.
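A minimal illustration uses a hypothetical global `elems_seen` that is written but never read. Because it has external linkage, a compiler that sees only this file must assume some other file reads it; only whole-program analysis can prove that the variable and its updates are useless.

```c
#include <assert.h>

/* External linkage: invisible to separate compilation whether any other
 * file reads this variable. Assume (hypothetically) that none does. */
long elems_seen;

/* Before: every iteration also updates the dead global. */
long sum_before(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++) {
        s += a[i];
        elems_seen++;   /* dead store once the whole program is known */
    }
    return s;
}

/* After: the variable and the code that updated it are both gone. */
long sum_after(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}
```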
Dead function elimination. Dead functions (functions that are never called)
can result from inlining and cloning, or from continual program modification
during development. Dead function elimination removes such functions, reducing
code size as well as memory and cache consumption.
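Finding dead functions amounts to a reachability search over the call graph. The sketch below assumes a hypothetical adjacency-matrix representation and a `mark_live` helper; it marks every function reachable from the root (index 0 standing for `main`), and anything left unmarked is a candidate for removal.

```c
#include <assert.h>
#include <stdbool.h>

#define NFUNCS 5

/* calls[i][j] is true when function i contains a call to function j
 * (hypothetical toy representation of the program call graph).
 * Depth-first search from 'root'; each function is pushed at most once,
 * so the explicit stack never exceeds NFUNCS entries. */
void mark_live(bool calls[NFUNCS][NFUNCS], int root, bool live[NFUNCS])
{
    int stack[NFUNCS];
    int top = 0;

    for (int i = 0; i < NFUNCS; i++)
        live[i] = false;

    live[root] = true;
    stack[top++] = root;
    while (top > 0) {
        int f = stack[--top];
        for (int g = 0; g < NFUNCS; g++) {
            if (calls[f][g] && !live[g]) {
                live[g] = true;
                stack[top++] = g;
            }
        }
    }
}
```

A real implementation must also treat functions whose address is taken as live roots, since they may be called indirectly through a pointer that the call graph does not resolve.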
Common padding. Common padding improves array alignment in Fortran common
blocks. Because every subroutine that uses a common block relies on the same
storage layout, a traditional compiler, which sees one source file at a time,
cannot change the layout of user variables in the block. With all subroutines
available, the PathScale compilers can insert padding consistently everywhere
the block is used. The improved alignment lets arrays be vectorized and
accessed more efficiently, and can reduce cache conflicts. A similar technique
can rearrange C and C++ structures.
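Since the technique also applies to C and C++ structures, here is a C analogue. The two structures are hypothetical stand-ins for a common block such as `COMMON /WORK/ N, A(1024), B(1024)`, and the padded version is written by hand to show the kind of layout a compiler could produce. `_Alignas` requires C11.

```c
#include <assert.h>
#include <stddef.h>

/* Original layout: members in declaration order, so after the int 'n'
 * the array 'a' gets only its natural (8-byte) alignment on typical
 * AMD64 targets, which is not enough for aligned 16-byte SSE2 accesses. */
struct work_unpadded {
    int    n;
    double a[1024];
    double b[1024];
};

/* Padded layout: each array starts on a 16-byte boundary, and an extra
 * 64-byte gap keeps 'b' from starting exactly one power-of-two array size
 * after 'a', a spacing that can make a[i] and b[i] compete for the same
 * cache set. */
struct work_padded {
    int n;
    _Alignas(16) double a[1024];
    char gap[64];
    _Alignas(16) double b[1024];
};
```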
Common block splitting. Common block splitting divides a Fortran common block
into smaller pieces, which can reduce data cache conflicts during program
execution.
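Splitting can be pictured in C as turning one aggregate into independent objects. In the hypothetical sketch below, `blk` plays the role of a common block holding two arrays whose relative placement is fixed; after splitting, `x_s` and `y_s` are separate globals that the compiler and linker are free to place, align, and pad individually, which is what creates the opportunity to avoid cache conflicts. The computation is unchanged.

```c
#include <assert.h>

#define N 1024

/* Before splitting: one block, so y always starts exactly N doubles
 * after x, whatever that does to cache behavior. */
static struct {
    double x[N];
    double y[N];
} blk;

/* After splitting: two independent globals with no fixed relationship. */
static double x_s[N];
static double y_s[N];

/* Fill both versions with the same data. */
void init_data(void)
{
    for (int i = 0; i < N; i++) {
        blk.x[i] = x_s[i] = (double)i;
        blk.y[i] = y_s[i] = 2.0;
    }
}

double dot_block(void)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += blk.x[i] * blk.y[i];
    return s;
}

double dot_split(void)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += x_s[i] * y_s[i];
    return s;
}
```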
Procedure reordering. Procedure reordering arranges functions in memory based
on their call relationships, placing functions that call each other near one
another, which can reduce instruction cache thrashing during program
execution.
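One simple way to order procedures is a greedy chain. The sketch below is an assumption, not the PathScale algorithm: given hypothetical call frequencies between functions, `order_procedures` starts at the root and repeatedly places next the unplaced function most strongly connected to the one just placed, falling back to the lowest-numbered unplaced function when there is no connection.

```c
#include <assert.h>
#include <stdbool.h>

#define NF 5

/* w[i][j]: hypothetical call frequency from function i to function j.
 * Writes a memory layout order for all NF functions into 'order'. */
void order_procedures(const int w[NF][NF], int root, int order[NF])
{
    bool placed[NF] = {false};
    int cur = root;

    for (int k = 0; k < NF; k++) {
        order[k] = cur;
        placed[cur] = true;

        /* Heaviest edge, in either direction, to the function just placed. */
        int best = -1, best_w = 0;
        for (int g = 0; g < NF; g++) {
            int e = w[cur][g] + w[g][cur];
            if (!placed[g] && e > best_w) {
                best = g;
                best_w = e;
            }
        }
        /* No connected candidate: take the lowest-numbered unplaced one. */
        for (int g = 0; g < NF && best < 0; g++)
            if (!placed[g])
                best = g;

        cur = best;   /* -1 only after the last function is placed */
    }
}
```

Profile-guided schemes such as Pettis and Hansen's merge chains along the heaviest edges of the whole graph; this single-chain version only illustrates the idea of keeping frequent caller/callee pairs adjacent.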
Developers only need to compile and link with the -ipa option to take
advantage of inter-procedural analysis and optimization. When -ipa is
specified, the PathScale compilers compile the program code to intermediate
form, link the intermediate files, perform inter-procedural analysis and
optimization, generate object code, and invoke the standard linker to produce
the executable, all in a single step.
Tuning for Maximum Performance
Developers may wish to tune applications for peak performance. To aid this
effort, the PathScale EKO compilers provide several options that can further
improve application performance. See
http://www.pathscale.com/whitepapers.html for a complete copy of this white
paper.