HPCwire
 The global publication of record for High Performance Computing / August 27, 2004: Vol. 13, No. 34

Features:

MAXIMIZING PERFORMANCE WITH INTER-PROCEDURAL OPTIMIZATION
by Dr. Fred Chow, Director of Compiler Engineering, PathScale

Developers are constantly seeking new ways to improve application performance. While architecture and algorithms are important factors, development tools can also have a significant impact. Traditional compilers compile each program source file separately, forcing them to make worst-case assumptions that can limit application performance. Developers need more sophisticated tools that can use information from the entire program to apply analysis and optimization techniques more effectively and create high performance applications.

The PathScale EKO Compiler Suite is a family of compilers for the AMD64 processor family that provides superior application performance for AMD Opteron-based systems. The compilers generate code that exploits advanced processor features, including complex addressing modes, large register sets, efficient parameter passing, and SSE2 support.

The PathScale compilers provide 100 percent binary compatibility with the GNU tools, so GNU- and PathScale-compiled libraries and objects can be mixed and linked together. The C/C++ front ends are source compatible with the GNU compiler suite, while the FORTRAN 95 compiler supports the most common Cray/SGI extensions. In addition, the PathScale EKO Compiler Suite is available in installable Linux RPM format and is verified on SuSE, Red Hat, and Fedora Linux distributions.

Key features of the PathScale EKO Compiler Suite include:

  • C, C++, and Fortran 77/90/95 compilers.
  • Industry-leading optimizations.
  • Complete support for 32-bit & 64-bit compilation.
  • Code generation for AMD64 ABI and AMD Opteron.
  • Compatibility with GNU/gcc tool chain and debuggers.
  • Tested with popular third-party debuggers.

Inter-Procedural Analysis

Software applications typically consist of multiple source files that are compiled separately and linked together to create an executable program. This traditional approach, known as separate compilation, limits what most commercial compilers can do. Because only one file is visible at a time, the compiler has incomplete program information and must make worst-case assumptions about code that accesses external data or calls external functions. PathScale EKO compilers instead perform whole program optimization, enabling them to make intelligent decisions about where and how to apply various optimizations. By collecting information over the entire program, the PathScale compilers can make accurate decisions regarding the applicability and safety of many optimization techniques.

PathScale EKO compilers use inter-procedural analysis to link all source files together early in the compilation process -- before most optimization and code generation is performed. Using intermediate representation files, the PathScale compilers perform inter-procedural analysis on the entire program, invoke the back-end of the compilers to optimize and generate object code, and finally invoke the standard linker to produce the final executable.

Analysis and Optimization

Inter-procedural analysis occurs in two stages: analysis and optimization. In the analysis stage, information is collected for the whole program. The application source code is analyzed on several levels, and the information gathered is used during the optimization stage to decide how best to transform the program to achieve the best possible performance.

The first step in analyzing application source code is understanding how different functions relate to one another. To do this, the PathScale compilers first construct a program call graph -- a representation of all functions and their caller/callee relationships. Using the call graph, the compiler determines whether each program variable may be modified or referenced by each function call. Alias information is generated for every variable whose address is taken, allowing pointer accesses to be optimized more aggressively and minimizing their impact on application performance. All variable information gathered during the analysis phase is stored so that the back end of the compiler can use it in later optimization steps.
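The modification ("MOD") part of that analysis can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not PathScale's implementation. The four functions, the three globals, and the names `calls`, `local_mod`, and `compute_mod` are all hypothetical. Each function's directly modified globals are encoded as bits. The call graph is then swept repeatedly until no set grows, so every function ends up with everything its callees might modify. A REF (referenced) set would be computed the same way.

```c
#include <stdbool.h>

/* Hypothetical toy program: four functions and three global variables. */
enum { F_MAIN, F_SOLVE, F_UPDATE, F_REPORT, NFUNCS };
enum { G_X = 1u << 0, G_Y = 1u << 1, G_COUNT = 1u << 2 };

/* Call graph: calls[i][j] is true when function i calls function j.
   Rows left out of the initializer (update, report) call nothing. */
static const bool calls[NFUNCS][NFUNCS] = {
    [F_MAIN]  = { [F_SOLVE] = true, [F_REPORT] = true },
    [F_SOLVE] = { [F_UPDATE] = true },
};

/* Globals each function modifies directly in its own body. */
static const unsigned local_mod[NFUNCS] = {
    [F_SOLVE]  = G_Y,
    [F_UPDATE] = G_X | G_COUNT,
};

/* Transitive MOD sets: a function may modify what it modifies directly
   plus anything any of its callees may modify. Iterating to a fixed point
   handles long call chains and recursion alike. */
void compute_mod(unsigned mod[NFUNCS]) {
    for (int f = 0; f < NFUNCS; f++)
        mod[f] = local_mod[f];

    bool changed = true;
    while (changed) {
        changed = false;
        for (int f = 0; f < NFUNCS; f++)
            for (int g = 0; g < NFUNCS; g++)
                if (calls[f][g] && (mod[f] | mod[g]) != mod[f]) {
                    mod[f] |= mod[g];   /* absorb the callee's effects */
                    changed = true;
                }
    }
}
```

With this input, `compute_mod` reports that `F_REPORT` modifies no globals. A caller holding `X` in a register across a call to it therefore would not need to reload `X` afterward. A compiler that sees only one file must instead assume such a call could modify any global.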

Once the application source is analyzed, several types of optimizations are performed automatically by the PathScale compilers:

  • Inlining. Perhaps the most important inter-procedural optimization, inlining replaces a call to a function with the body of that function. Function call overhead is eliminated, and the back-end phases of the compiler can work on larger sections of code. This can enable other optimizations that would have been impossible with less code available. For example, inlining may result in the formation of a loop nest that enables aggressive loop transformations. The PathScale compilers apply inlining with great care to avoid degrading performance. Excessive growth in function and program size can increase instruction cache misses, exhaust the available registers so that values spill to memory more often, or slow down later stages of compilation. Because the PathScale compilers take a whole program approach, they can inline any function into any other function, even when the two are in different source files. As a result, the PathScale compilers can inline more aggressively than traditional compilers.

  • Constant propagation and function cloning. Many function calls pass constants, including variable addresses, as parameters. Replacing a formal parameter with a known constant value creates opportunities for optimization; for example, portions of a function often become unreachable and can be deleted as dead code. The PathScale compilers exploit constants in two ways. First, if all calls to a function pass the same constant for a parameter, constant propagation substitutes that constant into the function itself without increasing program size. Second, if different calls pass different constant values, function cloning creates specialized copies of the function, each customized for its constant parameters.

  • Dead variable elimination. Dead variable elimination removes global variables whose values are never used, along with the code that updates them, thereby speeding execution.

  • Dead function elimination. Dead functions -- functions that are never called -- can result from inlining and cloning, or from continual program modification during development. Dead function elimination removes such functions, saving space and reducing memory and cache consumption.

  • Common padding. Common padding improves the alignment of arrays in FORTRAN common blocks. Traditional compilers cannot change the layout of user variables in a common block. Every subroutine that uses the block must agree on that layout, and those subroutines are compiled separately. The PathScale compilers have all subroutines available, so they can coordinate such changes. Padding the common block improves array alignment, so arrays can be vectorized and accessed more efficiently, and it can also reduce cache conflicts. A similar technique can rearrange C and C++ structures.

  • Common block splitting. Common block splitting divides a FORTRAN common block into smaller pieces, reducing data cache conflicts during program execution.

  • Procedure reordering. Procedure reordering arranges functions in the executable according to their call relationships, so that related functions sit close together in memory. This potentially reduces instruction cache thrashing during program execution.
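The loop-nest case mentioned under inlining can be made concrete with a small C sketch. The functions `scale_row`, `scale_matrix_call`, and `scale_matrix_inlined` are hypothetical. `scale_row` is assumed to live in a different source file from its caller. The "after" version was written by hand to show the kind of code inlining produces; it is not actual compiler output.

```c
#define N 4

/* Before inlining: with scale_row in another source file, separate
   compilation sees only an opaque call inside the caller's loop. */
void scale_row(int n, double *row, double s) {
    for (int j = 0; j < n; j++)
        row[j] *= s;
}

void scale_matrix_call(double m[N][N], const double s[N]) {
    for (int i = 0; i < N; i++)
        scale_row(N, m[i], s[i]);
}

/* After inlining (hand-written for illustration): the callee's loop now
   sits inside the caller's loop, forming a loop nest that transformations
   such as interchange, unrolling, or vectorization can work on. */
void scale_matrix_inlined(double m[N][N], const double s[N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m[i][j] *= s[i];
}
```

Both versions compute the same result. The difference is only in how much code the optimizer can see at once.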
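Constant propagation and function cloning can be illustrated the same way. In this sketch, `kernel` is a hypothetical routine whose `mode` argument is a literal constant at every call site. `kernel_mode0` and `kernel_mode1` stand in for the clones a compiler would create, which in practice carry compiler-generated internal names.

```c
/* Hypothetical routine: 'mode' selects the computation, so every call
   pays for a runtime branch unless the constant is propagated. */
double kernel(double x, int mode) {
    if (mode == 0)
        return x * x;        /* square */
    else
        return x * x * x;    /* cube */
}

/* Constant propagation: if every call in the program passed mode == 0,
   the compiler could substitute 0 into kernel itself; the else branch
   becomes unreachable and is deleted as dead code.

   Function cloning: when call sites pass different constants, the compiler
   can instead emit one specialized copy per constant. These clone names
   are hypothetical stand-ins for compiler-generated ones. */
double kernel_mode0(double x) { return x * x; }
double kernel_mode1(double x) { return x * x * x; }

double caller(double x) {
    /* Originally: return kernel(x, 0) + kernel(x, 1); */
    return kernel_mode0(x) + kernel_mode1(x);
}
```

Cloning trades some code growth for branch-free, specialized bodies. That is why it is reserved for cases where a single propagated constant does not cover every call.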
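Dead function elimination reduces to a reachability question over the call graph. The sketch below uses a hypothetical five-function program and marks every function reachable from `main` with an iterative depth-first search. Whatever remains unmarked is dead. The names `calls` and `mark_live` are illustrative assumptions.

```c
#include <stdbool.h>

/* Hypothetical program: old_solver and old_helper are left over from
   earlier development and are never reached from main. */
enum { F_MAIN, F_SOLVE, F_HELPER, F_OLD_SOLVER, F_OLD_HELPER, NFUNCS };

static const bool calls[NFUNCS][NFUNCS] = {
    [F_MAIN]       = { [F_SOLVE] = true },
    [F_SOLVE]      = { [F_HELPER] = true },
    [F_OLD_SOLVER] = { [F_OLD_HELPER] = true },
};

/* Mark every function reachable from main; unmarked functions are dead.
   Each function is pushed at most once, so NFUNCS stack slots suffice. */
void mark_live(bool live[NFUNCS]) {
    int stack[NFUNCS];
    int top = 0;

    for (int f = 0; f < NFUNCS; f++)
        live[f] = false;
    live[F_MAIN] = true;
    stack[top++] = F_MAIN;

    while (top > 0) {
        int f = stack[--top];
        for (int g = 0; g < NFUNCS; g++)
            if (calls[f][g] && !live[g]) {
                live[g] = true;
                stack[top++] = g;
            }
    }
}
```

Note that `F_OLD_HELPER` still has a caller, yet it is dead because that caller is itself unreachable; simply counting callers would miss it. A real implementation must also treat functions whose addresses are taken as additional roots, since they may be called through pointers.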
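Common padding is described above for FORTRAN common blocks. The following C analogue uses a struct in place of a hypothetical common block holding an integer N followed by two 4096-element double-precision arrays A and B. The pad sizes are illustrative assumptions, not values any particular compiler chooses. The 16-byte target matches the width of an SSE2 register, and C11's `_Alignas` gives the padded struct 16-byte alignment.

```c
#include <stddef.h>

/* Unpadded layout, members in declared order. On typical x86-64 ABIs
   'a' lands at offset 8, so 16-byte alignment is not guaranteed, and
   'b' follows exactly 4096 * 8 = 32768 bytes after 'a'. */
struct work_unpadded {
    int    n;
    double a[4096];
    double b[4096];
};

/* Padded layout that whole-program analysis could choose once it knows
   every routine using the block: 'a' starts at offset 16, and 64 extra
   bytes move 'b' off the power-of-two distance from 'a' while keeping
   it 16-byte aligned (offset 32848). Pad sizes are illustrative. */
struct work_padded {
    _Alignas(16) int n;
    char   pad0[12];
    double a[4096];
    char   pad1[64];
    double b[4096];
};
```

In the unpadded layout, `a[i]` and `b[i]` are a large power-of-two distance apart. That distance can make them compete for the same cache sets. The padded layout avoids it, but such a change is only safe when every routine that uses the block is rewritten consistently, which is why it requires whole-program information.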
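Procedure reordering can be sketched as a greedy layout over a weighted call graph. This is a simplified heuristic for illustration, not PathScale's algorithm. The call counts, function names, and `order_procedures` are hypothetical. Starting from `main`, it repeatedly places the unplaced function with the strongest call affinity to the most recently placed one, with ties going to the lower index.

```c
#include <stdbool.h>

/* Hypothetical functions in source (declaration) order. */
enum { F_MAIN, F_SOLVE, F_INIT, F_REPORT, F_FLUX, NFUNCS };

/* Hypothetical call counts: calls[i][j] = number of calls from i to j. */
static const long calls[NFUNCS][NFUNCS] = {
    [F_MAIN]  = { [F_SOLVE] = 1, [F_INIT] = 1, [F_REPORT] = 1 },
    [F_SOLVE] = { [F_FLUX] = 1000000 },
};

/* Affinity counts calls in either direction between two functions. */
static long affinity(int f, int g) {
    return calls[f][g] + calls[g][f];
}

/* Fill order[] with a layout in which strongly connected functions
   end up next to each other. */
void order_procedures(int order[NFUNCS]) {
    bool placed[NFUNCS] = { false };
    int last = F_MAIN;

    order[0] = F_MAIN;
    placed[F_MAIN] = true;
    for (int k = 1; k < NFUNCS; k++) {
        int best = -1;
        long best_w = -1;
        for (int g = 0; g < NFUNCS; g++)
            if (!placed[g] && affinity(last, g) > best_w) {
                best = g;
                best_w = affinity(last, g);
            }
        order[k] = best;
        placed[best] = true;
        last = best;
    }
}
```

With this input the layout is main, solve, flux, init, report. The heavily called `flux` is placed directly after `solve` even though it is declared last, so the hot pair is more likely to share instruction cache lines and pages.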

Developers only need to compile and link with the -ipa option to begin taking advantage of the performance benefits of inter-procedural analysis and optimization. When the -ipa option is specified, the PathScale compilers compile the program to an intermediate form and link that intermediate form. They then perform inter-procedural analysis and optimization, generate object code, and invoke the standard linker to produce the executable -- all in a single, easy step.

Tuning for Maximum Performance

Developers may wish to tune applications for peak performance. To aid this effort, the PathScale EKO compilers provide several options that can further improve application performance. See http://www.pathscale.com/whitepapers.html for a complete copy of this white paper.
