Jingpai HPC: Program Optimization

2024-01-15 09:45

Intel® oneAPI Base Toolkit



Why Optimize


   Performance Improvement:



Optimization reduces a program's execution time, so results arrive sooner and the same hardware can complete more work.


   Resource Utilization:

Efficient code makes effective use of system resources such as memory, memory bandwidth, and CPU cores, so less hardware capacity is wasted on unnecessary work.


   Energy Efficiency:

Optimized programs do less unnecessary work and therefore consume less energy. This matters especially for battery-powered and other power-constrained devices.



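Hotspot Analysis (Intel VTune)

Intel VTune Profiler's Hotspots analysis identifies the functions and loops where a program spends most of its CPU time. These are the places where optimization effort pays off most.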




Intel C/C++ compiler optimization flags




Intel® oneAPI Math Kernel Library (MKL)


   Optimized Library for Scientific Computing

• Described by Intel as the fastest and most widely used math library for Intel®-based systems
• Core functions include BLAS, LAPACK, sparse solvers, fast Fourier transforms (FFT), random number generator functions (RNG), summary statistics, data fitting, and vector math
• Optimizes applications for current and future generations of Intel CPUs, GPUs, and other accelerators
• Is a seamless upgrade for previous users of the Intel® Math Kernel Library (Intel® MKL)




Intel® Integrated Performance Primitives (IPP)

Intel IPP is an extensive library of ready-to-use, domain-specific functions that are highly optimized for diverse Intel architectures. Its royalty-free APIs help developers:

• Take advantage of Single Instruction, Multiple Data (SIMD) instructions

• Improve the performance of computation-intensive applications, including signal processing, data compression, video processing, and cryptography

• Reduce cost and time to market for software development and maintenance




OpenMP


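A serial loop that runs a task for i = 1 to 10:

for (int i = 1; i <= 10; i++)
{
    task(i);
}

With OpenMP (for example, #pragma omp parallel for with two threads), the iterations are divided between the threads:

Thread 0: for (int i = 1; i <= 5; i++)
Thread 1: for (int i = 6; i <= 10; i++)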
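The loop split above can be sketched in C as follows. This assumes a compiler with OpenMP enabled, for example `-fopenmp` with GCC/Clang or `-qopenmp` with the Intel compiler. The names `task`, `run_tasks`, and `sum_tasks` are hypothetical, and `task(i)` simply returns `i * i` for illustration. The second function shows the `reduction` clause, which is the usual way to combine per-iteration results safely.

```c
/* Hypothetical stand-in for "task" in the loop above: square the index. */
static int task(int i)
{
    return i * i;
}

/* Run task(1)..task(n) in parallel and store each result in results[i - 1].
   schedule(static) without a chunk size gives each thread one contiguous
   block of iterations. Each iteration writes only its own slot, so no
   locking is needed. */
void run_tasks(int *results, int n)
{
    #pragma omp parallel for schedule(static) num_threads(2)
    for (int i = 1; i <= n; i++) {
        results[i - 1] = task(i);
    }
}

/* Sum task(1)..task(n). reduction(+:total) gives each thread a private
   partial sum and combines the partial sums when the loop ends, avoiding
   a data race on the shared variable. */
long sum_tasks(int n)
{
    long total = 0;
    #pragma omp parallel for schedule(static) num_threads(2) reduction(+:total)
    for (int i = 1; i <= n; i++) {
        total += task(i);
    }
    return total;
}
```

For n = 10 and two threads, the static schedule produces the 1..5 and 6..10 split described above. Without OpenMP support the pragmas are ignored, and both functions run serially with identical results.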




Cache optimization


   Reduced Memory Latency:


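Caches provide much faster access to data than main memory. Code optimized for cache locality reuses data soon after it is loaded (temporal locality) and accesses neighboring addresses together (spatial locality). This keeps frequently used data in the cache, reduces average memory latency, and improves overall program performance.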



   Enhanced CPU Utilization:


By utilizing the cache effectively, the CPU spends less time waiting for data from slower memory, allowing it to execute instructions more efficiently. This leads to better CPU utilization.
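A small illustration of spatial locality follows. It assumes an n x n matrix of doubles stored row-major in one contiguous buffer, which is how C lays out 2-D arrays. The function names are hypothetical. Both functions compute the same sum; only the traversal order differs.

```c
#include <stddef.h>

/* Row order: the inner loop walks consecutive addresses, so every
   element of each cache line fetched from memory gets used. */
double sum_row_major(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j];
    return s;
}

/* Column order: the inner loop jumps n * sizeof(double) bytes per step,
   so for large n almost every access lands on a different cache line.
   Same sum, up to floating-point rounding (the additions happen in a
   different order). */
double sum_col_major(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += a[i * n + j];
    return s;
}
```

For large matrices the row-order version is typically much faster, because each cache line it loads supplies several consecutive elements. Measure the actual difference on your own system, for example with VTune.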



Cache-Friendly Data Structures:

• Organize data structures to enhance spatial locality, placing related data close together in memory.

• Use contiguous memory allocation to improve cache line utilization.

• Choose data structures that minimize padding and reduce wasted space.
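The guidelines above can be sketched with a hypothetical particle record; all type and field names are illustrative. The first pair of structs shows how member order affects padding. The sizes in the comments assume a typical x86-64 ABI where `double` is 8-byte aligned. The structure-of-arrays (SoA) layout keeps each field in its own contiguous array. A loop that reads only one field, such as the mass, then pulls in only useful data.

```c
#include <stddef.h>

/* Poor member order: padding is inserted after each char so the
   double stays aligned (typically 24 bytes on x86-64). */
struct particle_padded {
    char   flag;
    double mass;
    char   kind;
};

/* Same members, largest first: padding shrinks to the tail
   (typically 16 bytes on x86-64). */
struct particle_packed {
    double mass;
    char   flag;
    char   kind;
};

/* Structure of arrays with a hypothetical fixed capacity: each field is
   stored contiguously, so a pass over one field reads no unused bytes. */
#define MAX_PARTICLES 1024
struct particles_soa {
    double mass[MAX_PARTICLES];
    char   flag[MAX_PARTICLES];
    char   kind[MAX_PARTICLES];
    size_t count;
};

/* Touches only the mass array: a sequential, cache-friendly scan. */
double total_mass(const struct particles_soa *p)
{
    double s = 0.0;
    for (size_t i = 0; i < p->count; i++)
        s += p->mass[i];
    return s;
}
```

Which layout is better depends on the access pattern. SoA suits loops that read one field across many elements. An array of structs suits code that uses all fields of one element together.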




Optimal Data Alignment

• Align data structures and arrays to the cache line size to ensure efficient use of cache.

• Misaligned data can straddle cache-line boundaries. A single access may then touch two lines, and fetched lines are only partly used, which increases cache misses.
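A minimal sketch of cache-line alignment using standard C11 facilities (`aligned_alloc` and `_Alignas`). It assumes a 64-byte cache line, which is typical of current Intel CPUs; check the line size of your target. The helper and type names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines, typical of current Intel CPUs. */
#define CACHE_LINE 64

/* Allocate n doubles starting on a cache-line boundary (release with free()).
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up. */
double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (bytes == 0)
        bytes = CACHE_LINE;
    return aligned_alloc(CACHE_LINE, bytes);
}

/* A 64-byte record (8 doubles). _Alignas makes declared instances and
   array elements start on a line boundary, so reading one touches exactly
   one cache line instead of two. Heap copies still need aligned_alloc. */
struct block8 {
    _Alignas(CACHE_LINE) double v[8];
};

/* Returns 1 if p lies on a cache-line boundary, 0 otherwise. */
int is_cache_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

Aligned buffers also let SIMD code use aligned load and store instructions.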




Intel Intrinsic Functions

Intrinsics are assembly-coded functions that allow you to use C/C++ function calls and variables in place of assembly instructions. Intrinsics are expanded inline, eliminating function call overhead. While providing the same benefits as inline assembly, intrinsics improve code readability, assist instruction scheduling, and help when debugging. They provide access to instructions that cannot be generated from standard C and C++ language constructs, and they allow code to use performance-enhancing features unique to specific processors.
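A minimal sketch of element-wise float addition with intrinsics. It uses SSE intrinsics from `<xmmintrin.h>` (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`), which are part of the x86-64 baseline. It deliberately avoids the AVX-512 `_mm512_*` family, so the example runs on any x86-64 CPU. The function name `add_floats` is hypothetical. On non-x86 builds the preprocessor guard falls back to plain C.

```c
#include <stddef.h>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#endif

/* c[i] = a[i] + b[i] for i = 0..n-1. On x86 the main loop adds four
   floats per instruction with SSE; the remainder (and non-x86 builds)
   use a scalar loop. */
void add_floats(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE__) || defined(_M_X64)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);       /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; i++)                          /* scalar tail */
        c[i] = a[i] + b[i];
}
```

With AVX-512 the same loop would use `__m512`, `_mm512_loadu_ps`, `_mm512_add_ps`, and `_mm512_storeu_ps` to process 16 floats per step. That requires a CPU with AVX-512 and a matching compiler flag. For a loop this simple, compilers often auto-vectorize at higher optimization levels. Intrinsics pay off mainly where the compiler cannot vectorize on its own.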




Intel® Intrinsics Guide

https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=-mm512


Author: Zhang Shirui (张实瑞), Technical Consultant, Jingpai Technology (景派科技)

Layout: Jingpai Technology Marketing Department
