Harnessing the Power of AMD CPUs and GPUs for Engineering Simulations

For years, engineering simulations have relied on clusters of high-performance CPU-based servers. However, the landscape is evolving with the integration of powerful GPUs, creating platforms that enhance computational capabilities.
Computational fluid dynamics (CFD) workloads are notorious for demanding memory and bandwidth. Historically, GPUs struggled to meet these needs because of limited onboard memory, which made them less suitable for large-scale simulations. With the advent of data center-class GPUs and software advancements, the scenario has changed dramatically, leading to the exascale era.
Modern GPU-based solvers now complement CPU-based implementations, offering more features and performance. AMD Instinct™ GPUs, with generous high-bandwidth memory (HBM) capacities, are a game-changer, bringing significant benefits to computer-aided engineering (CAE) workloads. GPU-based simulators deliver superior performance and efficiency, enhancing price-performance ratios and energy efficiency.
Manufacturers' data centers are embracing GPU-accelerated workloads, integrating them seamlessly with existing CPU-based systems. Emerging applications in generative AI (GenAI) and deep learning are finding their place in these environments. As GPU-accelerated simulations and AI applications continue to grow, the need for heterogeneous infrastructures capable of managing these mixed workloads becomes paramount. Engineering innovators require robust solutions to handle diverse tasks efficiently on shared platforms, harnessing the strengths of both CPUs and GPUs.
Maximizing GPU Resources for Optimal Performance
Despite their impressive benefits in terms of energy efficiency and price-performance, data center GPUs represent a significant investment. It is therefore crucial to keep their utilization high so that these expensive resources are not left idle. As engineering simulations and AI applications continue to evolve, future CAE data centers will need to accommodate a diverse array of heterogeneous compute workloads.
To achieve this, organizations must strategically optimize the allocation of these varied workloads. This involves balancing distributed CPU- and GPU-based simulators alongside machine-learning frameworks to enhance overall efficiency. By doing so, companies can minimize costs and ensure that resources are fully utilized, avoiding the pitfalls of underutilization.
The integration of GPUs into data centers is not just about adding more power; it's about smart resource management. Leveraging the strengths of both CPUs and GPUs allows for a more flexible and efficient computational environment. For instance, while CPUs handle complex, sequential tasks, GPUs excel at parallel processing, making them ideal for specific simulation and AI tasks. This synergy ensures that each type of processor is used where it performs best, leading to improved performance and cost-effectiveness.
As new GPU workloads such as GenAI and deep learning become more prevalent, the ability to manage these mixed workloads seamlessly on shared infrastructure becomes increasingly important. Organizations must develop robust solutions that can handle the dynamic nature of these tasks, ensuring all their compute assets are operating at full potential.
Altair Schedulers and AMD Instinct GPUs
Workload managers such as Altair® PBS Professional® and Altair® Grid Engine® have long been instrumental in optimizing simulation environments. These tools allocate resources among diverse engineering workloads, considering factors such as resource requirements, workload parallelism, deadlines, sharing policies, cluster topologies, and overall throughput and productivity.
Altair has established itself as a leader in bringing advanced workload management and scheduling capabilities to data center GPUs. This expertise is crucial as the integration of GPUs into simulation environments continues to grow. The relationship between Altair and AMD has further strengthened this capability, with Altair schedulers now providing advanced support for AMD Instinct MI200 Series GPUs.
Travis Karr, AMD corporate vice president for HPC, adds, “AMD’s Instinct™ GPUs, with their high-bandwidth memory and exceptional computational power, are designed to accelerate the most demanding CAE workloads. By working together with Altair’s advanced scheduling solutions, we’re enabling organizations to maximize the potential of their heterogeneous infrastructures, optimizing performance and energy efficiency while lowering costs.”
By optimizing workload placement and managing resources efficiently, Altair schedulers enhance the performance of both CPU- and GPU-based simulations. This balanced approach not only improves overall productivity but also minimizes costs and maximizes resource utilization. As a result, engineering teams can achieve higher throughput and better performance, driving innovation and maintaining a competitive edge.
Embracing the Future of Engineering Simulations
By leveraging Altair schedulers to manage clusters of AMD Instinct-accelerated servers, our customers can streamline workload management while enhancing the throughput and cost-effectiveness of their engineering environments. Both PBS Professional and Altair Grid Engine offer robust integration features that simplify and optimize the use of AMD Instinct MI200 GPUs and the latest Instinct MI300 Series GPUs.
One of the key advantages of Altair schedulers is their ability to automatically detect GPUs and configure the settings needed to work with them. This automation simplifies the user environment by setting the environment variables required by GPU-aware applications, reducing complexity for users and ensuring smooth operation.
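As a rough illustration of what a GPU-aware application sees, the sketch below reads a device list from the job environment. The article does not say which variables the schedulers set; `ROCR_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES` are ROCm variables used here as an assumption, and `visible_gpu_ids` is a hypothetical helper name.

```python
import os


def visible_gpu_ids(env=None, names=("ROCR_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES")):
    """Return the GPU identifiers exposed to this job, or None if none are set.

    Assumption: the scheduler exports one of the ROCm visibility variables
    as a comma-separated list (indices such as "0,1" or device UUIDs).
    """
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name)
        if value is not None:
            # Keep tokens as strings because they may be indices or UUIDs.
            return [tok.strip() for tok in value.split(",") if tok.strip()]
    return None


if __name__ == "__main__":
    ids = visible_gpu_ids()
    if ids is None:
        print("No GPU visibility variable set; all devices may be visible.")
    else:
        print("GPUs assigned to this job:", ids)
```

Returning `None` rather than an empty list keeps "no variable set" distinct from "variable set but empty", which typically means no devices are visible.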
Altair schedulers excel in optimizing placement of both CPU and GPU portions of accelerated workloads. They consider various factors such as GPU and CPU resource requirements, NUMA topology boundaries, and proximity to sockets, cores, and memory channels. This meticulous optimization ensures resources are used efficiently, enhancing performance and minimizing costs.
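The idea of topology-aware placement can be sketched in a few lines. This is a toy model, not the schedulers' actual algorithm: `pick_cores` is a hypothetical function, and the inputs (the GPU's NUMA node and a map of free cores per node) are assumed to be known already.

```python
def pick_cores(gpu_numa_node, free_cores_by_node, ncores):
    """Choose CPU cores for the host side of a GPU job.

    Prefers free cores on the same NUMA node as the GPU and spills to
    other nodes (in ascending node order) only when the local node
    cannot satisfy the request.
    """
    chosen = list(free_cores_by_node.get(gpu_numa_node, []))[:ncores]
    for node in sorted(free_cores_by_node):
        if node == gpu_numa_node:
            continue
        for core in free_cores_by_node[node]:
            if len(chosen) == ncores:
                break
            chosen.append(core)
    if len(chosen) < ncores:
        raise ValueError("not enough free cores for this request")
    return chosen


if __name__ == "__main__":
    free = {0: [0, 1, 2, 3], 1: [4, 5]}
    # GPU attached to NUMA node 1: local cores are taken first.
    print(pick_cores(1, free, 3))
```

A production scheduler weighs many more factors, such as sockets, memory channels, and sharing policies, but the local-first preference is the core of the idea.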
Configuration options and reporting are built in. Users can employ the qstat -f command to monitor GPU usage including compute unit occupancy, GPU RAM, System Direct Memory Access (SDMA) activity, and walltime. The following example shows the GPU configuration that qconf -se reports for a host with four MI210 GPUs:
$ qconf -se us-midc-mi210
hostname us-midc-mi210
load_scaling NONE
complex_values GPU=4(MI210_1[amd_id=0, \
device=/dev/dri/card1|/dev/dri/renderD128, \
uuid=GPU-74e9455f63e4f48f] MI210_2[amd_id=1, \
device=/dev/dri/card2|/dev/dri/renderD129, \
uuid=GPU-6de8c605c2212805] MI210_3[amd_id=2, \
device=/dev/dri/card3|/dev/dri/renderD130, \
uuid=GPU-600c883acb6a35a8] MI210_4[amd_id=3, \
device=/dev/dri/card4|/dev/dri/renderD131, \
uuid=GPU-850e4e3c316d4738])
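For scripting against this output, the GPU entries can be pulled out of the complex_values line with a small parser. The sketch below assumes the layout seen in this one example (name[key=value, ...] entries with backslash line continuations); it is not based on a documented grammar, and `parse_gpus` is a hypothetical helper.

```python
import re

# The complex_values line from the example above, copied verbatim.
SAMPLE = r"""complex_values GPU=4(MI210_1[amd_id=0, \
device=/dev/dri/card1|/dev/dri/renderD128, \
uuid=GPU-74e9455f63e4f48f] MI210_2[amd_id=1, \
device=/dev/dri/card2|/dev/dri/renderD129, \
uuid=GPU-6de8c605c2212805] MI210_3[amd_id=2, \
device=/dev/dri/card3|/dev/dri/renderD130, \
uuid=GPU-600c883acb6a35a8] MI210_4[amd_id=3, \
device=/dev/dri/card4|/dev/dri/renderD131, \
uuid=GPU-850e4e3c316d4738])"""


def parse_gpus(text):
    """Extract one dict per GPU entry from qconf-style complex_values text."""
    # Join continuation lines by dropping each backslash-newline pair.
    flat = re.sub(r"\\\s*\n\s*", "", text)
    gpus = []
    for name, body in re.findall(r"(\w+)\[([^\]]*)\]", flat):
        fields = dict(item.strip().split("=", 1) for item in body.split(","))
        gpus.append({
            "name": name,
            "amd_id": int(fields["amd_id"]),
            "devices": fields["device"].split("|"),
            "uuid": fields["uuid"],
        })
    return gpus


if __name__ == "__main__":
    for gpu in parse_gpus(SAMPLE):
        print(gpu["name"], gpu["amd_id"], gpu["devices"], gpu["uuid"])
```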
Reliability is another critical aspect addressed by Altair schedulers. Through cgroups isolation, they ensure jobs can only access the devices allocated to them, preventing runtime errors and improving overall system stability. This isolation is crucial for maintaining the integrity of simulations and AI workloads, ensuring each task runs smoothly, without interference.
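From inside a job, the effect of device isolation can be observed by trying to open each GPU device node. The sketch below is a diagnostic under assumptions: it presumes that opening a device outside the job's cgroup fails with an OS error, and `accessible_devices` is a hypothetical helper, not part of any Altair tool. The render node paths are the ones shown in the host example above.

```python
import os


def accessible_devices(paths):
    """Return the subset of device paths this process can actually open.

    An open() attempt is used instead of os.access() because cgroup device
    rules are enforced when the device is opened, not by file mode bits.
    """
    usable = []
    for path in paths:
        try:
            fd = os.open(path, os.O_RDWR)
        except OSError:
            # Missing node, permission denied, or blocked by isolation.
            continue
        os.close(fd)
        usable.append(path)
    return usable


if __name__ == "__main__":
    render_nodes = [f"/dev/dri/renderD{n}" for n in range(128, 132)]
    print("Devices this job can open:", accessible_devices(render_nodes))
```

A job allocated a single GPU on the four-GPU host would be expected to list only one render node here.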
Additionally, Altair schedulers provide real-time visibility into GPU usage, offering detailed metrics per job, node, and device. This includes insights from AMD ROCm™ software and the job scheduler, allowing users to monitor and manage GPU performance effectively. This transparency is vital for optimizing resource utilization and ensuring the engineering environment operates at peak efficiency.
The integration of Altair schedulers with Instinct accelerators brings advanced features to AI-ready and HPC-ready data centers. By simplifying workload management, optimizing resource placement, and enhancing reliability, these tools empower customers to achieve higher throughput and cost-effectiveness.
“Our collaboration with AMD underscores our commitment to delivering the best workload management solutions for the most demanding HPC environments,” said Cameron Brunner, senior vice president, Altair HPCWorks. “By integrating Altair HPCWorks with AMD’s latest GPU technology, we empower customers with unmatched performance, efficiency, and scalability, ultimately helping them solve complex challenges faster and more effectively than ever before.”
Take your engineering simulations to the next level and explore the possibilities with AMD Instinct™ GPUs and Altair's advanced scheduling solutions. Innovate today!
Visit www.AMD.com/Instinct to learn more about AMD Instinct GPUs.