Maximizing CAE Throughput with Altair Solvers and 4th Gen AMD EPYC™ Processors
Altair and AMD have a proud history of delivering breakthrough performance results for various engineering simulations. In March 2022, we put servers based on 3rd Gen AMD EPYC™ processors with AMD 3D V-Cache™ technology to the test. We showed that an Altair® Radioss®-based impact analysis simulation ran up to 1.8x faster with AMD 3D V-Cache compared to the AMD EPYC processors tested the year before. Similarly, we showed a 1.5x throughput gain running a thermal computational fluid dynamics (CFD) analysis with Altair® AcuSolve®.1
In our latest benchmark series, we evaluated the new AMD EPYC 9004 series processors on workloads representative of real-world CAE. These are our most thorough and most impressive joint benchmark results to date.
Key Challenges in CAE
Regardless of industry, modern manufacturers face unprecedented challenges. As designs become more complex, engineers rely on advanced simulation and optimization techniques to meet design goals, ensure quality and manufacturability, and minimize material costs.
In computer-aided engineering (CAE), performance is no longer just a nice-to-have; it is an essential competitive advantage. This is especially true in compute-intensive areas such as nonlinear structural analysis, electromagnetic simulation, and CFD. With faster simulation, manufacturers can design higher-quality products, reduce time-to-market, and avoid the need for expensive, time-consuming physical prototyping.
Energy efficiency is also imperative. To meet corporate sustainability goals, manufacturers increasingly need energy-efficient servers. They need to maximize throughput per watt to minimize the power requirements and associated CO2 emissions of high-performance computing (HPC) data center operations.
Putting 4th Gen AMD EPYC Processors to the Test
We ran nine tests involving four different Altair solvers to evaluate the performance and efficiency of the latest AMD EPYC 9004 series processors, using a diverse set of simulation techniques and models. We then compared the results for two different AMD EPYC 9004 series processors to a baseline 32-core 3rd Gen AMD EPYC 75F3 processor used in prior benchmarks. The tests were as follows:
- A CFD simulation running AcuSolve, a general-purpose fluid and thermal simulation tool, modeling turbulent flow and pressure in an 8 million node model with an impinging nozzle.
- Two simulations involving Altair® OptiStruct®, a proven multiphysics structural solver for linear and nonlinear analysis — tests included static analysis of a car engine with 4.6 million degrees of freedom (DOF) and a car body optimization problem with 24.4 million DOF.
- Three simulations running Altair® Feko®, a general-purpose high-frequency electromagnetic simulator. The examples considered here computed radar cross section (RCS) and optimized antenna placement using different models and numerical methods.
- Three simulations involving Radioss, a leading solution for evaluating and optimizing product performance for nonlinear dynamic load problems. Models included a frontal collision of a 1 million shell Chrysler Neon, a 10 million shell Ford Taurus collision, and a pole impact test of a Toyota Venza comprising 1.1 million shells and a battery pack.
These simulations are all compute-intensive and benefit from multi-core processors that can deliver a high degree of parallelism and high-memory bandwidth.
The Benchmark Environment
We ran the nine workloads described above on two different dual-socket AMD EPYC 9004-based systems with 32- and 64-core processors and compared the results to a baseline dual-socket system with 32-core 3rd Gen AMD EPYC 75F3 processors. The specs of the processors evaluated are summarized below.2
|CPU|Total CCDs3|Cores/threads|L3 Cache|Base/Max Boost Clock4|
|AMD EPYC 75F3|8|32/64|256 MB|2.95/4.00 GHz|
|AMD EPYC 9374F|8|32/64|256 MB|3.85/4.30 GHz|
|AMD EPYC 9554|8|64/128|256 MB|3.10/3.75 GHz|
All AMD EPYC 9004 series CPUs have 12 memory channels, a 50% increase over the eight channels of 3rd Gen AMD EPYC processors. The AMD EPYC 9004 series CPUs also support DDR5-4800 memory vs. DDR4-3200 for the baseline 3rd Gen AMD EPYC CPU.
These tests were designed to characterize the performance of the frequency-optimized AMD EPYC 9374F processor with 32 cores per socket vs. the higher-core-count AMD EPYC 9554 CPU with 64 cores per socket. The nine benchmarks described above were run across these three server configurations for a total of 27 performance tests.
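To put the channel and memory-speed changes in perspective, the following minimal sketch estimates theoretical peak memory bandwidth per socket. It assumes a 64-bit (8-byte) data path per channel and eight channels for the 3rd Gen part, which follows from the 50% figure above. The function and variable names are illustrative, and sustained bandwidth in real workloads will be lower than these peak values.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth per socket in GB/s (decimal units).

    Assumes each channel moves `bytes_per_transfer` bytes per transfer.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# 3rd Gen baseline: 8 channels of DDR4-3200 (channel count is an assumption
# derived from the stated 50% increase to 12 channels).
gen3_gbs = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)

# 4th Gen: 12 channels of DDR5-4800, as stated in the text.
gen4_gbs = peak_bandwidth_gbs(channels=12, mega_transfers_per_s=4800)

print(f"3rd Gen peak: {gen3_gbs:.1f} GB/s per socket")
print(f"4th Gen peak: {gen4_gbs:.1f} GB/s per socket")
print(f"Generational ratio: {gen4_gbs / gen3_gbs:.2f}x")
```

Under these assumptions, the combined effect of more channels and faster DIMMs is larger than the 50% channel increase alone, which is consistent with memory-bound solvers showing some of the largest gains.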
Delivering Superior Simulation Throughput
For structural, electromagnetic, and CFD simulations, Altair solvers on AMD EPYC 9004 series processors deliver impressive performance gains. All workloads tested benefit from the high clock speeds, large core counts, increased memory bandwidth, and fast DDR5 memory of 4th Gen AMD EPYC processors.
In general, the AMD EPYC 9374F processor delivers outstanding performance per core and is a great choice for users focused on maximizing the value of each core and keeping license costs down. The AMD EPYC 9554 is the best choice when throughput and overall performance are paramount.
Performance gains were especially impressive for AcuSolve, where testing showed the 64-core AMD EPYC 9554 provides up to ~2.24x the performance of the reference 3rd Gen AMD EPYC 75F3 processor. Similarly, for Feko, the AMD EPYC 9554-based server provides up to ~2.13x the performance.
For memory-bound solvers, 4th Gen AMD EPYC delivers significant performance boosts thanks to the 12 memory channels (up 50% from the previous generation). For AcuSolve, the AMD EPYC 9374F, with a higher frequency but the same core count, delivered up to ~2.0x better relative performance than the reference AMD EPYC 75F3-powered server.
Solid performance was also observed with OptiStruct: up to ~1.65x better relative performance on AMD EPYC 9554-based servers.
Explicit simulations with Radioss, which are memory bound but also compute intensive, saw predictable performance gains with core count, achieving up to ~1.97x the performance with AMD EPYC 9554 over the previous generation dual-socket AMD EPYC 75F3-based system. These exceptional performance gains are the direct result of multiple innovations in the latest AMD EPYC 9004 series processors.
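The trade-off between per-core value and total throughput can be made concrete with a rough normalization. The sketch below divides the quoted "up to" AcuSolve speedups by each CPU's core-count ratio relative to the 32-core baseline. This is a simplified illustration, not a measured per-core benchmark; the function name is hypothetical, and the "up to" figures are best-case values from the text.

```python
BASELINE_CORES = 32  # cores per socket on the AMD EPYC 75F3 baseline

# "Up to" AcuSolve speedups vs. the baseline, as quoted in the text.
acusolve_results = {
    "AMD EPYC 9374F": {"cores": 32, "speedup": 2.00},
    "AMD EPYC 9554": {"cores": 64, "speedup": 2.24},
}


def per_core_uplift(speedup: float, cores: int,
                    baseline_cores: int = BASELINE_CORES) -> float:
    """Speedup normalized by the core-count ratio (a crude per-core metric)."""
    return speedup / (cores / baseline_cores)


per_core = {
    cpu: per_core_uplift(r["speedup"], r["cores"])
    for cpu, r in acusolve_results.items()
}

for cpu, value in per_core.items():
    total = acusolve_results[cpu]["speedup"]
    print(f"{cpu}: {total:.2f}x total, {value:.2f}x per core")
```

Read this way, the 32-core part offers the larger gain per licensed core, while the 64-core part offers the higher total throughput per server.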
About AMD EPYC 9004 Series Processors
The latest AMD EPYC 9004 series CPUs build on AMD's history of innovation, with next generation 5nm technology, support for high-performance DDR5 DIMMs, and fast PCIe® Gen 5 I/O. These processors also have the capability for 12 memory channels with two DIMMs per channel for memory-hungry HPC and artificial intelligence (AI) workloads. AMD EPYC 9004 series CPUs also uniquely provide 128 PCIe5 lanes in a single-socket server and up to 160 PCIe5 lanes on two-socket servers, making the AMD EPYC 9004 series an ideal platform for workloads that benefit from fast interconnects in clustered environments or accelerators such as GPUs or FPGAs. Some key features of the AMD EPYC 9004 series that contribute to its exceptional performance are:
- Up to 96 cores and 6 TB of memory per CPU
- 12 memory channels, with 2 DIMMs per channel capability
- Support for DDR5-4800 memory
- Up to 160 PCIe5 lanes per two-socket server
- Support for AVX-512 instructions (256b data path)
Altair and AMD – Partners in Performance
The combination of state-of-the-art Altair solvers and 4th Gen AMD EPYC 9004 series processors can help CAE users dramatically boost simulation throughput, with testing showing up to 2.24x the performance of the previous generation for the SKUs tested for this article. The improvements in density and power efficiency are also impressive: AMD EPYC 9004 series processors can deliver much higher throughput per watt than the previous generation. For example, the 32-core AMD EPYC 9374F delivers roughly 2x the performance of the previous generation running our AcuSolve CFD simulation, which translates into a ~40% improvement in throughput per watt.5
This ~2x boost in performance means that customers concerned about data center footprint can achieve similar performance with half the number of AMD EPYC 9004-based servers compared to the previous generation, which helps reduce data center facility costs and ancillary costs related to racks, PDUs, network drops, and software licenses. Customers can also take advantage of AMD Instinct™ GPUs for even higher levels of throughput and efficiency. These dramatic improvements enable faster timelines, improved engineering efficiency, lower costs, and the ability to design better, more thoroughly simulated products.
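The efficiency and density figures above can be reproduced with a few lines of arithmetic. The sketch below uses the performance units and wattages given in footnote 5 (1 unit at 280 W for the baseline, 2.00 units at 400 W for the AMD EPYC 9374F). The fleet size of 10 baseline servers is a hypothetical value chosen only for illustration, as are the function names.

```python
import math


def throughput_per_watt(perf_units: float, watts: float) -> float:
    """Performance units delivered per watt."""
    return perf_units / watts


def servers_needed(target_throughput: float, per_server_throughput: float) -> int:
    """Smallest whole number of servers that meets the target throughput."""
    return math.ceil(target_throughput / per_server_throughput)


# Figures from footnote 5.
baseline_ppw = throughput_per_watt(perf_units=1.00, watts=280)
gen4_ppw = throughput_per_watt(perf_units=2.00, watts=400)
efficiency_gain = gen4_ppw / baseline_ppw - 1

# Hypothetical fleet: 10 baseline servers' worth of throughput (assumption).
target = 10 * 1.00
baseline_servers = servers_needed(target, per_server_throughput=1.00)
gen4_servers = servers_needed(target, per_server_throughput=2.00)

print(f"Baseline: {baseline_ppw:.6f} units/W")
print(f"4th Gen:  {gen4_ppw:.6f} units/W")
print(f"Throughput-per-watt improvement: {efficiency_gain:.0%}")
print(f"Servers for the same throughput: {baseline_servers} -> {gen4_servers}")
```

The same two functions can be reused with other SKUs or power limits by substituting measured throughput and power values.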
For additional information on Altair solvers, visit https://altair.com. For additional information on the latest AMD EPYC 9004 series CPUs, visit https://amd.com/epyc.
1. See article: https://www.altair.com/newsroom/articles/Breakthrough-Computing-Performance-with-Altair-and-3rd-Gen-AMD-EPYC-Processors-with-AMD-3D-V-Cache-Technology
2. Detailed system configs:
System 1: CPUs: 2 x AMD EPYC 9554 (64 cores/socket, 128 cores/node); Base Freq: 3.10 GHz; 256 MB L3; 1.5 TB (24x) Dual-Rank DDR5-4800 64 GB DIMMs, 1 DIMM per channel; 1 x 256 GB SATA (OS) | 1 x 1 TB NVMe (data); BIOS Version 1001C, SMT=off, Determinism=performance, NPS=4, TDP/PPT=400; RHEL 8.6; OS settings: Clear caches before every run, NUMA balancing 0, randomize_va_space 0.
System 2: CPUs: 2 x AMD EPYC 9374F (32 cores/socket, 64 cores/node); Base Freq: 3.85 GHz; 256 MB L3; 1.5 TB (24x) Dual-Rank DDR5-4800 64 GB DIMMs, 1 DIMM per channel; 1 x 256 GB SATA (OS) | 1 x 1 TB NVMe (data); BIOS Version 1001C, SMT=off, Determinism=performance, NPS=4, TDP/PPT=400; RHEL 8.6; OS settings: Clear caches before every run, NUMA balancing 0, randomize_va_space 0.
System 3: CPUs: 2 x AMD EPYC 75F3 (32 cores/socket, 64 cores/node); Base Freq: 2.95 GHz; 256 MB L3; 1 TB (16x) Dual-Rank DDR4-3200 64 GB DIMMs, 1 DIMM per channel; 1 x 256 GB SATA (OS) | 1 x 1 TB NVMe (data); BIOS Version 1009B, SMT=off, X2APIC=on, IOMMU=off, APBDIS=1, Fixed SOC P-state=0, Determinism=power, NPS=4, DF C-states=off, PIO, EPIO, TSME=off, PCIe 10 bit tag=on; RHEL 8.6; OS settings: Clear caches before every run, NUMA balancing 0, randomize_va_space 0.
All systems solver versions: AcuSolve 2021.2 with Intel(R) MPI Library for Linux* OS, Version 2018 Update 4, Build 20180823 (id: 18555); Feko 2022.1-19287 with Intel(R) MPI Library for Linux* OS, Version 2021.2, Build 20210302 (id: f4f7c92cd); OptiStruct 2022.1 with Intel(R) MPI Library, Version 2021.2, Build 20210302; Radioss 2022.1 with Intel(R) MPI Library, Version 2021.2, Build 20210302.
3. In the AMD EPYC™ architecture, a CCD refers to a core complex die. Each CCD supports a variable number of processor cores and cache depending on the processor SKU.
4. Max Boost for AMD EPYC™ processors is the maximum frequency achievable by any single core on the processor under normal operating conditions for server systems. AMD EPYC-18.
5. The baseline AMD EPYC™ 75F3 CPU delivers 1 performance unit per 280 watts, or 0.003571 performance units per watt. The AMD EPYC™ 9374F delivers 2.00 performance units per 400 watts, or 0.00500 performance units per watt. This is roughly a 40% improvement in power efficiency ((0.00500 / 0.003571) - 1) and a 100% improvement in server density.