
Partner Perspectives

Achieving Breakthrough Performance with Altair CAE Solvers on the Latest Intel® Xeon® Processors

By Altair | Intel

Finding the right processor for compute-intensive CAE workloads can be daunting. Engineering and IT managers need to consider many factors, including price, performance, core counts, memory bandwidth, and the degree to which simulations can be parallelized. 

This article presents a series of recent CFD and crash benchmarks conducted by Intel, comparing the latest Intel® Xeon® Platinum and Intel® Xeon® Max Series processors to previous generations running Altair standard benchmarks.

 

The Need for Speed

In engineering simulation, performance is critical. Using advanced CAE tools and computer simulation early in the design cycle, engineers can improve quality, optimize materials, improve manufacturability, and avoid downstream problems that can impact customer satisfaction or increase warranty costs. High-throughput simulation helps engineers perform more thorough analyses, improve time to market, and enhance competitiveness.

 

About the Benchmarks

To characterize the performance of servers powered by the latest Intel Xeon processors, we ran three different workloads on two widely used Altair solvers.  

  • Altair® AcuSolve®: A general-purpose CFD and thermal simulation application that helps engineers explore a full range of flow, heat transfer, turbulence, and material analysis problems.
  • Altair® Radioss®: A versatile, high-performance explicit finite element (FE) solver. For many organizations, Radioss is their go-to tool for predicting dynamic, transient-load effects to improve safety and design more robust products.

We ran dozens of tests involving three standard Altair CFD and structural models to evaluate the performance and scalability of different server and processor configurations.


  • Altair Headquarters (HQ) CFD simulation (AcuSolve): A large model comprising 62 million nodes and 218 million elements simulating turbulence effects common in architecture, engineering, and construction (AEC) aerodynamics. 
  • Chrysler Neon (Neon1M/80ms) crash simulation (Radioss): A crash simulation of a 1-million-element model over an 80-millisecond simulation time.1
  • Ford Taurus (t10M/8ms) crash simulation (Radioss): A crash simulation of a 10-million-element model over an 8-millisecond simulation time.2

 

Server Configurations Tested

The simulation models above were run across multiple dual-processor (2P) servers having different processor configurations. 

The first two server configurations tested were based on 3rd Generation Intel Xeon® Scalable Processors introduced in 2021. The second two used the latest 4th Generation Intel Xeon Scalable Processors announced in January 2023.

The server configurations tested included the following:3

For some tests, throughput was also measured in two- and four-node cluster configurations to evaluate scaling. In clustered configurations, nodes were connected via an HDR-200 (200 Gb/s) InfiniBand switch.

The Intel Xeon Max Series 9480 processor features 64 GB of high-bandwidth memory (HBM) per socket and also supports standard DDR5-4800 memory. HBM can be configured to operate in different modes. In these benchmarks, HBM was configured to act as a cache so that applications could transparently take advantage of it with no code changes. All the configurations tested used solid-state drives.

 

Intel® Xeon® Delivers Breakthrough Performance

The benchmark results are summarized in the charts below.4 The first chart compares the relative performance improvement for each processor compared to the baseline Intel Xeon Platinum 8380-based server. The second chart shows how the AcuSolve HQ model scaled in a cluster configuration depending on the processor and number of nodes.5

In the single node comparison running AcuSolve, the top-of-the-line Intel Xeon Max 9480 CPU with HBM delivered a throughput improvement of up to 2.35x versus the 3rd Generation 8380 processor. Similarly, the Intel Xeon Platinum 8480+ delivered a 1.52x performance uplift running the Radioss crash simulation with the Ford Taurus model.


The second chart shows that AcuSolve scales exceptionally well in clustered configurations. The model running across 448 cores in the four-node Intel Xeon 8480+ cluster delivered over 3.5x the throughput of a single 8480+-based node with 112 cores, demonstrating 88% scaling efficiency across four nodes.6
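The scaling figure above follows from the arithmetic in footnote 6. As a minimal sketch (the function and variable names are illustrative, not part of any Altair or Intel tool), the calculation can be written as:

```python
def scaling_efficiency(cluster_speedup: float,
                       single_node_speedup: float,
                       nodes: int) -> tuple[float, float]:
    """Return (speedup over one node, parallel efficiency) for a cluster.

    Both speedups are measured against the same baseline system.
    """
    relative = cluster_speedup / single_node_speedup
    return relative, relative / nodes


# Figures from footnote 6: both values are relative to the 2P Xeon Platinum 8380 node.
relative, efficiency = scaling_efficiency(
    cluster_speedup=4.82,      # four-node 8480+ cluster vs. baseline
    single_node_speedup=1.37,  # single 8480+ node vs. baseline
    nodes=4,
)
print(f"{relative:.2f}x over one node, {efficiency:.0%} scaling efficiency")
```

Efficiency here is simply the measured speedup divided by the ideal linear speedup for the node count, so a value of 100% would mean four nodes run exactly four times faster than one.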

In addition to the tests above, the AcuSolve model was run on a four-node cluster of 3rd Generation Intel Xeon Gold 6346 processors. On this cluster, with 32 cores per node, the HQ model ran in 5,200 seconds, or approximately 1 hour and 27 minutes. The same model ran on a single node with the latest Intel Xeon Max Series 9480 CPUs in 5,260 seconds, within one minute of the clustered configuration. This means that the four-node cluster of dual Intel Xeon Gold 6346-based servers can potentially be replaced by a single 9480-based node, reducing data center space requirements by up to 4x. The superior performance of the latest Intel Xeon Max Series 9480 CPUs translates into fewer racks, network drops, switches, and ancillary equipment, as well as lower management costs.

 

Significant Operational Savings

The latest Intel processors not only deliver better throughput; they are also more cost-effective to operate. For example, the four-node Intel Xeon 6346-based cluster consumes approximately 1,640 watts.7 This translates into ~14,366 kWh annually, costing about $1,437.8 A single server with two Intel Xeon Max 9480 processors delivers roughly the same throughput while consuming ~760 watts (~6,658 kWh annually) at a cost of about $666, a 54% reduction in power and associated carbon dioxide emissions. While this savings of about $770 per year (1,437 - 666 = 771) is modest for one cluster, an environment with 400 3rd Generation Intel-powered servers corresponds to 100 such four-node clusters, so annual power savings for servers alone amount to roughly $77,000. Considering cooling and ancillary costs, annual energy savings in this scenario can be as high as $100,000 per year.9
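The energy figures above can be reproduced with a few lines of arithmetic. The sketch below uses the power estimates and the USD 0.10 per kWh rate stated in the footnotes; the function name, constants, and the PUE value of 1.2 (the example used in footnote 9) are illustrative choices, and the results are unrounded, so they differ slightly from the rounded numbers in the text.

```python
HOURS_PER_YEAR = 24 * 365   # continuous operation, per footnote 8
RATE_USD_PER_KWH = 0.10     # average electricity rate, per footnote 8


def annual_energy(watts: float) -> tuple[float, float]:
    """Return (kWh per year, USD per year) for a constant power draw."""
    kwh = watts * HOURS_PER_YEAR / 1000
    return kwh, kwh * RATE_USD_PER_KWH


cluster_kwh, cluster_cost = annual_energy(1640)  # four-node Xeon Gold 6346 cluster
single_kwh, single_cost = annual_energy(760)     # one 2P Xeon Max 9480 node

saving = cluster_cost - single_cost              # per replaced four-node cluster
reduction = 1 - single_kwh / cluster_kwh         # fractional power reduction

# 400 older servers correspond to 100 four-node clusters.
fleet_saving = saving * (400 / 4)
# Assumed PUE of 1.2 scales server power to total facility power.
fleet_saving_with_cooling = fleet_saving * 1.2

print(f"cluster: {cluster_kwh:,.0f} kWh/yr, ${cluster_cost:,.0f}/yr")
print(f"single node: {single_kwh:,.0f} kWh/yr, ${single_cost:,.0f}/yr")
print(f"saving per cluster: ${saving:,.0f}/yr ({reduction:.0%} less power)")
print(f"fleet saving: ${fleet_saving:,.0f}/yr, "
      f"${fleet_saving_with_cooling:,.0f}/yr including cooling")
```

Changing the rate, the PUE, or the fleet size shows how sensitive the savings are to local electricity prices and data center efficiency.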

 

Key Architectural Features

While some performance gains are attributable to the 40% increase in core count (56 cores for the 9480/8480+ vs. 40 cores for the 8380), core counts alone do not explain the 2.35x performance gain. Several innovations in 4th Generation Intel Xeon® CPUs contribute to these superior results across the three benchmarks:

  • Integrated high-bandwidth memory (HBM2e) in Intel Xeon Max Series
  • Eight DDR5 memory channels (vs. eight DDR4 channels for 3rd Generation Intel® Xeon®)
  • CXL 1.1 connectivity and PCIe 5.0 support
  • UPI 2.0 for multi-socket scaling
  • Integrated acceleration engines
  • Instruction set extensions, including Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
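To see why core count alone does not account for the gain, it helps to divide the measured speedup by the core-count ratio. This is a back-of-the-envelope sketch that assumes throughput would otherwise scale linearly with cores, which real solvers rarely do exactly; the function name is illustrative.

```python
def per_core_uplift(speedup: float, new_cores: int, old_cores: int) -> float:
    """Speedup left over after dividing out the core-count ratio.

    Assumes ideal linear scaling with cores, so this is a rough estimate only.
    """
    return speedup / (new_cores / old_cores)


core_ratio = 56 / 40                        # 9480 vs. 8380 cores per socket
residual = per_core_uplift(2.35, 56, 40)    # AcuSolve HQ speedup reported above

print(f"core ratio: {core_ratio:.2f}x, remaining per-core uplift: {residual:.2f}x")
```

Under that assumption, a substantial share of the 2.35x gain is left unexplained by core count and is plausibly attributable to the memory and architectural features listed above.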

 

Altair Solvers: Optimized for Intel Processors

Intel and Altair have a long history of collaboration, and Altair software is designed for high-performance computing on Intel architecture. Altair and Intel engineers work closely to optimize solver codes using Intel compilers, tools, and libraries such as the Intel® oneAPI Math Kernel Library (oneMKL). Altair has supported Intel’s AVX-512 instructions in Radioss since 2018 to boost solver throughput using vectorized execution.10

As an example of this continuing partnership, Intel and Altair recently announced a project to leverage Intel’s oneAPI Base and HPC Toolkits for OpenRadioss development.11 This is significant because building on these oneAPI libraries allows applications to benefit automatically from silicon-based features in the latest Intel processors without code modifications. For customers, this means that software investments are preserved and that they can move quickly to new hardware platforms as their needs evolve.

 

Key Takeaways

The latest Intel Xeon Max Series and Intel Xeon Platinum CPUs deliver a significant performance boost and exceptional value for demanding structural and CAE simulations. They provide up to 2.35x higher throughput for AcuSolve CFD workloads and a 1.52x performance boost for Radioss-based crash simulations. Moreover, the latest Intel processors deliver significantly higher throughput per watt, enabling organizations to boost productivity while reducing their data center and carbon footprint. 

 

Learn More

To learn more about Altair engineering solvers, visit altair.com. To learn more about 4th Generation Intel Xeon Processors, visit intel.com/hpc.

 


 

1. The model files for the Chrysler Neon are available from Altair University.

2. The Taurus crash model can be downloaded from Altair University.

3. The actual benchmarks involved additional processor SKUs, but we’ve focused on these four for clarity.

4. Results for the Xeon Gold 6346 are intentionally omitted from this chart to avoid a misleading comparison because it has a lower number of cores (16) than the other processors.

5. The specific configurations tested were as follows:
Intel® Xeon® 8380: Test by Intel as of 09/28/2022. 1-node, 2x Intel® Xeon® 8380, HT ON, Turbo ON, Quad, Total Memory 256 GB, BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode 0xd000270, Rocky Linux 8.6, kernel version 4.18.0-372.19.1.el8_6.crt1.x86_64, Altair AcuSolve 2021R2. 
Intel® Xeon® 6346: Test by Intel as of 10/08/2022. 4-nodes connected via HDR-200, 2x Intel® Xeon® 6346, 16 cores, HT ON, Turbo ON, Quad, Total Memory 256 GB, BIOS Version SE5C6200.86B.0020.P23.2103261309, ucode 0xd000270, Rocky Linux 8.6, kernel version 4.18.0-372.19.1.el8_6.crt1.x86_64, Altair AcuSolve 2021R2.
Intel® Xeon® 9480 (CPU Max Series): Test by Intel as of 10/03/2022. 1-node, 2x Intel® Xeon® CPU Max Series, HT ON, Turbo ON, SNC4, Total Memory 128 GB (HBM2e at 3200 MHz), BIOS Version SE5C7411.86B.8424.D03.2208100444, ucode 2c000020, CentOS Stream 8, kernel version 5.19.0-rc6.0712.intel_next.1.x86_64+server, Altair AcuSolve 2021R2.

6. The four-node cluster of 2P Intel Platinum 8480+ servers ran 4.82x faster than the baseline Intel Xeon Platinum 8380-based node. The single-node 2P Intel Platinum 8480+ server ran 1.37x faster than the same baseline. Therefore, the four-node cluster ran (4.82 / 1.37) = 3.52x faster than the single-node 8480+ server. A performance uplift of 3.52x is 88% of the maximum theoretical uplift of 4x.

7. Based on Intel estimates.

8. Power consumption estimates provided by Intel. Calculations assume an average rate of USD 0.10 per kWh and continuous operation (24 x 365 = 8,760 hours per year).

9. Total costs including cooling will depend on the data center’s power usage effectiveness (PUE), a metric familiar to data center operators. For a data center with a PUE of 1.2, assuming $77K is spent to power servers annually, total costs would be 1.2 * 77K = $92.4K. For less efficient data centers, costs would be higher.

10. See the whitepaper Assuring Scalability: Altair Radioss™ delivers robust results quickly for crash-safe vehicle designs, May 2020.

11. See the announcement Intel and Altair Radioss open development to accelerate HPC.