High Performance Computing Systems and Applications
- Overview
High performance computing (HPC) systems are groups of computers that work together to solve computationally intensive tasks. HPC systems can be custom-built supercomputers or clusters of individual computers. HPC systems can be run on-premises, in the cloud, or as a hybrid of both.
HPC systems are built with high-performance processors, high-speed memory and storage, and other advanced components. In a cluster, each computer is called a node, and each node handles a share of the overall workload. The nodes are connected through a network and designed to function as a single, powerful computing resource.
Supercomputing is one of the most well-known HPC solutions. In a supercomputer, thousands of compute nodes work together to complete one or more tasks, which is called parallel processing. This is similar to having thousands of PCs networked together, combining compute power to complete tasks faster.
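As a concrete (if tiny) illustration of parallel processing, the sketch below splits a sum across several cooperating processes and combines the partial results, much as cluster nodes combine their work. It is a minimal sketch assuming the mpi4py package and an MPI implementation such as Open MPI are installed; the process count and problem size are arbitrary.

```python
# parallel_sum.py -- a minimal sketch of parallel processing.
# Assumes mpi4py and an MPI implementation (e.g., Open MPI) are installed.
# Run with: mpirun -n 4 python parallel_sum.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID (0 .. size-1)
size = comm.Get_size()   # total number of cooperating processes

# Each process sums a disjoint slice of 0 .. n-1.
n = 10_000_000
local_sum = sum(range(rank, n, size))

# Combine the partial results on rank 0, much as cluster nodes
# combine their work into a single answer.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print(f"sum of 0..{n - 1} = {total}")
```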
Please refer to the following for more information:
- Wikipedia: High Performance Computing
- Advanced AI Solutions and High Performance Architecture
Artificial intelligence (AI) solutions require purpose-built architecture to enable rapid model development, training, tuning, and real-time intelligent interaction.
High Performance Architecture (HPA) is critical at every stage of the AI workflow, from model development to deployment. Without the right foundation, companies will struggle to realize the full value of their investments in AI applications.
HPA strategically integrates high-performance computing workflows, AI/ML development workflows, and core IT infrastructure elements into a single architectural framework designed to meet the intense data requirements of advanced AI solutions.
- High-Performance Computing (HPC): Training and running modern AI engines requires the right balance of combined CPU and GPU processing power (a device-selection sketch follows this list).
- High-performance storage: Training AI/ML models requires the ability to reliably store, clean, and scan large amounts of data.
- High-performance networks: AI/ML applications require extremely high-bandwidth and low-latency network connections.
- Automation, orchestration, software and applications: HPA requires the right mix of data science tools and infrastructure optimization.
- Policy and governance: Data must be protected and easily accessible to systems and users.
- Talent and skills: Building AI models and maintaining HPA infrastructure requires the right mix of experts.
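As a minimal sketch of the CPU/GPU mix named in the first bullet, the snippet below prepares data on the CPU and runs the heavy matrix math on a GPU when one is present. It assumes PyTorch is installed; the matrix size is arbitrary.

```python
# A minimal sketch of combined CPU/GPU use, assuming PyTorch is installed.
import torch

# Pick the fastest available device; fall back to the CPU on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Data preparation (loading, cleaning, batching) typically stays on the CPU...
x = torch.randn(4096, 4096)

# ...while the heavy matrix math runs on the accelerator when present.
x = x.to(device)
y = x @ x.T
print(f"computed a {tuple(y.shape)} product on {device}")
```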
- High-Performance Computing (HPC) vs. Supercomputers
High-performance computing (HPC) and supercomputers are both types of computing that use advanced techniques and systems to solve complex problems.
HPC is sometimes used as a synonym for supercomputing. However, in other contexts, "supercomputer" is used to refer to a more powerful subset of "high-performance computers".
Here are some differences between HPC and supercomputers:
- Architecture: HPC systems are typically more flexible and scalable than supercomputers, since a cluster can grow or shrink node by node.
- Hardware: Supercomputers are purpose-built machines composed of thousands of tightly coupled compute nodes, while HPC systems are built from massive clusters of commodity servers, each with one or more central processing units (CPUs).
- Security: HPC environments are harder to secure. Because deployments can span everything from secure on-site servers to publicly managed cloud instances and private cloud deployments, ensuring security across the HPC continuum is inherently challenging.
HPC systems are designed to handle large amounts of data, perform intricate simulations, and deliver results at high speeds. They are often measured in terms of floating-point operations per second (FLOPS).
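The same FLOPS metric can be estimated on any machine with a quick benchmark. The sketch below, assuming NumPy is installed, times a dense matrix multiply (about 2n^3 floating-point operations) and reports the achieved rate; leading supercomputers run many orders of magnitude beyond a typical desktop.

```python
# A rough FLOPS estimate from a dense matrix multiply, assuming NumPy.
# An n x n matmul performs about 2 * n**3 floating-point operations.
import time

import numpy as np

n = 2048
a = np.random.rand(n, n)
b = np.random.rand(n, n)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

flops = 2 * n**3 / elapsed
print(f"about {flops / 1e9:.1f} GFLOPS on this machine")
```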
Supercomputers are a class of computing machines that are built to handle tasks of unprecedented complexity. Supercomputing clusters are used by AI researchers to test model designs, hyperparameters, and datasets. They can improve AI models and advance AI capabilities.
HPC systems are used for tasks such as solving the complex partial differential equations that express the physics of weather. Supercomputers are used for tasks such as real-time AI applications, including autonomous cars and robotics.
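Weather codes solve far larger, coupled systems across many nodes, but the flavor of PDE work scales down to a toy example. The sketch below, assuming NumPy is installed, integrates the 1D heat equation with an explicit finite-difference scheme; the grid size and constants are illustrative.

```python
# A toy solver for the 1D heat equation du/dt = alpha * d2u/dx2, assuming NumPy.
# Real weather models solve far larger, coupled PDE systems across many nodes.
import numpy as np

nx, nt = 100, 500                 # grid points, time steps
alpha, dx, dt = 0.01, 1.0, 1.0    # diffusivity, spacing, step (stable: alpha*dt/dx**2 <= 0.5)

u = np.zeros(nx)
u[nx // 2] = 100.0                # a hot spot in the middle of the domain

for _ in range(nt):
    # Explicit finite-difference update of the interior points.
    u[1:-1] += alpha * dt / dx**2 * (u[2:] - 2 * u[1:-1] + u[:-2])

print(f"peak temperature after {nt} steps: {u.max():.2f}")
```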
- Why Do We Need HPC?
Specialized computing resources are necessary to help researchers and scientists extract insights from massive data sets. Scientists need HPC because they hit a tipping point. At some point in research, there is a need to:
- Expand the current study area (regional → national → global)
- Integrate new data
- Increase model resolution
At that point, processing on a desktop or a single server no longer works. Typical computational barriers include:
- Time – processing on local systems is too slow or not feasible
- CPU capacity – a single machine can run only one model at a time (see the sketch after this list)
- Tooling – the need to develop, implement, and disseminate state-of-the-art techniques and tools so that models are applied more effectively to today's decision-making
- Management of computer systems – science groups do not want to purchase and manage local computer systems; they want to focus on science
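The CPU-capacity barrier above is exactly what parallel resources remove. The sketch below runs several hypothetical model configurations at once on local cores using only the Python standard library; `run_model` is a stand-in for a real simulation, and on a cluster each configuration would typically get its own node.

```python
# Running several hypothetical model configurations at once on local cores,
# using only the Python standard library. On an HPC cluster, each
# configuration would typically run on its own node instead.
from concurrent.futures import ProcessPoolExecutor

def run_model(resolution: int) -> tuple[int, int]:
    """Stand-in for a real simulation whose cost grows with resolution."""
    checksum = sum(i * i for i in range(resolution * 100_000)) % 1_000
    return resolution, checksum

if __name__ == "__main__":
    resolutions = [1, 2, 4, 8]    # e.g., regional -> national -> global grids
    with ProcessPoolExecutor() as pool:
        for res, value in pool.map(run_model, resolutions):
            print(f"resolution {res}: checksum {value}")
```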
- Exascale Computing
A supercomputer can contain hundreds of thousands of processor cores and can require an entire building to house and cool it, not to mention millions of dollars to build and maintain. Despite these challenges, more and more such machines are coming online as the U.S. and China develop new "exascale" supercomputers that promise roughly a fivefold performance boost over current leading systems.
Exascale computing is the next milestone in supercomputer development. Exascale computers, capable of at least one exaFLOPS (10^18 floating-point operations per second) and thus faster than today's most powerful supercomputers, will give scientists a new tool for solving some of the biggest challenges facing our world, from climate change to understanding cancer to designing new materials.
Exascale computers are digital computers, broadly similar to today's computers and supercomputers, but with more powerful hardware. This makes them different from quantum computers, which represent an entirely new approach to building computers suitable for specific types of problems.
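To put the scale in concrete terms, the back-of-the-envelope comparison below works out how long a laptop would take to match one second of exascale work. It assumes, purely for illustration, a laptop sustaining about 100 GFLOPS.

```python
# Back-of-the-envelope scale comparison. The 100 GFLOPS laptop figure
# is an illustrative assumption, not a measurement.
EXAFLOPS = 1e18        # operations per second of an exascale machine
LAPTOP_FLOPS = 100e9   # assumed sustained rate of a typical laptop

seconds = EXAFLOPS / LAPTOP_FLOPS   # laptop time for one exascale-second of work
print(f"{seconds:.0e} s, i.e. about {seconds / 86_400:.0f} days")
```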
- Memory-Centric Computing
Driven by the development of AI and 5G networks, the ICT industry is approaching a new turning point. Intelligent systems have entered our daily lives, and the world we live in is becoming more and more intelligent. However, many problems remain to be solved. Artificial intelligence computing systems will play a pivotal role in the coming "intelligent society," which is why a great deal of research across related industries is aimed at greatly improving their performance.
While various approaches are being developed to support an intelligent society, data transfer between processors and memory still limits system performance, and the power consumed as bandwidth increases remains an issue. The memory industry is developing high-bandwidth memory that meets current system requirements and improves performance, but now is the time to develop more innovative memory technologies that can meet the demands of an intelligent society.
To achieve this, we need not only to improve memory itself but also to combine computation with memory so as to minimize unnecessary data movement. In practice, this means using system-in-package technology to place memory close to the processor. Because this also requires changes to the architecture of computing systems, cross-industry collaboration is essential.
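The processor-memory gap described above is visible on any machine by timing a memory-bound operation. The sketch below, assuming NumPy is installed, estimates effective memory bandwidth from a large array copy; the array size is arbitrary, and the point is that moving bytes, not arithmetic, dominates the runtime, which is precisely the bottleneck memory-centric designs attack.

```python
# Estimating effective memory bandwidth from a memory-bound copy, assuming NumPy.
import time

import numpy as np

n = 100_000_000             # ~0.8 GB of float64 per array
src = np.ones(n)
dst = np.empty_like(src)

start = time.perf_counter()
np.copyto(dst, src)         # reads n*8 bytes and writes n*8 bytes
elapsed = time.perf_counter() - start

gbytes = 2 * n * 8 / 1e9    # bytes read plus bytes written
print(f"effective bandwidth: {gbytes / elapsed:.1f} GB/s")
```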
- Applications of HPC
High Performance Computing (HPC) most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.
HPC is the use of parallel processing and supercomputers to run advanced, complex application programs. The discipline spans both system administration and parallel computational methods, and for decades HPC and supercomputing were intrinsically linked.
HPC systems are used in a variety of applications in science and engineering, including:
- Molecular dynamics simulation (a toy sketch follows this list)
- Computational fluid dynamics
- Climate modeling
- Computational chemistry
- Structural mechanics and engineering
- Electromagnetic simulation
- Seismic imaging and analysis
- Materials science and engineering
- Astrophysical simulation
- Machine learning and data analysis
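As a concrete taste of the first application, the sketch below takes velocity Verlet steps for a 1D chain of particles joined by harmonic springs, assuming NumPy is installed. All constants are illustrative; production MD codes such as GROMACS or LAMMPS distribute far richer force calculations across many nodes.

```python
# A toy molecular dynamics sketch, assuming NumPy: velocity Verlet
# integration of a 1D chain of particles joined by harmonic springs.
import numpy as np

n, k, dt, steps = 16, 1.0, 0.01, 1000   # particles, spring constant, time step, steps
x = np.arange(n, dtype=float)           # particles at rest spacing 1.0
x[0] -= 0.5                             # perturb one particle to set the chain in motion
v = np.zeros(n)

def forces(x):
    """Harmonic forces between neighboring particles (free ends, rest length 1.0)."""
    f = np.zeros_like(x)
    stretch = x[1:] - x[:-1] - 1.0
    f[:-1] += k * stretch               # each spring pulls its left particle right...
    f[1:] -= k * stretch                # ...and its right particle left when stretched
    return f

f = forces(x)
for _ in range(steps):                  # velocity Verlet, unit masses
    v += 0.5 * dt * f
    x += dt * v
    f = forces(x)
    v += 0.5 * dt * f

print(f"kinetic energy after {steps} steps: {0.5 * np.sum(v**2):.4f}")
```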
[More to come ...]