
HPC Infrastructure

[Summit Supercomputer, Oak Ridge National Laboratory, U.S.A.]
 


- Overview

High-performance computing (HPC) infrastructure is a method of processing large amounts of data at high speeds using multiple computers and storage devices. One of the best-known types of HPC solutions is the supercomputer. 

HPC is a technology that uses clusters of powerful processors working in parallel to process massive, multidimensional data sets and solve complex problems at extremely high speeds. HPC solves some of today's most complex computing problems in real time.

HPC clusters are made up of hundreds or thousands of computers, called nodes, that are connected through a network. Each node is responsible for a different task. The nodes use either high-performance multi-core CPUs or GPUs (graphics processing units). 

HPC can be run on-premises, in the cloud, or as a hybrid of both. HPC is used for solving complex problems in engineering, science, and business. For example, HPC can be used to simulate car crashes for structural design, molecular interaction for new drug design, or airflow over automobiles or airplanes. 

Some challenges of HPC infrastructure include: accelerating workloads while keeping costs down, protecting sensitive data, and driving smarter decisions from data.

 

- The Challenges of Building an HPC Infrastructure

High-performance computing (HPC) arose decades ago as an affordable and scalable method for tackling difficult math problems. Today, many organizations turn to HPC to approach complex computing tasks such as financial risk modeling, government resource tracking, spacecraft flight analysis and many other "big data" projects.

In today’s world, it’s vital to have access to HPC resources that can tackle whatever you throw at them. Simulation, data storage and analysis, artificial intelligence (AI), and machine learning (ML) technology all demand robust, scalable compute power. 

HPC infrastructures are complex by nature. With hundreds or even thousands of nodes running in parallel, architectural and management complexities abound. As technologies such as AI and the Internet of Things (IoT) become the norm, organizations across all industries are turning to HPC solutions to gain or maintain their competitive advantage. 

When building an HPC infrastructure, there are nearly endless options for the required compute, networking, and storage components. HPC capabilities include: large-scale HPC systems, high-speed networking, and multi-petabyte archival mass storage systems.

The challenges that many organizations struggle with are threefold: 

  • How do you choose the components that best meet your business and budget requirements?
  • How do you integrate the components into a solution that works seamlessly?
  • How do you make sure that your solution will be able to accommodate changes in workloads and storage requirements over time?

 

- HPC Architecture

In HPC architecture, a group of computers (nodes) collaborate to perform shared tasks. Each node in the structure accepts and processes tasks and computations independently. Nodes coordinate and synchronize tasks, ultimately producing a combined result. 
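
To make this pattern concrete, here is a minimal single-machine sketch of the scatter/gather workflow described above, using Python's multiprocessing pool as a stand-in for cluster nodes. The worker count and the square-summing workload are illustrative assumptions, not part of any particular HPC stack.

  # Scatter/gather sketch: worker processes stand in for cluster nodes.
  from multiprocessing import Pool

  def process_chunk(chunk):
      # Each "node" processes its piece of the data independently.
      return sum(x * x for x in chunk)  # illustrative workload

  if __name__ == "__main__":
      data = list(range(1_000_000))
      n_workers = 4  # stand-ins for cluster nodes

      # Scatter: split the data set into one chunk per worker.
      step = len(data) // n_workers
      chunks = [data[i * step:(i + 1) * step] for i in range(n_workers)]

      # Each worker computes its partial result in parallel.
      with Pool(n_workers) as pool:
          partials = pool.map(process_chunk, chunks)

      # Gather: combine the partial results into one answer.
      print(sum(partials))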

HPC architecture has mandatory and optional components. The mandatory components are compute, storage, and network. The optional components are GPU-accelerated systems, data management software, and InfiniBand switches.

At the hardware level, the three main components of an HPC system are the processor, the accelerator, and the networking connectivity. 

High-bandwidth memory is another critical consideration. A successful HPC architecture must have fast and reliable networking, whether for ingesting external data, moving data between computing resources, or transferring data to or from storage resources. 

Storage in an HPC system must likewise be fast enough to keep pace with HPC workloads.

 

- To Build an HPC Architecture

HPC solutions have three main components: compute, network, and storage. To build an HPC architecture, compute servers are networked together into a cluster of hundreds or thousands of servers. Each server is called a node. The nodes in each cluster work in parallel with each other, boosting processing speed to deliver high-performance computing. 

Software programs and algorithms are run simultaneously on the servers in the cluster. The cluster is networked to the data storage to capture the output. Together, these components operate seamlessly to complete a diverse set of tasks. 
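
On a real cluster, this "run simultaneously on the servers" pattern is typically expressed with a message-passing library. Below is a minimal sketch using MPI through mpi4py; the choice of mpi4py, the rank count, and the square-summing workload are assumptions for illustration, since the text does not name any specific middleware.

  # MPI sketch: every node (rank) computes a partial result in parallel,
  # and rank 0 collects the combined output.
  # Launch across the cluster with e.g.: mpirun -np 4 python partial_sums.py
  from mpi4py import MPI

  comm = MPI.COMM_WORLD
  rank = comm.Get_rank()  # this node's ID within the job
  size = comm.Get_size()  # total number of nodes in the job

  # Each node works on its own interleaved slice of the problem.
  partial = sum(x * x for x in range(rank, 1_000_000, size))

  # Reduce the partial results to a single total on rank 0.
  total = comm.reduce(partial, op=MPI.SUM, root=0)
  if rank == 0:
      print(total)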

To operate at maximum performance, each component must keep pace with the others. For example, the storage component must be able to feed and ingest data to and from the compute servers as quickly as it is processed. Likewise, the networking components must be able to support the high-speed transportation of data between compute servers and the data storage. If one component cannot keep up with the rest, the performance of the entire HPC infrastructure suffers. 
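
A back-of-envelope check makes the "keep pace" requirement concrete; all figures below are made-up illustrative numbers, not vendor specifications.

  # Balance check: storage must sustain the aggregate rate at which the
  # compute nodes consume data. All figures are illustrative assumptions.
  nodes = 100
  per_node_ingest_gbs = 2.0      # GB/s each node reads while computing
  storage_bandwidth_gbs = 150.0  # aggregate GB/s the storage tier delivers

  required_gbs = nodes * per_node_ingest_gbs  # 200 GB/s aggregate demand
  if storage_bandwidth_gbs < required_gbs:
      # The slowest component sets the pace of the whole cluster.
      utilization = storage_bandwidth_gbs / required_gbs
      print(f"Storage-bound: compute runs at {utilization:.0%} of capacity")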

 

[The Château de Saumur, France]

- HPC in the Data Center

HPC combines hardware, software, systems management and data center facilities to support a large array of interconnected computers working cooperatively to perform a shared task too complex for a single computer to complete alone. 

Some businesses might seek to lease or purchase their HPC, and other businesses might opt to build an HPC infrastructure within their own data centers. 

Distributed HPC architecture poses some tradeoffs for organizations. The most direct benefits include scalability and cost management. Frameworks like Hadoop can function on just a single server, but an organization can also scale them out to thousands of servers. 

This enables businesses to build an HPC infrastructure that meets their current and future needs using readily available and less expensive off-the-shelf computers. Hadoop is also fault-tolerant: it can detect and separate failed systems from the cluster, redirecting those failed jobs to available systems.
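
As a concrete illustration of the Hadoop model just described, here is the classic word-count job written as a single Hadoop Streaming script. The file paths and jar name in the usage comment are illustrative assumptions; the point is that each map task handles one input split, so if a node fails mid-job, the framework simply reruns that split on another node.

  # wordcount.py: classic Hadoop Streaming word count, runnable as either
  # the map or the reduce step. Hadoop feeds each script lines on stdin and
  # sorts the mapper output by key before the reducer sees it.
  # Usage (illustrative paths and jar name):
  #   hadoop jar hadoop-streaming.jar \
  #     -input /data/books -output /data/counts \
  #     -mapper "python wordcount.py map" -reducer "python wordcount.py reduce"
  import sys

  def mapper():
      # Emit one "word<TAB>1" line per word.
      for line in sys.stdin:
          for word in line.split():
              print(f"{word}\t1")

  def reducer():
      # Input arrives sorted by word, so counts for a word are contiguous.
      current, count = None, 0
      for line in sys.stdin:
          if not line.strip():
              continue
          word, value = line.rsplit("\t", 1)
          if word != current:
              if current is not None:
                  print(f"{current}\t{count}")
              current, count = word, 0
          count += int(value)
      if current is not None:
          print(f"{current}\t{count}")

  if __name__ == "__main__":
      mapper() if sys.argv[1] == "map" else reducer()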

Building an HPC cluster is technically straightforward, but HPC deployments can present business challenges. Even with the ability to manage, scale, and add nodes over time, the cost of procuring, deploying, operating, and maintaining dozens, hundreds, or even thousands of servers, along with the networking infrastructure to support them, can become a substantial financial investment. 

Many businesses also have limited HPC needs and can struggle to keep an HPC cluster busy; the money and training a business invests in HPC pay off only if the deployment stays occupied with worthwhile business tasks.

 

- Top Considerations for HPC Infrastructure in the Data Center

HPC is a combination of hardware, software, systems management, and data center facilities that enables a group of computers to work together to complete a complex task. HPC can be a custom-built supercomputer or a cluster of individual computers. 

Some businesses may choose to lease or purchase their HPC, while others may build an HPC infrastructure within their own data centers.

Here are some things to consider when deciding whether HPC is right for your business:

  • Finance: When acquiring new hardware, you can buy outright, use hire purchase, or lease.
  • HPC workload: A typical HPC workload can place significant demands on the compute hardware and infrastructure that supports it.
  • Application requirements: Your application may require a low latency network or access to high performance storage.


Some benefits of leasing IT equipment include: access to the latest technology, predictable monthly expenses, no upfront costs, and keeping up with competitors. 

However, leasing can cost more in the long run, and you may have to pay even if you don't use the technology.

 
 
 

[More to come ...]

 

 
