HPC Technology Trends
- Overview
High-Performance Computing (HPC) is the aggregation of computing power. This is the functional premise of HPC solutions: "the aggregation of multiple compute resources into an interconnected whole capable of tackling bigger problems in smaller amounts of time." While all computing aggregates resources to some degree, HPC aggregates many different computers and systems to tackle a single problem.
HPC is undergoing a period of rapid evolution involving changes in computational architectures and the types of problems that are being attacked. Aside from the traditional types of “hard” computational problems, there is now considerably more emphasis on “system-scale” simulation, data-intensive problems, and large-scale artificial intelligence/machine learning applications.
- Three Key Components of HPC Solutions
HPC integrates system management (including network and security knowledge) and parallel programming into a multidisciplinary field that combines digital electronics, computer architecture, system software, programming languages, algorithms, and computing technologies.
HPC technology comprises the tools and systems used to build and run high-performance computing systems. In recent years, HPC has shifted from monolithic supercomputing toward computing clusters and grids.
Because clusters and grids depend heavily on networking, HPC deployments increasingly favor collapsed network backbones: collapsed backbone architectures are easier to troubleshoot, and upgrades can be applied to a single core router rather than to multiple routers.
There are three key components of high-performance computing (HPC) solutions: compute, network, and storage. Supercomputing is a form of HPC in which calculations are carried out by a single very powerful computer, a supercomputer, reducing overall time to solution. Supercomputers are made up of processor cores, memory, interconnects, and I/O systems.
To build an HPC architecture, multiple compute servers are networked together to form a cluster. Algorithms and software programs execute concurrently across the servers in the cluster, and the cluster is networked to data storage to capture the output.
These components function together to complete different tasks. To achieve maximum efficiency, each component must keep pace with the others; otherwise the performance of the entire HPC infrastructure deteriorates.
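As a minimal sketch of this division of work, assuming an MPI library and the mpi4py package are available on the cluster (both are assumptions, not requirements stated above), the script below splits a summation across however many processes the job is launched with and combines the partial results on one of them:

```python
# sum_cluster.py -- a minimal sketch; assumes an MPI library and mpi4py are installed.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the job
size = comm.Get_size()   # total number of processes across the cluster

N = 100_000_000
chunk = N // size

# Each process sums its own slice of the range concurrently.
start = rank * chunk
end = N if rank == size - 1 else start + chunk
partial = sum(range(start, end))

# The interconnect combines the partial results on rank 0, which could
# then write the final answer out to the cluster's shared storage.
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    print("total:", total)
```

Launched with, for example, `mpirun -n 16 python sum_cluster.py`, each of the 16 processes handles one sixteenth of the work.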
- HPC Clusters
Generally speaking, HPC is the use of large and powerful computers designed to efficiently handle mathematically intensive tasks. Although HPC "supercomputers" exist, such systems remain beyond the reach of all but the largest enterprises.
Instead, most businesses implement HPC as a group of relatively inexpensive, tightly integrated computers, or nodes, configured to operate as a cluster. Such clusters use a distributed processing software framework -- such as Hadoop MapReduce -- to tackle complex computing problems by dividing and distributing computing tasks across several networked computers. Each computer within the cluster works on its own part of the problem or data set, and the software framework then reintegrates the partial results into a complete solution.
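The map/shuffle/reduce pattern behind such frameworks can be sketched in plain Python. The word-count example below only illustrates the pattern on a single machine; it is not the Hadoop API:

```python
from collections import defaultdict

documents = ["the quick brown fox", "the lazy dog", "the quick dog"]

# Map: each worker turns its share of the input into (key, value) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: pairs with the same key are routed to the same reducer.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: each reducer combines the values for its keys.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)   # e.g. {'the': 3, 'quick': 2, ...}
```

In a real cluster the map and reduce steps run on different nodes, and the framework handles the data movement between them.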
- Accessing Software on HPC Systems
HPC is the ability to process data and perform complex calculations at high speeds. To put it into perspective, a laptop or desktop with a 3 GHz processor can perform around 3 billion calculations per second. While that is much faster than any human can achieve, it pales in comparison to HPC solutions that can perform quadrillions of calculations per second.
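A rough back-of-the-envelope comparison makes that gap concrete; the rates below are illustrative round numbers, not benchmarks of any particular machine:

```python
ops = 1e15           # a quadrillion operations: one second of work at petascale
laptop_rate = 3e9    # ~3 billion operations per second for a 3 GHz processor
hpc_rate = 1e15      # an HPC system sustaining a quadrillion operations per second

print(ops / laptop_rate / 86_400)  # ~3.9 days on the laptop
print(ops / hpc_rate)              # 1.0 second on the HPC system
```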
One of the best-known types of HPC solutions is the supercomputer. A supercomputer contains thousands of compute nodes that work together to complete one or more tasks. This is called parallel processing. It’s similar to having thousands of PCs networked together, combining compute power to complete tasks faster.
Because HPC systems serve many users with different requirements, they usually have multiple versions of frequently used software packages installed. Since it is not easy to install and use many versions of a package side by side, these systems provide environment modules that allow users to configure the software environment with the specific versions they require.
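On most clusters this is invoked from the shell as `module load <name>/<version>`. Conceptually, a module file simply edits the environment so that the requested version is found first; the Python sketch below illustrates that idea with hypothetical package names and install paths, and is not the implementation of Environment Modules or Lmod:

```python
import os

# Hypothetical install prefixes for two versions of the same compiler suite.
SOFTWARE = {
    "gcc/9.4.0":  "/opt/apps/gcc/9.4.0",
    "gcc/12.2.0": "/opt/apps/gcc/12.2.0",
}

def load(module_name):
    """Prepend the chosen version's directories to the search paths --
    essentially what loading a module does for the current session."""
    prefix = SOFTWARE[module_name]
    os.environ["PATH"] = prefix + "/bin:" + os.environ.get("PATH", "")
    os.environ["LD_LIBRARY_PATH"] = prefix + "/lib:" + os.environ.get("LD_LIBRARY_PATH", "")

load("gcc/12.2.0")                        # select the newer toolchain
print(os.environ["PATH"].split(":")[0])   # /opt/apps/gcc/12.2.0/bin
```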
- Supercomputer Infrastructure
Unlike traditional computers, supercomputers use more than one central processing unit (CPU). These CPUs are grouped into compute nodes, comprising a processor or a group of processors - symmetric multiprocessing (SMP) - and a memory block. At scale, a supercomputer can contain tens of thousands of nodes. With interconnect communication capabilities, these nodes can collaborate on solving a specific problem.
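Within a single SMP node the cores share memory and can divide work without any network traffic at all. The sketch below uses Python's standard multiprocessing module to spread a stand-in numerical kernel across the cores of one node; the workload itself is invented for illustration:

```python
from multiprocessing import Pool
import math

def kernel(n):
    # Stand-in for a numerically intensive routine run on one core.
    return sum(math.sqrt(i) for i in range(n))

if __name__ == "__main__":
    work = [2_000_000] * 8      # eight independent pieces of work
    with Pool() as pool:        # one worker per core on the node by default
        results = pool.map(kernel, work)
    print(sum(results))
```

Communication between nodes, by contrast, has to cross the interconnect, as in the MPI sketch earlier.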
Nodes also use interconnects to communicate with I/O systems, such as data storage and networking. Note that, because of modern supercomputers' power consumption, data centers require substantial cooling systems and suitable facilities to house them.
The use of several CPUs to achieve high computational rates is necessitated by the physical limits of circuit technology. Electronic signals cannot travel faster than the speed of light, which thus constitutes a fundamental speed limit for signal transmission and circuit switching.
This limit has almost been reached, owing to miniaturization of circuit components, dramatic reduction in the length of wires connecting circuit boards, and innovation in cooling techniques (e.g., in various supercomputer systems, processor and memory circuits are immersed in a cryogenic fluid to achieve the low temperatures at which they operate fastest).
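A one-line calculation shows how tight this limit already is: in a single cycle of a 3 GHz clock, a signal travelling at the speed of light covers only about 10 cm, which is why shorter wires and denser packaging translate directly into higher usable clock rates.

```python
c = 3.0e8          # speed of light, metres per second
clock = 3.0e9      # a 3 GHz clock, cycles per second
print(c / clock)   # 0.1 m: the farthest a signal can travel in one cycle
```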
Rapid retrieval of stored data and instructions is required to support the extremely high computational speed of CPUs. Therefore, most supercomputers have a very large storage capacity, as well as a very fast input/output capability.
- Supercomputers and AI
Although supercomputers have long been a necessity in fields such as physics and space science, the expanded use of artificial intelligence and machine learning has prompted a surge in demand for supercomputers capable of performing quadrillions of computations per second. The next generation of supercomputers, exascale systems capable of a quintillion calculations per second, is already improving efficiency in these areas.
Supercomputers, in other words machines with accelerated hardware, are well suited to increasing the speed of artificial intelligence systems. Thanks to their greater pace and capacity, AI systems can be trained faster and on larger, more detailed, and deeper training sets.
- Supercomputers and Software Defined Networking
Supercomputers have long been characterized by their closed and proprietary interconnects. Software Defined Networking (SDN) allows for the programmatic flow management of network traffic.
The introduction of SDN into the High Performance Computing (HPC) world permits scientists to use a high-speed, simplified model for accessing the immense computing power of supercomputers. As ever-larger experimental and observational sites come online, scientists need not only to store data, but to compute on that data in near real-time.
SDN provides an avenue for the data from these experiments to flow directly into the supercomputer, without having to be staged first at an intermediary site, thus reducing the time to computation for that data. The newly gleaned results can then be used to guide the experiments in near real-time, allowing for more accurate data.
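As a sketch of what programmatic flow management can look like in practice, the snippet below pushes a flow rule to an SDN controller's northbound REST interface. The endpoint URL, JSON schema, and addresses are hypothetical stand-ins; real controllers such as ONOS or OpenDaylight expose similar but not identical APIs:

```python
import json
import urllib.request

# Hypothetical controller endpoint and rule format -- real controllers differ.
CONTROLLER_URL = "http://sdn-controller.example.org:8181/flows"

flow_rule = {
    "priority": 100,
    "match": {
        "ipv4_src": "198.51.100.0/24",   # the experimental facility's subnet
        "ipv4_dst": "203.0.113.10",      # the supercomputer's data-transfer node
    },
    "actions": [{"output": "port-7"}],   # steer traffic onto the high-speed path
}

request = urllib.request.Request(
    CONTROLLER_URL,
    data=json.dumps(flow_rule).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(response.status)   # 200/201 if the controller accepted the rule
```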
[More to come ...]