Cluster (group of computers)

Load Sharing Clusters

The principle of their operation is based on distributing requests through one or more input nodes, which redirect them for processing to the remaining computing nodes. The initial goal of such a cluster is performance; however, these systems often also employ techniques that improve reliability. Structures of this kind are called server farms. The software can be either commercial (OpenVMS, MOSIX, Platform LSF HPC, Solaris Cluster, Moab Cluster Suite, Maui Cluster Scheduler) or free (OpenMosix, Sun Grid Engine, Linux Virtual Server).

Computing clusters

Clusters are used for computing purposes, particularly in scientific research. For computing clusters, the significant indicators are high floating-point processor performance (flops) and low latency of the interconnecting network; less significant is the speed of I/O operations, which matters more for databases and web services. Computing clusters make it possible to reduce calculation time compared to a single computer by dividing the task into parallel branches that exchange data over the interconnecting network. One typical configuration is a collection of computers assembled from commonly available components, running the Linux operating system, and connected by Ethernet, Myrinet, InfiniBand, or other relatively inexpensive networks. Such a system is usually called a Beowulf cluster. High-performance clusters are often referred to by the abbreviation HPC (High-Performance Computing) clusters. A list of the most powerful high-performance computers can be found in the TOP500 world ranking; Russia also maintains a rating of the most powerful computers in the CIS.

Distributed computing systems (grid)

Such systems are not usually considered clusters, but their principles are largely similar to cluster technology. They are also called grid systems. The main difference is the low availability of each node, that is, the impossibility of guaranteeing its operation at any given moment (nodes join and leave during operation), so the task must be divided into a number of mutually independent processes. Unlike a cluster, such a system does not resemble a single computer; it serves instead as a simplified means of distributing computations. The instability of the configuration is compensated by the large number of nodes.

Cluster of servers organized programmatically

Cluster systems occupy a worthy place in the list of the fastest computers, while costing significantly less than traditional supercomputers. As of July 2008, the SGI Altix ICE 8200 cluster (Chippewa Falls, Wisconsin, USA) held 7th place in the TOP500 ranking.

A relatively cheap alternative to supercomputers is offered by clusters based on the Beowulf concept, which are built from ordinary inexpensive computers and free software. One practical example of such a system is the Stone Soupercomputer at Oak Ridge National Laboratory (Tennessee, USA, 1997).

The largest cluster owned by a private individual (1,000 processors) was built by John Koza.

History

The history of cluster creation is inextricably linked with early developments in the field of computer networks. One of the reasons for the emergence of high-speed communication between computers was the hope of pooling computing resources. In the early 1970s, the TCP/IP development team and the Xerox PARC laboratory established networking standards. The Hydra operating system also appeared for DEC PDP-11 computers; the cluster created on this basis was named C.mmp (Pittsburgh, Pennsylvania, USA, 1971). However, it was not until around 1983 that mechanisms making it easy to distribute tasks and files over a network were created, most of them within SunOS (a BSD-based operating system from Sun Microsystems).

The first commercial cluster product was ARCNet, created by Datapoint in 1977. It did not become profitable, and cluster construction did not take off until 1984, when DEC built its VAXcluster on the basis of the VAX/VMS operating system. ARCNet and VAXcluster were designed not only for joint computing but also for sharing the file system and peripherals while preserving data integrity and consistency. VAXcluster (now called VMSCluster) remains an integral component of the OpenVMS operating system on DEC Alpha and Itanium processors.

Two other early cluster products that gained recognition are the Tandem Himalaya (1994, HA class) and IBM S/390 Parallel Sysplex (1994).

The history of creating clusters from ordinary personal computers owes much to the Parallel Virtual Machine (PVM) project. In 1989, this software for linking computers into a virtual supercomputer made it possible to create clusters almost instantly. As a result, the total performance of all the cheap clusters built at that time exceeded the combined capacity of "serious" commercial systems.

The creation of clusters based on cheap personal computers connected by a data transmission network was continued in 1993 by the American space agency NASA; then, in 1995, Beowulf clusters, specially designed on this principle, were developed. The success of such systems spurred the development of grid networks, which had existed since the creation of UNIX. Today, clusters account for some of the most powerful supercomputers, including the Blue Gene/L and SGI Altix families.

Windows Compute Cluster Server (CCS) 2003 is considered here as the basic software for organizing computing on cluster systems; its general characteristics and the set of services running on cluster nodes are described.

At the end of this section, the rules for working with the console for launching and managing CCS jobs are given, and the details of how the CCS scheduler executes sequences of jobs on a cluster are described.

1.1. Architecture of high-performance processors and cluster systems

In the history of the development of computer processor architecture, two major stages can be distinguished:

  • Stage 1 - increasing the clock frequency of processors (up to 2000);
  • Stage 2 - the emergence of multi-core processors (after 2000).

Thus, the SMP (Symmetric MultiProcessing) approach, which evolved in the building of high-performance servers in which several processors share system resources, first of all RAM (see Fig. 1.1), has shifted "down" to the level of the cores inside the processor.


Fig. 1.1.

On the way to multi-core processors, the first technology to appear was Hyper-Threading, first used in 2002 in Intel Pentium 4 processors:


Fig. 1.2.

In this technology, two virtual processors share all the resources of one physical processor, namely the caches, the execution pipeline, and the individual execution units. Moreover, if one virtual processor has occupied a shared resource, the second waits for it to be released. In this respect a processor with Hyper-Threading can be compared to a multitasking operating system, which provides each process running in it with its own virtual computer with a full set of tools and schedules the order and time of these processes on the physical hardware; with Hyper-Threading all of this happens at a much lower, hardware, level. Still, two instruction streams allow the processor's execution units to be loaded more efficiently. The real increase in processor performance from Hyper-Threading is estimated at 10 to 20 percent.

A full-fledged dual-core processor (see Fig. 1.3) demonstrates a performance increase of 80 to 100 percent on certain tasks.


Fig. 1.3.

Thus, a dual-core and, in general, a multi-core processor can be considered as a miniature SMP system, in which there is no need to use complex and expensive multiprocessor motherboards.

Moreover, each core can (as, for example, in the Intel Pentium Extreme Edition 840 processor) support Hyper-Threading technology, and therefore this kind of dual-core processor can execute four program threads simultaneously.

In early 2007, Intel introduced an 80-core single-chip processor called the Teraflops Research Chip (http://www.intel.com/research/platform/terascale/teraflops.htm). This processor achieves 1.01 teraflops of performance at a core clock speed of 3.16 GHz and a voltage of 0.95 V, while the chip's total power consumption is only 62 W.

According to Intel's forecasts, commercial versions of processors with a large number of cores will appear within the next 5 years, and by 2010 a quarter of all servers shipped will have teraflop-level performance.

Cluster computing systems and their architecture

A cluster is a local (geographically concentrated in one place) computing system consisting of many independent computers and a network connecting them. A cluster is local also in the sense that it is managed within a separate administrative domain as a single computer system.

The computing nodes of which it is composed are standard, general-purpose (personal) computers used in a variety of fields and for a variety of applications. A computing node can contain either one microprocessor or several, forming, in the latter case, a symmetric (SMP) configuration.

The network component of a cluster can be either a regular local network or be built on the basis of special network technologies that provide ultra-fast data transfer between cluster nodes. The cluster network is designed to integrate cluster nodes and is usually separate from the external network through which users access the cluster.

Cluster software consists of two components:

  • development/programming tools and
  • resource management tools.

Development tools include compilers for programming languages, libraries for various purposes, performance measurement tools, and debuggers, which together allow parallel applications to be built.

Resource management software includes installation, administration, and workflow scheduling tools.

Although there are many programming models for parallel processing, the currently dominant approach is the message passing model, implemented in the form of the MPI (Message Passing Interface) standard. MPI is a library of functions that can be used in programs written in C or Fortran to pass messages between parallel processes and also control these processes.
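
As a minimal illustration of this model (a sketch, not tied to any particular cluster; it assumes an MPI implementation such as MPICH or Open MPI with its mpicc/mpirun wrappers is available), the following C program splits a summation across the processes started by mpirun and combines the partial results with MPI_Reduce:

    /* sum_mpi.c - each process sums its share of 1..N; the partial sums
       are then combined on rank 0 with MPI_Reduce.
       Typical build and run (implementation-dependent):
           mpicc sum_mpi.c -o sum_mpi
           mpirun -np 4 ./sum_mpi                                          */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const long N = 1000000;      /* size of the toy problem */
        long local = 0, total = 0;
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's number     */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

        /* Each rank takes every size-th term: a simple cyclic split. */
        for (long i = rank + 1; i <= N; i += size)
            local += i;

        /* Partial sums travel over the interconnect and are added on rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum(1..%ld) = %ld, computed by %d processes\n", N, total, size);

        MPI_Finalize();
        return 0;
    }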

Alternatives to this approach are languages based on the so-called partitioned global address space (PGAS); typical representatives are HPF (High Performance Fortran) and UPC (Unified Parallel C).

Some thoughts on when it makes sense to use high availability clusters to protect applications.

One of the main tasks in operating an IT system in any business is to ensure the continuity of the service provided. Very often, however, neither engineers nor IT managers have a clear idea of what "continuity" specifically means in their business. In the author's opinion, this is due to the ambiguity and vagueness of the very concept of continuity, which is why it is not always possible to say clearly what level of availability counts as continuous and what interval constitutes a period of unavailability. The situation is aggravated by the multitude of technologies designed ultimately to solve one common problem, but in different ways.

Which technology should be chosen in each specific case to solve the assigned problems within the available budget? In this article we take a closer look at one of the most popular approaches to protecting applications, namely introducing hardware and software redundancy, i.e., building a high availability cluster. Despite its apparent simplicity, such a solution is actually very difficult to fine-tune and operate. In addition to describing the well-known configurations, we will try to show what other, less frequently used capabilities are available in such solutions and how different cluster implementations are structured. We would also like the customer, having seriously weighed all the advantages of the cluster approach, to keep its disadvantages in mind as well, and therefore to consider the entire range of possible solutions.

What threatens applications...

According to various estimates, 55-60% of applications are critical for a company's business - this means that the absence of the service that these applications provide will seriously affect the financial well-being of the company. In this regard, the concept of accessibility becomes a fundamental aspect in the operation of a data center. Let's take a look at where application availability threats come from.

Data destruction. If the data is destroyed, the service built on it becomes unavailable. The simplest method of protection is to take frequent "snapshots" of the data so that it is always possible to return to a complete copy.

Hardware failure. Manufacturers of hardware systems (servers, disk storage) produce solutions with redundant components - processor boards, system controllers, power supplies, etc. However, in some cases, a hardware malfunction can lead to unavailability of applications.

Application error. A programmer error in an application that has already been tested and put into production may show up only once in tens or even hundreds of thousands of cases, but when such an incident does occur, it leads to a direct loss of profit for the organization, since transaction processing stops and the way to eliminate the error is not obvious and takes time.

Human error. A simple example: an administrator makes changes to the settings of configuration files, for example, DNS. When he tests the changes, the DNS service works, but a service that uses DNS, such as email, begins to experience problems that are not immediately detected.

Scheduled maintenance. System maintenance - replacing components, installing service packs, rebooting - is the main reason for unavailability. Gartner estimates that 80% of the time a system is unavailable is due to planned downtime.

Common problems on the computing site. Even if an organization does everything to protect itself from local problems, this does not guarantee the availability of the service if for some reason the entire site is unavailable. This must also be taken into account when planning the system.

...and how to deal with it

Depending on the criticality of the task, the following mechanisms can be used to restore the functionality of the computing system.

Backup data to tape or disk media. This is the basic level of accessibility - the simplest, cheapest, but also the slowest.

Local mirroring. Provides real-time data availability, data is protected from destruction.

Local clustering. Once data protection is in place, the next step in ensuring application availability is local clustering, i.e., creating redundancy in both hardware and software.

Remote replication. Here it is assumed that the computing sites are distributed in order to create a copy of the data in distributed data centers.

Remote clustering. Since the availability of data on different sites is ensured, it is also possible to maintain the availability of the service from different sites by organizing application access to this data.

We will not dwell here on the description of all these methods, since each point may well become the topic of a separate article. The idea is transparent - the more redundancy we introduce, the higher the cost of the solution, but the better the applications are protected. For each of the methods listed above, there is an arsenal of solutions from different manufacturers, but with a standard set of capabilities. For the solution designer, it is very important to keep all these technologies in mind, since only a competent combination of them will lead to a comprehensive solution to the problem set by the customer.

In the author's opinion, Symantec's approach is very helpful for understanding the service recovery strategy (Fig. 1). There are two key points here: the point in time to which the data is restored (recovery point objective, RPO) and the time required to restore the service (recovery time objective, RTO).

The choice of a particular tool depends on the specific requirements for a critical application or database.

For the most critical systems, RTO and RPO should not exceed 1 hour. Tape backup-based systems provide a recovery point of two or more days. In addition, recovery from tape is not automated; the administrator must constantly check whether everything has been restored properly and brought back into operation.

Moreover, as already mentioned, one tool is not enough when planning an availability scheme. For example, it hardly makes sense to use only a replication system: even though critical data is located at a remote site, the applications must still be launched there manually, in the appropriate order. Replication without automatic application startup can therefore be considered merely a kind of expensive backup.

If RTO and RPO measured in minutes are required, that is, the task calls for minimizing downtime (both planned and unplanned), then the only right solution is a high availability cluster. This article discusses exactly such systems.

Because the concept of a "computing cluster" has become overloaded due to the great diversity of such systems, let us first say a little about what kinds of clusters there are.

Types of clusters

In its simplest form, a cluster is a system of computers operating together to jointly solve problems. This is not a client-server processing model, where the application can be logically separated so that clients can route requests to different servers. The idea of a cluster is to pool the computing resources of related nodes to create redundant resources that provide greater shared computing power, high availability, and scalability. Thus, clusters do not simply process client requests to servers, but simultaneously use many computers, presenting them as a single system and thereby providing significantly greater computing capabilities.

A cluster of computers must be a self-organizing system: work performed on one of the nodes must be coordinated with the work on the other nodes. This leads to complex configuration links, intricate communication between cluster nodes, and the need to solve the problem of access to data in a shared file system. There are also operational issues associated with running a potentially large number of computers as a single resource.

Clusters can exist in various forms. The most common types of clusters include high performance computing (HPC) systems and high availability (HA) systems.

High-performance computing clusters use parallel computing methods using as much processor power as possible to solve a given problem. There are many examples of such solutions in scientific computing, where many low-cost processors are used in parallel to perform a large number of operations.

However, the topic of this article is high availability systems. Therefore, further, when speaking about clusters, we will have in mind precisely such systems.

Typically, when building high-availability clusters, redundancy is used to create a reliable environment, i.e., a computing system is created in which the failure of one or more components (hardware, software or networking) does not have a significant impact on the availability of the application or of the system as a whole.

In the simplest case, these are two identically configured servers with access to a shared data storage system (Fig. 2). During normal operation, the application software runs on one system while the second system stands by, ready to run the applications if the first one fails. When a failure is detected, the second system takes over the corresponding resources (file system, network addresses, etc.). This process is usually called failover. The second system completely replaces the failed one, and the user does not need to know that his applications are running on different physical machines. This is the most common two-node asymmetric configuration, where one server is active and the other is passive, that is, standing by in case the main one fails. In practice, this is the scheme that works in most companies.
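
The failover logic of such an asymmetric pair can be pictured with a small sketch. This is purely illustrative and is not the code of any cluster product: the addresses, the three-miss threshold, and the takeover commands are all assumptions, and a real cluster performs these steps through its own membership protocol and resource agents.

    /* failover_sketch.c - standby-node loop for a two-node asymmetric pair.
       The peer check and the takeover commands are placeholders.            */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Placeholder: one probe of the active node over the private interconnect. */
    static int peer_alive(void)
    {
        return system("ping -c 1 -W 1 192.168.100.1 > /dev/null 2>&1") == 0;
    }

    /* Placeholder takeover: storage first, then the service IP, then the app. */
    static void take_over_resources(void)
    {
        system("mount /dev/vg_shared/data /data");     /* shared file system  */
        system("ip addr add 10.0.0.50/24 dev eth0");   /* floating service IP */
        system("/etc/init.d/myapp start");             /* the protected app   */
    }

    int main(void)
    {
        int missed = 0;

        for (;;) {
            if (peer_alive()) {
                missed = 0;
            } else if (++missed >= 3) {   /* three missed heartbeats = failure */
                puts("active node declared dead, performing failover");
                take_over_resources();
                break;                    /* this node is now the active one   */
            }
            sleep(2);                     /* heartbeat interval                */
        }
        return 0;
    }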

However, the question must be asked: how acceptable is it to keep an additional set of equipment that is essentially held in reserve and is not used most of the time? The problem of idle equipment is solved by changing the cluster scheme and the way resources are allocated in it.

Cluster configurations

In addition to the two-node asymmetric cluster structure mentioned above, there are possible options that may have different names from different cluster software manufacturers, but their essence is the same.

Symmetrical cluster

A symmetric cluster is also built on two nodes, but each of them runs its own active application (Fig. 3). The cluster software ensures the correct automatic transfer of an application from server to server if one of the nodes fails. In this case the hardware is used more efficiently, but if a fault occurs, all of the system's applications end up running on one server, which can have undesirable performance consequences. In addition, you need to consider whether it is even possible to run multiple applications on the same server.

N+1 configuration

This configuration already includes more than two nodes, and among them there is one dedicated, backup one (Fig. 4). In other words, for every N running servers, there is one in hot standby. In the event of a malfunction, the application from the problematic node will “move” to a dedicated free node. In the future, the cluster administrator will be able to replace the failed node and designate it as a backup one.

A less flexible variation of N+1 is the N-to-1 configuration, in which the backup node is always the same for all working nodes. If an active server fails, the service switches to the backup node, and the system remains without a backup until the failed node is brought back into operation.

Of all the cluster configurations, N+1 is probably the most effective in terms of the ratio of complexity to equipment utilization. Table 1 below confirms this assessment.

N to N configuration

This is the most efficient configuration in terms of the level of use of computing resources (Fig. 5). All servers in it are working, each of them runs applications included in the cluster system. If a failure occurs on one of the nodes, applications are moved from it in accordance with established policies to the remaining servers.

When designing such a system, it is necessary to take into account the compatibility of applications, their interconnections when "moving" from node to node, server load, network bandwidth, and much more. This configuration is the most complex to design and operate, but it provides the greatest return on investment when using clustered redundancy.

Evaluating Cluster Configurations

Table 1 summarizes what has been said above about the various cluster configurations. Ratings are given on a four-point scale (4 is the highest score, 1 is the lowest).

Table 1 shows that the classical asymmetric system is the simplest in terms of design and operation. If this is the configuration the customer can operate on its own, the others would best be handed over to external maintenance.

To conclude the discussion of configurations, a few words should be said about the criteria by which the cluster core can automatically issue the command to "move" an application from node to node. The overwhelming majority of administrators define only one criterion in the configuration files: the unavailability of some component of the node, i.e., a hardware or software fault.

Meanwhile, modern cluster software also provides load balancing capabilities. If the load on one of the nodes reaches a critical value, then, with a correctly configured policy, the application on it is shut down gracefully and launched on another node where the current load permits it. Server load control tools can be either static, where the application itself specifies in the cluster configuration file how many resources it will need, or dynamic, where the load balancer is integrated with an external utility (for example, Precise) that calculates the current system load.
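
As a sketch of the dynamic variant (the threshold, the /proc/loadavg probe, the assumed 4-core node, and the switchover command are invented for illustration; a real product takes the load figure from its own monitor or from an external tool and performs the switchover itself):

    /* load_policy_sketch.c - if the measured load exceeds a limit,
       request a graceful switchover of the application group.        */
    #include <stdio.h>
    #include <stdlib.h>

    #define LOAD_LIMIT 0.85            /* assumed critical CPU fraction */

    /* Dynamic probe: read the 1-minute load average (Linux-specific). */
    static double current_load(void)
    {
        double one_min = 0.0;
        FILE *f = fopen("/proc/loadavg", "r");
        if (f) {
            if (fscanf(f, "%lf", &one_min) != 1)
                one_min = 0.0;
            fclose(f);
        }
        return one_min / 4.0;          /* normalize by an assumed 4 cores */
    }

    int main(void)
    {
        if (current_load() > LOAD_LIMIT) {
            puts("load above limit - requesting graceful switchover");
            /* Placeholder: stands in for the cluster product's own
               "switch this service group to another node" command.    */
            return system("cluster_switch_group app_group node2");
        }
        puts("load within limits - nothing to do");
        return 0;
    }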

Now, to understand how clusters work in specific implementations, let's consider the main components of any high availability system.

Main cluster components

Like any complex system, a cluster, regardless of its specific implementation, consists of hardware and software components.

As for the hardware on which a cluster is assembled, the main component here is the internode connection, or internal cluster interconnect, which provides the physical and logical connection between the servers. In practice this is an internal Ethernet network with duplicated links. Its purpose is, first, to carry the packets that confirm the integrity of the system (the so-called heartbeat) and, second, depending on the design or on the scheme that arises after a fault has occurred, to carry information traffic between nodes that is intended for transmission outside. The other components are obvious: the nodes running an OS with the cluster software, the disk storage to which the cluster nodes have access, and, finally, the common network through which the cluster interacts with the outside world.
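
A heartbeat packet itself is conceptually very simple. The sketch below (the peer address, port, payload, and one-second interval are illustrative values, not those of any real product) sends a growing counter over the private network; the peer treats this node as suspect when the counter stops arriving:

    /* heartbeat_sketch.c - send a small "I am alive" datagram over the
       private cluster interconnect once per second (POSIX sockets).     */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) { perror("socket"); return 1; }

        struct sockaddr_in peer;
        memset(&peer, 0, sizeof peer);
        peer.sin_family = AF_INET;
        peer.sin_port   = htons(7001);                       /* assumed port        */
        inet_pton(AF_INET, "192.168.100.2", &peer.sin_addr); /* peer on private LAN */

        for (uint32_t seq = 0; ; seq++) {
            uint32_t msg = htonl(seq);                       /* growing counter     */
            sendto(sock, &msg, sizeof msg, 0,
                   (struct sockaddr *)&peer, sizeof peer);
            sleep(1);                                        /* heartbeat interval  */
        }
    }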

The software components control the operation of the clustered applications. First of all, there is a common OS (not necessarily the same version on every node). The core of the cluster, the cluster software itself, runs in the environment of this OS. The applications that are clustered, that is, that can migrate from node to node, are controlled - started, stopped, tested - by small scripts, so-called agents. There are standard agents for most tasks, but at the design stage it is imperative to check, using the compatibility matrix, whether agents exist for the specific applications.
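
A generic agent can be thought of as a tiny wrapper with three entry points that the cluster core invokes. The sketch below is illustrative only: the entry-point names, the init-script paths, and the pgrep-based health check are assumptions, not any vendor's agent interface.

    /* agent_sketch.c - start, stop and monitor one application on behalf
       of the cluster core; the exit code reports success or failure.     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s start|stop|monitor\n", argv[0]);
            return 2;
        }
        if (strcmp(argv[1], "start") == 0)        /* bring the application online */
            return system("/etc/init.d/myapp start") == 0 ? 0 : 1;
        if (strcmp(argv[1], "stop") == 0)         /* take the application offline */
            return system("/etc/init.d/myapp stop") == 0 ? 0 : 1;
        if (strcmp(argv[1], "monitor") == 0)      /* 0 = running, 1 = not running */
            return system("pgrep -x myapp > /dev/null") == 0 ? 0 : 1;

        fprintf(stderr, "unknown action: %s\n", argv[1]);
        return 2;
    }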

Cluster implementations

There are many implementations of the cluster configurations described above on the software market. Almost all the largest server and software manufacturers - for example, Microsoft, HP, IBM, Sun, Symantec - offer their products in this area. Microtest has experience working with Sun Cluster Server (SC) solutions from Sun Microsystems (www.sun.com) and Veritas Cluster Server (VCS) from Symantec (www.symantec.com). From an administrator's point of view, these products are very similar in functionality - they provide the same settings and reactions to events. However, in terms of their internal organization, these are completely different products.

SC was developed by Sun for its own Solaris OS and therefore runs only on that OS (on both SPARC and x86). During installation, SC is deeply integrated with the OS and becomes part of it, part of the Solaris kernel.

VCS is a multi-platform product: it works with almost all currently popular operating systems (AIX, HP-UX, Solaris, Windows, Linux) and is an add-on, an application that controls the operation of the other applications being clustered.

We will look at the internal implementation of these two systems - SC and VCS. But let us emphasize once again that despite the difference in terminology and completely different internal structure, the main components of both systems with which the administrator interacts are essentially the same.

Sun Cluster Server Software Components

The SC core (Fig. 6) is the Solaris 10 (or 9) OS with a built-in shell that provides high availability functionality (the core is highlighted in green). Above it are the global components (light green), which provide their services on the basis of what they receive from the cluster core. And finally, at the very top are the user components.

The HA framework is a component that extends the Solaris kernel to provide clustered services. Its work begins with initializing the code that boots the node into cluster mode. The framework's main tasks are inter-node communication and management of the cluster state and cluster membership.

The inter-node communication module transmits heartbeat messages between nodes - short messages confirming that a neighboring node is responding. Data and application communication is also managed by the HA framework as part of inter-node communication. In addition, the framework manages the integrity of the cluster configuration and performs recovery and update tasks when necessary. Integrity is maintained through the quorum device, and reconfiguration is performed if necessary. A quorum device is an additional mechanism for verifying the integrity of cluster nodes through small portions of the shared file system. In the latest version of the cluster, SC 3.2, it became possible to place the quorum device outside the cluster system, that is, to use an additional server on the Solaris platform accessible over TCP/IP. Failed cluster members are removed from the configuration; an element that becomes operational again is automatically re-included in it.

The functions of the global components are derived from the HA framework. These include:

  • global devices with a common cluster device namespace;
  • a global file service that organizes access to every file in the system for each node as if it were in its own local file system;
  • a global network service that provides load balancing and the ability to access clustered services over a single IP.

User components manage the cluster environment at the top level, the application interface. Administration is possible both through the graphical interface and through the command line. The modules that monitor applications and start and stop them are called agents. There is a library of ready-made agents for standard applications; this list grows with each release.

Veritas Cluster Server Software Components

A two-node VCS cluster is shown schematically in Fig. 7. Inter-node communication in VCS is based on two protocols - LLT and GAB. VCS uses an internal network to maintain cluster integrity.

LLT (Low Latency Transport) is a protocol developed by Veritas that operates on top of Ethernet as a high-performance replacement for the IP stack and is used by the nodes for all internal communication. The required redundancy in inter-node communication calls for at least two completely independent internal networks, so that VCS can distinguish a network fault from a system fault.

The LLT protocol performs two main functions: traffic distribution and heartbeating. LLT distributes (balances) inter-node communication among all available internal links. This design ensures that all internal traffic is randomly distributed across the internal networks (there can be a maximum of eight), which improves performance and fault tolerance. If one link fails, the data is redirected over the remaining links. In addition, LLT is responsible for sending the heartbeat traffic over the network, which is used by GAB.

GAB (Group Membership Services/Atomic Broadcast) is the second protocol used by VCS for internal communication. Like LLT, it is responsible for two tasks. The first is node membership in the cluster. GAB receives a heartbeat from each node via LLT; if the system does not receive a response from a node for a long time, it marks the node's state as DOWN, i.e., not working.

The second function of GAB is to ensure reliable inter-cluster communication. GAB provides guaranteed delivery of broadcasts and point-to-point messages between all nodes.

The control component of VCS is the VCS engine, or HAD (High Availability Daemon), which runs on each system. It is responsible for:

  • building the working configuration from the configuration files;
  • distributing information to new nodes joining the cluster;
  • processing input from the cluster administrator (operator);
  • performing the prescribed actions in the event of a failure.

HAD uses agents to monitor and manage resources. Information about the state of resources is collected from the agents on the local systems and transmitted to all members of the cluster. The HAD on each node receives information from the other nodes, updating its own picture of the entire system. HAD acts as a replicated state machine (RSM), i.e., the engine on each node has a picture of the resource state that is fully synchronized with all the other nodes.

A VCS cluster is managed either through the Java console or via the Web.

Which is better

We have already touched on the question of when it is better to use which cluster. Let us emphasize once again that the SC product was written by Sun for its own OS and is deeply integrated with it, while VCS is a multi-platform product and therefore more flexible. Table 2 compares some of the capabilities of these two solutions.

In conclusion, one more argument should be given in favor of using SC in a Solaris environment. By using both equipment and software from a single manufacturer, Sun Microsystems, the customer receives "single window" service for the entire solution. Even though vendors are now creating joint competence centers, the time spent passing requests between software and hardware manufacturers slows the response to an incident, which does not always suit the user of the system.

Geographically distributed cluster

We have looked at how a high availability cluster is built and operates within a single site. This architecture can only protect against local problems affecting a single node and the data associated with it. If a problem affects the entire site, whether technical, natural or otherwise, the whole system becomes unavailable. Today there are more and more tasks whose criticality requires migrating services not only within a site but also between geographically dispersed data centers. When designing such solutions, new factors have to be taken into account: the distance between sites, channel capacity, and so on. Which replication should be preferred - synchronous or asynchronous, host-based or array-based - and what protocols should be used? The success of the project may depend on the resolution of these issues.

Data replication from the main site to the backup site is most often performed using one of the popular packages: Veritas Volume Replicator, EMC SRDF, Hitachi TrueCopy, Sun StorageTek Availability Suite.

If there is a hardware failure or problem with the application or database, the cluster software will first try to transfer the application service to another node on the main site. If the main site for any reason is inaccessible to the outside world, all services, including DNS, migrate to the backup site, where, thanks to replication, data is already present. Thus, the service is resumed for users.

The disadvantage of this approach is the huge cost of deploying an additional “hot” site with equipment and network infrastructure. However, the benefit of complete protection may outweigh these additional costs. If the central node is unable to provide service for a long time, this can lead to major losses and even the death of the business.

Testing the system before disaster

According to a study conducted by Symantec, only 28% of companies test their disaster recovery plan. Unfortunately, most of the customers with whom the author has discussed this issue had no such plan at all. The reasons testing is not carried out are administrators' lack of time, reluctance to test on a "live" system, and the lack of test equipment.

For testing, you can use the simulator included in the VCS package: users who choose VCS as their clustering software can check their configuration with the Cluster Server Simulator, which lets them try out their application migration strategy between nodes on an ordinary PC.

Conclusion

Providing a service with a high level of availability is very expensive, both in terms of the cost of equipment and software and in terms of the cost of further maintenance and technical support of the system. Despite the apparent simplicity of the theory and the straightforward installation, a cluster system, when studied in depth, turns out to be a complex and expensive solution. In this article the technical side of its operation has been considered only in general terms; a separate article could be written on individual aspects of cluster operation, for example, determining cluster membership.

Clusters are usually built for business-critical tasks, where a unit of downtime results in large losses, for example, for billing systems. The following rule of thumb can be recommended for deciding where it is reasonable to use clusters: where the downtime of a service must not exceed an hour and a half, a cluster is a suitable solution; in other cases, less expensive options can be considered.

Software

A widely used tool for organizing inter-server communication is the MPI library, which supports the C and Fortran languages. It is used, for example, in the MM5 weather modeling program.

The Solaris operating system provides the Solaris Cluster software, which ensures high availability and fault tolerance for servers running Solaris. There is an open source implementation for OpenSolaris called OpenSolaris HA Cluster.

Several programs are popular among GNU/Linux users:

  • distcc, MPICH, etc. - specialized tools for parallelizing the work of programs; distcc allows parallel compilation with the GNU Compiler Collection.
  • Linux Virtual Server, Linux-HA - node software for distributing requests between computing servers.
  • MOSIX, openMosix, Kerrighed, OpenSSI - full-featured cluster environments built into the kernel that automatically distribute tasks among homogeneous nodes. OpenSSI, openMosix and Kerrighed create a single-system-image environment between the nodes.

Cluster mechanisms are planned to be built into the DragonFly BSD kernel, which was forked in 2003 from FreeBSD 4.8. Long-term plans also include turning it into a single unified operating system environment.

Microsoft produces an HA cluster for the Windows operating system. It is believed to be based on Digital Equipment Corporation technology; it supports up to 16 nodes in a cluster (since 2010), as well as operation in a SAN (Storage Area Network). A set of APIs is used to support distributed applications, and there are ready-made templates for working with programs that were not designed to run in a cluster.

Windows Compute Cluster Server 2003 (CCS), released in June 2006, is designed for high-end applications that require cluster computing. This edition is intended for deployment on many computers assembled into a cluster to achieve supercomputer-class power. Each Windows Compute Cluster Server cluster consists of one or more head nodes that distribute tasks and several compute nodes that perform the main work. In November 2008, Windows HPC Server 2008 was introduced to replace Windows Compute Cluster Server 2003.
