Fault-Tolerant Cluster Management for Reliable High-Performance Computing

Overview

Clusters of commercial off-the-shelf (COTS) workstations and PCs are commonly used to build cost-effective high-performance systems. A central coordinator, or manager, is often the simplest way to implement many of the operations needed to manage such distributed systems: scheduling parallel tasks, coordinating access to limited resources, and providing high-level coordination of fault-tolerance mechanisms and of interactions with external devices. A key disadvantage of a central manager is that it becomes a critical single point of failure. The UCLA Fault-Tolerant Cluster Testbed (FTCT) project focuses on implementing fault-tolerant management for clusters. This paper describes key aspects of the design and implementation of the FTCT and provides a preliminary evaluation of the overheads incurred by its management mechanisms.

Publisher: University of California
File Format: PDF
Date Published: Oct 22, 2008
Format: White Papers
Topics: Fault-Tolerant Servers, High Performance Computing

Similar White Papers

Virtualization delivers IT and business benefits for SMBs

Virtualization leader VMware has been a pioneer in both enterprise and SMB virtualization deployments, particularly in s

Publisher: VMware  |  Tags: disaster recovery, it management, management, server, smb, tco

Fault Tolerance Management for a Hierarchical GridRPC Middleware

GridRPC middleware are usually managing failures by using TCP or other link network layer provided failure detector, aut

Publisher: French National Institute for Research in Computer Science and Control  |  Tags: api, middleware, network

Context Information Based Fault Tolerant Technique in Mobile Grid

Mobile grids extend the concepts of grid computing through performance improvement of mobile devices and the development

Publisher: Korea University  |  Tags: computing, context, grid computing, mobile devices, mobility, server

University of California White Papers

Stateless Load Balancing Over Multiple MPLS Paths

The paper proposes a flow-independent approach to balance the load coming from several multimedia applications (i.e., IP

Publisher: University of California  |  Tags: applications, ip, mpls, network

Escape From the Computer Lab: Education in Mobile Wireless Networks

As mobile wireless network technology becomes widespread, the importance of education about this new form of communicati

Publisher: University of California  |  Tags: computing, mobile wireless, mobility, network, portable devices, university of california

Parallel Spectral Clustering Algorithm for Large-Scale Community Data Mining

The spectral clustering algorithm has been shown to be very effective in finding clusters of non-linear boundaries. Unfo

Publisher: University of California

Directed Diffusion for Wireless Sensor Networking

Advances in processor, memory and radio technology will enable small and cheap nodes capable of sensing, communication a

Publisher: University of California  |  Tags: data, network

Mesh Topology Construction for Interconnected Wireless LANs

The 802.11s working group has been formed recently to recommend an Extended Service Set (ESS) that enables wider area co

Publisher: University of California  |  Tags: network