Walter Scott, Jr. College of Engineering

Graduate Exam Abstract

Kelly Pracht
M.S. Final
Mar 22, 2007, 9AM
Engineering E103
Abstract: Improving the availability of high-end computing systems is required to reduce the user costs associated with downtime. One of the simplest methods of preventing downtime is duplicating components throughout the system to provide working alternatives should a failure occur. This is known as redundancy, and it may be implemented at many different levels within the hardware of a single server. Subsystems that often employ redundant architectures include CPUs, memory systems, nodes (CPUs and memory), I/O subsystems, storage systems, critical fans, and rail power subsystems. This thesis focuses on multiple server power supplies that operate in parallel so that, should one power supply fail, the system remains fully operational. A server may have power redundancy located at the AC/DC conversion nodes as well as at the numerous DC/DC conversion nodes throughout the system, as will be discussed.
Today's methods of power redundancy at the DC-DC level are costly, overly complex, and not easily scalable. Although many server customers desire power redundancy down to the level of DC-DC conversion nodes, other customers would not find this level of redundancy cost-feasible. In short, the cost of downtime does not justify the increased cost of server hardware for every customer. Therefore, customers often want the choice of purchasing redundant subsystems only when needed and when their business models justify the additional cost. It would benefit the industry if a scalable redundant solution were offered that allowed low-cost entry-level systems to be seamlessly upgraded.
This thesis presents a new method for achieving such simple, low-cost entry-level systems that can be upgraded to redundant solutions. The new method is called Warm Standby Failover. In this thesis, a proof of concept is performed using industry-standard voltage regulator modules (VRMs). This new redundancy topology demonstrates that it is possible to achieve a low-cost, customer-configurable redundant power option that is highly reliable and yet less costly than many of today's redundant power architectures.
In summary, this thesis both designs and tests Warm Standby Failover. This new DC-DC redundancy strategy proves to be both low cost and highly reliable. Additionally, Warm Standby Failover may be used in a number of applications, all of which can give a strategic advantage to systems requiring appropriate redundancy levels. In the server industry, Warm Standby Failover is a desirable method for reliably delivering power across a range of systems, from non-redundant entry-level systems up to high-end redundant architectures. This thesis also suggests a pathway for future work: investigating the required timing and voltage deviations during failover intervals, a more in-depth comparison of various topologies, different types of controllers, and the use of a fast phase detection method for better sensing of emerging faults.
Adviser: George Collins
Non-ECE Member: Hiroshi Sakurai, Mechanical Engineering
Member 3: HJ Siegel, Electrical and Computer Engineering
Additional Members:
Program of Study: