With the tremendous amount of data transmitted over the world’s telecommunication systems and our growing reliance on their continued successful operation, network survivability has never been more important. Yet virtually every day, subscribers somewhere still endure service interruptions caused by unforeseen network failures, with effects on their businesses, financial systems, phone systems, and a myriad of other services, up to and including their own personal health and safety. This thesis presents new techniques for the optimal design and analysis of network architectures that embody survivability mechanisms to ensure that any communications affected by such failures are restored quickly, efficiently, and inexpensively.
The main outcomes of the research presented are fourfold. First, we provide a thorough discussion and analysis of the common mesh network survivability mechanisms: 1+1 automatic protection switching, span restoration, p-cycle restoration, shared backup path protection, and path restoration. We next introduce the meta-mesh concept, which is a form of span restoration suited to sparse networks that contain chains of degree-2 nodes. By targeting the loop-back spare capacity required within these chains by working traffic that flows entirely through them, meta-mesh restoration is able to provide substantial savings over conventional span restoration. We then address the problem of jointly optimizing a network’s topology as well as the working and restoration routing within it. We show that the complete problem is quite onerous to solve for anything but a very small network, and so we also develop a three-step heuristic that in most cases is able to outperform the complete problem formulation in terms of both runtime and solution quality. Finally, we introduce node-inclusive span restoration, which is a completely new form of network restoration based on span restoration, but fundamentally part of the way towards path restoration. We show that node-inclusive span restoration is also capable of node-failure restoration and that, in general, most network demands are fully restorable in the event of any node failure with little or no extra spare capacity. At the same time, the spare capacity requirements of node-inclusive span restoration are shown to approach those of path restoration, particularly for highly connected networks.
The insights gained by the work of this thesis add to the growing understanding of the various issues related to network survivability, and have the potential to lower costs for network operators while simultaneously providing new options in the planning and development of their future networks. Implementation of design principles from this work will also lead to more reliable communication systems that are less vulnerable to equipment failures or attacks, and might eventually help to eliminate the service interruptions that we all tolerate as a necessary aspect of modern telecommunication systems.