Networking algorithms perform sequential decision-making on the Internet, where they make decisions, e.g., on when to transmit a packet. Traditional networking algorithms use fixed mappings between network-level events and control responses, which can use network resources inefficiently in some environments. Their main shortcoming is a lack of generalization to diverse network environments, which leads to poor performance across a wide variety of network conditions.
This dissertation proposes new techniques in reinforcement learning (RL) to address this lack of generalization. We propose new RL techniques and study them in congestion control, network-adaptive coding, and adaptive bitrate selection. First, we present Pareto, a congestion control algorithm driven by deep reinforcement learning (DRL). Unlike existing RL-based congestion control algorithms, Pareto uses expert demonstrations, a new staged training process, a multi-agent RL fairness training framework, and online adaptation to new environments. Together, these techniques enable Pareto to perform well across a wide variety of environments in terms of high throughput, low latency, low loss rate, and fairness to competing flows.
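To make the staged-training idea concrete, the sketch below pretrains a policy on expert demonstrations (behavior cloning) before RL fine-tuning. It is a minimal illustration under stated assumptions, not Pareto's implementation: the architecture, the state features, and all names (PolicyNet, expert_action) are hypothetical placeholders.

```python
# Minimal sketch of demonstration-guided staged training for a
# congestion-control policy. PolicyNet and expert_action are
# hypothetical stand-ins, not Pareto's actual components.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Maps a network-state vector (e.g., RTT, loss rate, throughput)
    to a sending-rate adjustment."""
    def __init__(self, state_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, state):
        return self.net(state)

def expert_action(state):
    # Stand-in for an expert demonstration, e.g., a classical
    # congestion controller's response to the same state.
    return state[:, :1] * 0.5

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stage 1: imitate expert demonstrations (behavior cloning).
for _ in range(200):
    state = torch.randn(64, 4)  # batch of observed network states
    loss = nn.functional.mse_loss(policy(state), expert_action(state))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2 (omitted): fine-tune the pretrained policy with RL against
# an environment reward for high throughput and low latency/loss.
```

Starting from an imitation of an expert avoids the long random-exploration phase of training from scratch; the RL stage then improves on the expert where the fixed mapping is suboptimal.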
In network-adaptive coding, we propose Ivory, a new DRL-based network-adaptive coding algorithm built on an existing low-latency forward error correction (FEC) scheme. Ivory chooses coding parameters that keep loss rate and latency low while incurring lower coding overhead than the state of the art. The lower coding overhead makes Ivory a better fit for bandwidth-limited networks.
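The parameter-selection problem can be pictured, in heavily simplified form, as a bandit over redundancy levels: choose how many repair packets to add per block, trading recovery probability against bandwidth overhead. Everything below (the redundancy levels, the reward shaping, and the idealized MDS-style recovery condition) is an assumption for illustration, not Ivory's actual design.

```python
# Minimal sketch of adaptive FEC parameter selection, reduced to an
# epsilon-greedy bandit over hypothetical redundancy levels.
import random

REDUNDANCY_LEVELS = [0.05, 0.10, 0.20, 0.40]  # repair packets per source packet
q_values = {r: 0.0 for r in REDUNDANCY_LEVELS}
counts = {r: 0 for r in REDUNDANCY_LEVELS}

def send_block(redundancy, loss_rate, k=20):
    """Simulate one FEC block of k source packets plus repair packets.
    Idealized MDS assumption: the block is recoverable iff at least
    k of the n transmitted packets arrive."""
    n = k + max(1, round(k * redundancy))
    received = sum(random.random() > loss_rate for _ in range(n))
    recovered = received >= k
    # Reward recovery, penalize bandwidth overhead.
    return (1.0 if recovered else -1.0) - redundancy

loss_rate = 0.1
for step in range(5000):
    # Epsilon-greedy choice of redundancy level.
    if random.random() < 0.1:
        r = random.choice(REDUNDANCY_LEVELS)
    else:
        r = max(q_values, key=q_values.get)
    reward = send_block(r, loss_rate)
    counts[r] += 1
    q_values[r] += (reward - q_values[r]) / counts[r]  # running-mean update

print(max(q_values, key=q_values.get))  # redundancy favored under 10% loss
```

The overhead penalty in the reward is what steers the learner toward the smallest redundancy that still recovers blocks, which is the property that matters on bandwidth-limited links.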
Since interactions with the environment are costly in online RL, we propose Lethe, a new online RL approach that acquires new knowledge quickly by biasing against old, interfering knowledge; Lethe thereby resolves the interference between knowledge acquired in different time slots. A related form of interference arises in federated learning (FL), where the server simultaneously aggregates models from clients with conflicting objectives. To avoid this inter-client interference, we propose Cascade, a curriculum-federated RL framework. We evaluate Cascade by training an adaptive bitrate selection algorithm; it achieves better asymptotic performance and fairer knowledge accumulation than other FL algorithms, yielding better performance across diverse network environments.
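One simple way to picture curriculum-federated aggregation is to admit clients into federated averaging stage by stage, easiest environments first, so that conflicting updates from very different environments do not collide in a single round. The sketch below shows this pattern with hypothetical per-client difficulty scores and plain FedAvg; Cascade's actual curriculum criteria and aggregation scheme differ and are detailed in the corresponding chapter.

```python
# Minimal sketch of curriculum-ordered federated averaging.
# Difficulty scores, thresholds, and flat-vector "models" are
# illustrative assumptions, not Cascade's actual design.
import numpy as np

def fed_avg(models):
    """Average client model parameters (here, flat numpy vectors)."""
    return np.mean(models, axis=0)

# Each client is (difficulty, local_model); difficulty could be, e.g.,
# the bandwidth variability of that client's network traces.
clients = [(0.2, np.random.randn(10)),
           (0.5, np.random.randn(10)),
           (0.9, np.random.randn(10))]

global_model = np.zeros(10)
for stage, threshold in enumerate([0.3, 0.6, 1.0]):
    # Only clients whose environments fall under the current
    # curriculum stage participate in this round of aggregation.
    admitted = [m for d, m in clients if d <= threshold]
    global_model = fed_avg(admitted + [global_model])
    print(f"stage {stage}: aggregated {len(admitted)} client(s)")
```

Widening the admitted set gradually lets the global model consolidate knowledge from easy environments before updates from harder, more heterogeneous ones are mixed in.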