Q-Routing Revisited (Part 1)

In Learning of Routing Revisited we broadened our view to get an overview of the various approaches that have been developed to improve routing in networks, with a focus on learning techniques. Here, we want to revisit the Q-Routing approach[1] that was introduced more than 20 years ago. Neural networks showed "inconclusive" results in the work of Boyan and Littman back then – something we want to keep in mind against the backdrop of neural networks being so prominent in the current Machine Learning discourse. While revisiting the approach we also want to focus on the performance metrics applied and the environmental factors considered.

The "Q-Routing" algorithm is a variant of the Bellman-Ford algorithm (a distance vector algorithm, as opposed to link state algorithms) for finding shortest paths from a starting node in a graph with weighted edges. What was the Bellman-Ford algorithm again? The core idea is to start at a source node, move to its neighboring nodes and record the accumulated weights along the way. This extends to all remaining nodes in the graph: a node's best distance is derived from the best known distances of the predecessor nodes via which it can be reached. Repeating this relaxation and updating the weight sums eventually yields the shortest paths from the starting node to all other destination nodes.
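As a minimal illustration of that relaxation idea – a Python sketch on a made-up toy graph, not code from any of the cited works – it can be written down like this:

```python
def bellman_ford(nodes, edges, source):
    """edges: list of (u, v, weight) tuples; returns shortest distances from source."""
    dist = {n: float("inf") for n in nodes}
    dist[source] = 0
    # Repeatedly relax every edge: if reaching v via u is cheaper than the
    # best known distance to v, record the improvement.
    for _ in range(len(nodes) - 1):
        updated = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:          # a full pass without change: distances have converged
            break
    return dist


nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 1), ("B", "C", 2), ("A", "C", 5), ("C", "D", 1), ("B", "D", 7)]
print(bellman_ford(nodes, edges, "A"))   # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```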

Important here: the weights (or distance metrics) on the edges are preset. They have been declared by someone in such a way that some property of the edge is reflected (e.g. very high weights for low-capacity links, or, in the Routing Information Protocol (RIP), which implements Bellman-Ford, simply the hop count). The algorithm finds the shortest paths as long as the weights remain unchanged. Once they change, the algorithm needs to start over to come up with the new optimal paths. But this change needs to come about somehow: either 1) the "declaration agent" senses that changes in the edge properties demand an update of the weights, or 2) the network system itself has some mechanism at its disposal that senses actual edge property changes and can update the weights itself (or some technique in between the two).
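Staying with the toy graph from the sketch above: if one of the declared weights changes (say, hypothetically, the A→C link gets upgraded), nothing adapts by itself – the computation simply has to be re-run on the new weights:

```python
# Re-run the same sketch after a hypothetical weight change: the A->C weight
# drops from 5 to 1, so the shortest paths have to be recomputed from scratch.
edges = [("A", "B", 1), ("B", "C", 2), ("A", "C", 1), ("C", "D", 1), ("B", "D", 7)]
print(bellman_ford(nodes, edges, "A"))   # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```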

In both cases observation is required, as well as detecting the need for weight updates, and lastly implementing the update. In case 1) a human would take in all the available information about the network system and its changes (e.g. major link upgrades, topological changes, etc.), then decide on new weights (according to some preferred overall system behavior), and then roll the updated weights out into the network. This is how Traffic Engineering (TE) works in most major telco networks today. The learning part, i.e. recognizing the need for change and updating the weights (the re-description of the network itself), is for the most part carried out by the human system. The resulting cycle is slow: typically every 2-4 weeks a human traffic engineer checks whether weight updates are needed and, if so, plays around with the options (the routing algorithm can accommodate many ad-hoc changes in the network – it senses them and re-runs to find a new optimum – but its adaptability has limits), simulates scenarios and then selects a robust heuristic that hopefully performs well enough for the next four weeks.

In case 2), instead of a human system observing the network state, the network itself takes care of this (self-observation and self-description). It also evaluates the observations (this evaluation is linked to the "reward" concept in reinforcement learning (RL)) and then updates the weights by itself (this is related to "actions" in RL). How do these observation signals, these measurements, come about? This is an interesting topic in itself as it involves delay and can have different semantics. Whether ping is used[2] or the signals are received in-band[3] plays an important role. Let's keep in mind that the various delays introduced here – depending on the measurement method applied – are crucial when analyzing the overall approach.
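To make case 2) a bit more concrete, here is a minimal, hypothetical sketch: the network probes its own link delays, smooths the measurements into edge weights, and those weights could then feed a shortest-path recomputation like the one sketched above. probe_rtt and the smoothing constant are illustrative assumptions, not part of any of the cited works, and the RL-specific parts (reward, actions) are deliberately left out here.

```python
import random

def probe_rtt(u, v):
    # Placeholder for an actual measurement (ping-style probe, in-band OAM, ...).
    # Here we just fake a round-trip time in milliseconds.
    return random.uniform(1.0, 10.0)

def update_weights(links, weights, alpha=0.2):
    """Smooth fresh delay measurements into edge weights via an exponential moving average."""
    for (u, v) in links:
        sample = probe_rtt(u, v)
        old = weights.get((u, v), sample)        # the first measurement seeds the weight
        weights[(u, v)] = (1 - alpha) * old + alpha * sample
    return weights

links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")]
weights = {}
for _ in range(3):                               # repeated rounds of self-observation
    weights = update_weights(links, weights)
print(weights)
```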

This was a little detour into how the metrics/weights on edges may change – which is, however, not the main idea of Bellman-Ford. When it comes to the propagation of updates, though, Bellman-Ford has been shown to be comparatively slow. But let's turn back to what the main idea behind Q-Routing is and in which aspects it adapted Bellman-Ford – we will do that in Part 2.

References
1 Boyan, J. and Littman, M. (1994). Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach.
2 Mansour, Y. (1999). Course material “Packet Routing using Reinforcement Learning” for Computational Learning Theory at Tel Aviv University during Fall Semester 1999/2000.
3 Brockners, F. (2017). Next-gen Network Telemetry is Within Your Packets: In-band OAM. Talk at Open Networking Summit 2017.

Learning of Routing Revisited

Before pairing the terms Deep Learning and Optimization of Network Traffic, it is worthwhile to revisit what we know about learning in the context of routing traffic – or: what we know about routing algorithms and their inherent optimization component. The latter already indicates that finding an optimal route involves techniques that seek to incrementally improve some value until it is found (or does not change anymore). To that extent the problem of finding good routes has always involved a form of learning[1].

We want to revisit the main concepts here. We will also revisit proposals such as Q-Routing[2]. And we want to round it off with recent proposals such as the MIND architecture[3]. We also want to draw on theory from the dynamic control systems community on network architecture tradeoffs[4] to see what the theory can do for us when different kinds of learning are involved. Please stay tuned.

References
1 Learning in the sense of looking at, comparing and ranking data structures being presented and identifying an optimal solution (optimization, finding a recipe). However, not so much in the sense of taking in new data (experience, training on a feedback signal from the environment, a reflection on how one has done) which is used to alter the function in order to do better. The latter requires changes to oneself (learning!) which have to take place in time. The former is essentially fixed (static) and in a narrow sense is stripped of the possibility of taking in feedback for adaptation after the optimum has been found; it is thus not learning – i.e. transforming – in the strict sense which we will apply from here on.
2 Boyan, J. and Littman, M. (1994). Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach.
3 Geng, Y. (2017). MIND: Machine Learning based Network Dynamics. Talk at the Open Networking Summit 2016.
4 Matni, N., Tang, A. and Doyle, J. (2015). A Case Study in Network Architecture Tradeoffs. ACM Sigcomm Symposium on SDN Research (SOSR) 2015.

Deep Learning: Applications in Telecommunication Companies

Deep Learning[1] is en vogue across all businesses. Telecommunications companies, too, have (re-)shifted their attention to this topic. While the recent performance increases based on Deep Learning techniques have to be seen in the context of the tasks for which the techniques were developed[2], the question of how these tasks and challenges relate to the telecommunications business is worthwhile.

This is a list of telco task domains[3] that could benefit from the strengths of machines, software and statistical inference:

  • Optimizing routing and quality of service by analyzing network traffic in real time
  • Analyzing call data records in real time to identify fraudulent behavior immediately
  • Allowing call center reps to flexibly and profitably modify subscriber calling plans immediately
  • Tailoring marketing campaigns to individual customers using location-based and social networking technologies
  • Using insights into customer behavior and usage to develop new products and services

This list can be extended and will be. What is of interest here is to what extent Deep Learning approaches can be successfully applied to these domains. As each of these domains has its own characteristics, we shall focus on them individually – starting with the domain of networks, routing and traffic control.

References
1 For example this Review: LeCun, Y., Bengio, Y. and Hinton, G. (2015). Deep learning. Nature. doi:10.1038/nature14539.
2 From the review in Nature referenced above: “Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.”
3 Acker, O., Blockus, A. and Pötscher, F. (2013). Benefiting from data. pwc Strategy& Report.