Q-Routing Revisited (Part 2)

In Part 1 we recalled how Bellman-Ford, a distance-vector algorithm, finds shortest paths in a directed, weighted network. Now we want to understand in which respects Q-Routing [1] modifies Bellman-Ford.

Difference 1: path relaxation steps are performed asynchronously and online.

Difference 2: the metric used to describe path “quality”.
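Both differences can be seen in the Q-Routing update rule from Boyan and Littman [1]: instead of relaxing all edges in synchronized rounds against static link weights, a node updates a single estimate whenever it forwards a packet, using observed delays as the quality metric. Below is a minimal sketch of that update; the variable names and the table layout (`Q_x[d][y]` as node x's estimated delivery time to destination d via neighbor y) are my own illustrative choices, not taken from the paper verbatim.

```python
def q_routing_update(Q_x, d, y, q, s, t, alpha=0.5):
    """One asynchronous, online relaxation step (illustrative sketch).

    Q_x   -- node x's table: Q_x[d][y] estimates total delivery time
             to destination d when forwarding via neighbor y
    q     -- time the packet spent queued at x (observed)
    s     -- transmission time from x to y (observed)
    t     -- y's own best estimate to d, i.e. min over z of Q_y[d][z],
             reported back by y
    alpha -- learning rate

    Moves the estimate toward the sampled delay q + s + t.
    """
    old = Q_x[d][y]
    Q_x[d][y] = old + alpha * (q + s + t - old)
    return Q_x[d][y]

# Example: x currently estimates 10.0 time units to D via B; a packet
# just waited 2.0 in x's queue, took 1.0 to transmit, and B reports a
# best remaining estimate of 5.0. The estimate relaxes toward 8.0:
Q = {"D": {"B": 10.0}}
q_routing_update(Q, "D", "B", q=2.0, s=1.0, t=5.0, alpha=0.5)  # → 9.0
```

Note how the measured quantities q and s replace Bellman-Ford's fixed link weights: the "cost" of a hop is the delay the packet actually experienced, so the estimates track congestion as it changes.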

References

[1] Boyan, J. and Littman, M. (1994). Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach.