In Part 1 we recalled how Bellman-Ford, a distance-vector algorithm, finds shortest paths in a directed, weighted network. Now we want to understand in which aspects Q-Routing [1] modifies Bellman-Ford.
Difference 1: path relaxation steps are performed asynchronously and online.
Difference 2: metric describing the path “quality”.
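Both differences can be seen in the Q-Routing update rule from [1]: a node updates a single estimate each time it forwards a packet (asynchronous and online, rather than a full relaxation sweep), and the quantity being estimated is delivery *time*, including queueing and transmission delay, rather than a static link weight. Below is a minimal sketch of that rule; the nested-dictionary layout and the names `q_time`, `s_time`, and `eta` are illustrative choices, not taken from the paper's notation.

```python
def q_routing_update(Q, x, d, y, q_time, s_time, eta=0.5):
    """One online Q-Routing update at node x after forwarding a packet
    bound for destination d to neighbor y (rule from Boyan & Littman, 1994).

    Q[node][dest][neighbor] estimates the delivery time from `node`
    to `dest` when routing via `neighbor` (the path-quality metric is
    time, not hop count or fixed link weight).
    """
    # Neighbor y reports its best remaining-time estimate toward d.
    t = min(Q[y][d].values()) if Q[y].get(d) else 0.0
    # Observed target: time spent in x's queue + transmission to y
    # + y's estimate of the rest of the trip.
    target = q_time + s_time + t
    # Move the old estimate a fraction eta toward the observed target;
    # only this single (x, d, y) entry is touched, asynchronously.
    Q[x][d][y] += eta * (target - Q[x][d][y])
    return Q[x][d][y]


# Toy example: A currently estimates 5.0 time units to reach D via B;
# B's best estimate toward D is 1.0.
Q = {"A": {"D": {"B": 5.0}},
     "B": {"D": {"C": 2.0, "D": 1.0}}}
new_estimate = q_routing_update(Q, "A", "D", "B", q_time=0.5, s_time=1.0)
print(new_estimate)  # 5.0 + 0.5 * ((0.5 + 1.0 + 1.0) - 5.0) = 3.75
```

Unlike Bellman-Ford, no global sweep over all edges happens: each packet that is actually forwarded triggers exactly one such update, so estimates track the network's current congestion.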
References
[1] Boyan, J. and Littman, M. (1994). Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach.