🧐🧐🧐

Day-to-day dynamics and FW algorithm

Connections & differences

Posted on March 9, 2021

Traffic equilibrium

Tags: research day-to-day mathematical analysis

Feasible link and path flow set

Convexity and underdetermination

Posted on February 7, 2021

This blog post reviews some basic concepts in traffic assignment, focusing on the network structure.

Tags: research linear algebra
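The post body is not reproduced here. The sketch below is a minimal illustration of the two ideas in its subtitle, and it rests on an assumed network: two parallel links a and b run from node 1 to node 2, then two parallel links c and d run from node 2 to node 3. There is one OD pair with an assumed demand of 20. Link flows are computed as x = Δf from the link-path incidence matrix Δ. Two different path flow vectors produce identical link flows (underdetermination), and their average is still feasible (convexity). The names `incidence`, `link_flows`, and `is_feasible` are illustrative only.

```python
# Hypothetical network: parallel links a, b from node 1 to node 2,
# then parallel links c, d from node 2 to node 3; a single OD pair (1, 3).
LINKS = ["a", "b", "c", "d"]
PATHS = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]
DEMAND = 20.0  # assumed OD demand from node 1 to node 3


def incidence():
    """Link-path incidence matrix: delta[l][k] = 1 if link l lies on path k, else 0."""
    return [[1 if link in path else 0 for path in PATHS] for link in LINKS]


def link_flows(path_flows):
    """x = Delta f: each link carries the total flow of the paths that use it."""
    return [sum(d * f for d, f in zip(row, path_flows)) for row in incidence()]


def is_feasible(path_flows, tol=1e-9):
    """Feasible path flows are nonnegative and add up to the OD demand."""
    return all(f >= -tol for f in path_flows) and abs(sum(path_flows) - DEMAND) <= tol


f1 = [5.0, 5.0, 5.0, 5.0]
f2 = [10.0, 0.0, 0.0, 10.0]
mix = [0.5 * u + 0.5 * v for u, v in zip(f1, f2)]  # convex combination of f1 and f2

print(link_flows(f1))  # [10.0, 10.0, 10.0, 10.0]
print(link_flows(f2))  # [10.0, 10.0, 10.0, 10.0]
print(is_feasible(mix), link_flows(mix))  # True [10.0, 10.0, 10.0, 10.0]
```

The dependence is visible in the incidence matrix itself. Row a plus row b equals row c plus row d, so Δ has rank 3 for 4 paths, and path flows cannot be recovered uniquely from link flows.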

Value iteration process converges to an optimal policy

In the previous policy iteration post, we proved that, starting from an initial policy, the iteration process “evaluation -> greedy improvement -> evaluation -> greedy improvement …” is guaranteed to reach an optimal policy. Note, however, that each evaluation step must itself be iterated many times to get...

Tags: reinforcement learning
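The excerpt stops at the cost of repeated evaluation. The post's title points to the standard remedy, value iteration. It replaces the full evaluation with a single Bellman optimality sweep per iteration and reads off a greedy policy at the end. Below is a minimal sketch on an assumed two-state MDP. The states, the `stay`/`move` actions, the rewards, and γ = 0.9 are all assumptions for illustration, not taken from the post.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical 2-state MDP: MDP[state][action] = list of (probability, next_state, reward).
MDP = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}


def q_value(v, s, a):
    """One-step lookahead: expected reward plus discounted value of the successor state."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in MDP[s][a])


def value_iteration(tol=1e-10):
    v = {s: 0.0 for s in MDP}
    while True:
        # One Bellman optimality sweep stands in for a full policy evaluation.
        new_v = {s: max(q_value(v, s, a) for a in MDP[s]) for s in MDP}
        delta = max(abs(new_v[s] - v[s]) for s in MDP)
        v = new_v
        if delta < tol:
            break
    # Extract the greedy policy from the (approximately) converged values.
    policy = {s: max(MDP[s], key=lambda a: q_value(v, s, a)) for s in MDP}
    return v, policy


v, policy = value_iteration()
print(policy)  # {0: 'move', 1: 'stay'}
print(round(v[0], 6), round(v[1], 6))
```

The optimal values can be checked by hand. Staying in state 1 forever is worth 2 / (1 − 0.9) = 20, and moving from state 0 is worth 1 + 0.9 × 20 = 19.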

A copy of an online tutorial on the PyTorch autograd framework

For my own records only

Posted on January 30, 2021

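Autograd (automatic differentiation) is the core feature of the entire PyTorch framework. I had only understood the principle, without knowing how the framework actually implements it. This time I came across a very good tutorial, so I plan to summarize what I understood.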

Tags: deep learning
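This sketch makes the idea concrete without relying on PyTorch internals. It is a toy scalar reverse-mode autodiff in plain Python, not how PyTorch implements autograd. It only mirrors the user-facing pattern of PyTorch's `.backward()` and `.grad`. The forward pass records each operation's inputs and local derivatives. `backward` then walks the recorded graph in reverse topological order and accumulates gradients with the chain rule. The `Value` class and its methods are hypothetical.

```python
class Value:
    """Toy scalar node in a computation graph (hypothetical, not PyTorch's implementation)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        # Each parent is stored with the local derivative d(self)/d(parent).
        self._parents = parents

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, Value) else Value(x)

    def __add__(self, other):
        other = Value._wrap(other)
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = Value._wrap(other)
        return Value(self.data * other.data, ((self, other.data), (other, self.data)))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Order nodes so every node appears after all the nodes it depends on.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # Seed dz/dz = 1, then push gradients from the output back toward the inputs.
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node._parents:
                parent.grad += local * node.grad


x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(z.data, x.grad, y.grad)  # 8.0 4.0 2.0
```

Gradients accumulate with `+=` because `x` is used twice, once in the product and once in the sum. That is why dz/dx = y + 1 = 4.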

Improving greedily leads to optimality in policy iteration

Why?

Posted on January 25, 2021

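In dynamic programming, policy iteration is a classic framework that obtains an optimal policy by repeatedly alternating between evaluation and improvement. The value function produced by evaluation is passed to the improvement step, which builds a greedy policy from it and hands that policy back to evaluation. Below we look at why this greedy feedback yields an optimal policy.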

Tags: reinforcement learning
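The loop described above can be sketched in Python on an assumed three-state chain. Transitions are deterministic, the only reward is 1 for taking `right` in state 2, and γ = 0.9. All of these are assumptions for illustration. Evaluation repeats Bellman expectation sweeps until convergence. Improvement picks the greedy action and keeps the current one on ties, so the loop stops. The sketch also checks the property the post sets out to explain: each greedy improvement never lowers the value of any state.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical 3-state chain: MDP[state][action] = (next_state, reward), deterministic.
MDP = {
    0: {"left": (0, 0.0), "right": (1, 0.0)},
    1: {"left": (0, 0.0), "right": (2, 0.0)},
    2: {"left": (1, 0.0), "right": (2, 1.0)},
}


def q_value(v, s, a):
    s2, r = MDP[s][a]
    return r + GAMMA * v[s2]


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation: repeat Bellman expectation sweeps until convergence."""
    v = {s: 0.0 for s in MDP}
    while True:
        new_v = {s: q_value(v, s, policy[s]) for s in MDP}
        if max(abs(new_v[s] - v[s]) for s in MDP) < tol:
            return new_v
        v = new_v


def improve(policy, v, eps=1e-9):
    """Greedy improvement; keep the current action on ties so the loop terminates."""
    new_policy = {}
    for s in MDP:
        best = max(MDP[s], key=lambda a: q_value(v, s, a))
        better = q_value(v, s, best) > q_value(v, s, policy[s]) + eps
        new_policy[s] = best if better else policy[s]
    return new_policy


def policy_iteration(policy):
    history = []  # value function of each successive policy
    while True:
        v = evaluate(policy)
        history.append(v)
        new_policy = improve(policy, v)
        if new_policy == policy:
            return policy, history
        policy = new_policy


policy, history = policy_iteration({s: "left" for s in MDP})
# Policy improvement theorem: no state's value drops from one policy to the next.
monotone = all(
    all(new[s] >= old[s] - 1e-8 for s in MDP)
    for old, new in zip(history, history[1:])
)
print(policy)    # {0: 'right', 1: 'right', 2: 'right'}
print(monotone)  # True
```

Starting from the all-`left` policy, the value function moves from (0, 0, 0) to (0, 0, 10), then (0, 9, 10), and finally (8.1, 9, 10), which are the optimal values for this chain.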