Solving multi-objective optimization problems under the Pareto principle is often difficult.
Here I work through the MGDA method with the help of several related papers: paper [1] proposed the method; paper [3] applied it to neural networks (ANNs) and derived an upper bound whose optimization still yields Pareto optimality in the multi-objective setting; and paper [2] used paper [3]'s conclusions to apply the method to recommender systems.
Some notes from these papers follow.
In paper [1], the author defines Pareto-stationary points and proposes a method for finding Pareto-stationary solutions.
Some important results are:
- A Pareto-optimal solution of a multi-objective problem must be Pareto-stationary; the converse may not hold.
- If a feasible solution is not Pareto-stationary, there must exist a common descent direction along which every objective decreases (for a minimization problem); see the formulation after this list.
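Concretely, the method of [1] looks for the minimum-norm element of the convex hull of the per-objective gradients. The following sketch states that subproblem in generic notation (the symbols $f_t$, $\theta$, $\alpha_t$ are my own naming, not copied verbatim from [1]):

```latex
% Min-norm subproblem solved by MGDA at a point \theta:
\min_{\alpha_1,\dots,\alpha_T}\;
  \Bigl\| \sum_{t=1}^{T} \alpha_t \, \nabla f_t(\theta) \Bigr\|_2^2
\quad \text{s.t.} \quad \sum_{t=1}^{T} \alpha_t = 1, \;\; \alpha_t \ge 0 .
% If the optimal value is 0, \theta is Pareto-stationary; otherwise
% d = -\sum_t \alpha_t^* \nabla f_t(\theta) is a common descent direction
% that decreases every f_t, which is exactly the second result above.
```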
In paper [2], the authors summarize:
> It has been proven [3] that either the solution to this optimization problem is 0 so that the KKT conditions are satisfied or the solutions lead to gradient directions that minimizes all the loss functions. If the KKT conditions are satisfied, the solution is Pareto stationary and also Pareto efficient under realistic and mild conditions [3].
But this statement is not quite accurate, because from paper [3] we know that optimizing this upper bound yields a Pareto-optimal solution under realistic assumptions (see the bound below).
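For reference, the upper bound in question comes from the chain rule through the shared representation $Z = g(\theta^{sh})$; the sketch below loosely follows the notation of [3]:

```latex
% Chain rule: \nabla_{\theta^{sh}} \hat{L}^t
%            = (\partial Z / \partial \theta^{sh})^\top \nabla_Z \hat{L}^t,
% hence for any simplex weights \alpha:
\Bigl\| \sum_{t} \alpha_t \nabla_{\theta^{sh}} \hat{L}^t \Bigr\|_2^2
\;\le\;
\Bigl\| \frac{\partial Z}{\partial \theta^{sh}} \Bigr\|_2^2
\Bigl\| \sum_{t} \alpha_t \nabla_{Z} \hat{L}^t \Bigr\|_2^2 .
% [3] minimizes the cheaper right-hand factor over \alpha (MGDA-UB);
% when \partial Z / \partial \theta^{sh} has full rank, a zero value of the
% right-hand factor still certifies Pareto stationarity.
```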
Paper [2]'s phrasing is imprecise. A more accurate statement: since the problem is convex, the minimum is guaranteed to be attained under the KKT conditions; once all minima are obtained, Pareto-optimal solutions exist within this solution set (the optimal solution is not necessarily unique: different variable values may attain the same objective value).
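For the two-objective case this convex subproblem even has a closed form, which [3] uses as a building block (Algorithm 1 there). Below is a minimal NumPy sketch; the function name `min_norm_two_grads` and the toy gradients are my own illustration, not code from the papers:

```python
import numpy as np

def min_norm_two_grads(g1, g2):
    """Minimum-norm point of conv{g1, g2}: solves
    min_{a in [0,1]} || a*g1 + (1-a)*g2 ||^2 in closed form."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:               # identical gradients: any a works
        return g1.copy()
    a = np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0)
    return a * g1 + (1.0 - a) * g2

g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
d = min_norm_two_grads(g1, g2)
print(d)  # [0.5 0.5]; nonzero, so -d decreases both objectives.
# A zero result would mean the KKT conditions hold and the point is
# Pareto-stationary, matching the discussion above.
```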
Summary: from the upper bound above we can only obtain Pareto-stationary solutions. The implicit idea is that once all Pareto-stationary solutions have been found, we then select from them the Pareto-optimal solutions according to Pareto dominance. Of course, for neural-network algorithms, a paper already counts as decent if it obtains an approximate solution that is experimentally near-optimal.
So, in practice, we exploit the fact that Pareto stationarity is a necessary condition for Pareto optimality: we solve for all Pareto-stationary solutions, since every Pareto-optimal solution must lie in the set of Pareto-stationary solutions, and then use the definition of Pareto dominance to filter the optimal solutions out of that set, as sketched below.
Furthermore, if we cannot enumerate all Pareto-stationary solutions, we can only obtain approximate Pareto-optimal solutions from the Pareto-stationary solutions we do find.
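As an illustration of that filtering step, here is a minimal sketch assuming minimization and candidates given as objective vectors; the helper `pareto_filter` and the toy data are hypothetical, not from the papers:

```python
import numpy as np

def pareto_filter(points):
    """Return the non-dominated rows of `points` (minimization).
    q dominates p iff q <= p component-wise with q < p somewhere."""
    points = np.asarray(points, dtype=float)
    keep = [
        i for i, p in enumerate(points)
        if not any(np.all(q <= p) and np.any(q < p)
                   for j, q in enumerate(points) if j != i)
    ]
    return points[keep]

candidates = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [2.5, 2.5]]
print(pareto_filter(candidates))
# [2.5, 2.5] is dominated by [2.0, 2.0]; the other three survive.
```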
Convergence
From the convergence derivation of MGDA in [4] (figure below), one can see that the convergence relies mainly on the boundedness of the variable domain; this is what allows the algorithm to converge.
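To make that intuition concrete, here is a toy MGDA loop on two quadratic objectives; everything in it (the objectives, the step size 0.1, the tolerance) is an illustrative assumption rather than the setup of [4]. The iterates stay in a bounded region and the min-norm direction shrinks to zero:

```python
import numpy as np

# Two toy objectives: f1(x) = ||x - (1,0)||^2, f2(x) = ||x - (0,1)||^2.
def f1_grad(x): return 2.0 * (x - np.array([1.0, 0.0]))
def f2_grad(x): return 2.0 * (x - np.array([0.0, 1.0]))

def min_norm(g1, g2):
    # Closed-form min-norm point of conv{g1, g2}, as in the sketch above.
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        return g1.copy()
    a = np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0)
    return a * g1 + (1.0 - a) * g2

x = np.zeros(2)
for _ in range(1000):
    d = min_norm(f1_grad(x), f2_grad(x))
    if d @ d < 1e-12:      # min-norm ~ 0: Pareto-stationary, stop
        break
    x -= 0.1 * d           # step along the common descent direction
print(x)  # ends near (0.5, 0.5), a point on the Pareto set
```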
[1] Multiple-gradient descent algorithm (MGDA) for multiobjective optimization
[2] A Pareto-Efficient Algorithm for Multiple Objective Optimization in E-Commerce Recommendation
[3] Multi-Task Learning as Multi-Objective Optimization
[4] Multiple-Gradient Descent Algorithm (MGDA)