
[Paper Archaeology] The Seminal Work of Federated Learning: Communication-Efficient Learning of Deep Networks from Decentralized Data


B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-Efficient Learning of Deep Networks from Decentralized Data,” in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Apr. 2017, pp. 1273–1282.

Federated Learning

Characteristics

Advantage: communication-efficient

"Communication-efficient" here does not mean that transmitting only parameter updates, instead of the raw data or the whole network, lowers the communication cost. It means that, compared with synchronized SGD (which merges parameters after just one pass over each client's local data, the state of the art for data-center training at the time), the target accuracy is reached with far fewer communication rounds: a 10x to 100x reduction.

our goal is to use additional computation in order to decrease the number of rounds of communication needed to train a model

As for not transmitting the local data itself, the authors emphasize privacy protection rather than communication savings.

Core algorithm: FedAvg

Highlights

Performance gains

Open problems left behind

Assessment

Value of the paper

Novelty 100 × Effectiveness 1000 × Research problem 100

Why FL could emerge

When two models start from the same parameter initialization, simply averaging their parameters improves model performance, even if each has been trained locally to the point of overfitting! Compared with distributed SGD, where each client uploads after a single local pass, this greatly reduces the number of communication rounds.

This finding was made under IID data; simulations show a significant improvement in the non-IID case as well, though not as large as under IID, which may be an open problem worth digging into.
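For reference, the paper builds its pathological non-IID setting by sorting the MNIST training set by digit label, cutting it into 200 shards of 300 examples, and giving each of 100 clients 2 shards, so most clients only ever see two digits. A rough sketch of that split, with synthetic labels standing in for MNIST:

```python
# Sketch of the paper's pathological non-IID partition: sort by label,
# cut into 200 shards of 300 examples, assign 2 shards to each of 100 clients.
# Random synthetic labels stand in for the real MNIST training labels here.
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=60000)     # placeholder for MNIST labels

order = np.argsort(labels, kind="stable")    # indices sorted by label
shards = np.split(order, 200)                # 200 shards of 300 examples each
shard_ids = rng.permutation(200)

client_indices = {
    k: np.concatenate([shards[shard_ids[2 * k]], shards[shard_ids[2 * k + 1]]])
    for k in range(100)
}

# Each client now holds 600 examples covering only a couple of digit classes.
print({k: np.unique(labels[client_indices[k]]) for k in range(3)})
```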

Recent work indicates that in practice, the loss surfaces of sufficiently over-parameterized NNs are surprisingly well-behaved and in particular less prone to bad local minima than previously thought [11, 17, 9].

we find that naive parameter averaging works surprisingly well

the average of these two models, \(\frac{1}{2}w+ \frac{1}{2}w^\prime\), achieves significantly lower loss on the full MNIST training set than the best model achieved by training on either of the small datasets independently.
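That observation can be mimicked on a toy problem: train two copies of a small network from one shared random initialization on disjoint halves of a dataset, then average the weights as \(\frac{1}{2}w + \frac{1}{2}w^\prime\). The NumPy MLP and synthetic regression data below are stand-ins for the paper's MNIST experiment and only illustrate the mechanics; with a shared initialization, the averaged model's loss on the full dataset is typically close to, or below, that of either parent.

```python
# Toy check of the shared-initialization observation: train two copies of the
# same small MLP (identical random init) on disjoint halves of a synthetic
# dataset, then average their weights, 0.5*w + 0.5*w'.
import numpy as np

rng = np.random.default_rng(0)

def init(d_in=10, d_h=32):
    return {"W1": rng.normal(0, 0.5, (d_in, d_h)), "b1": np.zeros(d_h),
            "W2": rng.normal(0, 0.5, (d_h, 1)),    "b2": np.zeros(1)}

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

def mse(p, X, y):
    return float(np.mean((forward(p, X)[1].ravel() - y) ** 2))

def train(p, X, y, steps=500, lr=0.05):
    p = {k: v.copy() for k, v in p.items()}   # keep the shared init intact
    n = len(y)
    for _ in range(steps):
        h, out = forward(p, X)
        err = (out.ravel() - y)[:, None] / n          # residual / n
        dW2 = 2 * (h.T @ err)
        db2 = 2 * err.sum(axis=0)
        da1 = 2 * (err @ p["W2"].T) * (1 - h ** 2)    # back through tanh
        p["W2"] -= lr * dW2
        p["b2"] -= lr * db2
        p["W1"] -= lr * (X.T @ da1)
        p["b1"] -= lr * da1.sum(axis=0)
    return p

# Synthetic regression task, split into two disjoint "client" datasets.
X = rng.normal(size=(2000, 10))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=2000)
Xa, ya, Xb, yb = X[:1000], y[:1000], X[1000:], y[1000:]

w0 = init()                                   # one shared initialization
w, w_prime = train(w0, Xa, ya), train(w0, Xb, yb)
w_avg = {k: 0.5 * w[k] + 0.5 * w_prime[k] for k in w}

print("loss(w)   on full data:", mse(w, X, y))
print("loss(w')  on full data:", mse(w_prime, X, y))
print("loss(avg) on full data:", mse(w_avg, X, y))
```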

Why FL became so popular

Hints and takeaways
