PSPNet: Notes on the Pyramid Scene Parsing Network Paper

Code: https://github.com/Lextal/pspnet-pytorch

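Abstract:

The paper exploits global context information through context aggregation over different regions, using a pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). The resulting global prior representation produces high-quality results on scene parsing, and PSPNet provides a strong framework for pixel-level prediction.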

Key takeaways:

PSP Module:

The PSP module fuses features at four different pyramid scales.
[Figure: the PSP module]
Steps:

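  1. Pooling layers divide the feature map into sub-regions at several scales, with bin sizes 1×1, 2×2, 3×3, and 6×6. This is done with nn.AdaptiveAvgPool2d(output_size=(size, size)).
  2. A 1×1 convolution reduces the channel dimension of each pooled map to 1/N of the original, where N is the number of pyramid levels (that is, the number of pooling scales).
  3. Bilinear interpolation upsamples each low-dimensional feature map directly to the spatial size of the input feature map.
  4. The upsampled feature maps are concatenated with the input feature map.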
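
The four steps above can be traced on a small tensor. This is a minimal sketch, not the repository's code: the feature map is random stand-in data, the sizes (64 channels, 30×30) are made up for illustration, and the 1×1 convolutions are freshly initialized. It follows the 1/N channel reduction described in step 2, so the concatenated result has twice the input channels.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)

# Stand-in backbone feature map: batch 1, 64 channels, 30x30 (illustrative sizes).
feats = torch.randn(1, 64, 30, 30)
sizes = (1, 2, 3, 6)                     # step 1: pyramid bin sizes
reduced = feats.size(1) // len(sizes)    # step 2: C / N channels per level

branches = []
for size in sizes:
    # Step 1: average-pool into a size x size grid of sub-regions.
    pooled = nn.AdaptiveAvgPool2d(output_size=(size, size))(feats)
    # Step 2: a 1x1 convolution reduces 64 channels to 16.
    squeezed = nn.Conv2d(feats.size(1), reduced, kernel_size=1, bias=False)(pooled)
    # Step 3: bilinear upsampling back to the input's 30x30 resolution.
    upsampled = F.interpolate(squeezed, size=feats.shape[2:], mode='bilinear',
                              align_corners=False)
    branches.append(upsampled)

# Step 4: concatenate the four branches with the input along the channel axis.
fused = torch.cat(branches + [feats], dim=1)
print(fused.shape)  # torch.Size([1, 128, 30, 30])
```

With the 1×1 bin, the pooled map is just the per-channel mean over the whole feature map, which is why that level acts as the coarsest global summary.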

Network Architecture:

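[Figure: PSPNet network architecture]
A pretrained ResNet combined with the dilated network strategy extracts the feature map, meaning dilation is set in the ResNet convolutions to enlarge the receptive field. A 4-level pyramid is used, whose pooling kernels cover the whole image, half of it, and smaller portions. The pooled features are fused into a global prior. In the final part of (c), this prior is concatenated with the original feature map, followed by a convolution layer that produces the final prediction map in (d).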
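
To make the dilation idea concrete, here is a small sketch on a single convolution. It is only an illustration under assumed sizes (8 channels, a 64×64 stand-in input), not the backbone from the repository: it compares a stride-2 convolution, which halves the resolution, with a stride-1 convolution using dilation 2, which keeps the resolution while each 3×3 kernel spans a 5×5 window.

```python
import torch
from torch import nn

x = torch.randn(1, 8, 64, 64)  # stand-in feature map (illustrative sizes)

# Ordinary downsampling convolution: stride 2 halves the spatial size.
strided = nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1)
# Dilated replacement: stride 1, dilation 2, padding 2 keeps the spatial size.
dilated = nn.Conv2d(8, 8, kernel_size=3, stride=1, dilation=2, padding=2)

strided_out = strided(x)
dilated_out = dilated(x)
print(strided_out.shape)  # torch.Size([1, 8, 32, 32])
print(dilated_out.shape)  # torch.Size([1, 8, 64, 64])
```

Keeping the resolution this way matters for segmentation, because the pyramid pooling module then works on a denser feature map than a plain classification ResNet would give.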

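Auxiliary loss: (I do not yet understand the reason for it)
[Figure: auxiliary loss in the ResNet backbone]
Apart from the main branch, which uses softmax loss to train the final classifier, a second classifier is applied after the fourth stage, i.e. the res4b22 residual block. Both loss functions propagate through all the layers before them. The auxiliary loss helps optimize the learning process, while the main branch loss carries most of the responsibility. A weight is added to balance the auxiliary loss.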
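
One way to read the auxiliary loss is as an extra, down-weighted loss term that gives the earlier layers a more direct gradient signal. The sketch below assumes both heads produce per-pixel class scores; the tensors are random stand-ins for the two heads' outputs, and the names main_logits, aux_logits, and aux_weight as well as the value 0.4 are assumptions for illustration, not taken from the code in this post.

```python
import torch
from torch.nn import functional as F

torch.manual_seed(0)

n_classes, h, w = 5, 8, 8
target = torch.randint(0, n_classes, (2, h, w))  # per-pixel class labels

# Hypothetical raw scores from the main head and the auxiliary head.
main_logits = torch.randn(2, n_classes, h, w, requires_grad=True)
aux_logits = torch.randn(2, n_classes, h, w, requires_grad=True)

aux_weight = 0.4  # assumed balancing weight, chosen for illustration

# Softmax cross-entropy on each head, then a weighted sum.
main_loss = F.cross_entropy(main_logits, target)
aux_loss = F.cross_entropy(aux_logits, target)
total_loss = main_loss + aux_weight * aux_loss

# One backward pass sends gradients through both heads.
total_loss.backward()
```

Note that the code in this post takes a different route for its second output: it max-pools a feature map to one vector per image and feeds it to a linear classifier, so its auxiliary output is image-level rather than per-pixel.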

Code:

import torch
from torch import nn
from torch.nn import functional as F

import extractors  # extractors.py from the linked repository; provides the backbone networks


class PSPModule(nn.Module):
    """Pyramid pooling: pool at several bin sizes, upsample, concatenate, then fuse."""

    def __init__(self, features, out_features=1024, sizes=(1, 2, 3, 6)):
        super().__init__()
        # One pooling branch per pyramid level.
        self.stages = nn.ModuleList([self._make_stage(features, size) for size in sizes])
        # The concatenation holds len(sizes) branches plus the input feature map.
        self.bottleneck = nn.Conv2d(features * (len(sizes) + 1), out_features, kernel_size=1)
        self.relu = nn.ReLU()

    def _make_stage(self, features, size):
        prior = nn.AdaptiveAvgPool2d(output_size=(size, size))
        # Note: this 1x1 convolution keeps the channel count rather than reducing it to 1/N.
        conv = nn.Conv2d(features, features, kernel_size=1, bias=False)
        return nn.Sequential(prior, conv)

    def forward(self, feats):
        h, w = feats.size(2), feats.size(3)
        # F.interpolate replaces the deprecated F.upsample.
        priors = [F.interpolate(stage(feats), size=(h, w), mode='bilinear', align_corners=False)
                  for stage in self.stages] + [feats]
        bottle = self.bottleneck(torch.cat(priors, 1))
        return self.relu(bottle)


class PSPUpsample(nn.Module):
    """Double the spatial resolution, then refine with a 3x3 conv, BatchNorm and PReLU."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.PReLU()
        )

    def forward(self, x):
        h, w = 2 * x.size(2), 2 * x.size(3)
        # F.interpolate replaces the deprecated F.upsample.
        p = F.interpolate(x, size=(h, w), mode='bilinear', align_corners=False)
        return self.conv(p)


class PSPNet(nn.Module):
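    def __init__(self, n_classes=18, sizes=(1, 2, 3, 6), psp_size=2048, deep_features_size=1024, backend='resnet34',
                 pretrained=False):
        super().__init__()
        # psp_size and deep_features_size must match the channel counts returned by the chosen backend;
        # the models dict below sets them per backend.
        self.feats = getattr(extractors, backend)(pretrained)
        self.psp = PSPModule(psp_size, 1024, sizes)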
        self.drop_1 = nn.Dropout2d(p=0.3)

        self.up_1 = PSPUpsample(1024, 256)
        self.up_2 = PSPUpsample(256, 64)
        self.up_3 = PSPUpsample(64, 64)

        self.drop_2 = nn.Dropout2d(p=0.15)
        self.final = nn.Sequential(
            nn.Conv2d(64, n_classes, kernel_size=1),
            nn.LogSoftmax(dim=1)  # log-probabilities over the class (channel) dimension
        )

        self.classifier = nn.Sequential(
            nn.Linear(deep_features_size, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes)
        )

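    def forward(self, x):
        # The backbone returns the main feature map f and a second feature map class_f
        # that feeds the auxiliary classifier.
        f, class_f = self.feats(x)
        p = self.psp(f)
        p = self.drop_1(p)

        # Three upsampling stages, each doubling the spatial resolution.
        p = self.up_1(p)
        p = self.drop_2(p)

        p = self.up_2(p)
        p = self.drop_2(p)

        p = self.up_3(p)
        p = self.drop_2(p)

        # Global max pooling turns class_f into one feature vector per image.
        auxiliary = F.adaptive_max_pool2d(input=class_f, output_size=(1, 1)).view(-1, class_f.size(1))

        # Returns per-pixel log-probabilities and image-level class scores.
        return self.final(p), self.classifier(auxiliary)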

models = {
    'squeezenet': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=512, deep_features_size=256, backend='squeezenet'),
    'densenet': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=1024, deep_features_size=512, backend='densenet'),
    'resnet18': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=512, deep_features_size=256, backend='resnet18'),
    'resnet34': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=512, deep_features_size=256, backend='resnet34'),
    'resnet50': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=2048, deep_features_size=1024, backend='resnet50'),
    'resnet101': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=2048, deep_features_size=1024, backend='resnet101'),
    'resnet152': lambda: PSPNet(sizes=(1, 2, 3, 6), psp_size=2048, deep_features_size=1024, backend='resnet152')
}

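def build_network(backend):
    """Build a PSPNet for the given backend name (case-insensitive key of models)."""
    backend = backend.lower()
    net = models[backend]()
    # Uncomment to train on GPU(s):
    # net = nn.DataParallel(net)
    # net = net.cuda()
    return net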


if __name__ == '__main__':
    net = build_network('resnet34')
    # torch.randn gives defined values; torch.empty would return uninitialized memory.
    x = torch.randn(1, 3, 512, 512)
    out, out_cls = net(x)
    print(out.shape, out_cls.shape)

Source: https://blog.csdn.net/weixin_44543648/article/details/122126244