ComputerVision Jack

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

JackYoon 2021. 12. 22. 09:57

Abstract

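The paper presents MobileNets, a family of efficient models for mobile and embedded vision applications.
MobileNet is a streamlined architecture built from depthwise separable convolutions, which yields lightweight deep neural networks.
The paper introduces two simple global hyperparameters that trade off latency against accuracy. They let a model builder choose a right-sized model for the resource constraints of a given application.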

Introduction

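The general trend has been to make networks deeper and more complicated to reach higher accuracy. However, these accuracy gains do not necessarily make networks more efficient, because they ignore network size and speed.
Many real-world applications must run on computationally limited platforms.
The paper describes an efficient network architecture and two hyperparameters for building very small, low-latency models that can easily be matched to the requirements of mobile and embedded applications.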

Prior Works

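The paper proposes a class of network architectures that lets a developer pick a small network matching the resource restrictions (latency, size) of their application.
MobileNets are built primarily from depthwise separable convolutions, which were introduced in earlier work and subsequently used in Inception models to reduce the computation in the first few layers.

A different approach to obtaining small networks is to shrink, factorize, or compress pretrained networks.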

MobileNets Architecture

The core layer of MobileNet is the depthwise separable filter.
The MobileNet architecture also has two hyperparameters: the width multiplier and the resolution multiplier.

Depthwise Separable Convolution

MobileNet applies a single filter to each input channel (depthwise convolution), then uses a 1 x 1 (pointwise) convolution to combine the depthwise outputs.
A depthwise separable convolution therefore splits into two layers: a separate layer for filtering and a separate layer for combining.
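To make the filter/combine split concrete, here is a minimal pure-Python sketch on nested lists (stride 1, no padding, no BatchNorm or ReLU). All function names are hypothetical, not from the paper or the post's code. It also checks a property that follows directly from the definition: a depthwise separable convolution is equivalent to a standard convolution whose kernel factorizes as pointwise weight times depthwise filter, which is exactly the restriction that makes it cheaper.

```python
# Shapes: x[m][h][w], depthwise filters dw[m][i][j], pointwise weights pw[n][m].

def depthwise_conv(x, dw):
    """Filtering step: one k x k filter per input channel, no channel mixing."""
    m_ch, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(dw[0])
    out_h, out_w = h - k + 1, w - k + 1
    return [[[sum(x[m][y + i][xx + j] * dw[m][i][j]
                  for i in range(k) for j in range(k))
              for xx in range(out_w)]
             for y in range(out_h)]
            for m in range(m_ch)]


def pointwise_conv(x, pw):
    """Combining step: a 1 x 1 convolution mixes channels at each pixel."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(pw[n][m] * x[m][y][xx] for m in range(len(x)))
              for xx in range(w)]
             for y in range(h)]
            for n in range(len(pw))]


def standard_conv(x, kernel):
    """Standard convolution: kernel[n][m][i][j] filters and combines in one step."""
    m_ch, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernel[0][0])
    out_h, out_w = h - k + 1, w - k + 1
    return [[[sum(x[m][y + i][xx + j] * kernel[n][m][i][j]
                  for m in range(m_ch) for i in range(k) for j in range(k))
              for xx in range(out_w)]
             for y in range(out_h)]
            for n in range(len(kernel))]


def separable_as_standard_kernel(dw, pw):
    """Equivalent standard kernel: K[n][m][i][j] = pw[n][m] * dw[m][i][j]."""
    return [[[[pw[n][m] * dw[m][i][j] for j in range(len(dw[m][i]))]
              for i in range(len(dw[m]))]
             for m in range(len(dw))]
            for n in range(len(pw))]


x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # input channel 0 (3 x 3)
     [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]   # input channel 1
dw = [[[1, 0], [0, -1]],                   # 2 x 2 depthwise filter for channel 0
      [[2, 1], [1, 2]]]                    # 2 x 2 depthwise filter for channel 1
pw = [[1, 1], [2, -1], [3, 0]]             # 1 x 1 weights: N = 3 outputs, M = 2 inputs

separable = pointwise_conv(depthwise_conv(x, dw), pw)
standard = standard_conv(x, separable_as_standard_kernel(dw, pw))
print(separable == standard)  # True
```

The converse does not hold: a general standard kernel cannot be written as such a product, which is where the reduced expressiveness (and the small accuracy cost) comes from.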
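Let $D_K$ be the spatial size of the square kernel, $M$ the number of input channels, $N$ the number of output channels, and $D_F$ the spatial size of the square feature map.
Standard convolution computation cost: $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$
Depthwise convolution computation cost: $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F$

Depthwise separable convolution computation cost (depthwise + 1 x 1 pointwise): $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$
Dividing by the standard cost gives a ratio of $\frac{1}{N} + \frac{1}{D_K^2}$, so a 3 x 3 depthwise separable convolution needs 8 to 9 times less computation than a standard convolution, at the price of only a small reduction in accuracy.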
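The cost formulas can be checked numerically. The sketch below uses hypothetical helper names and one assumed example layer (a 3 x 3 convolution mapping 512 to 512 channels on a 14 x 14 feature map), and compares the measured cost ratio with $\frac{1}{N} + \frac{1}{D_K^2}$.

```python
def standard_conv_cost(dk, m, n, df):
    """Mult-Adds of a standard D_K x D_K convolution (M -> N channels, D_F x D_F map)."""
    return dk * dk * m * n * df * df


def depthwise_separable_cost(dk, m, n, df):
    """Mult-Adds of a depthwise (filtering) + 1 x 1 pointwise (combining) pair."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise


# Example layer (an assumption for illustration): 3x3 kernel, 512 -> 512 channels, 14x14 map
dk, m, n, df = 3, 512, 512, 14
std = standard_conv_cost(dk, m, n, df)
sep = depthwise_separable_cost(dk, m, n, df)
ratio = sep / std
predicted = 1 / n + 1 / dk ** 2  # 1/N + 1/D_K^2
print(f"standard: {std:,}  separable: {sep:,}  reduction: {std / sep:.2f}x")
```

With $N = 512$ the $\frac{1}{N}$ term is negligible, so the reduction sits just under the $D_K^2 = 9$ ceiling.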

Network Structure and Training

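The MobileNet structure is defined in Table 1 of the paper. Every layer is followed by BatchNorm and ReLU, except the final fully connected layer, which has no nonlinearity and feeds into a softmax layer for classification.
Downsampling is handled with strided convolutions, both in the depthwise convolutions and in the first layer.
A final average pooling reduces the spatial resolution to 1 before the fully connected layer.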
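A minimal sketch of how the strides shrink a 224 x 224 input: the stride list below is read off the block configuration in the post's code (first 3 x 3 conv, then the 13 depthwise 3 x 3 convs, all with padding 1), and the helper names are hypothetical. Global average pooling then collapses the final 7 x 7 map to a single value per channel.

```python
def conv_output_size(size, kernel_size=3, stride=1, padding=1):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


def global_average_pool(feature_map):
    """Average one channel's 2-D feature map (list of rows) down to a single value."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


# Stride of the first conv followed by the 13 depthwise convs (from the post's config)
strides = [2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 2, 1]

size = 224
sizes = []
for s in strides:
    size = conv_output_size(size, kernel_size=3, stride=s, padding=1)
    sizes.append(size)

print(sizes)  # spatial size after each conv
pooled = global_average_pool([[1.0] * size for _ in range(size)])  # 7x7 -> 1 value
```

Only five stride-2 layers are needed to go from 224 to 7, which is why no separate pooling layers appear until the final average pool.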
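Counting Mult-Adds alone is not enough to define an efficient network; the operations must also be efficiently implementable.
Unstructured sparse matrix operations are typically not faster than dense ones unless sparsity is very high. MobileNet puts nearly all of its computation into dense 1 x 1 (pointwise) convolutions, which can be implemented with highly optimized general matrix multiply (GEMM) routines.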
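A minimal pure-Python sketch of why pointwise convolutions map onto GEMM: reshaping the input into an M x (H·W) matrix turns the whole 1 x 1 convolution into one dense matrix product. The plain `matmul` below only stands in for an optimized GEMM library call, and all helper names are hypothetical.

```python
def flatten_channels(x):
    """Reshape x[m][h][w] into an M x (H*W) matrix: one row per channel."""
    return [[v for row in channel for v in row] for channel in x]


def matmul(a, b):
    """Plain dense matrix product (stand-in for an optimized GEMM routine)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def pointwise_conv_gemm(x, weight):
    """1 x 1 convolution as one GEMM: (N x M) @ (M x H*W), reshaped to N x H x W."""
    h, w = len(x[0]), len(x[0][0])
    out = matmul(weight, flatten_channels(x))
    return [[row[r * w:(r + 1) * w] for r in range(h)] for row in out]


def pointwise_conv_loop(x, weight):
    """Reference implementation: mix channels independently at every pixel."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(weight[n][m] * x[m][r][c] for m in range(len(x)))
              for c in range(w)] for r in range(h)] for n in range(len(weight))]


x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 1], [1, 0]]]  # M = 3 channels, 2 x 2
weight = [[1, 0, 2], [-1, 1, 0]]                           # N = 2 output channels
print(pointwise_conv_gemm(x, weight))
```

No im2col-style duplication of input values is needed for a 1 x 1 kernel, which is part of why these layers run so efficiently in practice.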
In contrast to training large models, less regularization and data augmentation are used, because small models have less trouble with overfitting.
Very little or no weight decay (L2 regularization) is put on the depthwise filters, since they contain very few parameters.
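To see how few parameters the depthwise filters hold, the sketch below counts weights per layer type for the full-width network. Biases and BatchNorm parameters are ignored, the block list follows the configuration in the post's code, and the helper name is hypothetical. The total comes to about 4.2M, in line with the size the paper reports for the full-width model, and the depthwise filters account for only about 1% of it.

```python
# (in_channels, out_channels) of the 13 depthwise separable blocks,
# following the configuration used in the post's code
BLOCKS = [(32, 64), (64, 128), (128, 128), (128, 256), (256, 256), (256, 512)] \
    + [(512, 512)] * 5 + [(512, 1024), (1024, 1024)]


def parameter_breakdown(blocks, dk=3, classes=1000):
    """Count weights per layer type, ignoring biases and BatchNorm parameters."""
    return {
        "first_conv": dk * dk * 3 * blocks[0][0],          # 3x3 standard conv, RGB -> 32
        "depthwise": sum(dk * dk * m for m, _ in blocks),  # one 3x3 filter per channel
        "pointwise": sum(m * n for m, n in blocks),        # 1x1 convs: M x N each
        "fc": blocks[-1][1] * classes,                     # 1024 -> classes
    }


counts = parameter_breakdown(BLOCKS)
total = sum(counts.values())
for name, value in counts.items():
    print(f"{name:>10}: {value:>9,} ({100 * value / total:.2f}%)")
```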

Width Multiplier: Thinner Models

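The base MobileNet is already small and low-latency, but many applications need a model that is even smaller and faster.
To this end, a width multiplier α is introduced. Its role is to thin the network uniformly at each layer: the number of input channels $M$ becomes $\alpha M$ and the number of output channels $N$ becomes $\alpha N$.
Depthwise separable convolution cost with width multiplier α: $D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F + \alpha M \cdot \alpha N \cdot D_F \cdot D_F$
Here $\alpha \in (0, 1]$, with typical settings of 1, 0.75, 0.5, and 0.25; $\alpha = 1$ is the baseline MobileNet and $\alpha < 1$ gives reduced MobileNets. Since the dominant pointwise term scales with $\alpha^2$, computation and parameter count drop roughly quadratically. The width multiplier can be applied to any model structure to define a new, smaller model with a reasonable trade-off between accuracy, latency, and size.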
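A small sketch of the α-scaled cost for one assumed example layer (3 x 3 kernel, 512 to 512 channels, 14 x 14 map). `separable_cost` is a hypothetical helper, and rounding αM to an integer is an assumption since channel counts must be whole numbers. The relative cost tracks α² closely because the pointwise term dominates.

```python
def separable_cost(dk, m, n, df, alpha=1.0):
    """Mult-Adds of a depthwise separable layer thinned by width multiplier alpha."""
    am, an = int(round(alpha * m)), int(round(alpha * n))  # channel counts must be integers
    return dk * dk * am * df * df + am * an * df * df


baseline = separable_cost(3, 512, 512, 14)
for alpha in (1.0, 0.75, 0.5, 0.25):
    cost = separable_cost(3, 512, 512, 14, alpha)
    print(f"alpha={alpha:<4}  Mult-Adds={cost:>11,}  "
          f"relative={cost / baseline:.3f}  alpha^2={alpha ** 2:.3f}")
```

The relative cost stays slightly above α² because the depthwise term only shrinks linearly with α.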

Resolution Multiplier: Reduced Representation

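The second hyperparameter is the resolution multiplier ρ. The authors apply it to the input image, and the internal representation of every layer is then reduced by the same multiplier; in practice ρ is set implicitly through the input resolution.
Depthwise separable convolution cost with width multiplier α and resolution multiplier ρ: $D_K \cdot D_K \cdot \alpha M \cdot \rho D_F \cdot \rho D_F + \alpha M \cdot \alpha N \cdot \rho D_F \cdot \rho D_F$
With $\rho \in (0, 1]$, this reduces the computational cost by $\rho^2$, while the parameter count stays the same.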
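Putting both multipliers together, the sketch below estimates the Mult-Adds of the whole network for a given α and ρ. The block list follows the configuration in the post's code, ρ is applied through the input resolution (224ρ), and the helper name is hypothetical. The α = 1, 224 x 224 case lands close to the 569M Mult-Adds the paper reports for the full-size model.

```python
# (in_channels, out_channels, stride) of the 13 depthwise separable blocks,
# following the configuration used in the post's code
BLOCKS = [(32, 64, 1), (64, 128, 2), (128, 128, 1), (128, 256, 2),
          (256, 256, 1), (256, 512, 2)] + [(512, 512, 1)] * 5 \
    + [(512, 1024, 2), (1024, 1024, 1)]


def mobilenet_mult_adds(alpha=1.0, rho=1.0, dk=3, classes=1000):
    """Estimate whole-network Mult-Adds for width multiplier alpha and resolution multiplier rho."""
    def ch(c):
        return int(round(alpha * c))           # thinned channel count

    df = int(round(224 * rho))                 # rho is set implicitly via the input resolution
    df = (df + 1) // 2                         # first 3x3 conv: stride 2, padding 1
    total = dk * dk * 3 * ch(32) * df * df     # standard conv on the RGB input
    for m, n, s in BLOCKS:
        df = (df + s - 1) // s                 # output size of a padded 3x3 conv with stride s
        total += dk * dk * ch(m) * df * df     # depthwise (filtering)
        total += ch(m) * ch(n) * df * df       # pointwise (combining)
    total += ch(1024) * classes                # fully connected layer after average pooling
    return total


for alpha, resolution in [(1.0, 224), (0.5, 224), (1.0, 128)]:
    ma = mobilenet_mult_adds(alpha, resolution / 224)
    print(f"alpha={alpha}, input {resolution}x{resolution}: {ma / 1e6:.0f}M Mult-Adds")
```

Halving the width cuts the cost to roughly a quarter, and dropping the input to 128 x 128 scales every convolution by $(128/224)^2$; only the fully connected layer is unaffected by ρ.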

Conclusion

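The authors focused on efficient model design and built MobileNet on depthwise separable convolutions.
Using the width multiplier and the resolution multiplier, they derived smaller and faster MobileNets that trade a reasonable amount of accuracy for reduced size and latency.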

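import torch
from torch import nn


# NOTE: the original post uses Conv2dBnAct, DepthwiseConvBnAct and weight_initialize
# without defining them. The versions below are assumed reconstructions:
# convolution -> BatchNorm -> ReLU, with "same"-style padding (kernel_size // 2).
class Conv2dBnAct(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1):
        super(Conv2dBnAct, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, input):
        return self.act(self.bn(self.conv(input)))


class DepthwiseConvBnAct(nn.Module):
    # groups=in_channels gives one k x k filter per input channel (depthwise convolution)
    def __init__(self, in_channels, kernel_size, stride=1):
        super(DepthwiseConvBnAct, self).__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size, stride,
                              padding=kernel_size // 2, groups=in_channels, bias=False)
        self.bn = nn.BatchNorm2d(in_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, input):
        return self.act(self.bn(self.conv(input)))


def weight_initialize(model):
    # Assumed scheme: Kaiming init for convolutions, identity init for BatchNorm
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.ones_(m.weight)
            nn.init.zeros_(m.bias)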


class MobileNet_stem(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(MobileNet_stem, self).__init__()
        self.conv = Conv2dBnAct(in_channels=in_channels, out_channels=out_channels, kernel_size=3, stride=2)

    def forward(self, input):
        return self.conv(input)


class DepthwiseSeparable_Block(nn.Module):
    def __init__(self, in_channels, kernel_size, out_channels, stride):
        super(DepthwiseSeparable_Block, self).__init__()
        self.depthwise = DepthwiseConvBnAct(in_channels=in_channels, kernel_size=kernel_size, stride=stride)
        self.pointwise = Conv2dBnAct(in_channels = in_channels, out_channels=out_channels, kernel_size=1, stride=1)
    
    def forward(self, input):
        output = self.depthwise(input)
        output = self.pointwise(output)
        return output


class _MobileNetv1(nn.Module):
    def __init__(self, in_channels, classes):
        super(_MobileNetv1, self).__init__()
        self.stage_channels = []
        self.stem_block = MobileNet_stem(in_channels=in_channels, out_channels=32)
        
        # config per block: [in_channels, kernel_size, out_channels, stride]
        layer1 = [
            [32, 3, 64, 1], [64, 3, 128, 2]
        ]
        layer2 = [ 
            [128, 3, 128, 1], [128, 3, 256, 2]
        ]
        layer3 = [
            [256, 3, 256, 1], [256, 3, 512, 2]
        ]
        layer4 = [
            [512, 3, 512, 1], [512, 3, 512, 1],
            [512, 3, 512, 1], [512, 3, 512, 1],
            [512, 3, 512, 1], [512, 3, 1024, 2]
        ]
        layer5 = [
            [1024, 3, 1024, 1]
        ]
        self.layer1 = self.make_layers(layer1)
        self.layer2 = self.make_layers(layer2)
        self.layer3 = self.make_layers(layer3)
        self.layer4 = self.make_layers(layer4)
        self.layer5 = self.make_layers(layer5)
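        # Output channels of the last block in each stage (assumed to mirror the
        # undefined Get_Stage_Channels helper used in the original post)
        self.stage_channels = [cfg[-1][2] for cfg in [layer1, layer2, layer3, layer4, layer5]]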
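        # Classifier head: the extra 1x1 Conv2dBnAct is not listed in the paper's Table 1;
        # the final 1x1 conv after global average pooling plays the role of the FC layer
        # (softmax is left to the loss function).
        self.classification = nn.Sequential(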
            Conv2dBnAct(in_channels=1024, out_channels=1024, kernel_size=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(1024, classes, 1)
        )
    
    def forward(self, input):
        stem_out = self.stem_block(input)      
        s1 = self.layer1(stem_out)
        s2 = self.layer2(s1)
        s3 = self.layer3(s2)
        s4 = self.layer4(s3)
        s5 = self.layer5(s4)
        pred = self.classification(s5)
        b, c, _, _ = pred.size()
        pred = pred.view(b, c)
        stages = [s1, s2, s3, s4, s5]
        return {'stage':stages, 'pred':pred}

    def make_layers(self, cfg):
        layers = []
        for i, k, o, s in cfg:
            layer = DepthwiseSeparable_Block(in_channels=i, kernel_size=k, out_channels=o, stride=s)
            layers.append(layer)
        return nn.Sequential(*layers)


def MobileNetv1(in_channels, classes=1000):
    model = _MobileNetv1(in_channels=in_channels, classes=classes)
    weight_initialize(model)
    return model


if __name__ == '__main__':
    model = MobileNetv1(in_channels=3, classes=1000)
    out = model(torch.rand(1, 3, 224, 224))
    print(out['pred'].shape)  # torch.Size([1, 1000])
