Searching for MobileNetV3

JackYoon 2021. 12. 29. 11:25

Abstract

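Combining complementary search techniques with a novel architecture design, the paper presents the next generation of MobileNets.
MobileNetV3 is tuned to mobile phone CPUs through hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm, and is then further improved through novel architecture advances.
The result is two new MobileNet models, MobileNetV3-Large and MobileNetV3-Small, targeted at high- and low-resource use cases respectively.
These models are also adapted to the tasks of object detection and semantic segmentation.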

Introduction

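Efficient neural networks are becoming ubiquitous in mobile applications, enabling entirely new on-device experiences.
Their efficiency not only improves the user experience through higher accuracy and lower latency, but also lowers power consumption, which helps preserve battery life.
The paper describes the authors' approach to developing MobileNetV3-Large and MobileNetV3-Small, which deliver high accuracy on-device.
Its goal is to develop architectures that optimize the accuracy-latency trade-off on mobile devices.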

Efficient Mobile Building Blocks

Mobile models have been built on increasingly efficient building blocks.
MobileNetV1: replaced standard convolution layers with depthwise separable convolutions.
MobileNetV2: introduced a more efficient layer structure based on linear bottlenecks and inverted residuals.
MnasNet: built on the MobileNetV2 structure and added lightweight squeeze-and-excitation attention modules to the bottleneck.
MobileNetV3 combines these building blocks to build more effective models, and upgrades them with a modified swish nonlinearity as the activation function.
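To make the efficiency argument behind these blocks concrete, the sketch below counts multiply-adds (MAdds) for a standard k×k convolution versus the depthwise separable factorization introduced in MobileNetV1. It assumes stride 1, "same" padding and no bias, and ignores BatchNorm and activation cost; the function names and the example layer size are illustrative only.

```python
def standard_conv_madds(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-adds of a standard k x k convolution producing an h x w x c_out map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_madds(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Example layer: 56 x 56 feature map, 64 -> 128 channels, 3 x 3 kernel.
    std = standard_conv_madds(56, 56, 64, 128, 3)
    sep = depthwise_separable_madds(56, 56, 64, 128, 3)
    print(f"standard: {std:,}  separable: {sep:,}  ratio: {std / sep:.1f}x")
```

The ratio simplifies to 1 / (1/c_out + 1/k²), so with 3×3 kernels the saving approaches 9× as the output width grows.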

Network Search

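Network search has proven to be a very powerful tool for discovering and optimizing network architectures.
For MobileNetV3, the authors use platform-aware NAS to search for the global network structure by optimizing each network block, and then use the NetAdapt algorithm to search for the number of filters in each layer.
The two techniques are complementary and can be combined to find optimized models for a given hardware platform.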

Platform-Aware NAS for Block-wise Search

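Using an RNN-based controller and the same factorized hierarchical search space as MnasNet, the authors obtained results similar to MnasNet for the Large mobile model, so they reuse MnasNet-A1 as the initial Large model and apply NetAdapt and other optimizations on top of it.
For small models, however, accuracy changes much more dramatically with latency. To compensate, the search needs a smaller (more negative) weight factor in the reward, and the authors set w = -0.15 instead of MnasNet's w = -0.07. They run a new search from scratch with this weight factor to find the initial seed model, then apply NetAdapt and other optimizations to obtain the final Small model.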
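The weight factor w enters through the multi-objective reward that MnasNet-style search maximizes, ACC(m) × [LAT(m)/TAR]^w, where TAR is the target latency. The sketch below evaluates that reward for two candidates that trade accuracy for latency, once with MnasNet's w = -0.07 and once with the MobileNetV3-Small value w = -0.15. The accuracy and latency numbers are invented purely for illustration.

```python
def mnas_reward(acc: float, latency_ms: float, target_ms: float, w: float) -> float:
    """Multi-objective reward ACC(m) * (LAT(m) / TAR) ** w used by MnasNet-style search."""
    return acc * (latency_ms / target_ms) ** w


# Two hypothetical candidates around a 15 ms target (illustrative numbers only):
# a slower, more accurate model and a faster, less accurate one.
slow_accurate = dict(acc=0.680, latency_ms=18.0)
fast_cheap = dict(acc=0.655, latency_ms=12.0)

for w in (-0.07, -0.15):
    r_slow = mnas_reward(**slow_accurate, target_ms=15.0, w=w)
    r_fast = mnas_reward(**fast_cheap, target_ms=15.0, w=w)
    winner = "slow_accurate" if r_slow > r_fast else "fast_cheap"
    print(f"w={w:+.2f}: slow={r_slow:.4f} fast={r_fast:.4f} -> {winner}")
```

With w = -0.07 the 2.5-point accuracy gap outweighs the latency difference; with w = -0.15 the faster model wins. A more negative w makes the reward more sensitive to latency, which offsets the larger accuracy swings seen among small models.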

NetAdapt for Layer-wise Search

The second search technique is NetAdapt. It complements platform-aware NAS: rather than inferring a coarse global architecture, it fine-tunes individual layers sequentially.

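1. Start from the seed network architecture found by platform-aware NAS.
2. At each step:
(a) Generate a set of new proposals. Each proposal is a modification of the architecture that yields at least a δ reduction in latency compared with the previous step.
(b) For each proposal, populate the new architecture from the pre-trained model of the previous step, truncating and randomly initializing missing weights as appropriate. Then fine-tune each proposal for T steps to get a coarse estimate of its accuracy.
(c) Select the best proposal according to some metric.
3. Repeat step 2 until the target latency is reached.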
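The loop above can be sketched in plain Python. Everything problem-specific here is a hypothetical stand-in: the architecture is just a list of per-layer filter counts, `latency_of` is a toy linear latency model instead of on-device measurement, and `estimate_accuracy` replaces "fine-tune for T steps and evaluate" with a fixed diminishing-returns formula (weight truncation from step 2(b) is not modeled). Proposals are scored with the ΔAcc/|ΔLatency| ratio that MobileNetV3 uses; only the control flow (propose, estimate, select, repeat) mirrors the description.

```python
import math
from typing import List

# Hypothetical per-filter latency cost (ms per filter) of each layer; a real
# NetAdapt run measures latency on the target device instead.
LAYER_COST = [0.02, 0.05, 0.03, 0.04]


def latency_of(filters: List[int]) -> float:
    """Toy latency model: linear in the number of filters of each layer."""
    return sum(f * c for f, c in zip(filters, LAYER_COST))


def estimate_accuracy(filters: List[int]) -> float:
    """Toy stand-in for 'fine-tune for T steps, then evaluate' (diminishing returns)."""
    return sum(math.log1p(f) for f in filters) / 100.0


def generate_proposals(filters: List[int], delta: float, step: int = 8) -> List[List[int]]:
    """Step 2(a): per layer, shrink it just enough to save at least `delta` ms."""
    base = latency_of(filters)
    proposals = []
    for i in range(len(filters)):
        cand = list(filters)
        while cand[i] > step and base - latency_of(cand) < delta:
            cand[i] -= step
        if base - latency_of(cand) >= delta:
            proposals.append(cand)
    return proposals


def netadapt(seed: List[int], target_latency: float, delta: float) -> List[int]:
    """Steps 1-3: start from the seed and shrink until the target latency is met."""
    current = list(seed)
    while latency_of(current) > target_latency:
        proposals = generate_proposals(current, delta)
        if not proposals:
            break  # no layer can be shrunk by delta any more
        acc0, lat0 = estimate_accuracy(current), latency_of(current)
        # Step 2(c): keep the proposal with the best dAcc / |dLatency|.
        current = max(
            proposals,
            key=lambda p: (estimate_accuracy(p) - acc0) / abs(latency_of(p) - lat0),
        )
    return current


if __name__ == "__main__":
    seed = [64, 128, 96, 160]  # hypothetical seed network from platform-aware NAS
    result = netadapt(seed, target_latency=10.0, delta=0.5)
    print(result, round(latency_of(result), 2))
```

Because every accepted proposal saves at least δ, the loop is guaranteed to terminate, either at the target latency or when no layer can be shrunk further.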

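The original NetAdapt metric minimizes the change in accuracy. The authors modify it to use the ratio between latency change and accuracy change: among all proposals generated in a NetAdapt step, they pick the one that maximizes $\frac{\Delta\mathrm{Acc}}{|\Delta\mathrm{Latency}|}$, where $\Delta\mathrm{Latency}$ satisfies the constraint in step 2(a).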
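The difference between the two selection rules shows up when proposals save different amounts of latency. The sketch below scores two hypothetical proposals (the numbers are invented for illustration) under the original NetAdapt rule, which keeps the smallest accuracy drop, and under the ΔAcc/|ΔLatency| ratio used for MobileNetV3.

```python
# Hypothetical proposals from one NetAdapt step: (name, dAcc in points, dLatency in ms).
proposals = [
    ("shrink_layer_A", -0.5, -2.0),
    ("shrink_layer_B", -0.3, -0.6),
]

# Original NetAdapt: minimize the accuracy change (keep the largest dAcc).
netadapt_pick = max(proposals, key=lambda p: p[1])[0]

# MobileNetV3: maximize dAcc / |dLatency| (least accuracy lost per ms saved).
mobilenetv3_pick = max(proposals, key=lambda p: p[1] / abs(p[2]))[0]

print(netadapt_pick, mobilenetv3_pick)
```

Layer A loses more accuracy in absolute terms but buys more than three times the latency reduction, so the ratio prefers it while the original rule picks layer B.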
For MobileNetV2-style blocks, the authors use two types of proposals:

Reduce the size of any expansion layer.
Reduce the bottleneck in all blocks that share the same bottleneck size, so that residual connections are preserved.
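The two proposal types can be expressed directly on block configuration lists in the `[in_channels, kernel_size, out_channels, exp, SE, NL, stride]` layout used by the code at the end of this post. The helper names and shrink amounts are hypothetical, and the sketch only edits channel numbers; it does not transfer or truncate weights.

```python
from copy import deepcopy
from typing import List

# Block config layout, as in the listing: [in_ch, kernel, out_ch, exp, SE, NL, stride]
Config = List[list]


def shrink_expansion(blocks: Config, index: int, amount: int) -> Config:
    """Proposal type 1: reduce the expansion width of a single block."""
    new = deepcopy(blocks)
    new[index][3] = max(1, new[index][3] - amount)
    return new


def shrink_bottleneck(blocks: Config, width: int, amount: int) -> Config:
    """Proposal type 2: reduce every block input/output equal to `width`, so that
    stride-1 blocks with in_ch == out_ch keep their residual connection."""
    new = deepcopy(blocks)
    for b in new:
        if b[0] == width:
            b[0] = width - amount
        if b[2] == width:
            b[2] = width - amount
    return new


# First four blocks of the Large model's 14 x 14 stage from the listing.
stage = [
    [80, 3, 80, 200, False, 'HS', 1], [80, 3, 80, 184, False, 'HS', 1],
    [80, 3, 80, 184, False, 'HS', 1], [80, 3, 112, 480, True, 'HS', 1],
]
print(shrink_expansion(stage, 0, 16)[0])    # expansion of block 0: 200 -> 184
print(shrink_bottleneck(stage, 80, 8)[:2])  # 80-channel bottleneck -> 72 in every block
```

In a full network the list would contain every stage, so the earlier block that produces the 80-channel output would be updated in the same proposal.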

Network Improvements

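In addition to network search, the authors introduce several new components to further improve the final model.
They redesign the computationally expensive layers at the beginning and end of the network, and introduce h-swish, a modified version of the recent swish nonlinearity that is faster to compute and more quantization-friendly.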

Redesigning Expensive Layers

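In the models found by architecture search, some of the earliest and some of the last layers are more expensive than the rest. The authors modify these layers to reduce their latency while maintaining accuracy.
The first modification reworks how the last layers of the network produce the final features. Models based on MobileNetV2's inverted bottleneck use a 1x1 convolution as the final layer to expand into a higher-dimensional feature space. This layer is important for rich prediction features, but it adds latency.
To reduce latency while preserving the high-dimensional features, the authors move this layer past the final average pooling, so the final features are computed at 1x1 spatial resolution instead of 7x7.
Once the cost of this feature-generation layer is mitigated, the preceding bottleneck projection layer is no longer needed to reduce computation, so the projection and filtering layers of the last bottleneck can be removed as well.
The second modification concerns the initial set of filters. Mobile models typically use 32 filters in a full 3x3 convolution for initial edge detection; using the h-swish nonlinearity in this layer, the authors reduce the number of filters from 32 to 16 while keeping the same accuracy.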
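The saving from moving the final feature layer past the average pooling is simple arithmetic: a 1×1 convolution costs H·W·C_in·C_out multiply-adds, so computing it at 1×1 instead of 7×7 divides its cost by 49. The sketch uses the 960 → 1280 layer of MobileNetV3-Large as the example. Because a nonlinearity follows this layer, the move changes the model rather than being an exact algebraic rewrite.

```python
def pointwise_conv_madds(h: int, w: int, c_in: int, c_out: int) -> int:
    """Multiply-adds of a 1 x 1 convolution on an h x w feature map."""
    return h * w * c_in * c_out


# 960 -> 1280 feature layer of MobileNetV3-Large, before vs after 7 x 7 average pooling.
before_pool = pointwise_conv_madds(7, 7, 960, 1280)
after_pool = pointwise_conv_madds(1, 1, 960, 1280)
print(f"{before_pool:,} vs {after_pool:,} MAdds ({before_pool // after_pool}x fewer)")
```

This prints `60,211,200 vs 1,228,800 MAdds (49x fewer)`.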

Nonlinearities

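Swish was introduced as a drop-in replacement for ReLU and significantly improves the accuracy of neural networks:
$\mathrm{swish}(x) = x \cdot \sigma(x)$
While this nonlinearity improves accuracy, the sigmoid it relies on is much more expensive to compute in embedded environments such as mobile devices.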

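The authors replace the sigmoid with its piece-wise linear hard analog, ReLU6(x+3)/6, which gives h-swish. The hard version shows no discernible difference in accuracy and has several deployment advantages: optimized ReLU6 implementations are available on virtually all software and hardware frameworks; in quantized mode it avoids the numerical precision loss caused by differing implementations of the approximate sigmoid; and it can be implemented as a piece-wise function that reduces memory accesses.
$\text{h-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x+3)}{6}$
The cost of applying a nonlinearity decreases deeper in the network, because activation memory typically halves each time the resolution drops. Since most of the benefit of swish comes from the deeper layers, apart from the first convolution the models use h-swish only in the second half of the network.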
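A scalar sketch of both nonlinearities, using only Python's standard `math` module, makes the relationship visible: h-swish is exactly 0 for x ≤ -3, exactly x for x ≥ 3, and the quadratic x(x+3)/6 in between, while swish only approaches 0 and x asymptotically. The function names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def relu6(x: float) -> float:
    return min(max(x, 0.0), 6.0)


def swish(x: float) -> float:
    """swish(x) = x * sigmoid(x)."""
    return x * sigmoid(x)


def hard_sigmoid(x: float) -> float:
    """Piece-wise linear approximation of the sigmoid: ReLU6(x + 3) / 6."""
    return relu6(x + 3.0) / 6.0


def h_swish(x: float) -> float:
    """h-swish(x) = x * ReLU6(x + 3) / 6."""
    return x * hard_sigmoid(x)


if __name__ == "__main__":
    for x in (-4.0, -2.0, 0.0, 1.0, 3.0, 5.0):
        print(f"x={x:+.1f}  swish={swish(x):+.4f}  h-swish={h_swish(x):+.4f}")
```

On a grid from -8 to 8 the two curves differ by less than 0.15, with the largest gap at x = ±3, where the hard sigmoid saturates.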

Large squeeze-and-excite

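When adding squeeze-and-excite to the bottleneck, the authors fix the size of the SE bottleneck to 1/4 of the number of channels in the expansion layer. This increased accuracy at a modest increase in parameters and with no discernible latency cost.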
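With the SE bottleneck fixed at 1/4 of the expansion width C, the module's parameter count grows roughly as C²/2. The sketch counts it assuming the module is built from two 1×1 (fully connected) layers with biases, a common but not the only construction; some implementations also round the squeeze width to a hardware-friendly multiple, which changes the exact numbers. Activations and the gating nonlinearity have no parameters and are not counted.

```python
def se_params(channels: int, reduction: int = 4) -> int:
    """Parameters of an SE module with two fully connected (1 x 1) layers and biases.

    The squeeze width is fixed to channels // reduction, i.e. 1/4 of the expansion width.
    """
    squeezed = channels // reduction
    fc1 = channels * squeezed + squeezed  # channels -> squeezed
    fc2 = squeezed * channels + channels  # squeezed -> channels
    return fc1 + fc2


# Expansion widths of the SE-enabled blocks in the Large configuration of the listing.
large_se_exp = [72, 120, 120, 480, 672, 672, 960, 960]
for c in sorted(set(large_se_exp)):
    print(f"exp={c:4d}: squeeze={c // 4:3d}, SE params={se_params(c):,}")
print("total SE params:", f"{sum(se_params(c) for c in large_se_exp):,}")
```

Most of the cost sits in the wide blocks near the end of the network, where the expansion width is largest.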

MobileNetV3 Definitions

The MobileNetV3-Large and MobileNetV3-Small architectures are specified layer by layer in Tables 1 and 2 of the paper.

Conclusions and future work

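The paper presents MobileNetV3-Large and MobileNetV3-Small, which set a new state of the art for mobile classification, detection and segmentation.
It also shows how nonlinearities such as swish and squeeze-and-excite can be adapted to run efficiently and in a quantization-friendly way, establishing them as effective tools in the mobile model domain.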

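from ..layers.conv_block import Conv2dBnAct, DepthwiseConvBnAct
from ..layers.activation import HardSwish
from ..initialize import weight_initialize

import torch
from torch import nn


class SE_Block(nn.Module):
    # Minimal squeeze-and-excite module (hypothetical stand-in; the repository's own
    # SE_Block is not shown in this post). Following the paper, the squeeze width is
    # fixed to 1/4 of the expansion width and the gate is a hard sigmoid.
    def __init__(self, in_channels, reduction=4):
        super(SE_Block, self).__init__()
        squeezed = max(1, in_channels // reduction)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(in_channels, squeezed, 1),
            nn.ReLU(),
            nn.Conv2d(squeezed, in_channels, 1),
            nn.Hardsigmoid(),
        )

    def forward(self, input):
        # Rescale each channel by its learned gate in [0, 1].
        return input * self.fc(self.pool(input))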

class MobileNet_stem(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(MobileNet_stem, self).__init__()
        self.conv = Conv2dBnAct(in_channels=in_channels, out_channels=out_channels, kernel_size=3, stride=2, dilation=1,
                                groups=1, padding_mode='zeros', act=HardSwish())
    
    def forward(self, input):
        return self.conv(input)
        
class InvertedResidualBlock(nn.Module):
    def __init__(self, in_channels, kernel_size, out_channels, exp, SE, NL, stride):
        super(InvertedResidualBlock, self).__init__()
        if exp == 0:
            exp = 1
        self.stride = stride
        self.exp = exp
        self.se_block = SE
        # 'RE' = ReLU, 'HS' = h-swish (the paper uses plain ReLU for 'RE').
        if NL == 'RE':
            self.act = nn.ReLU()
        else:
            self.act = HardSwish()

        # 1x1 expansion; skipped when exp == in_channels (e.g. the first block of the Large model).
        if self.exp != in_channels:
            self.conv1 = Conv2dBnAct(in_channels=in_channels, out_channels=self.exp, kernel_size=1, stride=1, dilation=1,
                                     groups=1, padding_mode='zeros', act=self.act)
        else:
            self.conv1 = nn.Identity()
        self.dconv = DepthwiseConvBnAct(in_channels=self.exp, kernel_size=kernel_size, stride=stride,
                                        dilation=1, padding_mode='zeros', act=self.act)
        # Linear bottleneck: the 1x1 projection has no nonlinearity.
        self.conv2 = Conv2dBnAct(in_channels=self.exp, out_channels=out_channels, kernel_size=1, act=nn.Identity())
        # Only build the SE module for blocks that use it.
        self.se = SE_Block(in_channels=self.exp) if SE else None

    def forward(self, input):
        output = self.conv1(input)
        output = self.dconv(output)
        if self.se is not None:
            output = self.se(output)
        output = self.conv2(output)
        # Residual connection only when the spatial size and channel count are unchanged.
        if self.stride == 1 and input.size(1) == output.size(1):
            output = input + output
        return output

class Make_Layers(nn.Module):
    def __init__(self, layers_configs):
        super(Make_Layers, self).__init__()
        self.layers_configs = layers_configs
        self.layer = self.Mobilenetv3_layer(self.layers_configs)

    def forward(self, input):
        return self.layer(input)

    def Mobilenetv3_layer(self, layers_configs):
        layers = []
        for i, k , o, e, se, nl, s in layers_configs:
            layers.append(InvertedResidualBlock(in_channels=i, kernel_size=k, out_channels=o,
                                                exp=e, SE=se, NL=nl, stride=s))
        return nn.Sequential(*layers)

class _MobileNetv3_Large(nn.Module):
    def __init__(self, in_channels, classes):
        super(_MobileNetv3_Large, self).__init__()
        self.stage_channels = []
        self.stem_block = MobileNet_stem(in_channels=in_channels, out_channels=16)

        # config in_channels, kernel_size, out_channels, exp, SE, NL, stride
        layer1 = [ # 112 x 112 resolution
            [16, 3, 16, 16, False, 'RE', 1], [16, 3, 24, 64, False, 'RE', 2]
        ]
        layer2 = [ # 56 x 56 resolution
            [24, 3, 24, 72, False, 'RE', 1], [24, 5, 40, 72, True, 'RE', 2]
        ]
        layer3 = [ # 28 x 28 resolution
            [40, 5, 40, 120, True, 'RE', 1], [40, 5, 40, 120, True, 'RE', 1],
            [40, 3, 80, 240, False, 'HS', 2]  # 3x3 kernel, per Table 1 of the paper
        ]
        layer4 = [ # 14 x 14 resolution
            [80, 3, 80, 200, False, 'HS', 1], [80, 3, 80, 184, False, 'HS', 1],
            [80, 3, 80, 184, False, 'HS', 1], [80, 3, 112, 480, True, 'HS', 1],
            [112, 3, 112, 672, True, 'HS', 1], [112, 5, 160, 672, True, 'HS', 2]
        ]
        layer5 = [ # 7 x 7 resolution
            [160, 5, 160, 960, True, 'HS', 1], [160, 5, 160, 960, True, 'HS', 1]
        ]
        self.layer1 = Make_Layers(layer1)
        self.layer2 = Make_Layers(layer2)
        self.layer3 = Make_Layers(layer3)
        self.layer4 = Make_Layers(layer4)
        self.layer5 = Make_Layers(layer5)
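        # Efficient last stage (Table 1): 1x1 conv to 960 at 7x7, pool, then the 1280-wide layer at 1x1 without BN.
        self.classification = nn.Sequential(
            Conv2dBnAct(in_channels=160, out_channels=960, kernel_size=1, act=HardSwish()),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(960, 1280, 1),
            HardSwish(),
            nn.Conv2d(1280, classes, 1)
        )
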
    def forward(self, input):
        stem_out = self.stem_block(input)
        s1 = self.layer1(stem_out)
        s2 = self.layer2(s1)
        s3 = self.layer3(s2)
        s4 = self.layer4(s3)
        s5 = self.layer5(s4)
        pred = self.classification(s5)
        b, c, _, _ = pred.size()
        pred = pred.view(b, c)
        stages = [s1, s2, s3, s4, s5]
        return {'stage':stages, 'pred':pred}

class _MobileNetv3_Small(nn.Module):
    def __init__(self, in_channels, classes):
        super(_MobileNetv3_Small, self).__init__()
        self.stage_channels = []
        self.stem_block = MobileNet_stem(in_channels=in_channels, out_channels=16)

        # config in_channels, kernel_size, out_channels, exp, SE, NL, stride
        layer1 = [ # 112 x 112 resolution
            [16, 3, 16, 16, True, 'RE', 2]
        ]
        layer2 = [ # 56 x 56 resolution
            [16, 3, 24, 72, False, 'RE', 2]
        ]
        layer3 = [ # 28 x 28 resolution
            [24, 3, 24, 88, False, 'RE', 1], [24, 5, 40, 96, True, 'HS', 2]
        ]
        layer4 = [ # 14 x 14 resolution
            [40, 5, 40, 240, True, 'HS', 1], [40, 5, 40, 240, True, 'HS', 1],
            [40, 5, 48, 120, True, 'HS', 1], [48, 5, 48, 144, True, 'HS', 1],
            [48, 5, 96, 288, True, 'HS', 2]
        ]
        layer5 = [ # 7 x 7 resolution
            [96, 5, 96, 576, True, 'HS', 1], [96, 5, 96, 576, True, 'HS', 1]
        ]
        self.layer1 = Make_Layers(layer1)
        self.layer2 = Make_Layers(layer2)
        self.layer3 = Make_Layers(layer3)
        self.layer4 = Make_Layers(layer4)
        self.layer5 = Make_Layers(layer5)
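        # Last stage (Table 2): 1x1 conv to 576 at 7x7, pool, then the 1024-wide layer at 1x1 without BN.
        # The paper also applies SE to the 576-channel conv; that is omitted here for simplicity.
        self.classification = nn.Sequential(
            Conv2dBnAct(in_channels=96, out_channels=576, kernel_size=1, act=HardSwish()),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(576, 1024, 1),
            HardSwish(),
            nn.Conv2d(1024, classes, 1)
        )
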
    def forward(self, input):
        stem_out = self.stem_block(input)
        s1 = self.layer1(stem_out)
        s2 = self.layer2(s1)
        s3 = self.layer3(s2)
        s4 = self.layer4(s3)
        s5 = self.layer5(s4)
        pred = self.classification(s5)
        b, c, _, _ = pred.size()
        pred = pred.view(b, c)
        stages = [s1, s2, s3, s4, s5]
        return {'stage':stages, 'pred':pred}

def MobileNetv3(in_channels, classes=1000, variant='small'):
    # variant: 'small' or 'large'
    if variant == 'small':
        model = _MobileNetv3_Small(in_channels=in_channels, classes=classes)
    else:
        model = _MobileNetv3_Large(in_channels=in_channels, classes=classes)
    weight_initialize(model)
    return model

if __name__ == '__main__':
    # Because of the relative imports above, run this file as a module (python -m ...).
    model = MobileNetv3(in_channels=3, classes=1000, variant='small')
    out = model(torch.rand(1, 3, 224, 224))
    print(out['pred'].shape)  # torch.Size([1, 1000])
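The listing imports `Conv2dBnAct`, `DepthwiseConvBnAct`, `HardSwish` and `weight_initialize` from the author's repository, which this post does not show. The block below is a hypothetical minimal stand-in with the same call signatures the listing uses; the "same" padding, bias-free convolutions, the default ReLU when `act` is omitted, and the Kaiming initialization are assumptions, not the original implementations.

```python
import torch
from torch import nn


class HardSwish(nn.Module):
    """x * ReLU6(x + 3) / 6; numerically equivalent to torch.nn.Hardswish."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * nn.functional.relu6(x + 3.0) / 6.0


class Conv2dBnAct(nn.Sequential):
    """Conv2d -> BatchNorm2d -> activation, with 'same' padding for odd kernel sizes."""
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1,
                 groups=1, padding_mode='zeros', act=None):
        padding = dilation * (kernel_size - 1) // 2
        super().__init__(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride,
                      padding=padding, dilation=dilation, groups=groups,
                      bias=False, padding_mode=padding_mode),
            nn.BatchNorm2d(out_channels),
            act if act is not None else nn.ReLU(),  # assumed default activation
        )


class DepthwiseConvBnAct(Conv2dBnAct):
    """Depthwise variant: one filter per channel (groups == in_channels == out_channels)."""
    def __init__(self, in_channels, kernel_size, stride=1, dilation=1,
                 padding_mode='zeros', act=None):
        super().__init__(in_channels, in_channels, kernel_size, stride=stride,
                         dilation=dilation, groups=in_channels,
                         padding_mode=padding_mode, act=act)


def weight_initialize(model: nn.Module) -> None:
    """Kaiming init for convolutions, ones/zeros for BatchNorm scale/shift."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.ones_(m.weight)
            nn.init.zeros_(m.bias)
```

For example, `Conv2dBnAct(3, 16, 3, stride=2, act=HardSwish())` maps a 1×3×224×224 input to 1×16×112×112, matching the stem in the listing.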
