ComputerVision Jack

An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection

JackYoon 2022. 1. 24. 16:39

Abstract

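DenseNet aggregates features with diverse receptive fields through dense connections and thereby preserves intermediate feature maps. This gives it good performance on object detection.
Although feature reuse is an advantage, detectors built on a DenseNet backbone have low energy efficiency. To overcome this drawback of DenseNet, the paper proposes VoVNet, which is composed of One-Shot Aggregation (OSA) modules.
OSA keeps DenseNet's strength of using diverse feature maps from multiple receptive fields, and overcomes its weakness by aggregating them only once, at the last feature map.
To verify the efficiency of VoVNet as a backbone network, the authors build lightweight and large-scale VoVNet variants and evaluate them with one-stage and two-stage detectors.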

Introduction

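The current trend is for object detectors to adopt the best-performing feature extractor as their backbone. Because DenseNet showed the best performance in classification, it was naturally extended to detection tasks.
The biggest difference between ResNet and DenseNet is how they combine features: ResNet uses summation, whereas DenseNet uses concatenation.
Concatenation preserves the information of the original features best. DenseNet, which preserves and accumulates feature maps, is therefore better suited to detection than ResNet.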
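The difference between the two aggregation styles can be sketched with plain Python lists standing in for per-channel values. This is only a toy illustration; the function names `aggregate_sum` and `aggregate_concat` are hypothetical and do not come from the paper.

```python
def aggregate_sum(prev, new):
    """ResNet-style aggregation: element-wise sum, channel count unchanged."""
    if len(prev) != len(new):
        raise ValueError("summation needs the same number of channels")
    return [p + n for p, n in zip(prev, new)]


def aggregate_concat(prev, new):
    """DenseNet-style aggregation: concatenation, channel count grows."""
    return prev + new


prev_features = [1.0, 2.0, 3.0]   # toy per-channel values from an earlier layer
new_features = [0.5, -2.0, 4.0]   # toy per-channel values from the current layer

summed = aggregate_sum(prev_features, new_features)
concatenated = aggregate_concat(prev_features, new_features)

print(len(summed), summed)
print(len(concatenated), concatenated)
# After concatenation the earlier features are still available unchanged.
print(concatenated[:len(prev_features)] == prev_features)
```

After summation the two inputs can no longer be separated, while concatenation keeps both at the cost of a larger channel count.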
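However, DenseNet needs more resources and time than ResNet, because factors other than FLOPs and model size also affect the cost:
1. The memory access cost (MAC) of accessing intermediate feature maps is an important factor.
2. In terms of GPU parallel computation, DenseNet is limited by its bottleneck structure.
The goal is therefore to design an efficient network that takes MAC and GPU-computation efficiency into account. To this end the paper proposes One-Shot Aggregation (OSA), which aggregates the intermediate features only once.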

Factors of Efficient Network Design

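Efficient network designs tend to focus on FLOPs and model size by using depthwise convolution and 1 x 1 convolution bottlenecks. However, fewer FLOPs and a smaller model do not always guarantee a shorter inference time on a GPU.
Networks should therefore be designed with practical and meaningful metrics in mind, such as energy per image and FPS (frames per second), beyond FLOPs and parameter count.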
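As a rough sketch of these two metrics: FPS is the number of images processed per second, and energy per image is the average power divided by FPS (a watt is one joule per second). The numbers below are made up for illustration and are not measurements from the paper; the function names are hypothetical.

```python
def frames_per_second(num_images, elapsed_seconds):
    """Throughput: images processed per second of wall-clock time."""
    return num_images / elapsed_seconds


def energy_per_image(avg_power_watts, fps):
    """Joules per image = average power (J/s) divided by images per second."""
    return avg_power_watts / fps


# Hypothetical measurement: 50 images in 2 seconds at an average of 120 W.
fps = frames_per_second(num_images=50, elapsed_seconds=2.0)
joules = energy_per_image(avg_power_watts=120.0, fps=fps)

print(f"{fps:.1f} FPS, {joules:.2f} J/image")
```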

Memory Access Cost

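Accessing DRAM consumes far more energy than the computation itself. This means that even models with the same amount of computation and the same number of parameters can consume resources differently.
The gap between model size and the number of memory accesses comes from the memory footprint of intermediate activations. If the intermediate feature maps are large, the memory access cost increases even when the models have the same number of parameters.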

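$MAC = hw(c_i + c_o) + k^2 c_i c_o$

Here $k$ is the kernel size, $h$ and $w$ are the height and width of the feature map, and $c_i$ and $c_o$ are the numbers of input and output channels.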
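To make the formula concrete, the sketch below evaluates MAC = hw(c_i + c_o) + k^2 c_i c_o for two 3 x 3 convolution layers that have exactly the same number of weights but different channel balance. The layer sizes are arbitrary illustrative choices and `conv_mac` is a hypothetical helper name.

```python
def conv_mac(h, w, c_in, c_out, k):
    """Memory access cost of one conv layer: feature maps plus weights."""
    feature_map_access = h * w * (c_in + c_out)
    weight_access = k * k * c_in * c_out
    return feature_map_access + weight_access


# Both layers have 3 * 3 * 16384 = 147456 weights on a 28 x 28 feature map.
balanced = conv_mac(h=28, w=28, c_in=128, c_out=128, k=3)
imbalanced = conv_mac(h=28, w=28, c_in=512, c_out=32, k=3)

print("balanced  :", balanced)
print("imbalanced:", imbalanced)
print("ratio     :", round(imbalanced / balanced, 2))
```

With the parameter count held fixed, the layer whose input and output channel sizes differ strongly pays a noticeably higher MAC, because the feature-map term grows with c_i + c_o.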

GPU-Computation Efficiency

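Network designs that reduce FLOPs implicitly assume that every floating-point operation is processed at the same speed on every device. This does not hold on GPUs, because GPUs rely on parallel processing.
GPU parallelism becomes more effective the larger the tensor is. Splitting a convolution into several smaller sequential steps is therefore inefficient.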

Proposed Method

Rethinking Dense Connection

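DenseNet performs well in practice, but it has several drawbacks in terms of energy and time.
With dense connections, the input channel size of each layer keeps growing while the output channel size stays fixed. As a result, each layer has imbalanced input and output channel sizes.
Therefore, even with the same number of parameters, DenseNet incurs a high MAC and needs more resources and time.
When the model is large, the growing input size becomes a serious problem, because the required computation increases progressively with depth.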
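The channel imbalance described above can be sketched numerically. The example assumes a plain dense block without 1 x 1 bottlenecks, where every layer receives the block input plus all earlier outputs; the starting width of 64, the growth rate of 32, and the 28 x 28 feature map are arbitrary illustrative values, and the helper names are hypothetical.

```python
def dense_block_input_channels(in_channels, growth_rate, num_layers):
    """Input channels seen by each layer when all earlier outputs are concatenated."""
    return [in_channels + i * growth_rate for i in range(num_layers)]


def conv_mac(h, w, c_in, c_out, k=3):
    """MAC = hw(c_i + c_o) + k^2 * c_i * c_o for a single conv layer."""
    return h * w * (c_in + c_out) + k * k * c_in * c_out


growth_rate = 32
inputs = dense_block_input_channels(in_channels=64, growth_rate=growth_rate, num_layers=5)
macs = [conv_mac(28, 28, c_in, growth_rate) for c_in in inputs]

for layer_idx, (c_in, mac) in enumerate(zip(inputs, macs), start=1):
    print(f"layer {layer_idx}: in={c_in:3d} out={growth_rate} MAC={mac}")
```

The output width never changes, yet each layer costs more than the one before it because its input keeps widening.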
Because of this inefficiency, the authors investigated how dense connections aggregate features during training, and hypothesized that there is a negative relation between the intermediate layers and the final layer.
Dense connections help the intermediate layers produce better features, but they also make those features similar to the ones from former layers. In that case the final layer does not need to aggregate both, because they carry redundant information.

One-Shot Aggregation

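Building on these observations, the authors explored a more efficient structure and designed the one-shot aggregation (OSA) module, which concatenates the features only once, at the end.
In addition, the transition-layer weights of the OSA module show a different pattern from DenseNet: features from shallow layers are aggregated more strongly at the transition layer.
Although the OSA module performs slightly worse on CIFAR-10, it has a lower MAC than a dense block and better GPU-computation efficiency.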
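The channel bookkeeping of an OSA module can be sketched as follows. The sketch assumes, as in the `OSAModule` listing in this post, that the module input is concatenated together with every layer output before a single 1 x 1 transition; the helper names are hypothetical and the widths are taken from the `layer2` configuration of that listing.

```python
def osa_layer_channels(in_channels, conv_channels, layers_per_block):
    """(input, output) channel pair for each 3 x 3 conv inside the module."""
    pairs = []
    c_in = in_channels
    for _ in range(layers_per_block):
        pairs.append((c_in, conv_channels))
        c_in = conv_channels  # each later layer only sees the previous layer's output
    return pairs


def osa_transition_in(in_channels, conv_channels, layers_per_block):
    """Channels entering the 1 x 1 transition after the one-shot concatenation."""
    return in_channels + layers_per_block * conv_channels


pairs = osa_layer_channels(in_channels=128, conv_channels=64, layers_per_block=5)
transition_in = osa_transition_in(in_channels=128, conv_channels=64, layers_per_block=5)

print(pairs)
print(transition_in)
```

After the first layer every convolution has equal input and output widths, and the wide tensor appears only once, at the transition.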

Configuration of VoVNet

Thanks to the diverse features and the efficiency of the OSA module, VoVNet achieves better accuracy and speed with only a few modules.
VoVNet consists of a stem block of 3 convolution layers followed by 4 stages built from OSA modules (output stride = 32).
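The output stride of 32 can be checked by multiplying the strides of the downsampling steps. The sketch below follows the strides used in the `_VoVNet27` listing in this post (a stride-2 stem convolution, a stride-2 convolution in `layer2`, and a 2 x 2 max pool in each of `layer3` to `layer5`). It assumes that the convolutions pad so that stride 1 preserves the size, and that the input size is divisible by 32; `trace_resolution` is a hypothetical helper.

```python
# (name, stride) of each step, read off the _VoVNet27 listing in this post.
STEPS = [
    ("stem", 2),
    ("layer1", 1),
    ("layer2", 2),
    ("layer3", 2),
    ("layer4", 2),
    ("layer5", 2),
]


def trace_resolution(size, steps=STEPS):
    """Return (name, spatial size, cumulative stride) after every step."""
    trace = []
    total_stride = 1
    for name, stride in steps:
        size //= stride          # assumes size is divisible by the stride
        total_stride *= stride
        trace.append((name, size, total_stride))
    return trace


trace = trace_resolution(224)
for name, size, total_stride in trace:
    print(f"{name}: {size} x {size} (stride {total_stride})")
```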

Conclusion

This paper, which focuses on real-time object detection, proposes VoVNet as an efficient backbone network. It uses diverse features from multiple receptive fields and improves on the inefficiency of DenseNet.

from ..layers.convolution import Conv2dBnAct
from ..initialize import weight_initialize

import torch
from torch import nn


class VoVStem(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(VoVStem, self).__init__()
        self.conv = Conv2dBnAct(in_channels=in_channels, out_channels=out_channels, kernel_size=3, stride=2)
    
    def forward(self, input):
        output = self.conv(input)
        return output


class OSAModule(nn.Module):
    def __init__(self, in_channels, conv_channels, layers_per_block, trans_ch):
        super().__init__()
        self.in_channels = in_channels
        self.conv_channels = conv_channels
        self.layers_per_block = layers_per_block
        self.trans_ch = trans_ch
        layers = []
        ch = in_channels  # local counter, so self.in_channels keeps the constructor value
        # the module input itself is also concatenated in forward()
        transition_in_ch = in_channels

        for _ in range(layers_per_block):
            layers.append(Conv2dBnAct(ch, conv_channels, 3))
            ch = conv_channels
            transition_in_ch += conv_channels
        self.layers = nn.ModuleList(layers)
        # 1 x 1 transition conv, applied once to the concatenated features
        self.transition = Conv2dBnAct(transition_in_ch, trans_ch, 1)

    def forward(self, x):
        outputs = []
        outputs.append(x)
        for layer in self.layers:
            x = layer(x)
            outputs.append(x)

        x = torch.cat(outputs, dim=1)
        x = self.transition(x)
        return x


class VoVBlock(nn.Module):
    def __init__(self, in_channels, conv_channels, layers_per_block, trans_ch):
        super(VoVBlock, self).__init__()
        self.max_pool = nn.MaxPool2d(2, 2, 0)
        self.osa = OSAModule(in_channels=in_channels, conv_channels=conv_channels, layers_per_block=layers_per_block, trans_ch=trans_ch)

    def forward(self, input):
        output = self.max_pool(input)
        output = self.osa(output)
        return output


class Make_Layer(nn.Module):
    def __init__(self, layers_configs):
        super(Make_Layer, self).__init__()
        self.layers_configs = layers_configs
        self.layer = self.vovBlock(self.layers_configs)

    def forward(self, input):
        return self.layer(input)

    def vovBlock(self, cfg):
        layers = []
        for i, c, n, t in cfg:
            layers.append(VoVBlock(in_channels=i, conv_channels=c, layers_per_block=n, trans_ch=t))
        return nn.Sequential(*layers)


class _VoVNet27(nn.Module):
    def __init__(self, in_channels, classes):
        super(_VoVNet27, self).__init__()
        # use the constructor argument instead of a hard-coded 3 input channels
        self.vovStem = VoVStem(in_channels=in_channels, out_channels=64)

        # configs OSA : in_channels, conv, iter_cnt, trans_channels
        layer3 = [[128, 80, 5, 256]]
        layer4 = [[256, 96, 5, 384]]
        layer5 = [[384, 112, 5, 512]]
        
        self.layer1 = Conv2dBnAct(in_channels=64, out_channels=64, kernel_size=3, stride=1)
        self.layer2 =  nn.Sequential(
                        Conv2dBnAct(in_channels=64, out_channels=128, kernel_size=3, stride=2),
                        OSAModule(in_channels=128, conv_channels=64, layers_per_block=5, trans_ch=128)
                    )
        self.layer3 = Make_Layer(layer3)
        self.layer4 = Make_Layer(layer4)
        self.layer5 = Make_Layer(layer5)
        self.classification = nn.Sequential(
            Conv2dBnAct(in_channels=512, out_channels=1280, kernel_size=1),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(1280, classes, 1)
        )

    def forward(self, input):
        stem= self.vovStem(input)
        s1 = self.layer1(stem)
        s2 = self.layer2(s1)
        s3 = self.layer3(s2)
        s4 = self.layer4(s3)
        s5 = self.layer5(s4)
        pred = self.classification(s5)
        b, c, _, _ = pred.size()
        pred = pred.view(b, c)
        return {'pred':pred}


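def VoVNet(in_channels, classes=1000, varient=27):
    if varient == 19:
        # NOTE: _VoVNet19 is not defined in this listing; it has to be provided elsewhere.
        model = _VoVNet19(in_channels=in_channels, classes=classes)
    elif varient == 27:
        model = _VoVNet27(in_channels=in_channels, classes=classes)
    else:
        # fail early instead of hitting an unbound `model` below
        raise ValueError(f'unsupported varient: {varient}')

    weight_initialize(model)
    return model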


if __name__ == '__main__':
    model = VoVNet(in_channels=3, classes=1000, varient=27)
    model(torch.rand(1, 3, 224, 224))
