Table 1 Layer-wise details of our proposed MobileNet

From: DeepDet: YAMNet with BottleNeck Attention Module (BAM) for TTS synthesis detection

| Type | Activations | Learnable | Stride/channel | Total learnable |
|---|---|---|---|---|
| Image input | 96 × 64 × 1 | – | – | 0 |
| Convolution 2D (Conv) | 48 × 32 × 32 | Weights: 3 × 3 × 1 × 32; Bias: 1 × 1 × 32 | 32 3 × 3 × 1 convolutions; Stride: [2 2]; Padding: same | 320 |
| Instance normalization | 48 × 32 × 32 | Offset: 1 × 1 × 32; Scale: 1 × 1 × 32 | 32 channels | 64 |
| ReLU | 48 × 32 × 32 | – | – | 0 |
| Grouped convolution depthwise (GConv DW) | 48 × 32 × 32 | Weights: 3 × 3 × 1 × 1 × 32; Bias: 1 × 1 × 1 × 32 | 32 groups of 1 3 × 3 × 1 convolutions; Stride: [1 1]; Padding: same | 320 |
| Instance normalization | 48 × 32 × 32 | Offset: 1 × 1 × 32; Scale: 1 × 1 × 32 | – | 64 |
| ReLU | 48 × 32 × 32 | – | – | 0 |
| Conv | 48 × 32 × 64 | Weights: 1 × 1 × 32 × 64; Bias: 1 × 1 × 64 | 64 1 × 1 × 32 convolutions; Stride: [1 1]; Padding: same | 2112 |
| Instance normalization | | | | 128 |
| ReLU | | | | 0 |
| GConv DW | 24 × 16 × 64 | Weights: 3 × 3 × 1 × 1 × 64; Bias: 1 × 1 × 1 × 64 | 64 groups of 1 3 × 3 × 1 convolutions; Stride: [2 2]; Padding: same | 640 |
| Instance normalization | | | | 128 |
| ReLU | | | | 0 |
| Conv | 24 × 16 × 128 | Weights: 1 × 1 × 64 × 128; Bias: 1 × 1 × 128 | 128 1 × 1 × 64 convolutions; Stride: [1 1]; Padding: same | 8320 |
| Instance normalization | | | | 256 |
| ReLU | | | | 0 |
| GConv DW | 24 × 16 × 128 | Weights: 3 × 3 × 1 × 1 × 128; Bias: 1 × 1 × 1 × 128 | 128 groups of 1 3 × 3 × 1 convolutions; Stride: [1 1]; Padding: same | 1280 |
| Instance normalization | | | | 256 |
| ReLU | | | | 0 |
| Conv | 24 × 16 × 128 | Weights: 1 × 1 × 128 × 128; Bias: 1 × 1 × 128 | 128 1 × 1 × 128 convolutions; Stride: [1 1]; Padding: same | 16,512 |
| Instance normalization | | | | 256 |
| ReLU | | | | 0 |
| GConv DW | 12 × 8 × 128 | Weights: 3 × 3 × 1 × 1 × 128; Bias: 1 × 1 × 1 × 128 | 128 groups of 1 3 × 3 × 1 convolutions; Stride: [2 2]; Padding: same | 1280 |
| Instance normalization | | | | 256 |
| ReLU | | | | 0 |
| Conv | 12 × 8 × 256 | Weights: 1 × 1 × 128 × 256; Bias: 1 × 1 × 256 | 256 1 × 1 × 128 convolutions; Stride: [1 1]; Padding: same | 33,024 |
| Instance normalization | | | | 512 |
| ReLU | | | | 0 |
| GConv DW | 12 × 8 × 256 | Weights: 3 × 3 × 1 × 1 × 256; Bias: 1 × 1 × 1 × 256 | 256 groups of 1 3 × 3 × 1 convolutions; Stride: [1 1]; Padding: same | 2560 |
| Instance normalization | | | | 512 |
| ReLU | | | | 0 |
| Conv | 12 × 8 × 256 | Weights: 1 × 1 × 256 × 256; Bias: 1 × 1 × 256 | 256 1 × 1 × 256 convolutions; Stride: [1 1]; Padding: same | 65,792 |
| Instance normalization | | | | 512 |
| ReLU | | | | 0 |
| GConv DW | 6 × 4 × 256 | Weights: 3 × 3 × 1 × 1 × 256; Bias: 1 × 1 × 1 × 256 | 256 groups of 1 3 × 3 × 1 convolutions; Stride: [2 2]; Padding: same | 2560 |
| Instance normalization | | | | 512 |
| ReLU | | | | 0 |
| Conv | 6 × 4 × 512 | Weights: 1 × 1 × 256 × 512; Bias: 1 × 1 × 512 | 512 1 × 1 × 256 convolutions; Stride: [1 1]; Padding: same | 131,584 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| GConv DW | 6 × 4 × 512 | Weights: 3 × 3 × 1 × 1 × 512; Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions; Stride: [1 1]; Padding: same | 5120 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| Conv | 6 × 4 × 512 | Weights: 1 × 1 × 512 × 512; Bias: 1 × 1 × 512 | 512 1 × 1 × 512 convolutions; Stride: [1 1]; Padding: same | 262,656 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| GConv DW | 6 × 4 × 512 | Weights: 3 × 3 × 1 × 1 × 512; Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions; Stride: [1 1]; Padding: same | 5120 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| Conv | 6 × 4 × 512 | Weights: 1 × 1 × 512 × 512; Bias: 1 × 1 × 512 | 512 1 × 1 × 512 convolutions; Stride: [1 1]; Padding: same | 262,656 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| GConv DW | 6 × 4 × 512 | Weights: 3 × 3 × 1 × 1 × 512; Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions; Stride: [1 1]; Padding: same | 5120 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| Conv | 6 × 4 × 512 | Weights: 1 × 1 × 512 × 512; Bias: 1 × 1 × 512 | 512 1 × 1 × 512 convolutions; Stride: [1 1]; Padding: same | 262,656 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| GConv DW | 6 × 4 × 512 | Weights: 3 × 3 × 1 × 1 × 512; Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions; Stride: [1 1]; Padding: same | 5120 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| Conv | 6 × 4 × 512 | Weights: 1 × 1 × 512 × 512; Bias: 1 × 1 × 512 | 512 1 × 1 × 512 convolutions; Stride: [1 1]; Padding: same | 262,656 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| GConv DW | 6 × 4 × 512 | Weights: 3 × 3 × 1 × 1 × 512; Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions; Stride: [1 1]; Padding: same | 5120 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| Conv | 6 × 4 × 512 | Weights: 1 × 1 × 512 × 512; Bias: 1 × 1 × 512 | 512 1 × 1 × 512 convolutions; Stride: [1 1]; Padding: same | 262,656 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| GConv DW | 3 × 2 × 512 | Weights: 3 × 3 × 1 × 1 × 512; Bias: 1 × 1 × 1 × 512 | 512 groups of 1 3 × 3 × 1 convolutions; Stride: [2 2]; Padding: same | 5120 |
| Instance normalization | | | | 1024 |
| ReLU | | | | 0 |
| Conv | 3 × 2 × 1024 | Weights: 1 × 1 × 512 × 1024; Bias: 1 × 1 × 1024 | 1024 1 × 1 × 512 convolutions; Stride: [1 1]; Padding: same | 525,312 |
| Instance normalization | | | | 2048 |
| ReLU | | | | 0 |
| GConv DW | 3 × 2 × 1024 | Weights: 3 × 3 × 1 × 1 × 1024; Bias: 1 × 1 × 1 × 1024 | 1024 groups of 1 3 × 3 × 1 convolutions; Stride: [1 1]; Padding: same | 10,240 |
| Instance normalization | | | | 2048 |
| ReLU | | | | 0 |
| Conv | 3 × 2 × 1024 | Weights: 1 × 1 × 1024 × 1024; Bias: 1 × 1 × 1024 | 1024 1 × 1 × 1024 convolutions; Stride: [1 1]; Padding: same | 1,049,600 |
| Instance normalization | | | | 2048 |
| ReLU | | | | 0 |
| Conv | 3 × 2 × 1024 | Weights: 1 × 1 × 1024 × 1024; Bias: 1 × 1 × 1024 | 1024 1 × 1 × 1024 convolutions; Stride: [1 1]; Padding: same | 1,049,600 |
| Instance normalization | | | | 2048 |
| ReLU | | | | 0 |
| Avg. pooling | 1 × 1 × 1024 | – | – | 0 |
| FC layer | 1 × 1 × 2 | Weights: 2 × 1024; Bias: 2 × 1 | – | 2050 |
| Softmax | 1 × 1 × 2 | – | Binary classifier | |
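
The "Total learnable" entries follow directly from the weight and bias shapes listed for each layer, and the activation sizes follow from the stride under "same" padding. The sketch below is one way to cross-check a few rows; it is not code from the paper, and the helper names (`conv_learnables`, `depthwise_learnables`, `instance_norm_learnables`, `same_padding_output`) are hypothetical, introduced here only for illustration.

```python
import math


def conv_learnables(kh, kw, c_in, c_out):
    """Standard convolution: kh*kw*c_in*c_out weights plus one bias per filter."""
    return kh * kw * c_in * c_out + c_out


def depthwise_learnables(kh, kw, channels):
    """Depthwise grouped convolution: one kh x kw filter and one bias per channel."""
    return kh * kw * channels + channels


def instance_norm_learnables(channels):
    """Instance normalization: one offset and one scale per channel."""
    return 2 * channels


def same_padding_output(height, width, stride):
    """Spatial size after a 'same'-padded layer: ceil(input / stride) per axis."""
    return math.ceil(height / stride), math.ceil(width / stride)


# First rows of the table: 96 x 64 x 1 input, then a 3 x 3 Conv with 32 filters, stride 2.
first_conv = conv_learnables(3, 3, 1, 32)       # Conv row
first_norm = instance_norm_learnables(32)       # Instance normalization row
first_dw = depthwise_learnables(3, 3, 32)       # GConv DW row
first_pw = conv_learnables(1, 1, 32, 64)        # 1 x 1 Conv row
first_shape = same_padding_output(96, 64, 2)    # spatial size after the stride-2 Conv

print(first_conv, first_norm, first_dw, first_pw, first_shape)
# prints: 320 64 320 2112 (48, 32)
```

The same helpers apply to the deeper rows, for example `conv_learnables(1, 1, 512, 512)` for the 512-channel pointwise layers and `same_padding_output(6, 4, 2)` for the last stride-2 depthwise layer.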
