From: A large TV dataset for speech and music activity detection
Model architecture | Parameter | Values |
---|---|---|
TCN | Kernel size | {3, 5, 5} |
 | No. filters | {32, 16, 32} |
 | No. stacks | {9, 5, 2} |
 | No. dilations | {3, 7, 2} |
 | Use skip connections | {False, True, True} |
CRNN | Kernel size | {3, 11, 11} |
 | No. filters | {64, 64, 16} |
 | No. GRU units | {80, 40} |
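
For readers who want to reuse these values in code, the table can be transcribed into plain Python structures. The sketch below is illustrative only: the names `TCN_VALUES`, `CRNN_VALUES`, `TCNConfig`, and `tcn_configs_by_position` are hypothetical, and the table does not say what each position inside the braces denotes. The helper assumes that the i-th entry of every TCN row belongs to the same configuration; if the braces instead list independent candidate values, that grouping does not apply. The CRNN rows are left ungrouped because the GRU row has two entries while the other rows have three.

```python
from dataclasses import dataclass, fields

# Values transcribed from the table, kept in the order they appear.
TCN_VALUES = {
    "kernel_size": (3, 5, 5),
    "n_filters": (32, 16, 32),
    "n_stacks": (9, 5, 2),
    "n_dilations": (3, 7, 2),
    "use_skip_connections": (False, True, True),
}

CRNN_VALUES = {
    "kernel_size": (3, 11, 11),
    "n_filters": (64, 64, 16),
    "n_gru_units": (80, 40),  # two entries only, so not grouped by position
}


@dataclass(frozen=True)
class TCNConfig:
    """One hypothetical TCN configuration read column-wise from the table."""
    kernel_size: int
    n_filters: int
    n_stacks: int
    n_dilations: int
    use_skip_connections: bool


def tcn_configs_by_position(values):
    """Group the i-th entry of every TCN row into one configuration.

    ASSUMPTION: entries are aligned by position across rows. The table
    itself does not confirm this reading.
    """
    keys = [f.name for f in fields(TCNConfig)]
    columns = zip(*(values[k] for k in keys))
    return [TCNConfig(*column) for column in columns]


configs = tcn_configs_by_position(TCN_VALUES)
for config in configs:
    print(config)
```

Keeping the raw rows separate from the grouped configurations makes it easy to drop the positional assumption later without retyping the values.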