
Table 1 Characteristics of the MAVIR database: number of word occurrences (#word occ.), duration (dur.) in minutes (min), number of speakers (#spk.), and average mean opinion score (Ave. MOS)

From: Search on speech from spoken queries: the Multi-domain International ALBAYZIN 2018 Query-by-Example Spoken Term Detection Evaluation

| File ID | Data | #word occ. | dur. (min) | #spk. | Ave. MOS |
|---|---|---|---|---|---|
| Mavir-02 | train | 13432 | 74.51 | 7 (7 ma.) | 2.69 |
| Mavir-03 | dev | 6681 | 38.18 | 2 (1 ma. 1 fe.) | 2.83 |
| Mavir-06 | train | 4332 | 29.15 | 3 (2 ma. 1 fe.) | 2.89 |
| Mavir-07 | dev | 3831 | 21.78 | 2 (2 ma.) | 3.26 |
| Mavir-08 | train | 3356 | 18.90 | 1 (1 ma.) | 3.13 |
| Mavir-09 | train | 11179 | 70.05 | 1 (1 ma.) | 2.39 |
| Mavir-12 | train | 11168 | 67.66 | 1 (1 ma.) | 2.32 |
| Mavir-04 | test | 9310 | 57.36 | 4 (3 ma. 1 fe.) | 2.85 |
| Mavir-11 | test | 3130 | 20.33 | 1 (1 ma.) | 2.46 |
| Mavir-13 | test | 7837 | 43.61 | 1 (1 ma.) | 2.48 |
| ALL | train | 43467 | 260.27 | 13 (12 ma. 1 fe.) | 2.56 |
| ALL | dev | 10512 | 59.96 | 4 (3 ma. 1 fe.) | 2.64 |
| ALL | test | 20277 | 121.3 | 6 (5 ma. 1 fe.) | 2.65 |

ma.: male; fe.: female. The characteristics are shown for the training (train), development (dev), and test (test) datasets.