1. Introduction
1.1 Speech file formats
1.2 Directory structure
1.3 File naming conventions
1.4 Label files
1.4.1 Label file header
1.4.2 Label file body
1.4.3 Example of label file
2. Database design and collection
2.1 Recording platform
2.2 Speaker recruitment
2.3 Design of prompting and prompt-sheet
3. Database contents definition
3.1 Application words (A1-6)
3.2 Isolated digits (I1-2, B1)
3.2.1 Single digit (I1-2)
3.2.2 Isolated-digits string (B1)
3.3 Connected digits (C1-4)
3.3.1 Sheet number (C1)
3.3.2 Telephone number (C2)
3.3.3 Credit card number (C3)
3.3.4 PIN code (C4)
3.4 Dates (D1-3)
3.4.1 Spontaneous date (D1)
3.4.2 Prompted date (D2)
3.4.3 Relative and general date expression (D3)
3.5 Application word phrase (E1)
3.6 Spelt items (L1-3)
3.6.1 Spelt personal name (L1)
3.6.2 Spelt city name (L2)
3.6.3 Spelt artificial word (L3)
3.7 Money amount (M1)
3.8 Natural number (N1)
3.9 Directory assistance names (O1-7)
3.9.1 Prompted surname (O1)
3.9.2 Spontaneous city name (O2)
3.9.3 Prompted city name (O3)
3.9.4 Prompted company name (O5)
3.9.5 Prompted personal name (O7)
3.10 Yes/No (Q1-2)
3.11 Phonetically rich sentences (S1-9)
3.12 Times (T1-2)
3.12.1 Spontaneous time (T1)
3.12.2 Prompted time (T2)
3.13 Phonetically rich words (W1-4)
4. Orthographical transcription
5. Lexicon
6. Speaker demographic information
6.1 Accent/Regions
6.2 Speaker ages
7. Recording conditions
7.1 Environments for mobile network
7.2 Network providers
All the recordings are GSM.
7.3 Handsets
8. Test material
9. Deviations from general SALA II specifications
10. Sample sheets
10.1 Sample instruction sheet and sample prompt sheet
Gràcies per la seva col·laboració. S’ha realitzat un enregistrament complet. Adéu-siau. (Thank you for your collaboration. A complete recording has been made. Goodbye.)
Appendix 1. Credit Card Checksum Algorithm
Appendix 2. Credit card numbers
Appendix 3. PIN codes
Appendix 4. Surnames
Appendix 5. Cities
Appendix 6. Companies
Appendix 7. Full names
The Catalan Database for Mobile Telephone Network was recorded within the scope of a research project supported by the Catalan Government. Corpus design and data collection were carried out at the Universitat Politècnica de Catalunya (UPC). Transcription and formatting were carried out at Verbio Speech Technologies, Spain.