=====================================================
Aalto University DSP Course Conversation Corpus 2013-
=====================================================

1. INTRODUCTION

This version of the DSPCON corpus contains transcribed recordings of Finnish
conversations between students of the Digital Signal Processing course at
Aalto University, Finland, from the years 2013 to 2015. The data is intended
for building better models for automatic speech recognition of conversational
Finnish.

160 male students and 21 female students held conversations in pairs,
recorded their own conversations, and transcribed at least 20 utterances
each. In total they contributed 3926 utterances, which add up to 7.4 hours of
audio.

The sound was recorded using Logitech USB headsets (PC 960 and H390). In 2013
and 2014, Labtec headsets were also used.

2. DIRECTORY STRUCTURE

The data collected each year is organized into its own directory. The
recordings and transcripts from each student are in /students/ directory,
where is a student ID of the format dsp__ and is a speaker ID. Male speakers
have speaker IDs dspmXXX and female speakers have IDs dspfXXX, where XXX is a
running number.

The alignments directory contains forced alignments, i.e. timestamps assigned
to each phoneme in each transcribed word. These have been created using
AaltoASR, the Aalto University speech recognizer, and saved in the AaltoASR
.phn file format. Rough word-level segmentations have been derived from the
forced alignments and saved in Praat TextGrid format. Note that these
correspond to the most probable word and phoneme segments according to the
acoustic model that was used, not necessarily to the linguistically exact
segments. The empty phoneme intervals that appear at word boundaries were
inserted by the automatic aligner and may not correspond to actual pauses in
the speech signal.

3. TRANSCRIPTION

Corrections and updates have been made to the original transcripts created by
the students.
There are two kinds of transcripts: verbatim.trn contains exact phonetic
transcripts suitable for acoustic model training, and normalized.trn contains
"normalized" transcripts suitable for evaluation. The normalized transcripts
contain alternations for different pronunciations of the same word.
Normalization is incomplete and has only been done for certain recordings.

Two garbage tokens have been used: [laugh] denotes laughter and [reject]
denotes other noise that cannot be transcribed. Interrupted words are written
with a trailing minus sign.

4. FILE FORMATS

Audio files have been saved in Microsoft WAVE format. Sample format is 44 kHz
16-bit PCM.

Transcripts have been saved in the trn format specified in the NIST Scoring
Toolkit. Each line contains a word sequence, followed by an utterance ID
enclosed in parentheses.

Transcript alternations are used in normalized transcripts to allow
alternative pronunciations. Alternative forms are separated by a slash sign
(/) and enclosed in curly brackets. An at sign (@) represents an empty word;
when scoring a text, a missing word will not be counted as an error if @ is
specified as its alternative. Example:

{ mm / @ } { en minä / emmä / en mä / emminä } { tiä / tiiä / tiädä / tiedä }
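
The trn line layout and the alternation syntax described above can be read
with a few lines of Python. The following is only a minimal sketch, not the
NIST Scoring Toolkit implementation. The function names parse_trn_line and
expand_alternations are hypothetical, and the sketch assumes that alternation
groups are not nested and that a slash appears only as a separator between
alternative forms.

```python
import itertools
import re

# A trn line: a word sequence followed by an utterance ID in parentheses.
TRN_LINE = re.compile(r"^(?P<words>.*?)\s*\((?P<utt_id>[^()]*)\)\s*$")
# One alternation group: { form / form / ... } (assumed not to be nested).
ALT_GROUP = re.compile(r"\{([^{}]*)\}")


def parse_trn_line(line):
    """Split a trn line into (word sequence, utterance ID)."""
    match = TRN_LINE.match(line)
    if match is None:
        raise ValueError(f"not a trn line: {line!r}")
    return match.group("words"), match.group("utt_id")


def expand_alternations(text):
    """Return every plain word sequence allowed by the alternation groups."""
    choices = []  # one entry per segment, each a list of candidate word lists
    pos = 0
    for group in ALT_GROUP.finditer(text):
        fixed = text[pos:group.start()].split()
        if fixed:
            choices.append([fixed])  # words outside braces have one choice
        forms = []
        for form in group.group(1).split("/"):
            words = form.split()
            # "@" stands for the empty word, so it contributes no words.
            forms.append([] if words == ["@"] else words)
        choices.append(forms)
        pos = group.end()
    tail = text[pos:].split()
    if tail:
        choices.append([tail])
    return [
        " ".join(word for part in combo for word in part)
        for combo in itertools.product(*choices)
    ]
```

For example, given a line whose utterance ID is made up for illustration,
parse_trn_line("{ mm / @ } { en minä / emmä } tiedä (utt_0001)") separates
the word sequence from the ID, and expand_alternations applied to that word
sequence lists the four word sequences that a scorer would accept.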