Dataset based on Swiss German parliament debates and their Standard German transcripts.
Parliament: Grosser Rat Kanton Bern
License: MIT
Initial version, used in GermEval 2020 Task 4, 70 hours of training data
Improved and extended version, used in SwissText 2021 Task 3, 293 hours of training data
Test set with speech from people across German-speaking Switzerland, with a dialect distribution close to the real-world dialect distribution.
The text data comes from the German Common Voice project.
License: MIT
Initial version, used in SwissText 2021 Task 3, 13 hours of data
Unlabeled audio dataset containing recorded parliament debates.
Parliament: Gemeinderat Zürich
License: MIT
Initial version, used in SwissText 2021 Task 3, 1208 hours of data
Human-annotated ratings of a Swiss German speech-to-text system.
License: MIT
Initial version