FHNW Institute for Data Science Datasets

Swiss German Speech to Standard German Text

Swiss Parliaments Corpus

Dataset based on Swiss German parliament debates and their Standard German transcripts.
Parliament: Grosser Rat Kanton Bern
License: MIT

Version 1, 2020-01-28

Download
Paper

Initial version, used in GermEval 2020 Task 4, 70 hours of training data

Version 2, 2020-08-25

Download
Paper

Improved and extended version, used in SwissText 2021 Task 3, 293 hours of training data

All Swiss German Dialects Test Set

Test set with speech from people all over German-speaking Switzerland, with a dialect distribution close to the real-world distribution.
The text data comes from the German Common Voice project.
License: MIT

Version 1, 2021-03-10

Download
Paper

Initial version, used in SwissText 2021 Task 3, 13 hours of data

Gemeinderat Zürich Audio Corpus

Unlabeled audio dataset containing recorded parliament debates.
Parliament: Gemeinderat Zürich
License: MIT

Version 1, 2021-03-10

Download
Paper

Initial version, used in SwissText 2021 Task 3, 1208 hours of data

Swiss German STT Metric Evaluation

GER-HSR-1K

Human-annotated ratings of a Swiss German speech-to-text system.
License: MIT

Version 1, 2023-05-22

Download
Paper coming soon

Initial version