You may cite the paper as:
@inproceedings{Yu2021SLT,
  TITLE = {{The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines}},
  AUTHOR = {Fan Yu and Zhuoyuan Yao and Xiong Wang and Keyu An and Lei Xie and Zhijian Ou and Bo Liu and Xiulin Li and Guanqiong Miao},
  URL = {https://arxiv.org/abs/2011.06724},
  BOOKTITLE = {{IEEE SLT 2021}},
  ADDRESS = {Shenzhen, China},
  YEAR = {2021},
  MONTH = jan,
  KEYWORDS = {Automatic speech recognition ; children speech recognition ; deep learning ; audio ; datasets},
  PDF = {https://arxiv.org/pdf/2011.06724.pdf},
}
Children's speech recognition is crucial to intelligent technologies such as child-machine interaction and computer-aided language learning. However, both the acoustic and the linguistic patterns of children's speech differ markedly from those of adult speech. Moreover, children's speech data remains scarce, which may hinder research on children's speech recognition. To address this, we launch the Children Speech Recognition Challenge as a flagship satellite event of the IEEE SLT 2021 workshop. The Challenge will release about 400 hours of data to registered teams and set up two tracks, aiming to boost children's speech recognition research. We hope the Challenge will also provide a good testbed for related techniques such as domain adaptation and transfer learning.
The following data will be released to registered participants for challenge system building.
Set A (adult speech)
Property | Value |
---|---|
duration (hrs) | 341.4 |
speakers | 1999 |
speaker ages | 18-60 |
language | Mandarin |
audio format | 16 kHz, 16-bit, single-channel WAV |
speaking style | reading style |
Set C1 (children's read speech)
Property | Value |
---|---|
duration (hrs) | 28.6 |
speakers | 927 |
speaker ages | 7-11 |
language | Mandarin |
audio format | 16 kHz, 16-bit, single-channel WAV |
speaking style | reading style |
Set C2 (children's conversational speech)
Property | Value |
---|---|
duration (hrs) | 29.5 |
speakers | 54 |
speaker ages | 4-11 |
language | Mandarin |
audio format | 16 kHz, 16-bit, single-channel WAV |
speaking style | conversational style |
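The "about 400 hours" figure can be checked against the three tables above. The following is a minimal sketch, assuming the tables describe Set A, Set C1 and Set C2 in that order; the names `DURATION_HOURS`, `total_hours` and `children_share` are illustrative and not part of any challenge tooling. Under that assumption the three durations sum to 399.5 hours, of which the two children's sets contribute 58.1 hours.

```python
# Durations in hours, copied from the three data tables.
# Assumption: the tables correspond to Set A, Set C1 and Set C2, in that order.
DURATION_HOURS = {"Set A": 341.4, "Set C1": 28.6, "Set C2": 29.5}


def total_hours(durations: dict) -> float:
    """Total released audio in hours, rounded to one decimal place."""
    return round(sum(durations.values()), 1)


def children_share(durations: dict) -> float:
    """Fraction of the released audio that is children's speech (Set C1 + Set C2)."""
    children = durations["Set C1"] + durations["Set C2"]
    return children / sum(durations.values())


if __name__ == "__main__":
    print(f"total: {total_hours(DURATION_HOURS)} h")
    print(f"children's share: {children_share(DURATION_HOURS):.1%}")
```

The small share of children's speech is what makes techniques such as domain adaptation from the adult set relevant here.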
Only datasets listed on OpenSLR are allowed as external data.
Track 1: Only the data provided by the Challenge can be used to train the acoustic and language models.
Track 2: In addition to the provided data, external data listed on OpenSLR can be used to train the acoustic model. For language model training, only the transcripts associated with the provided speech data and with the external OpenSLR speech data are allowed.
Data | Track 1 | Track 2 |
---|---|---|
Acoustic model training data | Set A, Set C1 and Set C2 | Set A, Set C1, Set C2 and external data listed on OpenSLR |
Language model training data | Transcripts of Set A, Set C1 and Set C2 | Transcripts of Set A, Set C1, Set C2 and of external data listed on OpenSLR |
Evaluation data | Children's reading and conversational speech (duration will be released shortly) | Children's reading and conversational speech (duration will be released shortly) |
Date | Description |
---|---|
Aug 14, 2020 | Registration deadline, the due date for participants to join the Challenge |
Aug 21, 2020 | Training data release |
Sept 30, 2020 | Evaluation data release |
Oct 10, 2020 | Final submission deadline |
Oct 25, 2020 | Evaluation result and ranking release |
Nov 8, 2020 | System description submission |
Jan 19-22, 2021 | SLT 2021 main workshop and challenge workshop |
1. Download the registration form (either the English or the Chinese version), fill in the information, and send it to the email address above. The subject of the email should include the name of the organization and the selected track. The registration deadline is August 14, 2020.
2. The organizing committee will review and verify the qualifications of the participating teams within 5 working days. Teams that pass the review will sign the challenge data usage agreement and will then be qualified to join the challenge.
3. The training data will be released on August 21, 2020, and download instructions will be provided to the successfully registered teams.
You must submit your model as a Docker image to receive your official scores on the evaluation dataset, and all dependencies must be included in your Docker image.
In your submission, a script called submission.sh is expected. The submission.sh script should call your model to output the recognized result for each .wav file, and write to the output file in Kaldi style.
More details about submission will be announced shortly.
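As an illustration of what "Kaldi style" output could look like, the sketch below writes one line per utterance in the form of Kaldi's `text` file: an utterance ID, a space, then the transcript. This is an assumption about the expected format, since the exact requirements are not given here. The function names `write_kaldi_text` and `dummy_recognize` are hypothetical, the utterance ID is assumed to be the .wav file name without its extension, and `dummy_recognize` merely stands in for a real model call.

```python
from pathlib import Path
from typing import Callable, Iterable


def write_kaldi_text(wav_paths: Iterable[Path],
                     recognize: Callable[[Path], str],
                     output_path: Path) -> None:
    """Write '<utt-id> <transcript>' lines, sorted by utterance ID."""
    # Assumption: the utterance ID is the file name without the .wav extension.
    entries = sorted((wav.stem, recognize(wav).strip()) for wav in wav_paths)
    with output_path.open("w", encoding="utf-8") as out:
        for utt_id, transcript in entries:
            out.write(f"{utt_id} {transcript}\n")


def dummy_recognize(wav: Path) -> str:
    """Hypothetical placeholder; a real system would decode the audio here."""
    return "你好"


if __name__ == "__main__":
    wavs = [Path("b_002.wav"), Path("a_001.wav")]
    write_kaldi_text(wavs, dummy_recognize, Path("text"))
    print(Path("text").read_text(encoding="utf-8"), end="")
```

A submission.sh script could then invoke such a Python entry point inside the Docker image, passing the directory of .wav files and the output path as arguments.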
Team number | Team name | Track 1 CER |
---|---|---|
CSRCA15 | SJTU SpeechLab | 18.50% |
CSRCA33 | 奇辉千语 | 20.30% |
CSRCA17 | 大耳朵图图喵喵喵 | 20.85% |
CSRCA07 | Ethiopian | 21.66% |
CSRCA18 | CSR_Team | 22.74% |
CSRCA22 | | 22.81% |
CSRCA02 | | 22.91% |
CSRCA21 | | 23.03% |
CSRCA28 | | 23.53% |
CSRCA01 | | 23.60% |
CSRCA27 | | 23.72% |
CSRCA14 | | 23.72% |
CSRCA29 | | 23.90% |
CSRCA39 | | 24.70% |
CSRCA03 | | 25.18% |
CAT baseline system | | 25.34% |
CSRCA31 | | 25.36% |
CSRCA40 | | 25.55% |
CSRCA30 | | 27.05% |
ESPNET transformer baseline system | | 27.28% |
CSRCA06 | | 27.61% |
CSRCA05 | | 27.85% |
KALDI nnet3 baseline system | | 28.75% |
CSRCA26 | | 31.32% |
CSRCA09 | | 32.53% |
CSRCA25 | | 33.01% |
CSRCA42 | | 33.33% |
CSRCA10 | | 37.48% |
CSRCA41 | | 42.86% |
CSRCA45 | | 43.46% |
CSRCA11 | | 55.06% |
Team number | Team name | Track 2 CER |
---|---|---|
CSRCA07 | Ethiopian | 16.53% |
CSRCA02 | TCH | 21.65% |
CSRCA21 | royalflush | 22.69% |
CSRCA18 | CSR_Team | 22.74% |
CSRCA39 | 童声无忌 | 24.48% |
CAT baseline system | | 25.34% |
CSRCA40 | | 25.55% |
CSRCA03 | | 26.57% |
ESPNET transformer baseline system | | 27.28% |
CSRCA45 | | 28.65% |
KALDI nnet3 baseline system | | 28.75% |
CSRCA26 | | 31.24% |
CSRCA09 | | 40.45% |
CSRCA25 | | 33.01% |
CSRCA42 | | 33.33% |
CSRCA10 | | 37.48% |
CSRCA41 | | 42.86% |
CSRCA45 | | 43.46% |
CSRCA11 | | 55.06% |
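The rankings above are reported as character error rate (CER). As a reminder of the usual definition, the sketch below computes CER as the character-level edit distance (substitutions, deletions and insertions) between hypothesis and reference, divided by the number of reference characters. It is not the official scoring tool: the function names are illustrative, and stripping spaces before scoring is an assumption about text normalization.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edit errors over total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    # Assumption: spaces are ignored when scoring.
    refs = [r.replace(" ", "") for r in refs]
    hyps = [h.replace(" ", "") for h in hyps]
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total_chars


if __name__ == "__main__":
    # One character missing from a six-character reference.
    print(f"{cer(['今天天气很好'], ['今天天气好']):.2%}")
```

Errors are pooled over the whole corpus before dividing, so long utterances weigh more than short ones; averaging per-utterance CER would give a different number.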
The top three teams in each track, ranked by score, will be awarded prizes and certificates.
The following prizes will be awarded:
First Prize (one team): 500 USD
Second Prize (one team): 300 USD
Third Prize (one team): 200 USD
All prizes are stated in US dollars.
The challenge organizer reserves the right of final interpretation, including for any matters not covered above.