2021 SLT Children Speech Recognition Challenge (CSRC)

This activity has ended. If you have any questions, please contact pre-sale@data-baker.com.

The system description papers

The official challenge description paper

You may cite the paper as:

@inproceedings{Yu2021SLT,
    TITLE = {{The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines}},
    AUTHOR = {Fan Yu and Zhuoyuan Yao and Xiong Wang and Keyu An and Lei Xie and Zhijian Ou and Bo Liu and Xiulin Li and Guanqiong Miao},
    URL = {https://arxiv.org/abs/2011.06724},
    BOOKTITLE = {{IEEE SLT 2021}},
    ADDRESS = {Shenzhen, China},
    YEAR = {2021},
    MONTH = jan,
    KEYWORDS = {Automatic speech recognition ; children speech recognition ; deep learning ; audio ; datasets},
    PDF = {https://arxiv.org/pdf/2011.06724.pdf},
}

Description

Children's speech recognition is crucial to developing intelligent technologies such as child-machine interaction and computer-aided language learning. However, both the acoustic and linguistic patterns of children's speech differ markedly from those of adult speech. Moreover, children's speech data is still scarce, which may hinder research on children's speech recognition. To address this, we launch the Children Speech Recognition Challenge (CSRC) as a flagship satellite event of the IEEE SLT 2021 workshop. The Challenge will release about 400 hours of data to registered teams and set up two tracks, aiming to boost children's speech recognition research. We hope the Challenge will also provide a good testbed for related techniques such as domain adaptation and transfer learning.

Data Introduction

The following data will be released to registered participants for challenge system building.

Set A: Adult speech training set
duration (hrs): 341.4
speakers: 1999
speaker ages: 18-60
language: Mandarin
audio format: 16kHz, 16bit, single channel wav
speaking style: reading style

Set C1: Children speech training set
duration (hrs): 28.6
speakers: 927
speaker ages: 7-11
language: Mandarin
audio format: 16kHz, 16bit, single channel wav
speaking style: reading style

Set C2: Children conversation training set
duration (hrs): 29.5
speakers: 54
speaker ages: 4-11
language: Mandarin
audio format: 16kHz, 16bit, single channel wav
speaking style: conversational style

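
As a quick sanity check on the "about 400 hours" figure, the durations listed above can be summed. This is only a minimal sketch: the dictionary layout and key names are illustrative assumptions, and only the numbers come from the tables above.

```python
# Durations (hours) and speaker counts copied from the tables above.
# The dictionary layout and key names are illustrative, not an official format.
TRAINING_SETS = {
    "A": {"hours": 341.4, "speakers": 1999, "style": "reading"},
    "C1": {"hours": 28.6, "speakers": 927, "style": "reading"},
    "C2": {"hours": 29.5, "speakers": 54, "style": "conversational"},
}

# Total released training data across all three sets.
total_hours = round(sum(s["hours"] for s in TRAINING_SETS.values()), 1)

# Children's speech only (Set C1 + Set C2).
children_hours = round(
    TRAINING_SETS["C1"]["hours"] + TRAINING_SETS["C2"]["hours"], 1
)

print(total_hours)     # 399.5
print(children_hours)  # 58.1
```

Children's speech therefore makes up only a small share of the released data, which is why adaptation from the adult set is a natural direction for participants.
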
Specification of external datasets

Only datasets listed on openslr are allowed as external data.

Tracks

Track 1: Only the data provided by the Challenge may be used to train the acoustic and language models.

Track 2: In addition to the provided data, external data listed on openslr may be used to train the acoustic model. For language model training, only the transcripts associated with the provided speech data and with the external openslr speech data are allowed.

Acoustic model training data
Track 1: Set A, Set C1 and Set C2
Track 2: Set A, Set C1, Set C2 and external data listed on openslr

Language model training data
Track 1: The transcripts of Set A, Set C1 and Set C2
Track 2: The transcripts of Set A, Set C1, Set C2 and of the external data listed on openslr

Evaluation data
Children's reading and conversational speech (duration will be released shortly)

Evaluation & Ranking

Rules that must be followed

Organizers

Organizing Committee

Important Dates

Date Description
Aug 14, 2020 Registration deadline, the due date for participants to join the Challenge
Aug 21, 2020 Training data release
Sept 30, 2020 Evaluation data release
Oct 10, 2020 Final submission deadline
Oct 25, 2020 Evaluation result and ranking release
Nov 8, 2020 System description submission
Jan 19-22, 2021 SLT 2021 main workshop and challenge workshop

Registration

1. Download the registration form (either the English or the Chinese version), fill in the information, and send it to the email address above. The subject of the email should include the name of the organization and the selected track. The registration deadline is August 14, 2020.

2. The organizing committee will review and verify the qualifications of the participating teams within 5 working days. Teams that pass the review will sign the challenge data usage agreement and will then be qualified to join the challenge.

3. The training data will be released on August 21, 2020, and download instructions will be provided to the successfully registered teams.

Submission

You must submit your model as a Docker image to receive your official scores on the evaluation dataset, and all dependencies must be included in your Docker image.

Your submission is expected to contain a script called submission.sh. This script should call your model to produce the recognized result for each .wav file and write the results to an output file in Kaldi style.

More details about submission will be announced shortly.
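
The exact output format is not specified in this announcement. As a minimal sketch, assuming "Kaldi style" means the usual Kaldi text file with one line per utterance (an utterance ID, a space, then the transcript), a helper that submission.sh might call could look like the following. The function names, the use of the .wav file stem as the utterance ID, and the sample file names are all assumptions made for illustration.

```python
from pathlib import Path


def format_kaldi_text(results):
    """Return '<utt-id> <transcript>' lines, one per utterance, sorted by ID.

    `results` maps an utterance ID to its recognized transcript.
    """
    return "".join(
        f"{utt_id} {text.strip()}\n" for utt_id, text in sorted(results.items())
    )


def write_kaldi_text(results, output_path):
    """Write the formatted results to `output_path` as UTF-8."""
    Path(output_path).write_text(format_kaldi_text(results), encoding="utf-8")


# Hypothetical recognition results keyed by .wav path (not real challenge data).
hypotheses = {
    "eval/child_0002.wav": "你好",
    "eval/child_0001.wav": "今天 天气 很 好",
}

# Assumption: the utterance ID is the file name without its extension.
results = {Path(wav).stem: text for wav, text in hypotheses.items()}

print(format_kaldi_text(results), end="")
```

Sorting by utterance ID keeps the file deterministic and matches the sorted order that Kaldi tools generally expect for ASCII IDs.
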

Results

Track 1
Team number Team name CER (Track 1)
CSRCA15 SJTU SpeechLab 18.50%
CSRCA33 奇辉千语 20.30%
CSRCA17 大耳朵图图喵喵喵 20.85%
CSRCA07 Ethiopian 21.66%
CSRCA18 CSR_Team 22.74%
CSRCA22 22.81%
CSRCA02 22.91%
CSRCA21 23.03%
CSRCA28 23.53%
CSRCA01 23.60%
CSRCA27 23.72%
CSRCA14 23.72%
CSRCA29 23.90%
CSRCA39 24.70%
CSRCA03 25.18%
CAT baseline system 25.34%
CSRCA31 25.36%
CSRCA40 25.55%
CSRCA30 27.05%
ESPNET transformer baseline system 27.28%
CSRCA06 27.61%
CSRCA05 27.85%
KALDI nnet3 baseline system 28.75%
CSRCA26 31.32%
CSRCA09 32.53%
CSRCA25 33.01%
CSRCA42 33.33%
CSRCA10 37.48%
CSRCA41 42.86%
CSRCA45 43.46%
CSRCA11 55.06%

Track 2
Team number Team name CER (Track 2)
CSRCA07 Ethiopian 16.53%
CSRCA02 TCH 21.65%
CSRCA21 royalflush 22.69%
CSRCA18 CSR_Team 22.74%
CSRCA39 童声无忌 24.48%
CAT baseline system 25.34%
CSRCA40 25.55%
CSRCA03 26.57%
ESPNET transformer baseline system 27.28%
CSRCA45 28.65%
KALDI nnet3 baseline system 28.75%
CSRCA26 31.24%
CSRCA09 40.45%
CSRCA25 33.01%
CSRCA42 33.33%
CSRCA10 37.48%
CSRCA41 42.86%
CSRCA45 43.46%
CSRCA11 55.06%
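
The scores above are character error rates (CER). The challenge's own scoring script and text normalization rules are not given here, so the following is only a minimal sketch of the standard definition: the character-level edit distance (substitutions, insertions, deletions) between hypothesis and reference, summed over all utterances and divided by the total number of reference characters. Removing spaces before scoring is an assumption, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total character edits / total reference characters."""
    total_errors = 0
    total_chars = 0
    for ref, hyp in zip(references, hypotheses):
        # Assumption: spaces are ignored when scoring.
        ref_chars = ref.replace(" ", "")
        hyp_chars = hyp.replace(" ", "")
        total_errors += edit_distance(ref_chars, hyp_chars)
        total_chars += len(ref_chars)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_errors / total_chars


# One deleted character out of six reference characters.
print(f"{cer(['今天天气很好'], ['今天天气好']):.2%}")  # 16.67%
```

Because the errors are pooled over the whole evaluation set before dividing, long utterances weigh more than short ones, and the value can exceed 100% when a system inserts many extra characters.
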

Awards

The challenge organizer reserves the right of final interpretation, including on any matters not covered above.

IEEE SLT 2021 Official Website: http://2021.ieeeslt.org