2021 SLT Children Speech Recognition Challenge (CSRC)

Description

Children's speech recognition is crucial to developing intelligent technologies such as child-machine interaction and computer-aided language learning. However, both the acoustic and the linguistic patterns of children's speech differ markedly from those of adult speech. Moreover, children's speech data is still scarce, which may hinder research on children's speech recognition. To address this, we are launching the Children Speech Recognition Challenge as a flagship satellite event of the IEEE SLT 2021 workshop. The Challenge will release about 400 hours of data to registered teams and set up two tracks, with the aim of boosting children's speech recognition research. We hope the Challenge will also provide a good testbed for related techniques such as domain adaptation and transfer learning.

Data Introduction

The following data will be released to registered participants for challenge system building.

Set A: Adult speech training set
duration (hrs): 341.4
speakers: 1999
speaker ages: 18-60
language: Mandarin
audio format: 16 kHz, 16-bit, single-channel wav
speaking style: reading

Set C1: Children speech training set
duration (hrs): 28.6
speakers: 927
speaker ages: 7-11
language: Mandarin
audio format: 16 kHz, 16-bit, single-channel wav
speaking style: reading

Set C2: Children conversation training set
duration (hrs): 29.5
speakers: 54
speaker ages: 4-11
language: Mandarin
audio format: 16 kHz, 16-bit, single-channel wav
speaking style: conversational
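
All three sets share the same audio format, so a quick sanity check after downloading can catch corrupted or mismatched files. The following is a minimal sketch using only the Python standard library; the function name is hypothetical and is not part of any official challenge tooling.

```python
import wave

# Expected format taken from the data description: 16 kHz, 16-bit, single channel.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample
EXPECTED_CHANNELS = 1


def matches_challenge_format(wav_path):
    """Return True if the wav file is 16 kHz, 16-bit, single channel.

    Hypothetical helper for illustration only.
    """
    with wave.open(wav_path, "rb") as wav_file:
        return (
            wav_file.getframerate() == EXPECTED_RATE_HZ
            and wav_file.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav_file.getnchannels() == EXPECTED_CHANNELS
        )
```

Any external audio used in Track 2 that fails this check would presumably need resampling or downmixing before it is mixed with the provided data.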
Specification of external datasets

Only datasets listed on OpenSLR are allowed.

Tracks

Track 1: Only the data provided by the Challenge can be used to train the acoustic and language models.

Track 2: In addition to the provided data, external data listed on OpenSLR can be used to train the acoustic model. For language model training, only the transcripts associated with the provided speech data and with the external OpenSLR speech data are allowed.

Data type | Track 1 | Track 2
Acoustic model training data | Set A, Set C1 and Set C2 | Set A, Set C1, Set C2 and external data listed on OpenSLR
Language model training data | Transcripts of Set A, Set C1 and Set C2 | Transcripts of Set A, Set C1, Set C2 and external data listed on OpenSLR
Evaluation data (both tracks) | Children's reading and conversational speech (duration will be released shortly)

Evaluation & Ranking

Rules that must be followed

Organizers

Organizing Committee

Important Dates

Date | Description
Aug 14, 2020 | Registration deadline (the due date for participants to join the Challenge)
Aug 21, 2020 | Training data release
Sept 30, 2020 | Evaluation data release
Oct 10, 2020 | Final submission deadline
Oct 25, 2020 | Evaluation result and ranking release
Nov 8, 2020 | System description submission
Jan 19-22, 2021 | SLT 2021 main workshop and challenge workshop

Registration

If you are interested in the challenge, please contact us by email at slt_challenge2021@data-baker.com.

English registration form / Chinese registration form

1. Download the registration form (either the English or the Chinese version), fill in the information, and send it to the email address above. The subject of the email should include the name of the organization and the selected track. The registration deadline is August 14, 2020. For college teams, a copy of a certificate showing the student identity of the team leader (e.g., a student ID card) must be attached to the registration email. For company teams, the registration email should be sent from an official company email address.

2. The organizing committee will review and verify the qualifications of the participating teams within 5 working days. Teams that pass the review will sign the challenge data usage agreement and will then be qualified to join the challenge.

3. The training data will be released on August 21, 2020, and the download instructions will be provided to the successfully registered teams.

Submission

You must submit your model as a Docker image to receive your official scores on the evaluation dataset, and all dependencies must be included in your Docker image.

Your submission is expected to contain a script called submission.sh. The script should call your model to produce the recognition result for each .wav file and write the results to the output file in Kaldi style.

More details about submission will be announced shortly.
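
Until those details are announced, the output step can only be sketched under assumptions. The example below assumes that "Kaldi style" refers to the layout of a Kaldi `text` file, where each line holds an utterance ID followed by its transcript. The function name, the `recognize` callable, the flat directory of .wav files, and the use of the file name as the utterance ID are all hypothetical and may differ from the official specification.

```python
from pathlib import Path


def write_kaldi_text(wav_dir, output_path, recognize):
    """Write one '<utterance-id> <transcript>' line per .wav file.

    `recognize` is a hypothetical callable (wav path -> transcript string)
    standing in for the participant's own model.
    Returns the number of files processed.
    """
    wav_paths = sorted(Path(wav_dir).glob("*.wav"))
    with open(output_path, "w", encoding="utf-8") as out:
        for wav_path in wav_paths:
            # Assumption: the file name without its extension is the utterance ID.
            utt_id = wav_path.stem
            transcript = recognize(str(wav_path))
            out.write(f"{utt_id} {transcript}\n")
    return len(wav_paths)
```

A submission.sh script could then invoke a Python entry point that calls this function with the evaluation directory and the output file path, once the official paths and arguments are known.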

Awards

The challenge organizer reserves the right of final interpretation, including on any matters not covered above.

IEEE SLT 2021 Official Website: http://2021.ieeeslt.org