2021 SLT Children Speech Recognition Challenge (CSRC)

This activity has ended. For any inquiries, please contact pre-sale@data-baker.com.

The system description papers

The official challenge description paper

You may cite the paper as:

@inproceedings{Yu2021SLT,
    TITLE = {{The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines}},
    AUTHOR = {Fan Yu and Zhuoyuan Yao and Xiong Wang and Keyu An and Lei Xie and Zhijian Ou and Bo Liu and Xiulin Li and Guanqiong Miao},
    URL = {https://arxiv.org/abs/2011.06724},
    BOOKTITLE = {{IEEE SLT 2021}},
    ADDRESS = {Shenzhen, China},
    YEAR = {2021},
    MONTH = January,
    KEYWORDS = {Automatic speech recognition ; children speech recognition ; deep learning ; audio ; datasets},
    PDF = {https://arxiv.org/pdf/2011.06724.pdf},
}

Description

Children's speech recognition is crucial to developing intelligent technologies such as child-machine interaction and computer-aided language learning. However, both the acoustic and linguistic patterns of children's speech differ markedly from those of adult speech. Moreover, children's speech data remains scarce, which may hinder research on children's speech recognition. To address this, we launch the Children Speech Recognition Challenge as a flagship satellite event of the IEEE SLT 2021 workshop. The Challenge will release about 400 hours of data to registered teams and set up two tracks, aiming to boost children's speech recognition research. We also hope the Challenge will provide a good testbed for related techniques such as domain adaptation and transfer learning.

Data Introduction

The following data will be released to registered participants for challenge system building.

Set A: Adult speech training set

duration (hrs)	341.4
speakers	1999
speaker ages	18-60
language	Mandarin
audio format	16kHz, 16bit, single channel wav
speaking style	reading style

Set C1: Children speech training set

duration (hrs)	28.6
speakers	927
speaker ages	7-11
language	Mandarin
audio format	16kHz, 16bit, single channel wav
speaking style	reading style

Set C2: Children conversation training set

duration (hrs)	29.5
speakers	54
speaker ages	4-11
language	Mandarin
audio format	16kHz, 16bit, single channel wav
speaking style	conversational style

Specification of external datasets

Only datasets listed on OpenSLR are allowed.

Tracks

Track 1: Only the data provided by the Challenge can be used to train the acoustic and language models.

Track 2: In addition to the provided data, external data listed on OpenSLR can be used to train the acoustic model. For language model training, only the transcripts associated with the provided speech data and with the external OpenSLR speech data are allowed.

	Track 1	Track 2
Acoustic model training data	Set A, Set C1 and Set C2	Set A, Set C1, Set C2 and external data listed on OpenSLR
Language model training data	The text transcripts of Set A, Set C1 and Set C2	The text transcripts of Set A, Set C1, Set C2 and of external data listed on OpenSLR
Evaluation data	Children's reading and conversational speech (duration will be released shortly)

Evaluation & Ranking

We use character error rate (CER) to evaluate model performance for both Track 1 and Track 2. The CER is calculated as the total number of character errors across the children's reading test set and the children's conversation test set, divided by the total number of characters in the two test sets.
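The pooled CER described above can be sketched as follows. This is only an illustration, not the organizers' scoring script: it assumes that "error characters" means the Levenshtein edit distance (substitutions, insertions and deletions) between the reference and the hypothesis, and that no text normalization is applied. The function names `edit_distance` and `pooled_cer` and the example sentences are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def pooled_cer(test_sets) -> float:
    """CER pooled over several test sets.

    test_sets is an iterable of lists of (reference, hypothesis) pairs.
    Errors and reference lengths are summed over all sets before dividing,
    so the result is not the average of the per-set CERs.
    """
    errors = 0
    total = 0
    for pairs in test_sets:
        for ref, hyp in pairs:
            errors += edit_distance(ref, hyp)
            total += len(ref)
    return errors / total


# Made-up examples standing in for the two test sets (assumption).
reading = [("今天天气很好", "今天天气很好"), ("我喜欢读书", "我喜欢看书")]
conversation = [("你好吗", "你好")]
cer = pooled_cer([reading, conversation])  # 2 errors over 14 characters
```

Because the errors are pooled before dividing, the larger test set carries more weight in the final score than it would under a plain average of two per-set CERs.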
For Track 1, participants need to provide their decoding results over the test sets and a detailed technical description, including but not limited to the model structure and algorithms used. For Track 2, participants additionally need to provide a list of the external data used beyond the data allowed in Track 1.

Rules that must be followed

Data augmentation on the original training dataset is allowed, including but not limited to adding noise or reverberation, speed perturbation and tone change.
Any non-compliant use of the test dataset is strictly PROHIBITED, including but not limited to using the test dataset to fine-tune or train the model.
Multi-system fusion is NOT allowed.
If two teams achieve the same character error rate on the test dataset, the system with the lower computational complexity will be ranked higher.
If forced alignment is used to obtain frame-level classification labels, the forced alignment model must be trained only on data allowed by the corresponding track.
Shallow fusion is allowed for end-to-end approaches, e.g., LAS, RNN-T and Transformer, but the model used for shallow fusion may be trained only on the transcripts of the training dataset.
The right of final interpretation belongs solely to the organizer. In case of special circumstances, the organizer will coordinate the interpretation.
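For readers unfamiliar with the shallow fusion mentioned in the rules above, the sketch below shows the usual idea: at each decoding step, the log-probability from the end-to-end model is combined with a weighted log-probability from a separately trained language model. The function name `fuse_scores`, the weight of 0.3 and the toy probabilities are assumptions for illustration only; the Challenge does not prescribe any of them.

```python
import math


def fuse_scores(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Return per-token fused scores: log P_asr + lm_weight * log P_lm.

    Both arguments map candidate tokens to log-probabilities for the next
    decoding step. Every ASR candidate must also be in the LM vocabulary.
    """
    return {
        token: asr_lp + lm_weight * lm_log_probs[token]
        for token, asr_lp in asr_log_probs.items()
    }


# Toy example (made-up numbers): the acoustic evidence cannot separate
# two similar-sounding characters, and the language model breaks the tie.
asr = {"好": math.log(0.5), "号": math.log(0.5)}
lm = {"好": math.log(0.9), "号": math.log(0.1)}
fused = fuse_scores(asr, lm)
best = max(fused, key=fused.get)
```

Under the rule above, the language model supplying `lm_log_probs` would have to be trained only on the transcripts of the training dataset.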

Organizers

Databaker (Beijing) Technology Co., Ltd.
Audio, Speech and Language Processing Group, Northwestern Polytechnical University
Speech Processing and Machine Intelligence Lab, Tsinghua University
Speech Lab, Xiamen University
Task Force on Speech Dialogue and Auditory Processing, China Computer Federation

Organizing Committee

Lei Xie, Northwestern Polytechnical University
Zhijian Ou, Tsinghua University
Qingyang Hong, Xiamen University
Xiulin Li, Databaker
Bo Liu, Databaker

Important Dates

Date	Description
Aug 14, 2020	Registration deadline, the due date for participants to join the Challenge
Aug 21, 2020	Training data release
Sept 30, 2020	Evaluation data release
Oct 10, 2020	Final submission deadline
Oct 25, 2020	Evaluation result and ranking release
Nov 8, 2020	System description submission
Jan 19-22, 2021	SLT 2021 main workshop and challenge workshop

Registration

1. Download the registration form (either the English or the Chinese version), fill in the information, and send it to the email address above. The subject of the email should include the name of the organization and the selected track. The registration deadline is August 14, 2020.

2. The organizing committee will review and verify the qualifications of the participating teams within 5 working days. Teams that pass the review will sign the challenge data usage agreement and will then be qualified to join the challenge.

3. The training data will be released on August 21, and the download method will be provided to the successfully registered teams.

Submission

You must submit your model as a Docker image to receive your official scores on the evaluation dataset, and all dependencies must be included in your Docker image.

Your submission is expected to contain a script called submission.sh. The submission.sh script should call your model to produce the recognition result for each .wav file and write the results to an output file in Kaldi style.

More details about submission will be announced shortly.
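As a rough illustration of the Kaldi-style output mentioned above, the sketch below formats recognition results the way Kaldi's "text" files are laid out: one utterance per line, with the utterance ID followed by a space and the transcript. The exact format expected by the organizers is not specified here, so everything in the sketch is an assumption: the utterance ID is taken to be the .wav file name without its extension, and `recognize`, `format_kaldi_lines` and `write_kaldi_text` are hypothetical names, with `recognize` standing in for the participant's own model.

```python
from pathlib import Path


def format_kaldi_lines(wav_paths, recognize):
    """Return one '<utt_id> <transcript>' line per .wav file, sorted by ID.

    recognize is a placeholder callable that takes a Path to a .wav file
    and returns the recognized text.
    """
    paths = sorted((Path(p) for p in wav_paths), key=lambda p: p.stem)
    return [f"{p.stem} {recognize(p)}" for p in paths]


def write_kaldi_text(wav_paths, recognize, output_path):
    """Write the formatted lines to output_path as UTF-8 text."""
    lines = format_kaldi_lines(wav_paths, recognize)
    Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Dummy recognizer used only to show the output layout.
example = format_kaldi_lines(["b/utt2.wav", "a/utt1.wav"], lambda p: "你好")
```

A submission.sh script could invoke a Python entry point built around such a function, passing it the test .wav files and the output location.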

Result

Track 1

Team number	Team name	Track 1 CER
CSRCA15	SJTU SpeechLab	18.50%
CSRCA33	奇辉千语	20.30%
CSRCA17	大耳朵图图喵喵喵	20.85%
CSRCA07	Ethiopian	21.66%
CSRCA18	CSR_Team	22.74%
CSRCA22		22.81%
CSRCA02		22.91%
CSRCA21		23.03%
CSRCA28		23.53%
CSRCA01		23.60%
CSRCA27		23.72%
CSRCA14		23.72%
CSRCA29		23.90%
CSRCA39		24.70%
CSRCA03		25.18%
CAT baseline system		25.34%
CSRCA31		25.36%
CSRCA40		25.55%
CSRCA30		27.05%
ESPNET transformer baseline system		27.28%
CSRCA06		27.61%
CSRCA05		27.85%
KALDI nnet3 baseline system		28.75%
CSRCA26		31.32%
CSRCA09		32.53%
CSRCA25		33.01%
CSRCA42		33.33%
CSRCA10		37.48%
CSRCA41		42.86%
CSRCA45		43.46%
CSRCA11		55.06%

Track 2

Team number	Team name	Track 2 CER
CSRCA07	Ethiopian	16.53%
CSRCA02	TCH	21.65%
CSRCA21	royalflush	22.69%
CSRCA18	CSR_Team	22.74%
CSRCA39	童声无忌	24.48%
CAT baseline system		25.34%
CSRCA40		25.55%
CSRCA03		26.57%
ESPNET transformer baseline system		27.28%
CSRCA45		28.65%
KALDI nnet3 baseline system		28.75%
CSRCA26		31.24%
CSRCA09		40.45%
CSRCA25		33.01%
CSRCA42		33.33%
CSRCA10		37.48%
CSRCA41		42.86%
CSRCA45		43.46%
CSRCA11		55.06%

Awards

The top three teams in each track, ranked by score, will be awarded prizes and certificates.
The following prizes will be awarded:
First Prize (one team): 500 USD
Second Prize (one team): 300 USD
Third Prize (one team): 200 USD
All prizes are stated in US dollars.

Any matters not mentioned here are decided by the challenge organizer, who retains the right of final interpretation.

IEEE SLT 2021 Official Website: http://2021.ieeeslt.org