In the HAAWAII project, speech technology forms the core of the air traffic controller support system and all of its developed components. Automatic Speech Recognition (ASR) delivers its best performance when trained on a large amount of domain-specific data. Since transcribing such data is very demanding, public sources are usually used for the initial model training. However, adapting the ASR model still requires in-domain data to improve its performance.
At the beginning of the project we had no target data, but as the project progresses we are gathering more and more data to adapt and improve our models. This is reflected in the performance of the HAAWAII ASR system. The table at the end of this section shows the word error rate (WER) on held-out HAAWAII test data as the amount of HAAWAII data used to train the models increases. With no data available for adaptation, the WER was around 20%. By May 2021, about half of the targeted amount of data had been transcribed, and our ASR system reached an average WER of about 12%. Roughly half of the data are air traffic controller utterances; the other half comes from noisy pilot utterances.
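For readers unfamiliar with the metric: WER counts the minimum number of word insertions, deletions, and substitutions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal, illustrative Python sketch of that computation; it is not the HAAWAII evaluation pipeline, and the example utterance is invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# Hypothetical ATC-style utterance, not taken from the HAAWAII corpus:
ref = "speedbird one two three descend flight level one zero zero"
hyp = "speedbird one two three descend level one zero zero"
print(f"WER = {word_error_rate(ref, hyp):.1%}")  # one deletion out of ten words -> 10.0%
```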
The current system is trained on additional data that the HAAWAII partners have transcribed in the meantime. This brought the average WER below 10%. In other words, the number of errors in the output of the HAAWAII ASR system was reduced by roughly 50% relative compared to the baseline without in-domain data (see the short calculation after the table).
The following table summarizes the achieved WER for the two target air traffic control areas: the London TMA of NATS and the en-route and oceanic airspace of Isavia, the Icelandic air navigation service provider.
| Manually transcribed data: Isavia (hours) | Manually transcribed data: NATS (hours) | WER Isavia [%] | WER NATS [%] |
| --- | --- | --- | --- |
| 0 | 0 | 20.0 | 18.8 |
| 10 | 7 | 13.2 | 10.9 |
| 15 | 10 | 9.4 | 9.9 |
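To make the "relative" reduction mentioned above concrete, it compares the drop in WER to the baseline WER rather than to 100%. A minimal sketch using the Isavia column from the table:

```python
# Relative WER reduction between the baseline (no in-domain data) and the current model.
# The numbers are the Isavia column of the table above.
baseline_wer, current_wer = 20.0, 9.4
relative_reduction = (baseline_wer - current_wer) / baseline_wer
print(f"{relative_reduction:.0%} relative reduction")  # 53% relative reduction
```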