During the first run of manual correction of the automatic transcriptions, the ATCOs observed that the automatic splitting and transcription process performed well, but a substantial amount of manual work remained for the team doing the manual splitting and transcription. Common words from the Isavia ANS airspace that the speech recognizer had never encountered included, e.g., “iceland”, “reykjavik”, “greenland” and “keflavik”, along with other domain-specific phraseology.
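As a hedged illustration of how such out-of-vocabulary words can be surfaced, the sketch below checks a reference transcription against the recognizer's word list; the file name and its contents are hypothetical, not Isavia's actual resources.

```python
# Minimal sketch: find words in a reference transcription that the
# recognizer's vocabulary does not cover (out-of-vocabulary words).
# The vocabulary file name is hypothetical.

def load_vocabulary(path: str) -> set:
    """Read one recognizer vocabulary word per line into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def find_oov_words(transcript: str, vocabulary: set) -> list:
    """Return the words of the transcript missing from the vocabulary."""
    return [w for w in transcript.lower().split() if w not in vocabulary]

vocab = load_vocabulary("recognizer_vocab.txt")  # hypothetical file
print(find_oov_words("descend reykjavik keflavik", vocab))
```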
In the second iteration the time spent on the manual splitting and transcription process was considerably reduced. The ATCOs observed that many ANSP-related words from the NATS and Isavia ANS airspaces that were not recognized correctly in the first automatic transcription pass were transcribed correctly in the second round.
The time invested in the first manual transcription process paid off. In the second iteration, with the help of the automatic tools, the transcribers manually split, labeled and transcribed 4-5 hours of voice data without silence.
The next steps will be to further improve the segmentation, speaker labelling and transcription tools and then to run another round of manual transcription on 5 hours of voice data without silence. Once these 5 hours are finished, automatic tools will be used to detect and flag, from the remaining voice data, only the interesting use cases, i.e. those beneficial for the project, such as possible readback error scenarios and recordings containing word sequences that cannot be automatically labeled with, e.g., callsign, type, value or greeting. Such utterances are a strong hint that the automatic transcription or the automatic semantic extraction contains errors.
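As an illustration only, the following minimal sketch flags utterances containing words that a semantic tagger cannot assign to any known slot type; the tagger interface, the tag names and the threshold are assumptions made for the sketch, not the project's actual implementation.

```python
from typing import Callable, List, Optional

# Hypothetical slot types the semantic extraction can assign.
KNOWN_TAGS = {"callsign", "type", "value", "greeting"}

def flag_for_manual_review(
    words: List[str],
    tag_word: Callable[[str], Optional[str]],
    max_untagged: int = 0,
) -> bool:
    """Return True if the utterance should be sent for manual transcription.

    An utterance is flagged when more than `max_untagged` of its words
    cannot be labeled with any known tag -- a strong hint that the
    automatic transcription or semantic extraction contains errors.
    """
    untagged = [w for w in words if tag_word(w) not in KNOWN_TAGS]
    return len(untagged) > max_untagged

# Toy tagger standing in for the real semantic extraction.
def toy_tagger(word: str) -> Optional[str]:
    lexicon = {"goodday": "greeting", "icelandair": "callsign",
               "descend": "type", "eight": "value", "zero": "value"}
    return lexicon.get(word)

# "umbrella" cannot be tagged, so the utterance is flagged (prints True).
print(flag_for_manual_review(
    ["icelandair", "descend", "umbrella", "eight", "zero"], toy_tagger))
```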
The voice data from these specific scenario contexts will then be manually transcribed; this ensures that the transcribers spend the remaining 10 hours only on relevant voice data.
The research institutes and the ANSPs involved in the HAAWAII project have worked together successfully; the results are visible in the improvement of the automatic segmentation, speaker labelling and transcription tools in the second iteration.
Currently, a word error rate of 11.3% for ATCO utterances and of 22.7% for pilot utterances is achieved on NATS data. For Isavia recordings, word error rates of 9.3% and 17.3%, respectively, are achieved. The aim is to again decrease all rates by 50% relative.
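For clarity on the target, a 50% relative reduction halves the current rate; the worked example below simply applies that standard definition to the figures above.

```latex
\mathrm{WER}_{\mathrm{target}} = (1 - 0.5)\,\mathrm{WER}_{\mathrm{current}},
\qquad \text{e.g.}\quad 0.5 \times 11.3\% = 5.65\%
\quad \text{(NATS ATCO utterances)}.
```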