Metrics and Command Extraction
The joint paper of DLR, Idiap and the Lithuanian Air Navigation Service Provider Oro navigacija describes the recognition performance on word and on semantic level for utterances from the Lithuanian airspace.
The joint paper of DLR, Idiap and NATS describes the rule-based algorithm that transforms a sequence of words from an air traffic controller or pilot utterance into its semantic interpretation, defined as an extension of the 16-04 ontology. The defined JSON format allows a consistent exchange of Speech-to-Text output, ontology information, or both together between different systems and applications. The format is by definition machine-readable and easy to extend with additional key-value pairs while ensuring compatibility with old data. The paper also shows in detail how the extraction performance breaks down when no surveillance data is used or available.
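As an illustration, the following minimal sketch serializes one utterance together with its semantic interpretation to JSON; the key names and values are illustrative assumptions, not the exact schema defined in the paper.

```python
import json

# Illustrative sketch only: the keys below are assumptions, not the exact
# schema of the paper's exchange format.
utterance = {
    "speaker": "ATCO",                       # who is talking: "ATCO" or "PILOT"
    "transcription": "lufthansa one two tree climb flight level three five zero",
    "annotation": [                          # semantic interpretation (Text-to-Concept)
        {"callsign": "DLH123", "command": "CLIMB", "value": "350", "unit": "FL"}
    ],
}

# Consumers that ignore unknown keys stay compatible when new key-value pairs
# (e.g., timestamps or confidence scores) are added later.
print(json.dumps(utterance, indent=2))
```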
The joint paper of DLR, Idiap, Austro Control and the Czech air navigation service provider ANS CR introduces the metrics command extraction rate, callsign extraction rate, and command extraction error rate. These rates are evaluated on utterances from Austro Control and ANS CR, which were recorded in the MALORCA project in the ops room environment and in solution 16-04 in the lab environment.
A shorter version of this paper was presented during the Satellite Workshop at Interspeech 2021.
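A minimal sketch of how such extraction rates can be computed from annotated data is given below; the matching rules and denominators in the comments are assumptions and may differ from the exact definitions used in the paper.

```python
# Hedged sketch: a command counts as correctly extracted only if callsign, type,
# value and unit all match the gold annotation; extracted commands that match
# nothing in the gold annotation count as extraction errors.

def extraction_rates(gold, extracted):
    """gold / extracted: sets of (callsign, command, value, unit) tuples."""
    correct = gold & extracted
    wrong = extracted - gold
    extraction_rate = len(correct) / len(gold)
    error_rate = len(wrong) / len(gold)
    return extraction_rate, error_rate

gold = {("DLH123", "CLIMB", "350", "FL"), ("BAW45A", "HEADING", "120", "DEG")}
hyp = {("DLH123", "CLIMB", "340", "FL"), ("BAW45A", "HEADING", "120", "DEG")}
print(extraction_rates(gold, hyp))   # (0.5, 0.5): one command correct, one wrong
```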
Readback Error Detection
The supreme discipline of automatic speech recognition and understanding is readback error detection. Noisy and heavily abbreviated pilot readbacks require speech recognition and semantic interpretation even when word error rates are beyond 10%. An even bigger challenge is that readback errors are, fortunately, rare events: only 1% to 4% of the conversations contain readback errors.
The joint paper of DLR, Isavia ANS, Idiap, Brno University of Technology (BUT), NATS and Austro Control shows that a command-level recognition rate of slightly above 50% is already sufficient to achieve a readback error detection rate of 50%, provided the command-level error rate is below 0.2%. Otherwise, a readback error false alarm rate of more than 10% must be accepted.
The joint paper of DLR, BUT, Isavia ANS and Idiap presents two different algorithms for readback error detection: a rule-based one and a data-driven one, the latter based on training a neural network with artificially generated readback error samples. The paper also presents two different approaches for command extraction, again a rule-based one and a data-driven one.
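The sketch below illustrates the rule-based idea in its simplest possible form, assuming commands have already been extracted from both transmissions; the actual rules in the paper also handle partial readbacks, unit conversions and implicit confirmations.

```python
# Purely illustrative, simplified rule: flag a readback error whenever the pilot
# reads back a value that differs from the value instructed by the ATCo.

def readback_errors(atco_commands, pilot_commands):
    """Both arguments map command type (e.g. 'CLIMB') to the extracted value."""
    errors = []
    for command, instructed in atco_commands.items():
        read_back = pilot_commands.get(command)
        if read_back is not None and read_back != instructed:
            errors.append((command, instructed, read_back))
    return errors

atco = {"CLIMB": "350", "HEADING": "120"}
pilot = {"CLIMB": "340"}                     # wrong flight level in the readback
print(readback_errors(atco, pilot))          # [('CLIMB', '350', '340')]
```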
Application of the HAAWAII Architecture
The HAAWAII architecture has already been used successfully in different projects. The HAAWAII architecture means (a minimal pipeline sketch follows the list):
- to use Assistant Based Speech Recognition (ABSR), which integrates contextual knowledge (e.g., callsigns) from flight plan and surveillance data into Speech Recognition (Speech-to-Text with so-called callsign boosting) and Speech Understanding (Text-to-Concept),
- to make very clear that speech recognition (Speech-to-Text) does not automatically incorporate speech understanding (Text-to-Concept); only both together enable automatic speech recognition and understanding (ASRU),
- to use contextual knowledge from the conversation (e.g., the previous utterance) in Text-to-Concept, e.g., “two zero zero thank you” in a pilot readback is very probably an altitude readback, and not a speed or heading readback, if the ATCo has just given a CLIMB command to flight level 200 (RBA),
- to integrate command validation in the Text-to-Concept phase (VAL),
- to have the same acoustic and language model for ATCo and pilot utterances (ONE),
- to have a separate block for the detection of voice transmissions, which either relies on push-to-talk (PTT) availability or needs to evaluate the input wave signal in more detail (Voice Activity Detection, VAD),
- to repair over- or under-splitting in the Text-to-Concept phase (REP).
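The sketch below shows how these building blocks could be chained; every function is a stub with invented names and outputs, serving only to visualize the data flow, not the project's actual implementation.

```python
# Illustrative stubs only: names, signatures and return values are assumptions.

def detect_transmissions(audio):                 # PTT / VAD: segment the radio channel
    return [audio]                               # stub: one transmission

def speech_to_text(segment, callsigns):          # ABSR: recognition with callsign boosting
    return "lufthansa one two tree climb flight level three five zero"

def text_to_concept(words, callsigns, previous=None):   # ASRU; `previous` enables RBA
    return [{"callsign": "DLH123", "command": "CLIMB", "value": "350", "unit": "FL"}]

def repair_splitting(concepts):                  # REP: fix over- or under-splitting
    return concepts

def validate_commands(concepts, callsigns):      # VAL: reject implausible commands
    return [c for c in concepts if c["callsign"] in callsigns]

callsigns = {"DLH123", "BAW45A"}                 # from flight plan and surveillance data
for segment in detect_transmissions(b"raw audio bytes"):
    words = speech_to_text(segment, callsigns)
    concepts = validate_commands(repair_splitting(text_to_concept(words, callsigns)), callsigns)
    print(concepts)
```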
The paper above, written by DLR, Idiap, Fraport and Atrics, benefits from the HAAWAII elements ABSR, ASRU, VAL, VAD and REP. It integrates a modern A-SMGCS system with speech recognition and understanding to support apron controllers in maintaining flight strip information and to support simulation pilots in reducing their workload.
The joint paper of DLR, Indra Navia AS, LEONARDO S.p.A., the Lithuanian ANSP Oro Navigacija, HungaroControl and Austro Control benefits from ABSR, ASRU, VAL, PTT and REP. It summarizes the results of three exercises conducted in solution 97 of SESAR Industrial Research with respect to speech recognition and understanding support for tower controllers.
“Speech and Natural Language Processing Technologies for Pseudo-Pilot Simulator,” Zuluaga-Gomez, A. Prasad, et al., SESAR Innovation Days 2022 (SID 2022), Budapest, Hungary, December 6-8.
The paper from Idiap shows how to support simulation pilots with automatic speech recognition (a toy sketch follows the list). The main ideas are:
(i) ASR to generate a transcript of the ATCo utterance,
(ii) an entity generator to tag words (callsigns, commands, values), and
(iii) a repetition generator that uses a rule-based system to generate a pilot response based on the generated tags, and a text-to-speech system that acts as a pseudo-pilot to voice the generated pilot response.
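The toy sketch below illustrates this pipeline; the tagging logic and the readback template are invented for illustration and are not taken from the paper (the text-to-speech step is only indicated by a comment).

```python
# Toy illustration only: a real entity generator and repetition generator are
# far more sophisticated than these hard-wired rules.

def tag_entities(transcript):
    """(ii) Entity generator: tag callsign, command and value in the ATCo transcript."""
    words = transcript.split()
    return {"callsign": " ".join(words[:4]),
            "command": words[4],
            "value": " ".join(words[5:])}

def generate_readback(tags):
    """(iii) Repetition generator: pilots typically read back value, command, callsign."""
    return f"{tags['value']} {tags['command']} {tags['callsign']}"

transcript = "lufthansa one two tree descend flight level one two zero"   # (i) ASR output
readback = generate_readback(tag_entities(transcript))
print(readback)   # a text-to-speech system would then voice this pseudo-pilot response
```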
Callsign Extraction
The first step for ATC utterance understanding is to extract the callsign. Once it is known which numbers and letters belong to the callsign, extracting the command values becomes much easier. The following papers of the HAAWAII team address this challenge.
The paper of DLR shows the advantages when callsign information is available and used. The algorithm is described, and the effect of its different parts is shown: performance decreases quantitatively when certain parts of the algorithm are excluded.
The paper from Idiap addresses the improvement on word level when callsign boosting is applied using information from the flight plan and surveillance data.
The paper from Brno University of Technology (BUT), Saarland University and Idiap also addresses the improvement on word level when callsign boosting is applied.
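The idea behind callsign boosting can be illustrated as follows: callsigns known from flight plan and surveillance data are expanded into their spoken word sequences, which the recognizer can then favor during decoding. The airline-designator table and spoken forms below are small illustrative assumptions.

```python
# Illustrative expansion of ICAO callsigns into spoken word sequences; only a
# tiny subset of airline designators and letters is listed here.

AIRLINES = {"DLH": "lufthansa", "BAW": "speedbird"}
NATO = {"A": "alfa", "B": "bravo", "C": "charlie", "D": "delta"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "tree", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "niner"}

def spoken_form(icao_callsign):
    prefix, suffix = icao_callsign[:3], icao_callsign[3:]
    words = [AIRLINES.get(prefix, " ".join(NATO.get(c, c) for c in prefix))]
    words += [DIGITS.get(c, NATO.get(c, c.lower())) for c in suffix]
    return " ".join(words)

# Word sequences like these can be passed to the recognizer as boosting context.
print(spoken_form("DLH2AB"))    # lufthansa two alfa bravo
print(spoken_form("BAW123"))    # speedbird one two tree
```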
Speech-to-Text
The following papers concentrate on improvements on speech-to-text level.
The joint paper of DLR and Braunschweig University shows the results of training the DeepSpeech engine to recognize utterances from the Prague and Vienna ops room environments.
The joint paper of Idiap, DLR and Beijing Institute of Technology addresses the usage of pre-trained Wav2Vec 2.0 models.
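As a generic illustration of using a pre-trained Wav2Vec 2.0 model (not the paper's exact setup, models or data), such a model can be loaded and run via the Hugging Face transformers library:

```python
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Generic, publicly available checkpoint for illustration; the paper fine-tunes
# and evaluates on ATC data, which is not reproduced here.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

audio = np.zeros(16000, dtype=np.float32)        # stand-in: one second of silence at 16 kHz
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits   # frame-level character logits (CTC)

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])  # greedy CTC decoding of the transcript
```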
The paper from Idiap presents a two-step approach to leverage contextual data.
The paper above from Idiap, Brno University of Technology (BUT) and ReplayWell addresses contextual semi-supervised learning.
The paper from Brno University of Technology (BUT) describes how to detect English speech in ATC utterances containing more than one language.
Speaker-Role Classification
The joint paper from Idiap and DLR describes the application of BERTraffic to detect the speaker role, i.e., whether the air traffic controller or the pilot is speaking.
The following paper from Idiap and DLR uses a grammar-based approach for identifying the speaker role.
A shorter version of the paper was presented at Interspeech 2021.
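One simple cue that a grammar can exploit is the position of the callsign in the transmission: controllers usually address the callsign at the beginning, while pilots usually append their own callsign at the end. The sketch below uses only this cue and is an illustrative assumption, not the grammar implemented in the paper.

```python
# Illustrative heuristic only: decide the speaker role from where the callsign
# words appear in the utterance.

def guess_speaker_role(words, callsign_words, window=4):
    head, tail = words[:window], words[-window:]
    if all(w in head for w in callsign_words):
        return "ATCO"    # callsign addressed at the start of the transmission
    if all(w in tail for w in callsign_words):
        return "PILOT"   # own callsign appended at the end of the readback
    return "UNKNOWN"

callsign = ["lufthansa", "one", "two", "tree"]
print(guess_speaker_role("lufthansa one two tree descend flight level eight zero".split(), callsign))
print(guess_speaker_role("descending flight level eight zero lufthansa one two tree".split(), callsign))
```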
Public Project Deliverables
| Deliverable | Description | Link |
| --- | --- | --- |
| D1-1 | This document contains the operational concept of the HAAWAII project. It addresses the high-level automatic speech recognition use cases readback error detection, ATCo workload assessment, callsign highlighting, and integration of speech recognition with CPDLC, radar label prefilling, and consistency checking of manual versus verbal input. It is a living document; the final version will be submitted as D6-2 at the end of the project. | Click here |
| D3-2 | This deliverable concentrates on the semantic interpretation, i.e., the annotation of the transcribed voice recordings by using the information from corresponding voice recordings from NATS. At the time of its submission, 7.5 hours of manually transcribed pilot and ATCo utterances were available from London airspace. Utterances corresponding to about 57 minutes of voice data were manually annotated, while the remaining 6.5 hours were annotated automatically. | Click here |
| D3-3 | This deliverable concentrates on the semantic interpretation, i.e., the annotation of the transcribed voice recordings by using the information from corresponding voice recordings from Isavia. At the time of its submission, 7.5 hours of manually transcribed pilot and ATCo utterances were available from Isavia airspace. 90 minutes of these were manually annotated; the remaining 6 hours were annotated automatically. | Click here |
| D5-5 | Final Project Results Report | Click here |
| D6-1 | This deliverable summarizes the dissemination of the HAAWAII project through stakeholder workshops. It reports on the first stakeholder workshop conducted at the end of June 2021 and on the second stakeholder workshop conducted at the end of September 2022. | Click here |
| D6-2 | This deliverable is an update of D1-1 and contains the findings added to D1-1 during the project. The document was updated during the last months considering the feedback from the SJU and especially from IFATCA. | Click here |
| D6-3 | Updated Requirements Document | Click here |
| D6-5 | Results of Dissemination, Communication and Exploitation. D6-4 is a living document updated during the lifetime of the HAAWAII project; D6-5 is the latest version of D6-4. | Click here |
References used as a starting point for the project
- H. Helmke, J. Rataj, T. Mühlhausen, O. Ohneiser, H. Ehr, M. Kleinert, Y. Oualil, and M. Schulder, “Assistant-based speech recognition for ATM applications,” in 11th USA/Europe Air Traffic Management Research and Development Seminar (ATM2015), Lisbon, Portugal, 2015.
- H. Helmke, O. Ohneiser, T. Mühlhausen, and M. Wies, “Reducing controller workload with automatic speech recognition,” in IEEE/AIAA 35th Digital Avionics Systems Conference (DASC), Sacramento, California, 2016.
- H. Helmke, O. Ohneiser, J. Buxbaum, and C. Kern, “Increasing ATM efficiency with assistant-based speech recognition,” in 12th USA/Europe Air Traffic Management Research and Development Seminar (ATM2017), Seattle, Washington, 2017.
- M. Kleinert, H. Helmke, G. Siol, H. Ehr, A. Cerna, C. Kern, D. Klakow, P. Motlicek et al., “Semi-supervised Adaptation of Assistant Based Speech Recognition Models for different Approach Areas,” in IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, United Kingdom, 2018.
- H. Helmke, M. Slotty, M. Poiger, D. F. Herrer, O. Ohneiser et al., “Ontology for transcription of ATC speech commands of SESAR 2020 solution PJ.16-04,” in IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, United Kingdom, 2018.
- M. Kleinert, H. Helmke, S. Moos, P. Hlousek, C. Windisch, O. Ohneiser, H. Ehr, and A. Labreuil, “Machine Learning of Air Traffic Controller Command Extraction Models for Speech Recognition Applications,” in 9th SESAR Innovation Days, Athens, Greece, 2019.
- H. Helmke, M. Kleinert, O. Ohneiser, H. Ehr, and S. Shetty, “Reducing Controller Workload by Automatic Speech Recognition Assisted Radar Label Maintenance,” in IEEE/AIAA 39th Digital Avionics Systems Conference (DASC), Virtual Conference, 2020.
- atco2.org