The HAAWAII partners agreed on a set of rules for transcribing controllers’ and pilots’ radio communication utterances unambiguously. The rules are kept simple and intuitive so that the resulting audio transcriptions are homogeneous regardless of the transcriber. They apply to human transcribers as well as to automatic transcription that a human may check later. This makes it possible to compare the performance of different speech-to-text engines and, in particular, to exchange ATC transcriptions between different ANSPs without further adaptation costs.
Deliverable D3.1 of the HAAWAII project therefore contains 21 transcription rules in six categories. They define how to transcribe English speech with special symbols and exceptional words (but without punctuation), numbers, acronyms, unclear speech (unknown, pausing, noisy, or swallowed), and non-English speech.
An excerpt of D3.1 is available at the following link (click here). D3.1 also provides further reference material on the ICAO alphabet, callsigns, and special aviation terms.
A correct transcription should therefore look like this:
- “easy jet nine whiskey x-ray roger [noise] standby lufthansa triple eight you are cleared to vienna via gunpa initially climb flight level nine zero [hes] QNH two ze* correction one zero one zero give way to airbus on your three o’clock position [NE German] tschuess [/NE]”
and NOT like the following incorrect example:
- “Easyjet niner whisky xray roger _beep_ stand by Luft hansa eight eight eight you are cleared to loww via gunpa initially climb flightlevel nine zero aeh q_n_h two zero correction one zero one zero give way two air bus on your tree o-clock position tschüß”.
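The contrast between the two examples can be sketched as a small automatic check. The snippet below is only an illustration: the function name `check_transcription`, the tag pattern, and the spelling table are hypothetical and were inferred solely from the two examples above, not taken from the 21 rules in D3.1. It flags underscores, mixed-case words, and a few spellings that differ between the correct and incorrect example.

```python
import re

# Assumption: annotation tags look like [noise], [hes], [NE German], [/NE],
# as seen in the correct example. This pattern is not taken from D3.1.
TAG = re.compile(r"\[/?[A-Za-z ]+\]")

# Assumption: hypothetical spelling table derived from the two examples only.
PREFERRED_SPELLING = {
    "niner": "nine",
    "tree": "three",
    "whisky": "whiskey",
    "xray": "x-ray",
}


def check_transcription(text):
    """Return a list of human-readable issues found in a transcription."""
    issues = []
    # Remove annotation tags so their content is not checked as speech.
    body = TAG.sub(" ", text)
    for token in body.split():
        if "_" in token:
            issues.append(f"underscore in '{token}'")
        # Words are all lowercase, acronyms all uppercase; anything else is mixed.
        if token != token.lower() and token != token.upper():
            issues.append(f"mixed case in '{token}'")
        preferred = PREFERRED_SPELLING.get(token.lower())
        if preferred is not None:
            issues.append(f"'{token}' should be written '{preferred}'")
    return issues


if __name__ == "__main__":
    for issue in check_transcription("Easyjet niner whisky xray roger _beep_"):
        print(issue)
```

Such a check covers only surface conventions. Errors like “loww” instead of “vienna” or “two” instead of “to” depend on what was actually said and still need a human transcriber or the reference material in D3.1.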
The set of rules will be applied to the audio recordings of Isavia, NATS, and ACG, which are to be post-processed by BUT, Idiap, and DLR in HAAWAII. Furthermore, the partners of other large European ASR projects and exercises, such as PJ.05-97-ASR, PJ.10-96-ASR, and STARFiSH, have agreed to use the same set of rules.