What controls the sequence of events in a VoiceXML application?

In a VoiceXML application, the sequence of events is determined by the VoiceXML dialog, which is written as a series of tags in a plain text file called a VoiceXML document. In the following example, the caller hears a menu prompt, and the DTMF key the caller presses selects which form runs next:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
<!-- This simple menu does not require text-to-speech or speech
     recognition capabilities. It plays an audio file and recognizes
     DTMF input. -->
<menu>
        <prompt>
                <audio src="hello.wav"/>
        </prompt>
        <choice dtmf="1" next="#trains"/>
        <choice dtmf="2" next="#boats"/>
        <choice dtmf="3" next="#planes"/>
        <choice dtmf="0" next="#end_menu"/>
</menu>
<form id="trains">
        <block>
                <audio src="trains.wav"/>
        </block>
</form>
<form id="boats">
        <block>
                <audio src="boats.wav"/>
        </block>
</form>
<form id="planes">
        <block>
                <audio src="planes.wav"/>
        </block>
</form>
<form id="end_menu">
        <block>
                <audio src="goodbye.wav"/>
        </block>
</form>
</vxml>
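In the example above, each form plays one recording and then ends. Because none of those forms names a successor dialog, the interpreter stops after the recording. If the caller should come back to the menu instead, the dialog has to state that transition explicitly. The following is a minimal sketch of one way to do it, not a definitive design. It adds an `id` to the menu, a `<goto>` at the end of each topic form, an explicit `<exit/>` after the goodbye message, and `<noinput>`/`<nomatch>` handlers so that silence or an unrecognized key replays the prompt. The menu id `main` and the audio file `invalid.wav` are hypothetical names that do not appear in the original example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
        <!-- Hypothetical variant: "main" and "invalid.wav" are illustrative names. -->
        <menu id="main">
                <prompt>
                        <audio src="hello.wav"/>
                </prompt>
                <choice dtmf="1" next="#trains"/>
                <choice dtmf="2" next="#boats"/>
                <choice dtmf="3" next="#planes"/>
                <choice dtmf="0" next="#end_menu"/>
                <!-- No key pressed before the timeout: play the menu prompt again. -->
                <noinput>
                        <reprompt/>
                </noinput>
                <!-- Key pressed that matches no choice: explain, then replay the prompt. -->
                <nomatch>
                        <audio src="invalid.wav"/>
                        <reprompt/>
                </nomatch>
        </menu>
        <form id="trains">
                <block>
                        <audio src="trains.wav"/>
                        <!-- Return to the menu instead of ending the session. -->
                        <goto next="#main"/>
                </block>
        </form>
        <form id="boats">
                <block>
                        <audio src="boats.wav"/>
                        <goto next="#main"/>
                </block>
        </form>
        <form id="planes">
                <block>
                        <audio src="planes.wav"/>
                        <goto next="#main"/>
                </block>
        </form>
        <form id="end_menu">
                <block>
                        <audio src="goodbye.wav"/>
                        <!-- End the application explicitly after the goodbye message. -->
                        <exit/>
                </block>
        </form>
</vxml>
```

In this variant, the order of events is still fully controlled by the document. The `next` attribute on each `<choice>` decides which form runs after a key press. The `<goto>` in each form decides what happens after its recording. The `<noinput>` and `<nomatch>` handlers decide what happens when the caller's input does not select any choice.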