Building a MIDI Music App for iOS in Swift

Written by: Eric Ford

This post describes how to build the simplest possible MIDI app (you can download it, already built, for free from the iOS App Store). It's called Jamulator, like a calculator but for jamming.

The open-source code for the app is here.

If you want to build an app using MIDI, you will want a sound font, and you’ll probably want AudioKit. This version of AudioKit has been updated for Swift 3.2.

Why Use MIDI?

MIDI is a standard music protocol that lets you generate very realistic music in a simple way. No more cheesy old-school computer music from the days of Pac-Man (which was pretty good back in 1980).

Video: https://www.youtube.com/watch?v=BxYzjjs6d1s

Nowadays many apps use recorded sound, but you will find that MIDI is a much more flexible way to make music than recorded audio files. With a selection of 128 high-quality instrument sounds that closely approximate their real-world counterparts, MIDI enables you to:

  • Make the music in your app sound "real".

  • Play chords and melodies.

  • Record music and play it back.

  • Change melody, harmony, rhythm, and pitch in response to events.

Caveat: Both the MIDI sound font and AudioKit are very large, weighing in at about 150 MB for the font and more than 100 MB for AudioKit. AudioKit is optional; in this case, it is used just to display a piano keyboard. Smaller sound fonts are available, but they may be of lesser quality.

A Brief History of MIDI

In the early 1980s, synthesizers, drum machines, and even automatic bass players were being introduced into the mass musical instrument market. Just a few years before that, synthesizers were uncommon.

They were cumbersome to use, involving complicated patch bays similar to old-time telephone switchboards. Each sound, or “patch,” was hand-tuned using a variety of knobs. These synthesizers had no way to remember these settings, so players did their best to recreate sounds by hand when they were needed.

These early instruments had their own synthetic sounds and were not commonly used to reproduce the sounds of traditional instruments. A good example of the sound of early synthesis is The Edgar Winter Group’s “Frankenstein.”

Video: https://www.youtube.com/watch?v=RSLP1FCREBA (starting at 0:52)

Notwithstanding Edgar Winter, Pink Floyd, The Beatles, and such, working musicians were, on the whole, unenthused by these experimental sounds. Much more relevant to their working lives would be having high-quality prepackaged instrument sounds and the capability to record and play these back without complicated tape loop setups.

Enter the Musical Instrument Digital Interface, aka MIDI, with a very compact software protocol and standard cables connecting to standard hardware ports. Key players in the musical instrument industry got together and agreed on a standard so musicians could connect these devices together. Their focus was on performing, not twiddling knobs to get a brand new sound.

Apple’s Sound Infrastructure

You have to jump through quite a few hoops to get to the point where you can play a MIDI note using Apple's Core Audio API. Learning how this plumbing fits together will enable you to use more Core Audio features in the future, including effects like delay and reverb, as well as mixers.

The AUGraph

First you need a graph, specifically an AUGraph. You will add some nodes (you guessed it, AUNodes) to the graph in order to connect parts of the audio subsystem.

A graph with nodes is a very generic, non-audio-specific way of describing what’s getting connected. I like to think of the graph as a patch bay in my studio. Everything has to go through it in order to participate in the final sound output, but it’s just a connector. Nodes are like connections. You need a specific kind of node to connect to a specific kind of audio component.

You’ll need two specific nodes, a synth node and an output node, in order to make sounds with MIDI. The nodes are assigned to the graph, and an audio unit of type synth is assigned to the synth node.

 var processingGraph: AUGraph?
var midisynthNode   = AUNode()
var ioNode          = AUNode()
var midisynthUnit: AudioUnit? 

Start with the graph:

 func initAudio() {
  // the graph is like a patch bay, where everything gets connected
  checkError(osstatus: NewAUGraph(&processingGraph))
} 

You will notice that the processingGraph parameter that you pass to NewAUGraph is always passed by reference (i.e., with a leading & character) because it is of type UnsafeMutableRawPointer. UnsafeMutableRawPointers are used all over the place in Apple's audio APIs, so get used to it. This also forces you to declare audio variables as optionals and then unwrap them when you pass them as parameters, which is an anti-pattern in Swift.
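Every Core Audio call in this post is wrapped in a checkError(osstatus:) helper. The repo has its own version; a minimal sketch, assuming you only want to log failures, could be as simple as this:

// Minimal sketch of a checkError helper: an OSStatus of 0 (noErr) means success,
// anything else is a Core Audio error code worth logging.
func checkError(osstatus: OSStatus) {
  if osstatus != 0 {
    print("Core Audio error: \(osstatus)")
  }
}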

Next we need an I/O node and a synth node:

// MARK: - Audio Init Utility Methods
private func createIONode() {
  var cd = AudioComponentDescription(
    componentType: OSType(kAudioUnitType_Output),
    componentSubType: OSType(kAudioUnitSubType_RemoteIO),
    componentManufacturer: OSType(kAudioUnitManufacturer_Apple),
    componentFlags: 0, componentFlagsMask: 0)
  checkError(osstatus: AUGraphAddNode(processingGraph!, &cd, &ioNode))
}

private func createSynthNode() {
  var cd = AudioComponentDescription(
    componentType: OSType(kAudioUnitType_MusicDevice),
    componentSubType: OSType(kAudioUnitSubType_MIDISynth),
    componentManufacturer: OSType(kAudioUnitManufacturer_Apple),
    componentFlags: 0, componentFlagsMask: 0)
  checkError(osstatus: AUGraphAddNode(processingGraph!, &cd, &midisynthNode))
}

These component descriptions look gnarly, but the componentType and componentSubType fields are the only fields that vary. Some of the options are delays, distortion, filters, reverb, and more. Check out Apple's Effect Audio Unit Subtypes to see what’s available.
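For example, here is a sketch (not code from Jamulator) of what an effect node could look like; reverbNode would be another AUNode property, and kAudioUnitSubType_Reverb2 is one of the effect subtypes in Apple's documentation:

// Sketch only: adding a reverb effect node to the same graph.
private func createReverbNode() {
  var cd = AudioComponentDescription(
    componentType: OSType(kAudioUnitType_Effect),
    componentSubType: OSType(kAudioUnitSubType_Reverb2),
    componentManufacturer: OSType(kAudioUnitManufacturer_Apple),
    componentFlags: 0, componentFlagsMask: 0)
  checkError(osstatus: AUGraphAddNode(processingGraph!, &cd, &reverbNode))
}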

Now we get a reference to the synthesizer AudioUnit:

 func initAudio() {
  // the graph is like a patch bay, where everything gets connected
  checkError(osstatus: NewAUGraph(&processingGraph))
  createIONode()
  createSynthNode()
  checkError(osstatus: AUGraphOpen(processingGraph!))
  checkError(osstatus: AUGraphNodeInfo(processingGraph!, midisynthNode, nil, &midisynthUnit))
}

The AudioUnit called midisynthUnit is the workhorse of this app. This gets passed to MusicDeviceMIDIEvent to change voices, turn notes on, and turn notes off.

The synth node and the I/O node must be connected to each other within the graph by calling AUGraphConnectNodeInput. Then you are free to call AUGraphInitialize and AUGraphStart:

 func initAudio() {
  // the graph is like a patch bay, where everything gets connected
  checkError(osstatus: NewAUGraph(&processingGraph))
  createIONode()
  createSynthNode()
  checkError(osstatus: AUGraphOpen(processingGraph!))
  checkError(osstatus: AUGraphNodeInfo(processingGraph!, midisynthNode, nil, &midisynthUnit))
  let synthOutputElement: AudioUnitElement = 0
  let ioUnitInputElement: AudioUnitElement = 0
  // connect bus 0 of the synth's output to bus 0 of the I/O unit's input
  checkError(osstatus:
    AUGraphConnectNodeInput(processingGraph!, midisynthNode, synthOutputElement, ioNode, ioUnitInputElement))
  checkError(osstatus: AUGraphInitialize(processingGraph!))
  checkError(osstatus: AUGraphStart(processingGraph!))
}

Now the graph is populated, initialized, and started. It’s ready to receive some MIDI note commands.

Playing MIDI notes

These two methods send MIDI note-on and note-off commands to the synthesizer:

func noteOn(note: UInt8) {
  let noteCommand = UInt32(0x90 | midiChannel)
  // strip the keyboard's default octave (lowest note 48), then add back
  // the octave chosen in the octave UISegmentedControl
  let base = note - 48
  let octaveAdjust = (UInt8(octave) * 12) + base
  let pitch = UInt32(octaveAdjust)
  checkError(osstatus: MusicDeviceMIDIEvent(self.midisynthUnit!,
    noteCommand, pitch, UInt32(self.midiVelocity), 0))
}

func noteOff(note: UInt8) {
  let channel = UInt32(0)
  let noteCommand = UInt32(0x80 | channel)
  let base = note - 48
  let octaveAdjust = (UInt8(octave) * 12) + base
  let pitch = UInt32(octaveAdjust)
  // the second data byte here is the release velocity; 0 is fine for note-off
  checkError(osstatus: MusicDeviceMIDIEvent(self.midisynthUnit!,
    noteCommand, pitch, 0, 0))
}

MIDI note-on and note-off commands (and, for that matter, all MIDI channel commands) share the same pattern: the command byte is constructed by a bitwise OR of the upper four bits (called a nybble), which identify the command, with the lower four bits, which contain the channel the command is sent to. For note-on, the upper nybble is 0x90. For note-off, it's 0x80. In this app, the lower channel nybble is always 0.
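As an illustration (not code from the repo), the same note-on command aimed at channel 3 would be composed like this:

let channel: UInt32 = 3
let noteOnChannel3 = UInt32(0x90) | channel  // 0x93: note-on, channel 3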

The piano keyboard, which is provided by AudioKit, shows two octaves at a time, defaulting to octaves 4 and 5. With 12 keys in an octave, the lowest note would be note 48. The piano keyboard knows nothing about the UISegmentedControl used for octave selection in Jamulator, so the octave is stripped out of the note the keyboard sends, and then the value from the octave control gets added back in, resulting in the final pitch.
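For example, if the keyboard sends note 50 (the D just above its lowest C) while the octave control is set to 6, the final pitch is (50 - 48) + (6 * 12) = 74.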

Adding an AKKeyboardView

At the bottom of MIDIInstrumentViewController.swift, you will find an extension to the class. Extensions are a nice, clean way to implement a protocol, and AKKeyboardView requires a delegate that conforms to the AKKeyboardDelegate protocol:

 // MARK: - AKKeyboardDelegate
// the protocol for the piano keyboard needs methods to turn notes on and off
extension MIDIInstrumentViewController: AKKeyboardDelegate {
  func noteOn(note: MIDINoteNumber) {
    synth.noteOn(note: UInt8(note))
  }
  func noteOff(note: MIDINoteNumber) {
    synth.noteOff(note: UInt8(note))
  }
}

Now find the setUpPianoKeyboard method, which looks like this:

 func setUpPianoKeyboard() {
  let keyboard = AKKeyboardView(frame: ScreenUtils.resizeRect(
    rect: CGRect(x: 40, y: 0, width: 687, height: 150)))
  keyboard.delegate = self
  keyboard.polyphonicMode = true // allow more than one note at a time
  self.view.addSubview(keyboard)
}

There are other optional keyboard properties (not shown above), but you really only need the delegate. polyphonicMode is needed if you want to play more than one note at a time; playing a chord is a good example of polyphony.
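As a quick illustration of polyphony, here is a sketch (not code from the repo) that sounds a three-note chord by reusing the synth object from the delegate extension, ignoring the octave remapping for simplicity:

// Sketch: send three note-ons with no note-offs in between to sound a chord.
let triad: [UInt8] = [60, 64, 67]  // C, E, G as raw MIDI note numbers
for note in triad {
  synth.noteOn(note: note)
}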

Loading a voice

Voices are also known as patches.

 func loadPatch(patchNo: Int) {
  let channel = UInt32(0)
  var enabled = UInt32(1)
  var disabled = UInt32(0)
  patch1 = UInt32(patchNo)
  checkError(osstatus: AudioUnitSetProperty(
    midisynthUnit!,
    AudioUnitPropertyID(kAUMIDISynthProperty_EnablePreload),
    AudioUnitScope(kAudioUnitScope_Global),
    0,
    &enabled,
    UInt32(MemoryLayout<UInt32>.size)))
  let programChangeCommand = UInt32(0xC0 | channel)
  checkError(osstatus: MusicDeviceMIDIEvent(midisynthUnit!,
    programChangeCommand, patch1, 0, 0))
  checkError(osstatus: AudioUnitSetProperty(
    midisynthUnit!,
    AudioUnitPropertyID(kAUMIDISynthProperty_EnablePreload),
    AudioUnitScope(kAudioUnitScope_Global),
    0,
    &disabled,
    UInt32(MemoryLayout<UInt32>.size)))
  // the previous programChangeCommand just triggered a preload
  // this one actually changes to the new voice
  checkError(osstatus: MusicDeviceMIDIEvent(midisynthUnit!,
      programChangeCommand, patch1, 0, 0))
}

loadPatch is called whenever the user chooses a new voice. Note that in MIDI lingo, a program change command means to change voices.

Preload is enabled in order to load the new voice. The first program change command simply triggers that preload; once it has been issued, you disable preload. Preload is a mode of the MIDI synth, so you must turn it off again before the final program change command will actually switch to the new voice.
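As a usage example, assuming the same synth object that loadVoices (shown later) uses, switching to the General MIDI violin voice is a single call:

synth.loadPatch(patchNo: 40)  // General MIDI program 40 (zero-indexed) is Violin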

Now would you like some sounds to play with? Sounds good to me!

The sound font

I have to admit that calling a collection of musical instrument voices a sound font would never have occurred to me, but it really fits. And since there are 128 sounds, it’s almost like musical ASCII.

The sound font is loaded in the background, separately from the audio initialization:

func loadSoundFont() {
  var bankURL = Bundle.main.url(forResource: "FluidR3 GM2-2",
    withExtension: "SF2")
  checkError(osstatus: AudioUnitSetProperty(midisynthUnit!,
    AudioUnitPropertyID(kMusicDeviceProperty_SoundBankURL),
    AudioUnitScope(kAudioUnitScope_Global),
    0,
    &bankURL,
    UInt32(MemoryLayout<URL>.size)))
}

Here again, the midisynthUnit is referenced in order to set the sound bank URL property. There are many sound banks out on the web, so you might want to experiment with some of them.
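If you do experiment, swapping banks is mostly a matter of bundling the new .sf2 file and changing the resource name; for example, with a hypothetical file name:

// Hypothetical file name: point the synth at a different bundled sound font.
var bankURL = Bundle.main.url(forResource: "SomeOtherFont", withExtension: "sf2")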

Near the top of VoiceSelectorView.swift is an array of voice names for this sound bank, which conforms to the General MIDI 2 standard set of voices.
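The actual array lives in VoiceSelectorView.swift; the idea is roughly this sketch (only the first few General MIDI names shown):

let voiceNames = ["Acoustic Grand Piano", "Bright Acoustic Piano",
                  "Electric Grand Piano", "Honky-tonk Piano" /* ...124 more... */]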

In MIDIInstrumentViewController.swift, the method loadVoices looks like this:

 // load the voices in a background thread
// on completion, tell the voice selector so it can display them
// again, might only matter in the simulator
func loadVoices() {
  DispatchQueue.global(qos: .background).async {
    self.synth.loadSoundFont()
    self.synth.loadPatch(patchNo: 0)
    DispatchQueue.main.async {
      // don't let the user choose a voice until they finish loading
      self.voiceSelectorView.setShowVoices(show: true)
      // don't let the user use the sequencer until the voices are loaded
      self.setUpSequencer()
    }
  }
}

DispatchQueue.global(qos: .background).async is the current preferred way to execute code in a background thread.

The main UI thread gets called back via DispatchQueue.main.async at the end to remove the voices-loading message and replace it with the custom voice selector control, and also to initialize the sequencer, which unhides the UISegmentedControl that lets you record and play sequences. So it was a small fib on my part to say this is the simplest possible MIDI app.

The sequencer is not part of the minimum feature set, but is included to illustrate some of the power of using MIDI. The code for sequencing is not explained in this post, but it shouldn’t be too hard to understand.

The 128 voices

The custom voice selection control shows 16 categories in its top half. When a category is selected, the eight voices in that category are shown in the bottom half of the control. There’s quite a variety of sounds here, ranging from pretty common instruments like piano, organ, and guitar, to exotic sounds like steel drum, sitar, and even sound effects like helicopter or gunshot.

One of my favorite things to do with Jamulator is to try out extremely high or low octave settings. The voices weren’t meant to be used that way and you can make some intriguing sounds that seem to have nothing to do with the instrument you’ve selected. Take some time to explore different voices.

Where To Go From Here

Want to experiment some more? Gene De Lisa’s software development blog has many interesting Swift/iOS audio articles, including The Great AVAudioUnitSampler workout.
