Not too long ago I was asked whether it was possible to continuously record a video while switching the camera from front to back on an iPhone. Neither the standard camera app nor the standard camera controls allow this behaviour, but I suspected it might be possible with a custom solution.

As a proof of concept, my approach was: whenever the user wants to switch, stop recording on one camera and start recording on the other, then stitch the videos together once the recording is complete. It's not the most elegant solution: the switch from one camera to the other leaves a small gap, and the final stitching step takes a while to process. Still, it was good enough for my use case, so let's explore it in more detail.

This sample app would have two main components, one for recording and one for stitching. The recording was probably the easier one: if we can record from the front camera and we can record from the back camera, we can record alternating between the two. This meant setting up two AVCaptureSessions that differ only in their AVCaptureDevice.Position (.back or .front), then managing the state of the screen and writing the output to a temporary folder, time-stamping the videos so we could later list them correctly sorted.
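A minimal sketch of one of those two sessions might look like the following. Everything here other than the AVFoundation APIs (the `makeSession` helper and the time-stamped file name) is an illustrative assumption, not code from the original project:

```swift
import AVFoundation

// Build one capture session; the front and back sessions differ only in the
// `position` argument passed here.
func makeSession(position: AVCaptureDevice.Position) -> AVCaptureSession? {
    let session = AVCaptureSession()

    guard let camera = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: position),
          let videoInput = try? AVCaptureDeviceInput(device: camera),
          let microphone = AVCaptureDevice.default(for: .audio),
          let audioInput = try? AVCaptureDeviceInput(device: microphone),
          session.canAddInput(videoInput), session.canAddInput(audioInput)
    else { return nil }

    session.addInput(videoInput)
    session.addInput(audioInput)

    let movieOutput = AVCaptureMovieFileOutput()
    guard session.canAddOutput(movieOutput) else { return nil }
    session.addOutput(movieOutput)

    return session
}

// A time-stamped file name in the temporary folder, so the clips sort
// chronologically when we list them later for stitching.
let fileName = "\(Date().timeIntervalSince1970).mov"
let outputURL = FileManager.default.temporaryDirectory.appendingPathComponent(fileName)
```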

Then there was the stitching or, more technically put, the editing. Video editing in iOS is really powerful, but it's easy to get lost in all its power and breadth of configuration. The main entry point is the AVMutableComposition, to which you can add as many tracks (audio or video) as you want; into those tracks you can then insert content from AVAssets, which can be audio or video files, configuring volume, size and transformations (such as scaling, rotation, etc.).

Let's see how adding the two main tracks looks:

let composition = AVMutableComposition()
let defaultTrackID = Int32(kCMPersistentTrackID_Invalid)

guard let videoCompositionTrack = composition.addMutableTrack(withMediaType: .video, preferredTrackID: defaultTrackID),
        let audioCompositionTrack = composition.addMutableTrack(withMediaType: .audio, preferredTrackID: defaultTrackID) else {

    // Properly handle the error...

    return
}

For this exercise, the configuration was pretty simple: iterate over all the files in the temporary folder and add, in order, the audio and video assets to the audio and video tracks, then settle on a common size (as the front and back cameras have different resolutions) and rotate the video to keep it in portrait mode.

var nextStartTime = kCMTimeZero
var size = CGSize.zero
var instructions = [AVVideoCompositionInstructionProtocol]()

files.map { AVAsset(url: $0) }.forEach {

    guard let videoAssetTrack = $0.tracks(withMediaType: .video).first,
            let audioAssetTrack = $0.tracks(withMediaType: .audio).first else {

        // Properly handle the error...

        return
    }

    let timeRange = CMTimeRangeMake(kCMTimeZero, videoAssetTrack.timeRange.duration)

    guard let _ = try? videoCompositionTrack.insertTimeRange(timeRange, of: videoAssetTrack, at: nextStartTime),
            let _ = try? audioCompositionTrack.insertTimeRange(timeRange, of: audioAssetTrack, at: nextStartTime) else {

        // Properly handle the error...

        return
    }

    let videoCompositionInstruction = AVMutableVideoCompositionInstruction()
    videoCompositionInstruction.timeRange = CMTimeRangeMake(nextStartTime, videoAssetTrack.timeRange.duration)
    let layerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: videoCompositionTrack)
    layerInstruction.setTransform(videoAssetTrack.preferredTransform, at: nextStartTime)
    videoCompositionInstruction.layerInstructions = [layerInstruction]

    instructions.append(videoCompositionInstruction)

    nextStartTime = CMTimeAdd(nextStartTime, videoAssetTrack.timeRange.duration)

    if size == CGSize.zero {

        size = videoAssetTrack.naturalSize
    }
}

Now that we've configured all output tracks with their instructions, let's export them into a video file using an AVAssetExportSession, with some additional configuration:

let mutableVideoComposition = AVMutableVideoComposition()
mutableVideoComposition.instructions = instructions
mutableVideoComposition.frameDuration = CMTimeMake(1, 30)

// Since we rotated the video, we need to rotate the size.
mutableVideoComposition.renderSize = CGSize(width: size.height, height: size.width)

guard let exporter = AVAssetExportSession(asset: composition, presetName: AVAssetExportPresetHighestQuality) else {

    // Properly handle the error...

    return
}

guard let outputURL = self.folderPath?.appendingPathComponent("MergedVideo.mov") else {

    // Properly handle the error...

    return
}

exporter.outputURL = outputURL
exporter.outputFileType = .mov
exporter.shouldOptimizeForNetworkUse = true
exporter.videoComposition = mutableVideoComposition

…and, finally: export, and it’s ready to play!

exporter.exportAsynchronously { [weak self] in

    // The completion handler is not guaranteed to run on the main queue,
    // so hop to it before touching any UI.
    DispatchQueue.main.async {

        if let error = exporter.error {

            // Properly handle the error...

        } else {

            let player = AVPlayer(url: outputURL)
            let playerController = AVPlayerViewController()
            playerController.player = player

            self?.present(playerController, animated: true, completion: nil)
            player.play()
        }
    }
}

While this approach demonstrates that it is possible to keep recording in the same session just by tapping the flip button, it has the downside of leaving a gap at each camera switch. In my tests on an iPhone 6 the gap was about 1.5 seconds long. This may or may not be acceptable for your use case.

Alternative approaches

There are other, more elegant, approaches we could test before committing to building a full-blown solution.

  • Keep two AVCaptureSessions alive at all times and simply show or record from the appropriate one. While this should greatly reduce the gap, it would surely increase battery consumption, which is inadvisable for prolonged recording sessions.

  • Add and remove the front and back cameras' AVCaptureInputs on a single AVCaptureSession. This may seem like the most straightforward approach, and the API allows it, but I failed to find an example that does this. Instead, most examples create a new session when switching cameras; I suspect there's a good reason for that.

  • Have a third, continuous AVCaptureSession for the audio, one that never flips, switches or changes its input device. This wouldn't prevent the video gap, but it would give a sense of real continuity by providing constant audio. The hard part would then be synchronizing the audio and the video, but since the sound from the three sources (front, back and continuous audio) would be nearly identical, it may be feasible to do so without any user interaction.

  • Instead of stopping the active camera right away, observe the KVO-compliant isRunning property on the newly created AVCaptureSession, and keep the old session alive until the new one is up and running. This won't fully remove the gap, and it may even introduce some overlap; we'd need more research to back that claim.
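For the single-session idea from the second bullet, the swap itself could be sketched roughly as below. This is untested (which is partly why I'd want a working example before trusting it); the `switchCamera` function name is my own, but the configuration-block pattern is standard AVCaptureSession usage:

```swift
import AVFoundation

// Swap the video input on a single running session, leaving the audio
// input untouched. Changes are batched between beginConfiguration() and
// commitConfiguration() so they are applied atomically.
func switchCamera(on session: AVCaptureSession, to position: AVCaptureDevice.Position) {
    guard let newCamera = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: position),
          let newInput = try? AVCaptureDeviceInput(device: newCamera) else {
        return
    }

    session.beginConfiguration()

    // Remove the current video input(s) only.
    for input in session.inputs {
        if let deviceInput = input as? AVCaptureDeviceInput, deviceInput.device.hasMediaType(.video) {
            session.removeInput(deviceInput)
        }
    }

    if session.canAddInput(newInput) {
        session.addInput(newInput)
    }

    session.commitConfiguration()
}
```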

While we don't seem able to avoid the video gap completely, we can mitigate it, or make it pass unnoticed, by adding a transition (a blur, a fade, etc.) to fill those voids.
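As one way to soften the cut, AVFoundation's layer instructions support opacity ramps, so the outgoing clip could fade out just before each switch. A sketch, where `layerInstruction` and `clipEnd` stand in for the values built in the stitching loop above:

```swift
import AVFoundation

// Fade a clip out over its final half-second by ramping its opacity to zero.
// `layerInstruction` is the AVMutableVideoCompositionLayerInstruction for the
// clip, and `clipEnd` the time at which the clip ends in the composition.
func addFadeOut(to layerInstruction: AVMutableVideoCompositionLayerInstruction, endingAt clipEnd: CMTime) {
    let fadeDuration = CMTimeMake(1, 2) // 0.5 s, in the same old-style API used above
    let fadeRange = CMTimeRangeMake(CMTimeSubtract(clipEnd, fadeDuration), fadeDuration)
    layerInstruction.setOpacityRamp(fromStartOpacity: 1.0, toEndOpacity: 0.0, timeRange: fadeRange)
}
```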

I look forward to digging deeper into this issue, since it's clearly something missing from the iOS toolset, and something that could be componentized for everyone's benefit.