During virtual lessons and live presentations, I constantly found myself missing something important the teacher or speaker said, either because I was distracted or because I needed to leave my screen for a while (e.g. to use the washroom). I didn't want to interrupt the class and ask the teacher to repeat what they said, nor did I want to browse through hours of recorded footage (if I was lucky enough to have a teacher who recorded their lessons) to find the one segment I missed. Those seemed to be the only solutions to the problem, but taking into account the principle of ingenuity, I asked myself: "what if you could capture a recording of the past?"
It initially seemed like an impossible idea; either you somehow knew beforehand the time you wanted to record, or you would record the whole lesson and cut out the part you wanted to keep afterwards. Of course, the former idea was inapplicable for most situations, but what about the latter? What if you didn’t need to record the whole lesson, but just the last, say, five minutes of it?
I tried searching for an existing solution, but the only similar idea I could find was something called background recording in the Xbox Game Bar. Unfortunately, it only worked on Windows (which made it useless to me since I use Linux) and it didn’t seem tailored to my use case. Thus, having come up empty-handed after scouring the internet, I decided to make my own.
Introducing Kapt, a combination of the words Kept + Capture.
Kapt is a screen recording tool that allows you to capture recordings from the past. It’s a desktop app that you install on your computer, which comes as both a window and as a desktop tray icon.
Note: In Kapt, a Kapture (a term I made up) is a recording that ends at the moment you make the capture.
In the Kapt window, you can take the following actions:
- Activate Kapt (which will turn on the background recording and enable you to make Kaptures)
- Change the audio capture device (usually you want it set to your computer audio)
- Change the maximum cached minutes (the maximum number of minutes of recording chunks Kapt will keep cached)
- Change the videos path (the path where the Kaptures will be stored)
- View the latest Kapture (which will appear when you make a Kapture)
Once you activate Kapt, you will see a box with 5 buttons, each with a number on it. The number represents how many seconds back in history the recording should start (e.g. if you press the button labelled 5, Kapt will make a Kapture that starts 5 seconds before you pressed the button). Once you make a Kapture, it will be saved to the videos folder you specified and you will also be able to view it within the app.
Kapt also provides a system tray icon, which is a more out-of-your-face way to make Kaptures. You can activate Kapt from the tray, and once activated, you have the same options to make Kaptures by pressing the corresponding tray menu item. The tray allows you to conveniently make a Kapture whenever you need it, without having to keep a window constantly open.
Kapt is able to keep a “continuous recording” by recording in chunks, which are small clips that get concatenated into one whole clip when the user requests a Kapture (the term I use for Kapt’s special kind of recording).
Kapt keeps 2 sets of recordings, each set containing an audio recording and a video recording. The recordings are represented by the bars in the diagram, with the start of each bar representing the start time of the recording and the end of the bar representing the end time. Each set of recordings contains the recording chunks, which are brief recordings about 5 seconds in length each. The space between those bars is intentional, as it takes time to stop a recording and restart it. This space is why the second set of recordings, the Secondary Video + Audio, is necessary: its clips are used to fill in the gaps so that the final video is smooth, with minimally noticeable cuts.
You’ll also notice that the audio and video bars aren’t perfectly aligned, which is intentional. Kapt spawns two separate FFmpeg processes for recording audio and video (since I’ve found that recording both at the same time isn’t very reliable). Since it’s very hard to spawn two processes at the exact same time, Kapt needs to take into account the slight discrepancy between the times the two processes are spawned. You’ll also notice that the main set of recordings isn’t fully used; this is because it’s difficult to detect exactly when FFmpeg stops recording (in contrast to the start of the recording, whose exact time FFmpeg outputs directly). Thus, Kapt internally marks each recording as having an “early end time,” which is guaranteed to come before the actual end time of the recording (since Kapt retrieves the “early end time” before stopping the recording processes).
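To make the gap-filling idea concrete, here is a minimal sketch rather than Kapt's actual code. The `Chunk` struct, its field names, and the `gaps` and `covered` helpers are all hypothetical, and times are assumed to be milliseconds on a shared clock. It finds the gaps between consecutive primary chunks (using the early end time as the usable end) and checks whether a secondary chunk spans each gap.

```rust
/// One recorded chunk (hypothetical model). Times are milliseconds on a shared clock.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Chunk {
    start_ms: u64,
    /// Sampled just before the recorder is told to stop, so it is never
    /// later than the real end of the recording.
    early_end_ms: u64,
}

/// Returns the (start, end) intervals left uncovered between consecutive chunks.
fn gaps(primary: &[Chunk]) -> Vec<(u64, u64)> {
    primary
        .windows(2)
        .filter(|w| w[1].start_ms > w[0].early_end_ms)
        .map(|w| (w[0].early_end_ms, w[1].start_ms))
        .collect()
}

/// True if at least one secondary chunk spans the whole gap.
fn covered(gap: (u64, u64), secondary: &[Chunk]) -> bool {
    secondary
        .iter()
        .any(|c| c.start_ms <= gap.0 && c.early_end_ms >= gap.1)
}

fn main() {
    let primary = [
        Chunk { start_ms: 0, early_end_ms: 4_800 },
        Chunk { start_ms: 5_200, early_end_ms: 10_000 },
    ];
    // The secondary set is offset so that its chunks straddle the primary gaps.
    let secondary = [Chunk { start_ms: 2_500, early_end_ms: 7_300 }];

    for gap in gaps(&primary) {
        println!("gap {:?} covered: {}", gap, covered(gap, &secondary));
    }
}
```

Offsetting the secondary set by roughly half a chunk is just one way to guarantee coverage; any offset larger than the stop-and-restart delay would do.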
When the user requests a Kapture, Kapt goes through the recording chunks and intelligently aligns the video and the audio. It then uses FFmpeg to cut out the part of each chunk it needs to create a seamless video (refer to the above diagram of Kapt's workings). After it retrieves the segments, it uses FFmpeg to concatenate all those videos into one video, which it saves to the user's disk.
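The selection step can be sketched as follows. This is an illustration under assumptions, not the algorithm Kapt ships: `Chunk`, `Cut`, and `plan_cuts` are hypothetical names, times are milliseconds on a shared clock, and the greedy rule (always continue with the chunk that reaches furthest) is my own simplification. Each resulting `Cut` says which file to trim, where to start inside it, and for how long.

```rust
/// A cached chunk from either recording set (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    path: String,
    start_ms: u64,
    early_end_ms: u64,
}

/// One piece to trim out of a chunk before concatenation.
#[derive(Debug, PartialEq)]
struct Cut {
    path: String,
    offset_ms: u64,   // where to start inside the chunk file
    duration_ms: u64, // how much to keep
}

/// Walks from `from_ms` to `to_ms`, always continuing with the chunk that
/// covers the current position and reaches furthest.
fn plan_cuts(chunks: &[Chunk], from_ms: u64, to_ms: u64) -> Vec<Cut> {
    let mut cuts = Vec::new();
    let mut cursor = from_ms;
    while cursor < to_ms {
        let best = chunks
            .iter()
            .filter(|c| c.start_ms <= cursor && cursor < c.early_end_ms)
            .max_by_key(|c| c.early_end_ms);
        match best {
            Some(c) => {
                let end = c.early_end_ms.min(to_ms);
                cuts.push(Cut {
                    path: c.path.clone(),
                    offset_ms: cursor - c.start_ms,
                    duration_ms: end - cursor,
                });
                cursor = end;
            }
            None => {
                // Nothing covers this instant: skip ahead to the next chunk, if any.
                let next = chunks
                    .iter()
                    .map(|c| c.start_ms)
                    .filter(|&s| s > cursor && s < to_ms)
                    .min();
                match next {
                    Some(s) => cursor = s,
                    None => break,
                }
            }
        }
    }
    cuts
}

fn main() {
    let chunks = vec![
        Chunk { path: "primary-0.mp4".into(), start_ms: 0, early_end_ms: 4_800 },
        Chunk { path: "secondary-0.mp4".into(), start_ms: 2_500, early_end_ms: 7_300 },
        Chunk { path: "primary-1.mp4".into(), start_ms: 5_200, early_end_ms: 10_000 },
    ];
    // Request everything between t = 1 s and t = 9 s.
    for cut in plan_cuts(&chunks, 1_000, 9_000) {
        println!("{} from {} ms for {} ms", cut.path, cut.offset_ms, cut.duration_ms);
    }
}
```

In this example the three cuts last 3800 ms, 2500 ms, and 1700 ms, which add up to the 8 seconds requested.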
The main benefit of recording in small chunks is that Kapt is able to discard chunks of video that came before a certain amount of time the user specifies (defaulting to 5 minutes). This makes sure that while Kapt is activated, it doesn't take up more than a constant small amount of space, which is important given the frequency and duration of virtual lessons and meetings.
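A bounded cache like this could look roughly like the sketch below. The `ChunkCache` type and its methods are hypothetical, not Kapt's real data structure, and times are assumed to be in milliseconds. Each time a chunk finishes, it is pushed onto a queue, and every chunk that ended before the cutoff is handed back so its file can be deleted.

```rust
use std::collections::VecDeque;

/// A finished chunk on disk (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    path: String,
    start_ms: u64,
    early_end_ms: u64,
}

/// Keeps only the chunks that fall inside the configured time window.
struct ChunkCache {
    max_age_ms: u64,
    chunks: VecDeque<Chunk>,
}

impl ChunkCache {
    fn new(max_cached_minutes: u64) -> Self {
        Self {
            max_age_ms: max_cached_minutes * 60_000,
            chunks: VecDeque::new(),
        }
    }

    /// Adds a finished chunk and returns the chunks that fell out of the
    /// window, so the caller can delete their files.
    fn push(&mut self, chunk: Chunk, now_ms: u64) -> Vec<Chunk> {
        self.chunks.push_back(chunk);
        // saturating_sub avoids an underflow during the first few minutes.
        let cutoff = now_ms.saturating_sub(self.max_age_ms);
        let mut evicted = Vec::new();
        while self
            .chunks
            .front()
            .map_or(false, |c| c.early_end_ms < cutoff)
        {
            if let Some(old) = self.chunks.pop_front() {
                evicted.push(old);
            }
        }
        evicted
    }
}

fn main() {
    let mut cache = ChunkCache::new(5); // the default of 5 cached minutes
    let mut now_ms = 0;
    for i in 0..100u64 {
        now_ms = (i + 1) * 5_000;
        let chunk = Chunk {
            path: format!("chunk-{i}.mp4"),
            start_ms: i * 5_000,
            early_end_ms: now_ms,
        };
        for old in cache.push(chunk, now_ms) {
            println!("would delete {}", old.path);
        }
    }
    println!("{} chunks cached at t = {} ms", cache.chunks.len(), now_ms);
}
```

Because eviction happens on every push, the number of cached chunks stays bounded by the window length divided by the chunk length, no matter how long Kapt stays activated.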
I built Kapt using Tauri, an up-and-coming framework I’ve been interested in using for a while. Tauri enables me to create cross-platform applications that are both performant and light (in terms of storage and memory). It’s similar to Electron, but with some major differences that make it stand out.
Firstly, Tauri uses a Rust-based backend instead of Node.js. I’ve been wanting to build an application in Rust for a while, and Kapt was a perfect opportunity for me to step outside my comfort zone and try using a language which I wasn’t too familiar with to build a project from scratch.
Secondly, Tauri doesn’t bundle Chromium with its apps. This allows its apps to have smaller bundle sizes and to be less memory-intensive than Electron apps. This was a major consideration when building Kapt, as many of us would be using Kapt alongside more demanding applications, like Zoom or Google Meet. Kapt should ideally use as few system resources as it can, which prompted me to choose Tauri over Electron.
To build the frontend of Kapt, I used the JavaScript framework Vue. I have a lot of experience with Vue, and it makes creating interactive applications easy. For the styling, I used Tailwind CSS, a utility-first CSS framework that makes it incredibly easy to style your application. I’m also using Vite, which updates the frontend of my Tauri app almost instantly when I make changes to the code.
Developing Kapt turned out to be a lot more difficult than I initially expected. I encountered a lot of strange bugs with Tauri, especially with Vite and building for production. I eventually decided to scrap the idea of making production builds for the sake of the hackathon, since I wanted to make sure that I implemented the core functionality of Kapt that I would be able to demo.
Not only did the frontend give me issues, but so did the Rust-based backend. I’m not too experienced with Rust, and I encountered a lot of errors as I fought with the borrow checker. I was able to work through most of these errors by trial and error, but it was a tedious process that left me pulling my hair out as I couldn’t figure out what was wrong with my code. In the end though, once my code compiled, the only bugs left were logic bugs, and everything else worked as intended. While you may find yourself struggling with the compiler, it ends up protecting you from a lot of otherwise nasty bugs that you would normally spend days figuring out.
Speaking of nasty bugs, the biggest challenge I came across while developing Kapt was the implementation of the recording chunk algorithm that I described earlier. There were a lot of factors I had to take into account, and I spent a lot of my time reasoning it out on paper and even writing algorithms out by hand! Despite the large amount of planning, I still ended up having to track down a multitude of bugs with the implementation, which included (but were not limited to):
- Not realizing that I’m only able to retrieve the FFmpeg process’ output once it exits (forcing me to refactor my code to use async, which was not very fun)
- Figuring out why my code sometimes panicked with overflow errors, as Rust panics on unsigned integer overflow in debug builds instead of silently wrapping (turns out I overlooked the fact that the audio and video processes start at slightly different times)
- Figuring out why the audio and video went out of sync in the final recording (it turns out that I overlooked the importance of zero-padding my numbers: instead of .039, my code outputted .39, which FFmpeg interpreted as .390)
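The last two bugs are easy to reproduce in isolation. The sketch below is illustrative only: `start_skew_ms` and `ffmpeg_seconds` are hypothetical helper names, not functions from Kapt. It shows one way to measure the difference between two unsigned start times without underflowing, and how to format milliseconds with exactly three fractional digits.

```rust
/// Absolute difference between the two start times, whichever process started first.
fn start_skew_ms(video_start_ms: u64, audio_start_ms: u64) -> u64 {
    video_start_ms.abs_diff(audio_start_ms)
}

/// Formats milliseconds as seconds with exactly three fractional digits.
fn ffmpeg_seconds(total_ms: u64) -> String {
    format!("{}.{:03}", total_ms / 1000, total_ms % 1000)
}

fn main() {
    let (video_start_ms, audio_start_ms) = (1_000u64, 1_040u64);

    // `video_start_ms - audio_start_ms` would panic in a debug build here,
    // because the result would be negative. checked_sub reports it instead.
    assert_eq!(video_start_ms.checked_sub(audio_start_ms), None);
    println!("skew: {} ms", start_skew_ms(video_start_ms, audio_start_ms)); // prints "skew: 40 ms"

    // With zero padding, 39 ms stays 39 ms.
    println!("{}", ffmpeg_seconds(12_039)); // prints "12.039"

    // Without padding, the same value reads as 390 ms.
    println!("{}.{}", 12_039 / 1000, 12_039 % 1000); // prints "12.39"
}
```

`saturating_sub` is another option when a negative difference should simply be treated as zero.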
I’m extremely proud that I got the complex concatenation algorithm to work. It took a lot more thought than I initially expected, and at times I almost felt like giving up (especially when I couldn’t make progress on the audio and video being out of sync for many hours). Luckily, I kept persevering, and it was only when I “brute-forced” my way through my code (by replicating a lot of the commands it was executing) that I discovered the elusive bug.
I’m also very proud that I was able to build a working application in Rust. The last few times I tried using Rust, it gave me so much trouble and presented me with compilation errors that sometimes took days to fix. I wasn’t very confident in using Rust for EngHack due to the relatively short amount of time I would get to finish my project, but I’m so happy I pushed my doubts aside. I think Rust is a very promising language, even if it does have a steeper-than-usual learning curve, and I’m very glad I pushed through the errors, which helped me better understand what exactly my code was doing.
I’m also proud of deciding to build Kapt using Tauri, a framework which I’ve never used before but have been interested in using. Normally, I would’ve used Electron for these types of apps, but I really wanted to seize the opportunity to learn a new technology at EngHack. I’m very glad I chose to step outside my comfort zone, because I found Tauri a pleasure to develop with, and I’m definitely planning on using it for future projects.
Due to the time constraints of the hackathon, I only designed Kapt to work on my local Linux machine. Even though Tauri is cross-platform, some of the commands I used within Kapt (such as retrieving audio devices) won’t work on Windows or macOS. Thus, I’ll need to add alternative commands that are compatible with Windows and/or macOS (which will also require me to test Kapt on them, something that I didn’t have enough time to do during EngHack). I’m also going to make Kapt more configurable by allowing the user to select the screen or window they want to record, and eventually even a specific area of their screen. Kapt is a project that I’ve been longing to build for a while, and now that EngHack has helped me kick-start its development, it’s definitely going to be an application that will last beyond EngHack and hopefully help people looking for a similar tool in real-world use cases.