Android Audio Encoding
A short while back, a friend mentioned that he had noticed Android places a hard limit on the audio bitrate in videos recorded from the device. I found it interesting that Android caps audio quality by default, so I thought I’d post about the issue and explore it a little further.
By the end of this article, I hope to have explained some basic signal terminology, as well as to have shown you how to improve the quality of audio recorded from Android devices. That said, what do I even mean when I say that Android has placed a “hard limit” on the audio bitrate?
What is audio bitrate?
To understand bitrates, it is first important to understand what signals are, or rather, what I’m referring to when I say signals. Strictly speaking, a signal can be defined as any function that describes the behaviour or attributes of a phenomenon, but for simplicity’s sake, let us consider only audio signals in this article.
Two important categories to distinguish between when discussing audio signals, and signals in general, are analog audio signals and digital audio signals. Analog signals are the more common of the two, and are what most people interact with every day. Specifically, these are the types of signals your ears can hear, and are the type of signals present in the real world. When thinking about these in a purely mathematical context, they often appear continuous, similar to a sine wave:
Analog signals, while interesting in their own right, don’t really mean much to us with regards to computers and software. This is because while analog signals are pretty to look at, their information content has an effectively infinite resolution. Saying it this way makes it sound difficult, but in reality, it just means that for any signal (such as the one above) described by a function f, we can choose any x whatsoever and find a corresponding value y such that y = f(x).
Initially, this sounds great, and in many ways it is, as it means that analog signals can retain quite a bit of information. However, in a digital context this becomes a double-edged sword, because storage limits, processing limits, and other factors constrain how much information we can store with a computer. Knowing that computers store information in terms of bits and bytes, we soon find it impossible to perfectly store analog signals, as we would need to store a corresponding x and y value for every possible value of x!
Digital signals, unlike their analog counterparts, are stored as discrete values within a computer. This means that unlike the continuous analog signal shown above, only specific samples of the signal are saved, which solves the information storage problem that analog signals present. This is often best described visually, so compare the following figure with that of the analog signal depicted in the figure above:
Voila! Since such a signal no longer has infinite resolution, we can store it in a computer without issue. The next problem is that we don’t know how many samples to take. I mean, we want to save as much of our hard drives as possible, right? Well, partially right. Remember, the process of picking samples, also known as sampling, is something that every computer (or in this case, Android device) needs to perform in order to take a signal from the real (analog) world and convert it into the digital world. There are many advanced theories regarding how we should perform the sampling process, many of which revolve around the famous Nyquist–Shannon sampling theorem, but these are far too in-depth for the purposes of this article. I strongly urge you to dig deeper into these theorems if you wish to learn more about signals and signal processing.
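To make sampling concrete, here is a tiny sketch (purely illustrative, and not tied to any Android internals) that evaluates a continuous sine wave at a handful of discrete points, which is all a digital signal really is:

```shell
# Sample one period of a 1 Hz sine wave at 8 samples per second.
# awk's sin() stands in for the continuous, analog signal f(t) = sin(2*pi*t);
# the printed (t, y) pairs are its discrete, digital version.
awk 'BEGIN { for (n = 0; n < 8; n++) printf "t=%.3f y=%.3f\n", n/8, sin(2 * 3.14159265 * n / 8) }'
```

Double the loop bound and you double the sampling rate; halve it and reconstructing the original sine wave from the samples becomes much harder.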
Recall that I said computers need to perform the sampling process in order to save signals to their respective storage media. This is because computers cannot save an infinite number of values to describe the signal function f(x). But can humans hear digital signals? It would seem strange, if all “real world” signals are analog, that we would be able to hear digital ones. If we can hear them, they exist in the real (meatspace) world, and are by definition no longer digital signals, but rather continuous signals of short, individual pulses. Therefore, part of the problem with choosing an appropriate sampling rate revolves around using the sampled signal to reconstruct the original, analog signal. In the case of audio signals, we want to be able to reconstruct the original signal so that we can output it through speakers or headphones and listen to it, as in the case of music, recorded speeches, audiobooks, etc.
It seems obvious upon saying it, but this is really where we can start to think about signals in a more critical way. It might also seem obvious if I said that the more samples we take, the easier it is to accurately reconstruct the original “recorded” signal. As a natural consequence, by taking fewer samples, we cannot reconstruct our original signal as accurately. Take for example the figures posted above. If, without ever having seen the first figure, I had only plotted half the samples seen in the second figure, would you be able to tell that both figures were generated from the same mathematical function? Likely not.
A common way of quantifying how much information a signal carries with respect to time is the bitrate, usually measured in bits-per-second (bit/s, bps) or kilobits-per-second (kbit/s, kbps). With regards to audio signals, higher bitrates equate to higher audio fidelity. Notice that the bitrate alone doesn’t directly tell us the storage requirements of a recording, because of that nasty unit of per-second: this is also why longer songs require more storage than shorter songs at equivalent bitrates. While bitrate does correlate with the sampling rate (also known as the sampling frequency) of a signal, it is very much time dependent. Despite this, we can still assume that higher bitrates are generally better, provided we have the storage on our device to save the signal (audio track).
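As a rough sketch of that time dependence (my own back-of-the-envelope arithmetic, ignoring container overhead), the storage an audio stream needs is just its bitrate multiplied by its duration:

```shell
# Approximate audio stream size:
# bitrate (bits per second) * duration (seconds) / 8 bits-per-byte.
# Container overhead is ignored, so real files run slightly larger.
bitrate=96000   # 96 kb/s
duration=60     # one minute of audio
echo "$(( bitrate * duration / 8 )) bytes"   # 720000 bytes, ~0.72 MB
```

Double the duration and the size doubles too, which is exactly the per-second dependence described above.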
I introduced this article by saying that Android places a hard limit on the audio bitrate when recording. As it turns out, Android does place a limit, and as the discussion above hopefully shows, that is not inherently a bad thing, provided we can still satisfactorily reconstruct our audio signal when we want to use it. In this case, however, the hard limit does make an interesting restriction when higher audio fidelity is desired. It is enforced by Android at the system level, and while I am aware of the problem within Android itself, I am not familiar enough with the SDK to know whether apps can individually change this setting (my guess, though, is that they can’t).
Testing the bitrate
To test the audio bitrate in Android, we’re going to use a well known tool available for GNU/Linux known as avconv, which is a tool to perform audio-video conversion. To get avconv, we have to install the libav-tools package (for Debian-based systems at least; check with your respective distro). To install this, we can use the following command:
$ sudo aptitude install libav-tools
Alright. With that ready, all we need is a video to test it on. I took a sample video, which we’ll call sample.3gp (don’t be confused by the file extension: .3gp is the 3GPP multimedia container, which is the format Android uses when recording videos). I ran the following command:
$ avconv -i sample.3gp 2>&1 | grep Stream
Which gave the following output:
Stream #0.0(eng): Video: h264 (Baseline), yuv420p, 640x480, 1807 kb/s, PAR 65536:65536 DAR 4:3, 13.52 fps, 90k tbr, 90k tbn, 180k tbc
Stream #0.1(eng): Audio: aac, 44100 Hz, mono, s16, 96 kb/s
See the end of line 2 above!? 96 kb/s. 96 kb/s!!! The number itself probably doesn’t mean much, but bear in mind that most decent audio streams over the net are typically 192 kb/s. Recording audio on modern (Jelly Bean 4.3) Android uses less than half the bitrate of a typical streaming service. Think about this for a moment: this is the same audio encoding setting used across most, if not all of Android’s services. Ever had problems getting an app like Shazam to work? This setting may be partially to blame, as low bitrates lead to poor reconstructions of the original signal, especially if there’s a lot of ambient noise in the room.
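As a sanity check on those numbers (my own arithmetic, not anything Android reports), the stream parameters in the avconv output imply an uncompressed PCM bitrate of sample rate × bit depth × channels, which shows how hard the AAC encoder is already squeezing the signal before the cap even matters:

```shell
# Raw PCM bitrate implied by the reported stream parameters:
# 44100 samples/s * 16 bits/sample (s16) * 1 channel (mono).
echo $(( 44100 * 16 * 1 ))   # 705600 bit/s, ~705.6 kb/s before AAC compression
```

At 96 kb/s, the encoder is throwing away over 85% of that raw information.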
Of course, I wanted to see whether this was hardware dependent, or something particular to the version of Android I had installed. I had done this testing on a 2012 model Nexus 7 (4.3 stock ROM), so I was interested to see if my Galaxy Nexus behaved differently. As it turns out, I had similar results, and found that Android enforces a hard limit of 96 kb/s in Android 4.3. Naturally, I also had some older hardware lying around, namely my old Motorola Dext. Unfortunately, while I did retain a nandroid backup of the stock ROM running Android 1.5, I couldn’t for the life of me get it to work. The earliest version of Android I could manage to put on it was an old custom ROM running Android 2.1. I wanted to see if the bitrate (96 kb/s) had any historical significance, or if it was a more modern limit Google had added to recent Android ROMs. I ran the tests, and came back rather surprised. Here, the video recorded from my Motorola Dext is sample2.3gp:
$ avconv -i sample2.3gp 2>&1 | grep Stream
Stream #0.0(eng): Video: mpeg4 (Simple Profile), yuv420p, 352x288 [PAR 1:1 DAR 11:9], 157 kb/s, 20.84 fps, 59.94 tbr, 1k tbn, 1k tbc
Stream #0.1(eng): Audio: amrnb, 8000 Hz, 1 channels, flt, 12 kb/s
Sweet mother of… 12 kb/s!? To be honest, I was floored. Only masochists and devil-worshipers listen to audio at such a low bitrate. Suffice it to say, it appears that Android has been slowly raising this (rather arbitrary and low) limit as time has gone on, likely coinciding with device specifications improving.
So why does Android do this?
While I’m sure Google and OEMs could provide a plethora of reasons for the default bitrate to be hardcoded to such low values, I can deconstruct a few of them, at least in the context of audio. While the solution I’m going to show you can be applied to video streams as well, and while many of these reasons may be legitimate in certain use-cases, I would like to believe that this cap on audio encoding is no longer necessary, especially now that it is 2013. Some reasons I could think of for why Google enforces this limit are as follows:
- Higher bitrates mean higher file sizes. Google intends to reduce the amount of storage taken up on the device from high bitrate encodings.
- As a corollary to the above, reducing the bitrate, or at least putting a cap on it, will help in reducing the amount of bandwidth necessary to transfer recorded audio and video streams.
- Lower-end Android devices lack the processing power necessary to decode these streams later.
- They needed to set some maximum limit on encodings, at least with respect to the codec they were encoding with (AAC). While they probably didn’t pick this arbitrarily, it likely came down to a lowest common denominator for hardware in a given generation.
Now, all of the above seem like valid reasons for why the audio bitrate might be capped at a certain level. In fact, a combination of some or all of these reasons probably played a part in it at some level. For the current discussion, however, I’m going to deconstruct the first three arguments with regards to audio with some fairly rudimentary and straightforward logic.
The first two arguments above, and indirectly the third, all revolve around the fact that as you increase the bitrate, you increase the size of your recorded signal saved to your phone / computer / tablet. Therefore, your file size is bigger, and subsequently, reconstructing the signal will take more computing power. One way you have probably seen this in real life is if you’ve ever tried to watch a 1080p video on an old netbook or phone. You notice that the video stutters and doesn’t play well at all, because the processor cannot decode and reconstruct the signal information fast enough for it to be displayed on-screen in real time. The same could be said here, but for audio; however, consider how much storage space a file will take at different bitrates. For the purposes of this test, let us assume our audio recording is 1 minute, or 60 seconds:
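Here is that comparison worked through in the shell (my own back-of-the-envelope figures: audio stream only, container overhead ignored):

```shell
# Audio-only stream size for a 60-second recording at the two bitrates
# under discussion (bits / 8 = bytes).
for rate in 96000 224000; do
  printf '%s bit/s -> %s bytes\n' "$rate" "$(( rate * 60 / 8 ))"
done
```

So a minute of audio costs about 0.72 MB at the current cap and about 1.68 MB at the proposed one.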
So, as seen above, the difference is roughly 1 MB per minute when comparing a bitrate of 96 kb/s against a bitrate of 224 kb/s ((224 − 96) kbit/s × 60 s ÷ 8 ≈ 0.96 MB). Arguably, for those with a small data plan this could add up, especially if long videos are being taken; in the case of technologies like Skype, it can amount to a fair bit of data usage over long conversations. However, a megabyte per minute is hardly much in terms of storage, especially if you do not take a lot of videos, or you transfer your videos to other media (through cloud services, your home computer, etc.). Moreover, if you have such a small data plan, constantly sharing videos and audio over it is likely not the wisest idea anyway; such tasks are much better suited to wifi of some kind. Lastly, as I alluded to previously, 96 kb/s is not particularly high quality. If you’re the type of person who likes to record things, or makes multiple long recordings, surely you’ve noticed the sound gets choppy sometimes, which can be frustrating when trying to share videos with others. In such a case, a roughly 1 MB per minute difference in transfer and storage does not make a particularly strong argument for keeping the bitrate as low as it is.
I likewise mentioned that there is little justification for the argument that Google set the bitrate cap to cater to weaker devices at the lower end of the market. I don’t have any rigorous evidence for this, but I can satisfactorily play back videos well above the given limits on both devices I tested (my Motorola Dext and my Galaxy Nexus) without running into errors. Granted, the limits on the two devices were quite different, but it nonetheless seemed somewhat silly that the chosen caps were as low as they were (12 kb/s, no less). This testing also helped find the practical limits of raising the bitrate cap on Android, which both my friend (who mentioned this problem in the first place) and I found to be around 224 kb/s. In truth, Android will allow you to raise the bitrate higher than this, even as far as 288 kb/s, but once you raise it that high you can no longer play videos recorded at that rate with many common apps (Gallery and QuickPic both failed to play the higher bitrate audio, though I did not try MX Player or any dedicated video apps). For this reason, my solution below raises the bitrate cap on audio encoding only to 224 kb/s, which is far better while still respecting some of the in-app and hardware limits of current devices.
Now that you have put up with me explaining the nitty-gritty, let me present the solution to you. Please be aware that in order to use this method to correct the Android audio bitrate cap, you’re going to need a rooted device. Why, you may ask? Because the file we’re going to correct lives in the /system/ folder, which requires root permissions to modify (and, on Android 4.4 devices, is additionally protected by SELinux). To start off, you’re going to need to prepare a couple of things:
- A computer with the appropriate drivers for your device and the Android Debug Bridge (ADB) installed. If you’re running GNU/Linux, you only really need to worry about using the appropriate ADB interface. Download the most recent one from here.
- A USB cable to connect your Android device to your computer.
- A basic text editor and some experience with the command line. Below is my method to overwrite the file in question using cat, which is a command available in every major GNU/Linux operating system, as well as within the ADB shell. However, you will still need to be somewhat comfortable entering commands into a little black box.
- USB Debugging enabled on your device. Where this checkbox lives varies between versions of Android, but I’m going to be a little lazy here and assume that if your device is rooted you’ve already figured this one out.
The method is simple: first plug your device into your computer, and start the ADB interface. Make sure that your device is detected by using the following command:
$ adb devices
Which should produce similar (though not identical) output:
List of devices attached
015d165c5248280b  device
NOTE: if it says “unauthorized” instead of “device” when running the previous command, you need to accept your computer as an allowed shell host from the screen on your Android device.
The next steps can be summed up in the following lines:
$ adb shell
$ su 0                          # Again, needs to be a rooted device
# setenforce 0                  # this is for Android 4.4 only, see below
# mount -o remount,rw /system
NOTE: it’s important to note that for versions of Android 4.4 and greater, SELinux is enabled by default, so we need to disable it in order to remount /system/ as rw (read and write). setenforce 0 will do this for you, but keep in mind we’ll want to either reboot the device (what I recommend below) or use setenforce 1 to re-enable SELinux when we’re done.
From here, we want to edit the /system/etc/media_profiles.xml file. First off, let’s dump a copy to the terminal using:
# cat /system/etc/media_profiles.xml
Then, paste the output text into your favourite text editor and edit every part that looks like the following:
<Audio codec="aac" bitRate="96000" sampleRate="44100" channels="1" />
For each section like the one above, change 96000 to 224000. There is often more than one, so be sure to find them all. Furthermore, be careful with the zeros: the units in this file are bits/s, not the kb/s we’ve been using previously in this article. From there, copy the fixed text again, and run the following in your terminal (still within the ADB shell):
# cat > /system/etc/media_profiles.xml
After this, paste the text that you fixed previously, and hit ctrl+d to exit the cat program. If you’ve done everything correctly, the final step should be to reboot your device. Alternatively, we can apply the following two lines before quitting the ADB shell, but we would then also have to force stop and restart each of our media apps in order for the setting to take effect:
# mount -o remount,ro /system
# setenforce 1
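As an aside, if a sed binary is available in your ADB shell (this is an assumption: stock Android’s toolbox often lacks one, though busybox provides it), the whole copy-and-paste round trip can be replaced by a single in-place substitution. The sketch below demonstrates it on a throwaway snippet file rather than the real /system/etc/media_profiles.xml, so back up the real file before trying this on-device:

```shell
# Try the substitution on a local snippet first (safe to run anywhere);
# on the device, the target would be /system/etc/media_profiles.xml.
printf '<Audio codec="aac" bitRate="96000" sampleRate="44100" channels="1" />\n' > media_profiles_snippet.xml
# Bump every 96000 audio bitRate to 224000, editing the file in place.
sed -i 's/bitRate="96000"/bitRate="224000"/g' media_profiles_snippet.xml
cat media_profiles_snippet.xml
```

The same caveats apply: /system must be remounted read-write first, and on 4.4 SELinux must be set permissive.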
Awesome! While we didn’t quite hit our maximum (224 kb/s; remember, we were changing an upper bound on the encoding bitrate), we surely got better quality than before (223 kb/s vs. 96 kb/s, or even 12 kb/s). I can also confirm that if you update from the stock Nexus 4.3 ROM to a stock Nexus 4.4 ROM with an OTA update, this setting does not get overwritten, so you should only have to do this once for the time being.
Hopefully, this article shed some light on how audio encoding is done in Android, and provided a useful solution for improving audio quality by raising the bitrate limit imposed in Android. I would also be interested in hearing others’ experiences with Android audio, and if possible, some small test results (as shown with avconv above) for Android 1.5 would certainly be interesting to see.
One thing I only touched on a little bit in this article was the video bitrate. I would be remiss if I didn’t mention that you can increase the bitrate for videos as well using the above method (by editing the video bitrate sections instead of the audio ones). I didn’t bother raising the video bitrate limit on my devices, because the file sizes grow far more dramatically than the roughly 1 MB per minute we saw for audio, and likewise begin to crash apps much faster than simply increasing the audio bitrate does. On a final note, Nexus cameras tend to be quite poor quality in general, so I did not see much reason to mess with the video bitrate: the quality increase from doing so would be far smaller than the increases observed for audio streams.