Fun with Sound 


Saturday, February 21, 1999 - MPEG Sound Performance (and other Codecs)

Part One: the Project ...

It all started so innocently ...


Inspired by my own article about how easy it would be to transfer certain old favorite bits of analog (LP) music to digital format via VHS Hi-Fi video tape, I actually went and did it.


It took about three hours to go through all my records and choose one or two (or three) cuts from each one and record it onto video tape. I ended up with a little under two hours of music.


Then, due to the recent miracle of the dirt-cheap 10 gigabyte hard drive, I just ran Cool Edit 96 and read the entire two hours of tape into my computer.


Some slicing and dicing, and presto! I had two hours of formerly analog music to go with the three hours of CD music I'd ripped to make my favorites.


But that's when things started to turn dark ...


First thing I learned is that moving 2.8 gigs around is considerably more than twice as much work as moving 1.4 gigs around.


The reason I had to move all the data around was that after I had burned my first three hours of audio onto CD, I went into Cool Edit 96 and started looking at the actual bits. I was motivated to do this after listening to my three CDs of favorites in the car: some of the cuts were really quiet! Even though I have a pretty quiet car, there is still road noise, and I didn't like having to adjust the volume for the quiet cuts.


So I went into Cool Edit 96 and programmed it to normalize all the cuts - this means that the loudest sound on each cut is at 100% full volume.


It turns out that many CDs are conservatively mastered and only use about 12 bits of information instead of the much-ballyhooed 16 bits.


If you look at the data in Cool Edit 96 (or some other program) you can see it. Here's a waveform from a classical piano piece:



That piece is only using about 30% of the available dynamic range. After normalizing, the data extends all the way to the top and bottom, as it should.


This doesn't add any new information to the track, of course, but after normalizing ALL of my favorites, the playback volume was much more uniform.
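For the curious, here is a minimal Python sketch of what peak normalization amounts to. It is not Cool Edit 96's actual algorithm, just the basic idea under the assumption that a cut is a plain list of signed 16-bit sample values; the function names are made up for illustration.

```python
FULL_SCALE = 32767  # largest positive value of a signed 16-bit sample


def peak(samples):
    """Largest absolute sample value in the cut (0 for an empty or silent cut)."""
    return max((abs(s) for s in samples), default=0)


def bits_used(samples):
    """Rough count of the bits the cut actually exercises, sign bit included."""
    p = peak(samples)
    return 0 if p == 0 else min(16, p.bit_length() + 1)


def normalize(samples):
    """Scale every sample by one gain so the loudest one reaches full scale."""
    p = peak(samples)
    if p == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = FULL_SCALE / p
    # Clamp after rounding so the result always fits in 16 bits.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]
```

Because a single gain is applied to the whole cut, the quiet parts stay quiet relative to the loud parts; only the overall level changes.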


Even so, I still had classical music cuts that were too quiet within the piece, so I went in by hand and artificially boosted the volume in places.


For instance, here is the waveform for 'O Fortuna' from the Carmina Burana by Carl Orff (the most licensed piece of 20th century music in the world - you hear it in movie trailers a lot):



And here it is after I altered it:



Now I can hear the middle part in the car without turning the volume way up.
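A hand boost like that can be sketched in a few lines of Python. This is only an illustration, assuming a mono cut held as a list of signed 16-bit values; the function name, the linear ramp at each edge of the passage, and its default length (0.1 second at 44.1 kHz) are all assumptions, not what Cool Edit 96 does internally.

```python
def boost_passage(samples, start, end, gain, ramp=4410):
    """Return a copy of `samples` with samples[start:end] made louder by `gain`.

    The gain eases linearly from 1.0 up to `gain` over `ramp` samples at each
    edge of the passage so the level change is not abrupt.
    """
    out = list(samples)
    length = end - start
    ramp = min(ramp, length // 2)  # a short passage gets a shorter ramp
    for i in range(start, end):
        pos = i - start
        edge = min(pos, length - 1 - pos)  # distance to the nearest edge
        if ramp > 0 and edge < ramp:
            g = 1.0 + (gain - 1.0) * (edge / ramp)
        else:
            g = gain
        # Clamp so a boosted sample cannot overflow 16 bits.
        out[i] = max(-32768, min(32767, round(samples[i] * g)))
    return out
```

Unlike normalizing, this changes the loudness of one part of the piece relative to the rest, which is exactly the point when the quiet middle gets lost under road noise.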


The next problem I had was that my mastering software, Gear 95, has a bug where it sometimes puts a click at the end of a track when it copies it from a .wav file. I've had this problem for years and didn't know what to do, except to convert all my .wav files to .pcm files (which are raw 16-bit stereo audio data), because Gear doesn't introduce the click on those files.


So that was another 2.8 gigs worth.


Actually, I think I've left some steps out. Let's see:


First, record a 1.2 gig file from VHS Hi-Fi tape.


Next, slice and dice it into individual files. Of course, I don't want to erase the original until I'm done, so that's another 1.2 gigs.


Then, normalize all 2.8 gigs (digital originals and analog originals). That's another 2.8 gigs. When I'm happy everything is okay, then I can delete the original captures and the sliced and diced, un-normalized originals.


Next, convert them all to PCM format - another 2.8 gigs. Keep the .wav files because I need those to feed to the .mp3 converter.


Next, create 'cd image' files in Gear, which are byte-for-byte images of the way the CD will be burned. This has the benefit of increasing the odds that the CD will burn perfectly. BTW, I can get 4x burns now that my machine has a TNT and a 350 MHz processor - as long as I make CD images. Otherwise, the slightest hiccup from the network or another process is likely to wreck the burn.


So that's another 2.8 gigs.


Once I have actual CDs burned, I pack 'em into my CD-changer cartridge-thing and listen to them. The sixth CD, BTW, is 45 minutes of Above the Garage Productions music.


Amazingly, one of the tracks had an error! I went back and listened to the .wav files, and in fact, the error was there! I think it happened during the normalization stage. It was easily fixed by re-ripping the original track. It was at a decent volume so I left it alone. I reburned the CD and all was well except for those tracks that internally had passages that were too quiet.


I went back and manually adjusted the too-quiet passages to be louder. (At this point, purists are screaming foul, since I'm changing the data. Too bad.)


So, the CDs are okay (subject to future tweaking) and everything's great. The moral of the story (so far) is that 2.8 gigs is more than twice as much work as 1.4 gigs to move around and process. Keep this in mind as you contemplate the joys of DVD.


But the real journey into darkness started after I was happy with the .wav files. I was already a convert to MP3 format, except for two things:


1) The people that hold the patent (Fraunhofer) like to charge for commercial use (this doesn't affect my own fooling around, but could potentially affect posts to my web site). After exploring this a bit, it looks like posting on the web site is okay. But one thing I've noticed is that now that the Miles Sound System supports MP3 playback, everyone thinks they can mix in their music this way... Beware! You might owe $0.05 per track per CD you ship! More details are at


2) The files would not play on my laptop! This truly annoyed me. And this started my trip into the depths of compressed file formats. The answer was eventually really simple - but the trip getting there was convoluted.

Part Two: the depths of decoding ...

My laptop is pretty old (actually I have two - one is REALLY old - a 486 DX2/50, which means it has a whopping 25 MHz bus inside). But I was concerned about the better one. It's a P75 machine with mostly ISA bus-based stuff inside [probably also running at 25 MHz] (I think it has a VLB graphics adapter at least).


The sound card reports that it can play 16-bit samples, but I think it is lying, since any attempt at playing 16-bit samples results in severe hiccups.


Okay, so what? I have an old laptop that has drivers that claim they can output 16-bit audio but they can't.


Well, I really wanted this laptop to play my MP3 files. But it wouldn't work. But amazingly, it would play MP2 files just fine.


How could this be? So I tried compressing all 2.8 gigs as Microsoft ADPCM files, which take about 4 bits per sample. Apparently they decode to 16-bit audio (well, duh), and so they wouldn't play.


What about just playing the raw 44K original data? Wouldn't play.


So what was special about MP2? Why would they play?


(This is all through the Microsoft Windows Media Player, BTW.)




This simple boring problem took up a huge amount of my time. I really wanted to figure it out.


Also - what does Part One of this story have to do with Part Two?


Well, just as in Part One I ended up making lots of duplicate or near-duplicate copies of my 2.8 gigs of source material, I did the same here in Part Two.


I recompressed all 2.8 gigs into the following formats:


MP3 (128 k bits / sec)
MP2 (320 k bits / sec)
MS ADPCM (353 k bits / sec)
22K 8-bit stereo .wav (353 k bits / sec)


Interestingly, the 22K 8-bit stereo .wav files played just fine on my laptop - which is what gave me the clue that the problem was when my laptop tried to play 16-bit files.


Just for reference, note that the original 44K 16-bit stereo samples play at 1,411 k bits / sec.


So the relative compression is:


MP3: 11.0 : 1
MP2: 4.4 : 1
MS ADPCM: 4.0 : 1
22K 8-bit stereo: 4.0 : 1
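Those ratios are just the uncompressed CD bit rate divided by each format's bit rate, using the bit rates listed earlier. A small Python sketch of the arithmetic follows; the helper names are invented for illustration, and the MS ADPCM figure is approximated as exactly one quarter of the original.

```python
# samples/sec * bits/sample * channels = 1,411,200 bits/sec
CD_RATE = 44100 * 16 * 2

# Bit rates in bits per second.
formats = {
    "MP3": 128_000,
    "MP2": 320_000,
    "MS ADPCM": CD_RATE / 4,            # roughly 4 bits per sample instead of 16
    "22K 8-bit stereo": 22050 * 8 * 2,  # half the rate, half the bits
}


def ratio(bits_per_sec):
    """Compression relative to 44K 16-bit stereo."""
    return CD_RATE / bits_per_sec


def compressed_megabytes(original_gigabytes, bits_per_sec):
    """Approximate size after compression, taking 1 gig as 1000 megs."""
    return original_gigabytes * 1000 / ratio(bits_per_sec)


for name, rate in formats.items():
    print(f"{name:18s} {rate / 1000:6.0f} kbit/s  {ratio(rate):4.1f} : 1")
```

The same arithmetic gives the sizes for the whole collection: calling `compressed_megabytes(2.8, formats["MP3"])` lands in the mid-250s of megabytes, and the MP2 version in the mid-630s.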


That MP3 stuff is looking pretty hot. My whole shebang of 2.8 gigs reduces down to a mere 260 megabytes (or so).


I was going to forget the whole thing and just keep the MP2 files around, but they took up something like 640 megs, which seemed like a major waste just so I could listen on one machine.


As part of the whole process I did a lot of listening tests - since I had boatloads of data in a form most interesting to me - my favorite music.


The ADPCM stuff is awesome and sounds just like the original.


The MP2 stuff - at 320 k bits / sec - sounds just like the MP3 stuff except for one song, the Dire Straits "Money for Nothing", which warbles like a bad analog cassette tape. Very strange. And I don't trust MP2 at anything less than 320 k bits / sec.


Perhaps you are wondering why there is such a thing as MP1, MP2, and MP3? Why not just use the best?


The reason is revealed if we set the 'way-back' machine to about eight years ago when CD-I and MPEG-1 were getting started.


The CD-I machine had a Philips 68070 (a 68000 derivative) in it, which is a fairly mundane processor. And it had to decode all this stuff.


So, MPEG-1, Layer I, is very easy to decode on integer hardware, even though it takes more bits to encode the signal.


Likewise, MPEG-1, Layer II, is pretty easy to decode on simple hardware, but harder than Layer I.


MPEG-1, Layer III, which I suspect has never ever been used to encode audio along with video, takes the most processing power.


So, we're all sitting around with these Pentium machines on our desktops connected to the Internet, so we want maximum compression, because we have plenty of horsepower to decode the stream.


My concern with my laptop was that it simply didn't have the horsepower to decode Layer III. But that didn't really make sense, because the amount of system time used up while playing MP3 files in the background isn't so severe that your typical Pentium 130 or even 90 grinds to a halt. So why wouldn't my MP3 files play?


I finally found the answer by clicking around inside the Windows Media Player. When you load up a tune, you can click on properties, and dig through some information about the decoder that is installed for that particular media type.


In this case, I was clicking around, looking at the options for the MP2 decoder when I found the answer.


The MP2 decoder does a better job of matching itself to the hardware it's running on. And if it screws up, you can set some defaults, such as forcing it to output in 8 bits. Voila!


So I poked around the Windows Media Player MP3 decoder (by loading an .mp3 file and selecting properties) and I discovered you can tell the MP3 decoder to go down to 8 bits. You can also tell it to decode less of the frequency spectrum. It turns out I had to do both things to get the MP3 files to play on my laptop. I still doubt the P75 is the real limitation, but by selecting less decoding I probably managed to get all the data to fit on the crappy ISA bus in the machine.


The problem is ... the Microsoft MP3 decoder doesn't remember any changes you tell it, and it resets to the (in my case) incorrect defaults every time you load a tune.


That basically sucks.


But there was a ray of hope ...


WinAmp! Maybe WinAmp would do a better job of matching itself to my hardware, or at least let me configure it better!


I downloaded the latest version 2.091 and found enough configuration options to make me and my P75 very happy.


I didn't like WinAmp before because, at least in earlier versions, it hiccuped a lot more than the Windows Media Player on MP3 files, even on a Pentium II - 400, which irritated me no end.


But I was happy to be able to play files on my P75 laptop. I am going to send the WinAmp people their $10.00 to register WinAmp.


And ... back to Part One ... I discovered why Gear 95 would add a click to my CD tracks when it used a Windows .wav file as source.


Apparently Cool Edit 96 obeys the .wav RIFF format specification, which allows for the inclusion of extra data in .wav files, like copyright and author information.


Gear 95, being stupid, could only load what I will call 'simplified .wav files', which consist of a small header and then raw sample data. Gear didn't use the MMIO routines to properly parse a .wav file.


Luckily, Cool Edit 96 has an option to NOT save the 'extra non-audio' information in a .wav file.
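To make the difference concrete, here is a hedged Python sketch of the two ways to read a .wav file. It is not Gear's code or the MMIO routines; it only shows the general RIFF rule that a file is a series of chunks, each with a 4-byte ID and a 4-byte little-endian size, and that a reader should take exactly the bytes of the 'data' chunk. The function names, and the guess that a naive reader treats everything after a fixed 44-byte header as audio, are assumptions for illustration.

```python
import struct


def find_wav_data(blob):
    """Return only the audio bytes of a RIFF/WAVE file held in `blob`."""
    if len(blob) < 12 or blob[0:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(blob):
        chunk_id, size = struct.unpack_from("<4sI", blob, pos)
        body = pos + 8
        if chunk_id == b"data":
            return blob[body:body + size]  # stop at the chunk's declared size
        pos = body + size + (size & 1)     # chunks are padded to an even length
    raise ValueError("no data chunk found")


def naive_wav_data(blob, header_bytes=44):
    """The 'simplified .wav' assumption: fixed header, everything else is audio."""
    return blob[header_bytes:]
```

If a file carries an extra chunk (copyright text, say) after the audio, the naive reader hands those bytes to the CD as if they were samples, which is one plausible way to end up with a click at the end of a track. The chunk-walking reader never sees them.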


So now, praise the Lord, I don't require copies of my files in .pcm format. I have .wav files which contain the original 44K 16-bit stereo data and I have .mp3 files which contain the compressed versions I can haul around on a single CD-ROM. And yes, even listen to them on the laptop.


And I can delete the CD Image files required for Gear once I've burned my discs, so my disk requirements have gone down from about 8 gigs down to about 3 gigs. Yahoo!

Postmortem ...

I was standing in line at a movie theatre and some morons from Microsoft were blathering away. One moron said to the other, "Wav format is just a 16-word header followed by the wave data ... you don't need to know anything else." [If only it were true.]


Great - that's just what we need. Microsoft morons spreading misinformation to each other.


While waiting in line at a movie theatre, no less.

Postmortem 2 ...

There are some things I forgot to mention.


One thing is this: If you don't want to use MP3 compression because you are afraid of Fraunhofer and their patent enforcement or you want something that is less of a hit on the CPU (either because you want to support a wider range of CPUs or you want to use the CPU for something other than decoding sound), what else should you use?


The reasonable choices seem to be MP2, MP1, ADPCM, or old-fashioned 8-bit audio.


I can tell you the answer right now: use Microsoft ADPCM if you want 16-bit audio and old-fashioned 8-bit audio if you want the widest range of compatibility.


MS ADPCM sounds great, decodes to 16 bits, is virtually indistinguishable from the original, and it's free.


You don't want to use MP2 because the bit-rate has to be way too high to get reliable results, and likewise for MP1. Old-fashioned 8-bit audio, which I am in fact listening to right now on an airplane on my way to Las Vegas, is great stuff if you're clever - which in this case means simply using Cool Edit 96 to reduce your 44K originals. (Be sure to normalize your audio first!)


Back when I was trying to get my audio to play on my P75 laptop I experimented with reducing my 2.8 gigs of favorites down to 22K 8-bit stereo. It plays back fine everywhere. The only trouble is it's a little noisy - which generally seems to be a feature of the sound hardware, since you can play back complete silence and still get noise.


The downside is that it's pretty big. Just for grins, I ran another pass over all 2.8 gigs and this time I reduced it down to 11K 8-bit mono, which is what I'm listening to on the airplane. It's playing back on my ancient 486/50 laptop and it sounds fine - probably because I'm listening to it on an airplane.
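For a feel of what that reduction involves, here is a crude Python sketch. It is not how Cool Edit 96 does it (a real editor uses a proper low-pass filter and may dither); it just averages left and right into mono, averages each group of four samples to go from 44.1 kHz to about 11 kHz, and keeps the top 8 bits. The function name and the input layout, a list of (left, right) pairs of signed 16-bit values, are assumptions.

```python
def to_11k_8bit_mono(stereo_frames):
    """Reduce 44.1 kHz 16-bit stereo frames [(left, right), ...] to
    11.025 kHz unsigned 8-bit mono bytes."""
    mono = [(left + right) // 2 for left, right in stereo_frames]
    out = bytearray()
    for i in range(0, len(mono) - 3, 4):
        # Box average of 4 samples: a crude low-pass plus decimation by 4.
        avg = sum(mono[i:i + 4]) // 4
        # 8-bit .wav samples are unsigned, with 128 meaning silence.
        out.append((avg >> 8) + 128)
    return bytes(out)
```

Halving the channels, halving the bits, and quartering the sample rate is a 16 : 1 reduction, which is why normalizing first matters: with only 8 bits to work with, a quietly mastered cut would be left with very few of them.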


If you think you can't really make 8-bit audio work at something as miserable as 11K, then think again. MDK shipped with 11K audio. It didn't seem to hurt sales.

