JCBS MP3 encoding basics Tutorial
    MP3 encoding basics   
  by Christophe Fantoni (05/2002)


First of all, please note that the MP3 format has nothing to do with a supposed MPEG-3 standard; the name only refers to an audio coding scheme defined within the MPEG standards.

MP3 stands for MPEG-1 Layer 3 and is in fact part of the MPEG-1 video format. MPEG-1 defines three audio coding layers: Layer 1, Layer 2 and Layer 3 (the same goes for MPEG-2).
MP3 is simply the third of these layers (Layer 3).

It offers the best trade-off between size and quality of the audio encodings in MPEG-1, while the three layers remain downward compatible with one another.

The MPEG working group

The MPEG working group operates under the ISO (International Organization for Standardization) and the IEC (International Electrotechnical Commission). Its job is to create standards for digital compression, both audio and video. These standards exist on paper only. To put them into practice, suitable software has to be written, because the MPEG working group does not define the encoding algorithm itself. It only provides methods for testing the encoding and decoding of compressed data against the standard. Bear in mind that the MPEG working group regularly publishes technical reports with the results of its latest research.

FhG: The pioneers of MP3 encoding

In 1987, the German Fraunhofer Institute (FhG) began work on what would later become the MP3 format as we know it today. A patent even crowned the first positive results obtained with its audio compression algorithm. The first easy-to-use programs appeared three years ago (thank you, Windows), although I remember encoding my first MP3s from the command line (DOS) in 1997. Those were the days…
Please note that the Fraunhofer Institute (FhG) is a German research institute that works in partnership with Thomson Multimedia, which handles the licensing of the format. The registered patent is European…

Advantages of MP3 compression

Going a little deeper, MP3 audio compression lets us store about 12 hours of non-stop music, in near-CD quality, on a plain 74-minute (650 MB) CD-R. With an 80-minute (700 MB) disc we can go up to 14 hours. That is impressive when you know that an audio CD can only hold 74, or at most 80, minutes of music. With MP3, the compression ratio is generally 10:1, or even 12:1. MP3 is probably the best-known compression format around; it is the number one. MP3 files can be played on PC, Mac, Linux, BeOS, Amiga or Pocket PC. The format is also an excellent means of distribution over the Internet.
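To check these figures for yourself, here is a small Python sketch of the arithmetic. It assumes standard CD audio (44.1 kHz, 16 bits, two channels) and counts a megabyte as 1024 × 1024 bytes; the function names are mine and purely illustrative.

```python
# CD audio: 44,100 samples per second, 16 bits per sample, 2 channels.
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps


def compression_ratio(mp3_kbps):
    """Ratio between the CD audio bit rate and the MP3 bit rate."""
    return CD_BITRATE_KBPS / mp3_kbps


def hours_on_disc(disc_mb, mp3_kbps):
    """Hours of MP3 audio that fit on a disc of disc_mb megabytes."""
    disc_bits = disc_mb * 1024 * 1024 * 8
    seconds = disc_bits / (mp3_kbps * 1000)
    return seconds / 3600


if __name__ == "__main__":
    print(f"ratio at 128 kbps: {compression_ratio(128):.1f}:1")
    print(f"650 MB at 128 kbps: {hours_on_disc(650, 128):.1f} hours")
    print(f"700 MB at 128 kbps: {hours_on_disc(700, 128):.1f} hours")
```

At 128 kbps this gives a ratio of roughly 11:1 and a little under 12 hours on a 650 MB disc; lower bit rates stretch the playing time further.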

Remark: The only weakness of MP3 is that a specific licence is now required for commercial use of the format (an artist selling music in MP3, for example) or for creating a player or an encoder/decoder. For personal use, however, MP3 is entirely free of charge.
The only exception is probably the free MP3 encoders based on the ISO sources, such as Lame, Gogo or Blade. These encoders are tolerated by Thomson Multimedia as well as by the Fraunhofer Institute, even though they grew out of the MP3 format itself.

Let’s now look at the technical aspect of MP3.

Technical specifications

MPEG-1 Layer III compression is based on what is called perceptual coding. It draws on research into human perception: the encoder decides which information must be kept and which can be discarded. To do this, the following techniques are used:

The threshold of human hearing

The minimum threshold of human hearing is not linear. According to the Fletcher and Munson curves (open your encyclopaedia to find out more), it follows a curve that dips between 2 kHz and 5 kHz, where the ear is most sensitive.
There is no need to encode sounds that fall below this curve because the average human ear will not hear them. In concrete terms, the MP3 encoder can skip any component whose level lies under the threshold for its frequency. To give you an idea, one pass band lies between 500 Hz and 2 kHz. As for the human ear, please note that it can detect sounds at frequencies between roughly 100 Hz and 16 kHz…
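To make the idea of a threshold curve more concrete, here is a minimal Python sketch. It uses a commonly cited approximation of the threshold in quiet (usually attributed to Terhardt); this formula is my own choice for illustration and is not something prescribed by the MP3 format or by any particular encoder. The function names are hypothetical.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in dB SPL at freq_hz (must be > 0).

    Assumption: Terhardt-style approximation, used here only as an
    illustration of the shape of the curve.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def is_audible_in_quiet(freq_hz, level_db_spl):
    """True if a pure tone at this level lies above the threshold curve."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


if __name__ == "__main__":
    for freq in (100, 500, 1000, 3300, 8000, 16000):
        print(f"{freq:>6} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

The printed values are lowest around 3 kHz and rise steeply towards both ends of the spectrum, which is the dip described above. A component that fails `is_audible_in_quiet` is a candidate for being dropped.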

The Masking Effect

When loud sounds are playing, you do not hear the quietest ones: this is called the masking effect. Here too, there is no need to encode every sound. To exploit this, the MP3 encoder uses a psychoacoustic model based on the behaviour of the human ear.
This psychoacoustic model analyses the input signal over a few consecutive blocks and determines the signal spectrum of each block. It then models the masking properties of our hearing system and estimates the minimum audible level.
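As a rough illustration of masking, here is a toy Python sketch. It is not the psychoacoustic model of any real encoder: the slopes and the offset are hypothetical values I picked to show the principle, and the Bark-scale conversion is a standard Zwicker-style approximation.

```python
import math

# Hypothetical, illustrative constants; real psychoacoustic models differ.
LOWER_SLOPE_DB_PER_BARK = 27.0   # masking falls off fast below the masker
UPPER_SLOPE_DB_PER_BARK = 10.0   # and more slowly above it
MASKING_OFFSET_DB = 10.0         # masked threshold sits below the masker level


def hz_to_bark(freq_hz):
    """Approximate position of freq_hz on the Bark critical-band scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def masked_threshold_db(masker_hz, masker_db, probe_hz):
    """Level under which a probe tone is hidden by the masker (toy model)."""
    distance = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    if distance >= 0:
        slope = UPPER_SLOPE_DB_PER_BARK
    else:
        slope = LOWER_SLOPE_DB_PER_BARK
    return masker_db - MASKING_OFFSET_DB - slope * abs(distance)


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe tone falls under the masked threshold."""
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)


if __name__ == "__main__":
    # A loud 1 kHz tone hides a quiet neighbour but not a distant tone.
    print(is_masked(1000, 80, 1100, 40))
    print(is_masked(1000, 80, 8000, 40))
```

With these assumed values, a quiet tone close to a loud one is reported as masked, while the same quiet tone far away in frequency is not.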

The psychoacoustic model

Two psychoacoustic models are available today. Model 1 is a simple generator of tonal masks. It is recommended when the sound is not too complex.
Model 2 is somewhat more sophisticated in the way it masks sounds. It is usually the recommended default option because it gives good results even at low bit rates. Finally, please note that few MPEG encoders accept this setting as a parameter. Don't be surprised…

Huffman Coding

At the end of compression, the data is coded with the Huffman algorithm. This coding produces codes of varying lengths, each a whole number of bits. Because Huffman codes are prefix-free (no code is the beginning of another), they can be decoded correctly despite their varying lengths.
This type of coding typically saves around 20% of disk space. It is still used today in many compression algorithms: the ZIP format uses it, as do several graphics formats such as JPEG (standard version) and TIFF (optional)…
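Here is a minimal Python sketch of generic Huffman coding, to show how variable-length prefix-free codes come about. It builds its code table from the data itself, which is an assumption made for the example; as far as I know, MP3 encoders pick from predefined tables rather than building one per file. All names are illustrative.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Return a dict mapping each symbol to its bit string."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        # Merge the two lightest groups, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Decode bit by bit; prefix-free codes make this unambiguous."""
    inverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


if __name__ == "__main__":
    data = list("aaaabbcd")
    codes = huffman_codes(data)
    bits = encode(data, codes)
    print(codes, len(bits), "bits instead of", 2 * len(data))
```

Frequent symbols get short codes and rare ones get long codes, so the eight symbols of the example take 14 bits instead of the 16 a fixed 2-bit code would need.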

Spatial origin of sound

Below a given frequency, the human ear cannot determine where a sound comes from. The MP3 format exploits this peculiarity through a technique called joint stereo.
Technically, joint stereo records low frequencies as a monophonic signal, accompanied by additional information that makes it possible to rebuild an impression of spatialization. This technique does entail a slight loss of stereo image and should only be used at low bit rates (128 kbps or below).
Joint stereo can be really useful when you re-encode your MP3s (at 64 kbps, for example) before transferring them to your portable player.
Why? Simply because of the limited memory these players have. In most cases, however, I invite you to use plain stereo mode… for the sake of pure quality, obviously.

Flags

For each MP3, three flags can be set: copyright, original and private. They are purely informational. There is also a fourth one that deserves a closer look: CRC. Apart from the audio data, each frame of an MP3 can carry a CRC, or cyclic redundancy check.
This information makes it possible to detect errors and so verify the integrity of the file. If your encoder offers this option, feel free to enable it; it is very useful. To point you in the right direction, please note that the shareware program Audiograbber makes excellent use of it. But it is obviously not the only good program…
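For the curious, these flags live in the 4-byte header at the start of every frame. The Python sketch below reads them, along with the channel mode, using the header layout as I understand it from the MPEG audio specification. The `crc16` helper uses the polynomial 0x8005 and the start value 0xFFFF, which I believe are the parameters MPEG audio uses; treat both as assumptions. Note that it runs over whatever bytes you give it, whereas a real decoder only checks specific header and side-information bits. The function names are illustrative.

```python
CHANNEL_MODES = ("stereo", "joint stereo", "dual channel", "mono")


def parse_header_flags(header):
    """Extract the informational flags from a 4-byte MPEG audio header."""
    if len(header) != 4:
        raise ValueError("an MPEG audio frame header is 4 bytes long")
    word = int.from_bytes(header, "big")
    if (word >> 21) & 0x7FF != 0x7FF:
        raise ValueError("frame sync not found")
    return {
        # Protection bit: 0 means a 16-bit CRC follows the header.
        "crc_protected": ((word >> 16) & 1) == 0,
        "private": bool((word >> 8) & 1),
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0x3],
        "copyright": bool((word >> 3) & 1),
        "original": bool((word >> 2) & 1),
    }


def crc16(data, poly=0x8005, init=0xFFFF):
    """Bitwise CRC-16, most significant bit first (assumed parameters)."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


if __name__ == "__main__":
    # A typical header: 128 kbps, 44.1 kHz, joint stereo, no CRC.
    print(parse_header_flags(bytes.fromhex("fffb9064")))
    payload = b"example side information"
    print(hex(crc16(payload)))
```

A single flipped bit in the protected data changes the CRC value, which is how a player can tell that a frame has been damaged.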

Channels: mono and stereo.

Along with the notion of bit rate (which we will come to a little further on) comes that of channels. A file may be monophonic (one channel), stereophonic (two channels, the most common case) or joint stereo.
Joint stereo is mostly used to improve the quality of files encoded at low bit rates (below 96 kbps). It improves the quality of the final file and is mainly used for portable MP3 players, which, limited by their memory size, mostly hold tracks encoded at 64 kbps in joint stereo… This is not always the case, and it is not rare to see tracks encoded at 128 kbps in joint stereo.
We have already looked at joint stereo in some depth in the previous section. You should also know that this mode covers both MS stereo and MS/IS stereo. Depending on the software you use for encoding, the name of the joint stereo mode can vary. Here is a complete overview of the channel types you may encounter.

1. Mono: There is only one channel, which is usually played on both the left and right speakers, whatever your setup.

2. Dual channel or dual mono: The left and right channels are treated independently. For example, you can use this mode to put two languages, say French and English, in one audio file. Each language will be monophonic, but this little-known feature of MPEG encoding lets you easily create multilingual VideoCDs.

3. Stereo: The left and right channels are encoded together without the encoder mixing one with the other. Contrary to popular belief, all phase information is preserved, so it can be used by a Dolby Surround or Dolby Pro Logic matrix. However, this mode shares the bit budget between the two channels, for example giving one channel more room if the other is silent…

4. MS Stereo: In mid/side stereo, the encoder exploits the correlation between the right and left channels. This correlation boosts the quality of the compression. Here too, the mode does not destroy the information used by the Dolby Surround and Dolby Pro Logic formats.
For nearly mono sources, that is, when an almost identical signal plays on the left and right channels, this gives an undeniable gain in quality, with a spatialization close to true stereo.

5. MS/IS Stereo: This mode combines mid/side stereo (MS) and intensity stereo (IS): the encoder reduces the high frequencies to a monophonic signal to which it adds a directional signal. This mode is what makes good-quality encoding at low bit rates possible. Please note, however, that some phase information is lost in MS/IS stereo.
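The mid/side idea from item 4 is easy to sketch in Python. The transform below is the usual sum and difference scaled by 1/√2. A real encoder applies it to frequency-domain values, not to raw samples as I do here for simplicity, and the function names are mine.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def ms_encode(left, right):
    """Turn left/right samples into mid (sum) and side (difference)."""
    mid = [(l + r) * SCALE for l, r in zip(left, right)]
    side = [(l - r) * SCALE for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Rebuild left/right from mid/side; the transform is reversible."""
    left = [(m + s) * SCALE for m, s in zip(mid, side)]
    right = [(m - s) * SCALE for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    return sum(x * x for x in samples)


if __name__ == "__main__":
    # A nearly mono source: both channels carry almost the same signal.
    left = [0.50, 0.80, -0.30, -0.70]
    right = [0.49, 0.81, -0.31, -0.69]
    mid, side = ms_encode(left, right)
    print("mid energy: ", energy(mid))
    print("side energy:", energy(side))
```

For a nearly mono source almost all the energy ends up in the mid signal, so the side signal costs very few bits. That is where the quality gain comes from.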

That is it for the characteristics of MP3 compression. Let's now concentrate on choosing a bit rate. It can be constant (CBR) or variable (VBR). Let me explain.

CBR (Constant Bit Rate), or Constant Rate

At present, CBR is the universal and reliable choice. Its bit rate can range from 32 kbps to 320 kbps at the extremes. However, the available rates vary with the sampling frequency you choose. The most capable encoders can handle sampling frequencies from 8 kHz upward…
In CBR, you specify a constant bit rate (a kind of maximum rate) to express the quality you want for your final file. The higher the bit rate, the better the playback quality of your MP3, but the more space the file will take on your hard disk.
As a rule, we do not go below 96 kbps, because below this rate the result becomes unlistenable, or at least of poor quality (except in the case of a Web radio). At these rates, not all encoders behave the same way. For example, the Blade encoder is completely lost at low bit rates, whereas the Lame encoder, here again, works miracles!
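The link between a constant bit rate and the size of the file is simple arithmetic, sketched below in Python. The frame-size formula assumes MPEG-1 Layer III, where each frame holds 1152 samples (hence the factor 144 = 1152 / 8); the function names are illustrative and a megabyte is counted as 1024 × 1024 bytes.

```python
def file_size_mb(bitrate_kbps, duration_s):
    """Approximate size in megabytes of a CBR file (audio data only)."""
    size_bytes = bitrate_kbps * 1000 / 8 * duration_s
    return size_bytes / (1024 * 1024)


def frame_size_bytes(bitrate_kbps, sample_rate_hz, padding=0):
    """Size of one MPEG-1 Layer III frame; padding is 0 or 1 byte."""
    return 144 * bitrate_kbps * 1000 // sample_rate_hz + padding


if __name__ == "__main__":
    for rate in (64, 96, 128, 160, 192, 320):
        size = file_size_mb(rate, 4 * 60)
        frame = frame_size_bytes(rate, 44_100)
        print(f"{rate:>3} kbps: {size:5.2f} MB for 4 minutes, "
              f"{frame} bytes per frame")
```

At 128 kbps and 44.1 kHz a frame is 417 bytes (418 with padding), and a four-minute track weighs a little under 4 MB. Doubling the bit rate doubles the size.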

VBR (Variable Bit Rate) or Variable Rate

VBR should be seen more from the point of view of quality. Here too the bit rate can vary between 32 kbps and 320 kbps (as above).
Unlike CBR, in VBR you specify not a fixed rate but a range (a minimum and a maximum rate) from which the encoder picks according to the passage being encoded. The wider the range, the better the quality of your file, and above all you stand a chance of getting a smaller file.
Be careful, however: it is not rare for a file encoded in VBR to be much bigger than if it had been encoded in CBR. Generally, the ideal range for quality VBR encoding starts at 160 kbps and ends at 256 kbps. We will come back to this.
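To see why a VBR file can end up bigger than a CBR one, here is a small Python sketch. It takes a hypothetical list of per-frame bit rates, such as an encoder might choose, and works out the average rate and the resulting size. The list of valid rates is the MPEG-1 Layer III set as I know it; the frame lists and function names are made up for the example.

```python
# Bit rates (kbps) that an MPEG-1 Layer III frame can use.
MPEG1_L3_BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112,
                          128, 160, 192, 224, 256, 320)
SAMPLES_PER_FRAME = 1152


def vbr_summary(frame_bitrates_kbps, sample_rate_hz=44_100):
    """Average bit rate, size and duration for a list of frame rates."""
    if not frame_bitrates_kbps:
        raise ValueError("need at least one frame")
    for rate in frame_bitrates_kbps:
        if rate not in MPEG1_L3_BITRATES_KBPS:
            raise ValueError(f"{rate} kbps is not a valid frame bit rate")
    frame_seconds = SAMPLES_PER_FRAME / sample_rate_hz
    total_bits = sum(rate * 1000 * frame_seconds
                     for rate in frame_bitrates_kbps)
    duration = len(frame_bitrates_kbps) * frame_seconds
    return {
        "average_kbps": total_bits / duration / 1000,
        "size_bytes": total_bits / 8,
        "duration_s": duration,
    }


if __name__ == "__main__":
    # Hypothetical encoder decisions within a 160 to 256 kbps range.
    calm_track = [160] * 75 + [256] * 25
    busy_track = [160] * 25 + [256] * 75
    print("calm:", vbr_summary(calm_track)["average_kbps"])
    print("busy:", vbr_summary(busy_track)["average_kbps"])
```

With the same 160 to 256 kbps range, the calm track averages below 192 kbps while the busy one averages well above it, so the busy track would be larger than a 192 kbps CBR file.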

Which bit rate should we choose?

In CBR at 128 kbps, the playback quality of an MP3 is very close to that of an audio CD, and about 12 hours of non-stop music fit on a plain 700 MB CD-R. This rate is today's de facto standard for distribution. However, it is not the ideal rate…

In VBR, it is very difficult to recommend an ideal bit rate. I have a few audiophile friends who encode in VBR over the full range, that is, between 32 kbps and 320 kbps.
After testing it myself, I have adopted these settings too because the final result is of much better quality. Nevertheless, we will always fall short of what lossless encoding in Monkey's Audio can offer.

Which compression engine should I choose?

Three kinds of MP3 encoder exist today: the official ones, based on the compressor created by the German institute Fraunhofer-Gesellschaft IIS (also called FhG or IIS); those based on the Xing encoding engine, which now belongs to the American company Real Networks; and those based on the ISO specifications drawn up by the MPEG group. The first two are commercial: both require a licence, whether or not you use them commercially. The ISO-based encoders, on the other hand, are entirely free of charge.

The encoders Audioactive Production Studio and MP3 Producer, as well as the export plug-ins of programs such as Sound Forge or Cool Edit, are based on the IIS encoding engine. To my knowledge, only the shareware program MusicMatch Jukebox (fully functional, though) seems to offer this encoder, but in a completely uninteresting way (a commercial version of this software does exist). The Fraunhofer Institute had the good sense to create an ACM codec for Windows, so that every application using the Windows ACM for WAV files (and there are a lot of them) can encode MP3. Apart from MusicMatch Jukebox (which is closer to shareware), you will soon find that these programs are genuinely commercial software.

The programs XingMPEG Encoder, Real Jukebox, Sound Limit and MPEGDJ all four use the Xing encoder. Until now, the Xing encoder was known for the poor quality of its CBR encoding, but it was well rated for its VBR encoding (which is what prompted Real Networks to buy Xing Technologies). Today this encoder seems to have been abandoned, and only Real Jukebox from Real Networks still seems to rely on it.

The encoders Blade, Lame and Gogo are all three based on the ISO specifications for MP3 encoding. Thousands of freeware programs use their encoding libraries. A few examples: DBPowerAmp, Razorlame, AudioCrusher or AudioGogo… Most of these are free software. However, many shareware programs such as MediaJukebox or Audiograbber can also make use of these libraries… Finally, please note that a Japanese Internet user has recently wrapped the source code of the ISO-based Lame encoder in an ACM codec (which I have also translated into French). An excellent initiative on his part, which puts the IIS encoder and the Lame encoder on an equal footing.

Which one to choose? The answer is simple: it depends on your encoding bit rate. It has been shown many times that the IIS encoder is very good at low bit rates. Up to 128 kbps, it works true miracles (in fact, its official version does not go any higher). An optimised version of this encoder appeared, created by the pirate group Radium. The result: the encoder was "unlocked" (encoding possible up to 320 kbps) and its encoding speed improved. Its inclusion in the distribution pack of the DivX codec made this version a success. But this version of the MP3 codec is obviously illegal. Using the MP3 codec from the Fraunhofer Institute can be worthwhile if you plan to transfer music regularly to an MP3 player with limited memory, which usually calls for re-encoding at 64 kbps.

Above 160 kbps, at high bit rates, the encoders with the best proven quality are Blade, Lame and Gogo (which is based on the same engine as Lame but optimised for MMX, SSE, KNI, 3DNow!, etc.). I personally admit that I never encode below 128 kbps; that is my minimum bit rate. In general, I always use at least 160 kbps for CBR.

Which software should I choose?

Today about a thousand programs can handle MP3, and it is difficult to choose. At this stage, it is more a question of being precise and knowing what you want. There is no need to install a heavyweight all-in-one package if it does the job no better than the others: a nice interface, but results that are less satisfactory. Such programs know how to extract, encode, decode and burn CDs, but do not go into depth in any of those categories. They are single, sometimes complex programs that integrate every tool needed from the creation to the management of your MP3s. However, let's not go to the other extreme either: with one program per specific task, you will end up with a multitude of programs to carry out only 3 or 4 tasks. Your MP3s will be perfect… but what a waste of time!