The MPEG (Moving Picture Experts Group) working group operates
under the ISO (International Organization for Standardization) and
the IEC (International Electrotechnical Commission). Its work
consists of creating digital compression standards for audio and
video. These standards exist only on paper: to put them into
practice, someone still has to write suitable software, because
the MPEG working group does not define any encoding algorithm.
It only provides methods for testing the encoding and decoding
of compressed data against the standard. Bear in mind that the
MPEG working group regularly publishes technical reports with
the results of its latest research.
In 1987, the German Fraunhofer institute (FhG) started work on
what would later become the MP3 format as we know it today. A
patent even crowned the first positive results obtained by its
audio compression algorithm.
The first easily accessible software appeared 3 years ago (thank
you, Windows), even though I remember encoding my first MP3s
from the command line (DOS) in 1997. Those were the days…
Please note that the Fraunhofer institute (FhG) is a research
institute that works with Thomson Multimedia, which handles the
licensing of the format. The patent is registered in Europe…
Going into more depth, MP3 audio compression lets us store about
12 hours of non-stop music, at near-CD quality, on a simple
74-minute CD-R (650 MB). With an 80-minute disc (700 MB) we can
go up to about 14 hours. Most impressive when we know that an
audio CD can only hold 74, or at most 80, minutes of music. With
MP3, the compression ratio is generally 10:1, or even 12:1. MP3
is probably the best-known compression format around; it is even
number 1. Files encoded in MP3 can be played on PC, Mac, Linux,
BeOS, Amiga or PocketPC. The format is also an excellent means
of distribution over the Internet.
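To see where figures like these come from, here is a minimal
sketch of the arithmetic in Python. It assumes that playing time
simply scales with the compression ratio and it ignores
file-system overhead; the function names are mine, for
illustration only.

```python
# Bit rate of uncompressed CD audio: 44.1 kHz, 16 bits, 2 channels.
CD_AUDIO_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps


def compression_ratio(mp3_kbps):
    """How many times smaller an MP3 at mp3_kbps is than CD audio."""
    return CD_AUDIO_KBPS / mp3_kbps


def hours_on_disc(disc_minutes, ratio):
    """Playing time in hours if the audio is `ratio` times smaller."""
    return disc_minutes * ratio / 60


print(round(compression_ratio(128), 1))  # 11.0
print(round(hours_on_disc(74, 10), 1))   # 12.3
print(round(hours_on_disc(80, 10), 1))   # 13.3
```

In other words, 128 kbps works out to a ratio of about 11:1,
which sits between the 10:1 and 12:1 quoted above.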
MPEG-1 Layer III compression is based on what is called
perceptual coding. It relies on research into human perception:
the encoder decides whether a given piece of information must
be kept or discarded. To do this, the following techniques are
used:
The threshold of human hearing
The minimum threshold of human hearing is not linear.
According to the Fletcher-Munson curves (open your encyclopaedia
to find out more), the minimum threshold is represented by a
curve that dips between 2 kHz and 5 kHz, where the ear is most
sensitive.
There is no need to encode sounds that fall below this curve,
because
the average human ear will not hear them. Concretely, the MP3
encoder can skip any component whose level lies below the curve
at its frequency. To give you an idea, a pass band lies between
500 Hz and 2 kHz. As for the human ear, please note that it can
detect sounds with frequencies from roughly 20 Hz up to
+/- 16 kHz…
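To make the shape of this curve concrete, here is a small Python
sketch. It uses a widely quoted approximation of the threshold
in quiet, usually attributed to Terhardt. That formula is my
choice for illustration, not something the MP3 standard or this
article prescribes, and the function names are made up.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in dB SPL (Terhardt-style formula)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def is_inaudible(freq_hz, level_db_spl):
    """True if a lone tone at this level lies below the threshold curve."""
    return level_db_spl < threshold_in_quiet_db(freq_hz)


# Scan 100 Hz .. 16 kHz and find where the ear is most sensitive.
most_sensitive_hz = min(range(100, 16001, 100), key=threshold_in_quiet_db)
print(2000 <= most_sensitive_hz <= 5000)  # True
print(is_inaudible(100, 10))              # True: a quiet 100 Hz tone is lost
print(is_inaudible(3000, 10))             # False: the same level at 3 kHz is heard
```

The same 10 dB tone is dropped at 100 Hz but kept at 3 kHz,
which is exactly the dip in the curve described above.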
The Masking Effect
When loud sounds are playing, you do not hear the weaker ones
close to them: this is called the masking effect. Here too, it
is not necessary to encode every sound. To achieve this, the MP3
encoder employs a psychoacoustic model based on the behaviour of
the human ear. This model analyses the input signal over a few
consecutive blocks and determines the spectrum of the signal in
each block. It then models the masking properties of our hearing
system and estimates the minimum audible level.
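The idea can be sketched with a toy rule in Python. Be careful:
this is not the psychoacoustic model of the standard. The offset
and slope constants below are values I assumed purely for
illustration, and the function names are made up. Only the
Hz-to-Bark conversion is a known approximation (Traunmueller's).

```python
def hz_to_bark(freq_hz):
    """Traunmueller's approximation of the Bark critical-band scale."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


# Assumed, illustrative constants (NOT taken from the MPEG standard).
MASK_OFFSET_DB = 10.0          # margin below the masker at its own frequency
MASK_SLOPE_DB_PER_BARK = 15.0  # how fast masking fades with distance


def masking_level(masker, freq_hz):
    """Level in dB below which `masker` hides a sound at freq_hz."""
    masker_freq, masker_level = masker
    distance = abs(hz_to_bark(freq_hz) - hz_to_bark(masker_freq))
    return masker_level - MASK_OFFSET_DB - MASK_SLOPE_DB_PER_BARK * distance


def audible_components(components):
    """Keep the (freq_hz, level_db) pairs that no other component masks."""
    kept = []
    for i, (freq, level) in enumerate(components):
        others = components[:i] + components[i + 1:]
        threshold = max((masking_level(m, freq) for m in others),
                        default=float("-inf"))
        if level > threshold:
            kept.append((freq, level))
    return kept


spectrum = [(1000, 80), (1100, 40), (8000, 40)]
print(audible_components(spectrum))  # [(1000, 80), (8000, 40)]
```

The weak 1100 Hz tone sits right next to the loud 1000 Hz one and
is dropped, while the equally weak 8000 Hz tone is far enough
away to survive.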
The psychoacoustic model
Two psychoacoustic models are available today. Model 1 is a
simple one built around tonal maskers. It is recommended when
the sound is not too complex.
Model 2 is a bit more sophisticated in the way it handles
masking. Most often this is the recommended default option,
because it gives good results even at low bit rates. To conclude,
please note that a few MPEG encoders accept three settings for
this option. Don't be surprised…
Huffman Coding
At the end of the compression, the information is encoded using
the Huffman algorithm. This coding produces codes of varying
lengths, each a whole number of bits. Because no Huffman code is
the prefix of another, the codes can be decoded correctly despite
their differing lengths.
This type of encoding saves on the order of 20% of disk space.
It is still used today in many compression algorithms: the ZIP
format uses it, as do several graphics formats such as JPEG
(standard version) or TIFF (optional)…
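Here is a minimal Python sketch of the principle. It builds a
Huffman code from the symbol counts of the input, which is the
textbook approach; MP3 itself relies on fixed tables defined by
the standard, so take this as an illustration of the idea and
not as what an MP3 encoder does. The function names are mine.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Return {symbol: bit string} with shorter codes for frequent symbols."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, codes):
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Works because no code is the prefix of another."""
    reverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return out


data = list("aaaaaaaabbbc")
codes = huffman_codes(data)
bits = encode(data, codes)
print(len(bits))                    # 16 bits instead of 24 at 2 bits per symbol
print(decode(bits, codes) == data)  # True
```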
Spatial origin of sound
Below a certain frequency, the human ear cannot determine where
a sound is coming from. The MP3 format exploits this
particularity through a technique called joint stereo.
Technically, joint stereo records low frequencies as a monophonic
signal, accompanied by additional information that makes it
possible to reconstruct an impression of spatialization. In
practice this technique implies a slight loss of stereo
separation and should only be used at low bit rates (128 kbps
or less).
Joint stereo can be really interesting when you re-encode your
MP3s (at 64 kbps, for example) before transferring them to your
portable player.
Why? Simply because of the limited memory these players have.
In most cases, however, I invite you to use plain stereo mode…
for pure quality, obviously.
Flags
For each MP3, three flags may be set: copyright, original and
private. They are there for information purposes only. There is
also a fourth one that deserves a closer look: CRC. Apart from
the audio data, each frame of an MP3 can contain a CRC, or
cyclic redundancy check.
This information makes it possible to detect errors and so to
verify the integrity of the file. If your encoder offers this
option, feel free to use it; it is most useful. To point you in
the right direction, please note that the shareware Audiograbber
exploits it to perfection. But it is obviously not the only good
software…
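These flags live in the 4-byte header at the start of each frame.
The sketch below reads them in Python, following the usual
MPEG audio header layout; the function name and the dictionary
keys are mine. Note that the CRC bit is inverted: a 0 means a
CRC follows the header.

```python
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_flags(header):
    """Read the informational flags from a 4-byte MPEG audio frame header."""
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")
    if word >> 21 != 0x7FF:  # the first 11 bits must all be 1 (frame sync)
        raise ValueError("no frame sync found")
    return {
        "crc_present": ((word >> 16) & 1) == 0,  # protection bit, 0 = CRC
        "private": bool((word >> 8) & 1),
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
        "copyright": bool((word >> 3) & 1),
        "original": bool((word >> 2) & 1),
    }


# A header commonly seen in 128 kbps, 44.1 kHz joint stereo files.
print(parse_flags(bytes.fromhex("fffb9064")))
# {'crc_present': False, 'private': False, 'channel_mode': 'joint stereo',
#  'copyright': False, 'original': True}
```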
Channels: mono and stereo.
Along with the notion of bit rate (which we will come to a
little further on) comes the question of channels. The signal
may be monophonic (one channel), stereophonic (two channels,
the most frequent case) or in joint stereo.
Joint stereo is mostly used to improve the quality of files
encoded at low bit rates (under 96 kbps). This mode improves the
quality of the final file and is mostly used with MP3 players,
which are limited by their memory size and so mostly hold pieces
of music encoded in 64 kbps joint stereo… This is not always the
case, and it isn't rare to see pieces of music encoded at
128 kbps using joint stereo.
We have already had occasion to look at joint stereo in depth
in the previous chapter. We need to know that this mode also
covers the MS stereo and MS/IS stereo modes. Depending on which
software you use for encoding, the name of the joint stereo mode
can vary. Here is a complete overview of all the channel types
you could encounter.
1. Mono: Only one channel exists, most often played
simultaneously on both the left and right speakers, whatever
installation you may have.
2. Dual channel or dual mono: The left and right channels are
treated independently. For example, you can use this mode if
you wish to put two languages, French and English, in your
audio file. Each of those languages will be monophonic, but
this particularity of MPEG encoding, not well known by the
public, will let you easily create multilingual VideoCDs.
3. Stereo: The right and left channels are encoded together,
without the encoder exploiting any correlation between them.
Despite what the public believes, all phase information is
preserved. This information can be used by a Dolby Surround or
Dolby Pro Logic matrix decoder. However, this mode shares the
bit budget between the 2 channels, giving one channel more
room, for example, if the other one is silent…
4. MS Stereo: In Mid-Side Stereo, the encoder exploits the
correlation between the right and left channels. This
correlation has the effect of boosting the quality of the
compression. Here too, the mode does not destroy the information
used by Dolby Surround and Dolby Pro Logic.
For nearly-mono sources, that is, when a monophonic source
plays on both the right and left channels, this gives an
undeniable gain in quality, with a stereo-like spatialization.
5. MS/IS Stereo: In this mode, which mixes mid-side stereo (MS)
and intensity stereo (IS), the encoder reduces the high
frequencies to a monophonic signal to which a directional
signal is added. This mode especially allows good-quality
encoding at low bit rates. Please note, however, that some
phase information is lost in MS/IS stereo.
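The mid-side idea from the list above is easy to sketch in
Python. This only shows the principle on raw samples: the
scaling by 2 is my choice for readability and is not meant to
reproduce the exact arithmetic of a real encoder, and the
function names are made up.

```python
def to_mid_side(left, right):
    """Turn left/right samples into a sum (mid) and a difference (side)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Undo to_mid_side: nothing is lost by the transform itself."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A nearly-mono source: both channels are almost identical.
left = [0.5, -0.25, 0.75]
right = [0.5, -0.25, 0.5]
mid, side = to_mid_side(left, right)
print(side)                                      # [0.0, 0.0, 0.125]
print(to_left_right(mid, side) == (left, right)) # True
```

For a nearly-mono source the side signal is almost silent, so
the encoder can spend nearly all of its bits on the mid signal.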
That is it for the characteristics of MP3 compression. Let's
now concentrate on how to choose a bit rate. It can be constant
(CBR) or variable (VBR). Let me explain.
CBR (Constant Bit Rate)
At this time, CBR is the universal and reliable choice. Its bit
rate can range from 32 kbps up to 320 kbps at the very most.
However, this will vary depending on the sampling frequency you
choose. The most capable encoders can encode from 8 kHz
upwards…
In CBR, you must specify a constant bit rate (a kind of maximum
rate) to express the quality you want for your final file. The
higher the bit rate, the better the quality of your MP3 on
playback, but the more space the file will take on your hard
disk.
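The link between bit rate and file size can be worked out
directly. The Python sketch below assumes MPEG-1 Layer III
(1152 samples per frame) and counts 1 kbps as 1000 bits per
second; it ignores tags and other extras stored in the file.
The function names are mine.

```python
def frame_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Size of one MPEG-1 Layer III frame in bytes."""
    return 144 * bitrate_bps // sample_rate_hz + padding


def cbr_file_bytes(bitrate_kbps, seconds):
    """Approximate size of the audio data of a CBR file."""
    return bitrate_kbps * 1000 * seconds // 8


print(frame_bytes(128_000, 44_100))  # 417
print(cbr_file_bytes(128, 240))      # 3840000, a 4-minute song at 128 kbps
print(cbr_file_bytes(320, 240))      # 9600000, the same song at 320 kbps
```

So a 4-minute song goes from about 3.8 MB at 128 kbps to about
9.6 MB at 320 kbps.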
Generally, we do not go below 96 kbps, because under this rate
the result becomes barely listenable, or at least of poor
quality (except in the case of a Web radio). At this point, not
all encoders behave the same way. For example, the Blade encoder
is completely lost at low bit rates, whereas the Lame encoder,
here again, works miracles!
VBR (Variable Bit Rate)
VBR should be looked at from a quality point of view. Here too,
the bit rate can vary between 32 kbps and 320 kbps (as said
previously).
Unlike CBR, in VBR you specify not a fixed rate but a range
(maximum rate and minimum rate) within which the encoder picks
a rate, bearing in mind the sample to encode. The wider the
range, the better the quality of your file, and above all you
have a chance of ending up with a smaller file.
Be careful, however: it is not rare to see a file encoded in
VBR come out a lot bigger than if it had been encoded in CBR.
Generally, the ideal range starts at 160 kbps and ends at
256 kbps for a quality VBR encoding. We will come back to it.
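To illustrate how the range affects the result, here is a toy
Python sketch. The list of bit rates is the set allowed for
MPEG-1 Layer III; everything else is assumed for the example.
In particular, the per-frame "demand" is a made-up number
standing in for whatever the encoder decides a frame needs, and
real encoders choose rates in far more subtle ways.

```python
MP3_BITRATES_KBPS = [32, 40, 48, 56, 64, 80, 96, 112,
                     128, 160, 192, 224, 256, 320]


def pick_bitrate(demand_kbps, min_kbps, max_kbps):
    """Smallest allowed rate within the range that covers the demand."""
    allowed = [b for b in MP3_BITRATES_KBPS if min_kbps <= b <= max_kbps]
    if not allowed:
        raise ValueError("no valid bit rate inside this range")
    for rate in allowed:
        if rate >= demand_kbps:
            return rate
    return allowed[-1]  # demand exceeds the range: use the maximum


def average_kbps(frame_rates):
    """Every frame lasts the same time, so a plain mean is enough."""
    return sum(frame_rates) / len(frame_rates)


demands = [40, 100, 300, 170]  # hypothetical needs of four frames
narrow = [pick_bitrate(d, 160, 256) for d in demands]
wide = [pick_bitrate(d, 32, 320) for d in demands]
print(narrow, average_kbps(narrow))  # [160, 160, 256, 192] 192.0
print(wide, average_kbps(wide))      # [40, 112, 320, 192] 166.0
```

With the narrow range the file averages 192 kbps, bigger than a
160 kbps CBR file; with the wide range the easy frames shrink
and the hard frame still gets its 320 kbps.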
Which rate should we choose?
In CBR at 128 kbps, the playback quality of an MP3 is very close
to that of an audio CD, and lets us store about 12 hours of
non-stop music on a simple 700 MB CD-R. This rate is today the
distribution standard adopted by all. However, it is not the
ideal rate…
In VBR, it is very difficult to recommend an ideal bit rate.
I have a few audiophile friends who encode in VBR over the full
range, that is, between 32 kbps and 320 kbps.
After testing it personally, I have adopted these settings
myself, because the final result is of much better quality.
Nevertheless, we will always fall short of what lossless
encoding in Monkey's Audio can deliver.
Which compression engine should I choose?
Three kinds of MP3 encoder exist today: the official ones, based
on the compressor created by the German institute
Fraunhofer-Gesellschaft IIS (also called FhG or IIS); those
based on the Xing encoding engine, which today belongs to the
American company RealNetworks; and those based on the ISO
specifications drawn up by the MPEG group.
The first two are commercial, because both require a licence
whether you use them commercially or not. The ISO-based ones, on
the other hand, are completely free of charge and free to use.
The encoders Audioactive Production Studio and MP3 Producer, as
well as the export plug-ins of software such as Sound Forge or
Cool Edit, are based on the IIS encoding engine. To my
knowledge, only the shareware (but fully functional) MusicMatch
Jukebox seems to offer this encoder, though in a thoroughly
uninteresting way. (A commercial version of this software does
exist.) The Fraunhofer institute had the good sense to create an
ACM codec for Windows, so that every application using the WAV
ACM (and there are a lot of them) can encode MP3. Apart from
MusicMatch Jukebox (which looks more like shareware), you will
soon find that these are true commercial programs.
XingMPEG Encoder, Real Jukebox, Sound Limit and MPEGDJ all four
use the Xing encoder. Until now, the Xing encoder was known for
the poor quality of its CBR encoding, but it was well rated for
its VBR encoding (which is what pushed RealNetworks to buy Xing
Technologies). Today this encoder seems to have been left
behind, and only Real Jukebox from RealNetworks still seems to
rely on it.
The Blade, Lame and Gogo encoders are all three based on the ISO
encoding specifications for MPEG-1 Layer III. Thousands of
freeware programs use their encoding libraries. Some examples:
DBPowerAmp, Razorlame, AudioCrusher or AudioGogo… Most of these
are free software. However, a lot of shareware such as
MediaJukebox or Audiograbber can also use those libraries… To
conclude, please note that a Japanese web user has recently
wrapped the source code of the ISO-based Lame encoder in an ACM
codec (which I have also translated into French). An excellent
initiative on his part, which puts the IIS encoder and the Lame
encoder on the same level.
Which one to choose? The answer is simple: it depends on the bit
rate you encode at. It has been shown many times that the IIS
encoder is very good at low bit rates. Up to 128 kbps, the
encoder works true miracles. (In fact, in its official version,
it does not go above that.) An optimised version of this encoder
appeared, created by the pirate group Radium. The result: an
"unlocked" encoder (encoding possible up to 320 kbps) and a
faster one. This version owes its success to its inclusion in
the distribution pack of the DivX codec. But this version of the
MP3 codec is obviously illegal.
Using the Fraunhofer institute's MP3 codec may be worthwhile if
you plan to transfer music regularly to an MP3 player with
limited memory, which will most of the time require re-encoding
at 64 kbps.
Above 160 kbps, at high bit rates, the encoders proven to give
the best quality are Blade, Lame and Gogo (based on the same
engine as Lame but optimised for MMX, SSE, KNI, 3DNow!, etc…).
I personally admit that I never encode below 128 kbps; for me
that is the minimum bit rate. In general, I always use at least
160 kbps for CBR.