Hydrogenaudio Knowledgebase - User contributions [en]

AAC FAQ

2011-09-17T09:52:09Z

Junh1024:

=Overview AAC FAQ=
===Great, so you've given me all the technical stuff, but what is [[AAC]] really?===
[[AAC]] is the culmination of current state-of-the-art audio encoding techniques. It is designed to improve upon and replace [[MP3]] as the de facto audio encoding standard. Depending on the encoder, it usually offers quality equivalent to MP3 at a lower bitrate.

===What is the difference between *.MP4 and *.M4A?===
Besides the extension, absolutely nothing. Apple came up with the M4A extension to distinguish between files with video and audio (the [[MP4]] extension) and files with audio only (the M4A extension). The internal structure of the file is the same in both cases.

===What MPEG 4 extensions does the Apple iPod Accept?===
The iPod accepts files with the MP4 extension, the M4A extension, the M4P extension (a Protected AAC file), and the M4B extension for audiobook files (which can be either protected or unprotected). It will not accept unwrapped AAC files (files with the .AAC extension).
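As a rough illustration of the answer above, the extension check can be sketched in Python. The function name <code>ipod_accepts</code> is hypothetical, and the sketch only looks at the file name; it does not inspect the file contents.

```python
from pathlib import PurePath

# Extensions the answer above lists as accepted by the iPod.
IPOD_EXTENSIONS = {".mp4", ".m4a", ".m4p", ".m4b"}


def ipod_accepts(filename: str) -> bool:
    """Return True if the extension is one of those listed as accepted."""
    return PurePath(filename).suffix.lower() in IPOD_EXTENSIONS


print(ipod_accepts("album/track01.m4a"))  # accepted extension
print(ipod_accepts("album/track01.aac"))  # unwrapped AAC, not accepted
```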

===What is the difference between LC (Low Complexity) and HE (High Efficiency)?===
These are two of the various Object Types in the MPEG-4 Systems Standard. LC is the most popular Object Type, and all encoders and decoders support it. Currently, Apple, Nero, Coding Technologies, and Panasonic have incorporated the HE AAC standard into their encoders, which allows for higher quality sound than the LC Object Type at the same low bitrate. The HE Object Type is generally used only for music at bitrates below ~80kbps.

===What's the best AAC encoder?===
Deciding the best AAC encoder is difficult, because the quality of an encoding depends not only on the encoder implementation, but also on bitrate, audio content, playback equipment and conditions, and the subjective perceptual judgement of the listener at playback time.

Since it is very difficult to quantify the quality of an encoder, [[Listening Tests|listening tests]] are used.

Guruboolez's [http://www.hydrogenaudio.org/forums/index.php?showtopic=29924 last test] concluded that [http://www.nero.com/en/ Nero AAC] was the best AAC encoder, at 128kbps, on classical samples, at the time the test was conducted.

On the other hand, a public [http://www.rjamorim.com/test/multiformat128/results.html listening test] conducted by [[User:rjamorim|rjamorim]] in mid-2004 comparing different codecs, at 128kbps, with several music styles and featuring several listeners concluded that [[iTunes]] (the only AAC codec included in the test) was better than other codecs - even VBR-enabled ones.

The quality of any encoder is not linear with bitrate, and therefore these results cannot be extrapolated to other higher or lower bitrates. It can also be said with great confidence that both the iTunes AAC encoder and the Nero AAC encoder, although still under development at the end of 2006, are relatively 'mature' and should not fail badly (result in any obvious artifacts) on any particular sample at an average bitrate of 128kbps (i.e. Internet Profile for Nero AAC) or above (based on Roberto's listening tests, see bottom).

Beyond that, only you can decide; you may want to conduct your own private [[Listening Tests|listening tests]], or you may base your decision on other criteria besides audio quality. See the [[Audio format guide]] for more information.

===Do AAC encoded files play back gaplessly?===
[[Gapless]] playback is not part of the AAC standard and as such is not mandatory. However, certain companies can choose to add gapless encoding/decoding if they desire, providing it doesn't break compatibility with previous decoders. This is what Ahead have done with their Nero AAC codec. The files get encoded with information that allows the gap heard between files to be removed. This however is only possible with supported players (currently these include foobar2000 and Nero ShowTime). Currently Nero AAC and FAAC are the only encoders to have gapless encoding/decoding support.

===What software players can play back AAC music?===
There are now a number of software players that can play back this new format. [[foobar2000]] is considered by many to be a very high quality audio player, and it is certainly capable of playing back AAC encoded files. Other players include [http://amarok.kde.org Amarok] using [http://www.audiocoding.com/ libfaad2], Apple's [[iTunes]], [[Winamp]], [http://www.real.com/ Real Player] and [http://www.microsoft.com/windows/windowsmedia/default.aspx Windows Media Player] using the [http://corecodec.org/projects/coreaac CoreAAC filter] and [http://www.elecard.com/download/ Moonlight MP4 Demultiplexer]. Also for Directshow-based applications playback and encoding is possible using the commercial [http://www.3ivx.com/ 3ivx filter suite].

===What hardware players can play back AAC music?===
There are also a few hardware players that can play back AAC audio. The most famous of these is the [[Apple iPod]] series of products, all of which feature AAC playback. A number of mobile (cell) phones also support unwrapped AAC (AAC not contained in the MP4 container). Recent Pioneer HT receivers can play back AAC files on a USB key or other USB mass-storage device.

==Related Links==

* [[AAC]] description article
* Known [[AAC implementations]].
* Read the [[AAC guide]] to learn how to obtain AAC/[[MP4]] files out of [[WAV]] files and CDs.
* Detailed AAC comparisons can be found at [http://www.rjamorim.com/test/ Roberto's listening tests page].

[[Category:Technical]]
[[Category:Codecs]]

AAC encoders

2011-09-17T09:49:35Z

Junh1024: /* Nero AAC */

These are some known [[AAC]] encoder implementations.

==[[Nero AAC]]==

A commercial implementation of both LC AAC and HE AAC, Nero AAC is produced by Nero AG as part of their Nero Digital line of products. It is generally perceived to have the highest quality VBR LC AAC implementation (although [[QuickTime AAC]] beats it in CBR mode at 128kbps). The codec can also create HEv1/v2 AAC streams for extremely low bitrates and supports multi-channel surround sound encoding. As of May 2006, Nero AAC is available for free as a command line tool called "Nero Digital Audio" [http://www.nero.com/nerodigital/eng/down-ndaudio.php here].

===Recommended Nero AAC Presets===

NOTE: Once a preset has been selected, the "Encoding Quality" option should be changed to the "Fast" mode. Despite the name implying worse quality than "High", a test undertaken by guruboolez shows that the "Fast" mode offers significant quality advantages over the "High" mode (see the test [http://www.hydrogenaudio.org/forums/index.php?showtopic=29924 here]). In the forthcoming release of Nero AAC 3.0 (or a release soon afterwards), the "Fast" mode will become the default and the high quality mode will be removed.

====High Quality====

: - VBR/Stereo - Streaming, 100-120 Kb/s (LC AAC) / Actual bitrate ~150kbps

====Portable====

: - VBR/Stereo - Internet, 90-100 Kb/s (LC AAC) / Actual bitrate ~128kbps

====Small Filesize====

: - VBR/Stereo - Portable, 50-70 Kb/s (HE AAC) / Actual bitrate ~90kbps

The High Quality preset is for the archival of music, while the Small Filesize preset is for internet/streaming purposes.
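To get a feel for what the "actual bitrate" figures above mean in practice, the following Python sketch estimates file sizes. It assumes a hypothetical 4-minute track, takes 1 kbps as 1000 bits per second, and ignores MP4 container overhead, so the numbers are approximations only.

```python
def estimated_size_bytes(avg_bitrate_kbps, duration_seconds):
    """Approximate audio payload size, ignoring container overhead."""
    return avg_bitrate_kbps * 1000 * duration_seconds / 8


# Average bitrates quoted for the three presets above.
presets = [("High Quality", 150), ("Portable", 128), ("Small Filesize", 90)]

for name, kbps in presets:
    size_mb = estimated_size_bytes(kbps, 240) / 1_000_000
    print(f"{name}: about {size_mb:.2f} MB for a 4-minute track")
```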

More information can be found in the [ftp://ftp6.nero.com/infosheets/Nero_Digital/db_nerodigital5.pdf Nero Digital PDF] and on the [http://www.nerodigital.com/ Nero Digital Website].

==iTunes AAC==

Another proprietary AAC implementation, [[iTunes]] AAC is known to be one of the highest quality medium-bitrate [[CBR]] LC AAC encoders.

The codec is available for free through the [[iTunes]] Digital Jukebox.

More information can be found about Apple's AAC implementation on their [http://www.apple.com/quicktime/technologies/aac/ AAC Audio information page].

The current recommended high quality encoding setting is 160kbps, or 128kbps for portable use.

The iTunes AAC encoder can be externally invoked via a command line tool created by Otto42 called [http://www.rarewares.org/files/aac/iTunesEncode46.zip iTunesEncode], which can be found at [http://www.rarewares.org/ RareWares] in the AAC section. This allows for the generation of iTunes AAC files from audio formats not inherently supported by iTunes using the format conversion functionality found in programs such as [[foobar2000]].

==FAAC==

[[FAAC]] is a free LC AAC encoder under the Lesser GPL license. Its quality has improved drastically over the last few years and FAAC is nowadays a viable alternative to the commercial encoders (although, at 128kbps or lower bitrates, not at the same quality level as some of them, according to Guruboolez's [http://www.hydrogenaudio.org/forums/index.php?showtopic=29924 last listening test]).

The default quality setting is -q 100 -c 16000 (~120kbps average bitrate), for better quality encodings use -q 150 -c 22000 (~175kbps average bitrate).

More information can be found at [http://www.audiocoding.com/ AudioCoding].

==HHI/zPlane (Compaact!)==

Compaact! is one of the newest AAC encoders. Like Nero AAC, Compaact! is not free; however, it does offer an impressive feature set. Roberto Amorim's last AAC test showed that at 128kbps, Compaact! is tied with both the FAAC and Coding Technologies (Real) encoders. Compaact! features both the LC and Main Object Types, [[CBR]], [[VBR]], [[Multichannel]], high resolution (24bit/96kHz) encoding, and command line support. Development on Compaact! has stopped.

For portable encoding, try -q5 to -q6. For music archive purposes, try -q7 to -q8.

More information can be found at the [http://www.compaact.com/aacPage.php?SPRACHE=UK&PAGE=compaact Compaact website].

==PsyTEL==

The creation of Ivan Dimkovic (who now works on Nero AAC), PsyTEL AAC was one of the first AAC encoders. Its multichannel support has bugs that make it unusable, but its stereo mode had the best quality available in its day. Since the implementation of Nero AAC, this codec has become obsolete. It is now outclassed by both Nero AAC and [[iTunes]] - both offer higher quality and are much faster encoders.

The PsyTEL encoder can be found in the AAC section of [http://www.rjamorim.com/rrw/ ReallyRareWares].

===Usability (Psytel aacenc/fastenc)===

; -tape
; -radio
; -internet
; -streaming
; -normal
; -extreme
; -archive
; -ultra

For music encoding. The quality ranges from -tape (lowest [[VBR]] quality) to -ultra (highest VBR quality). Ultra is considered overkill for most audio tracks, i.e. it should not be used except for extremely difficult music signals. Example: aacenc -extreme -if "audio file.wav"

===Encoder switches (Psytel aacenc/fastenc)===

; -if
: Input filename. The name of the track to be encoded (must be a [[WAV]] file)

; -of
: Output filename. May be omitted, because encoder will automatically create the output file name from the input file name.

; -br
: Bitrate switch ([[CBR]] mode). Sets the number of bits utilized per second for the encoding process. Example: aacenc -br 192 -if "audio file.wav"

; -vbrhi
: High quality [[VBR]] mode. Can be used with -br switch to select base BitRate. If -br is not specified, it takes as default 64kbps/channel. Example: aacenc -br 192 -vbrhi -if "audio file.wav"

; -vr
: Lower quality [[VBR]] mode. Recommended for internet streaming. Example: aacenc -vr -if "audio file.wav"

; -c
: LowPassFilter cut-off (in Hertz). Not recommended. Example: aacenc -br 128 -c 15995 -if "audio file.wav"

; -qual
: Encoder quality level (1 to 9). 9 is usually taken as default, but you can use smaller numbers if you need high speed and high quality isn't essential. Example: aacenc -br 192 -qual 9 -if "audio file.wav"

; -adif
: Use adif instead of adts (default) headers. For compatibility with some decoder software and hardware players. Example: aacenc -br 192 -adif -if "audio file.wav"

; -nh
: No headers (raw iso aac stream). For decoder compatibility. Example: aacenc -br 192 -nh -if "audio file.wav"

; -profile "x"
: Choose iso aac encoding profile:
:: 0 - low complexity (default, recommended)
:: 1 - main (not recommended, buggy)
:: 2 - main ltp (mpeg-4 only)

: Only lc profile is playable on hardware players so far. Example: aacenc -br 192 -profile 2 -if "audio file.wav"

; -ihsc
: Improved human speech coding. Best for human voice encoding. Not recommended for low Bitrates or [[CBR]] coding. Example: aacenc -vbrhi -br 192 -ihsc -if "audio file.wav"

; -low_ath
: Tells encoder to use highest sensitivity threshold of audibility. Not recommended on Bitrates lower than 192kbps. Example: aacenc -br 192 -low_ath -if "audio file.wav"

; -pns
: (perceptual noise substitution) - Improves the quality at very low Bitrates. Should be used only at 64kbps or less. Example: aacenc -br 56 -pns -if "audio file.wav"
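As a sketch of how the switches above combine, the following Python helper assembles an argument list for aacenc. The function name and its parameters are hypothetical; only the switches -br, -vbrhi, -qual, -if and -of are taken from the list above, and the ordering simply follows the examples given there.

```python
def build_aacenc_args(input_wav, bitrate_kbps=None, vbr_high=False,
                      quality_level=None, output=None):
    """Assemble an aacenc argument list from a few documented switches."""
    args = ["aacenc"]
    if bitrate_kbps is not None:
        args += ["-br", str(bitrate_kbps)]       # bitrate in kbps
    if vbr_high:
        args.append("-vbrhi")                    # high quality VBR mode
    if quality_level is not None:
        if not 1 <= quality_level <= 9:
            raise ValueError("quality_level must be between 1 and 9")
        args += ["-qual", str(quality_level)]
    args += ["-if", input_wav]                   # input must be a WAV file
    if output is not None:
        args += ["-of", output]                  # optional output name
    return args


print(build_aacenc_args("audio file.wav", bitrate_kbps=192, vbr_high=True))
```

Passing the result as a list to a process launcher avoids having to quote file names that contain spaces.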

==Imagine==

Imagine Technology provided an [[MPEG-4]] LC AAC plugin for [[Adobe Audition]]. This plugin provided file input and output for the MPEG-4 AAC specification, defined in ISO/IEC 14496-3. After Imagine was bought by Ingenient Technologies, they stopped marketing the Audition plugin.

==Coding Technologies==

Coding Technologies (CT) is a Swedish/German company that works closely with FhG IIS in the development and research of new audio compression techniques.

They have distinguished themselves in the development of parametric coding methods, such as [[SBR]] and Parametric Stereo. SBR is the technology behind the quality boost in MP3pro and HE AAC/AACplus.

They have licensed their encoding and decoding tools to several companies, e.g. Real Networks and Magix.

There is also an encoder, [[Aacplusenc]], which is based on the Coding Technologies reference code.

==FhG==

[http://www.iis.fraunhofer.de/amm/techinf/aac/ Audio & Multimedia MPEG-2 AAC]

==Emuzed==

Emuzed develops and sells various products and technologies for the PC multimedia and embedded multimedia markets. They have ported and optimized codecs for MPEG-4 ASP and AAC LC for a chip vendor preparing to offer bundled multimedia hardware and software. More info can be found at their [http://www.emuzed.com/encoders.html encoders & decoders] page.

==NEC==

NEC Corporation has developed an LC AAC decoding algorithm for mobile devices. They have also developed a codec named MPEG-4 AAC Ext.1, which they claim decreases bitrate while maintaining the same audio quality. The new MPEG-4 AAC Ext.1 coding technology also features high compatibility with current MPEG-4 AAC. For more information, see [http://www.neceurope.com/release.asp?parentid=671&Area=1 NEC's press release].

==Panasonic==

Panasonic has developed an HE AAC codec together with NEC and Coding Technologies as described in
[http://www.telos-systems.com/techtalk/hosted/m4-in-30100%20(M4IF_HE_AAC_paper).pdf this MPEG Industry Forum paper].

==Real/Helix Producer==

RealNetworks has incorporated Coding Technologies/FhG's MPEG-4 AAC / aacPlus™ technology and software within RealNetworks’ software products. As a result, in the newest version of RealProducer 10, AAC has replaced [[ATRAC]]3 as the high bitrate audio codec, and that software can encode AAC files wrapped in the [[MP4]] container. In addition, the Producer SDK on Windows also includes HE-AAC encoding. More info can be found at [http://www.realnetworks.com/company/press/releases/2004/codingtech.html RealNetworks' press release], as well as Coding Technologies' [http://www.codingtechnologies.com/products/aacPlus.htm aacPlus page].

[[Category:Encoder/Decoder]]

Nero AAC

2011-09-17T09:47:53Z

Junh1024:

A commercial implementation of both LC AAC and HE AAC, Nero AAC is produced by Nero AG as part of their Nero Digital line of products. It is generally perceived to have the highest quality [[VBR]] LC AAC implementation (although [[QuickTime AAC]] beats it in [[CBR]] mode at 128kbps). The codec can also create HEv1/v2 AAC streams for extremely low bitrates and supports multi-channel surround sound encoding. As of May 2006, Nero AAC is available for free as a command line tool called "Nero Digital Audio".

==Command Line Options==

===Usage===
neroAacEnc.exe [options] -if <input-file> -of <output-file>

<input-file>: Path to source file to encode. The file must be in Microsoft WAV format and contain PCM data. Specify - to encode from stdin.

<output-file>: Path to output file to encode to, in MP4 format.

===Quality/Bitrate Control===

'''-q <number>''':
Enables "target quality" mode. <number> is floating-point number between 0 and 1.

'''-br <number>''':
Specifies "target bitrate" mode. <number> is target bitrate in bits per second.

'''-cbr <number>''':
Specifies "target bitrate (streaming)" mode. <number> is target bitrate in bits per second.

When none of above quality/bitrate options is used, the encoder defaults to equivalent of -q 0.5

===Multipass Encoding===
'''-2pass''':
Enables two-pass encoding mode. Note that two-pass mode requires a physical file as input, rather than stdin.

'''-2passperiod <number>''': Overrides two-pass encoding bitrate averaging period, in milliseconds. Specify zero to use least restrictive value possible (default).

===Advanced Features / Troubleshooting===
'''-lc''': Forces use of LC AAC profile (HE features disabled)

'''-he''': Forces use of HE AAC profile (HEv2 features disabled)

'''-hev2''': Forces use of HEv2 AAC profile

Note that the above switches (-lc, -he, -hev2) should not be used; optimal AAC profile is automatically determined from quality/bitrate settings when no override is specified.

'''-hinttrack''': Generates an RTP hint track in output MP4 file.

'''-ignorelength''': Ignores length signaled by WAV headers of input file. Useful for certain frontends using stdin.
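The options above can be combined programmatically. The following Python sketch builds an argument list for neroAacEnc.exe. The helper name and its parameters are hypothetical, and treating -q, -br and -cbr as mutually exclusive is an assumption based on the description of the three modes. Bitrates are converted from kbps to the bits per second that the encoder expects.

```python
def build_nero_args(input_wav, output_mp4, quality=None, bitrate_kbps=None,
                    cbr_kbps=None, two_pass=False):
    """Assemble a neroAacEnc.exe argument list from the documented switches."""
    modes = [m for m in (quality, bitrate_kbps, cbr_kbps) if m is not None]
    if len(modes) > 1:
        raise ValueError("choose only one of quality, bitrate_kbps, cbr_kbps")
    if two_pass and input_wav == "-":
        raise ValueError("two-pass mode needs a physical file, not stdin")

    args = ["neroAacEnc.exe"]
    if quality is not None:
        if not 0.0 <= quality <= 1.0:
            raise ValueError("quality must be between 0 and 1")
        args += ["-q", str(quality)]
    elif bitrate_kbps is not None:
        args += ["-br", str(int(bitrate_kbps * 1000))]   # bits per second
    elif cbr_kbps is not None:
        args += ["-cbr", str(int(cbr_kbps * 1000))]      # bits per second
    # With no mode given, the encoder itself defaults to -q 0.5.
    if two_pass:
        args.append("-2pass")
    args += ["-if", input_wav, "-of", output_mp4]
    return args


print(build_nero_args("in.wav", "out.mp4", quality=0.5))
```

The profile switches (-lc, -he, -hev2) are deliberately left out, since the encoder picks the profile automatically.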

==Example EAC Settings==

The following settings are examples of what one might use for music and spoken word sources, respectively. Files will be tagged properly using these commands via neroaactag.exe. These commands do not run ReplayGain on the files.

This is the syntax for older versions of EAC. For 1.0 beta 2 and above, variables have a changed syntax. See [http://wiki.hydrogenaudio.org/index.php?title=EAC_and_AAC EAC and AAC].

===320kbps CBR AAC (Forced LC)===
'''Program, including path, used for compression''': C:\Windows\system32\cmd.exe<br>
'''Additional Command Line Options''': /c ""C:\path\to\neroaacenc.exe" -cbr 320000 -lc -if %s -of %d && "C:\path\to\Neroaactag.exe" %d -meta:artist="%a" -meta:album="%g" -meta:track="%n" -meta:title="%t" -meta:genre="%m" -meta:year="%y""

===Q0.3 VBR AAC (Forced HC)===
'''Program, including path, used for compression''': C:\Windows\system32\cmd.exe<br>
'''Additional Command Line Options''': /c ""C:\Program Files\Exact Audio Copy\neroaacenc.exe" -q 0.3 -hc -if %s -of %d && "C:\Program Files\Exact Audio Copy\Neroaactag.exe" %d -meta:artist="%a" -meta:album="%g" -meta:track="%n" -meta:title="%t" -meta:genre="%m" -meta:year="%y""
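In the command lines above, EAC replaces each placeholder before running the command: judging from their positions, %s is the source file, %d the destination file, and %a, %g, %n, %t, %m and %y fill the artist, album, track, title, genre and year tags. The Python sketch below only mimics that substitution to show what the encoder ends up receiving; it is not how EAC is implemented, and the function name and sample values are hypothetical.

```python
import re


def expand_placeholders(template, values):
    """Replace %x placeholders with values, leaving unknown ones untouched."""
    def substitute(match):
        key = match.group(1)
        return values.get(key, match.group(0))
    return re.sub(r"%([a-z])", substitute, template)


template = '-q 0.3 -if %s -of %d -meta:artist="%a" -meta:title="%t"'
values = {"s": "track01.wav", "d": "track01.m4a",
          "a": "Some Artist", "t": "Some Title"}
print(expand_placeholders(template, values))
```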

== Nero AAC in Foobar ==

The Nero AAC Codec can be used inside Foobar2000's convert function.

== References ==

[http://www.hydrogenaudio.org/forums/index.php?showtopic=44310 Recommended Settings Sticky]

[http://www.hydrogenaudio.org/forums/index.php?showtopic=44283 Discussion Thread for the Recommended Setting Sticky]

== External References ==
[http://www.nero.com/eng/technologies-aac-codec.html Nero AAC Website].

User:Junh1024

2011-09-17T08:51:43Z

Junh1024:

Hi, I am junh1024 and I specialize in AAC implementations and audio analysis.

I also edit the [http://mewiki.project357.com/wiki/Computer_movie_files/Audio AAC section] at the MeGUI wiki

MainConcept AAC

2011-09-17T08:48:02Z

Junh1024: Created page with "MainConcept AAC is an AAC encoder that is potentially better than Apple AAC and Nero AAC. It is only available as commercialware under Mainconcept Reference and the Mainconcept …"

MainConcept AAC is an AAC encoder that is potentially better than Apple AAC and Nero AAC.

It is only available as commercialware under Mainconcept Reference and the Mainconcept AAC packages for Adobe Creative Suite.

Unique to this encoder in the MainConcept Reference GUI is the ability to change the cutoff independently of bitrate. In typical encoders, the cutoff is a function of bitrate and cannot be modified directly.

Advanced Audio Coding

2011-09-17T08:43:25Z

Junh1024: small updates

= Introduction =
'''Advanced Audio Coding''' ('''AAC''') forms part of the latest specifications from the MPEG committee, and is their official successor to the popular [[MP3]] format. As with MP3, the AAC format is an international standard, and is backed by several big-name companies, including Dolby, Sony and Nokia.

With the 13 years that had passed since the creation of the MP3 format, many improvements had been realised leading to a seemingly complex specification with several flavours of AAC available. To potentially add to the confusion, AAC is usually wrapped inside an [[MP4]] container to provide tagging and seeking benefits. For this reason, AAC can also be referred to as MP4 audio.

There are several AAC encoders to choose from, coming from large names such as Apple ([[iTunes]] and [[QuickTime AAC]]), Real Networks and Nero AG (Creators of Nero Burning Rom), or the open source [http://www.audiocoding.com FAAC] which is analogous to the [[LAME]] encoder. AAC is supported on some hardware players, most notably the [[Apple iPod]] and some cell phones, and is available in Apple's online store.

In terms of quality, the AAC format is on par with (Ogg) [[Vorbis]], [[LAME]] MP3, [[WMA]] Pro and other modern codecs, and with added SBR coding (HE AAC) it can provide quite high quality at low bitrates.

Recent developments have led to [[AACplus]] which is able to give subjectively good results at low bitrates. The website [http://www.tuner2.com Tuner2] has several Internet radio stations which are sending out streams at low rates – such as 40 kbps – and some of these are surprisingly good considering the bit rates used.

== Pros ==
* An international standard approved by the [http://www.iso.ch ISO]
* Flexible: supports several [[sampling rate]]s (8000–96000 Hz), bit depths, and [[multichannel]] (up to 48 channels)
* Several implementations, including free and high quality ones ([http://www.itunes.com iTunes] or [http://www.nero.com/nerodigital/eng/Nero_Digital_Audio.html Nero Digital])
* Reaches transparency in most samples and for most users at around 150 kbps
* Part of [[MPEG-4]] specs
* Anyone can create their own implementation (specifications and demo sources available)
* Some portable players support it (Philips Expanium, [[Apple iPod]], cell phones from Nokia, Sony Jukebox)

== Cons ==
* Problem cases that trip up all transform codecs
* Heavily patented
* Increased complexity
* '''AAC''' comes in different "flavors" (object types: '''AAC LC''', '''AAC HE''', '''AAC PS''' etc.). Many (especially portable) players only support LC (at the moment), so you can have files that are valid but that your player won't play, or will play at reduced quality.

== Technical Information ==
'''AAC''' stands for 'Advanced Audio Coding' and is part of the [[MPEG-4]] Systems Standard. Originally known as MPEG-2 Non-Backwards Compatible (as opposed to MPEG-2 Backwards Compatible), it is the successor to MPEG-1/2 Layer III ([[MP3]]). It uses the [[MP4]] [[container]] (which is based on Apple's [[MOV]] container) to store metadata (i.e. tag information).

As part of the MPEG-4 Systems Standard, an '''AAC''' encoded file can include up to 48 full-bandwidth audio channels (up to 96 kHz) and 15 Low Frequency Enhancement channels (limited to 120 Hz) plus 15 data streams.

'''AAC''' encoding methods are organised into Profiles (MPEG-2) or Object Types (MPEG-4). These different Object Types are not necessarily compatible with each other and may not be playable with various decoders. Some of the various Object Types are:

* MPEG-2 AAC LC / Low Complexity
* MPEG-2 AAC Main
* MPEG-2 AAC SSR / Scalable Sampling Rate
* MPEG-4 AAC LC / Low Complexity
* MPEG-4 AAC Main
* MPEG-4 AAC SSR / Scalable Sampling Rate
* MPEG-4 AAC LTP / Long Term Prediction
* MPEG-4 AAC HE / High Efficiency
* MPEG-4 AAC LD / Low Delay

Different Object Types vary in complexity, and some take longer to encode/decode as a result. Furthermore, the benefits of the more complex profiles are often not worth the CPU power required to encode/decode them. As a result, the Low Complexity/LC Object Type has become the profile used by most encoders and supported by most decoders. However, the High Efficiency (HE) Object Type has become more popular recently with its addition to the Nero and QuickTime '''AAC''' encoders.

Currently all players support the LC Object Type, although some will work on only MPEG2 or MPEG4 streams. Players based on the FAAD2 decoder (eg. [[foobar2000]], [[Winamp]] plugins) support almost all Object Types including HE '''AAC'''. 3ivX also supports all Object Types except SSR.

== Technologies used for compression ==
* [[Huffman coding]]
* [[Quantization]] and scaling
* [[Joint stereo|M/S matrixing]]
* [[Intensity stereo]]
* Channel coupling
* Backward adaptive prediction
* Temporal Noise Shaping (TNS)
* Modified Discrete Cosine Transform (I[[MDCT]])
* Gain control and hybrid filter bank (polyphase quadrature filter (IPQF)+IMDCT)
* Long Term Prediction (LTP) – MPEG4 '''AAC''' only
* Perceptual Noise Substitution (PNS) – MPEG4 '''AAC''' only
* Spectral Band Replication ([[SBR]]) – HE '''AAC'''
* Parametric Stereo (PS) – HE '''AAC'''

== Encoders / Decoders (Supported Platforms) ==
* [[Nero AAC]] (Win32 and [[Linux_and_Nero_AAC|Linux under Wine]])
* [[QuickTime AAC]] (Win32/MacOS X)
* [[FAAC]] [[FAAD]] (Multiplatform)
* HHI/zPlane [[Compaact!]] (Win32)
* [[PsyTEL]] (Win32)
* [[aacplusenc]] (Multiplatform)

== External References ==
* [[AAC FAQ]]
* Known [[AAC implementations]].
* Read the [[AAC guide]] to learn how to obtain '''AAC'''/[[MP4]] files out of WAV files and CDs.
* Detailed '''AAC''' comparisons can be found at [http://www.rjamorim.com/test/ Roberto's listening tests page].

[[Category:Codecs]]
[[Category:Lossy]]

Apple AAC

2011-09-17T08:00:14Z

Junh1024:

'''QuickTime AAC''' is known to be one of the highest quality medium-bitrate [[CBR]] and [[VBR]] LC [[AAC]] encoders and is also another commercial AAC implementation. The current version of QuickTime supports [[Multichannel]] encoding up to 8 channels, and HE-AAC v1 with [[SBR]].

Although this is part of the commercial QuickTime Pro, VBR and CBR modes are available for free through [http://www.apple.com/itunes/ iTunes] (with a maximum [[sampling rate]] of 44.1 kHz for iTunes 10+ and 48 kHz for [http://www.oldapps.com/itunes.php?old_itunes=61 iTunes 9-]).

Free third-party CLI interfaces such as [http://sites.google.com/site/qaacpage/ qAAC] expose more modes such as TVBR (equivalent to quality modes in other encoders) and CVBR ("constrained VBR").

In recent tests for [http://listening-tests.hydrogenaudio.org/igorc/results.html 64kbps HE-AAC] and [http://listening-tests.hydrogenaudio.org/igorc/aac-96-a/results.html 96kbps LC-AAC] Quicktime has consistently come out near the top. This also applies to higher bitrates such as 128kbps and is likely to apply to higher bitrates such as 160kbps and beyond.

In 2009, Quicktime was updated to 7.6 which [http://support.apple.com/kb/HT3292 "Improves AAC encoding fidelity"]. The bitrate distribution algorithm was also [http://forums.macrumors.com/showpost.php?p=6986639&postcount=70 changed].

==External links==
*[http://www.apple.com/quicktime Apple's Quicktime Website]

{{stub}}

User:Junh1024

2011-09-17T07:42:48Z

Junh1024: Created page with "Hi, I am junh1024 and I specialize in AAC implementations and audio analysis."

Hi, I am junh1024 and I specialize in AAC implementations and audio analysis.

Apple AAC

2011-09-17T07:41:29Z

Junh1024:

'''QuickTime AAC''' is known to be one of the highest quality medium-bitrate [[CBR]] and [[VBR]] LC [[AAC]] encoders and is also another commercial AAC implementation. The current version of QuickTime supports [[Multichannel]] encoding up to 8 channels, and HE-AAC v1 with [[SBR]].

Although this is part of the commercial QuickTime Pro, VBR and CBR modes are available for free through [http://www.apple.com/itunes/ iTunes] (with a maximum [[sampling rate]] of 44.1 kHz for iTunes 10+ and 48 kHz for [http://www.oldapps.com/itunes.php?old_itunes=61 iTunes 9-]).

Free third-party CLI interfaces such as [http://sites.google.com/site/qaacpage/ qAAC] expose more modes such as TVBR (equivalent to quality modes in other encoders) and CVBR ("constrained VBR").

In recent tests for [http://listening-tests.hydrogenaudio.org/igorc/results.html 64kbps HE-AAC] and [http://listening-tests.hydrogenaudio.org/igorc/aac-96-a/results.html 96kbps LC-AAC] Quicktime has consistently come out near the top. This also applies to higher bitrates such as 128kbps and is likely to apply to higher bitrates such as 160kbps and beyond.

In 2009, the core AAC encoding algorithm was updated with "higher fidelity". The bitrate distribution algorithm was also changed.

==External links==
*[http://www.apple.com/quicktime Apple's Quicktime Website]

{{stub}}

Apple AAC

2011-09-17T06:19:47Z

Junh1024:

'''QuickTime AAC''' is known to be one of the highest quality medium-bitrate [[CBR]] and [[VBR]] LC [[AAC]] encoders and is also another commercial AAC implementation. The current version of QuickTime supports [[Multichannel]] encoding up to 8 channels, and HE-AAC v1 with [[SBR]].

Although this is a commercial implementation and part of the QuickTime Pro package, VBR and CBR modes (with a maximum [[sampling rate]] of 44.1 kHz for iTunes 10+ and 48 kHz for iTunes 9-) are available for free through [http://www.apple.com/itunes/ iTunes].

Free Third-party interfaces such as [http://sites.google.com/site/qaacpage/ qAAC] expose more modes such as TVBR (equivalent to quality modes in other encoders) and CVBR ("constrained VBR")

Multichannel channel mapping is buggy but can be remedied by installing an older version of Quicktime (7.68-)

In recent tests for [http://listening-tests.hydrogenaudio.org/igorc/results.html 64kbps HE-AAC] and [http://listening-tests.hydrogenaudio.org/igorc/aac-96-a/results.html 96kbps LC-AAC] Quicktime has consistently come out near the top.

In 2009, the core AAC encoding algorithm was updated with "higher fidelity". The bitrate distribution algorithm was also changed.

More information can be found at the [http://www.apple.com/quicktime Apple Quicktime Website]

{{stub}}

Apple AAC

2011-09-17T06:14:25Z

Junh1024:

'''QuickTime AAC''' is known to be one of the highest quality medium-bitrate [[CBR]] and [[VBR]] LC [[AAC]] encoders and is also another commercial [[AAC]] implementation. The current version of QuickTime supports Multichannel encoding up to 8 channels, and HE-AAC v1 with [[SBR]].

Although this is a commercial implementation and part of the QuickTime Pro package, VBR and CBR modes (with a maximum [[sampling rate]] of 44.1 kHz for iTunes 10+ and 48 kHz for iTunes 9-) are available for free through [http://www.apple.com/itunes/ iTunes].

Third-party interfaces such as [http://sites.google.com/site/qaacpage/ qAAC] expose more modes such as TVBR (equivalent to quality modes in other encoders) and CVBR ("constrained VBR").

Multichannel channel mapping is buggy but can be remedied by installing an older version of QuickTime (7.68-).

More information can be found at the [http://www.apple.com/quicktime Apple Quicktime Website]

{{stub}}

QuickTime

2011-09-17T06:06:42Z

Junh1024: HE supp.

QuickTime is a multimedia platform developed by Apple Computer for their MacOS operating system.

Its best-known interface is the QuickTime Player, which is also available for Windows.

The latest version of QuickTime can decode [[multichannel]] AAC and HE-[[AAC]] with [[SBR]], but HE-AAC v2 files with [[PS]] will fall back to [[mono]] decoding.

http://www.quicktime.com

{{stub}}

MP3

2011-09-17T04:42:32Z

Junh1024:

'''MPEG-1 Audio Layer 3''', more commonly referred to as MP3, is a popular digital audio encoding and lossy compression format, designed to greatly reduce the amount of data required to represent audio, yet still sound like a faithful reproduction of the original uncompressed audio to most listeners. It was invented by a team of European engineers who worked in the framework of the EUREKA 147 DAB digital radio research program, and it became an ISO/IEC standard in 1991.

== History ==
The MP3 algorithm development started in 1987, with a joint cooperation of [http://www.iis.fraunhofer.de/ Fraunhofer IIS-A] and the University of Erlangen. It is standardized as ISO-MPEG Audio Layer-3 (IS 11172-3 and IS 13818-3).

It soon became the de facto standard for lossy audio encoding, due to the high [[compression rates]] (1/11 of the original size, still retaining considerable quality), the high availability of decoders and the low CPU requirements for playback (a 486 DX2-100 is enough for real-time decoding).

It supports [[multichannel]] files (see [http://www.mp3surround-format.com/ page]) and [[sampling rate]]s from 16 kHz to 24 kHz (MPEG2 Layer 3) and from 32 kHz to 48 kHz (MPEG1 Layer 3).

Formal and informal listening tests have shown that MP3 in the 160-224 kbps range provides encoded results indistinguishable from the original material in most cases.

== Encoding and decoding ==
=== Encoding of MP3 audio ===
The MPEG-1 standard does not include a precise specification for an MP3 encoder. The decoding algorithm and file format, by contrast, are well defined. Implementers of the standard were supposed to devise their own algorithms suitable for removing parts of the information in the raw audio (or rather its MDCT representation in the frequency domain). During encoding, 576 time-domain samples are taken and transformed into 576 frequency-domain samples. If there is a transient, 192 samples are taken instead of 576. This is done to limit the temporal spread of the quantization noise accompanying the transient.

This is the domain of psychoacoustics: the study of subjective human perception of sounds.

As a result, there are many different MP3 encoders available, each producing files of differing quality. Comparisons are widely available, so it is easy for a prospective user of an encoder to research the best choice. It must be kept in mind that an encoder that is proficient at encoding at higher bitrates (such as LAME, which is in widespread use for encoding at higher bitrates) is not necessarily as good at other, lower bitrates.

=== Decoding of MP3 audio ===
Decoding, on the other hand, is carefully defined in the standard. Most decoders are "bitstream compliant", meaning that the decompressed output they produce from a given MP3 file will be the same (within a specified degree of rounding tolerance) as the output specified mathematically in the ISO/IEC standard document. The MP3 file has a standard format which is a frame consisting of 384, 576, or 1152 samples (depends on MPEG version and layer) and all the frames have associated header information (32 bits) and side information (9, 17, or 32 bytes, depending on MPEG version and stereo/mono). The header and side information help the decoder to decode the associated Huffman encoded data correctly.

Therefore, for the most part, comparison of decoders is almost exclusively based on how computationally efficient they are (i.e., how much memory or CPU time they use in the decoding process).
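
The header and side-information sizes quoted above can be tabulated in a few lines. This is only a minimal Python sketch: the names <code>side_info_size</code> and <code>audio_data_offset</code> are made up for illustration, the assignment of the 32/17/9-byte sizes to MPEG version and channel count follows the commonly documented Layer 3 layout (the text above only lists the three values), and the optional 2-byte CRC after the header is an assumption not mentioned above.

```python
HEADER_BYTES = 4  # the 32-bit frame header


def side_info_size(mpeg_version: int, channels: int) -> int:
    """Return the Layer 3 side-information size in bytes.

    Assumed layout: MPEG-1 uses 32 bytes (stereo) or 17 bytes (mono),
    MPEG-2 uses 17 bytes (stereo) or 9 bytes (mono).
    """
    if mpeg_version not in (1, 2) or channels not in (1, 2):
        raise ValueError("expected MPEG version 1 or 2 and 1 or 2 channels")
    if mpeg_version == 1:
        return 32 if channels == 2 else 17
    return 17 if channels == 2 else 9


def audio_data_offset(mpeg_version: int, channels: int, has_crc: bool = False) -> int:
    """Byte offset, from the start of the frame, of the Huffman-coded data."""
    crc_bytes = 2 if has_crc else 0  # assumption: 16-bit CRC follows the header
    return HEADER_BYTES + crc_bytes + side_info_size(mpeg_version, channels)
```

For an MPEG-1 stereo frame without CRC, <code>audio_data_offset(1, 2)</code> places the coded audio 36 bytes into the frame.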

== MP3 file structure ==
[[Image:MP3 file structure.png|thumb|right|500px|Breakdown of an MP3 File's Structure]]
An MP3 file is made up of multiple MP3 frames which consist of the MP3 header and the MP3 data. This sequence of frames is called an Elementary stream. Frames are independent items: one can cut the frames from a file and an MP3 player would be able to play it. The MP3 data is the actual audio payload. The diagram shows that the MP3 header consists of a sync word which is used to identify the beginning of a valid frame. This is followed by a bit indicating that this is the MPEG standard and two bits that indicate that layer 3 is being used, hence MPEG-1 Audio Layer 3 or MP3. After this, the values will differ depending on the MP3 file. The range of values for each section of the header along with the specification of the header is defined by ISO/IEC 11172-3.
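
As a rough illustration of the header layout described above, the following Python sketch decodes the 4-byte header of an MPEG-1 Layer 3 frame. The function name and the returned dictionary keys are hypothetical. The sketch also assumes the widespread convention of an 11-bit sync word followed by a 2-bit version field (for MPEG-1 this is bit-for-bit the same as a 12-bit sync word plus one ID bit), and it only handles MPEG-1 Layer 3 with a fixed (non-free-format) bitrate.

```python
# Lookup tables for MPEG-1 Layer III only (index 0 = free format, 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_mpeg1_layer3_header(header: bytes) -> dict:
    """Decode the 32-bit header of an MPEG-1 Layer III frame."""
    if len(header) < 4:
        raise ValueError("a frame header is 4 bytes long")
    word = int.from_bytes(header[:4], "big")

    if (word >> 21) & 0x7FF != 0x7FF:        # 11 sync bits, all set
        raise ValueError("sync word not found")
    if (word >> 19) & 0x3 != 0b11:           # version field: 11 = MPEG-1
        raise ValueError("not an MPEG-1 frame")
    if (word >> 17) & 0x3 != 0b01:           # layer field: 01 = Layer III
        raise ValueError("not a Layer III frame")

    bitrate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or invalid bitrate/sampling rate")
    padding = (word >> 9) & 0x1

    return {
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "padding": padding,
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0x3],
        # 1152 samples per frame / 8 bits per byte = 144
        "frame_length_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }
```

For example, the header bytes <code>FF FB 90 64</code> describe a 128 kbps, 44.1 kHz joint-stereo frame.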

Most MP3 files today contain ID3 metadata which precedes or follows the MP3 frames; this is also shown in the diagram.

===VBRI, XING, and LAME headers===
MP3 files often begin with a single frame of silence which contains an extra header that, when supported by decoders, results in the entire frame being treated as informational instead of being played (although some are known to do both). The extra header is in the frame's data section, before the actual silent audio data, and was originally intended to help with the playback of VBR files.

Xing and Fraunhofer each developed their own formats for this header. The Xing-format header is just called the ''Xing header'' or ''XING header''. The Fraunhofer-format header is called the ''VBRI header'' or ''VBR Info header''.

Both formats specify a table of seek points which help players jump to approximate points in the file. In addition to the seek-point table, the Fraunhofer format contains a combined encoder delay & padding value (measured in samples), which can assist [[gapless playback]].

The [[LAME]] encoder further extended the Xing format to work for CBR and to include encoder settings and separate delay & padding values. This version of the Xing header is often called a ''LAME header'' or ''LAME tag'', although according to the spec, the actual LAME tag is only the portion that's different from a normal Xing header. It has an explicit specification, but the Xing and Fraunhofer formats can only be inferred from the C code the companies provided to read the headers:
* [http://gabriel.mp3-tech.org/mp3infotag.html LAME MP3 Info Tag spec]
* [http://www.iis.fraunhofer.de/bf/amm/download/MP3%20VBR-Header%20Software%20Development%20Kit.zip Fraunhofer MP3 VBR-Header SDK]
* [http://www.mp3-tech.org/programmer/sources/vbrheadersdk.zip Xing Variable Bitrate MP3 Playback SDK]
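
A player can look for these headers in the first frame roughly as sketched below. This is not taken from any of the SDKs listed above; the function name is hypothetical, and the sketch assumes the usual conventions: the Xing tag (written as <code>Info</code> by LAME for CBR files) sits directly after the side information, the VBRI tag sits at a fixed 32 bytes after the 4-byte frame header, and the frame carries no CRC.

```python
def find_vbr_header(frame: bytes, side_info_bytes: int):
    """Return (tag, offset) for a Xing/Info/VBRI header, or None.

    frame           -- the bytes of the first MP3 frame, starting at its header
    side_info_bytes -- 9, 17 or 32, depending on MPEG version and channel mode
    """
    xing_offset = 4 + side_info_bytes          # right after header + side info
    tag = frame[xing_offset:xing_offset + 4]
    if tag in (b"Xing", b"Info"):
        return tag.decode("ascii"), xing_offset

    vbri_offset = 4 + 32                       # fixed position for VBRI
    if frame[vbri_offset:vbri_offset + 4] == b"VBRI":
        return "VBRI", vbri_offset

    return None
```

Only the tag is located here; reading the seek table, delay and padding fields requires the field layouts from the specifications linked above.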

== Technical information ==
=== Codec block diagram ===
A basic functional block diagram of the MPEG1 layer 3 audio codec is shown below.
[[Image:Layer3_block.png|frame|center|Block diagram of the MPEG1 layer 3 audio]]

=== The hybrid polyphase filterbank ===

The polyphase [[filterbank]] is the key component common to all layers of MPEG1 audio compression. The purpose of the polyphase filterbank is to divide the audio signal into 32 equal-width [[frequency]] [[subband]]s, by using a set of [[bandpass filters]] covering the entire audio frequency range (a set of 512 tap FIR Filters).

====Polyphase Filterbank Formula====
[[Image:Poly_samples.png|frame|center|Polyphase filterbank]]

Audio is processed in frames of 1152 samples per audio channel. The polyphase filter produces 3 groups of 12 samples (3x12=36 samples) per subband, as seen in the picture above (3x12x32 subbands=1152 samples).

The polyphase filter bank and its inverse are not [[lossless]] transformations. Even without [[quantize|quantization]], the inverse transformation cannot perfectly recover the original signal. However, by design, the error introduced by the filter bank is small and inaudible.<br /><br />[[Image:Mdct.png|frame|center|MDCT]]<br />MDCT formula: <math> X(m)= \sum_{k=0}^{n-1}f(k)x(k)\cos \left[ {\pi \over {2n}} \left( 2k+1+{n \over 2} \right) (2m+1) \right],~m=0, \ldots, {n \over 2}-1</math><br />where <math>x(k)</math> are the <math>n</math> input samples of the block and <math>f(k)</math> is the window function.

Layer 3 compensates for some of the filter bank deficiencies by processing the filter bank output with a Modified Discrete Cosine Transform ([[MDCT]]). The polyphase [[filterbank]] and the MDCT are together called the hybrid filterbank. The hybrid filterbank adapts to the signal characteristics (for example, by block switching depending on the signal).

The 32 [[subband]] signals are subdivided further in frequency content by applying an 18-spectral-point or 6-spectral-point MDCT. Layer 3 specifies two different MDCT block lengths: a long block (18 spectral points) or a short block (6 spectral points).

Long blocks have a higher frequency resolution. Each subband is transformed into 18 spectral coefficients by MDCT, yielding a maximum of 576 spectral coefficients (32x18=576 spectral lines) each representing a bandwidth of 41.67Hz at 48kHz sampling rate. At 48kHz sampling rate a long block has a time resolution of about x ms. There is a 50% overlap between successive transform windows, so the window size is 36 for long blocks.

Short blocks have a higher time resolution. A short block is one third the length of a long block and is used for transients to provide better time (temporal) resolution. Each subband is transformed into 6 spectral coefficients by MDCT, yielding a maximum of 192 spectral coefficients (32x6=192 spectral lines), each representing a bandwidth of 125Hz at 48kHz [[sampling rate]]. At 48kHz sampling rate a short block has an impulse response of 18.6ms. There is a 50% overlap between successive transform windows, so the window size is 12 for short blocks.
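
The bandwidth figures for long and short blocks follow directly from the 32 equal-width subbands and the number of MDCT lines per subband. The arithmetic can be checked with a short Python sketch (the function name is illustrative only):

```python
def hybrid_filterbank_resolution(sample_rate_hz: float, lines_per_subband: int):
    """Return (bandwidth per spectral line in Hz, total spectral lines).

    The polyphase filterbank splits 0 .. sample_rate/2 into 32 equal subbands;
    the MDCT then splits each subband into `lines_per_subband` lines
    (18 for long blocks, 6 for short blocks).
    """
    subband_width_hz = sample_rate_hz / 2 / 32
    return subband_width_hz / lines_per_subband, 32 * lines_per_subband


long_hz, long_lines = hybrid_filterbank_resolution(48000, 18)
short_hz, short_lines = hybrid_filterbank_resolution(48000, 6)
```

At 48 kHz each subband is 750 Hz wide, which gives about 41.67 Hz per line for long blocks (576 lines) and 125 Hz per line for short blocks (192 lines).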

Time resolution of long blocks and time resolution of short blocks are not constants, but jitter depending on the position of the sample in the transformed block. See [http://hydrogenaudio.org/musepack/klemm/www.personal.uni-jena.de/~pfk/mpp/timeres.html here] for diagrams showing the average time resolutions of different codecs.

[[Image:Freqlines.png|center|frame|Psychoacoustic-MDCT]]

Block switching ([[MDCT]] window switching) is triggered by the [[Psychoacoustic|psychoacoustic]] model.

For a given frame of 1152 samples, the MDCTs can all have the same block length (long or short) or use a mixed-block mode (mixed-block mode for LAME is in development).

Unlike the polyphase [[filterbank]] on its own, the MDCT transformation is [[lossless]] in the absence of quantization.
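
The lossless behaviour of the MDCT can be demonstrated numerically. The Python sketch below applies the MDCT formula given earlier to 50%-overlapping blocks, inverts it and overlap-adds the results. It is an illustration under assumptions, not the Layer 3 implementation: it uses a sine window for both analysis and synthesis, an inverse transform scaled by 2/(n/2), and zero padding at both ends of the signal; all function names are made up for this example.

```python
import math


def sine_window(n):
    """Sine window of length n; satisfies w[k]^2 + w[k + n/2]^2 = 1."""
    return [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]


def mdct(block, window):
    """n windowed input samples -> n/2 coefficients."""
    n = len(block)
    return [
        sum(window[k] * block[k]
            * math.cos(math.pi / (2 * n) * (2 * k + 1 + n / 2) * (2 * m + 1))
            for k in range(n))
        for m in range(n // 2)
    ]


def imdct(coeffs, window):
    """n/2 coefficients -> n windowed, time-aliased output samples."""
    half = len(coeffs)
    n = 2 * half
    return [
        window[k] * (2.0 / half)
        * sum(coeffs[m]
              * math.cos(math.pi / (2 * n) * (2 * k + 1 + n / 2) * (2 * m + 1))
              for m in range(half))
        for k in range(n)
    ]


def analyse_and_resynthesise(signal, n=36):
    """MDCT every 50%-overlapping block, invert, and overlap-add."""
    half = n // 2
    if len(signal) % half:
        raise ValueError("signal length must be a multiple of n/2")
    window = sine_window(n)
    padded = [0.0] * half + list(signal) + [0.0] * half
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - n + 1, half):
        block = padded[start:start + n]
        for k, value in enumerate(imdct(mdct(block, window), window)):
            output[start + k] += value      # aliasing cancels between neighbours
    return output[half:half + len(signal)]
```

Each individual inverse transform contains time-domain aliasing; it is only the sum of two neighbouring blocks that restores the input, which is why the windows must overlap by 50%.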

Once the MDCT converts the audio signal into the [[frequency domain]], the [[aliasing]] introduced by the subsampling in the filterbank can be partially cancelled. The decoder has to undo this so that the inverse MDCT can reconstruct the [[subband]] samples in their original aliased form for reconstruction by the synthesis filterbank.

=== The psychoacoustic model ===

This section is a work in progress. It is incomplete and data is still being gathered.

==== Concepts ====
;[[Critical band]]s
: Much of what is done in simultaneous [[masking]] is based on the existence of critical bands. Human hearing works much like a non-uniform filterbank, and the critical bands can be said to approximate the characteristics of those filters. Critical bands do not really have specific "on" and "off" frequencies, but rather a width that is a function of [[frequency]] - critical [[bandwidth]]s.

;Tonality estimation

;Spreading function
: Masking does not only occur within the [[critical band]], but also spreads to neighboring bands. A spreading function SF(z,a) can be defined, where z is the frequency and a the amplitude of a masker. This function would give a masking threshold produced by a single masker for neighboring frequencies. The simplest function would be a triangular function with slopes of +25 and -10 dB / [[Bark]], but a more sophisticated one is highly nonlinear and depends on both frequency and amplitude of masker.

;Simultaneous masking
: Simultaneous [[masking]] is a frequency domain phenomenon where a low-level signal (the maskee), e.g., narrow-band noise, can be made inaudible by a simultaneously occurring stronger signal (the masker), e.g., a pure tone, if masker and maskee are close enough to each other in frequency. A masking threshold can be measured below which any signal will not be audible. The masking threshold depends on the sound pressure level (SPL) and the frequency of the masker, and on the characteristics of the masker and maskee. The slope of the masking threshold is steeper towards lower frequencies, i.e., higher frequencies are more easily masked.

: Without a masker, a signal is inaudible if its SPL is below the threshold of quiet, which depends on frequency and covers a dynamic range of more than 60 dB. So far only masking by a single masker has been described. If the source signal consists of many simultaneous maskers, a global masking threshold can be computed that describes the threshold of just noticeable distortions as a function of frequency. The calculation of the global masking threshold is based on the high resolution short term [[frequency|amplitude]] spectrum of the audio or speech signal, sufficient for critical band based analysis, and is determined in audio coding via a 512 or 1024 point FFT. In a first step all individual masking thresholds are calculated, depending on signal level, type of masker (noise or tone), and frequency range. Next the global masking threshold is determined by adding all individual thresholds and the threshold in quiet (adding this latter threshold ensures that the computed global masking threshold is not below the threshold in quiet). The effects of masking reaching over [[critical band]] bounds must be included in the calculation. Finally the global signal-to-mask ratio (SMR) is determined as the ratio of the maximum signal power to the global masking threshold.

;Temporal masking
: In addition to simultaneous [[masking]], two [[time domain]] phenomena also play an important role in human auditory perception: pre-masking and post-masking. The temporal masking effects occur before and after a masking signal has been switched on and off, respectively. The duration over which pre-masking applies is less than (or, as newer results indicate, significantly less than) one tenth that of post-masking, which is in the order of 50 to 200 ms. Both pre- and post-masking are exploited in the ISO/MPEG audio coding algorithm.

: The perceptual model uses either a separate [[filterbank]] or combines the calculation of energy values (for the masking calculations) with the main filter bank. The output of the perceptual model consists of values for the masking threshold or the allowed noise for each coder partition. If the quantization noise can be kept below the masking threshold, then the compression results should be indistinguishable from the original signal.

;[[ATH]]

;[[Masking]] threshold
: Masking raises the threshold of hearing, and compressors take advantage of this effect by raising the noise floor, which allows the audio waveform to be expressed with fewer bits. The noise floor can only be raised at [[frequency|frequencies]] at which there is effective masking.

: The equal widths of the [[subband]]s do not accurately reflect the human auditory system's frequency dependent behavior. The width of a "[[critical band]]" as a function of frequency is a good indicator of this behavior. Many psychoacoustic effects are consistent with a critical band frequency scaling. For example, both the perceived loudness of a signal and its audibility in the presence of a masking signal differ for signals within one critical band compared with signals that extend over more than one critical band. Figure 2 compares the polyphase filter [[bandwidth]]s with the width of these critical bands. At lower frequencies a single subband covers several critical bands.
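
The mismatch between the equal-width subbands and the critical bands can be estimated numerically. The Python sketch below uses the Zwicker and Terhardt approximation for the critical bandwidth, which is an assumption brought in for illustration and does not come from the text above; the function names are likewise illustrative.

```python
def critical_bandwidth_hz(frequency_hz: float) -> float:
    """Approximate critical bandwidth (Zwicker & Terhardt fit, assumed here)."""
    f_khz = frequency_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def critical_bands_per_subband(subband_index: int, sample_rate_hz: float) -> float:
    """Rough count of critical bands spanned by one polyphase subband.

    Evaluates the critical bandwidth at the centre of the subband, so the
    result is only a coarse estimate.
    """
    subband_width_hz = sample_rate_hz / 2 / 32
    centre_hz = (subband_index + 0.5) * subband_width_hz
    return subband_width_hz / critical_bandwidth_hz(centre_hz)
```

With a 48 kHz sampling rate the lowest subband comes out several critical bands wide, while the highest subband covers only a fraction of one.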

==== Simplified overview of the psychoacoustic model ====
* Perform a 1024-sample [[FFT]] on each half of a frame (1152 samples) of the input signal, selecting the lower of the two masking thresholds to use for each subband.
* Each frequency bin is mapped to its corresponding critical band.
* Calculate a tonality index, a measure of whether a signal is more tone-like or noise-like.
* Use a defined spreading function to calculate the masking effect of the signal on neighbouring [[critical band]]s.
* Calculate the final masking threshold for each subband, using the tonality index, the output of the spreading function, and the [[ATH]].
* Calculate the signal-to-mask ratio for each [[subband]], and pass this information on to the [[quantize|quantizer]].

==== More detailed overview of the psychoacoustic model ====
The MPEG/audio algorithm compresses the audio data in large part by removing the acoustically irrelevant parts of the audio signal. That is, it takes advantage of the human auditory system's inability to hear quantization noise under conditions of auditory masking. This masking is a perceptual property of the human auditory system that occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible. A variety of psychoacoustic experiments corroborate this masking phenomenon.

Empirical results also show that the human auditory system has a limited, [[frequency]] dependent, resolution. This frequency dependency can be expressed in terms of critical band widths which are less than 100Hz for the lowest audible frequencies and more than 4kHz at the highest. The human auditory system blurs the various signal components within a critical band although this system's frequency selectivity is much finer than a critical band.

The psychoacoustic model analyzes the audio signal and computes the amount of noise [[masking]] available as a function of frequency. The masking ability of a given signal component depends on its frequency position and its loudness. The encoder uses this information to decide how best to represent the input audio signal with its limited number of code bits. The MPEG/audio standard provides two example implementations of the psychoacoustic model.

Below is a general outline of the basic steps involved in the psychoacoustic calculations for either model. Differences between the two models will be highlighted.

* Time align audio data. There is one psychoacoustic evaluation per frame. The audio data sent to the psychoacoustic model must be concurrent with the audio data to be coded. The psychoacoustic model must account for both the delay of the audio data through the [[filterbank]] and a data offset so that the relevant data is centered within the psychoacoustic analysis window.
* Convert audio to a [[frequency]] domain representation. The psychoacoustic model should use a separate, independent, time-to-frequency mapping instead of the polyphase filter bank because it needs finer frequency resolution for an accurate calculation of the masking thresholds.

Layer II and III use a 1,152 sample frame size so the 1,024 sample window does not provide complete coverage. While ideally the analysis window should completely cover the samples to be coded, a 1,024 sample window is a reasonable compromise. Samples falling outside the analysis window generally will not have a major impact on the psychoacoustic evaluation.

For Layers II and III, the model computes two 1,024 point psychoacoustic calculations for each frame. The first calculation centers the first half of the 1,152 samples in the analysis window and the second calculation centers the second half. The model combines the results of the two calculations by using the higher of the two signal-to-mask ratios for each [[subband]]. This in effect selects the lower of the two noise masking thresholds for each subband.
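
Combining the two analysis passes amounts to a per-subband maximum over the signal-to-mask ratios. A minimal Python sketch, with a hypothetical function name and SMR values assumed to be in dB:

```python
def combine_analysis_passes(smr_first_half, smr_second_half):
    """Keep the higher (more demanding) SMR of the two passes per subband."""
    if len(smr_first_half) != len(smr_second_half):
        raise ValueError("both passes must cover the same subbands")
    return [max(a, b) for a, b in zip(smr_first_half, smr_second_half)]
```

A higher SMR leaves less room for quantization noise, so taking the maximum is the conservative choice for the whole frame.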

* Process spectral values in groupings related to critical band widths. To simplify the psychoacoustic calculations, both models process the frequency values in perceptual quanta.

Psychoacoustic model 2 never actually separates tonal and non-tonal components. Instead, it computes a tonality index as a function of frequency. This index gives a measure of whether the component is more tone-like or noise-like. Model 2 uses this index to interpolate between pure tone-masking-noise and noise-masking-tone values. The tonality index is based on a measure of predictability. Model 2 uses data from the previous two analysis windows to predict, via linear extrapolation, the component values for the current window. Tonal components are more predictable and thus will have higher tonality indices. Because this process relies on more data, it is more likely to better discriminate between tonal and non-tonal components than the model 1 method.
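
The predictability measure can be sketched in Python as follows. The sketch extrapolates magnitude and phase linearly from the two previous analysis windows, as described above. The normalisation of the prediction error and the logarithmic mapping to a tonality index are the forms commonly quoted for Model 2 and should be treated as assumptions here; Model 2 applies the mapping per threshold partition after spreading, whereas this sketch applies it to a single FFT bin for simplicity. The function names are illustrative.

```python
import cmath
import math


def unpredictability(current, previous, before_previous):
    """Prediction error of one complex FFT bin, between 0 and 1."""
    # Linear extrapolation of magnitude and phase from the last two windows.
    r_pred = 2 * abs(previous) - abs(before_previous)
    phi_pred = 2 * cmath.phase(previous) - cmath.phase(before_previous)
    predicted = cmath.rect(r_pred, phi_pred)
    denominator = abs(current) + abs(r_pred)
    if denominator == 0:
        return 0.0
    return abs(current - predicted) / denominator


def tonality_index(c):
    """Map unpredictability to a tonality index, clamped to 0..1 (assumed form)."""
    if c <= 0:
        return 1.0
    return min(1.0, max(0.0, -0.299 - 0.43 * math.log(c)))
```

A steady sinusoid advances by the same phase step in every window, so it is predicted almost perfectly and receives a tonality index of 1; a bin whose phase jumps unpredictably ends up near 0.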

* Apply a spreading function. The [[masking]] ability of a given signal spreads across its surrounding [[critical band]]. The model determines the noise masking thresholds by first applying an empirically determined masking (model 1) or spreading function (model 2) to the signal components.

* Set a lower bound for the threshold values. Both models include an empirically determined absolute masking threshold, the threshold in quiet. This threshold is the lower bound on the audibility of sound.

* Find the masking threshold for each [[subband]]. Model 2 selects the minimum of the masking thresholds covered by the subband only where the band is wide relative to the critical band in that [[frequency]] region. It uses the average of the masking thresholds covered by the subband where the band is narrow relative to the critical band. Model 2 is not less accurate for the higher frequency subbands because it does not concentrate the non-tonal components.

* Calculate the signal-to-mask ratio. The psychoacoustic model computes the signal-to-mask ratio as the ratio of the signal energy within the subband (or, for Layer III, a group of bands) to the minimum masking threshold for that subband. The model passes this value to the bit (or noise) allocation section of the encoder.

==== Model 2 technical details ====


The psychoacoustic model calculates just-noticeable distortion (JND) profiles for each band in the [[filterbank]]. This noise level is used to determine the actual quantizers and quantizer levels. There are two psychoacoustic models defined by the standard. They can be applied to any layer of the MPEG/Audio algorithm. In practice however, Model 1 has been used for Layers I and II and Model 2 for Layer III. Both models compute a signal-to-mask ratio (SMR) for each band (Layers I and II) or group of bands (Layer III).

The more sophisticated of the two, Model 2, will be discussed. The steps leading to the computation of the JND profiles are outlined below.

;1. Time-align audio data

The psychoacoustic model must estimate the [[masking]] thresholds for the audio data that are to be [[quantize|quantized]]. So, it must account for both the delay through the filterbank and a data offset so that the relevant data is centered within the psychoacoustic analysis window. For the Layer III algorithm, time-aligning the psychoacoustic model with the filterbank demands that the data fed to the model be delayed by 768 samples.

;2. Spectral analysis and normalization.

A high-resolution spectral estimate of the time-aligned data is essential for an accurate estimation of the masking thresholds in the [[critical band]]s. The low frequency resolution of the filterbank leaves no option but to compute an independent time-to-frequency mapping via a fast Fourier Transform ([[FFT]]). A Hanning window is applied to the data to reduce the edge effects of the transform window.

Layer III operates on 1152-sample data frames. Model 2 uses a 1024-point window for spectral estimation. Ideally, the analysis window should completely cover the samples to be coded. The model computes two 1024-point psychoacoustic calculations. On the first pass, the first 576 samples are centered in the analysis window. The second pass centers the remaining samples. The model combines the results of the two calculations by using the more stringent of the two JND estimates for bit or noise allocation in each [[subband]].

Since playback levels are unknown, the sound-pressure level (SPL) needs to be normalized. This implies clamping the lowest point in the absolute threshold of hearing curves to +/- 1-bit [[frequency|amplitude]].

;3. Grouping of spectral values into threshold calculation partitions.

The uniform [[frequency]] decomposition and poor selectivity of the filterbank do not reflect the response of the basilar membrane (BM). To accurately model the masking phenomenon characteristic of the BM, the spectral values are grouped into a large number of partitions. The exact number of threshold partitions depends on the choice of sampling rate. This transformation provides a resolution of approximately either 1 FFT line or 1/3 critical band, whichever is smaller. At low frequencies, a single line of the FFT will constitute a partition, while at high [[frequency|frequencies]] many lines are grouped into one.

;4. Estimation of tonality indices.

It is necessary to identify tonal and non-tonal (noise-like) components because the masking abilities of the two types of signals differ. Model 2 does not explicitly separate tonal and non-tonal components. Instead, it computes a tonality index as a function of frequency. This is an indicator of the tone-like or noise-like nature of the spectral component. The tonality index is based on a measure of predictability. Linear extrapolation is used to predict the component values of the current window from the previous two analysis windows. Model 2 uses this index to interpolate between pure tone-masking-noise and noise-masking-tone values. Tonal components are more predictable and thus have a higher tonality index. Because this process has memory, it is more likely than psychoacoustic Model 1 to discriminate well between tonal and non-tonal components.

;5. Simulation of the spread of masking on the BM.

A strong signal component affects the audibility of weaker components in the same critical band and in the adjacent bands. Model 2 simulates this phenomenon by applying a spreading function to spread the energy of any critical band into its surrounding bands. On the [[Bark]] scale, the spreading function has a constant shape as a function of partition number, with slopes of +25 and -10 dB per Bark.
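
A triangular spreading function with these slopes can be written down directly. The Python sketch below is the simplest such shape and is not the exact function from the standard; the function names and the convention that distances are measured from the masker towards the maskee are assumptions made for the example.

```python
def spreading_db(bark_distance: float) -> float:
    """Attenuation (in dB, <= 0) of a masker's effect at a given distance.

    bark_distance = maskee position - masker position, in Bark.
    Below the masker the curve rises at +25 dB/Bark; above the masker it
    falls at -10 dB/Bark, so masking reaches further towards high frequencies.
    """
    if bark_distance < 0:
        return 25.0 * bark_distance
    return -10.0 * bark_distance


def spread_level_db(masker_level_db: float, masker_bark: float, maskee_bark: float) -> float:
    """Masker level as 'seen' at another position on the Bark scale."""
    return masker_level_db + spreading_db(maskee_bark - masker_bark)
```

One Bark below the masker its influence has already dropped by 25 dB, whereas one Bark above it the drop is only 10 dB.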

;6. Set a lower bound for the threshold values.

An empirically determined absolute [[masking]] threshold, the threshold in quiet, is used as a lower bound on the audibility of sound.
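
Applying the lower bound is a per-band maximum of the computed threshold and the threshold in quiet. In the Python sketch below, the threshold-in-quiet curve is Terhardt's widely used approximation, which is an assumption for illustration and not the table from the standard; the function names are also illustrative.

```python
import math


def threshold_in_quiet_db(frequency_hz: float) -> float:
    """Approximate threshold in quiet in dB SPL (Terhardt's formula, assumed)."""
    f_khz = frequency_hz / 1000.0
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)


def apply_lower_bound(masking_thresholds_db, quiet_thresholds_db):
    """Never let a masking threshold fall below the threshold in quiet."""
    return [max(m, q) for m, q in zip(masking_thresholds_db, quiet_thresholds_db)]
```

The curve is lowest around 3 to 4 kHz, where hearing is most sensitive, and rises steeply towards both ends of the audible range.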

;7. Determination of masking threshold per [[subband]].

At low [[frequency|frequencies]], the minimum of the masking thresholds within a subband is chosen as the threshold value. At higher frequencies, the average of the thresholds within the subband is selected as the masking threshold. Model 2 has the same accuracy for the higher subbands as for low frequency ones because it does not concentrate non-tonal components.

;8. [[Pre echo]] detection and window switching decision.

;9. Calculation of the signal-to-mask ratio (SMR).

SMR is calculated as a ratio of signal energy within the subband (for Layers I and II) or a group of subbands (Layer III) to the minimum threshold for that subband. This is the final output of the psychoacoustic model.

The masking threshold is computed from the spread energy and the tonality index.
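
Steps 7 and 9 can be summarised in a few lines of Python. This is a simplified sketch with hypothetical function names: thresholds and energies are assumed to be linear power values, and the caller is assumed to decide whether a subband counts as wide (low frequencies) or narrow (high frequencies) relative to the critical bands.

```python
import math


def subband_threshold(partition_thresholds, subband_is_wide: bool) -> float:
    """Pick one masking threshold for a subband.

    Wide subbands (low frequencies) use the minimum of the thresholds they
    cover; narrow subbands (high frequencies) use the average.
    """
    if not partition_thresholds:
        raise ValueError("a subband must cover at least one partition")
    if subband_is_wide:
        return min(partition_thresholds)
    return sum(partition_thresholds) / len(partition_thresholds)


def signal_to_mask_ratio_db(signal_energy: float, threshold: float) -> float:
    """SMR in dB from linear signal energy and masking threshold."""
    return 10.0 * math.log10(signal_energy / threshold)
```

For instance, a subband with signal energy 100 and thresholds 1, 4 and 10 yields an SMR of 20 dB when the minimum is used, and about 13 dB when the average is used.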

== Pros and cons ==
=== Pros ===
* Widespread acceptance, support in nearly all hardware audio players and devices
* An [[ISO]] standard, part of MPEG specs
* Fast decoding, lower complexity than [[Advanced Audio Coding|AAC]] or [[Vorbis]]
* Anyone can create their own implementation (Specs and demo sources available)
* Relaxed licensing schedule

=== Cons ===
* Lower performance/efficiency than modern codecs.
* Problem cases that trip up all transform codecs.
* Sometimes, maximum bitrate (320kbps) isn't enough.
* Unusable for high definition audio (sampling rates higher than 48kHz).

== See also ==
=== Techniques used in compression ===
* [[Huffman coding]]
* [[Quantization]]
* [[Joint stereo|M/S matrixing]]
* [[Intensity stereo]]
* [[Channel coupling]]
* Modified discrete cosine transform ([[MDCT]])
* Polyphase filter bank

There is a non-standardized form of MP3 called [[MP3Pro]], which takes advantage of [[SBR]] encoding to provide better quality at low bitrates.

=== Encoders/decoders (supported platforms) ===
* [[LAME]] (Win32/Posix)
* [[Audioactive]] (Win32)
* [[Blade]] (Win32/Posix)
* [[Xing]] (Win32)
* [[Gogo]] (Win32/Posix)

=== Metadata (tags) ===
* [[ID3v1]]
* [[ID3v1.1]]
* [[ID3v2]]

== Further reading and bibliography ==
* [[Best MP3 Decoder]]

== External links ==
* [http://web.archive.org/web/20070113015413/http://www.rjamorim.com/test/mp3-128/results.html Roberto's listening test] featuring MP3 encoders
* [http://en.wikipedia.org/wiki/Mp3 MP3 at Wikipedia]

[[Category: Codecs]]
[[Category: Lossy]]