Hydrogenaudio Knowledgebase - User contributions [en]

Revised ReplayGain specification

2014-06-17T04:08:01Z

Notat: Move player requirements to bottom. BS.1770 recommendation for loudness measurement.

''This is a proposed update to the [[ReplayGain 1.0 specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio forum].'' --[[User:Notat|Notat]] 23:42, 8 October 2012 (CEST)

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system.

The original [http://wiki.hydrogenaudio.org/index.php?title=Replaygain ReplayGain 1.0 specification] described a loudness measurement system which included a weighting filter, root mean square (RMS) measurement and statistical processing that model human perception of loudness in both the frequency and time domains.

Since the original ReplayGain proposal in 2001, the science, practice and standards for loudness normalization have advanced significantly. The current industry standard approach to loudness measurement is described by the International Telecommunication Union<ref>http://www.itu.int/en/Pages/default.aspx</ref> (ITU) as BS.1770. The most recent version of this standard is known as ITU BS.1770-3<ref>http://www.itu.int/rec/R-REC-BS.1770-3-201208-I/en</ref> and was published in August 2012. The ITU work is freely available and is not believed to be encumbered by any patent issues. The ITU BS.1770-2 standard has been adopted in the United States by the [http://www.atsc.org ATSC] as [http://www.atsc.org/cms/standards/a_85-2011a.pdf A/85] and in Europe by the [http://www.ebu.ch European Broadcasting Union] as [http://tech.ebu.ch/docs/tech/tech3343.pdf EBU R-128] for broadcast audio.

BS.1770-3 uses a "K-weighted" RMS measurement. This weighting function is significantly less complex than the inverted Fletcher-Munson weighting used by RG1. A gating function is designed to measure the loudness of foreground components in the audio program. The gate in BS.1770 performs a function similar to that of the statistical processing in the original RG1 specification.

The computation required for BS.1770-3 loudness measurement is reduced compared to the RG1 technique. Nevertheless, BS.1770 has been shown in several academic studies to be equally or more effective than the RG1 algorithm in modelling human loudness perception on music program as well as other material such as podcasts, television programs and movies.<ref>Paul Nygren. [http://www.speech.kth.se/prod/publications/files/3319.pdf Achieving equal loudness between audio files]. KTH Royal Institute of Technology</ref><ref>Martin Wolters; Harald Mundt; Jeffrey Riedmiller (May 2010). [http://www.aes.org/e-lib/browse.cfm?elib=15341 Loudness Normalization In The Age Of Portable Media Players]. Audio Engineering Society.</ref><ref>Esben Skovenborg; Søren H. Nielsen (October 2004). [http://web.archive.org/web/20120208024743/http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf Evaluation of Different Loudness Models with Music and Speech Material]. Audio Engineering Society. Archived from [http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf the original] on 2012-02-08.</ref>

RG2 uses BS.1770-3 for loudness measurement. It is expected that the ITU standard will evolve over time to meet the needs of broadcasters and governments. It is the intent of the ReplayGain community that RG2 follow any future backwards-compatible improvements to loudness measurement using the BS.1770 standard.

==Reference level==

RG1 is calibrated to a pink noise reference signal with an RMS level 14 dB below a full-scale sinusoid. This reference signal is used to establish a reference level. ReplayGain will apply no gain or attenuation to the reference signal or any program material which has the same loudness measurements as the reference signal.

BS.1770 defines a loudness scale for program material. The units of BS.1770 loudness measurements are Loudness Units [relative to] Full Scale (LUFS). LUFS can be treated like decibels.

The loudness measurement of the RG1 reference signal is -18 LUFS. In order to maintain backwards compatibility with RG1, RG2 uses a -18 LUFS reference.


==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{r}-L</math>
Where:
:<math>RG</math> is the replay gain adjustment in decibels,
:<math>L_{r}</math> is the -18 LUFS reference level,
:<math>L</math> is the measured loudness of the audio file in LUFS.

Replay gain is positive if the loudness of the audio file is lower than the reference level. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than the reference level. The gain is stored as metadata with the audio file, as described under [[#Metadata|Metadata]], and is used by players to adjust the output volume of tracks as they are played, as described under [[#Player requirements|Player requirements]].
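The gain calculation can be illustrated with a minimal Python sketch. The function and constant names are illustrative only and are not defined by this specification.

```python
REFERENCE_LEVEL_LUFS = -18.0  # RG2 reference level stated in this section


def replay_gain_db(measured_loudness_lufs, reference_lufs=REFERENCE_LEVEL_LUFS):
    """Return the replay gain in dB, i.e. RG = L_r - L."""
    return reference_lufs - measured_loudness_lufs
```

For example, a loud track measured at -9.5 LUFS yields a gain of -8.5 dB (attenuation), while a quiet track measured at -23 LUFS yields +5 dB.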

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
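A peak scan for uncompressed audio might look like the following Python sketch. It assumes signed-integer PCM and takes full scale to be <tt>2^(bits-1)</tt>, so that the most negative sample maps to exactly 1.0. Both the function name and the per-channel list layout are hypothetical.

```python
def peak_amplitude(channels, bits_per_sample=16):
    """Peak of signed-integer PCM, scaled so that digital full scale is 1.0.

    `channels` is a list of per-channel sample lists (assumed layout).
    """
    full_scale = float(2 ** (bits_per_sample - 1))  # 32768 for 16-bit audio
    # Largest absolute excursion, positive or negative, on any channel.
    peak = max((abs(s) for ch in channels for s in ch), default=0)
    return peak / full_scale
```

With this scaling, a 16-bit file whose largest excursion is 16384 reports a peak of 0.5.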

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – '''Vorbis comments''' in Extended Content Description Object
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID       $54 58 58 58 ("TXXX")
 Size           $xx xx xx xx (size of frame excluding this header)
 Flags          $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
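The value formatting and frame layout described in this section can be sketched in Python as follows. This is an illustration rather than a reference implementation. It assumes an ID3v2.3-style frame header in which the size is a plain 32-bit big-endian integer (ID3v2.4 uses a different size encoding), and the function names are hypothetical.

```python
import struct


def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB'."""
    value = round(gain_db, 2) + 0.0  # adding 0.0 turns -0.0 into 0.0
    return "%.2f dB" % value


def format_peak(peak):
    """Format a peak amplitude as 'c.dddddd' (no suffix)."""
    return "%.6f" % peak


def txxx_frame(key, value):
    """Build one TXXX frame (10-byte header plus frame data) as bytes."""
    # Frame data: text encoding $00, key string, $00 terminator, value string.
    data = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the frame excluding the header, flags $40 $00.
    header = struct.pack(">4sI2s", b"TXXX", len(data), b"\x40\x00")
    return header + data
```

For example, <tt>txxx_frame("REPLAYGAIN_TRACK_GAIN", format_gain(-6.2))</tt> produces a frame whose value string is <tt>-6.20 dB</tt>.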

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
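A tolerant parser along these lines might be sketched in Python as shown below. The regular expressions and function names are illustrative assumptions. They accept the variations listed above and return <tt>None</tt> for anything else, so that the caller can treat the value as absent.

```python
import re

# Optional sign, digits with optional fraction, optional 'dB' in any case.
_GAIN_RE = re.compile(
    r"^\s*([+-]?\d+(?:\.\d*)?|[+-]?\.\d+)\s*(?:db)?\s*$", re.IGNORECASE
)
# Peak values are positive decimals with no suffix.
_PEAK_RE = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB as a float, or None if the value is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak amplitude as a float, or None if it is malformed."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


def find_tag(tags, key):
    """Look up `key` in a dict of tags, ignoring capitalization of keys."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None
```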

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equals character).
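Extracting the ReplayGain items from a list of comment strings could look like the following Python sketch. The function name is hypothetical, and the comments are assumed to have already been decoded into strings by a Vorbis comment reader.

```python
def parse_vorbis_comments(comments):
    """Return the ReplayGain fields found in 'key=value' comment strings."""
    fields = {}
    for comment in comments:
        # Split at the first '=' only, so values may themselves contain '='.
        key, separator, value = comment.partition("=")
        if separator and key.upper().startswith("REPLAYGAIN_"):
            fields[key.upper()] = value  # normalize key capitalization
    return fields
```

The values returned are still text and would be passed on to whatever gain and peak parsing the player uses.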

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as specified for ID3v2, with the data held as ASCII values in the same format.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not by the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
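The selection and scaling logic described above can be sketched in Python as follows. The function names are illustrative, missing metadata is represented by <tt>None</tt>, and unity gain is assumed as the default when neither value is available (one of the choices mentioned above).

```python
def select_gain_db(track_gain_db, album_gain_db, mode="track", default_db=0.0):
    """Pick the gain for the requested mode, falling back to the other one."""
    if mode == "track":
        preferred, fallback = track_gain_db, album_gain_db
    else:
        preferred, fallback = album_gain_db, track_gain_db
    if preferred is not None:
        return preferred
    if fallback is not None:
        return fallback
    return default_db  # neither value stored: use the default gain


def scale_factor(gain_db):
    """Convert a gain in dB to a linear multiplier, 10^(gain/20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Multiply every sample by the constant scale factor."""
    factor = scale_factor(gain_db)
    return [sample * factor for sample in samples]
```

A gain of -6 dB, for instance, corresponds to a scale factor of roughly 0.5.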

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale factor of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
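The equivalence between adding gains in decibels and multiplying linear scale factors can be checked with a short Python sketch (the function name is hypothetical):

```python
def combined_scale(replay_gain_db, preamp_db=0.0):
    """Linear scale factor for replay gain plus pre-amp, both given in dB."""
    return 10.0 ** ((replay_gain_db + preamp_db) / 20.0)


# Adding in dB gives the same result as multiplying the two linear factors.
separate = (10.0 ** (-8.5 / 20.0)) * (10.0 ** (6.0 / 20.0))
assert abs(combined_scale(-8.5, 6.0) - separate) < 1e-12
```

A single multiplication per sample is therefore enough, regardless of how many gain stages are expressed in decibels.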

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^\frac{RG + G_\text{pre-amp}}{20}, \frac{1}{\text{peak amplitude}} \right)</math>
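As a minimal Python sketch of this formula (the function name is illustrative, and the handling of a zero peak is an assumption not covered by the text):

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=None):
    """Overall linear scale factor with clipping prevention by gain reduction."""
    if peak is None:
        peak = 1.0  # peak metadata unavailable: assume a peak level of 1.0
    wanted = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if peak <= 0.0:
        return wanted  # assumption: a silent track cannot clip
    # Never scale by more than the factor that brings the peak to full scale.
    return min(wanted, 1.0 / peak)
```

For a track with a stored peak of 0.8, a requested gain of +6 dB (a factor of about 2) is reduced to a factor of 1.25, which places the peak exactly at full scale.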

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Original ReplayGain specification

2014-05-27T00:20:31Z

Notat: /* Metadata format */ bold format for WMA

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
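One filter stage can be sketched in Python as below, using the Butterworth coefficients for 44.1 kHz from Table 1b. The sketch assumes that the tabulated ''a(k)'' values are feedback coefficients that are added to the output, which is inferred from their signs. The function and constant names are hypothetical.

```python
# Table 1b: Butterworth high-pass coefficients for Fs = 44.1 kHz.
BUTTER_B_44100 = [0.98500175787242, -1.97000351574484, 0.98500175787242]
BUTTER_A_44100 = [1.96977855582618, -0.97022847566350]  # a(1), a(2)


def iir_filter(samples, b, a):
    """Apply y[n] = sum(b[k] * x[n-k]) + sum(a[k] * y[n-k]) to a sample list.

    `b` starts at b(0); `a` starts at a(1). The sign convention for `a`
    is an assumption (feedback terms are added).
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, bk in enumerate(b):  # feed-forward taps b(0), b(1), ...
            if n - k >= 0:
                acc += bk * samples[n - k]
        for k, ak in enumerate(a, start=1):  # feedback taps a(1), a(2), ...
            if n - k >= 0:
                acc += ak * out[n - k]
        out.append(acc)
    return out
```

Under this convention the stage behaves as a high-pass filter: a constant (DC) input decays to zero, while a signal at half the sample rate passes at unity gain. The full loudness filter would run the samples through the "yulewalk" stage (Table 2) and this stage in cascade.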
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
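The block-wise RMS calculation and decibel conversion can be sketched in Python as follows. The sketch assumes floating-point samples in the range -1.0 to 1.0 (so the peak-to-peak range is 2.0), discards a trailing partial block, and reports silent blocks as negative infinity. These choices and the function name are assumptions.

```python
import math


def block_levels_db(channels, sample_rate, block_seconds=0.05, peak_to_peak=2.0):
    """Return one RMS level in dB per 50 ms block of the (filtered) signal.

    `channels` is a list of equally long per-channel sample lists.
    """
    block_len = int(round(sample_rate * block_seconds))
    total = min(len(ch) for ch in channels)
    levels = []
    for start in range(0, total - block_len + 1, block_len):
        # Square every sample of every channel in this block.
        squares = [s * s for ch in channels for s in ch[start:start + block_len]]
        # Mean of the squares over all channels, then the square root.
        rms = math.sqrt(sum(squares) / len(squares))
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))  # digital silence
    return levels
```

With this convention, a full-scale sine wave measures about -3.01 dB in every block.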

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of loudness is the 95th percentile,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
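
The selection step can be sketched in Python as shown below. The function name <code>overall_level_db</code> is hypothetical, and the way the 95% position is rounded to a list index is an assumption, since the text above does not define it.

```python
def overall_level_db(levels_db, percentile=0.95):
    """Pick the block level found 95% of the way up the sorted list."""
    if not levels_db:
        raise ValueError("at least one block level is required")
    ordered = sorted(levels_db)
    # Index rounding is an assumption; clamp so it always stays in range.
    index = min(int(len(ordered) * percentile), len(ordered) - 1)
    return ordered[index]
```

For a track with mostly quiet blocks and a few loud ones, this returns a value near the loud end rather than the average.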

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to 20 dB of system headroom. The 14 dB of headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE-calibrated system, so ReplayGain is said to operate with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
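
As a small worked illustration in Python (the function name and the loudness figures are hypothetical and serve only to show the sign convention):

```python
def replay_gain_db(reference_loudness_db, file_loudness_db):
    """RG = L_n14 - L, with all quantities in decibels."""
    return reference_loudness_db - file_loudness_db

# A file measuring 7.5 dB louder than the reference is attenuated.
loud_track_gain = replay_gain_db(-20.0, -12.5)   # -7.5
# A file measuring 6 dB quieter than the reference is boosted.
quiet_track_gain = replay_gain_db(-20.0, -26.0)  # 6.0
```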

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
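
A minimal Python sketch of this scan is shown below. It assumes signed integer PCM samples, with full scale taken as 2 raised to the power of (bits per sample - 1); the function name <code>peak_amplitude</code> is hypothetical.

```python
def peak_amplitude(channels, bits_per_sample=16):
    """Return the largest absolute sample value, scaled so full scale is 1.0.

    channels: list of sample lists (signed integers), one list per channel.
    """
    # Assumed full-scale value: 32768 for 16-bit audio.
    full_scale = float(1 << (bits_per_sample - 1))
    peak = max(abs(s) for ch in channels for s in ch)
    return peak / full_scale
```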

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Audio Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – '''Vorbis comments''' in Extended Content Description Object
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
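
The value formatting and the TXXX frame layout described above can be sketched in Python as follows. The function names are hypothetical. The frame size is written here as a plain 32-bit big-endian integer, which is an assumption based on the ID3v2.3 frame header; ID3v2.4 encodes the size differently.

```python
import struct

def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB', e.g. '-6.50 dB'."""
    # Adding 0.0 turns a rounded negative zero into plain zero.
    value = round(gain_db, 2) + 0.0
    return f"{value:.2f} dB"

def format_peak(peak):
    """Format a peak amplitude as 'c.dddddd', e.g. '0.988525'."""
    return f"{peak:.6f}"

def build_txxx_frame(key, value):
    """Build one TXXX frame holding a ReplayGain key/value pair."""
    # Frame data: encoding byte $00, key string, $00 terminator, value string.
    data = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the data (assumed ID3v2.3 style), flags $40 $00.
    header = b"TXXX" + struct.pack(">I", len(data)) + b"\x40\x00"
    return header + data

frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", format_gain(-6.5))
```

A scanner would write four such frames, one for each key in Table 3.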

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
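
A tolerant parser along these lines might look like the Python sketch below. The function names and regular expressions are illustrative assumptions, not part of the specification; they accept the variations listed above and return <code>None</code> for anything else so that the caller can treat the value as missing.

```python
import re

# Optional sign, digits with optional decimal part, optional 'dB' in any case.
_GAIN_RE = re.compile(r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*(?:db)?\s*$", re.IGNORECASE)
# Peaks are positive decimals with no suffix.
_PEAK_RE = re.compile(r"^\s*(\d+\.?\d*|\.\d+)\s*$")

def parse_gain(text):
    """Return the gain in dB, or None if the text cannot be understood."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None

def parse_peak(text):
    """Return the peak amplitude, or None if the text cannot be understood."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None

def find_value(tags, key):
    """Look up a key in a dict of tags, ignoring capitalization of the key."""
    for name, value in tags.items():
        if name.upper() == key.upper():
            return value
    return None
```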

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equal character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as specified for ID3v2, with the data stored as ASCII values in the same format.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard—not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>
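
In Python, the conversion from a gain in decibels to a linear scale factor, and its application to samples, can be sketched as follows (the function names are hypothetical, and dithering is left out):

```python
def gain_to_scale(gain_db):
    """Convert a gain in decibels to a linear multiplier."""
    return 10.0 ** (gain_db / 20.0)

def apply_gain(samples, gain_db):
    """Multiply every sample by the constant derived from the gain."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]
```

For example, a gain of -6 dB gives a multiplier of about 0.5, and a gain of 0 dB leaves the samples unchanged.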

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type or nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
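
The formula can be sketched in Python as shown below. The function and parameter names are hypothetical; a missing peak value defaults to 1.0, as recommended above.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=1.0, prevent_clipping=True):
    """Overall linear scale factor for one track or album.

    replay_gain_db: track or album replay gain read from metadata.
    preamp_db: user-selected pre-amp adjustment.
    peak: peak amplitude from metadata (1.0 is digital full scale).
    """
    scale = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if prevent_clipping and peak > 0.0:
        # Never scale the peak sample beyond digital full scale.
        scale = min(scale, 1.0 / peak)
    return scale
```

With a replay gain of +6 dB and a peak of 1.0, the result is limited to 1.0; with a peak of 0.25 the full +6 dB (a factor of about 2) can be applied.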

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal to noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Original ReplayGain specification

2014-05-27T00:17:40Z

Notat: revert some non-clarifying changes

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade through both of these filter components before being analysed further.
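
As an illustration, the Butterworth component can be sketched in Python using the Table 1b coefficients. The sketch assumes that the feedback terms ''a(k)'' are added to the output; with the coefficients as tabulated, this is the sign convention that yields a stable high-pass response. The function name <code>iir_filter</code> is hypothetical.

```python
# Table 1b: Butterworth coefficients for Fs = 44.1 kHz.
BUTTER_B = (0.98500175787242, -1.97000351574484, 0.98500175787242)  # b(0)..b(2)
BUTTER_A = (1.96977855582618, -0.97022847566350)                    # a(1), a(2)

def iir_filter(samples, b, a):
    """Direct-form IIR filter.

    y[n] = b(0) x[n] + b(1) x[n-1] + ... + a(1) y[n-1] + a(2) y[n-2] + ...
    (feedback terms are added, which is an assumption about the convention)
    """
    x_hist = [0.0] * len(b)   # x[n], x[n-1], ...
    y_hist = [0.0] * len(a)   # y[n-1], y[n-2], ...
    out = []
    for x in samples:
        x_hist = [x] + x_hist[:-1]
        y = sum(bk * xk for bk, xk in zip(b, x_hist))
        y += sum(ak * yk for ak, yk in zip(a, y_hist))
        y_hist = [y] + y_hist[:-1]
        out.append(y)
    return out
```

The same function can be applied with the eleven ''b'' and ten ''a'' coefficients of Table 2 for the "yulewalk" component; running the output of one stage through the other gives the cascade.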
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of loudness is the 95th percentile,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to 20 dB of system headroom. The 14 dB headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE-calibrated system, and so ReplayGain is said to operate with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
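
As a minimal sketch of the formula above, assuming both loudness figures have already been produced by the same measurement procedure (the function name and the example numbers are hypothetical):

```python
def replay_gain_db(reference_loudness_db, file_loudness_db):
    """RG = L_n14 - L, with all quantities in decibels."""
    return reference_loudness_db - file_loudness_db


# Hypothetical measurements: the file is 7.5 dB louder than the reference,
# so the resulting gain is an attenuation.
print(replay_gain_db(65.0, 72.5))  # prints -7.5
```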

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, for either positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
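
A rough sketch of the peak scan for integer PCM follows. It assumes the samples of all channels are already available as signed integers in one sequence and that full scale is 2<sup>bits-1</sup>; both the function name and that choice of full scale are assumptions made for illustration.

```python
def peak_amplitude(samples, bits_per_sample=16):
    """Largest absolute sample value, scaled so that full scale is 1.0."""
    # Assumption: full scale is the magnitude of the most negative code.
    full_scale = float(2 ** (bits_per_sample - 1))
    return max((abs(s) for s in samples), default=0) / full_scale


# Hypothetical interleaved 16-bit samples from all channels.
print(peak_amplitude([0, 16384, -8192]))  # prints 0.5
```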

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – In Extended Content Description Object, Vorbis Comment ASCII values
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is the use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

Frame ID $54 58 58 58 ("TXXX")
Size $xx xx xx xx (size of frame excluding this header)
Flags $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

Text encoding $00 (ISO-8859-1 encoding)
Description <key string> $00
Value <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}
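
The frame layout and keys above can be combined into a byte-level sketch. It assumes the ID3v2.3 convention of a plain 32-bit big-endian frame size (ID3v2.4 uses a synchsafe size instead), and the function name is hypothetical.

```python
import struct


def build_txxx_frame(key, value):
    """Assemble one TXXX frame holding a ReplayGain key/value pair."""
    # Frame data: encoding byte $00 (ISO-8859-1), key, NUL terminator, value.
    body = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the data (assumed ID3v2.3 style), flags $40 $00.
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-7.50 dB")
print(len(frame))  # prints 41
```

A complete tag writer would also need to produce the ID3v2 tag header and handle unsynchronisation, which this sketch leaves out.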

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. Integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) is adequate to accurately represent peak values for 16-bit audio data.
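
A scanner could produce these strings roughly as follows. The function names are hypothetical, and the handling of values that round to zero is an assumption rather than something the text above prescribes.

```python
def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB'."""
    # Round first and add 0.0 so a tiny negative gain does not print as "-0.00".
    rounded = round(gain_db, 2) + 0.0
    return f"{rounded:.2f} dB"


def format_peak(peak):
    """Format a peak level as 'c.dddddd' (1.000000 is full scale)."""
    return f"{peak:.6f}"


print(format_gain(-7.5))  # prints -7.50 dB
print(format_peak(1.0))   # prints 1.000000
```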

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
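
One possible tolerant parser is sketched below. The regular expressions and function names are assumptions; they accept the variations listed above and return <tt>None</tt> for anything else, so the caller can treat the value as missing. Alternate capitalization of keys can be handled separately, for example by upper-casing keys before lookup.

```python
import re

# Optional sign, digits with optional fraction, optional 'dB' in any case.
_GAIN_RE = re.compile(r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*(?:dB)?\s*$", re.IGNORECASE)
# Peaks are unsigned and carry no suffix.
_PEAK_RE = re.compile(r"^\s*(\d+\.?\d*|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB, or None if the text is not recognisable."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak level, or None if the text is not recognisable."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


print(parse_gain("+3.2dB"))  # prints 3.2
print(parse_gain("loud"))    # prints None
```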

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as for ID3v2, with the data stored as ASCII values in the same format as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>
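
The scaling step can be illustrated as follows, assuming samples are held as floating-point values. The function names are hypothetical, and dithering and conversion back to the output bit depth are left out.

```python
def gain_to_scale(gain_db):
    """Convert a gain in dB to a linear scale factor: 10^(gain/20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Multiply every sample by the same constant."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]


# A -20 dB gain scales the signal to roughly one tenth of its amplitude.
print(round(gain_to_scale(-20.0), 6))  # prints 0.1
```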

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
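
The fallback rule might be expressed like this. The sketch assumes the tags have already been parsed into a dictionary mapping the metadata keys to gains in dB; the function name and the unity-gain default are illustrative choices among those mentioned above.

```python
def select_gain(tags, mode="track", default_db=0.0):
    """Pick the requested gain, fall back to the other one, then to a default."""
    if mode == "track":
        order = ("TRACK", "ALBUM")
    else:
        order = ("ALBUM", "TRACK")
    for kind in order:
        value = tags.get(f"REPLAYGAIN_{kind}_GAIN")
        if value is not None:
            return value
    return default_db


# Only album gain is present, but the user asked for track gain.
print(select_gain({"REPLAYGAIN_ALBUM_GAIN": -6.0}, mode="track"))  # prints -6.0
```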

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in Decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
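
The combined scale factor can be sketched as a single function. The function name and the treatment of a zero peak (a silent track, where no limit applies) are assumptions; the default peak of 1.0 follows the rule above for missing peak metadata.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=1.0):
    """min(10^((RG + pre-amp)/20), 1/peak): scale factor that avoids clipping."""
    wanted = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if peak <= 0.0:
        # Assumption: a silent track cannot clip, so the gain is not limited.
        return wanted
    return min(wanted, 1.0 / peak)


# A +6 dB request on a track that already peaks at full scale is held at unity.
print(playback_scale(6.0, 0.0, 1.0))  # prints 1.0
```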

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal to noise ratio is maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Revised ReplayGain specification

2012-12-04T02:36:35Z

Notat: some comments on new revisions

''This is a proposed update to the [[ReplayGain 1.0 specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio forum].'' --[[User:Notat|Notat]] 23:42, 8 October 2012 (CEST)

ReplayGain 2.0 (RG2) is a proposed update to the [http://wiki.hydrogenaudio.org/index.php?title=Replaygain ReplayGain 1.0 (RG1) specification] from 2001 originally published by David Robinson. RG2 features an updated loudness measurement technique which better simulates human auditory perception of digital music. This improved measurement enables more accurate gain adjustment during playback to achieve better perceived consistent loudness when listening to digital music from different albums and sources. 
This proposed RG2 specification includes: 
:*a way to measure and calculate the apparent loudness of a given track or album ([[#Loudness Measurement]])
:*definition of appropriate reference level and ideal gain adjustment during playback ([[#Reference Level and Gain]])
:*practical considerations for loudness level in real world use ([[#Pre-amplification]])
:*a way to prevent clipping when the calculated replay gain exceeds the limits of digital audio ([[#Clipping Prevention]])
:*a method for the user to specify album vs. track gain adjustment at playback ([[#Track or Album Gain Adjustment]])
:*description of how replay gain information is stored within audio files ([[#Metadata]])
 Note to End Users: Any software program or device which supports playback of original ReplayGain (RG1) scanned tracks should also play ReplayGain 2 (RG2) scanned tracks at the intended RG2 loudness level. More information for end users on music scanning and playback can be found in the general Wikipedia page for ReplayGain 2.

==Loudness Measurement==

The original [http://wiki.hydrogenaudio.org/index.php?title=Replaygain ReplayGain 1.0 specification] relied on a Root Mean Square (RMS) calculation to perform a form of loudness measurement that is now considered more primitive. ReplayGain 2.0 loudness measurement and calculation is based on the [http://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-2-201103-S!!PDF-E.pdf ITU BS. 1770-2 standard] as defined by the [http://www.itu.int/en/Pages/default.aspx International Telecommunication Union] (ITU) for broadcast audio. Details about the loudness measurement algorithm, rationale, and supporting data were [http://www.itu.int/rec/R-REC-BS.1770-2-201103-S/en published by the ITU] in March 2011. In summary, BS. 1770-2 uses the K-weighting system to model human perceived loudness of audio.

The ITU BS. 1770-2 standard has been adopted in the United States by the [http://www.atsc.org ATSC] as [http://www.atsc.org/cms/standards/a_85-2011a.pdf A/85] and in Europe by the [http://www.ebu.ch European Broadcasting Union] as [http://tech.ebu.ch/docs/tech/tech3343.pdf EBU R-128] for broadcast audio. It is expected that the ITU standard will evolve over time to meet the needs of broadcasters and governments. It was given a minor update, published as [http://www.itu.int/rec/R-REC-BS.1770-3-201208-I/en BS. 1770-3] in August 2012, but the change had no effect on the ReplayGain 2.0 specification or loudness measurement for digital music.

==Reference Level and Gain==

The initial ReplayGain 1.0 specification of 2001 adapted the [http://www.smpte.org/ SMPTE] standard reference level for movie sound of 83 dB SPL (-20 dBFS), and was subsequently updated by David Robinson to target 89 dB SPL (-14 dBFS). The ReplayGain 2.0 target reference level is based on the research and findings of Martin Wolters, Harald Mundt, and Jeffrey Riedmiller of [http://www.dolby.com Dolby Laboratories] as [http://www.dolby.com/uploadedFiles/Assets/US/Doc/Professional/AES128-Loudness-Normalization-Portable-Media-Players.pdf published in May 2010] for the [http://www.aes.org Audio Engineering Society]. Since RG2 is based on [http://www.itu.int/rec/R-REC-BS.1770-2-201103-S/en ITU BS. 1770-2] which specifies a target loudness reference of -23 LKFS, the calculation to determine the gain of ReplayGain is as follows (from page 11 of the [http://www.dolby.com/uploadedFiles/Assets/US/Doc/Professional/AES128-Loudness-Normalization-Portable-Media-Players.pdf 2010 Dolby paper]): 
<blockquote style="background-color: lightgrey; border: solid thin grey;">
Therefore the following reversible conversion between Replay Gain and ITU-based loudness is proposed: 
RG = -18 dB - L 
where RG is the estimated Replay Gain and L is the Loudness according to ITU-R BS.1770 in LKFS (dB relative full scale).
</blockquote>
In general, the adjusted playback loudness using RG2 will be lower than RG1, but will have better perceived loudness consistency from track to track.
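
The quoted conversion is reversible, which a short sketch makes explicit. The function names are hypothetical, and the loudness input is assumed to be an integrated BS.1770 measurement in LKFS obtained elsewhere.

```python
def rg2_gain_db(loudness_lkfs):
    """RG = -18 dB - L, with L in LKFS."""
    return -18.0 - loudness_lkfs


def loudness_from_rg2(gain_db):
    """Inverse conversion: recover L from a stored replay gain."""
    return -18.0 - gain_db


# A track measured at -9.5 LKFS receives an 8.5 dB attenuation.
print(rg2_gain_db(-9.5))  # prints -8.5
```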

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

==Pre-amplification==
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be -23 LKFS, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 18 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in Decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

==Clipping prevention==
The ITU BS. 1770-2 suggestion of a -23 LKFS average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>

==Track or Album Gain Adjustment==
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single value can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, whether the excursion is positive or negative. This sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. To measure the peak, coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom); the resulting peak amplitude values may be greater than 1.0.
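A minimal sketch of the peak scan for integer PCM data follows. The function names are hypothetical, and the choice of 2<sup>bits-1</sup> as the full-scale divisor (32768 for 16-bit audio) is an assumption of this example; the album peak is taken as the largest of the track peaks.

```python
def peak_amplitude(channels, bits_per_sample=16):
    """Largest absolute sample on any channel, scaled so full scale is 1.0.

    `channels` is a list of per-channel lists of signed integer samples.
    """
    full_scale = float(1 << (bits_per_sample - 1))   # assumed: 32768 for 16-bit
    peak = max((abs(s) for ch in channels for s in ch), default=0)
    return peak / full_scale


def album_peak(track_peaks):
    """Peak of the whole album: the largest of its track peaks."""
    return max(track_peaks)
```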

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID       $54 58 58 58 ("TXXX")
 Size           $xx xx xx xx (size of frame excluding this header)
 Flags          $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
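To make the layout concrete, the following sketch formats a gain and a peak value and wraps a key/value pair in a TXXX frame as described above. The helper names are hypothetical. The size field is written as a plain 32-bit big-endian integer, which is the ID3v2.3 convention; ID3v2.4 uses a "synchsafe" encoding instead, although the two are identical for frames shorter than 128 bytes, which covers the ReplayGain frames.

```python
import struct


def format_gain(gain_db):
    """Render a gain as '[-]a.bb dB' (two decimals, space, 'dB')."""
    return "%.2f dB" % gain_db


def format_peak(peak):
    """Render a peak as 'c.dddddd' (six decimals, no suffix)."""
    return "%.6f" % peak


def txxx_frame(key, value):
    """Build one TXXX frame: 10-byte header followed by the frame data."""
    body = (
        b"\x00"                      # text encoding: ISO-8859-1
        + key.encode("latin-1")      # description (the key)
        + b"\x00"                    # terminator after the description
        + value.encode("latin-1")    # value string
    )
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body
```

A scanner might then emit <tt>txxx_frame("REPLAYGAIN_TRACK_GAIN", format_gain(-6.2))</tt> for a track measured at -6.2 dB.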

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
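One possible way to tolerate the variations listed above is a pair of lenient regular expressions. This is a sketch only: the function names and the exact patterns are assumptions of the example, and a value that matches neither pattern is reported as <tt>None</tt> so the caller can treat the frame as absent.

```python
import re

# Optional sign, optional spaces, digits with optional fraction,
# optional 'dB' suffix in any capitalization, with or without a space.
_GAIN_RE = re.compile(r"^\s*([+-]?)\s*(\d+(?:\.\d*)?|\.\d+)\s*(?:dB)?\s*$",
                      re.IGNORECASE)
_PEAK_RE = re.compile(r"^\s*\+?(\d+(?:\.\d*)?|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB as a float, or None if the text is unusable."""
    m = _GAIN_RE.match(text)
    if m is None:
        return None
    value = float(m.group(2))
    return -value if m.group(1) == "-" else value


def parse_peak(text):
    """Return the peak as a float (1.0 = full scale), or None if unusable."""
    m = _PEAK_RE.match(text)
    return float(m.group(1)) if m else None


def find_tag(tags, key):
    """Look up a key in a dict of tags, ignoring capitalization of the key."""
    wanted = key.upper()
    for k, v in tags.items():
        if k.upper() == wanted:
            return v
    return None
```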

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (the equals sign).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as specified for ID3v2, with values stored as ASCII text in the same format.

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

The updated ReplayGain 1.0 specification reflecting recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Revised ReplayGain specification

2012-10-08T22:00:24Z

Notat:

''This is a proposed update to the [[ReplayGain 1.0 specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio forum].'' --[[User:Notat|Notat]] 23:42, 8 October 2012 (CEST)

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies, a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter with a cutoff frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
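The sketch below applies one filter stage using the 48 kHz Butterworth coefficients from Table 1a. Figure 2 is not reproduced in text form, so the sign convention is an assumption here: the ''a'' coefficients are taken to be added in the feedback path, i.e. y[n] = Σ b(k)·x[n−k] + Σ a(k)·y[n−k]. With that convention the stage has zero gain at DC and a gain of very nearly one at the Nyquist frequency, which is what a high pass filter should do. The function and constant names are hypothetical; the "yulewalk" stage would be run the same way with the coefficients from Table 2.

```python
def iir_filter(x, b, a):
    """Filter the sequence x.

    b holds b(0), b(1), ...; a holds the feedback terms a(1), a(2), ...
    Assumed convention: y[n] = sum b(k)*x[n-k] + sum a(k)*y[n-k].
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


# Table 1a: Butterworth high pass, Fs = 48 kHz
BUTTER_48K_B = [0.98621192462708, -1.97242384925416, 0.98621192462708]
BUTTER_48K_A = [1.97223372919527, -0.97261396931306]


def cascade(x, stages):
    """Run x through each (b, a) stage in turn."""
    for b, a in stages:
        x = iir_filter(x, b, a)
    return x
```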
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
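A minimal sketch of the block-wise RMS measurement follows. It assumes floating-point samples in the range -1.0 to 1.0, so that the peak-to-peak range is 2.0. The handling of the final partial block (discarded) and of completely silent blocks (reported as negative infinity) is not specified above and is a choice made for this example; the function name is hypothetical.

```python
import math


def block_levels_db(channels, sample_rate, block_ms=50, peak_to_peak=2.0):
    """Return one level in dB per 50 ms block of (already filtered) audio.

    `channels` is a list of equal-length per-channel sample lists.
    """
    block_len = int(sample_rate * block_ms / 1000)
    n = len(channels[0])
    levels = []
    for start in range(0, n - block_len + 1, block_len):
        total = 0.0
        for ch in channels:
            for s in ch[start:start + block_len]:
                total += s * s                       # square every sample
        mean = total / (block_len * len(channels))   # mean over all channels
        rms = math.sqrt(mean)                        # root of the mean
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))             # silent block (assumed)
    return levels
```

With these conventions a full-scale square wave measures 0 dB and a full-scale sine wave measures about -3 dB.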

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of loudness is the one 95% of the way up the sorted list,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
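The selection step might be sketched as follows. The exact rounding of the list position is not specified above, so the integer arithmetic used here is an assumption, as is the function name.

```python
def overall_loudness_db(block_levels, percent=95):
    """Sort the per-block levels and pick the value 95% of the way up."""
    if not block_levels:
        raise ValueError("no blocks were measured")
    ordered = sorted(block_levels)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) * percent // 100, len(ordered) - 1)
    return ordered[index]
```

For a speech-like signal in which 60% of the blocks sit near -60 dB and the rest near -20 dB, this returns -20 dB, reflecting the observation that the louder moments dominate perceived loudness.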

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to 20 dB of system headroom. The 14 dB of headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE calibrated system, and so ReplayGain is said to operate with an 89 dB reference level.
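As a worked check of these figures, raising the reference signal from -20 dB to -14 dB adds 6 dB to the SMPTE playback level:

:<math>83\ \text{dB SPL} + (20\ \text{dB} - 14\ \text{dB}) = 89\ \text{dB SPL}</math>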

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
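As a small illustration of the formula, the function below computes the adjustment from two measured loudness values. The function name and the numbers in the usage note are hypothetical and are not measurements from any real file.

```python
def replay_gain_db(reference_loudness_db, file_loudness_db):
    """RG = L_n14 - L, with all quantities in decibels."""
    return reference_loudness_db - file_loudness_db
```

If the pink noise reference measured -20 dB and a loud pop track measured -12 dB, the result would be -8 dB (an attenuation); a quiet track measuring -26 dB would receive +6 dB.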

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single value can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, whether the excursion is positive or negative. This sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. To measure the peak, coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom); the resulting peak amplitude values may be greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID       $54 58 58 58 ("TXXX")
 Size           $xx xx xx xx (size of frame excluding this header)
 Flags          $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (the equals sign).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as specified for ID3v2, with values stored as ASCII text in the same format.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain (in decibels) divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
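The selection and scaling rules above can be sketched in a few lines of Python. This is only an illustration: the function names <tt>select_gain_db</tt> and <tt>db_to_scale</tt> are hypothetical, and unity gain is assumed as the default when no gain metadata is present (the averaging alternative mentioned above is not shown).

```python
from typing import Optional


def select_gain_db(mode: str,
                   track_gain_db: Optional[float],
                   album_gain_db: Optional[float],
                   default_gain_db: float = 0.0) -> float:
    """Pick the gain for the requested mode, falling back to the other one."""
    if mode not in ("track", "album"):
        raise ValueError("mode must be 'track' or 'album'")
    if mode == "track":
        preferred, other = track_gain_db, album_gain_db
    else:
        preferred, other = album_gain_db, track_gain_db
    if preferred is not None:
        return preferred
    if other is not None:
        return other
    # ASSUMPTION: unity gain (0 dB) when neither value is stored.
    return default_gain_db


def db_to_scale(gain_db: float) -> float:
    """Convert a gain in decibels to a linear sample multiplier."""
    return 10.0 ** (gain_db / 20.0)
```

For example, a file carrying only an album gain of -7.5 dB would be played with that gain even when Track mode is selected.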

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
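As a rough sketch of combining the two adjustments, the decibel values can be summed before a single conversion to a linear factor. The function name <tt>combined_scale</tt> is hypothetical.

```python
def combined_scale(replay_gain_db: float, preamp_db: float = 0.0) -> float:
    """Linear multiplier for replay gain plus pre-amp, both given in dB.

    Adding decibel values is equivalent to multiplying the two
    separate linear scale factors.
    """
    return 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
```

With no replay gain and a +6 dB pre-amp setting this yields a factor of about 1.995, matching the approximate doubling described above.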

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower the average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
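The formula above might be implemented along the following lines. This is a sketch rather than a reference implementation: the name <tt>playback_scale</tt> is hypothetical, a missing peak value is treated as 1.0 as described earlier, and the handling of a zero peak (a completely silent track) is an added assumption.

```python
from typing import Optional


def playback_scale(replay_gain_db: float,
                   preamp_db: float = 0.0,
                   peak: Optional[float] = None) -> float:
    """Overall linear scale with clipping prevention by gain reduction."""
    if peak is None:
        peak = 1.0  # no peak metadata: assume the track reaches full scale
    wanted = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if peak <= 0.0:
        # ASSUMPTION: a silent track cannot clip, so no reduction is needed.
        return wanted
    # Never scale by more than the factor that brings the peak to 1.0.
    return min(wanted, 1.0 / peak)
```

A track with a stored peak of 1.0 and a requested gain of +6 dB is therefore played at unity scale, while a track with a peak of 0.25 receives the full requested gain.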

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Talk:Revised ReplayGain specification

2012-10-08T21:46:19Z

Notat: new source

==Improvement discussion threads==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=15445 Improving ReplayGain, some ideas for Devs etc]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=89841 ReplayGain2, ReplayGain2 proposal]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=85614 ReplayGain album gain problem]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=84769 ReplayGain when converting 5.1 to 2]

===R128===
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=85978 R128GAIN: An EBU R128 compliant loudness scanner]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=86116 libebur128 - (yet another) EBU R 128 implementation]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=86424 R128 versus ReplayGain, The cage match begins here.]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=88498 ReplayGain: Foobar2000 results differ from MP3Gain and MetaFLAC ones]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=88778 replaygain and R 128]

==External resources==
*[http://www.dolby.com/uploadedFiles/Assets/US/Doc/Professional/AES128-Loudness-Normalization-Portable-Media-Players.pdf Loudness Normalization in the Age of Portable Media Players]
*[http://music-loudness.com/PDFs/Loudness_Alliance_White_Paper_final_v1.pdf Loudness Normalization: The Future of File-Based Playback]

Revised ReplayGain specification

2012-10-08T21:42:33Z

Notat:

''This is a proposed update to the [[ReplayGain 1.0 specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio forum].'' --[[User:Notat|Notat]] 23:42, 8 October 2012 (CEST)

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
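A minimal Python sketch of one filter stage is given below, using the Butterworth coefficients of Table 1b. Figure 2 is not reproduced here, so the sign convention is an assumption: the feedback terms <tt>a(1)</tt> and <tt>a(2)</tt> are taken to be ''added'' to the output. With that reading the tabulated values give zero gain at DC and unity gain at half the sample rate, as expected of a high-pass filter. The "yulewalk" stage would follow the same pattern with the eleven <tt>b</tt> and ten <tt>a</tt> values of Table 2 and is not shown.

```python
from typing import Iterable, List

# Table 1b values (Fs = 44.1 kHz), copied from the specification.
B = (0.98500175787242, -1.97000351574484, 0.98500175787242)  # b(0)..b(2)
A = (1.96977855582618, -0.97022847566350)                    # a(1), a(2)


def butterworth_highpass(samples: Iterable[float]) -> List[float]:
    """Run samples through the 2nd order high-pass stage."""
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for x0 in samples:
        # ASSUMPTION: feedback terms are added, not subtracted.
        y0 = B[0] * x0 + B[1] * x1 + B[2] * x2 + A[0] * y1 + A[1] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

The output of this stage would then be fed to the second stage (or vice versa), which is what running the samples "in cascade" means.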
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
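The block-wise calculation above can be sketched as follows. The sketch assumes floating-point samples in the range -1.0 to 1.0 (so the peak-to-peak range defaults to 2.0), assumes the loudness filter has already been applied, and drops an incomplete trailing block; none of these choices is mandated by the text. The name <tt>block_levels_db</tt> is hypothetical.

```python
import math
from typing import List, Sequence


def block_levels_db(channels: Sequence[Sequence[float]],
                    sample_rate: int,
                    peak_to_peak: float = 2.0,
                    block_seconds: float = 0.05) -> List[float]:
    """Return one RMS level in dB per 50 ms block, pooling all channels."""
    block_len = int(round(sample_rate * block_seconds))
    if block_len < 1 or not channels:
        raise ValueError("need at least one channel and a usable sample rate")
    n = min(len(ch) for ch in channels)
    levels = []
    # ASSUMPTION: an incomplete block at the end of the signal is ignored.
    for start in range(0, n - block_len + 1, block_len):
        total = 0.0
        for ch in channels:
            for s in ch[start:start + block_len]:
                total += s * s                             # step 1: square
        mean_square = total / (block_len * len(channels))  # step 2: mean
        rms = math.sqrt(mean_square)                       # step 3: root
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))                   # digital silence
    return levels
```

A constant signal of magnitude 0.5 on both channels produces a level of about -6.02 dB for every block.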

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is 95%,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif|(a) Speech
File:RG_Statistical_classic.gif|(b) Classical music
File:RG_Statistical_pop.gif|(c) Pop music
</gallery>
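The 95% selection described in this section can be illustrated with a direct sort. This is a simplified sketch: the function name <tt>overall_loudness_db</tt> is hypothetical, and the exact rounding of the list position is an assumption, since the text only specifies "95%".

```python
from typing import Sequence


def overall_loudness_db(block_levels_db: Sequence[float],
                        percentile: int = 95) -> float:
    """Pick the block level that sits at the given percentile."""
    if not block_levels_db:
        raise ValueError("no blocks to analyse")
    ordered = sorted(block_levels_db)
    # ASSUMPTION: position is ceil(percentile * n / 100), counted from 1.
    index = (percentile * len(ordered) + 99) // 100 - 1
    index = max(0, min(index, len(ordered) - 1))
    return ordered[index]
```

For a list of 100 block levels, the 95th value in ascending order is returned.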

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20 dB system headroom. The -14 dB headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE calibrated system, and so ReplayGain is said to operate with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
ReplayGain achieves loudness-compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
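For integer PCM data the peak scan might look like the sketch below. It assumes signed samples and takes digital full scale to be 2<sup>bits-1</sup> (32768 for 16-bit audio), so that the most negative sample value maps to exactly 1.0; that mapping and the name <tt>peak_amplitude</tt> are assumptions made for illustration.

```python
from typing import Iterable, Sequence


def peak_amplitude(channels: Sequence[Iterable[int]],
                   bits_per_sample: int = 16) -> float:
    """Largest absolute sample on any channel, with full scale = 1.0."""
    # ASSUMPTION: signed PCM, full scale taken as 2**(bits - 1).
    full_scale = float(1 << (bits_per_sample - 1))
    peak = 0
    for channel in channels:
        for sample in channel:
            peak = max(peak, abs(sample))
    return peak / full_scale
```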

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Audio Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}
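Putting the frame header, frame data and a key from Table 3 together, a TXXX frame could be assembled as in this sketch. It assumes the ID3v2.3 convention of a plain 32-bit big-endian frame size (ID3v2.4 encodes the size as a "synchsafe" integer instead), and the function name <tt>build_txxx_frame</tt> is hypothetical.

```python
import struct


def build_txxx_frame(key: str, value: str) -> bytes:
    """Assemble one TXXX frame: 10-byte header followed by the frame data."""
    body = (b"\x00"                          # text encoding: ISO-8859-1
            + key.encode("latin-1") + b"\x00"  # description, null terminated
            + value.encode("latin-1"))         # value, no terminator
    # ASSUMPTION: ID3v2.3 style size field (plain big-endian integer).
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body
```

For instance, <tt>build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.54 dB")</tt> produces a 41-byte frame whose size field holds 31.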

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
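A scanner could produce these two value formats roughly as follows. The function names are hypothetical, and rewriting a rounded "-0.00" as "0.00" is an assumption rather than something the text requires.

```python
def format_gain(gain_db: float) -> str:
    """Format a gain as '[-]a.bb dB' with two decimal digits."""
    text = f"{gain_db:.2f}"
    if text == "-0.00":
        # ASSUMPTION: avoid a negative sign on a value that rounds to zero.
        text = "0.00"
    return text + " dB"


def format_peak(peak: float) -> str:
    """Format a peak amplitude as 'c.dddddd' with six decimal digits."""
    return f"{peak:.6f}"
```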

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
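A tolerant parser along these lines might look like the sketch below. It accepts a leading '+', any number of digits, surrounding spaces, leading zeros, and a missing or differently capitalized 'dB' suffix, and returns <tt>None</tt> for anything else so that the caller can treat the value as absent. The function names and the regular expressions are illustrative assumptions, not part of the specification.

```python
import re
from typing import Optional

# Optional sign, digits with optional fraction, optional 'dB' in any case.
_GAIN_RE = re.compile(r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*(?:dB)?\s*$",
                      re.IGNORECASE)
# Peaks are positive decimals without sign or suffix.
_PEAK_RE = re.compile(r"^\s*(\d+\.?\d*|\.\d+)\s*$")


def parse_gain(text: str) -> Optional[float]:
    """Return the gain in dB, or None if the value is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text: str) -> Optional[float]:
    """Return the peak amplitude, or None if the value is malformed."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


def normalize_key(key: str) -> str:
    """Compare keys case-insensitively by upper-casing them."""
    return key.strip().upper()
```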

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and the same value format as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower the average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
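
The formula above might be implemented along these lines. The function and parameter names are hypothetical; a missing peak value is treated as 1.0, as recommended earlier in this section.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=None,
                   prevent_clipping=True):
    """Overall linear scale factor for one track or album.

    replay_gain_db, preamp_db: gains in dB (added before conversion).
    peak: peak amplitude metadata (1.0 = digital full scale), or None.
    """
    if peak is None:
        peak = 1.0  # no peak metadata: assume the audio reaches full scale
    scale = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if prevent_clipping and peak > 0.0:
        # Largest scale that keeps the known peak at or below full scale.
        scale = min(scale, 1.0 / peak)
    return scale
```

For instance, with a replay gain of +6 dB and a stored peak of 0.9, the unclamped scale of about 2.0 is reduced to 1/0.9 (about 1.11), so the loudest sample lands exactly at full scale.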

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Revised ReplayGain specification

2012-10-08T21:42:02Z

Notat: under construction

''This is a proposed update to the [[ReplayGain 1.0 specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio forum].'' --[[User:Notat|Notat]] 23:42, 8 October 2012 (CEST)

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
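
A minimal sketch of one filter stage is given below, using the Butterworth coefficients from Table 1a. It assumes that the feedback terms are ''added'' in the difference equation; this sign convention is inferred from the signs of the ''a'' coefficients in the tables and should be checked against Figure 2. The function name is hypothetical.

```python
def iir_filter(samples, b, a):
    """Filter samples with feedforward taps b[0..N] and feedback taps a[1..N].

    Assumed difference equation (feedback terms are added):
        y[n] = sum(b[k] * x[n-k], k >= 0) + sum(a[k] * y[n-k], k >= 1)
    a[0] is a placeholder and is ignored.
    """
    order = len(b) - 1
    x_hist = [0.0] * (order + 1)  # x[n], x[n-1], ...
    y_hist = [0.0] * (order + 1)  # y[n-1], y[n-2], ... (before update)
    out = []
    for x in samples:
        x_hist = [x] + x_hist[:-1]
        y = sum(b[k] * x_hist[k] for k in range(order + 1))
        y += sum(a[k] * y_hist[k - 1] for k in range(1, order + 1))
        y_hist = [y] + y_hist[:-1]
        out.append(y)
    return out


# Table 1a: Butterworth high-pass coefficients for Fs = 48 kHz.
BUTTER_48K_B = [0.98621192462708, -1.97242384925416, 0.98621192462708]
BUTTER_48K_A = [1.0, 1.97223372919527, -0.97261396931306]
```

Under this convention the stage blocks a constant (DC) input and passes a signal alternating at the Nyquist frequency with approximately unity gain, as expected of a high-pass filter. The "yulewalk" stage can be run through the same function with the coefficients from Table 2, and the two calls chained to form the cascade.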
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
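
The block-wise RMS calculation and decibel conversion could be sketched as follows. The function name is hypothetical. Samples are assumed to be floating-point values in the range -1.0 to 1.0, so the peak-to-peak range defaults to 2.0, and any incomplete block at the end of the signal is simply dropped (an assumption; the text above does not say how to treat it).

```python
import math


def block_levels_db(channels, sample_rate, block_seconds=0.05,
                    peak_to_peak=2.0):
    """Return one level in dB per 50 ms block.

    channels: list of equal-length sample lists, one per channel.
    """
    block_len = int(round(sample_rate * block_seconds))
    total_len = len(channels[0])
    levels = []
    for start in range(0, total_len - block_len + 1, block_len):
        # Step 1: square every sample (from all channels).
        squares = 0.0
        for ch in channels:
            squares += sum(s * s for s in ch[start:start + block_len])
        # Step 2: mean over all samples of all channels in the block.
        mean_square = squares / (block_len * len(channels))
        # Step 3: square root, then conversion to decibels.
        rms = math.sqrt(mean_square)
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))  # digital silence
    return levels
```

Pooling the squared samples of both channels before taking the mean avoids the cancellation of out-of-phase components that summing to mono would cause.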

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of loudness is the one at the 95% position in the sorted list,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
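
The selection step described in this section can be sketched as below. The function name and the exact indexing rule (integer position <tt>len × 95 / 100</tt>, clamped to the last element) are assumptions; implementations may round differently or work from a histogram instead of a sorted list.

```python
def overall_loudness_db(block_levels, percent=95):
    """Pick the block level found `percent` of the way up the sorted list."""
    if not block_levels:
        raise ValueError("no block levels to analyse")
    ordered = sorted(block_levels)
    index = min(len(ordered) - 1, len(ordered) * percent // 100)
    return ordered[index]
```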

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20 dB system headroom. The -14 dB headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE calibrated system, and so ReplayGain is said to operate with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
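
As a small worked example of the sign convention (the function name and the loudness figures are made up for illustration):

```python
def replay_gain_db(reference_loudness_db, measured_loudness_db):
    """RG = L_n14 - L, with all quantities in decibels."""
    return reference_loudness_db - measured_loudness_db


# A file measured 7.5 dB louder than the reference gets a -7.5 dB gain.
loud_track = replay_gain_db(-20.0, -12.5)
# A file measured 4 dB quieter than the reference gets a +4 dB gain.
quiet_track = replay_gain_db(-20.0, -24.0)
```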

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
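
For signed integer PCM, the scan might look like the sketch below. The function name is hypothetical, and taking full scale as 2<sup>bits-1</sup> (32768 for 16-bit audio) is an assumption; some implementations divide by 32767 instead.

```python
def peak_amplitude(channels, bits_per_sample=16):
    """Largest absolute sample value on any channel, with full scale = 1.0.

    channels: list of lists of signed integer samples.
    """
    full_scale = float(1 << (bits_per_sample - 1))
    peak = 0
    for ch in channels:
        for sample in ch:
            peak = max(peak, abs(sample))  # positive or negative excursion
    return peak / full_scale
```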

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID       $54 58 58 58 ("TXXX")
 Size           $xx xx xx xx (size of frame excluding this header)
 Flags          $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
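
A scanner could produce these value strings, and wrap them in a TXXX frame laid out as shown above, roughly as follows. The function names are hypothetical. The frame size is written as a plain 32-bit big-endian integer, which is an assumption matching ID3v2.3; ID3v2.4 uses a "synchsafe" size encoding instead.

```python
import struct


def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB' (no '+' prefix, no negative zero)."""
    value = round(gain_db, 2) + 0.0  # adding 0.0 turns -0.0 into 0.0
    return "%.2f dB" % value


def format_peak(peak):
    """Format a peak amplitude as 'c.dddddd'."""
    return "%.6f" % peak


def build_txxx_frame(key, value):
    """Build one TXXX frame (header plus data) as bytes."""
    body = (b"\x00"                              # text encoding: ISO-8859-1
            + key.encode("latin-1") + b"\x00"    # description (the key)
            + value.encode("latin-1"))           # value string
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", format_gain(-6.5))
```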

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
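
A tolerant parser covering the variations listed above might look like this sketch. The function names and the regular expressions are illustrative assumptions, not part of the specification; anything that does not match is reported as <tt>None</tt> so the caller can treat the value as absent.

```python
import re

# Optional sign, digits with optional fraction, optional 'dB' in any case,
# with or without surrounding spaces.
_GAIN_RE = re.compile(
    r"^\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))\s*(?:dB)?\s*$", re.IGNORECASE)
_PEAK_RE = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB as a float, or None if the text is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak amplitude as a float, or None if malformed."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


def lookup(tags, key):
    """Find a value in a dict of tags, ignoring the capitalization of keys."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None
```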

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (the equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and value formats as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain (in dB) divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>
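Since the type of limiter is left to the implementation, the following Python is only one possible sketch: a very simple peak limiter with instant attack and a gradual release. The ceiling of 0.99 and the release constant are assumptions for illustration, and a production limiter would likely add look-ahead and smoother gain changes to reduce audible artifacts.

```python
from typing import Iterable, List


def limit(samples: Iterable[float],
          ceiling: float = 0.99,
          release: float = 0.001) -> List[float]:
    """Keep every output sample within +/- ceiling.

    `samples` are floats where 1.0 is digital full scale. The gain is
    pulled down instantly when a sample would exceed the ceiling, then
    recovers towards unity by a fraction `release` per sample.
    """
    gain = 1.0
    out = []
    for x in samples:
        magnitude = abs(x)
        # Largest gain that keeps this sample at or below the ceiling.
        needed = ceiling / magnitude if magnitude > ceiling else 1.0
        # Recover towards unity, but never exceed what this sample allows.
        gain = min(needed, gain + (1.0 - gain) * release)
        out.append(x * gain)
    return out
```

Samples that never approach the ceiling pass through unchanged, because the gain stays at unity.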

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.
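The clipping prediction described above can be sketched as follows. The function name and parameters are hypothetical. The peak passed in is assumed to be whichever of the track or album peak matches the gain being applied, expressed on a scale where 1.0 is digital full scale.

```python
from typing import Optional


def will_clip(replay_gain_db: float,
              preamp_db: float,
              peak: Optional[float] = None) -> bool:
    """Predict whether the adjusted signal would exceed full scale.

    If peak metadata is unavailable (None), a peak of 1.0 is assumed,
    as recommended in the text above.
    """
    if peak is None:
        peak = 1.0
    scale = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    return peak * scale > 1.0
```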

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^{\frac{RG + G_{\text{pre-amp}}}{20}}, \frac{1}{\text{peak amplitude}} \right)</math>
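A direct transcription of this formula into Python might look like the sketch below. The function name is hypothetical, and the handling of a zero peak (a silent track) is an assumption added here to avoid division by zero. It is not addressed by the formula itself.

```python
def playback_scale(replay_gain_db: float,
                   preamp_db: float,
                   peak_amplitude: float = 1.0) -> float:
    """Overall linear scale factor with clipping prevention by gain reduction.

    Implements min(10 ** ((RG + G_preamp) / 20), 1 / peak_amplitude),
    where peak_amplitude is relative to digital full scale (1.0).
    """
    wanted = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if peak_amplitude <= 0.0:
        # Assumption: a silent track cannot clip, so use the wanted scale.
        return wanted
    return min(wanted, 1.0 / peak_amplitude)
```

With a peak of 1.0, any positive combined gain is reduced to a scale of exactly 1, while a track peaking at 0.25 can accept a +6 dB adjustment in full.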

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Talk:Revised ReplayGain specification

2012-01-25T00:12:02Z

Notat: /* External resources */ free link

Talk:Revised ReplayGain specification

2012-01-24T23:51:24Z

Notat: links to threads

==Improvement discussion threads==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=15445 Improving ReplayGain, some ideas for Devs etc]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=89841 ReplayGain2, ReplayGain2 proposal]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=85614 ReplayGain album gain problem]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=84769 ReplayGain when converting 5.1 to 2]

===R128===
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=85978 R128GAIN: An EBU R128 compliant loudness scanner]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=86116 libebur128 - (yet another) EBU R 128 implementation]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=86424 R128 versus ReplayGain, The cage match begins here.]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=88498 ReplayGain: Foobar2000 results differ from MP3Gain and MetaFLAC ones]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=88778 replaygain and R 128]

==External resources==
*[http://www.aes.org/e-lib/browse.cfm?elib=15341 Loudness Normalization in the Age of Portable Media Players]

Talk:Original ReplayGain specification

2012-01-24T23:50:23Z

Notat: links to threads

== Musepack ==

Hi, there is an inaccuracy about Musepack files. Although they use APEv2 for metadata, replaygain is stored in the file header by specification, see [http://trac.musepack.net/trac/wiki/SV8Specification here]. Actually, this is the first format introducing APEv2 tags and native replaygain support.
So, every Musepack-compliant player must read the RG data from the header rather than APEv2.
[[User:Antonski|Antonski]] 14:19, 4 May 2011 (UTC)

==Development discussion threads==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=1709 Flaw in ReplayGain spec]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=62374 Replay Gain Site, Why does it look like a museum?]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=83397 Does Replay gain work differtly in Media monkey, Foobar and Media Monkey given 2 differnt Results]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=85536 Replay Gain specification, update in progress]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=69568 ReplayGain equal loudness filter]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=85834 Replay Gain tagging, ID3, LAME, Others?]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=86745 ReplayGain player recommendations]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=87442 ReplayGain specification complete, official launch 25 March proposed]

Topic Index

2011-07-27T17:30:16Z

Notat: Replay Gain -> ReplayGain

* For a more structured 'table of contents', use the '''[[Main Page#Categories|Categories List]]'''.
* Please see [http://www.hydrogenaudio.org/forums/index.php?showtopic=12979&st=25&p=247441&#entry247441 this thread] for a discussion of the future structure of this wiki. If you have thoughts, comments, suggestions, etc., please join in this discussion. In the meantime, please feel free to fill in gaps in the information below.
* See also [http://www.hydrogenaudio.org/forums/index.php?showtopic=28658 the style related discussion thread] in the forums.

= General Information =
== General Guides ==
* [[Create a long-term archive]]
* [[Secure ripping|Secure Ripping]]
* [[Enabling DMA]]
* [[Choosing_the_best_codec.|Choosing the best codec]]
* [[Lossless_comparison|Lossless Comparison]]

== EAC Guides ==
* Configuring [[EAC Drive Configuration|EAC and CD-ROM Drives]]
* Configuring [[EAC and Lame]]
* Configuring [[EAC and AAC | EAC and Nero AAC]]
* Configuring [[EAC and Ogg Vorbis | EAC and Vorbis]]
* Configuring [[EAC and Musepack]]
* Configuring [[EAC and WavPack]]
* Configuring [[EAC and FLAC]]
* Configuring [[EAC and Monkey's Audio]]
* Configuring [[EAC and Cue Sheets]]
* Configuring EAC and [[REACT]]

== CDex Guides ==
* Configuring [[CDex Drive Configuration|CDex and CD-ROM Drives]]
* Configuring [[CDex and FLAC]]

== AAC Guides ==
* [[AAC_FAQ|AAC FAQ]] frequently asked questions regarding AAC, the latest industry standard.
* [[AAC encoders|AAC Encoders]] known AAC encoder/decoder implementations and configuring them (Apple iTunes, Nero AAC, etc.)
* [[Linux and Nero AAC]] a short guide for configuring Nero AAC encoder to run under Linux.

== Vorbis Guides ==
* [[Recommended_Ogg_Vorbis|Recommended encoders and settings for Vorbis]].
* [[Lancer|Ogg Vorbis Acceleration Project]] information regarding optimized Vorbis binaries.
* [[OggDropXPd|OggDropXPd]] guide for encoding with John 33's popular drag-n-drop frontend.
* [[Compiling_aoTuV|Compiling AoTuV]] compiling the AoTuV binaries under Linux.

= Audio Codecs =
== [[Lossy]] ==
* [[Advanced Audio Coding]] (AAC)
* [[AC3]]
* [[ATRAC3]]
* [[DTS]]
* [[MP2]]
* [[MP3]]
* [[Musepack]] (MPC, MP+)
* (Ogg) [[Vorbis]]
* [[QDesign]]
* [[VQF]]
* [[Windows Media Audio]] (WMA)

== [[Lossless]] ==
* [[ALAC|Apple Lossless]]
* [[ALS|Audio Lossless Coding]]
* [[DTS-HD|DTS Master Audio]]
* [[Free Lossless Audio Codec]] (FLAC)
* [[Lossless Audio]] (LA)
* [[Lossless Predictive Audio Compression]] (LPAC)
* [[Monkey's Audio]]
* [[OptimFROG]]
* [[Lossless comparison#RealAudio Lossless|RealAudio Lossless]]
* [[Shorten]]
* [[TTA|True Audio]]
* [[WavPack]]
* [[Windows Media Audio|WMA Lossless]]

= [[Metadata]] (Tags) =
* [[APEv1]]
* [[APEv2]]
* [[ID3v1]]
* [[ID3v1.1]]
* [[ID3v2]]
* [[Vorbis Comment]]

= Media Extractors =
== CD Extractors ==
* [[Audiograbber]] (Win32)
* [[CDex]] (Win32)
* [[cdparanoia]] (Posix)
* [[dBpowerAMP with AccurateRip]] (Win32)
* [[Exact_Audio_Copy|Exact Audio Copy]] (Win32)
* [[Grip]] (Posix)
* [[iTunes]] (Win32/Mac OS/X)
* [[MediaMonkey]] (Win32)
* [[Max]] (Mac OS/X)
* [[XLD]] (Mac OS/X)
* [[PlexTools]] (Win32)
* [[Rubyripper]] (Posix/Mac OS/X)

== DVD Extractors ==
* [http://pessoal.onda.com.br/rjamorim/SetupDVDDecrypter_3.5.4.0.exe DVD Decrypter] (Win32)
* DVD-A / CPPM Decrypter (Win32/Posix)

= Media Players =
== Windows ==
* [[Apollo]]
* [[dBpowerAMP]]
* [[Foobar2000:Foobar2000|foobar2000]]
* [[iTunes]]
* [[MediaMonkey]]
* [[musikCube]]
* [[Quintessential Player]]
* [[VUplayer]]
* [[Winamp]]
* [[Windows Media Player]]
* [[wxMusik]]
* [[XMPlay]]
* [[WMPTSE]] (with WMP)

== Linux/BSD ==
* [[Amarok]]
* [[BMP]]
* [[JuK]]
* [[LAMIP]]
* [[Muine]]
* [[Music Player Daemon (MPD)]]
* [[Quod Libet]]
* [[Rhythmbox]]
* [[wxMusik]]
* [[XMMS]]

== Mac OS X (Non-BSD Specific) ==
* [[iTunes]]
* [[skiTunes]]
* [[Whamb]]

== Other ==
* [[CL-Amp]] (BeOS)

= Audio Editors =
== Windows ==
* [[Adobe Audition]] (previously known as ''Cool Edit'')
* [[Audacity]]
* [[Goldwave]]
* [http://www.sonymediasoftware.com/products/soundforgefamily.asp Sony Sound Forge] (Previously released by Sonic Foundry)

== Linux/BSD ==
* [[Ardour]]
* [[Audacity]]
* [[ReZound]]

== Mac OS X (Non-BSD Specific) ==
* [[Ardour]]
* [[Audacity]]

== Other ==
* [http://timidity.sourceforge.net/ Timidity++] (MIDI to PCM (WAV) converter). Timidity++ synthesizes MIDI files (sequences) in real time using Gravis UltraSound Soundfont patches (loosely based upon wavetable synthesis) to common digital audio file formats such as WAV, AU, AIFF, Ogg Vorbis, FLAC, etc. Useful for those who want to bypass the FM synthesizers on their sound cards to hear a MIDI sequence as it was intended to be heard.

= Testing Software =
== Subjective Perceptual ==
* [[ABC/HR]]
* [[PCABX]]

== Objective ==
''Note: Might be good to put something here about the problems of quality comparisons using graphs, frequency sweeps, etc.''

* [[EAQUAL]]
* [[Rightmark_Audio_Analyzer|Rightmark Audio Analyzer]]

= Audio Hardware =
== PC Audio ==
* [[Terratec EWX 24/96]]
* [[M-Audio Audiophile 24/96]]
* [[M-Audio Revolution 5.1]]
* [[M-Audio Revolution 7.1]]
* [[Chaintech AV-710]]
* [[E-MU 0404 24/192]]
* [[ASUS Xonar D1]]
* [[ASUS Xonar D2/PM]]

== Notebook Audio ==
* [[Echo Indigo IO 24/96]]

== Firewire ==
* [[E-MU 1212M 24/192]]
* [[M-Audio Firewire 410]]

== HiFi ==
* [[M-Audio Fast Track USB]]
* [[Slim Devices Squeezebox]]
* [[Slim Devices Transporter]]
* [[Hermstedt AG Hifidelio]]
* [[Olive Musica]]

== MIDI Interfaces ==
* M-Audio MIDISport Uno 1x1
* M-Audio MIDISport 2x2
* MOTU 5x5 Micro Lite
* MOTU Fastlane USB

== Digital Audio Players ==
=== Portable Flash ===
''(These players make use of an internal flash drive.)''
* [[Apple iPod]] Nano
* [[Apple iPod]] Shuffle
* Creative MuVo
* iRiver iFP Series
* MPIO lFP Series
* [[Rio Carbon]]

=== Portable HD ===
''(These players make use of an internal hard drive.)''
* [[Apple iPod]] with ''([http://www.rockbox.org/twiki/bin/view/Main/TargetStatus#iriver_H110_H115_H120_H140 Rockbox firmware])''
* [[Archos Jukebox with Rockbox Software]]
* [[Cowon iAudio]] with ''([http://www.rockbox.org/twiki/bin/view/Main/TargetStatus#iAudio_X5 Rockbox firmware])''
* [[iRiver H-Series]] with ''([http://www.rockbox.org/twiki/bin/view/Main/TargetStatus#iriver_H110_H115_H120_H140 Rockbox firmware])''
* [[MPIO H-Series]]
* [[Neuros]]
* [[Rio Karma]]
* [[Sandisk]] with ''([http://www.rockbox.org/twiki/bin/view/Main/TargetStatus#iAudio_X5 Rockbox firmware])''

=== Portable CD ===

=== Car Players ===
''(Car stereos that can read MP3, Vorbis, WMA, etc.).''
* [[Aiwa CDC-MP3]]
* [[Yakumo Ultrasound]]

===DVD Players===
* [[Neuston's Maestro DVX-1201]]

=== Firmware ===
* [[Rockbox]]

= Audio Theory =
== Analog Audio ==
* [[Tube Amplifiers]]
* [[Vinyl_Playback_and_Recording|Vinyl Audio]]

== Digital Audio ==
* [[Solid State Amplifiers]]
* [[ReplayGain]]

== Testing Methodology ==
* [[ABX]]
* [[EAQUAL]]

= Audio Development =
''note: Let's start with basic development tools (compilers, engineering tools, dev. libraries) until we think of more tools to add. I am also adding external links to books, tutorials, etc under resources.''

== Tools ==
* [http://www.mathworks.com/products/matlab/ MATLAB 7.0] commercial software for algorithmic design, development, engineering, and scientific computing. (multi-platform support)
* [http://www.octave.org/ GNU Octave] open-source alternative software (GPL) to MATLAB for numerical computations, engineering, and scientific computing. (multi-platform support)
* [http://www.fftw.org/ FFTW] a C subroutine library for computing the Discrete Fourier transform (DFT) in one or more dimensions on real and complex inputs.
* [http://gcc.gnu.org/ GCC] the GNU Compiler Collection for C, C++, Objective-C, Fortran, Java, and Ada.
* [http://www.gnu.org/software/emacs/emacs.html GNU Emacs] an extensible, customizable, self-documenting real-time display editor. Great for writing all types of source code especially on Unix. (multi-platform support)
* [http://www.bloodshed.net/devcpp.html DevCPP] free front-end IDE and compiler for the C and C++ languages. Delphi and C source code available. (Win 9x, NT, 2000, and XP)

== Resources ==
* [http://www.hydrogenaudio.org/forums/index.php?showforum=30 Scientific/R&D Forums] for Psychoacoustic, DSP, Electrical Engineering, theory, and coding related questions. (most questions are generally answered)
* [http://www.aes.org/ AES] The Audio Engineering Society website. Home of year-round world AES conferences.
* [http://www.dspguru.com/info/books/favor.htm DSP Tutorials] this site provides another good introduction to the area of DSP.
* [http://www.musicdsp.org/archive.php?classid=2 Music-DSP] source-code archive for analysis, filters, effects and synthesis. (C, C++, and Java code)
* [http://www.itakura.nuee.nagoya-u.ac.jp/HRTF/ HRTF] A database of measurements and research papers on Head Related Transfer Functions for 3D-Audio. (PDF, Audio)
* [http://www.midi.org/about-midi/specshome.shtml MIDI Specifications] MIDI 1.0, the new MusicXMF specification, and SP-MIDI for third generation 3GPP mobile devices (PDF)
* [http://www.gamedev.net/reference/articles/article2008.asp OpenAL] a beginners tutorial on writing code using OpenAL for audio programming in computer games and other applications. (C, C++).
* [http://www.alsa-project.org/ ALSA Project] (Advanced Linux Sound Architecture) bringing audio and MIDI capabilities to Linux.
* [http://www.engmath.dal.ca/courses/engm6610/notes/notes.html A Really friendly guide to Wavelets] A good introduction to wavelets aimed at engineers; requires a fair amount of background knowledge.

== Books/Research ==
* [http://www.amazon.com/gp/product/3540231595/qid=1135380559/sr=1-3/ref=sr_1_3/102-1730075-7300931?s=books&v=glance&n=283155 Psychoacoustics - Facts and Models] authors Hugo Fastl and Eberhard Zwicker, revised 2005 third edition. The book for comprehensive psychoacoustics models and figures.
* [http://www.eas.asu.edu/~spanias/papers/paper-audio-tedspanias-00.pdf Perceptual Audio Coding] authors T. Painter and A. Spanias. A comprehensive paper on perceptual audio coding (PDF)
* [http://www.amazon.com/gp/product/0780334493/103-2094923-9567001?v=glance&n=283155&%5Fencoding=UTF8&me=ATVPDKIKX0DER&no=283155&st=books Speech Communications Human and Machine] this book provides a good introduction to speech coding, including analysis, recognition, and perception. This text is a very good introduction for beginners.
* [http://www.dspguide.com/ Scientist and Engineer's Guide to DSP] author Steve Smith, a great guide for beginners new to the subject of DSP (free online text)(PDF)
* [http://www.amazon.com/exec/obidos/tg/detail/-/0792391810/ref=ase_theinternetdatac/103-9882844-5344648?v=glance&s=books Vector Quantization] authors Gersho and Gray. Good read for understanding how VQ and arithmetic coding work.

= Audio Resources =
== Websites ==
''Note: Let's include a small description to the side for now, so that we have something to work with when this section becomes large enough for its own page''

* http://www.audiocoding.com (Page with a wiki on technical audio topics, homepage of FAAC and FAAD2, also has an AAC forum.)
* http://www.ff123.net (Lots of general information on various MP3 implementations, test samples, testing methodology information, homepage of ABC/HR)
* http://www.head-fi.org (general information/board about headphones and portable audio players)
* http://www.rarewares.org (Downloads for many audio and media tools)
* http://www.rjamorim.com/rrw/ (Download old versions of foobar2000 and other audio and media tools)
* http://www.rockbox.org/ (Open-source jukebox firmware for numerous DAPs and architectures, GNU/GPL License).
* http://www.dapreview.net/ (Reviews of some of the most popular digital audio players out there)
* http://www.anythingbutipod.com/ (Thorough reviews of some of the most popular digital audio players out there)

== Articles/Debates ==
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=31759&st=0 DVD-A vs. SACD debate]
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=38041&st=0 Subjective vs. Objective testing]
* [http://www.ambisonic.net/pdf/ambidvd2001.pdf 5.1 surround vs. Ambisonics comparison]

== Listening Tests ==
* [http://www.rjamorim.com/test/ Roberto's listening tests]
* [[Listening_Tests|Inventory of several listening tests, mainly on HA.org]]

= Other Topics =
== Video ==
* [[MPEG-4 Visual]]
* [[Real Video]]
* [[Theora]]
* [[Tarkin]]
* [[Snow]]
* [[VP6]]
* [[Windows Media Video]]

== [[Container format]]s ==
* [[ASF]]
* [[AVI]]
* [[Matroska]]
* [[MOV]]
* [[MP4]]
* [[Ogg]]

= Glossary =
* [[Glossary_Of_Audio_Terms|Glossary of Audio Terms]]

= Introduction & User Guides =
''A starting place for new users to audio, with guides to compression and CD ripping and a glossary of all common terms.''

* [[Glossary Of Audio Terms]]
* [[FAQ]]
* [[Audio format guide]]
* Ripping Guides
** [[EAC]] (Win32)
** [[CDex]] (Win32)
** [[DBpowerAMP with AccurateRip]] (Win32)
** [[Plextools]] (Win32)
** [[Max]] (Mac OS/X)
** [[XLD]] (Mac OS/X)
** [[Rubyripper]] (Posix/Mac OS/X)
* [[Tagging]]
* [[ReplayGain]]



= Audio Codecs =
''Pros/cons, Recommended settings, Useful tools, etc.''

*'''[[:Category:Codecs|The Technical/Codecs Category]]'''


= Container Formats =
''What is a [[container format]]?''

* [[Matroska]]
* [[MP4]]
* [[Ogg]]


= Audio Hardware & CD Ripping =
*''CD Tools, Secure Ripping, Soundcard Quality''
** [[Secure ripping]]
** Ripping Guide
*** [[EAC]]
*** [[CDex]]
*** [[DBpowerAMP with AccurateRip]]
*** [[Plextools]]
** [[CD copy protection]]
** [[CD Hardware]]
* Vinyl records and turntables
** [[Introduction to Vinyl|Introduction]]
** [[Advantages of Vinyl]]
** [[Disadvantages of Vinyl]]
** [[Vinyl Myths]]
** [[Purchasing Vinyl LPs and Components|Purchasing]]
** Record Player Components
*** [[Turntable]]
*** [[Cartridge]]
*** [[Phono preamplifier]]
** [[Evaluating Vinyl Sound Quality]]
** [[Vinyl Playback and Recording|Playback and Recording]]
** [[Vinyl Maintenance|Maintenance]]
** [[Vinyl Forum Posts and FAQs|FAQs]]
** [[Vinyl Glossary|Glossary]]
** [[Vinyl Links|Links]]
** [[Vinyl Mastering|Mastering]]
* [[Soundcard|Soundcards]]
* [[Other hardware]]



= Tests =
* [[EAC Vs CDex SecureMode]] (by Pio2001)
* [[EAC Vs CDex SecureMode II]] (by westgroveg)
* [[Listening Tests]]


=Downloads=
''Where to obtain the software discussed in HAK.''

* [[Download page]]


= Using HAK =
* [[Help:Contents|Wiki User Guide]]
* Play around at the [[Hydrogenaudio Knowledgebase:Sandbox|Sandbox]] to try your formatting skills. Everything goes here and everything can/may be deleted.
* Contributors should read [[Help:Editing|editing help]].

TAK

2011-07-27T17:29:03Z

Notat: Replay Gain -> ReplayGain

{{Codec Infobox
| name = Tom's lossless Audio Kompressor
| logo =
| type = lossless
| purpose = lossless audio compression.
| maintainer = Thomas Becker
| recommended_encoder = TAK encoder
| recommended_text = TAK v1.1.0
| website = [http://thbeck.de/Tak/Tak.html ThBeck.de/Tak/Tak.html] ''(German)''
}}

== Description ==
'''Tom's lossless Audio Kompressor''' ('''TAK''') is a lossless audio compressor which promises compression performance similar to [[Monkey's Audio]] “High” and decompression speed similar to [[Free Lossless Audio Codec|FLAC]].

=== Features ===
* High compression
* Fast compression and decompression speed
* Streaming support (necessary headers for decompressing the audio are written to the stream every 2 seconds)
* Piping support for encoding
* Error tolerance (single bit error will never affect more than 250 ms)
* Error detection (each frame protected by a 24-bit checksum (CRC))
* High-resolution (up to 24-bit/channel) audio support
* Support for up to 192 kHz audio
* Seeking without seek table
* APEv2 tags supported at end of file

=== Pros ===
* Fast encoding speed (in “Insane” mode TAK encodes as fast as [[Free Lossless Audio Codec|FLAC]] -8 while providing better compression, and in “Turbo” mode it is several times faster)
* Fast decompression speed (on par with FLAC / [[WavPack]])
* Good compression levels (on par with [[Monkey's Audio]] High)
* Error Robustness
* Fast Seeking

=== Cons ===
* Closed Source (at the moment)
* No hardware support
* Very limited software support (playback: [[Winamp]] & [[foobar2000]] plugins, tagging: Mp3Tag)

== Hardware and Software That Support TAK ==
=== Hardware ===
* None

=== Software ===
==== Windows ====
* Official TAK Applications v1.1.0 (Applications, Winamp plugin, SDK, Decoding library) [http://www.hydrogenaudio.org/forums/index.php?showtopic=68456&st=0 here]
* foo_input_tak, TAK decoder for [[foobar2000]] [http://foosion.foobar2000.org/components/ here] (supports tagging and [[ReplayGain]])
* [[Mp3tag]] – universal tag editor with support for TAK
* [http://etree.org/shnutils/shntool/ shntool] (since version 3.0.6)

==== Linux ====
* The TAK reference applications (GUI as well as commandline) are known to run on Linux via Wine.

== Recommended Settings ==
* Default compression: “-p2” (formerly ''Normal'') is the most attractive setting, providing an excellent compromise between compression and encoding speed. (At compression levels close to [[Monkey's Audio]] High (<0.4% difference), it is able to encode more quickly.)
 takc -e [input file]
* Highest compression: “-pMax” (same as -p5m) (This will create files which are comparable in size to files created using [[Monkey's Audio]] High. Decompression speed is comparable to [[WavPack]] Normal.)
 takc -e -pMax [input file]
* Fastest compression: “-p0” (This will create files which are comparable in size to [[Monkey's Audio]] Fast or [[WavPack]] High. Decompression speed is comparable to [[Free Lossless Audio Codec|FLAC]] 0.)
 takc -e -p0 [input file]
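For batch encoding, the command lines above can be assembled programmatically. The following Python is a sketch only: the helper name <code>build_takc_command</code> is hypothetical, and it uses nothing beyond the <code>-e</code> switch and the preset switches listed above.

```python
from typing import List, Optional


def build_takc_command(input_file: str,
                       preset: Optional[str] = None) -> List[str]:
    """Build a takc encoding command as an argument list.

    `preset` is a preset name without the leading dash, for example
    "p0", "p2" or "pMax". With no preset, takc's default is used.
    """
    command = ["takc", "-e"]
    if preset is not None:
        command.append("-" + preset)
    command.append(input_file)
    return command
```

On a system where <code>takc</code> is on the search path, the resulting list can be passed to <code>subprocess.run(command, check=True)</code>.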

=== TAK Performance Graph ===
[[Image:TAK_performance_graph_1-0-4.png|frame|center|Graph showing encoding and decoding rate against compression, using data from Synthetic Soul's test on TAK 1.0.4 (see [[TAK#External Links|External Links]])]]

== Using TAK ==
=== TAK with [[foobar2000]] ===
* Copy the takc.exe to your [[foobar2000]] directory
* Go to File → Preferences → Tools → Converter
* Set it up as shown:
[[Image:Tak_foobar_converter.png|frame|center|Screenshot of foobar 0.9.5 Converter settings for TAK 1.0.3]]
'''Note:''' replace -p2 with the desired compression level.

* TAK introduced encoding from STDIN in version 1.0.3, eliminating the need for a temporary file and greatly improving overall compression time. If you are using an earlier version of TAK use the following command line instead:
 -e -p2 %s %d
* Use [[APEv2 specification|APEv2]] tagging (will be used as internal tagging)

=== TAK with EAC ===
Please read the [[EAC and TAK|wiki guide]], which details how to create TAK files with [[Exact Audio Copy|EAC]].

== Future Features ==
* Unicode support
* MD5 audio checksums for verification and identification
* A German version
* Embedded cue sheets
* Embedded cover art
* Multichannel audio

== Frequently Asked Questions ==
; Is the codec safe for use?
: Yes. To check, convert a WAVE to TAK and back and compare the two (or use foobar's bitcompare tool).
; Why should I use TAK?
: TAK offers high compression ratios with great decoding rates.
; What can I compress with TAK?
: TAK 1.0 can compress any integer-format (up to 24 bits per channel) PCM RIFF WAVE file (.wav). Piping support as of v1.0.3 is implemented, so converting lossless files to WAV first is not necessary.
; What about hardware support?
: None at the moment, although ''-p0'', ''-p1'' and ''-p2'' are the candidates for hardware playback.
; When will the source be opened?
: TAK is intended to become open source once the code has been ported to C or C++ and documented. However, Thomas has mentioned that he would like to improve the codec before opening the source.
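The round-trip check suggested in the first answer above (convert a WAVE to TAK and back, then compare the two) can be sketched in Python using the standard <code>wave</code> module. The function name <code>same_audio</code> is hypothetical. The sketch compares only the audio format and sample data of two WAV files and assumes the TAK file has already been decoded back to WAV with the TAK tools.

```python
import wave


def same_audio(path_a: str, path_b: str) -> bool:
    """Return True if two WAV files hold identical PCM audio.

    Compares channel count, sample width, sample rate, frame count
    and the raw sample data. Other RIFF chunks are ignored.
    """
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        format_a = (a.getnchannels(), a.getsampwidth(),
                    a.getframerate(), a.getnframes())
        format_b = (b.getnchannels(), b.getsampwidth(),
                    b.getframerate(), b.getnframes())
        if format_a != format_b:
            return False
        return a.readframes(a.getnframes()) == b.readframes(b.getnframes())
```

This reads each file fully into memory, which is acceptable for a quick check but may be worth replacing with chunked reads for very long recordings.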

== External Links ==
* [http://thbeck.de/Tak/Tak.html thbeck.de/Tak/Tak.html] – Official Website ''(German)''
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=68454 TAK 1.1.0 Release Announcement / Discussion Thread on HA] ''(English)''
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=68456&st=0 TAK 1.1.0 Downloads]
* [http://synthetic-soul.co.uk/comparison/lossless/ synthetic-soul.co.uk/comparison/lossless] – Comparison with Other Codecs (by Synthetic Soul)
* [http://flac.sourceforge.net/comparison.html flac.sourceforge.net/comparison.html] – An Updated Comparison (from FLAC Homepage)

[[Category:Lossless]]
[[Category:Encoder/Decoder]]

Foobar2000:Components/Winamp DSP Bridge (foo dsp winamp)

2011-07-27T17:28:44Z

Notat: Replay Gain -> ReplayGain

{{fb2k}}
[[Category:Foobar2000 3rd-Party Components|Winamp DSP Bridge (foo dsp winamp)]]
= Description =

Allows the use of Winamp DSP plugins.

== Usage ==

# Choose "Winamp plugins path" with "Browse" button;
# Click "Rescan" button;
# Set bit-depth in "Fixed point conversion parameters"([[Foobar2000:Components_0.9/Winamp_DSP_Bridge_%28foo_dsp_winamp%29#Sound_quality_issues|details]]);
# Choose one of Winamp plugins from "Plugin list";
# Click "Show interface window" to show plugin settings window (if available);
The plugin settings window is also available from the Foobar2000 main menu: choose "Show Winamp DSP window" from the "View" menu.

==Known bugs and limitations==
* Version 1.4.1 - 1.4.4 startup crashes([[Foobar2000:Components_0.9/Winamp_DSP_Bridge_%28foo_dsp_winamp%29#How_to_avoid_startup_crash|how to avoid]]);
* Supports only Winamp 2.0 compatible plugins;
* Doesn't support plugins with ''Pitch control'' and ''Speed control'' functionality;

==Sound quality issues==
Due to differences between the Foobar and Winamp architectures (Foobar has floating point audio chunks, while Winamp has fixed point ones), floating point to fixed point conversion (and vice versa) is necessary.

Conversion Bit-depth parameter can be set to:
* 16-bit: low quality, failsafe. Choose this setting if you encounter a problem during playback;
* 24-bit: hi-quality;
* 32-bit: highest quality.
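To illustrate what the bit-depth setting controls, here is a minimal Python sketch of converting a floating point sample to a fixed point value and back at each of the three bit depths. It is not the component's actual code. The scaling convention (1.0 maps to full scale) and the plain saturation at the limits are assumptions made for this example.

```python
def float_to_fixed(sample: float, bits: int) -> int:
    """Convert a float sample (1.0 = full scale) to a signed integer."""
    if bits not in (16, 24, 32):
        raise ValueError("bits must be 16, 24 or 32")
    full_scale = 2 ** (bits - 1)
    value = round(sample * full_scale)
    # Saturate to the representable range of a signed integer of this width.
    return max(-full_scale, min(full_scale - 1, value))


def fixed_to_float(value: int, bits: int) -> float:
    """Convert a signed integer sample back to float (1.0 = full scale)."""
    return value / 2 ** (bits - 1)
```

The rounding error of a round trip shrinks as the bit depth grows, which is why the higher settings are described as higher quality.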

The foo_dsp_winamp converter has a built-in limiter to avoid audio signal [[clipping]]. The limiter cannot be bypassed (i.e. it is always on), so if you want to keep the signal spectrum close to the original, you should use the [[ReplayGain]] subsystem.

==How to avoid startup crash==
* For Foobar v0.9.6 and later: download the latest version of foo_dsp_winamp (1.4.5);
* For Foobar versions prior to 0.9.6: you can avoid the startup crash by removing dsp_sps.dll from the ''C:\Program Files\Winamp\Plugins'' folder (this is the default path; check your Winamp installation to find the correct path). Then you can change "Winamp plugins path" in the foo_dsp_winamp settings and put dsp_sps.dll back in the Plugins folder.

=Link=

* [http://pelit.koillismaa.fi/plugins/dsp.php#149 Official Website]
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=49356 Discussion thread]
* [http://www.fb2k.org/info.php?user=476 Feedback/Bugreports]

Slim Devices Squeezebox

2011-07-27T17:28:26Z

Notat: Replay Gain -> ReplayGain

= Overview =
'''Squeezebox v3''' is a high-end, audiophile-grade audio player designed by Slim Devices (''now Logitech Inc''). It is made for a vast repertoire of listening environments. Additionally, it can interface with the music services [http://wwww.pandora.com Pandora], [http://www.rhapsody.com/ Real Rhapsody], and many other streaming internet radio stations. The latest version was first released and shipped in October 2005.

== Technical Specifications ==
=== Audio outputs (general) ===
==== Digital and analog outputs ====
* All RCA connectors are gold-plated
* Volume control is provided for all outputs
* Multiple outputs may be used at the same time

==== Analog RCA outputs ====
* High fidelity [http://focus.ti.com/docs/prod/folders/print/pcm1748.html Burr-Brown™ 24-bit DAC]
* Two dedicated linear power regulators for DAC and line-out stages
* Full 6.0Vpp line-level signals
* Signal-to-noise ratio: over 100dB
* Total harmonic distortion: less than -93.5dB (0.002%)
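The distortion figure above is quoted both in decibels and as a percentage. As a rough illustration of how the two relate, the sketch below assumes the usual amplitude-ratio convention (20·log10 of the ratio); the function names are illustrative only and do not come from any Slim Devices documentation.

```python
import math

def db_to_percent(level_db):
    """Convert a level in dB (amplitude ratio, assumed 20*log10) to a percentage."""
    return 100 * 10 ** (level_db / 20)

def percent_to_db(percent):
    """Convert a percentage (amplitude ratio) back to dB."""
    return 20 * math.log10(percent / 100)

if __name__ == "__main__":
    # -93.5 dB corresponds to roughly 0.002 % under this convention.
    print(f"{db_to_percent(-93.5):.4f} %")
    print(f"{percent_to_db(0.002):.2f} dB")
```

Under this assumption, -93.5 dB and 0.002 % agree to within rounding.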

==== Digital S/PDIF outputs ====
* Optical and coax digital connections
* Dedicated high-precision crystal oscillators (no PLL, no resampling)
* Standard IEC-958 (S/PDIF) encoding
* Optical connector: TOSLINK 660nm
* Coax connector: RCA, 500mVpp into 75 ohms
* Sample rates: 44.1 kHz, 48 kHz, 96 kHz
* Audio format: linear PCM, 16 or 24 bits per sample
* Intrinsic jitter: less than 50ps (standard deviation)

==== Headphone output ====
* Standard 1/8" jack also functions as an IR blaster
* Minimum headphone impedance: 16 ohms
* Total harmonic distortion: less than 0.03%
* Left/right crosstalk attenuation: 92dB

=== Audio Codecs ===
==== Lossless ====
* [[ALAC]]
* [[FLAC]]
* [[WMA Lossless]]

==== Lossy ====
* [[MP3]]
* [[AAC]]
* [[Ogg Vorbis]]
* [[Musepack]]

==== Misc ====
* [[MAD]] decoding
* High accuracy 24-bit synthesis
* [[Sound check]] and [[ReplayGain]] support for adjusting the gain on your audio files.

=== Tech Information ===
==== Firmware ====
* Flash upgradeable firmware
* Network settings are stored in non-volatile memory
* Auto-configuration for most networks
* Easy setup for wireless networks

==== Architecture ====
* 250 MHz 8-way multithreaded RISC processor
* "Pure software" SlimDSP™ architecture
* Field-upgradeable Xilinx CPLD
* 64 Mb high-speed RAM
* 16 Mb program flash
* Low-power design, all solid-state, fanless

=== Network ===
====Wireless Interface====
* True 802.11g wireless networking
* Bridging capability allows Ethernet devices to connect to the network through Squeezebox Wireless
* Throughput up to 54 Mbps, high speed PCI interface to radio module
* Dual antennas for improved range and throughput
* Supports all 802.11b and 802.11g access points
* Internal antennas: planar inverted-F antenna
* Automatically detects available networks for quick setup
* Supports WPA Personal, WPA2-AES, and 64/128-bit WEP encryption

==== Ethernet Interface ====
* Available on both Wireless and Wired models
* True 100Mbps throughput
* Shielded CAT5 RJ-45 connector
* Connects to any 100 Mbps or 10 Mbps network
* Auto-detects full duplex and half duplex modes
* Automatic receive polarity correction
* Maximum cable length: 100 meters (328 feet)

=== Power ===
==== Power input ====
* 5 V DC, regulated
* Center positive, sleeve ground
* Connector: 2.5 mm ID, 5.5 mm OD, 11 mm long
* Min supply rating: 1000 mA

==== Power supply ====
* Switching power supply included
* Input voltage range and plug style specific to shipping destination
* Power supplies are small, efficient, and do not get hot
* One of four styles is included depending on country

=== Software installation ===
* All systems: 256 MB RAM, ethernet or wireless network, and 20 MB hard disk space
* Macintosh: Mac OS X 10.3 or later
* Windows: 733 MHz Pentium running Windows NT/2000/XP
* Linux/BSD/Solaris/Other: Perl 5.8.3 or later

== See also ==
* [[Slim Devices Transporter|Transporter]]

== External links ==
* [http://wiki.slimdevices.com/index.cgi?HardwareComparison Slim Devices comparison] a wiki article comparing the hardware of the various Slim Devices players
* [http://forums.slimdevices.com/ Slim Devices forum] third-party user forum covering topics surrounding the Squeezebox

[[Category:Hi-Fi]]

Slim Devices Transporter

2011-07-27T17:28:11Z

Notat: Replay Gain -> ReplayGain

= Overview =
'''Transporter''' is an expensive, high-end, audiophile-grade network audio player designed by Slim Devices (''now Logitech Inc''). Audiophiles prefer it for its reduced clock noise and intrinsic jitter, which is the only distinct difference between it and the [[Slim Devices Squeezebox|Squeezebox]]. Additionally, it can play many streaming internet radio stations and works with the open-source Slim Server software. Transporter was first released and shipped in September 2006.

== Technical Specifications ==
=== Audio Input and Outputs ===
==== Audio Outputs ====
* Digital and analog outputs
* Gold-plated RCA, XLR, and BNC connectors
* Volume control is provided for all outputs
* Multiple outputs may be used at the same time

==== Analog Outputs ====
* [http://www.asahi-kasei.co.jp/akm/en/product/ak4396/ak4396.html AKM AK4396] Multi-bit delta-sigma digital to analog converter
* Signal-to-noise, Dynamic Range: 120 dB
* THD+Noise: -106 dB (0.0005 %)
* Linear Super-Regulated supplies for DAC and line-out stages

==== Digital Outputs and Inputs ====
* Optical, coax, BNC, and XLR digital connectors
* Word clock input for synchronization with an external clock
* Linear-regulated power for all clock paths
* Dedicated high-precision crystal oscillators (no PLL, no resampling)
* Standard IEC-958 (S/PDIF) or AES/EBU encoding
* Optical connector: TOSLINK 660nm
* RCA connector: capacitor-coupled 500mVpp into 75 ohms
* BNC connector: transformer-coupled, 500mVpp into 75 ohms
* XLR connector: 4.7Vpp into 110 ohms
* Sample rates: 44.1kHz, 48kHz, 96kHz
* Audio format: linear PCM, 16 or 24 bits per sample
* Jitter (standard deviation):
: 11ps at oscillator (intrinsic jitter)
: 17ps at DAC
: 35ps at S/PDIF receiver

=== Audio Codecs ===
==== Lossless ====
* [[ALAC]]
* [[FLAC]]
* [[WMA Lossless]]

==== Lossy ====
* [[MP3]]
* [[AAC]]
* [[Ogg Vorbis]]
* [[Musepack]]

==== Misc ====
* [[MAD]] decoding
* High accuracy 24-bit synthesis
* [[Sound check]] and [[ReplayGain]] support for adjusting the gain on your audio files.

=== Technical information ===
==== Firmware ====
* Flash upgradeable firmware
* Network settings are stored in non-volatile memory
* Auto-configuration for most networks
* Easy setup for wireless networks

==== Architecture ====
* 325 MHz 8-way multi-threaded RISC processor
* Field-upgradeable Xilinx CPLD
* 64 Mb high-speed RAM
* 16 Mb program flash
* Low-power, fanless design

=== Network ===
==== Wireless Interface ====
* True 802.11g wireless networking (can be disabled)
* Bridging capability allows Ethernet devices to connect to the network through the wireless interface
* Throughput up to 54Mbps, high speed PCI interface to radio module
* Dual external antennas for improved range and throughput
* Supports all 802.11b and 802.11g access points
* Automatically detects available wireless networks for quick setup
* Supports WPA Personal, WPA2-AES, and 64/128-bit WEP encryption

==== Ethernet Interface ====
* True 100 Mbps throughput
* Shielded CAT5 RJ-45 connector
* Connects to any 100Mbps or 10Mbps network
* Auto-detects full duplex and half duplex modes
* Automatic receive polarity correction
* Maximum cable length: 100 meters (328 feet)

=== Power ===
==== Power Input ====
* 100–240 V, 50-60 Hz AC
* Internal Fuse: 500 mA
* Standard IEC power connector
* Included IEC power cable specific to shipping destination

==== Power Supply ====
* Separate linear supplies for Analog, DAC, and clocks
* Auto-ranging, relay-controlled AC input
* Three Super Regulators for analog stages (+15, -15, +5)
* High-efficiency, low noise SMPS for CPU, Display
* Continuous AC voltage monitoring
* Automatic over-voltage protection
* Low-power "deep sleep" mode

=== Software installation ===
==== System Requirements ====
* All systems: 256MB RAM, ethernet or wireless network, and 20MB hard disk space
* Macintosh: Mac OS X 10.3 or later
* Windows: 733 MHz Pentium running Windows NT/2000/XP
* Linux/BSD/Solaris/Other: Perl 5.8.3 or later

== External links ==
* [http://wiki.slimdevices.com/index.cgi?HardwareComparison Slim Devices comparison] a wiki article comparing the hardware of the various Slim Devices players
* [http://www.slimdevices.com/su_downloads.html Slim Server downloads] download the open-source Slim Server software for your machine
* [http://forums.slimdevices.com/ Slim Devices forum] third-party user forum covering topics surrounding the Transporter.
* [http://www.engadget.com/2006/07/24/slim-devices-transporter-unwires-high-end/ Engadget review] a review of the Transporter by a technology journalist
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=48760&st=0 HA Transporter thread] a few screenshots from an HA member who owns a Transporter.
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=11909&hl= Anti-Jitter RAM Buffers thread] discusses whether jitter in electronic devices can make an audible difference.

[[Category:Hi-Fi]]

MP3Gain

2011-07-27T17:27:53Z

Notat: Replay Gain -> ReplayGain

'''MP3Gain''' is a program that analyzes [[MP3]] files to determine how loud they sound to the human ear. It can then adjust the [[MP3]] files so that they all have the same loudness without any quality loss. This way, you don't have to keep reaching for the volume dial on your [[MP3]] player every time it switches to a new song.

MP3Gain is an implementation of [[ReplayGain]], supporting Track mode and Album mode. With most other formats, the loudness adjustment calculated by ReplayGain is stored as metadata, leaving the encoded audio data untouched. MP3Gain instead applies the loudness adjustment to the data itself, albeit in a lossless, reversible way. Another difference is that MP3Gain can only adjust the volume in 1.5 dB steps.

== Technical Explanation ==
Here is the technical reason why the adjustment is lossless (despite operating on the data itself), and why the smallest possible change is 1.5 dB:

The MP3 format stores the sound information in small chunks called "frames". Each frame represents a fraction of a second of sound. In each frame there is a "global gain" field. This field holds an 8-bit integer which can represent values from 0 to 255.

When an MP3 player decodes the sound in the frame, it uses the global gain field to multiply the decoded sound samples by 2^(gain/4).
* If you add 1 to this field in all the MP3 frames, you effectively multiply the amplitude of the whole file by 2^(1/4) ≈ 119 % ≈ +1.5 dB.
* Likewise, if you subtract 1 from this field, you multiply the amplitude by 2^(-1/4) ≈ 84 % ≈ -1.5 dB.
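The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the 2^(1/4) step size and of why the change is reversible, not MP3Gain's actual code: the function names are hypothetical, the frames are modelled as a plain list of integers, and rejecting values outside 0 to 255 is an assumption of this sketch.

```python
import math

# Amplitude ratio for a change of 1 in the global gain field: 2^(1/4).
STEP_RATIO = 2 ** (1 / 4)

def step_db():
    """Size of one global-gain step in decibels (about 1.505 dB)."""
    return 20 * math.log10(STEP_RATIO)

def quantize_gain(desired_db):
    """Round a desired gain change to a whole number of steps.

    Returns (steps, applied_db), where applied_db is the change actually made.
    """
    steps = round(desired_db / step_db())
    return steps, steps * step_db()

def apply_steps(global_gains, steps):
    """Add `steps` to every frame's global gain value (hypothetical helper).

    The field is 8 bits wide, so this sketch refuses any result outside 0..255.
    """
    shifted = [g + steps for g in global_gains]
    if any(not 0 <= g <= 255 for g in shifted):
        raise ValueError("global gain would leave the 0..255 range")
    return shifted

if __name__ == "__main__":
    steps, applied = quantize_gain(-4.0)   # e.g. a suggested change of -4 dB
    print(steps, round(applied, 2))
    frames = [100, 150, 210]
    changed = apply_steps(frames, steps)
    # Applying the opposite number of steps restores the original values.
    print(apply_steps(changed, -steps) == frames)
```

Because only an integer field is incremented or decremented, undoing the change restores every frame exactly, which is what makes the adjustment lossless.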

The way MP3Gain works actually has a very strong benefit: Since it is the data itself that is modified, MP3Gain does not require special support from players.

== Links ==
* [http://mp3gain.sourceforge.net/ MP3Gain's official website]
* [http://www.hydrogenaudio.org/forums/index.php?act=ST&f=15&t=3274 MP3Gain tutorial] on HA Forums

[[Category:Software]]

Rockbox

2011-07-27T17:27:32Z

Notat: Replay Gain -> ReplayGain

{{featured}}
[[Image:Rockboxlogo.png|right]]

'''Rockbox''' is a [[GPL]]-compliant [[open source]] operating system for portable digital audio players (DAPs). The Rockbox Project began in 2002 and was first implemented on the [[Archos]] Studio DAP because of owner frustration with severe limitations in the manufacturer-supplied user interface and device operations.

Rockbox can completely replace the host device's operating system firmware and has matured to become an extensible, flexible platform that provides a plug-in architecture for adding PDA functionality, applications, utilities, and games, and has also managed to retrofit video playback functionality onto DAPs first released in mid-2000. Rockbox also includes a voice-driven user interface suitable for operation by blind and visually impaired users.

Although Rockbox's official title is "Rockbox: Open Source Jukebox Firmware", in many instances it is not actually installed to (or run from) flash memory. Instead a minimal bootloader is installed in the supported device's flash which is capable of either loading Rockbox from the hard disk or, alternately, the original factory firmware.

== Codecs ==

Rockbox on software decoding platforms (non-Archos) supports playback of eleven [[lossy compression|lossy]] codecs (depending on how one counts), six [[lossless data compression|lossless]], two uncompressed and six miscellaneous formats.<ref>{{cite web|title=Rockbox Supported audio formats|url=http://download.rockbox.org/daily/manual/rockbox-sansaclipplus/rockbox-buildap2.html#x17-335000B.1|work=Rockbox Manual}}</ref> This makes a conservative total of 25 supported audio formats, although a few of them do not operate in realtime on all platforms. Extensive work has gone into optimizing each codec, with FLAC, Ogg, WMA, APE and WMA Pro among the fastest known implementations for those formats.<ref>{{cite web|url=http://www.hydrogenaudio.org/forums/index.php?showtopic=82125&view=findpost&p=716976 |title=Codec performance comparison – Hydrogenaudio Forums |publisher=Hydrogenaudio.org |date= |accessdate=2011-03-12}}</ref>

=== Lossy formats ===

* MPEG audio layers I-III ([[MP3]]/[[MPEG-1 Audio Layer II|MP2]]/[[MPEG-1 Audio Layer I|MP1]])
* [[Vorbis|Ogg Vorbis]]
* [[Advanced Audio Coding|MPEG-4 AAC]](-LC/HE/HEv2 profiles) (in [[MPEG-4 Part 14|MP4]] or [[RealMedia|RM]] containers)
* [[Musepack]]
* [[Dolby Digital|AC3]] (raw or [[RealMedia|RM]] container)
* [[Windows Media Audio|WMA Standard]]
* [[Windows Media Audio|WMA Professional]]
* [[Speex]]
* [[Cook Codec|Cook]]
* [[Adaptive Transform Acoustic Coding#ATRAC3 (LP2 and LP4 Modes)|ATRAC3]]
* The lossy portion of [[WavPack]] hybrid files

=== Lossless formats ===

* [[Free Lossless Audio Codec|FLAC]]
* [[WavPack]]
* [[Shorten]]
* [[Apple Lossless]]
* [[Monkey's Audio]]
* [[TTA (codec)|TTA]]

=== Uncompressed formats ===

* Intel-style [[WAV]]
* Apple [[Audio Interchange File Format|AIFF]]
Together they include over a dozen different [[Pulse-code modulation|PCM]] and [[Adaptive DPCM|ADPCM]] formats.

== Rockbox features ==

Besides playing and recording audio files, Rockbox offers many playback enhancements that other firmware packages may not have implemented yet. Listed below are a handful of these features.

* [[Gapless playback]]<ref>{{cite web|title=Codec Featureset|url=http://download.rockbox.org/daily/manual/rockbox-sansaclipplus/rockbox-buildap2.html#x17-339000B.1.4|work=Rockbox Manual|accessdate=22 May 2011}}</ref>
* [[crossfader|Crossfading]]<ref>{{cite web|title=Crossfade|url=http://download.rockbox.org/daily/manual/rockbox-sansaclipplus/rockbox-buildch7.html#x10-1220007.7|work=Rockbox Manual|accessdate=22 May 2011}}</ref>
* [[ReplayGain]]<ref name="soft_decode">Software decoding targets only</ref>
* 5 band fully parametric [[equalization (audio)|equalizer]]<ref name="soft_decode" />
* Variable speed decoding with pitch correction<ref>{{cite web|title=Pitch|url=http://download.rockbox.org/daily/manual/rockbox-sansaclipplus/rockbox-buildch4.html#x7-630004.3.3|work=Rockbox Manual|accessdate=22 May 2011}}</ref>
* [[Crossfeed]]<ref name="soft_decode" />
* OTF ("on the fly") playlists
* True random shuffle (fresh randomly shuffled list every time)
* Custom [[Theme (computing)|UI themes]]
* Dynamic Playlists (queue files to play next, or in other parts of a dynamic playlist)
* Stereo recording to WAV/AIFF/WavPack (lossless) and MP3<ref>MP3, WavPack and AIFF are available on non-Archos devices. Multiple sample rates and bitrates available (hardware-dependent).</ref><ref>{{cite web|title=Recording|url=http://download.rockbox.org/daily/manual/rockbox-sansaclipplus/rockbox-buildch10.html#x13-14900010|work=Rockbox Manual|accessdate=22 May 2011}}</ref> (supporting devices)
* [[FM broadcasting|FM radio]], including FM recording (supporting devices)
* Remote control (supporting devices)
* Digital [[S/PDIF]] input/output (supporting devices)
* [[Last.fm]] support (even on players lacking [[Real-time clock|RTC]])
* [[cue sheet (computing)|Cue sheet]] support
* Changeable selector bar
* Album art<ref>{{cite web|url=http://www.rockbox.org/twiki/bin/view/Main/AlbumArt |title=Some limitations. Details at Rockbox Wiki |publisher=Rockbox.org |date= |accessdate=2011-03-12}}</ref>
* Sleep timer

== External links ==
* [http://www.rockbox.org/ The Rockbox Project]

''~ Text taken from [http://en.wikipedia.org/wiki/Rockbox Wikipedia entry for Rockbox]''

[[Category:Firmware]]

OggdropXPd

2011-07-27T17:27:12Z

Notat: Replay Gain -> ReplayGain

{{Infobox Software
| name = OggDropXPd
| logo =
| screenshot = [[image:Oggdropxpd-idle.PNG|130px]]
| caption = Graphical drag-n-drop frontend
| maintainer = John Edwards
| stable_release = 1.9.0
| preview_release =
| operating_system = Windows
| use = Encoder/Decoder
| license = GPL
| website = [http://www.rarewares.org/ogg-oggdropxpd.php RareWares]
}}
= Introduction =
OggDropXPd is John33's (Ogg) [[Vorbis]] encoder with a nice drag-and-drop interface.

== Features ==
* Compression from [[lossless]] files ([[FLAC]], [[LPAC]], [[Monkey's Audio]], [[OptimFROG]], and [[WavPack]])
:'''Note:''' if you want to use Monkey's Audio, LPAC, WavPack, or OptimFrog as source files, you must provide the decoder yourself.
* Auto-tagging
* Renaming encoded files (using FLAC Tags)
* Setting of advanced encoder parameters
* Use of VorbisGain tags ([[ReplayGain]] for Vorbis) on decode
* Playlist (.pls) creation
* And more...

= Quick start manual =
Here is a short user manual on how to quickly employ OggDropXPd. This is not exhaustive; a far more exhaustive one is hosted at [http://www.rarewares.org/ogg-oggdropxpd.php this site].

== Installing ==
* Extract the ZIP file you download (see the [[OggDropXPd#Download|Download section]]) to any folder.
* If you use the processor-optimized version (i.e. for P3/AMD or for P4), also copy '''libmmd81.dll''' into the same folder. You can get it at [http://www.rarewares.org/files/libmmd8.1.zip here].
* (Optional) Create a shortcut and copy it into your Start Menu, Desktop, QuickLaunch bar, etc.
* If you want support for encoding from lossless files (other than [[FLAC]], which is built in), extract the proper decoders into the same folder. You can get them at RareWares.

== Configuring for Encoding ==
''Note: All ScreenCaps taken from OggDropXPd v1.8.7''
* Start OggDropXPd. The small Drop Target window will open (the right one is for [[Lancer]] OggDropXPd):
<div style="margin-left:60px;">[[Image:Oggdropxpd-idle.PNG]] [[Image:Oggdropxpd-idle_lancer.PNG]]</div>
* Right-click on the small window. The following menu will be displayed:
<div style="margin-left:60px;">[[Image:Oggdropxpd-menu.png]]</div>
* Click on a menu item to change the options. Menu items relevant for Encoding are described below.

=== Encoding Options ===
This is used to configure the Vorbis encoder.

<div style="margin-left:60px;">[[Image:Oggdropxpd-encodingoptions.png]]</div>
* General Encoder Options
:This is where you choose the quality of the encoded file.

:* '''Use Standard Quality Mode''' ''-- Recommended''
::This ensures the highest quality, although you cannot exactly determine the bitrate.
::You can either type the exact -q value in the textbox, or drag the slider. Higher -q value gives better quality at the expense of larger file size.
::For some guidelines on what -q value to use, check out the [[Recommended Ogg Vorbis]] page.
:* '''Use Quality Mode by selection of Approximate Bitrate'''
::Sometimes you need to encode files at a certain bitrate, e.g. for streaming. Choose this and specify the approximate bitrate you're trying to get.
:* '''Bitrate management'''
::Although these two radio buttons seem to be subsets of the previous option, they actually stand on their own, i.e. choosing either one of them deselects the top two options.
::The following two options are not recommended, as they tend to sacrifice quality.
:::* '''Use ABR mode'''. Here you can limit the minimum bitrate (may cause size bloat), the maximum bitrate (may reduce quality on 'difficult' songs), and the nominal bitrate (affects quality if too low).
:::* '''Use CBR mode'''. Here you just specify the bitrate you want.

* '''Delete input files after encoding'''
:Self-explanatory. However, if you do not have the original source (e.g. CD), then '''it is recommended to uncheck this option'''. Just in case you need to re-encode, you will still have the source.

* '''Advanced Encoder Options'''
:For QuickStarting, you can leave these options '''unchecked'''.

* '''Other Advanced Encoder Options'''
:For QuickStarting, leave this button alone.

=== Select Output Directory ===
This is used to configure where the encoded file will be placed.

<div style="margin-left:60px;">[[Image:Oggdropxpd-output_directory_options.png]]</div>

* '''Same as Input Directory'''
:The resulting .ogg file will be placed in the same directory as the source file. This is the default after installation.
:<div style="color:blue;">For the purposes of this QuickStart, choose this option.</div>
* '''Set Other Output Directory - THIS SESSION ONLY'''
:You can specify where to put the resulting .ogg file by clicking on the "..." button. However, if you close OggDropXPd and restart it later, the setting will revert to "Same as Input Directory."
* '''Set Other Output Directory - AS DEFAULT'''
:Same as above, but your specified directory will be used for later sessions also.

=== Select Temporary Directory ===
If you use the [[lossless]] source files of the formats .ape, .pac, .wv, .ofr, or .ofs, then this is where you store the temporary uncompressed .wav file.

<div style="margin-left:60px;">[[Image:Oggdropxpd-temporary_directory_options.png]]</div>

This dialog box should be self-explanatory.

=== Other Settings ===
* Uncheck AUTO Tagging
* Check: Write Log File, Show Bit Rate, Always On Top

== Encoding! ==
Drag and drop the file to be encoded from Windows Explorer onto the OggDropXPd droptarget window, and wait. The logo will spin while it is encoding: (Image capture of Lancer OggDropXPd)

<div style="margin-left:60px;">[[Image:Oggdropxpd-encoding_lancer.png]]</div>

Four pieces of information are shown under the spinning logo while encoding:
* Last granule bitrate
* Current setting | encoding speed (i.e. x of real time)
* The lighter bar indicates the progress for the currently encoded file
* The darker bar indicates the total progress (i.e. when you dropped more than one source file onto OggDropXPd)

When the logo stops spinning... you're done! The resulting .ogg file can be found in the same directory as the source file.

'''''Thus concludes the OggDropXPd QuickStart manual for encoding.'''''

= Download =
You can download the latest version from [http://www.rarewares.org/ogg.html Vorbis page at RareWares].

A highly-optimized version is also available at the [http://homepage3.nifty.com/blacksword/index_e.htm Ogg Vorbis Acceleration Project], with the codename of [[Lancer]].

[[Category:Software]]
[[Category:Encoder/Decoder]]

MediaMonkey

2011-07-27T17:26:55Z

Notat: Replay Gain -> ReplayGain

{{Software Infobox|
|name = MediaMonkey
|logo = [[image:Monkey_head_wiki.png|noframe|MediaMonkey Logo]]
|screenshot =
|caption =
|maintainer = [http://www.ventismedia.com/ Ventis Media, Inc.]
|stable_release = [http://www.mediamonkey.com/MediaMonkey_Setup.exe 3.2.6.1307]
|preview_release = [http://www.mediamonkey.com/forum/viewtopic.php?f=6&t=54426&sd=d 4.0.0.1411]
|operating_system = Windows
|use = Music Organizer, Mass Tagger, [http://wiki.hydrogenaudio.org/index.php?title=Category:Media_Players Media Player], Burning Media, Portable Player Synchronizing
|license = Freeware, Proprietary
|website = [http://www.mediamonkey.com/ www.mediamonkey.com]
}}

'''MediaMonkey''' is a Windows-based media player/library with a built-in file tagger and organizer, [[Compact Disc Digital Audio|Audio CD]] ripper/burner (limited speed in the free version), Data CD burner (also limited speed in the free version), [[transcoding]] tool, [[ReplayGain]] tool, and many other features, all in an integrated user interface natively themed as a (skinnable) media library.

It maintains its library in a database that is compatible with '''Microsoft Access''' database format, and as such, the database entries can be changed by using Access or other compatible programs. However, it also stores most of its information into the media files' [[tags]] ([[ID3v1]], [[ID3v2]], [[Vorbis comment]], [[WMA]], [[APEv2]], [[WAV]] and [[AAC]] tags are supported), thus ensuring usability with other media players.

MediaMonkey also provides auto-tagging and auto-renaming support: tag information can be deduced from a file's path and name (or alternatively looked up through Amazon, which also supports adding [[Album Art]]), or, conversely, files can be renamed (and relocated) based on their tag or path information.

Finally, MediaMonkey also supports integration with various portable DAPs, including [[Apple iPod]], [[iRiver]], and [[Creative Labs]] devices. MediaMonkey 3.2 added support for the [[Palm Pre]].

== Supported Formats ==
* [[Free Lossless Audio Codec]] (FLAC)
* [[Monkey's Audio]] (APE)
* [[MP3]]
* [[Musepack]] (MPC)
* (Ogg) [[Vorbis]]
* [[WAV]]
* [[Windows Media Audio]] (WMA)
* [[AAC]] (version 3.x and above)

Additional formats may be supported through Winamp input plugins (see below)

== Extensibility ==
MediaMonkey supports most of Winamp's input, output, DSP, general, and visualization plugins.

In addition, MediaMonkey provides [http://www.mediamonkey.com/wiki/index.php/Scripting scripting interfaces], so it is scriptable (built-in support for JavaScript/VB Script) and also controllable by external applications (through Winamp Compatible Messages or OLE Automatization Server/COM+).

Finally, MediaMonkey is skinnable. Many MediaMonkey users have developed various skins for it, which you can find in the [http://www.mediamonkey.com/wiki/index.php/Skinning MediaMonkey Wiki].

== Price ==
The [http://www.mediamonkey.com/product.htm free (Standard) version], is, well, free.

[http://www.mediamonkey.com/product_gold.htm The Gold version] – which adds, among other features, automatic/periodic scanning of "watched folders" – costs 29.95 USD for a lifetime license.

== External links ==
* [http://www.mediamonkey.com MediaMonkey homepage]
* [http://www.mediamonkey.com/forum/ MediaMonkey forums]
* [http://www.mediamonkey.com/wiki/ MediaMonkey Wiki]
* [http://mediamonkey.com/faq/ MediaMonkey FAQ]
* [http://mediamonkey.com/support/ MediaMonkey Support Knowledgebase]
* [http://home.scarlet.be/ruben.castelein/MediaMonkey%20Scripts.htm Ruben Castelein (Steegy) Script Collection]
* [http://trixmoto.net/mm/ Richard Lewis (t-rix-mo-to) MediaMonkey page]
* [http://webmonkey.flyinglowlander.com/ Martin Warning (FlyingLowlander) WebMonkey Page]

[[Category:Software]]
[[Category:Media Players]]

Music Player Daemon

2011-07-27T17:26:40Z

Notat: Replay Gain -> ReplayGain

'''Music Player Daemon''' ('''MPD''') is an open-source music playback and playlist handling daemon. It can be controlled via local clients or remotely.

== Supported formats ==
* [[MP3]], (Ogg) [[Vorbis]], [[Free Lossless Audio Codec|FLAC]], [[Advanced Audio Coding|AAC]], [[MOD]], [[RIFF WAVE|WAV]] and [[Musepack]]

== Features ==
* [[Tags]]
* [[ReplayGain]]
* Gapless playback

== Supported languages ==
* English.

== Supported platforms ==
* Linux/BSD

== External links ==
* [http://musicpd.org/ Homepage]
* [http://musicpd.org/download.shtml Download]

[[Category:Media Players]]

Lossless comparison

2011-07-27T17:26:21Z

Notat: Replay Gain -> ReplayGain

The '''lossless comparison page''' aims to gather information about lossless codecs available so users can make an informed decision as to what lossless codec to choose for their needs.

== Introduction ==
Given the enormous number of [[lossless]] audio compressors available, choosing the one best suited to each person's needs is a very difficult task.

Many people consider only compression performance when choosing a codec. But as the following table and article show, there are several other features worth taking into consideration when making that choice.

For example, users wanting good multiplatform compatibility and robustness (e.g., people sharing live recordings) would favour [[WavPack]] or [[FLAC]]. Another user, looking for the very highest compression available, would go with [[OptimFROG]]. Someone wanting portable support would use [[FLAC]] or [[ALAC]], and so on.

In the end, this is not a matter worth getting too worked up about. If you later find out the codec you chose isn't the best for your needs, you can just convert to another lossless format without risk of losing quality.

'''Note:''' for the latest comparisons of lossless compression, scroll down to the [[Lossless comparison#External links|Links section of this page]].

== Comparison Table ==


{| cellspacing="2" style="text-align:center; border:1px solid blue;"
|width="120px"|'''Features'''
| width="95px" style="background: #00FFFF" | FLAC
| width="95px" style="background: #00FFFF" | WavPack
| width="95px" style="background: #00FFFF" | TAK
| width="95px" style="background: #00FFFF" | Monkey's
| width="95px" style="background: #00FFFF" | OptimFROG
| width="95px" style="background: #00FFFF" | ALAC
| width="95px" style="background: #00FFFF" | WMA
|-
|align="left" style="background: #FFFF99" | Encoding speed
| style="background: #CCFFCC" | fast
| style="background: #00FF00" | very fast
| style="background: #00FF00" | very fast
| style="background: #CCFFCC" | fast
| style="background: #FF9900" | slow
| style="background: #FFFFFF" | average
| style="background: #FFFFFF" | average
|-
|align="left" style="background: #FFFF99" | Decoding speed
| style="background: #00FF00" | very fast
| style="background: #00FF00" | very fast
| style="background: #00FF00" | very fast
| style="background: #FFFFFF" | average
| style="background: #FFFFFF" | average
| style="background: #CCFFCC" | fast
| style="background: #FFFFFF" | average
|-
|align="left" style="background: #FFFF99" | Compression*
| style="background: #CCFFCC" | 58.70%
| style="background: #CCFFCC" | 58.0%
| style="background: #CCFFCC" | 57.0%
| style="background: #00FF00" | 55.50%
| style="background: #00FF00" | 54.70%
| style="background: #CCFFCC" | 58.50%
| style="background: #00FF00" | 56.30%
|-
|align="left" style="background: #FFFF99" | Flexibility**
| style="background: #00FF00" | very good
| style="background: #00FF00" | very good
| style="background: #00FF00" | very good
| style="background: #00FF00" | very good
| style="background: #00FF00" | very good
| style="background: #FF9900" | bad
| style="background: #FF9900" | bad
|-
|style="background: #FFFFFF" |  
|-
|align="left" style="background: #FFFF99" | Error handling
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FFFFFF" |  
| style="background: #00FF00" | yes
|-
|align="left" style="background: #FFFF99" | Seeking
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
|-
|align="left" style="background: #FFFF99" | Tagging
| style="background: #00FF00" | Vorbis tags
| style="background: #00FF00" | ID3/APE
| style="background: #CCFFCC" | APEv2 (exp.)
| style="background: #00FF00" | ID3/APE
| style="background: #00FF00" | ID3/APE
| style="background: #CCFFCC" | iTunes
| style="background: #CCFFCC" | ASF
|-
| align="left" style="background: #FFFF99" | Hardware support
| style="background: #00FF00" | very good
| style="background: #FF9900" | limited
| style="background: #FF9900" | no
| style="background: #FF9900" | limited
| style="background: #FF9900" | no
| style="background: #CCFFCC" | good
| style="background: #FF9900" | limited
|-
| align="left" style="background: #FFFF99" | Software support
| style="background: #00FF00" | very good
| style="background: #CCFFCC" | good
| style="background: #FFFFFF" | average
| style="background: #CCFFCC" | good
| style="background: #FFFFFF" | average
| style="background: #FFFFFF" | average
| style="background: #CCFFCC" | good
|-
| align="left" style="background: #FFFF99" | Hybrid/lossy
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | ReplayGain
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FFFFFF" | sort of
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | RIFF chunks
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FFFFFF" |  
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FFFFFF" |  
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | Streaming
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
|-
| align="left" style="background: #FFFF99" | Pipe support
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
|-
| align="left" style="background: #FFFF99" | Open source
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes (third-party)
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | Multichannel
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
|-
| align="left" style="background: #FFFF99" | High resolution
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
|-
| align="left" style="background: #FFFF99" | OS support
| style="background: #00FF00" | All
| style="background: #00FF00" | All
| style="background: #CCFFCC" | Win/Linux Wine
| style="background: #00FF00" | All
| style="background: #00FF00" | Win/Mac/Linux
| style="background: #00FF00" | All
| style="background: #CCFFCC" | Win/Mac
|}

''(table continued below)''

{| cellspacing="2" style="text-align:center; border:1px solid blue;"
|width="120px"|'''Features'''
| width="95px" style="background: #00FFFF" | Shorten
| width="95px" style="background: #00FFFF" | LA
| width="95px" style="background: #00FFFF" | TTA
| width="95px" style="background: #00FFFF" | LPAC
| width="95px" style="background: #00FFFF" | MPEG-4 ALS
| width="95px" style="background: #00FFFF" | MPEG-4 SLS
| width="95px" style="background: #00FFFF" | Real Lossless
|-
| align="left" style="background: #FFFF99" | Encoding speed
| style="background: #00FF00" | very fast
| style="background: #FF9900" | slow
| style="background: #00FF00" | very fast
| style="background: #FFFFFF" | average
| style="background: #FFFFFF" | average
| style="background: #FF9900" | slow
| style="background: #FF9900" | slow
|-
| align="left" style="background: #FFFF99" | Decoding speed
| style="background: #00FF00" | very fast
| style="background: #FF9900" | slow
| style="background: #CCFFCC" | fast
| style="background: #CCFFCC" | fast
| style="background: #CCFFCC" | fast
| style="background: #FF9900" | slow
| style="background: #CCFFCC" | fast
|-
| align="left" style="background: #FFFF99" | Compression*
| style="background: #FF9900" | 63.50%
| style="background: #00FF00" | 53.50%
| style="background: #CCFFCC" | 57.10%
| style="background: #CCFFCC" | 57.20%
| style="background: #CCFFCC" | 57.10%
| style="background: #CCFFCC" | ?
| style="background: #CCFFCC" | 57.0%
|-
| align="left" style="background: #FFFF99" | Flexibility**
| style="background: #FF9900" | bad
| style="background: #FFFFFF" | average
| style="background: #FF9900" | bad
| style="background: #FF9900" | bad
| style="background: #00FF00" | very good
| style="background: #FF9900" | bad
| style="background: #FF9900" | bad
|-
| style="background: #FFFFFF" | &nbsp;
|-
| align="left" style="background: #FFFF99" | Error handling
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FFFFFF" |  
|-
| align="left" style="background: #FFFF99" | Seeking
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | slow
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
|-
| align="left" style="background: #FFFF99" | Tagging
| style="background: #FF9900" | no
| style="background: #CCFFCC" | ID3v1
| style="background: #CCFFCC" | ID3
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #CCFFCC" | proprietary
|-
| align="left" style="background: #FFFF99" | Hardware support
| style="background: #FF9900" | limited
| style="background: #FF9900" | no
| style="background: #FF9900" | limited
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | Software support
| style="background: #00FF00" | very good
| style="background: #FF9900" | bad
| style="background: #FFFFFF" | average
| style="background: #FFFFFF" | average
| style="background: #FF9900" | bad
| style="background: #FF9900" | bad
| style="background: #FF9900" | bad
|-
| align="left" style="background: #FFFF99" | Hybrid/lossy
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | ReplayGain
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | RIFF chunks
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FFFFFF" |  
| style="background: #FFFFFF" |  
| style="background: #FFFFFF" |  
|-
| align="left" style="background: #FFFF99" | Streaming
| style="background: #FF9900" | no
| style="background: #FFFFFF" |  
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
|-
| align="left" style="background: #FFFF99" | Pipe support
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #FFFFFF" |  
| style="background: #FFFFFF" |  
| style="background: #FFFFFF" |  
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | Open source
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | Multichannel
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | High resolution
| style="background: #FF9900" | no
| style="background: #FF9900" | no
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #00FF00" | yes
| style="background: #FF9900" | no
|-
| align="left" style="background: #FFFF99" | OS support
| style="background: #00FF00" | All
| style="background: #CCFFCC" | Win/Linux
| style="background: #00FF00" | All
| style="background: #CCFFCC" | Win/Linux/Sol
| style="background: #00FF00" | All
| style="background: #00FF00" | All
| style="background: #00FF00" | Win/Mac/Linux
|}

<nowiki>*</nowiki> The compression ratio is the compressed size divided by the uncompressed size, multiplied by 100, so lower is better.
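
The ratio defined above can be sketched in a few lines of Python. The function name and the file sizes are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from the comparison.

```python
def compression_ratio(compressed_bytes: int, uncompressed_bytes: int) -> float:
    """Return the compressed size as a percentage of the uncompressed size.

    Lower values mean better compression.
    """
    if uncompressed_bytes <= 0:
        raise ValueError("uncompressed size must be positive")
    return compressed_bytes / uncompressed_bytes * 100


# Hypothetical sizes: a 1000-byte source that compresses to 571 bytes.
print(f"{compression_ratio(571, 1000):.2f}%")
```

A file that does not compress at all gives 100%, and a file that shrinks to half its size gives 50%.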

Encoding speed, Decoding speed and Compression ratio are based on each encoder's default settings.

<nowiki>**</nowiki> Flexibility refers to the number of encoding choices offered to the user (fast/low compression, slow/high compression and everything in between).

These are the most popular lossless codecs, in alphabetical order:

== Apple Lossless Audio Codec (ALAC) ==
http://www.apple.com/itunes/import.html

[[ALAC]] is a codec developed by Apple for use with the [[Apple iPod|iPod]] and AirPort Express.

=== ALAC pros ===
* Very fast decoding
* [[Open source]] (encoding and decoding via FFmpeg and [[CueTools|CUETools]], decoding only via [http://craz.net/programs/itunes/alac.html a standalone decoder])
* Hardware support ([[Apple iPod|iPod]], AirPort Express)
* Streaming support
* Tagging support (QT tags)
* Excellent hardware-software-lossy format integration with [[iTunes]]/iPod.
* Supports [[multichannel]] audio and [[high resolution]]s

=== ALAC cons ===
* Limited software support
* No hybrid/lossy mode

=== ALAC Other features ===
* Fits in the [[MP4]] container
* Can be used with the AirPort Express gadget

== Free Lossless Audio Codec (FLAC) ==
http://flac.sourceforge.net/

[[FLAC]] is a lossless codec developed by Josh Coalson. It's part of the Xiph multimedia portfolio, along with [[Ogg]], [[Vorbis]], [[Speex]] and [[Theora]].

=== FLAC pros ===
* [[Open source]]
* Very fast decoding
* Fast encoding
* Hardware support ([[Rio Karma|Karma]], Phatbox, etc.)
* Very good software support
* Error robustness
* Streaming support
* Supports [[multichannel]] audio and [[high resolution]]s
* Tagging support (FLAC tags)
* Supports [[RIFF]] chunks
* Pipe support
* [[ReplayGain]] compatible

=== FLAC cons ===
* No hybrid/lossy mode

=== FLAC Other features ===
* Supports embedded CUE sheets (with [http://flac.sourceforge.net/faq.html#general__no_cuesheet_tags limitations])
* Includes MD5 hashes for quick integrity checking
* Fits the [[Ogg]] and [[Matroska]] containers

== LosslessAudio (LA) ==
http://www.lossless-audio.com/

[[LA]] is a lossless codec developed by Michael Bevin.

=== LA pros ===
* Very high compression
* Tagging support ([[ID3v1]])
* Supports [[RIFF]] chunks
* Pipe support

=== LA cons ===
* Closed source
* Very slow encoding and decoding
* Doesn't support [[multichannel]] audio and [[high resolution]]s
* No hardware support
* No hybrid/lossy mode
* Bad software support
* Doesn't support [[ReplayGain]]

'''''It's important to mention that the LA foobar plugin is buggy and doesn't produce lossless streams!'''''

== Lossless Predictive Audio Coder (LPAC) ==
http://www.nue.tu-berlin.de/wer/liebchen/lpac.html

[[Lossless Predictive Audio Coder]] (LPAC) is a lossless codec developed by Tilman Liebchen. Development of it has been halted in favour of development of [[MPEG-4]] ALS.

=== LPAC pros ===
* Reasonable compression ratios
* [[High resolution]] audio support
* Supports [[RIFF]] chunks

=== LPAC cons ===
* Closed source
* No error robustness
* Slow seeking
* No tagging
* No [[multichannel]] support
* No hybrid/lossy mode
* No hardware support
* Doesn't support [[ReplayGain]]

== Monkey's Audio (APE) ==
http://www.monkeysaudio.com/

[[Monkey's Audio]] is a very efficient lossless compressor developed by Matt Ashland.

=== APE pros ===
* [[Open source]]
* High efficiency
* Good software support
* Simple and user friendly. Official GUI provided.
* Java version (multiplatform)
* Tagging support ([[ID3v1]], [[APE tags]])
* [[High resolution]] audio support
* Supports [[RIFF]] chunks (only in the GUI encoder)
* Pipe support (only in a [http://www.etree.org/shnutils/shntool/ special] version)

=== APE cons ===
* No [[multichannel]] support
* No error robustness
* No hybrid/lossy mode
* Limited hardware support (Rockbox, some Cowon players); poor battery life due to complicated decoding [http://www.rockbox.org/wiki/SoundCodecMonkeysAudio MP3 player benchmarks]
* Higher compression levels are extremely CPU intensive
* Doesn't support [[ReplayGain]]

=== APE Other features ===
* Includes MD5 hashes for quick integrity checking
* Supports APL image link files (similar to CUE sheets)

== MPEG-4 SLS ==
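MPEG-4 SLS is a scalable codec: it adds a lossless layer on top of an optional [[AAC]] core track, so a file can be decoded at any bitrate from lossless down to the AAC core.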

=== SLS pros ===
* Transcoding to standard AAC or any higher lossy bitrate at the speed of copying a file
* Scalable decoding from lossless, to any bitrate down to the AAC core track
* Best lossless compression available when you count the AAC track (~5% gain over any other lossless compression + AAC)
* [[High resolution]] audio support
* Multichannel audio support
* [[Open source]] (MPEG-4 Reference code)
* Embedded in standard MP4 files, so supports same tagging and ReplayGain features as AAC does.

=== SLS cons ===
* No usable software available yet
* Pure lossless compression is not the best available
* Encoding and decoding appear to be slow, though this cannot be confirmed until software is released

=== SLS Other features ===
* Transform based lossless codec with optional LC AAC core track

== OptimFROG (OFR) ==
http://losslessaudiocompression.com/

[[OptimFROG]] is a lossless format developed by Florin Ghido to become the champion in audio compression.

=== OFR pros ===
* Very high compression
* Good software support
* Error robustness
* Streaming support
* Supports [[high resolution]]s
* Hybrid/lossy mode
* Tagging support ([[ID3]], [[APE tags]])
* Supports [[RIFF]] chunks
* [[ReplayGain]] compatible

=== OFR cons ===
* Closed source
* No [[multichannel]] audio support
* No hardware support
* Quite slow decoding

=== OFR Other features ===
* Supports 32-bit float streams
* Includes MD5 hashes for quick integrity checking

== RealAudio Lossless (RAL) ==
http://www.realnetworks.com/products/codecs/realaudio.html

RealAudio Lossless is the lossless codec developed by Real Networks for their multimedia portfolio.

=== RAL pros ===
* Very fast decoding
* Streaming support
* Tagging support (proprietary)

=== RAL cons ===
* Closed source
* No [[multichannel]] and [[high resolution]] audio support
* Little software support (Real Player)
* No hardware support
* Compression efficiency not on par with other lossless codecs
* No hybrid/lossy mode
* No pipe support
* Doesn't support [[ReplayGain]]

== Shorten (SHN) ==
http://www.etree.org/shnutils/shorten/

[[Shorten]] is a very old and featureless lossless codec developed by Tony Robinson at SoftSound.

=== SHN pros ===
* [[Open source]]
* Fast decoding
* Very good software support
* Supports [[RIFF]] chunks
* Pipe support

=== SHN cons ===
* Quite inefficient
* No [[multichannel]] or [[high resolution]] audio support
* No hybrid/lossy mode
* No error robustness
* Not streamable
* No hardware support
* No native tagging
* Doesn't support [[ReplayGain]]

== True Audio (TTA) ==
http://www.true-audio.com/

[[TTA]] is a new lossless codec developed by a team of Russian programmers.

=== TTA pros ===
* [[Open source]]
* Good efficiency
* Hardware support (obscure DVD player)
* Supports [[multichannel]] audio and [[high resolution]]s
* Tagging support ([[ID3]])
* [[ReplayGain]] compatible
* Error robustness

=== TTA cons ===
* No streaming support
* No hybrid/lossy mode
* Doesn't support [[RIFF]] chunks
* No pipe support

=== TTA Other features ===
* Fits the [[Matroska]] container

== WavPack (WV) ==
http://www.wavpack.com/

[[WavPack]] is a fast and featureful lossless codec developed by David Bryant.

=== WV pros ===
* [[Open source]]
* Very fast decoding
* Very fast encoding
* Good efficiency
* Error robustness
* Streaming support
* Hardware support ([http://www.rockbox.org/ RockBox])
* Supports [[multichannel]] audio and [[high resolution]]s
* Hybrid/lossy mode
* Tagging support ([[ID3v1]], [[APE tags]])
* Supports [[RIFF]] chunks
* Ability to create self extracting files for Win32 platform
* Pipe support
* Good software support
* [[ReplayGain]] compatible

=== WV cons ===
* Limited hardware player support

=== WV Other features ===
* Supports 32-bit float streams
* Supports embedded CUE sheets
* Includes MD5 hashes for quick integrity checking
* Can encode in both symmetrical and asymmetrical modes
* Fits the [[Matroska]] container

== Windows Media Audio Lossless (WMAL) ==
http://www.microsoft.com/windows/windowsmedia/9series/codecs/audio.aspx

WMA Lossless is the lossless codec developed by Microsoft to be featured in their Windows Media codec portfolio.

=== WMAL pros ===
* Streaming support
* Very good software support
* Hardware support (all Microsoft Zune players, though older models may need firmware updates; also some non-Microsoft devices such as the [http://en.wikipedia.org/wiki/Gigabeat Gigabeat V and S line from Toshiba])
* Supports [[multichannel]] audio and [[high resolution]]s
* Tagging support (proprietary)
* Pipe support

=== WMAL cons ===
* Closed source
* No hybrid/lossy mode
* Doesn't support [[RIFF]] chunks
* Doesn't support [[ReplayGain]]

=== WMAL Other features ===
* Fits the [[ASF]] container

== Oddball Formats ==
Several old lossless formats are not covered in the article above. The reasons are lack of widespread support, lack of features, poor efficiency and, most importantly, that no one seems to be really interested in them.

Most of them would have disappeared by now, but they are being preserved for posterity at [[User:Rjamorim|rjamorim]]'s [http://www.rjamorim.com/rrw/ ReallyRareWares].

=== Advanced Digital Audio (ADA) ===
* http://www.rjamorim.com/rrw/ada.html

=== Bonk ===
* http://www.logarithmic.net/pfh/bonk

=== Marian's a-Pac ===
* http://www.marian.de/en/downloads#APAC
* http://www.rjamorim.com/rrw/apac.html

=== AudioZip ===
* http://www.rjamorim.com/rrw/audiozip.html

=== Dakx WAV ===
* http://www.dakx.com/
* http://www.rjamorim.com/rrw/daxwav.html

=== Entis Lab MIO ===
* http://www.entis.gr.jp/eri/frame.html
* http://www.rjamorim.com/rrw/mio.html

=== LiteWave ===
* http://www.clearjump.com/products/LiteWave.html
* http://www.rjamorim.com/rrw/litewave.html

=== Pegasus SPS ===
* http://www.krishnasoft.com/sps.htm
* http://www.rjamorim.com/rrw/pegasussps.html

=== RK Audio (RKAU) ===
* http://www.rjamorim.com/rrw/split2000.html

=== Sonarc ===
* http://www.rjamorim.com/rrw/sonarc.html

=== VocPack ===
* http://www.rjamorim.com/rrw/vocpack.html

=== WavArc ===
* http://www.rjamorim.com/rrw/wavarc.html

=== WaveZip/MUSICompress ===
* http://members.aol.com/_ht_a/sndspace/
* http://www.rjamorim.com/rrw/wavezip.html

== See also ==
* [[Lossless]]

== External links ==
=== Other lossless compression comparisons ===
''Sorted based on last '''update''' date.''

* [http://uclc.info/LossLess.pdf Johan De Bock's speed oriented comparison] - best choices speedwise are indicated in green, mostly electronic music (last updated 2006-07-22)
* [http://web.inter.nl.net/users/hvdh/lossless/lossless.htm Hans Heijden's] -- used as reference to build the table (last updated 2006-07-07)
* [http://synthetic-soul.co.uk/comparison/lossless/ Synthetic Soul's comparison] (last update 2007-07-28)
* [http://synthetic-soul.co.uk/comparison/josef/ Josef Pohm's comparison, hosted by Synthetic Soul] (last update 2006-05-29)
* [http://www.bobulous.org.uk/misc/lossless_audio_2006.html Bobulous' lossless audio comparison] — a look at six lossless formats in terms of speed and file size (last updated 2006-05-22)
* [http://uclc.info/lossless_audio_compression_test.htm Johan De Bock's size oriented comparison] - aimed only at the maximum compression setting for each codec (based on a somewhat limited set of samples, however) (last updated 2006-05-19)
* [http://guruboolez.free.fr/lossless/ Guruboolez'] -- comparing only classical music (last updated 2005-02-27)
* [http://members.home.nl/w.speek/comparison.htm Speek's] (last updated 2005-02-07)

=== More on lossless compression ===
* [http://losslessaudioblog.com/ The Lossless Audio Blog] - by windmiller, is a reliable and complete source of news about lossless compression.
* Go to the [http://www.hydrogenaudio.org/forums/index.php?showtopic=33226 Hydrogenaudio thread] to discuss this article.

[[Category:Guides]]

Download page

2011-07-27T17:25:19Z

Notat: Replay Gain -> ReplayGain

All programs mentioned anywhere in the wiki can be downloaded here.
See also the [[:Category:Software|Software Category]] article for more software not listed here.

==CD Rippers==
===Windows===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #ccccff; margin-bottom: 20px;"
|- style="background:#ccccff"
! style="width:150px;" | Name
! style="width:90px;" | License
! style="width:100px;" | Website
! style="width:300px;" | Description
|-
! align="left" | [[CDex]]
| GPL
| [http://cdexos.sourceforge.net/ here]
| align="left" | An open-source ripper for Windows that uses cdparanoia functionality
|- style="background-color: #eeeeee;"
! align="left" | [[DBpowerAMP with AccurateRip|DBpowerAMP]]
| Free, Shareware
| [http://www.dbpoweramp.com/ here]
| align="left" | A secure ripper for Windows that includes Accurate Stream functionality
|-
! align="left" | Deep Ripper
| GPL
| [http://www.deepburner.com/ here]
|
|- style="background-color: #eeeeee;"
! align="left" | [[EAC]]
| Free
| [http://www.exactaudiocopy.de/ here]
| align="left" | A secure ripper for Windows, C2 error pointers, Accurate Stream, etc.
|-
! align="left" | [[BonkEnc]]
| GPL
| [http://www.bonkenc.org/ here]
| align="left" | Ripper with [[Cdparanoia]] support. It's an open-source project.
|}

===Mac OS X===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #bbffbb; margin-bottom: 20px;"
|- style="background:#bbffbb;"
! style="width:150px;" | Name
! style="width:90px;" | License
! style="width:100px;" | Website
! style="width:350px;" | Description
|-
! align="left" | [[Max]]
| GPL
| [http://sbooth.org/Max/ here]
| align="left" | A secure ripper for OS X that uses additional cdparanoia functionality
|- style="background-color:#eeeeee;"
! align="left" | [[XLD]]
| GPL
| [http://tmkk.hp.infoseek.co.jp/xld/index_e.html here]
| align="left" | X Lossless Decoder (XLD) is a tool for Mac OS X that can decode, convert and play various lossless audio files. When decoding, supported files can be split into individual tracks using a cue sheet. It can convert between many lossless and lossy formats, and its plugin-oriented design makes it easy to swap in new encoders.
|}
===Linux===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #ffcccc; margin-bottom: 20px;"
|- style="background:#ffcccc;"
! style="width:150px;" | Name
! style="width:90px;" | License
! style="width:100px;" | Website
! style="width:320px;" | Description
|- style="background-color:#eeeeee;"
! align="left" | abcde
| GPL
|[http://www.hispalinux.es/~data/abcde.php here]
| align="left" | A command-line based ripper with cdparanoia functionality
|-
! align="left" | [[cdparanoia]]
| BSD, GPL
| [http://www.xiph.org/paranoia/ here]
| align="left" | One of the first secure standalone rippers for the Linux platform
|- style="background-color:#eeeeee;"
! align="left" | [[Grip]]
| GPL
| [http://www.nostatic.org/grip here]
| align="left" | An open-source Gnome interface ripper that uses cdparanoia functionality
|-
! align="left" | [[Rubyripper]]
| GPL
| [http://www.rubyforge.org/ here]
| align="left" | A secure ripper for Linux that uses additional cdparanoia functionality
|}

==CD/DVD Writers==
===Windows===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #ccccff; margin-bottom: 20px;"
|- style="background:#ccccff"
! style="width:185px;" | Name
! style="width:80px;" | Unicode
! style="width:90px;" | License
! style="width:100px;" | Website
! style="width:270px;" | Description
|-
! align="left" | BurnAtOnce
| N
| Free
| [http://www.burnatonce.com/ here]
| align="left" | CD writing application based upon CDRDAO
|- style="background-color: #eeeeee;"
! align="left" | [[Burrrn]] (CDA only)
| N
| Free
| [http://www.burrrn.net/ here]
|
|-
! align="left" | CDBurnerXP
|
| Free
| [http://www.cdburnerxp.se/ here]
|
|- style="background-color: #eeeeee;"
! align="left" | DeepBurner Free
| N
| GPL
| [http://www.deepburner.com/ here]
|
|-
! align="left" | DeepBurner Pro
|
| Shareware
| [http://www.deepburner.com/ here]
|
|- style="background-color: #eeeeee;"
! align="left" | Express Burn
| N
| Free
| [http://nch.com.au/burn/index.html here]
|
|-
! align="left" | Express Burn Plus
|
| Shareware
| [http://nch.com.au/burn/index.html here]
|
|-style="background-color: #eeeeee;"
! align="left" | Infra Recorder
| N
| GPL
| [http://infrarecorder.sourceforge.net/ here]
|
|-
! align="left" | [[Nero]]
| N
| Shareware
| [http://www.nero.com/ here]
| align="left" |
|-style="background-color: #eeeeee;"
! align="left" | SilentNight Micro-CD Burner
| N
| Free
| [http://www.silentnight2004.com/Download.html here]
|
|}

===Mac OS X===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #bbffbb; margin-bottom: 20px;"
|- style="background:#bbffbb;"
! style="width:130px;" | Name
! style="width:80px;" | Unicode
! style="width:90px;" | License
! style="width:100px;" | Website
! style="width:310px;" | Description
|-
! align="left" | [[DVD-Audio Tools]]
| Y
| GPL
| [http://dvd-audio.sourceforge.net/ here]
| align="left" | Open-source DVD-Audio authoring application
|- style="background-color: #eeeeee;"
! align="left" | [[FireStarter FX]]
| N
| Free
| [http://www.projectomega.org/subcat.php?lg=en&php=products_firestarter here]
| align="left" | Free OS X Cocoa CD writing application
|-
! align="left" | [[X-CD-Roast]]
| N
| Free
| [http://www.xcdroast.org/xcdr098/xcdrosX.html here]
| align="left" | New OS X port of this Linux CD writing application
|- style="background-color: #eeeeee;"
! align="left" | Burn
| N
| Free
| [http://burn-osx.sourceforge.net/Pages/English/home.html/ here]
| align="left" | Versatile CD/DVD authoring application
|}

===Linux===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #ffcccc; margin-bottom: 20px;"
|- style="background:#ffcccc;"
! style="width:130px;" | Name
! style="width:80px;" | Unicode
! style="width:90px;" | License
! style="width:100px;" | Website
! align="center" style="width:260px;" | Description
|-
! align="left" | CDRDAO
| N
| GPL
| [http://www.cdrdao.org/ here]
| align="left" | Cdrdao records audio or data CD-Rs in disk-at-once (DAO) mode
|- style="background-color:#eeeeee;"
! align="left" | DVD-Audio Tools
| Y
| GPL
| [http://dvd-audio.sourceforge.net/ here]
| align="left" | Open-source DVD-Audio authoring application
|-
! align="left" | [[Gnome Baker]]
| N
| GPL
| [http://www.gnomefiles.org/app.php?soft_id=291 here]
| align="left" | Popular open-source Gnome interface CD/DVD writing application
|- style="background-color:#eeeeee;"
! align="left" | [[K3b]]
| N
| GPL
| [http://www.k3b.org/ here]
| align="left" | Popular open-source KDE CD writing application for Linux platform
|-
! align="left" | [[X-CD-Roast]]
| Y
| GPL
| [http://www.xcdroast.org here]
| align="left" | New open-source Gnome interface CD/DVD writing application
|- style="background-color:#eeeeee;"
! align="left" | [[Brasero]]
| N
| GPL
| [http://projects.gnome.org/brasero/ here]
| align="left" | Brasero is an application for burning CDs and DVDs on the Gnome desktop (Gnome default)
|}

==Multimedia Players==
===Windows===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #ccccff; margin-bottom: 20px;"
|- style="background:#ccccff"
! style="width:120px;" | Name
! style="width:100px;" | License
! style="width:100px;" | Website
! align="center" style="width:220px;" | Description
|-
! align="left" | [[foobar2000]]
| Free, BSD
| [http://www.foobar2000.org/ here]
| align="left" | Advanced tagging, plugin capabilities, and kernel streaming support
|- style="background-color: #eeeeee;"
! align="left" | [[MediaMonkey]]
| Free, Shareware
| [http://www.mediamonkey.com/ here]
| align="left" | Supports many Winamp plugins
|-
! align="left" | MusikCube
| BSD
| [http://www.musikcube.com/ here]
| align="left" | Supports dynamic playlists and advanced SQL capabilities
|- style="background-color: #eeeeee;"
! align="left" | VUplayer
| Free
| [http://www.vuplayer.com/ here]
| align="left" | Supports many popular digital audio codecs and MOD tracker formats
|-
! align="left" | [[Winamp]]
| Free, Shareware
| [http://www.winamp.com/ here]
| align="left" | Popular audio player for Windows
|- style="background-color: #eeeeee;"
! align="left" | [[wxMusik]]
| GPL
| [http://musik.berlios.de/ here]
| align="left" |A cross-platform open-source audio player
|-
! align="left" | [[VLC]]
| Free
| [http://www.videolan.org/vlc// here]
| align="left" | VLC media player is a highly portable multimedia player and multimedia framework capable of reading most audio and video formats as well as DVDs, Audio CDs, VCDs, and various streaming protocols.
|}

===Mac OS X===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #bbffbb; margin-bottom: 20px;"
|- style="background:#bbffbb;"
! style="width:120px;" | Name
! style="width:100px;" | License
! style="width:100px;" | Website
! style="width:220px;" | Description
|-
! align="left" | Cog
| GPL
| [http://cogosx.sourceforge.net/ here]
| align="left" | An open-source digital audio player for OS X.
|- style="background-color: #eeeeee;"
! align="left" | [[wxMusik]]
| GPL
| [http://musik.berlios.de/ here]
| align="left" |A cross-platform open-source audio player
|-
! align="left" | Play
| GPL
| [http://sbooth.org/Play/ here]
| align="left" |Play is an application for playing and managing audio files.
|- style="background-color: #eeeeee;"
! align="left" | [[VLC]]
| Free
| [http://www.videolan.org/vlc// here]
| align="left" | VLC media player is a highly portable multimedia player and multimedia framework capable of reading most audio and video formats as well as DVDs, Audio CDs, VCDs, and various streaming protocols.
|}

===Linux===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #ffcccc; margin-bottom: 20px;"
|- style="background:#ffcccc;"
! style="width:120px;" | Name
! style="width:100px;" | License
! style="width:100px;" | Website
! style="width:220px;" | Description
|- style="background-color: #eeeeee;"
! align="left" | [[Amarok]]
| GPL
| [http://amarok.kde.org/ here]
| align="left" | Popular open-source KDE audio player similar to foobar2000
|-
! align="left" | [[wxMusik]]
| GPL
| [http://musik.berlios.de/ here]
| align="left" |A cross-platform open-source audio player
|- style="background-color: #eeeeee;"
! align="left" | [[XMMS]]
| GPL
| [http://www.xmms.org/ here]
| align="left" | Popular open-source audio player similar to Winamp
|-
! align="left" | [[VLC]]
| Free
| [http://www.videolan.org/vlc// here]
| align="left" | VLC media player is a highly portable multimedia player and multimedia framework capable of reading most audio and video formats as well as DVDs, Audio CDs, VCDs, and various streaming protocols.
|}

===PocketPC===
''These players may not play all your media files. Check their websites for the format support.''
* GSPlayer: http://hp.vector.co.jp/authors/VA032810/
* MortPlayer: http://www.sto-helit.de/
* TCPMP: http://tcpmp.corecodec.org/about

==Tagging Utilities==
===Windows===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #ccccff; margin-bottom: 20px;"
|- style="background:#ccccff"
! style="width:150px;" | Name
! style="width:100px;" | License
! style="width:100px;" | Website
! style="width:270px;" | Description
|-
! align="left" | Abander TagControl
| Shareware
| [http://www.softartstudio.com/tagcontrol/ here]
|
|- style="background-color: #eeeeee;"
! align="left" | AudioShell
| Free
| [http://www.softpointer.com/AudioShell.htm here]
| align="left" | Integrates with Windows Explorer
|-
! align="left" | Frontah
| Free
| [http://home.vxu.se/mdati00/frontah/ here]
| align="left" | Transcode and tag editor for ID3v1.x, ID3v2.x, Lyrics3, Vorbis Comment, APEv1 & APEv2 tags. Supports ANSI, UTF-8 and UTF-16 text encoding, depending on tag type.
|- style="background-color: #eeeeee;"
! align="left" | Magic MP3 Tagger
| Shareware
| [http://www.magic-tagger.com here]
| align="left" | Optimized for automatic music identification
|-
! align="left" | [[MediaMonkey]]
| Free, Shareware
| [http://www.mediamonkey.com/ here]
| align="left" | Also a Media Player & Library
|- style="background-color: #eeeeee;"
! align="left" | MetatOGGer
| Free
| [http://www.luminescence-software.org/ here]
| align="left" | Tags MP3 ([[ID3]]), Ogg files (Vorbis comment, including Ogg FLAC and Speex), Musepack, Windows Media, WavPack and Monkey's Audio
|-
! align="left" | MP3 Book Helper
| Free
| [http://mp3bookhelper.sourceforge.net/ here]
| align="left" | Tags [[ID3v1]], ID3v2.3, and Vorbis comments. Features: FreeDB, Unicode, guessing and matching, and PAR, SFV, SV, and NFO generation.
|- style="background-color: #eeeeee;"
! align="left" | [[MP3tag]]
| Free
| [http://www.mp3tag.de/ here]
| align="left" | Tags all files supporting [[ID3]], [[APEv2]], and [[Vorbis_Comment|Vorbis Comments]], not only MP3s
|-
! align="left" | [http://www.mp3-tag.com/ MP3 Tag Editor]
| Shareware
| [http://www.mp3-tag.com/ here]
| align="left" | Software to edit tags in audio files in [[MP3]], [[WMA]], [[OGG]], [[ASF]], and other music formats.
|- style="background-color: #eeeeee;"
! align="left" | Mp3/Tag Studio
| Shareware
| [http://www.magnusbrading.com/mp3ts/ here]
| align="left" | Supports ID3v1 & v2 '''only'''. Powerful matching and fancy filters
|-
! align="left" | [[Tag.exe]]
| GPL
| [http://www.synthetic-soul.co.uk/tag/ here]
| align="left" | Command-line universal tagger for Windows
|- style="background-color: #eeeeee;"
! align="left" | Tag & Rename
| Shareware
| [http://www.softpointer.com/tr.htm here]
|
|-
! align="left" | TagScanner
| Free/Donate
| [http://xdev.narod.ru/tagscan_e.htm here]
|
|- style="background-color: #eeeeee;"
! align="left" | The GodFather
| Card/Donate
| [http://users.otenet.gr/~jtcliper/tgf/ here]
|
|-
! align="left" | [http://wmptagext.sourceforge.net/download.html WMPTSE]
| Free/Donate
| [http://wmptagext.sourceforge.net here]
| align="left" | Software to integrate tag formats other than [[ID3]] into Microsoft Windows Media Player.
|}

===Mac OS X===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #bbffbb; margin-bottom: 20px;"
|- style="background:#bbffbb;"
! style="width:150px;" | Name
! style="width:100px;" | License
! style="width:100px;" | Website
! style="width:270px;" | Description
|-
! align="left" | Tag
| GPL
| [http://sbooth.org/Tag/ here]
| align="left" | An open-source tagging application for OS X
|}

===Linux===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #ffcccc; margin-bottom: 20px;"
|- style="background:#ffcccc;"
! style="width:150px;" | Name
! style="width:100px;" | License
! style="width:100px;" | Website
! style="width:270px;" | Description
|-
! align="left" | EasyTAG
| GPL
| [http://easytag.sourceforge.net/ here]
| align="left" | Gnome tagging utility
|}

==Encoders, Decoders, Etc.==
All basic tools needed to make use of the audio formats supported here.

===[[MP3]]===
* [[LAME]] encoder/decoder: [http://www.rarewares.org/mp3.html download pre-compiled binaries here]. Also check the [[Lame Compiles|Latest recommended version]] page.
* [[MP3Gain]], a ReplayGain-like utility: [http://mp3gain.sourceforge.net/download.php download here]

===Ogg [[Vorbis]]===
Currently, all recommended Ogg Vorbis utilities are available at the [http://www.rarewares.org/ogg.html Rarewares Ogg Vorbis page]. The following tools are important:

* '''OggEnc2''': A command-line Ogg Vorbis encoder that can be used with most CD rippers.

* '''OggDec''': Command-line decoder.

* '''[[OggDropXPd]]''': An easy to use, drag'n'drop encoder/decoder with support for automatic tagging, renaming and playlist creation on encoding.

* ''Encoding DLLs'': For encoding within CDex or WinLame.

* '''VorbisGain''': The [[ReplayGain]] utility for Ogg Vorbis.

In addition, the [[Lancer]] suite (a highly SSE-optimized set of utilities and libraries) is available at [http://homepage3.nifty.com/blacksword/ this page] ''(in Japanese)''. See [[Lancer#Platform-specific Builds|this section]] for information about the different builds.

===[[Musepack]] (MPC)===
* [http://www.musepack.net/index.php?pg=win Download MPC for Windows]
* [http://www.musepack.net/index.php?pg=lin Download MPC for Linux]
* [http://www.musepack.net/index.php?pg=osx Download MPC for Mac OS X]
* [http://www.musepack.net/index.php?pg=src Download MPC source code]

* [http://forum.musepack.net/showthread.php?t=395 Forum announcement of SV8 release]

===[[FLAC]]===
* CoolEdit / Adobe Audition Filter supporting FLAC: [http://www.vuplayer.com/other.php download here]
* Various FLAC-related utilities (incl. ReplayGain utility): [http://flac.sourceforge.net/download.html FLAC's SourceForge Download page]

==Transcoders==
''Note: Although these tools may convert from one encoding to another, please remember that [[transcoding]] to any [[lossy]] encoding will result in degraded quality.''
* BeSweet: http://besweet.notrace.dk/
* [[BonkEnc]]
* dBpowerAMP Music Converter (dMC): http://www.dbpoweramp.com/dmc.htm
* [[foobar2000]] (needs 3rd-party encoders)
* MediaCoder: http://www.rarewares.org/mediacoder/
* Omni Encoder: http://omniencoder.autobotcity.net/
* [[Winamp]]
* WinLAME: http://winlame.sourceforge.net/

==Processing utilities==
===Windows===
{| border="0" cellpadding="0" cellspacing="1" style="text-align:center; border:2px solid #ccccff; margin-bottom: 20px;"
|- style="background:#ccccff;"
! style="width:120px;" | Name
! style="width:100px;" | License
! style="width:100px;" | Website
! style="width:400px;" | Description
|- style="background-color: #eeeeee;"
! align="left" | [[lossyWAV]]
| GPL
| [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=504087 here]
| align="left" | lossyWAV is a lossy pre-processor for [[Wikipedia:Pulse-code modulation|PCM]] (uncompressed) WAV files. It reduces [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, reduces the bitrate of the encoded file significantly.
|}
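
The bit-depth reduction described for lossyWAV in the table above can be illustrated with a minimal sketch. It only shows the general idea of rounding samples so that their lowest bits become zero, which is what lets a lossless codec store them more compactly. The function name <code>reduce_bit_depth</code> and the fixed <code>bits_to_remove</code> parameter are hypothetical; the real tool chooses how many bits to remove adaptively, and that analysis is not shown here.

```python
def reduce_bit_depth(samples, bits_to_remove):
    """Round each integer sample to the nearest multiple of 2**bits_to_remove.

    Illustrative only: no clipping is applied and the number of bits removed
    is fixed, whereas a real pre-processor would decide it per block.
    """
    if bits_to_remove <= 0:
        return list(samples)
    half = 1 << (bits_to_remove - 1)  # rounding offset
    return [((s + half) >> bits_to_remove) << bits_to_remove for s in samples]


reduced = reduce_bit_depth([1000, -1000, 7], 4)
```

Every value in <code>reduced</code> is a multiple of 16, so the four lowest bits carry no information.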

==Drivers==
===ASPI===
* Ahead Nero ASPI Driver: [ftp://ftp6.nero.com/wnaspi32.dll download]
* Adaptec Windows ASPI Package: [http://www.adaptec.com/worldwide/support/suppdetail.jsp?sess=no&prodkey=ASPI-4.70 official website]
* ForceASPI [http://radified.com/ASPI/forceaspi.htm radified.com]
* ASPI4all [http://www.cdr-zone.com/software/aspi_layers/aspi4all.html CDR-Zone.COM]
* FrogAspi [http://www.frogaspi.org/ official website]
* VOB ASAPI Driver 1.3: [http://www.rarewares.org/files/ASAPI.exe download]

===Sound===
* ALSA Project [http://www.alsa-project.org/ official website]
* kX Project [http://kxproject.lugosoft.com/ official website]
* ZonaISIS [http://www.hispasonic.com/zonaisis/index.htm unofficial]
* I have a dream ... [http://members.aol.com/cridi/ unofficial]

==Links==
* [http://www.reactos.org/wiki/index.php/Untested_%28open_source%29_software_list Open source software @ ReactOS wiki]
* [http://www.rarewares.org/ RareWares]

[[Category:Software]]

WavPack

2011-07-27T17:24:51Z

Notat: Replay Gain -> ReplayGain

'''WavPack''' is a free, open source [[lossless]] audio compression format developed by David Bryant.

== Description ==
WavPack (pronounced "wave-pack") allows users to compress (and restore) all [[PCM]] audio formats including 8-, 16-, and 24-bit integers; 32-bit floats; [[mono]], [[stereo]], and [[multichannel]]; and [[sampling rate]]s from 6 to 192 kHz. Like other lossless compression schemes, the data reduction varies with the source, but it is generally between 25% and 50% for typical popular music and somewhat better than that for classical music and other sources with greater dynamic range.

WavPack also incorporates a unique "hybrid" mode that provides all the advantages of lossless compression with an additional bonus. Instead of creating a single file, this mode creates both a relatively small, high-quality lossy file that can be used all by itself, and a "correction" file that (when combined with the lossy file) provides full lossless restoration. For some users this means never having to choose between lossless and lossy compression!

== Feature Summary ==
* Fast and efficient encoding and decoding
* [[Open source]], released under a BSDish license
* Multiplatform
* Hardware support
* Error robustness
* Streaming support
* Supports multichannel audio and high resolutions
* Hybrid/lossy mode
* Tagging support ([[ID3v1]], [[APE]])
* Supports [[RIFF]] chunks
* Supports embedded CUE sheets
* Includes MD5 hashes for quick integrity checking
* Ability to create self extracting files for Win32 platform
* [[ReplayGain]] compatible

== History ==
David Bryant started development on WavPack in mid-1998, with the release of version 1.0. This first version compressed and decompressed audio losslessly and did nothing else, but it already offered one of the best efficiency-to-speed ratios among lossless encoders.

Very soon after the release of version 1.0, Bryant released v. 2.0, which featured lossy encoding (using only quantization for data reduction – no psychoacoustic process was applied to the stream).

In 1999, the developer released version 3.0, which featured novelties such as a fast mode (with reduced compression ratio), compression of RAW files and error detection using CRC checksums.

WavPack development is still going on, and a major feature added in late 3.x versions is the hybrid mode, where the encoder generates a lossy file + a correction file, so that both can be decompressed back to the original PCM stream.

WavPack 4 has been recently released. It includes important changes, such as fast seeking, multichannel support, and high-resolution audio support, turning it into one of the most full-featured and modern lossless audio compressors.

== Software support ==
=== Players ===
* NullSoft [[Winamp]] (plugin with ReplayGain & Media Library support) and Winamp-compatible players
* [[foobar2000]] Advanced Audio Player (official encoding/decoding addon, with ReplayGain & Cuesheets support)
* [http://www.vuplayer.com/vuplayer.htm VUPlayer] (official plugin, supports encoding)
* [[Windows Media Player]] and other directshow-based players (MPC, TCMP, RadLight) (with [http://corewavpack.corecodec.org/ CoreWavPack] directshow filter)
* [http://koti.welho.com/hylinen/apollo/ Apollo] Audio Player (plugin with ReplayGain support)
* [http://www.un4seen.com/xmplay.html XMplay] (official plugin)
* [http://cogosx.sourceforge.net/ Cog] Audio player for MacOS X.
* [[XMMS]] (with Kuniklo's plugin)
* [http://fondriest.frederic.free.fr/realisations/lamip/ LAMIP] (official plugin)
* [http://mpxplay.net/ MPXplay] for DOS!
* [http://aqualung.sourceforge.net/ Aqualung] for GNU/Linux

=== Frontends ===
* Custom [http://members.home.nl/w.speek/wavpack.htm Windows Frontend] (by Speek)
* [http://www.unifront.boereck.de/ UniversalFront] by Böreck
* [http://home.vxu.se/mdati00/frontah/ Frontah] by Madah
* [http://www.webearce.com.ar/ MAREO] by Kwanbis

=== Converters ===
'''Note:''' ''Several players, like foobar2000 and VUplayer, can also convert from other formats to WavPack.''
* [http://www.dbpoweramp.com/ dBpowerAMP] Music Converter / Audio Player / CD Writer (official addon)
* [http://www.board-24.de/ GX:Transcoder] Music converter

=== Editors ===
* [[Adobe Audition]] and Cool Edit (filter with 32-bit floats & extra info save support)

=== CD writers/rippers ===
* [http://www.ahead.de Ahead Nero Burning Rom]
* [http://www.burrrn.net Burrrn] Audio CD burner
* [[Exact Audio Copy]] CD Ripper
* [http://cdexos.sourceforge.net CDex] CD ripper

=== Taggers ===
* [http://www.mp3tag.de/en/index.html Mp3tag] Universal Tag Editor
* [http://users.otenet.gr/~jtcliper/tgf/ The GodFather] Tagger / Music manager
* [[Tag.exe|Case's Tag]] command line tagger

=== Other tools ===
* [http://www.burrrn.net/mrq/ Mr. QuestionMan]
* [http://www.bitattack.ro/ai/ Audio Identifier]
* [http://www.bunkus.org/videotools/mkvtoolnix/ mkvtoolnix] – tool to multiplex WavPack streams inside the Matroska container
''It's worth mentioning the [[Matroska]] guys decided to concentrate on WavPack as the lossless compressor of choice for their container. Quite an honor... :-)''

== Hardware Support ==
* iRiver iHP-120/iHP-140 with the open source [http://www.rockbox.org Rockbox firmware]
* [http://www.rokulabs.com/products/photobridge/features.php Roku PhotoBridge HD] (with [http://www.wavpack.com/downloads.html plugin])

== Technology description ==
To ensure high-speed operation, WavPack uses a very simple predictor that is implemented entirely in integer math. In its "fast" mode the prediction is simply the arithmetic extrapolation of the previous two samples. For example, if the previous two samples were -10 and 20, then the prediction would be 50. For the default mode a simple adaptive factor is added to weigh the influence of the earlier sample on the prediction. In our example the resulting prediction could then vary between 20 for no influence to 50 for full influence. This weight factor is constantly updated based on the audio data's changing spectral characteristics, which is why it is called "adaptive".
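
The worked example in this paragraph can be reproduced with a short sketch. It assumes the weight is stored as a fixed-point integer (here <code>weight / 1024</code>, so 0 means no influence and 1024 means full influence of the earlier sample); that scaling, the function name <code>predict</code> and its parameters are illustrative assumptions, not WavPack's actual code, and the adaptive update of the weight is not shown.

```python
def predict(prev2, prev1, weight=1024, shift=10):
    """Predict the next sample from the two previous ones using integer math.

    weight / 2**shift is the influence of the earlier sample (prev2):
    0 repeats the last sample, 2**shift gives plain linear extrapolation.
    The fixed-point scaling is an assumption made for this sketch.
    """
    return prev1 + ((weight * (prev1 - prev2)) >> shift)


full = predict(-10, 20)              # full influence: linear extrapolation
none = predict(-10, 20, weight=0)    # no influence: repeat the last sample
half = predict(-10, 20, weight=512)  # halfway between the two
```

With the samples -10 and 20 from the text, the prediction moves from 20 to 50 as the weight goes from 0 to 1024.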

The prediction generated is then subtracted from the actual sample to be encoded to generate the error value. In mono mode this value is sent directly to the coder. However, stereo signals tend to have some correlation between the two channels that can be further exploited. Therefore, two error values are calculated that represent the difference and average of the left and right error values. In the "fast" mode of operation these two new values are simply sent to the coder instead of the left and right values. In the default mode, the difference value is always sent to the coder along with one of the other three values (average, left, or right). An adaptive algorithm continuously determines the most efficient of the three to send based on the changing balance of the channels.
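
A minimal sketch of the difference/average idea follows, assuming integer inputs and the common trick of recovering the bit lost by the halved average from the parity of the difference. The function names are hypothetical, and the sketch shows only the "fast" mode pairing; the adaptive choice between average, left and right is not shown.

```python
def encode_pair(left, right):
    """Turn a left/right pair into (difference, average) using integer math."""
    diff = left - right
    avg = (left + right) >> 1  # floor of the mean; one bit is dropped here
    return diff, avg


def decode_pair(diff, avg):
    """Recover left/right exactly; the dropped bit equals the parity of diff."""
    total = (avg << 1) | (diff & 1)  # left + right
    return (total + diff) >> 1, (total - diff) >> 1


restored = decode_pair(*encode_pair(3, -2))
```

Because the sum and the difference of two integers always have the same parity, the round trip is exact, which is what keeps the scheme lossless.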

The developer has designed a unique data encoder for WavPack that he believes is better than Rice coding in two different areas. It is impossible to encode more efficiently than Rice coding because it represents the optimal bit coding (sometimes known as the Huffman code) for this type of data. WavPack's encoder is slightly less efficient than this, but only by about 0.15 bits/sample (or less than 1% for 16-bit data). The first advantage of WavPack's coder is that it does not require the data to be buffered ahead of encoding; instead, it converts each sample directly to bitcodes. This is more computationally efficient, and it is better in some applications where coding delay is critical. The second advantage is that it is easily adaptable to lossy encoding because all significant bits (except the implied "one" MSB) are transmitted directly. In this way it is possible to transmit only, for example, the 3 most significant bits (with sign) of each sample. In fact, it is possible to transmit only the sign and implied MSB for each sample with an average of only 3.65 bits/sample.
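
For reference, plain Rice coding (the baseline this paragraph compares against) can be sketched as follows. This is not WavPack's coder, whose details are not given here; the function names and the use of text strings of "0" and "1" instead of packed bits are choices made only for readability.

```python
def zigzag(n):
    """Map a signed residual to a non-negative integer: 0, -1, 1, -2, 2, ..."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(u):
    """Inverse of zigzag."""
    return (u >> 1) if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(value, k):
    """Encode a non-negative integer: unary quotient, then k remainder bits."""
    q, r = value >> k, value & ((1 << k) - 1)
    tail = format(r, "0{}b".format(k)) if k > 0 else ""
    return "1" * q + "0" + tail


def rice_decode(bits, k):
    """Decode one Rice codeword from the start of a bit string."""
    q = bits.index("0")
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return (q << k) | r


code = rice_encode(9, 2)  # quotient 2 -> "110", remainder 1 -> "01"
```

The parameter <code>k</code> has to suit the typical magnitude of the data, which is why block-based Rice coders look at a buffer of samples before choosing it.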

This coding scheme is used to implement the "lossy" mode of WavPack. In the "fast" mode the output of the non-adaptive decorrelator is simply rounded to the nearest codable value for the specified number of bits. In the default mode the adaptive decorrelator is used (which reduces the average noise about 1 dB) and also both the current and the next sample are considered in choosing the better of the two available codes (which reduces noise another 1 dB).

The developer has decided not to use any floating-point arithmetic in WavPack's data path because he believes that integer operations are less susceptible to subtle chip-to-chip variations that could corrupt the lossless nature of the compression, the Pentium floating-point bug being a blatant example of this. It is possible that a lossless compressor that used floating-point math could generate different output when running on that faulty Pentium. Even disregarding actual bugs, floating-point math is complicated enough that there could be subtle differences between "correct" implementations that could cause trouble for this type of application. To further ensure confidence in the integrity of WavPack's compression, the encoder adds a 32-bit error detection code to the generated streams.
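
The role of a 32-bit error detection code can be illustrated with a generic sketch. It uses the standard CRC-32 from Python's <code>zlib</code> module as a stand-in; the actual check code WavPack computes and where it is stored in a block are not described here, so the layout below (payload followed by four check bytes) is purely an assumption.

```python
import zlib


def pack_block(payload: bytes) -> bytes:
    """Append a 32-bit check value to a block of data (hypothetical layout)."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "little")


def check_block(block: bytes) -> bool:
    """Recompute the check value and compare it with the stored one."""
    payload, stored = block[:-4], int.from_bytes(block[-4:], "little")
    return (zlib.crc32(payload) & 0xFFFFFFFF) == stored


good = pack_block(b"decoded audio samples")
bad = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit of the payload
```

A decoder that recomputes the value after decoding can tell that <code>bad</code> no longer matches what the encoder produced.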

WavPack source code is very portable. It has been compiled on several Unices (Linux, Mac OS X, Solaris, FreeBSD, OpenBSD, NetBSD, Compaq Tru64, HP-UX...) as well as Windows, DOS and OpenVMS. It works on architectures such as x86, ARM, PowerPC, SPARC, DEC Alpha, PA-RISC, MIPS, Motorola 68k...

== External links ==
* [http://www.wavpack.com/ Official website]
* [http://www.rarewares.org/lossless.html Unofficial multiplatform versions] at RareWares
* [http://www.rjamorim.com/rrw/wavpack.html Historical versions] at ReallyRareWares
* [[Lossless_comparison|Lossless Codec Comparison]] by Rjamorim
* [[EAC_and_WavPack | Configuring EAC and Wavpack]]

[[Category:Codecs]]
[[Category:Lossless]]

IRiver H-Series

2011-07-27T17:24:22Z

Notat: Replay Gain -> ReplayGain

The '''H-series''' is a series of portable hard-disk-based audio players made by the Korean company iRiver.

= Overview =
== Common features ==
These features are common for all players in the H-series:
* Supports playback of [[MP3]], [[WMA]], [[ASF]], and [[WAV]]
* USB 2.0
* FM Radio
* Microphone/dictaphone
* Real-time encoding to MP3
* H100 and H300 series support [[Ogg Vorbis]]

== H100-series ==
The H100-series consists of the H110, H120 and H140, with capacities of 10 GB, 20 GB and 40 GB respectively.

* Digital optical-in/out
* Microphone/line-in recording in .wav or [[MP3]] format

With the open source [http://www.rockbox.org Rockbox firmware], the H100 series also has the following features:
* User can switch back to the original firmware at boot time
* [[FLAC]] and [[WavPack]] support
* True gapless playback
* [[ReplayGain]] support on some formats
* Crossfade
* On-the-fly playlist
* Speech menus and directories (Talkbox) for visually impaired users
* No [[WMA]] support, user must switch back to original firmware to decode WMA.

== H300-series ==
The H300 series includes the H320 and the H340, which are 20 GB and 40 GB, respectively. The H300 series is available in a North American version and an international version, with some slight differences noted below. The H300 line has been '''discontinued'''.

* 2" Color TFT screen
* FM Recording
* USB On The Go (international models only)
* DRM capability (North American models only)
* Video playback supported at 10 FPS (international models only. North American models can be modified to support video playback by upgrading to international firmware 1.25 or higher; however, installation of international firmware will permanently disable DRM support)

== H10 Series ==
* Available in 5 GB, 6 GB and 20 GB models
* PlaysForSure compatible
* Color screen

== External links ==
* [http://www.iriver.com iRiver: Homepage]
* [http://www.iriver.us/ Mistic River - Site for iriver enthusiasts]
* [http://www.rockbox.org/twiki/bin/view/Main/TargetStatus#iriver_H110_H115_H120_H140 IRiver H1xx Rockbox Port]

[[Category:Digital Audio Players]]

Vorbis

2011-07-27T17:24:02Z

Notat: Replay Gain -> ReplayGain

{{featured}}
{{Codec Infobox
| name = Ogg Vorbis
| logo = [[Image:Fish logo.png]]
| type = lossy
| purpose = General audio compression at bitrates ~64–400 kbps
| maintainer = Christopher Montgomery, Xiph Community
| recommended_encoder = aoTuV
| recommended_text = aoTuV Beta 5
| website = http://www.vorbis.com/
}}

= Introduction =
'''Vorbis''' (commonly used inside the [[Ogg]] container) is a fully open, non-proprietary, patent-free (subject to [http://www.hydrogenaudio.org/forums/index.php?showtopic=13531 speculation]), and royalty-free, general-purpose compressed audio format for mid to high quality (8 kHz–48.0 kHz, 16+ bit, [[multichannel]]) audio and music at fixed and variable bitrates from 16 to >256 kbps/channel. This places Vorbis in the same competitive class as audio representations such as MPEG-4 ([[AAC]]), and similar to, but higher performance than [[MP3]], TwinVQ ([[VQF]]), [[WMA]] and [[PAC]]. Vorbis is the first of a planned family of Ogg multimedia coding formats being developed as part of Xiph.org's Ogg multimedia project.

Informal listening tests suggest Vorbis to be comparable to MPEG-4 [[AAC]] at most bitrates and [[Musepack]] at 128 kbps. Transparency is generally reached at about 150–170 kbps (-q 5) (with some exceptions). The encoder is reasonably young and unoptimized, so further improvements can always be expected.

Unfortunately, Xiph.org has failed to improve Vorbis at a steady rate since its initial 1.0 release in July 2002 (due to other development projects and time constraints). Since then development has been led by other coders such as [http://sjeng.org/vorbisgt3.html Garf] and [http://www.geocities.jp/aoyoume/aotuv/ Aoyumi]. Aoyumi's '''[[aoTuV]]''' series of encoders was incorporated into the September 2004 release of 1.1, which brought about the first across-the-board quality improvements in two years. Aoyumi's Beta 4.51 was found to be very good, so it was re-branded as aoTuV Release 1 and was the recommended encoder until June 2007. The latest tuning is aoTuV beta 5, which further improves low-bitrate quality without sacrificing compression, and it is currently the recommended Vorbis encoder at Hydrogenaudio.

For the time being, Aoyumi's tuning (from aoTuV Release 1 up to aoTuV Beta 5) has not yet been incorporated into the 'official' Vorbis line.

Vorbis has had success with many recent video game titles employing Vorbis as opposed to MP3 (with Epic Games' Unreal Tournament 2003 and Unreal Tournament 2004, the PC port of Microsoft's Halo and Uru being notable examples). (Ogg) Vorbis is also an official part of the [http://www.openal.org/extensions.html OpenAL] API extension library, used in many popular [http://www.openal.org/titles.html computer games]. On April 10, 2006, [http://www.radgametools.com/ RAD Game Tools] integrated (Ogg) Vorbis support to their Miles Sound System (MSS), which has been used in over 3,200 games worldwide. This ensures that future games utilizing MSS will have the capability to play (Ogg) Vorbis files. Check out [http://wiki.xiph.org/index.php/Games_that_use_Vorbis xiph wiki] for a full list of games confirmed to use (Ogg) Vorbis.

Vorbis was adopted in May 2010 as the open source audio codec for Google's new [http://www.webmproject.org/ WebM] project. WebM is a combination of the BSD-licensed VP8 video codec, Vorbis, and the [[webm]] container, a subset of the [[Matroska]] container. It is expected to obtain widespread adoption, with major backing from many hardware chip manufacturers and with the release of Google's new mobile Android platform and Google TV by the year 2011.

'''Before encoding files using (Ogg) Vorbis, check out the [[Recommended Ogg Vorbis|Recommended (Ogg) Vorbis]] article to determine what encoder to use and what settings are recommended by Hydrogenaudio.'''

== Pros ==
* The (Ogg) Vorbis specification is in the public domain; it is free for commercial or noncommercial use, under both LGPL and BSD licenses
* Easy to use high-level API (Application Programming Interface)
* Good all-round performance (>48 kbps – a leading codec at [http://www.rjamorim.com/test/multiformat128/results.html 128 kbps])
* Well written [http://www.xiph.org/ogg/vorbis/docs.html specs]
* Supported by most portable (Ogg) [[Vorbis#Supporting Digital Audio Players|DAPs]]
* Suitable for internet-streaming (via [http://www.icecast.org/ Icecast] and other methods)
* Fully [[gapless]] playback
* High potential for further tuning
* Structured to allow the design for a hybrid filterbank

== Cons ==
* Limited official development (third-party development is always encouraged)
* Some implementations are more computationally intensive to decode than MP3 (depending upon the architecture and [[Tremor]] optimizations).

= Technical Information =
* Multiple block sizes for window switching including overlap (powers of two only) ''(128/1024, 256/2048, 512/4096)''
* A custom-designed [[window function]], similar to the sine window, is applied; it has good sidelobe rejection
:<math>w_k = \sin\left(\frac{\pi}{2} \sin^2\left[\frac{\pi}{2n}\left(k+\frac{1}{2}\right)\right]\right)</math>
* Psychoacoustic masking is exploited via an [[ATH]] model
* Masking curves are computed from an ''empirically'' adjusted set of [http://www.zainea.com/masking2.htm Ehmer Curves]
* Modified Discrete Cosine Transform ([[MDCT]]) is used for noise analysis
* Fast Fourier Transform ([[FFT]]) is used for tonal analysis
* Global masking curve is a mixture of the calculated FFT+MDCT curves with the ATH curves overlaid
* Floor 1, the noise floor (envelope), is calculated from the global masking curve using a piecewise linear approximation; the spectrum is divided by it to generate the residue (fine detail). The Levinson-Durbin [http://www.data-compression.com/speech.html#ana LPC model] in Floor 0 is no longer used, although the code still exists
* [[Noise normalization]] is applied to compensate for energy lost in certain frequency bands due to quantization (rounding).
* The channels are [[channel coupling|coupled]] ''strictly'' by residue, using [http://us.xiph.org/ogg/vorbis/doc/stereo.html point/phase stereo] and lossless coupling
* Multistage [[Vector quantization]] is used for coding the noise-floor and residue backend using ''trained'' codebooks.
* [[Huffman coding]] is used to minimize vector codeword redundancy
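
The window formula in the list above can be checked numerically with a small sketch. It assumes, as the formula suggests, that <code>n</code> is half the window length and <code>k</code> runs from 0 to 2n-1; the function name <code>vorbis_window</code> and the choice of n = 128 are illustrative only.

```python
import math


def vorbis_window(n):
    """Return the 2n-point window w_k = sin(pi/2 * sin^2(pi/(2n) * (k + 1/2)))."""
    return [
        math.sin(math.pi / 2 * math.sin(math.pi / (2 * n) * (k + 0.5)) ** 2)
        for k in range(2 * n)
    ]


n = 128
w = vorbis_window(n)

# Overlapping halves of consecutive windows should sum to one in power,
# which is what an MDCT with 50% overlap needs for perfect reconstruction.
max_err = max(abs(w[k] ** 2 + w[k + n] ** 2 - 1.0) for k in range(n))
```

Here <code>max_err</code> stays at the level of floating-point rounding, and the window is symmetric about its centre.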

= Software =
=== Encoders ===
* [[Oggenc]] official command-line encoder (Win32/Posix)
* [[OggDropXPd|OggdropXPd]] advanced drag-and-drop encoder by John33 (Win32)
* [[Lancer]] SSE-optimized vorbis encoder utility and libraries by BlackSword (Win32/Posix)
* [http://www.saunalahti.fi/cse/foobar2000/ foo_vorbisenc] vorbis encoder library for foobar2000 (Win32)

=== Decoders ===
* [http://www.rarewares.org/ogg.html OggDec] for Windows, by John33, a very featureful command line decoder (Win32)
* [[Ogg123]] for Unix systems (GPL), a very simple to use command-line player. (Win32/Posix)
* [http://www.illiminable.com/ogg/ illiminable Ogg Directshow Filters] also plays Speex, Theora and FLAC (Win32)
* [http://www.xiph.org/quicktime/ XiphQT] (Xiph's QuickTime Components) allows playback in [[QuickTime]]/[[iTunes]] (Win32/MacPPC/MacIntel)
* [http://corevorbis.corecodec.org/ CoreVorbis] DirectShow filter (Win32)

=== ReplayGain ===
* [http://www.rarewares.org/ogg.html VorbisGain] to apply [[ReplayGain]] on Vorbis files (Win32)
** Instructions to integrate VorbisGain into foobar2000, Winamp, and Windows Explorer can be found in [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=41880&view=findpost&p=396612 this HA thread] ''A precompiled script of the procedure (in RAR format) can be found in [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=45196&view=findpost&p=397803 this HA posting]''
** RPM packages for VorbisGain are available [http://rpm.pbone.net/index.php3?stat=3&search=vorbisgain&srodzaj=3 here], and source code for VorbisGain is available [http://sjeng.org/vorbisgain.html here]
* [http://www.rarewares.org/quantumknot/vorbisgain.gz VorbisGain Static GCC 4 compile] (Posix)
* [[foobar2000]]'s ''ReplayGain scanner'' supports (Ogg) Vorbis files
* [[Winamp]]'s ''ReplayGain Analyzer'' supports (Ogg) Vorbis files
* [[MediaMonkey]]'s ''Volume Levelling'' supports (Ogg) Vorbis files

=== Splitters ===
The following utilities are used to splice Vorbis streams without decoding/re-encoding.

* [http://www.free-codecs.com/download/Ogg_Cutter.htm Ogg Cutter] (Win32)
* [http://mp3splt.sourceforge.net/ mp3splt] (Win32)
* [http://www.xiph.org/downloads/ vcut] (CLI tool part of the official vorbis-tools package) (Win32/Posix)
* [http://sourceforge.net/projects/ogg-cut/ (Ogg) Vorbis Stream Cutter (ogg-cut)] (Posix)

=== Taggers ===
Most taggers supporting (Ogg) Vorbis are listed on [[download page#Tagging Utilities|the download page]].
* [http://www.rarewares.org/files/ogg/vorbiscomment-1.1.1.zip Vorbis Comment]

= Supported Digital Audio Players =
The following list contains some players that support Vorbis playback.
* [[Apple iPod]] with [[Rockbox]] firmware – check out this [http://www.hydrogenaudio.org/forums/index.php?s=32eeac65958144db631c8a739b41983c&showtopic=40992 HA thread]
* [http://www.ifreemax.com/ FreeMax] FW-960
* [http://www.iaudiophile.net/ iAudio] [[IAudio M3|M3]], M5, U2, G3, X5, I5, 7, D2, F2, T2, A3, Q5W, A2; VorbisGain support only on X5/M5 with [[Rockbox]] firmware
* [[iRiver H-Series]] with [[Rockbox]] firmware
* [[MPIO H-Series]]
* [[Neuros]] with [[Rockbox]] firmware
* [[Rio Karma]]
* [http://www.samsung.com/Products Samsung]
* [http://www.slimdevices.com/ Slim Devices: Squeezebox] External player
* [http://www.yepp.co.kr/ Yepp] YP-T6, YP-T7, YP-C1, YP-F1, YP-53 (Firmware 1.200), and others

A longer list can be found at [http://wiki.xiph.org/index.php/PortablePlayers xiph's wiki].

'''Important note:''' There may be players out there that support (Ogg) Vorbis, although they are not marketed as such.

= External links =
The following links contain information surrounding the (Ogg) Vorbis codec that can be found on Hydrogenaudio and elsewhere throughout the web.

=== Hydrogenaudio Wiki ===
* [[Ogg]] (Container)
* [[Listening Tests#Multiformat Tests|Listening tests comparing Vorbis against MP3, AAC, WMA, etc.]]
* [[Recommended Ogg Vorbis|Recommended settings for encoding with Vorbis]] and its related [http://www.hydrogenaudio.org/forums/index.php?showtopic=15049 HA thread]
* [[EAC and Ogg Vorbis|Configuring EAC and Vorbis as an external command-line encoder]]

=== Websites ===
* [http://www.vorbis.com Vorbis official website] (updated continually)
* [http://en.wikipedia.org/wiki/Vorbis Vorbis at Wikipedia]
* [http://www.playogg.org/ PlayOgg initiative] by the [http://www.fsf.org/ Free Software Foundation]
* [http://www.audiocoding.com/modules/wiki/?page=Ogg+Vorbis (Ogg) Vorbis at AudioCoding]
* [http://www.rarewares.org/ogg.html (Ogg) Vorbis binaries at Rarewares]
* [http://www.geocities.jp/aoyoume/aotuv/ Aoyumi's homepage of tuned versions of Vorbis encoder and current beta binaries]
* [http://homepage3.nifty.com/blacksword/index_e.htm The (Ogg) Vorbis Acceleration Project] – Archer/[[Lancer]] homepage for optimized versions of aoTuV Vorbis encoder and other SSE optimizations
* [http://www.xiph.org/ Xiph.org Foundation]

=== Scientific/R&D ===
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=20132&st=0 Noise Normalization and HF Boost problem solution that ultimately led to the aoTuV tunings (HA Thread)]
* [http://www.free-comp-shop.com/vorbis.pdf Keith Wright's explanation of the MDCT in Vorbis, defined through its basic trig properties (PDF)]
* [http://www.mp3-tech.org/programmer/docs/embedded_vorbis_thesis.pdf (Ogg) Vorbis decoder for an embedded system (Master Thesis in PDF)]
* [http://wiki.xiph.org/index.php/Bounties Xiph.org Vorbis bounties]

[[Category:Lossy]]

Free Lossless Audio Codec

2011-07-27T17:23:32Z

Notat: Replay Gain -> ReplayGain

{{Codec Infobox
| name = FLAC
| logo = [[Image:FLAC logo.gif]]
| type = lossless
| purpose = Popular open source patent free lossless compression scheme.
| maintainer = Josh Coalson, Xiph Community
| recommended_encoder = FLAC encoder
| recommended_text = FLAC v1.2.1
| website = http://flac.sourceforge.net/
}}
'''Free Lossless Audio Codec''' ('''FLAC''') is a codec for lossless audio compression.
Grossly oversimplified, FLAC is similar to [[MP3]], but [[lossless]], meaning that audio is compressed in FLAC without any loss in quality. This is similar to how Zip works, except with FLAC you will get much better compression because it is designed specifically for audio, and you can play back compressed FLAC files in your favorite player (or your car or home stereo, if supported) just like you would an MP3 file.

== General aspects of the format ==
FLAC is freely available and supported on most operating systems, including Windows, UNIX (Linux, *BSD, Solaris, OS X, IRIX), BeOS, OS/2, and Amiga. There are build systems for autotools, MSVC, Watcom C, and Project Builder.

The FLAC project consists of:
* the stream format
* reference encoders and decoders in library form
* flac, a command-line program to encode and decode FLAC files
* metaflac, a command-line metadata editor for FLAC files
* input plugins for various music players

When it's said that FLAC is ''free'', it means more than just that it is available at no cost. It means that the specification of the format is fully open to the public to be used for any purpose, although the FLAC project reserves the right to set the FLAC specification and certify compliance. It also means that neither the FLAC format nor any of the implemented encoding/decoding methods are covered by any known patent. And it means that all the source code is available under open-source licenses. It is the first truly open and free lossless audio format.

Some claim FLAC is the most widely used lossless compression format on UNIX systems, though it seems more likely that shn retains that honor on all OS platforms. FLAC files also can be placed inside an Ogg container using libOggFLAC and libOggFLAC++.

== Features ==
* '''Lossless:''' The encoding of audio (PCM) data incurs no loss of information, and the decoded audio is bit-for-bit identical to what went into the encoder. Each frame contains a 16-bit CRC of the frame data for detecting transmission errors. The integrity of the audio data is further ensured by storing an MD5 signature of the original unencoded audio data in the file header, which can be compared against later during decoding or testing.
* '''Fast:''' FLAC is asymmetric in favor of decode speed. Decoding requires only integer arithmetic, and is much less compute-intensive than for most perceptual codecs. Real-time decode performance is easily achievable on even modest hardware.
* '''Hardware support:''' Because of FLAC's free reference implementation and low decoding complexity, FLAC is currently the only lossless codec that has any kind of hardware support.
* '''Streamable:''' Each FLAC frame contains enough data to decode that frame. FLAC does not even rely on previous or following frames. FLAC uses sync codes and CRCs (similar to MPEG and other formats), which, along with framing, allow decoders to pick up in the middle of a stream with a minimum of delay.
* '''Seekable:''' FLAC supports fast sample-accurate seeking. Not only is this useful for playback, it makes FLAC files suitable for use in editing applications.
* '''Flexible metadata:''' New metadata blocks can be defined and implemented in future versions of FLAC without breaking older streams or decoders. Currently there are metadata types for tags, cue sheets, and seek tables. Applications can write their own APPLICATION metadata once they register an ID.
* '''Suitable for archiving:''' FLAC is an open format, and there is no generation loss if you need to convert your data to another format in the future. In addition to the frame CRCs and MD5 signature, flac has a verify option that decodes the encoded stream in parallel with the encoding process and compares the result to the original, aborting with an error if there is a mismatch.
* '''Convenient CD archiving:''' FLAC has a ''cue sheet'' metadata block for storing a CD table of contents and all track and index points. For instance, you can rip a CD to a single file, then import the CD's extracted cue sheet while encoding to yield a single file representation of the entire CD. If your original CD is damaged, the cue sheet can be exported later in order to burn an exact copy.
* '''Error resistant:''' Because of FLAC's framing, stream errors limit the damage to the frame in which the error occurred, typically a small fraction of a second worth of data. Contrast this with some other lossless codecs, in which a single error destroys the remainder of the stream.
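The integrity checks described in this list can be illustrated with a small Python sketch. It is not a FLAC implementation: the fixed-size byte framing, the <code>FRAME_SIZE</code> constant, and all function names are assumptions made for illustration. Only the general idea (a 16-bit CRC per frame, using the polynomial x^16 + x^15 + x^2 + 1 with an initial value of 0, plus an MD5 signature of the unencoded audio) is taken from the format.

```python
import hashlib

FRAME_SIZE = 4096  # hypothetical frame length in bytes; real FLAC frames are sized in samples


def crc16(data: bytes) -> int:
    """CRC-16 with polynomial 0x8005 and initial value 0, processed MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8005) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(pcm: bytes):
    """Split PCM into toy frames; return ([(frame, crc), ...], md5 of the whole signal)."""
    frames = [pcm[i:i + FRAME_SIZE] for i in range(0, len(pcm), FRAME_SIZE)]
    return [(frame, crc16(frame)) for frame in frames], hashlib.md5(pcm).hexdigest()


def damaged_frames(frames):
    """Return the indices of frames whose stored CRC no longer matches their data."""
    return [i for i, (frame, crc) in enumerate(frames) if crc16(frame) != crc]


def verify(frames, md5_hex: str) -> bool:
    """True when the reassembled audio matches the stored MD5 signature."""
    audio = b"".join(frame for frame, _ in frames)
    return hashlib.md5(audio).hexdigest() == md5_hex


if __name__ == "__main__":
    pcm = bytes(range(256)) * 64            # 16 KiB stand-in for PCM audio
    frames, signature = protect(pcm)

    corrupted = bytearray(frames[2][0])
    corrupted[10] ^= 0x01                   # flip a single bit in the third frame
    frames[2] = (bytes(corrupted), frames[2][1])

    print("damaged frames:", damaged_frames(frames))
    print("MD5 matches:", verify(frames, signature))
```

In this toy model the per-frame CRC pinpoints the damaged frame while the other frames remain usable, and the MD5 comparison confirms whether the audio as a whole is still bit-identical to the original.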

== Pros ==
* Portable to many systems
* Open source and freely licensed
* Hardware support (PhatBox, Kenwood MusicKeg, Rio Karma, etc. See below)
* Streaming support
* Extremely fast decoding
* Supports multichannel and high resolution streams
* Supports [[ReplayGain]]
* Supports cue-sheet (with some limitations)
* Gaining wide use as successor to [[Shorten]]

== Cons ==
* Compresses less efficiently than other popular modern compressors ([[Monkey's Audio]], [[OptimFROG]])
* Higher compression modes are slow and give little gain over the default setting

== Hardware and software that support FLAC ==
For a more comprehensive list see the [http://flac.sourceforge.net/links.html FLAC links page].

=== Hardware ===
==== Home stereo ====
* [http://www.request.com/us/ AudioReQuest] music servers
* [http://www.avegasystems.com/ Avega Systems]' wireless [http://www.avegasystems.com/_documents/Oyster_Specifications.pdf Oyster] loudspeakers
* Digital Techniques' "iStereo" [http://www.digitaltechniques.com/M300A_Overview.html M300A Digital Music Player]
* Escient's [http://www.escient.com/ FireBall servers (E2-40/160/300, DVDM-300)]
* [http://www.hermstedt.com/english/hifidelio/hifidelio.html Hifidelio]
* [http://www.imuse.us/ iMuse] audio/video media servers
* Meda Systems' [http://www.medainc.com/ Bravo servers]
* The [http://www.cesweb.org/attendees/show_floor/product_locator/product_details.asp?prodid=5181 MS300 Music Server] by McIntosh Laboratory
* Olive's [http://www.olive.us/ Symphony] wireless digital music center
* [http://www.phatnoise.com/products/homeplayer/index.php PhatNoise Home Digital Media Player]
* [http://www.numark.com/ Numark]'s DJ equipment (HDX and CDX turntables, HDMIX mixer)
* [http://www.mock.com/receiver/ Rio Receiver] and Dell Digital Audio Receiver
* [http://www.rokulabs.com/products/photobridge/features.php Roku PhotoBridge HD] (with [http://homepage.ntlworld.ie/p.mc.quillan/FLAC_V0.7.zip plugin])
* [http://www.skipjam.com/ SkipJam]'s networked audio/video devices
* [http://www.sonos.com/ Sonos Digital Music System]
* Slim Devices' [http://www.slimdevices.com/pi_transporter.html Transporter] and [http://www.slimdevices.com/pi_squeezebox.html Squeezebox] networked audio players
* [http://www.z500series.com/ Zensonic Z500 Networked DVD Media Player]
* Ziova's [http://www.ziova.com/cs510.php CS510] and [http://www.ziova.com/cs505.php CS505] network media players

==== Car stereo ====
* [http://www.phatnoise.com/products/digitalmediaplayers/kenwood_music_keg.php Kenwood Music Keg]
* [http://www.phatnoise.com/products/digitalmediaplayers/index.php PhatBox]

==== Portable ====
* [[Apple iPod]] with [[Rockbox]] firmware
* Bluedot's [http://www.digitalworldtokyo.com/2006/07/bluedot_pmp_runs_linux_loves.php BMP-1430]
* Green Apple's portable media player: [http://www.apod.com.cn/show_products.asp?photoID=437 AP3000]
* [[iAudio M3]], M5 and X5
* [[iRiver]] iHP-120/iHP-140 with [[Rockbox]] firmware
* [[Iwod G10]]
* [[Rio Karma]]
* TrekStor's [http://www.trekstor.de/en/products/detail_mp3.php?pid=66 Vibez]

=== Software ===
==== Players ====
* [http://koti.welho.com/hylinen/apollo/ Apollo]
* [http://cogosx.sourceforge.net Cog] — for Mac OS X
* [[foobar2000]]
* [[JRiver Media Center]]
* [http://fondriest.frederic.free.fr/realisations/lamip/ LAMIP]
* [[MediaMonkey]]
* [http://www.mplayerhq.hu/ MPlayer]
* [http://www.mythtv.org/ MythTV]
* [http://www.quinnware.com/ QCD] ([http://www.quinnware.com/list_plugins.php?type=input plugin])
* [http://www.videolan.org/ VLC]
* [http://www.vuplayer.com/vuplayer.htm VUPlayer]
* [[Winamp]]
* [[Windows Media Player]] and other directshow-based players (MPC, TCMP, RadLight) (with [http://www.illiminable.com/ogg/ Illiminable's directshow filters] or [http://corecodec.org/projects/coreflac CoreFLAC])
* [http://xine.sourceforge.net/ Xine]
* [[XMMS]]
* [http://www.un4seen.com/ XMplay]

==== Frontends (Windows) ====
* [http://www.uninformative.com/flacattack/ Flacattack]
* Custom [http://members.home.nl/w.speek/flac.htm Windows Frontend] (by Speek)
* [http://www.unifront.boereck.de/ UniversalFront] by Böreck
* [http://home.vxu.se/mdati00/frontah/ Frontah] by Madah
* [http://www.webearce.com.ar/ MAREO] by Kwanbis

==== Frontends (Mac) ====
* [http://www.danrules.com/macflac/ MacFLAC]
* [http://www.sbooth.org/Max/ Max]
* [http://members.rogers.com/beamsplitter/ RipBeak]
* [http://www.versiontracker.com/dyn/moreinfo/macosx/21952 xACT]

==== Converters ====
* [http://www.dbpoweramp.com/ dBpowerAMP] Music Converter / Audio Player / CD Writer
* [http://www.mediamonkey.com/ MediaMonkey] Music Manager / Audio Player / CD Writer
* [http://www.germanixsoft.de/ GX:Transcoder] Music converter

==== Editors ====
* [[Adobe Audition]]
* [http://www.goldwave.com/ GoldWave]

==== CD writers/rippers ====
* [http://www.ahead.de Ahead Nero Burning Rom]
* [http://arson.sourceforge.net/ Arson]
* [http://www.burnatonce.com/ burnatonce]
* [http://www.burrrn.net Burrrn] Audio CD burner
* [[Exact Audio Copy]] CD Ripper
* [http://cdexos.sourceforge.net CDex] CD ripper
* [http://www.cdwave.com/ CD Wave]
* [http://www.mediamonkey.com/ MediaMonkey] - [[MediaMonkey]] CD ripper/writer
* [http://cdburnerxp.se/ CDburner XP] CD writer

==== Taggers ====
* [http://www.saunalahti.fi/cse/files/Tag.zip Case's Tag] command line tagger
* [http://users.otenet.gr/~jtcliper/tgf/ The GodFather] Tagger / Music manager
* [http://www.mp3tag.de/en/index.html Mp3tag] Universal Tag Editor
* [http://sbooth.org/Tag/ Tag] — for Mac OS X 10.4 (Tiger)
* [http://flac.sourceforge.net/documentation.html#metaflac metaflac] - for general metadata (including Vorbis comments) maintenance
* [http://www.mediamonkey.com MediaMonkey] - [[MediaMonkey]] Tagger / Music manager (Including Multiple and Linked Album Art support)

==== Other tools ====
* [http://www.burrrn.net/mrq/ Mr. QuestionMan]
* [http://www.bitattack.ro/ai/ Audio Identifier]
* [http://www.bunkus.org/videotools/mkvtoolnix/ mkvtoolnix] - tool to multiplex FLAC streams inside the Matroska container
* [http://flac.sourceforge.net/documentation.html#metaflac metaflac] - for general metadata (including Vorbis comments) maintenance, also to calculate [[ReplayGain]] values for FLAC files lacking such

...and many more; see the [http://flac.sourceforge.net/links.html#software FLAC software section] and [http://flac.sourceforge.net/download.html#extras download section] for a more comprehensive list.
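As noted above, metaflac can calculate [[ReplayGain]] values and store them as Vorbis comments. The following Python sketch shows how a player might turn such tags into a playback scale factor. The function names and the example tag values are hypothetical; the tag names <code>REPLAYGAIN_TRACK_GAIN</code> and <code>REPLAYGAIN_TRACK_PEAK</code> are the ones commonly written by ReplayGain scanners, and the clipping rule shown here is one simple approach, not the only one.

```python
def parse_gain_db(value: str) -> float:
    """Parse a gain tag value such as '-6.54 dB' into a float (in decibels)."""
    text = value.strip()
    if text.lower().endswith("db"):
        text = text[:-2]
    return float(text)


def playback_scale(tags: dict, prevent_clipping: bool = True) -> float:
    """Return the linear factor by which decoded samples would be multiplied."""
    gain_db = parse_gain_db(tags["REPLAYGAIN_TRACK_GAIN"])
    scale = 10 ** (gain_db / 20)            # decibels to linear amplitude
    if prevent_clipping and "REPLAYGAIN_TRACK_PEAK" in tags:
        peak = float(tags["REPLAYGAIN_TRACK_PEAK"])
        # Reduce the gain if the loudest sample would exceed full scale (1.0).
        if peak > 0 and scale * peak > 1.0:
            scale = 1.0 / peak
    return scale


if __name__ == "__main__":
    # Hypothetical tag values for illustration only.
    tags = {"REPLAYGAIN_TRACK_GAIN": "-6.00 dB", "REPLAYGAIN_TRACK_PEAK": "0.98"}
    print(round(playback_scale(tags), 4))
```

A gain of -6 dB roughly halves the sample amplitude. When the tagged gain is positive, the peak value lets the player limit the boost so that the loudest sample stays within full scale.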

== Frequently asked questions ==
''Question:'' Does the compression level affect decompression speed?

''Short Answer'': No.

''Long Answer'': In truth, the compression level does affect the decompression speed, but the difference between the various compression levels can barely be measured and is too small to be noticed, even on low-end machines.

''Question:'' What is the best compression level for encoding my music?

''Short Answer'': The default setting, 5.

''Long Answer'': Encoding at the default setting will give the best balance between compression and encoding speed. Encoding at 8 can more than quadruple the encoding time, while having an insignificant effect on compression.
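For readers who script their encodes, the following Python sketch assembles a command line for the flac encoder at a chosen compression level. The helper name and the file names are hypothetical; the options used (<code>-0</code> to <code>-8</code> for the level, <code>--verify</code>, and <code>-o</code> for the output name) are standard flac options. The encoder is only invoked if it is installed and the example input file exists.

```python
import os
import shutil
import subprocess


def flac_encode_command(source: str, output: str, level: int = 5, verify: bool = True) -> list:
    """Build an argument list for the flac command-line encoder."""
    if not 0 <= level <= 8:
        raise ValueError("flac compression levels run from 0 to 8")
    command = ["flac", f"-{level}"]
    if verify:
        command.append("--verify")          # decode while encoding and compare with the input
    command += ["-o", output, source]
    return command


if __name__ == "__main__":
    # "track.wav" and "track.flac" are placeholder file names.
    cmd = flac_encode_command("track.wav", "track.flac")
    print(" ".join(cmd))
    if shutil.which("flac") and os.path.exists("track.wav"):
        subprocess.run(cmd, check=True)
```

Leaving <code>level</code> at its default of 5 matches the recommendation above; passing <code>level=8</code> would trade considerably longer encoding time for a marginally smaller file.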

== See also ==
* [[Lossless]]
* [[Lossless comparison]]

== External links ==
* [http://flac.sourceforge.net/ FLAC website]
* [http://flac.sourceforge.net/download.html FLAC download]
* [http://flac.sourceforge.net/format.html Detailed description of the FLAC format]
* [http://flac.sourceforge.net/documentation.html FLAC documentation]
* [http://flac.sourceforge.net/faq.html FLAC FAQ]
* [http://people.ucsc.edu/~rswilson/flactest Omion's FLAC "File Size vs. Decoding Speed" test] - a very thorough test on [[Free Lossless Audio Codec#Frequently Asked Questions|the influence of the chosen encoding level on the decoding speed of FLAC]]; the only one so far to have covered FLAC's --super-secret-totally-impractical-compression-level to this extent as well.

[[Category:Lossless]]
[[Category:Encoder/Decoder]]

Winamp

2011-07-27T17:23:10Z

Notat: Replay Gain -> ReplayGain

{{Software Infobox|
|name = Winamp
|logo =
|screenshot = [[Image:Winamp-screenshot.png|250px]]
|caption = Modern Winamp skin
|maintainer = Team Nullsoft
|stable_release = [http://www.winamp.com/player/ 5.54]
|preview_release = None
|operating_system = Windows
|use = Media Player
|license = freeware
|website = [http://www.winamp.com/ www.winamp.com]
}}

{{stub}}

'''Winamp''' is a music player for Windows developed by Nullsoft. It is available as a feature-reduced freeware edition and as a commercial "Pro" version.

The main advantage of Winamp is its ease of use. In addition, it is skinnable and extensible using plugins. As of version 5.2, it fully supports multiple users (i.e. each user of your computer may have their own skin, playlist, and other settings).

You can download Winamp at [http://www.winamp.com/ winamp.com]. While you are there, you might also check their [http://www.winamp.com/skins/ skin library] and [http://www.winamp.com/plugins/ plugin library]. And also check the very useful [http://forums.winamp.com/ community (forums)] where new plugins are announced and publicly tested, and very minor updates (i.e. 0.001 version increment) are posted.

== Features ==
=== Free ===
* Crippled CD burning (~8×)
* Crippled CD ripping (~8×)
* [[AAC]], [[MP4]], [[FLAC]] ([http://win32builds.sourceforge.net/flake/index.html Flake] encoder), [[WAV]], [[WMA]] encoding
* [[Transcoding]] of the different audio formats
* [[ReplayGain]] support
* ReplayGain scanner to apply Album Gain or Track Gain to the tags
* Media library
* Full Unicode support

=== Pro ===
The Pro version, which can be bought online, comes with some additional features compared to the free version.
* Burn CDs at full speed
* Rip CDs at full speed
* Additional [[MP3]] encoding

== Supported formats ==
=== Playback ===
Directly supported formats (i.e. provided with installer) include: [[MP1]], [[MP2]], [[MP3]], [[WAV]], [[AAC]], [[WMA]], (Ogg) [[Vorbis]], [[MIDI]], [[FLAC]], and [[Module]]

Plugins also exist for many other formats, such as [[TTA]], [[WavPack]], [[Musepack]], [[TAK]]. Go to Winamp's [http://www.winamp.com/plugins/ plugin library] to download.

== Supported languages ==
* English

== Supported platforms ==
* Win 2000
* Win XP
* Win Vista [http://www.winamp.com/player/faq#35 with issues]

== Recommended plugins ==
* '''Playlist Separator''' – provides a customizable separator line to delimit albums in a long playlist.
* '''MojoMaster''' – sexy dancer visualization.

== External links ==
* [http://www.winamp.com Winamp: Homepage]
* [http://winamp.com/player/ Winamp: Download]
* [http://winamp.com/about/story.php More information]

[[Category:Media Players]]

Foobar2000:Foobar2000

2011-07-27T17:22:47Z

Notat: Replay Gain -> ReplayGain

{{title|foobar2000}}

{{Software Infobox|
|name = foobar2000
|logo = [[Image:foobar2000 Logo.png|48px]]
|screenshot = [[Image:Foobar2000-1.0-default-ui.png|250px]]
|caption = Screenshot of foobar2000 v1.0 using the default user interface
|maintainer = Peter Pawlowski
|stable_release = 1.1.7
|preview_release = 1.1.8 beta 2
|operating_system = Windows
|use = Media Player
|license = Proprietary, BSD
|website = [http://www.foobar2000.org/ www.foobar2000.org]
}}

'''foobar2000''' is an advanced freeware audio player for the Windows platform. Some of the basic features include full unicode support, ReplayGain support and native support for several popular audio formats.

'''The latest stable version is:''' [http://www.foobar2000.org/download v1.1.7]

'''The latest beta version is:''' [http://www.foobar2000.org/download v1.1.8 beta 2]

== Platforms ==

foobar2000 has been written specifically for the Windows platform and there are no plans to port it to any others. However, while not officially supported, it is known to run on [http://www.hydrogenaudio.org/forums/index.php?showtopic=54933 Linux] and [http://www.hydrogenaudio.org/forums/index.php?showtopic=77261 Mac OS X] through Wine and WineBottler, respectively.

== Features ==
* Powerful open component architecture allowing third-party developers to extend functionality of the player, including the ability to fully replace the user interface.
* Full Unicode support: File names, user interface, tagging, etc.
* [[ReplayGain]] support: Both playback and writing ReplayGain information to file tags.
* [[Gapless playback]].
* Advanced [[tagging]] capabilities - through built-in [[foobar2000:Properties|Properties dialog]] and various optional tagging-related components.
* Built-in [[foobar2000:Preferences:Media Library|Media Library]] functionality.
** Intuitive [[foobar2000:Query syntax|query syntax]] for searching the Media Library.
** [[foobar2000:Autoplaylist|Autoplaylist]] support: Generate dynamically updating playlists based on queries.
* [[foobar2000:Preferences:General:Keyboard Shortcuts|Customizable keyboard shortcuts]].
* Support for transcoding all supported audio formats using the [[Foobar2000:Converter|Converter component]] (requires external command-line encoder executables for different output formats).
* [[Secure_ripping|Secure]] [[foobar2000:Ripping CDs|CD ripping]].
* Streaming support.
* Efficient handling of large playlists.
* [[foobar2000:Components/Default_user_interface_%28foo_ui_std%29|User interface]] with simple configuration to create even complex layouts quickly and easily.
* Highly customizable display of track information using [[foobar2000:Titleformat_Introduction|title formatting scripts]].

== Supported Audio Formats ==
Native Support ("out-of-the-box"):
* [[MP1]], [[MP2]], [[MP3]], [[MP4]], [[Musepack]], [[AAC]], [[Ogg Vorbis]], [[FLAC]] / Ogg FLAC, [[Speex]], [[WavPack]], [[WAV]], [[AIFF]], [[AU|AU/SND]], [[CDDA]], [[WMA]], [[Matroska]].

Supported through optional components:
* [[TTA]], [[Monkey's Audio]], [[ALAC]], [[MOD]], [[SPC]], [[Shorten]], [[OptimFROG]], [[AC3]], [[DTS]], [[PSF]], [[NSF]], [[XID]], [[XA]], [[MMS]], [[RSTP]], [[TAK]], [[AMR]], etc.

In addition, foobar2000 can play music directly from compressed ZIP archives without requiring the user to extract the files first. More archive formats are supported through additional components: [http://kode54.foobar2000.org/ JMA], [http://kode54.foobar2000.org/ LHA].

== Using foobar2000 ==
* [http://www.foobar2000.org/FAQ foobar2000 FAQ]
* [[foobar2000:components|foobar2000 Components]]
* [[foobar2000:Directories|foobar2000 Directories]]
* [[foobar2000:Encouraged Tag Standards|foobar2000 Encouraged Tag Standards]]
* [[foobar2000:FAQ|foobar2000 FAQ (unofficial)]]
* [http://wiki.hydrogenaudio.org/index.php?title=Category:Foobar2000_Guides foobar2000 Guides (category)]
* [http://wiki.hydrogenaudio.org/index.php?title=Category:Foobar2000_Preferences foobar2000 Preferences (category)]
=== Technical Information ===
* [[foobar2000:ID3 Tag Mapping|ID3 Tag Mapping]]

=== Specific Guides ===

'''Preferences'''
* [[foobar2000:Preferences|Preferences Dialog]]

'''Metadata'''
* [[foobar2000:Properties|Tag editing: the Properties dialog]]
* [[foobar2000:Query syntax|Query Syntax]]: details of Syntax for querying metadata.
* [[foobar2000:Metadata Compatibility|Metadata Compatibility]]: information about compatibility with metadata written by other applications

'''Title formatting'''
* [[foobar2000:Title Formatting Introduction|Introduction to titleformat scripts]]
* [[foobar2000:Title Formatting Reference|Titleformat Reference]]: reference guide to all fields and functions
* [[foobar2000:Titleformat Examples|Titleformat Examples]]: user-submitted code for various purposes; submit your own!

'''Others'''
* [[foobar2000:File operations|File operations dialog]]: move, copy, rename, and delete files from within foobar2000
* [[foobar2000:Commandline Guide|Commandline Usage]]

=== External Guides ===
* [http://foobar2000.audiohq.de/ Frank Bicking's German-language guide].
* [http://foobar2000.xrea.jp/ fb2k Wiki Page] for Japanese users.
* [http://winamp2foobar.blogspot.com Winamp To Foobar Guide] with information relevant for general users also.

== Important Links ==
=== Official Site ===
* [http://www.foobar2000.org foobar2000.org: Homepage]
* [http://www.foobar2000.org/download foobar2000.org: Download]
* [http://www.foobar2000.org/components foobar2000.org: Components]

=== Community ===
* [http://forums.foobar2000.org/ Official foobar2000 forum]
* [http://foobar-users.de/ German Support Forum]
* [http://foobar2000.pl/ Polish Support Forum]
* [http://www.fforum.ru/index.php?showforum=59 Russian-language forum]
* [http://www.foobar2000.ru/forum/ Another Russian-language forum]

=== Appearance ===
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=61333 Default UI .fth Thread] Fast way to clone another's DUI Configuration.
* [[foobar2000:Preferences:Columns UI/Appearance|Columns UI appearance customization guides]]
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=31027 Columns UI configurations]

[[Category:foobar2000]]
[[Category:Media Players|foobar2000]]
[[Category:CD Rippers]]
[[Category:Software]]
[[Category:Tag editors]]

Muine

2011-07-27T17:22:15Z

Notat:

'''Muine''' is an open source music player for Linux/GNOME. It features automatic album-cover fetching, multiple artist and performer tags per song, [[ReplayGain]] support and more.

== Supported formats ==
* [[MP3]], (Ogg) [[Vorbis]], [[FLAC]]

== Supported languages ==
* English, French, German, Spanish, Portuguese, Japanese, Korean, Norwegian, Swedish, Polish and more.

== Supported platforms ==
* Linux/BSD
* [http://muine.gooeylinux.org/download.shtml Additional requirements]

== External links ==
* [http://muine.gooeylinux.org Homepage]

[[Category:Media Players]]

Topic Index

2011-07-27T17:22:02Z

Notat:

* For a more structured 'table of contents', use the '''[[Main Page#Categories|Categories List]]'''.
* Please see [http://www.hydrogenaudio.org/forums/index.php?showtopic=12979&st=25&p=247441&#entry247441 this thread] for a discussion of the future structure of this wiki. If you have thoughts, comments, suggestions, etc., please join in this discussion. In the meantime, please feel free to fill in gaps in the information below.
* See also [http://www.hydrogenaudio.org/forums/index.php?showtopic=28658 the style related discussion thread] in the forums.

= General Information =
== General Guides ==
* [[Create a long-term archive]]
* [[Secure ripping|Secure Ripping]]
* [[Enabling DMA]]
* [[Choosing_the_best_codec.|Choosing the best codec]]
* [[Lossless_comparison|Lossless Comparison]]

== EAC Guides ==
* Configuring [[EAC Drive Configuration|EAC and CD-ROM Drives]]
* Configuring [[EAC and Lame]]
* Configuring [[EAC and AAC | EAC and Nero AAC]]
* Configuring [[EAC and Ogg Vorbis | EAC and Vorbis]]
* Configuring [[EAC and Musepack]]
* Configuring [[EAC and WavPack]]
* Configuring [[EAC and FLAC]]
* Configuring [[EAC and Monkey's Audio]]
* Configuring [[EAC and Cue Sheets]]
* Configuring EAC and [[REACT]]

== CDex Guides ==
* Configuring [[CDex Drive Configuration|CDex and CD-ROM Drives]]
* Configuring [[CDex and FLAC]]

== AAC Guides ==
* [[AAC_FAQ|AAC FAQ]] frequently asked questions regarding AAC, the latest industry standard.
* [[AAC encoders|AAC Encoders]] known AAC encoder/decoder implementations and how to configure them (Apple iTunes, Nero AAC, etc.)
* [[Linux and Nero AAC]] a short guide for configuring Nero AAC encoder to run under Linux.

== Vorbis Guides ==
* [[Recommended_Ogg_Vorbis|Recommended encoders and settings for Vorbis]].
* [[Lancer|Ogg Vorbis Acceleration Project]] information regarding optimized Vorbis binaries.
* [[OggDropXPd|OggDropXPd]] guide for encoding with John 33's popular drag-n-drop frontend.
* [[Compiling_aoTuV|Compiling AoTuV]] compiling the AoTuV binaries under Linux.

= Audio Codecs =
== [[Lossy]] ==
* [[Advanced Audio Coding]] (AAC)
* [[AC3]]
* [[ATRAC3]]
* [[DTS]]
* [[MP2]]
* [[MP3]]
* [[Musepack]] (MPC, MP+)
* (Ogg) [[Vorbis]]
* [[QDesign]]
* [[VQF]]
* [[Windows Media Audio]] (WMA)

== [[Lossless]] ==
* [[ALAC|Apple Lossless]]
* [[ALS|Audio Lossless Coding]]
* [[DTS-HD|DTS Master Audio]]
* [[Free Lossless Audio Codec]] (FLAC)
* [[Lossless Audio]] (LA)
* [[Lossless Predictive Audio Compression]] (LPAC)
* [[Monkey's Audio]]
* [[OptimFROG]]
* [[Lossless comparison#RealAudio Lossless|RealAudio Lossless]]
* [[Shorten]]
* [[TTA|True Audio]]
* [[WavPack]]
* [[Windows Media Audio|WMA Lossless]]

= [[Metadata]] (Tags) =
* [[APEv1]]
* [[APEv2]]
* [[ID3v1]]
* [[ID3v1.1]]
* [[ID3v2]]
* [[Vorbis Comment]]

= Media Extractors =
== CD Extractors ==
* [[Audiograbber]] (Win32)
* [[CDex]] (Win32)
* [[cdparanoia]] (Posix)
* [[dBpowerAMP with AccurateRip]] (Win32)
* [[Exact_Audio_Copy|Exact Audio Copy]] (Win32)
* [[Grip]] (Posix)
* [[iTunes]] (Win32/Mac OS/X)
* [[MediaMonkey]] (Win32)
* [[Max]] (Mac OS/X)
* [[XLD]] (Mac OS/X)
* [[PlexTools]] (Win32)
* [[Rubyripper]] (Posix/Mac OS/X)

== DVD Extractors ==
* [http://pessoal.onda.com.br/rjamorim/SetupDVDDecrypter_3.5.4.0.exe DVD Decrypter] (Win32)
* DVD-A / CPPM Decrypter (Win32/Posix)

= Media Players =
== Windows ==
* [[Apollo]]
* [[dBpowerAMP]]
* [[Foobar2000:Foobar2000|foobar2000]]
* [[iTunes]]
* [[MediaMonkey]]
* [[musikCube]]
* [[Quintessential Player]]
* [[VUplayer]]
* [[Winamp]]
* [[Windows Media Player]]
* [[wxMusik]]
* [[XMPlay]]
* [[WMPTSE]] (with WMP)

== Linux/BSD ==
* [[Amarok]]
* [[BMP]]
* [[JuK]]
* [[LAMIP]]
* [[Muine]]
* [[Music Player Daemon (MPD)]]
* [[Quod Libet]]
* [[Rhythmbox]]
* [[wxMusik]]
* [[XMMS]]

== Mac OS X (Non-BSD Specific) ==
* [[iTunes]]
* [[skiTunes]]
* [[Whamb]]

== Other ==
* [[CL-Amp]] (BeOS)

= Audio Editors =
== Windows ==
* [[Adobe Audition]] (previously known as ''Cool Edit'')
* [[Audacity]]
* [[Goldwave]]
* [http://www.sonymediasoftware.com/products/soundforgefamily.asp Sony Sound Forge] (Previously released by Sonic Foundry)

== Linux/BSD ==
* [[Ardour]]
* [[Audacity]]
* [[ReZound]]

== Mac OS X (Non-BSD Specific) ==
* [[Ardour]]
* [[Audacity]]

== Other ==
* [http://timidity.sourceforge.net/ Timidity++] (MIDI to PCM (WAV) converter). Timidity++ synthesizes MIDI files (sequences) in real time using Gravis UltraSound Soundfont patches (loosely based upon wavetable synthesis) and renders them to common digital audio file formats such as WAV, AU, AIFF, Ogg Vorbis, and FLAC. It is useful for those who want to bypass the FM synthesizer on their sound card and hear a MIDI sequence as it was intended to be heard.

= Testing Software =
== Subjective Perceptual ==
* [[ABC/HR]]
* [[PCABX]]

== Objective ==
''Note: Might be good to put something here about the problems of quality comparisons using graphs, frequency sweeps, etc.''

* [[EAQUAL]]
* [[Rightmark_Audio_Analyzer|Rightmark Audio Analyzer]]

= Audio Hardware =
== PC Audio ==
* [[Terratec EWX 24/96]]
* [[M-Audio Audiophile 24/96]]
* [[M-Audio Revolution 5.1]]
* [[M-Audio Revolution 7.1]]
* [[Chaintech AV-710]]
* [[E-MU 0404 24/192]]
* [[ASUS Xonar D1]]
* [[ASUS Xonar D2/PM]]

== Notebook Audio ==
* [[Echo Indigo IO 24/96]]

== Firewire ==
* [[E-MU 1212M 24/192]]
* [[M-Audio Firewire 410]]

== HiFi ==
* [[M-Audio Fast Track USB]]
* [[Slim Devices Squeezebox]]
* [[Slim Devices Transporter]]
* [[Hermstedt AG Hifidelio]]
* [[Olive Musica]]

== MIDI Interfaces ==
* M-Audio MIDISport Uno 1x1
* M-Audio MIDISport 2x2
* MOTU 5x5 Micro Lite
* MOTU Fastlane USB

== Digital Audio Players ==
=== Portable Flash ===
''(These players make use of an internal flash drive.)''
* [[Apple iPod]] Nano
* [[Apple iPod]] Shuffle
* Creative MuVo
* iRiver iFP Series
* MPIO lFP Series
* [[Rio Carbon]]

=== Portable HD ===
''(These players make use of an internal hard drive.)''
* [[Apple iPod]] with ''([http://www.rockbox.org/twiki/bin/view/Main/TargetStatus#iriver_H110_H115_H120_H140 Rockbox firmware])''
* [[Archos Jukebox with Rockbox Software]]
* [[Cowon iAudio]] with ''([http://www.rockbox.org/twiki/bin/view/Main/TargetStatus#iAudio_X5 Rockbox firmware])''
* [[iRiver H-Series]] with ''([http://www.rockbox.org/twiki/bin/view/Main/TargetStatus#iriver_H110_H115_H120_H140 Rockbox firmware])''
* [[MPIO H-Series]]
* [[Neuros]]
* [[Rio Karma]]
* [[Sandisk]] with ''([http://www.rockbox.org/twiki/bin/view/Main/TargetStatus#iAudio_X5 Rockbox firmware])''

=== Portable CD ===

=== Car Players ===
''(Car stereos that can read MP3, Vorbis, WMA, etc.).''
* [[Aiwa CDC-MP3]]
* [[Yakumo Ultrasound]]

===DVD Players===
* [[Neuston's Maestro DVX-1201]]

=== Firmware ===
* [[Rockbox]]

= Audio Theory =
== Analog Audio ==
* [[Tube Amplifiers]]
* [[Vinyl_Playback_and_Recording|Vinyl Audio]]

== Digital Audio ==
* [[Solid State Amplifiers]]
* [[ReplayGain]]

== Testing Methodology ==
* [[ABX]]
* [[EAQUAL]]

= Audio Development =
''note: Let's start with basic development tools (compilers, engineering tools, dev. libraries) until we think of more tools to add. I am also adding external links to books, tutorials, etc under resources.''

== Tools ==
* [http://www.mathworks.com/products/matlab/ MATLAB 7.0] commercial software for algorithmic design, development, engineering, and scientific computing. (multi-platform support)
* [http://www.octave.org/ GNU Octave] open-source alternative software (GPL) to MATLAB for numerical computations, engineering, and scientific computing. (multi-platform support)
* [http://www.fftw.org/ FFTW] a C subroutine library for computing the Discrete Fourier transform (DFT) in one or more dimensions on real and complex inputs.
* [http://gcc.gnu.org/ GCC] the GNU compiler collection for C, C++, Objective-C, Fortran, Java, and Ada.
* [http://www.gnu.org/software/emacs/emacs.html GNU Emacs] an extensible, customizable, self-documenting real-time display editor. Great for writing all types of source code especially on Unix. (multi-platform support)
* [http://www.bloodshed.net/devcpp.html DevCPP] free front-end IDE and compiler for the C and C++ languages. Delphi and C source code available. (Win 9x, NT, 2000, and XP)

== Resources ==
* [http://www.hydrogenaudio.org/forums/index.php?showforum=30 Scientific/R&D Forums] for Psychoacoustic, DSP, Electrical Engineering, theory, and coding related questions. (most questions are generally answered)
* [http://www.aes.org/ AES] The Audio Engineering Society website. Home of year-round world AES conferences.
* [http://www.dspguru.com/info/books/favor.htm DSP Tutorials] this site provides another good introduction to the area of DSP.
* [http://www.musicdsp.org/archive.php?classid=2 Music-DSP] source-code archive for analysis, filters, effects and synthesis. (C, C++, and Java code)
* [http://www.itakura.nuee.nagoya-u.ac.jp/HRTF/ HRTF] A database of measurements and research papers on Head Related Transfer Functions for 3D-Audio. (PDF, Audio)
* [http://www.midi.org/about-midi/specshome.shtml MIDI Specifications] MIDI 1.0, the new MusicXMF specification, and SP-MIDI for third generation 3GPP mobile devices (PDF)
* [http://www.gamedev.net/reference/articles/article2008.asp OpenAL] a beginner's tutorial on writing code using OpenAL for audio programming in computer games and other applications. (C, C++)
* [http://www.alsa-project.org/ ALSA Project] (Advanced Linux Sound Architecture) bringing audio and MIDI capabilities to Linux.
* [http://www.engmath.dal.ca/courses/engm6610/notes/notes.html A Really friendly guide to Wavelets] A good introduction to wavelets aimed at engineers; it requires a fair amount of background knowledge.

== Books/Research ==
* [http://www.amazon.com/gp/product/3540231595/qid=1135380559/sr=1-3/ref=sr_1_3/102-1730075-7300931?s=books&v=glance&n=283155 Psychoacoustics - Facts and Models] authors Zwicker and Fastl, revised 2005 third edition. The book for comprehensive psychoacoustics models and figures.
* [http://www.eas.asu.edu/~spanias/papers/paper-audio-tedspanias-00.pdf Perceptual Audio Coding] authors T. Painter and A. Spanias. A comprehensive paper on perceptual audio coding (PDF)
* [http://www.amazon.com/gp/product/0780334493/103-2094923-9567001?v=glance&n=283155&%5Fencoding=UTF8&me=ATVPDKIKX0DER&no=283155&st=books Speech Communications Human and Machine] this book covers speech coding, including analysis, recognition, and perception. It is a very good introduction for beginners.
* [http://www.dspguide.com/ Scientist and Engineer's Guide to DSP] author Steve Smith, a great guide for beginners new to the subject of DSP (free online text)(PDF)
* [http://www.amazon.com/exec/obidos/tg/detail/-/0792391810/ref=ase_theinternetdatac/103-9882844-5344648?v=glance&s=books Vector Quantization] authors Gersho and Gray. Good read for understanding how VQ and arithmetic coding work.

= Audio Resources =
== Websites ==
''Note: Let's include a small description to the side for now, so that we have something to work with when this section becomes large enough for its own page''

* http://www.audiocoding.com (Page with a wiki on technical audio topics, homepage of FAAC and FAAD2, also has an AAC forum.)
* http://www.ff123.net (Lots of general information on various MP3 implementations, test samples, testing methodology information, homepage of ABC/HR)
* http://www.head-fi.org (general information/board about headphones and portable audio players)
* http://www.rarewares.org (Downloads for many audio and media tools)
* http://www.rjamorim.com/rrw/ (Download old versions of foobar2000 and other audio and media tools)
* http://www.rockbox.org/ (Open-source jukebox firmware for numerous DAPs and architectures, GNU GPL license)
* http://www.dapreview.net/ (Reviews of some of the most popular digital audio players out there)
* http://www.anythingbutipod.com/ (Thorough reviews of some of the most popular digital audio players out there)

== Articles/Debates ==
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=31759&st=0 DVD-A vs. SACD debate]
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=38041&st=0 Subjective vs. Objective testing]
* [http://www.ambisonic.net/pdf/ambidvd2001.pdf 5.1 surround vs. Ambisonics comparison]

== Listening Tests ==
* [http://www.rjamorim.com/test/ Roberto's listening tests]
* [[Listening_Tests|Inventory of several listening tests, mainly on HA.org]]

= Other Topics =
== Video ==
* [[MPEG-4 Visual]]
* [[Real Video]]
* [[Theora]]
* [[Tarkin]]
* [[Snow]]
* [[VP6]]
* [[Windows Media Video]]

== [[Container format]]s ==
* [[ASF]]
* [[AVI]]
* [[Matroska]]
* [[MOV]]
* [[MP4]]
* [[Ogg]]

= Glossary =
* [[Glossary_Of_Audio_Terms|Glossary of Audio Terms]]

= Introduction & User Guides =
''A starting place for new users to audio, with guides to compression and CD ripping and a glossary of all common terms.''

* [[Glossary Of Audio Terms]]
* [[FAQ]]
* [[Audio format guide]]
* Ripping Guides
** [[EAC]] (Win32)
** [[CDex]] (Win32)
** [[DBpowerAMP with AccurateRip]] (Win32)
** [[Plextools]] (Win32)
** [[Max]] (Mac OS/X)
** [[XLD]] (Mac OS/X)
** [[Rubyripper]] (Posix/Mac OS/X)
* [[Tagging]]
* [[Replay Gain]]



= Audio Codecs =
''Pros/cons, Recommended settings, Useful tools, etc.''

*'''[[:Category:Codecs|The Technical/Codecs Category]]'''


= Container Formats =
''What is a [[container format]]?''

* [[Matroska]]
* [[MP4]]
* [[Ogg]]


= Audio Hardware & CD Ripping =
*''CD Tools, Secure Ripping, Soundcard Quality''
** [[Secure ripping]]
** Ripping Guide
*** [[EAC]]
*** [[CDex]]
*** [[DBpowerAMP with AccurateRip]]
*** [[Plextools]]
** [[CD copy protection]]
** [[CD Hardware]]
* Vinyl records and turntables
** [[Introduction to Vinyl|Introduction]]
** [[Advantages of Vinyl]]
** [[Disadvantages of Vinyl]]
** [[Vinyl Myths]]
** [[Purchasing Vinyl LPs and Components|Purchasing]]
** Record Player Components
*** [[Turntable]]
*** [[Cartridge]]
*** [[Phono preamplifier]]
** [[Evaluating Vinyl Sound Quality]]
** [[Vinyl Playback and Recording|Playback and Recording]]
** [[Vinyl Maintenance|Maintenance]]
** [[Vinyl Forum Posts and FAQs|FAQs]]
** [[Vinyl Glossary|Glossary]]
** [[Vinyl Links|Links]]
** [[Vinyl Mastering|Mastering]]
* [[Soundcard|Soundcards]]
* [[Other hardware]]



= Tests =
* [[EAC Vs CDex SecureMode]] (by Pio2001)
* [[EAC Vs CDex SecureMode II]] (by westgroveg)
* [[Listening Tests]]


=Downloads=
''Where to obtain the software discussed in HAK.''

* [[Download page]]


= Using HAK =
* [[Help:Contents|Wiki User Guide]]
* Play around at the [[Hydrogenaudio Knowledgebase:Sandbox|Sandbox]] to try your formatting skills. Everything goes here and everything can/may be deleted.
* Contributors should read [[Help:Editing|editing help]].

Replaygain

2011-07-27T17:20:41Z

Notat: Redirected page to ReplayGain

#REDIRECT [[ReplayGain]]

ReplayGain

2011-07-27T17:20:21Z

Notat: rename

'''ReplayGain''' is the name of a technique invented to achieve the same perceived playback loudness of audio files. It defines an algorithm to measure the '''perceived''' loudness of audio data.

ReplayGain allows the loudness of each song within a collection of songs to be consistent. This is called 'Track Gain' (or 'Radio Gain' in earlier parlance). It also allows the loudness of a specific sub-collection (an "album") to be consistent with the rest of the collection, while allowing the dynamics from song to song on the album to remain intact. This is called 'Album Gain' (or 'Audiophile Gain' in earlier parlance). This is especially important when listening to classical music albums, because quiet tracks need to remain a certain degree quieter than the louder ones.

ReplayGain is different from [[Normalization|peak normalization]]. Peak normalization merely ensures that the peak amplitude reaches a certain level. This does not ensure equal loudness. The ReplayGain technique measures the ''effective power'' of the waveform (i.e. the RMS power after applying an "equal loudness contour"), and then adjusts the amplitude of the waveform accordingly. The result is that Replay Gained waveforms are usually more uniformly amplified than peak-normalized waveforms.

==Target loudness==
The target loudness of almost all ReplayGain utilities is 89 dB SPL (an early departure from the proposal, endorsed by its author<ref>[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=83397&view=findpost&p=721854 Does Replay gain work differtly in Media monkey]</ref>); the original ReplayGain proposal and the SMPTE recommendation are 6 dB lower.<ref>[http://www.mars.org/mailman/public/mad-dev/2004-February/000993.html ReplayGain discussion at mad-dev]</ref>

==Clipping==
Audio is generally recorded such that the loudest sounds don't clip, but the use of ReplayGain can cause clipping if the average volume of a song is below the target level. That is, upon playback, the volume of a quiet song is increased, so the parts of the song with above-average loudness, especially in the bass frequencies, will exceed the limits of the format and will be distorted. Whether this distortion is audible depends on the sounds in question, and the listener's sensitivity.

Implementations deal with the risk of clipping in different ways. Some have a "pre-amp" feature which reduces (or boosts) the original audio's level by a certain amount before doing whatever is needed for ReplayGain. Some have a "prevent clipping" feature to reduce the amount of ReplayGain adjustment to whatever amount would keep clipping from occurring, based on peak info stored in the file's metadata (thus reducing the effectiveness of ReplayGain). Some recommend using a compressor/limiter DSP to prevent or reduce clipping, regardless of whether it was caused by ReplayGain.
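The "prevent clipping" approach described above can be sketched as follows. This is a minimal illustration rather than the code of any particular player: the function name is hypothetical, and it assumes the stored peak is a linear amplitude where 1.0 means digital full scale.

```python
import math


def limit_gain_to_prevent_clipping(gain_db: float, peak: float) -> float:
    """Reduce gain_db, if necessary, so that peak * scale stays at or below 1.0.

    Assumption: `peak` is the stored peak amplitude on a linear scale
    where 1.0 is digital full scale.
    """
    if peak <= 0.0:
        return gain_db  # a silent file cannot clip
    # Gain (in dB) that would bring the peak exactly to full scale.
    max_gain_db = -20.0 * math.log10(peak)
    return min(gain_db, max_gain_db)
```

With a stored peak of 0.5, about 6.02 dB of boost is available before the loudest sample reaches full scale; any larger ReplayGain adjustment is cut back to that value, which is why this feature reduces the effectiveness of ReplayGain for quiet tracks with high peaks.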

== Implementations ==
There are different ReplayGain implementations, each with its own uses and strengths. Most use [[metadata]] to indicate the level of the volume change that the player should make. Some modify the audio data itself, and optionally use metadata as well. There are advantages and disadvantages to both methods.

In the metadata method, information on both types of ReplayGain (Track Gain and Album Gain) can be stored. The volume-change information can be very precise. If audio data was also changed, the metadata can contain "undo" info. However, not all audio players/decoders know how to read and use ReplayGain information stored in metadata. Moreover, there is no standard for where and how ReplayGain info is stored; each implementation uses different formats and puts the info in different locations.

In the audio data method, the file's actual audio data is modified so that its natural/default playback volume is at the target level. In this scenario, only one type of ReplayGain (Track Gain or Album Gain) can be applied. If no "undo" info is saved somewhere, it may not be possible to restore the original audio data. Limitations of the audio file format may prevent precise (finely tuned) gain adjustments with this method. For example, MP3 and AAC files can only be losslessly modified in 1.5 dB steps. Depending on the audio file format, the process may also be lossy in the sense that it could irreversibly push a signal above the format's maximum amplitude (resulting in clipping) or below the minimum (resulting in silence).
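The effect of the 1.5 dB step size can be illustrated with a short sketch. Rounding to the nearest step is an assumption made here for illustration; individual tools may choose the step differently, and the function name is hypothetical.

```python
from typing import Tuple

STEP_DB = 1.5  # size of one lossless gain step for MP3/AAC, as stated above


def quantize_gain(gain_db: float) -> Tuple[int, float]:
    """Return (number_of_steps, gain_actually_applied_in_dB).

    Assumption: the desired gain is rounded to the nearest whole step.
    """
    steps = round(gain_db / STEP_DB)
    return steps, steps * STEP_DB
```

For a desired change of -4.2 dB this gives -3 steps, i.e. -4.5 dB applied, leaving a 0.3 dB difference that can only be expressed in metadata.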

=== MP3Gain ===
[[MP3Gain]] is an implementation of ReplayGain. It can be used to just analyze files & recommend changes or to also modify the gain. If modifying the gain, it always modifies the global gain fields in the MP3 audio data. It can add somewhat precise metadata, including undo info. The gain can be modified to any target dB, or it can be changed by a specified amount. For balance correction, user-specified changes can even be made on just one channel in simple L/R stereo-mode files (not joint stereo).

* Format: [[MP3]]
* Method: Audio + Meta (in APE tag), or Audio only
* APE tag fields (ASCII bytes):
** <code>MP3GAIN_MINMAX ###,###</code> - minimum & maximum global gain values for this file. 3 digits, zero-padded if necessary.
** <code>MP3GAIN_ALBUM_MINMAX ###,###</code> - minimum & maximum global gain values across a set of files scanned as an album. Optional.
** <code>MP3GAIN_UNDO +###,+###,N</code> - the global gain adjustment to restore the original values in the left and right channels, respectively, followed by an indicator of whether to wrap at the extremes (<code>N</code> means no, <code>W</code> means yes). The adjustment values are 3 digits, zero-padded, preceded by a sign (<code>+</code> or <code>-</code>).
** <code>REPLAYGAIN_TRACK_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Examples: <code>+0.424046</code> and <code>-10.38500</code>
** <code>REPLAYGAIN_TRACK_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Example: <code>0.149923</code>
** <code>REPLAYGAIN_ALBUM_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Optional.
** <code>REPLAYGAIN_ALBUM_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Optional.
* Limitations: Although the metadata, if written, contains precise adjustment & peak values, the audio data modifications are limited to 1.5dB steps and may become irreversible (however, that's a very rare condition; see the [http://www.hydrogenaudio.org/forums/lofiversion/index.php/t34154.html "mp3gain is NOT lossless" forum thread])
* http://mp3gain.sourceforge.net/
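As a rough sketch of how the field formats listed above might be read back, the following helpers parse a gain value and an <code>MP3GAIN_UNDO</code> value. The function names are hypothetical, the sample values are made up for illustration, and no tag-reading library is involved: the functions only interpret strings that have already been extracted from the APE tag.

```python
from typing import Tuple


def parse_gain(value: str) -> float:
    """Parse a value such as '+0.424046 dB' into a number of decibels."""
    text = value.strip()
    if text.lower().endswith("db"):
        text = text[:-2]  # drop the unit suffix
    return float(text)  # float() accepts a leading sign and surrounding spaces


def parse_undo(value: str) -> Tuple[int, int, bool]:
    """Parse an MP3GAIN_UNDO value such as '+002,-003,N'.

    Returns (left_adjustment, right_adjustment, wrap), where wrap is True
    for 'W' and False for 'N'.
    """
    left, right, wrap = value.strip().split(",")
    return int(left), int(right), wrap == "W"
```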

=== AACGain ===
[[AACGain]] is a modified version of MP3Gain that works on both MP3 and AAC files.

* Format: [[MP3]], [[AAC]] (with or without MP4 container)
* Method: Audio + Meta, or Audio only
* Limitations: Limited to 1.5 dB steps; changes may become irreversible (same caveat as for MP3Gain)
* http://altosdesign.com/aacgain/

=== [[LAME]] ===
* Method: Header ([http://gabriel.mp3-tech.org/mp3infotag.html mp3infotag])
* Notes:
** Tags added during encoding; not supported by any player yet; Track Gain only
** ReplayGain is usually applied to MP3s using MP3Gain (see [[ReplayGain#MP3Gain|above]]) or [[ReplayGain#foobar2000 ReplayGain scanner|foobar2000]]
* http://lame.sourceforge.net/

=== [[Musepack]] ReplayGain ===
* Method: Header (similar to the metadata method)
* Notes: ReplayGain values are stored in the header and ReplayGain is part of the Musepack specifications; therefore any Musepack decoder that does not support ReplayGain can be considered broken.
* http://www.musepack.net/

=== VorbisGain ===
* Format: (Ogg) [[Vorbis]]
* Method: Meta (in [[Vorbis comment]])
* http://www.sjeng.org/vorbisgain.html
** new compiles of VorbisGain at [http://www.rarewares.org/ogg.html www.rarewares.org]
:'''''Note:''' Andavari has provided a very useful script to integrate VorbisGain, which is a CLI tool, into Windows Explorer. See the [[Vorbis#Replay Gain|Replay Gain section]] of the (Ogg) Vorbis page.''

=== FLAC / METAFLAC ===
* Format: [[Free Lossless Audio Codec|FLAC]]
* Method: Meta (in [[Vorbis comment]])
* http://flac.sf.net

=== WavPack / WVGAIN ===
* Format: [[WavPack]]
* Method: Meta (in [[APEv2]] tag)
* http://www.wavpack.com

=== Wavegain ===
* Format: waveform
* Method: Audio
* Limitations: Irreversible
* http://www.rarewares.org/files/others/wavegain.zip

=== [[foobar2000]] ReplayGain scanner ===
* Format:
** [[MP3]]: Values written to [[ID3v2]] (default) or [[APEv2]] tags. A separate function can be invoked to apply the tagged Track or Album Gain to the MP3 global gain fields (as MP3Gain does, but requiring tags first), and to rewrite the tags to account for the peak change and compensate for the difference from 89 dB. The 89 dB reference level for tags isn't configurable, but the reference level applied to the global gain fields is (it's under Preferences > Advanced > Tools > ReplayGain Scanner > Target MP3 alteration volume level).
** [[Musepack]]: Values written to header.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags.
** [[MP4]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain).
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** Modules ([[MOD]] etc.): Optionally saved into [[APEv2]] tags.
* http://foobar2000.org

=== [[MediaMonkey]] ===
* Format:
** [[MP3]]: Values written to [[APEv2]] or [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in MediaMonkey's MDB database.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[WAV]]: Values stored in MediaMonkey's MDB database.
** [[MPC]]: Internal gain Structure.
* In addition to tags, all ReplayGain values are also stored in MediaMonkey's MDB database
* Album/Audiophile ReplayGain not supported until v3.0 (Dec 2007); support during burning & ripping added in 3.1 (Jun 2009)
* Also capable of (irreversibly) changing the volume of MP3 tracks, similar to [[MP3Gain]]
* http://www.mediamonkey.com/

=== [[Winamp]] ReplayGain scanner===
* Format:
** [[MP3]]: Values written to [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in Windows Media Audio tags.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags.
** [[MP4]]
** [[TAK]]: Values written to [[APEv2]] tags.
* Supports Album and Track Gain

== Players support ==
Because ReplayGain is part of the specifications of the FLAC, Musepack, and APE formats, any player that supports those formats usually supports ReplayGain.

The situation with MP3 is rather different, as ReplayGain was not part of the MP3 specification. The APEv2 tag metadata implementation is gradually becoming the de facto standard.

=== Windows ===
* [[foobar2000]] supports ReplayGain in all possible aspects.
* [[Winamp]] supports ReplayGain in album or track mode.
* [[MediaMonkey]] supports track ReplayGain only.
* [[XMPlay]] recently implemented ReplayGain.

''...and probably others.''

=== Linux ===
* [[XMMS]] reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], [[Musepack]], and (Ogg) [[Vorbis]].
:For [[MP3]], use the CVS version of the [http://xmms-mad.sourceforge.net/ xmms-mad] MP3 plugin. It has not yet been released as a binary and is not available in distributions' packages for now; meanwhile, binaries are available here: [http://perso.crans.org/~krempp/xmms-mad/ custom binaries].
* [[amarok]], by using the amarok script [http://kde-apps.org/content/show.php?content=26073 ReplayGain].
:Possibly others as well: since [http://developer.kde.org/~wheeler/taglib.html TagLib] added support for [[APEv2]] tags in [[MP3]] files, players using this library (like [[amaroK]] and [[JuK]]) might support this kind of ReplayGain tag in the near future.
* [http://www.sacredchao.net/quodlibet Quod Libet] reads ReplayGain from (Ogg) [[Vorbis]], [[MP3]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:Requires support to be enabled (via the appropriate Python bindings and libraries) for the above formats. Does not support ReplayGain values stored in [[APEv2]] tags in [[MP3]]s; for MP3, ReplayGain values are stored in RVA2 ID3v2.4 frames. See the [http://www.sacredchao.net/quodlibet/wiki/Development/ID3Notes Quod Libet RVA2 / ReplayGain notes].
* [http://www.musicpd.org/ Music Player Daemon] (MPD) reads ReplayGain from (Ogg) [[Vorbis]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:foobar2000-style TXXX frames in [[MP3]]s are also supported in the latest development releases.
* [http://www.mplayerhq.hu/ MPlayer]. MPlayer support for ReplayGain is codec-dependent.
:Codecs that are known to support ReplayGain: vorbis
:Because of this, you need to prioritize the codecs that support it, or choose one explicitly on the command line. To do so on the command line, add an <code>-ac [codec]</code> option after each file for which you want to choose the codec, or at the beginning to make it apply to all files listed. To prioritize the codecs by default, list them on one line in mplayer.conf:
:<code>ac=[codec],[othercodec],vorbis,mad,</code>

=== Portable devices ===
[http://www.rockbox.org/ Rockbox] supports ReplayGain (in album or track mode) for most formats, including WMA, MP1/2/3, AAC, ALAC, Musepack, Monkey's Audio, WavPack, FLAC and Vorbis. Note that ReplayGain is only supported when using the respective codec's native tagging format. For example, ReplayGain stored in APEv2 tags is not supported for MP3; ID3v2.x tags are expected instead.

The SanDisk Sansa Fuze (with firmware 1.02.26 and 2.02.26) and the SanDisk Sansa Clip+ also support ReplayGain.

The iPod features ''Soundcheck'', which seems to produce roughly the same normalization gains as ReplayGain, but doesn't provide an Album Gain.

=== Hi-Fi ===
Slim Devices, a company owned by Logitech, supports ReplayGain on both of its high-end audiophile players, the [[Slim Devices Transporter|Transporter]] and the [[Slim Devices Squeezebox|Squeezebox]].

==Notes==
<references/>

== See also ==
* [[ReplayGain specification]]

== External links ==
* [http://en.wikipedia.org/wiki/Replay_Gain ReplayGain] at Wikipedia
* [http://www.replaygain.org/ ReplayGain - A Proposed Standard], the original proposal, now out of date with respect to current practice
* [http://www.bobulous.org.uk/misc/Replay-Gain.html ReplayGain using foobar2000] (how to use ReplayGain in Windows using foobar2000).
* [http://www.bobulous.org.uk/misc/Replay-Gain-in-Linux.html ReplayGain in Linux] (how to use ReplayGain in Linux using foobar2000 and Wine, or using metaflac or vorbisgain).

[[Category:Technical]]
[[Category:Metadata]]

Replay Gain

2011-07-27T17:17:05Z

Notat: name change

#REDIRECT [[ReplayGain]]

Replaygain

2011-07-27T17:17:02Z

Notat: name change

Replay Gain

2011-07-27T17:14:56Z

Notat: Replay Gain -> ReplayGain

Talk:ReplayGain specification

2011-07-23T16:35:37Z

Notat: moved Talk:ReplayGain specification to Talk:ReplayGain 1.0 specification: Add version number

#REDIRECT [[Talk:ReplayGain 1.0 specification]]

Talk:Original ReplayGain specification

2011-07-23T16:35:37Z

Notat: moved Talk:ReplayGain specification to Talk:ReplayGain 1.0 specification: Add version number

ReplayGain specification

2011-07-23T16:35:37Z

Notat: moved ReplayGain specification to ReplayGain 1.0 specification: Add version number

#REDIRECT [[ReplayGain 1.0 specification]]

Original ReplayGain specification

2011-07-23T16:35:37Z

Notat: moved ReplayGain specification to ReplayGain 1.0 specification: Add version number

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full-scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves), which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies, a 10th-order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd-order Butterworth high-pass filter with a cutoff frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be passed through both of these filter components in cascade before being analysed further.
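The cascade can be sketched in a few lines of Python using the 44.1 kHz coefficients from tables 1b and 2b. This is an illustrative sketch, not a reference implementation. It assumes that the ''a(k)'' values are feedback coefficients that are ''added'' to the output, i.e. y[n] = Σ b(k)·x[n−k] + Σ a(k)·y[n−k]; the signs of the tabulated values yield a stable filter only under this convention. The function names are hypothetical.

```python
from typing import List, Sequence

# Table 1b: Butterworth high-pass component (Fs = 44.1 kHz)
BUTTER_B = [0.98500175787242, -1.97000351574484, 0.98500175787242]
BUTTER_A = [1.96977855582618, -0.97022847566350]  # a(1), a(2)

# Table 2b: "yulewalk" component (Fs = 44.1 kHz)
YULE_B = [0.05418656406430, -0.02911007808948, -0.00848709379851,
          -0.00851165645469, -0.00834990904936, 0.02245293253339,
          -0.02596338512915, 0.01624864962975, -0.00240879051584,
          0.00674613682247, -0.00187763777362]
YULE_A = [3.47845948550071, -6.36317777566148, 8.54751527471874,
          -9.47693607801280, 8.81498681370155, -6.85401540936998,
          4.39470996079559, -2.19611684890774, 0.75104302451432,
          -0.13149317958808]  # a(1) .. a(10)


def iir_filter(x: Sequence[float], b: Sequence[float],
               a: Sequence[float]) -> List[float]:
    """Direct-form IIR filter: y[n] = sum b[k]*x[n-k] + sum a[k]*y[n-k].

    Assumption: feedback terms are added, matching the sign of the
    tabulated a(k) values. Samples before the start are taken as zero.
    """
    y: List[float] = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):           # feed-forward taps b(0)..b(N)
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a, start=1):  # feedback taps a(1)..a(N)
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def loudness_filter(x: Sequence[float]) -> List[float]:
    """Run one channel through the "yulewalk" and Butterworth stages in cascade."""
    return iir_filter(iir_filter(x, YULE_B, YULE_A), BUTTER_B, BUTTER_A)
```

Because the Butterworth stage is a high-pass filter, a constant (DC) input decays to zero at the output, which is a convenient sanity check for the sign convention. For stereo material, each channel would be filtered separately.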
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the root mean square (RMS) of the filtered signal every 50 ms.<ref>The block length of 50 ms was chosen after studying the effect of values between 25 ms and 1 s. 25 ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50 ms there was little change (after statistical processing). For this reason, 50 ms was chosen.</ref>

The signal is chopped into 50 ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean of the squared values is taken.
# The square root of the mean is calculated.

For stereo signals, in step 2, the mean of all squared samples from both channels over the 50 ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
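The block-wise RMS calculation and the decibel conversion above can be sketched as follows. The sketch makes a few assumptions that are not stated in the text: samples are floating-point values in the range −1.0 to +1.0 (so the peak-to-peak range is 2.0), an incomplete block at the end of the signal is discarded, and a completely silent block is reported as negative infinity. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def block_levels_db(channels: Sequence[Sequence[float]], sample_rate: int,
                    block_seconds: float = 0.05,
                    peak_to_peak: float = 2.0) -> List[float]:
    """Return one level in dB per 50 ms block of already-filtered samples.

    `channels` holds one sequence per channel (one for mono, two for stereo).
    """
    block_len = int(round(sample_rate * block_seconds))
    n_samples = min(len(ch) for ch in channels)
    levels: List[float] = []
    for start in range(0, n_samples - block_len + 1, block_len):
        total = 0.0
        for ch in channels:                      # pool squares from all channels
            for s in ch[start:start + block_len]:
                total += s * s                   # step 1: square
        mean_square = total / (block_len * len(channels))  # step 2: mean
        rms = math.sqrt(mean_square)             # step 3: square root
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))         # assumption: silence -> -inf
    return levels
```

As a check, a full-scale sine wave has an RMS value of 1/√2 of its peak, so every block of such a signal yields about −3.01 dB.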

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is 95%,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.
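Picking the 95% value from the sorted list of block levels can be sketched as below. The exact indexing convention (which element counts as "95%" for a given list length) is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Sequence


def overall_loudness_db(levels: Sequence[float], percent: int = 95) -> float:
    """Return the block level found `percent` of the way up the sorted list."""
    if not levels:
        raise ValueError("no block levels to analyse")
    ordered = sorted(levels)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) - 1, (percent * len(ordered)) // 100)
    return ordered[index]
```

For a list of 100 distinct block levels, this returns the 96th smallest value, so only the five loudest blocks lie at or above it.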

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif|(a) Speech
File:RG_Statistical_classic.gif|(b) Classical music
File:RG_Statistical_pop.gif|(c) Pop music
</gallery>

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to a signal level of -20 dB relative to full scale, i.e. 20 dB of system headroom. The 14 dB of headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE-calibrated system, so ReplayGain is said to operate with an 89 dB reference level.
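Written out, the arithmetic behind the 89 dB figure is simply the SMPTE playback level plus the 6 dB by which the ReplayGain reference signal exceeds the SMPTE reference signal:

:<math>83\ \mathrm{dB\ SPL} + \left[(-14\ \mathrm{dB}) - (-20\ \mathrm{dB})\right] = 83\ \mathrm{dB\ SPL} + 6\ \mathrm{dB} = 89\ \mathrm{dB\ SPL}</math>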

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
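
The subtraction above can be sketched in a few lines of Python. This is only an illustration: the function name and the loudness figures are hypothetical, and both inputs are assumed to have been measured with the same loudness procedure.

```python
def replay_gain_db(reference_loudness_db: float, file_loudness_db: float) -> float:
    """Return RG = L_n14 - L, with all quantities in decibels."""
    return reference_loudness_db - file_loudness_db


# Hypothetical measurements: the file measures 6.5 dB louder than the
# reference, so the result is negative (an attenuation).
print(replay_gain_db(-14.0, -7.5))
# A file quieter than the reference yields a positive gain.
print(replay_gain_db(-14.0, -20.0))
```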

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.
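
As a rough sketch of the peak scan for uncompressed audio, the following Python function takes already-decoded integer samples. The function name, the list-of-channels layout and the 16-bit full-scale value of 32768 are assumptions made for this example.

```python
def peak_amplitude(channels, full_scale):
    """Largest absolute sample on any channel, scaled so full scale is 1.0."""
    peak = max((abs(s) for channel in channels for s in channel), default=0)
    return peak / full_scale


# Two short, made-up channels of 16-bit samples.
left = [0, 12000, -32768, 800]
right = [5, -16384, 16384, 0]

# Formatted with six decimal digits, as used for the stored peak value.
print(f"{peak_amplitude([left, right], 32768):.6f}")
```

The same function returns values above 1.0 when given decoded samples that exceed full scale, as can happen with psychoacoustically coded audio.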

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID       $54 58 58 58 ("TXXX")
 Size           $xx xx xx xx (size of frame excluding this header)
 Flags          $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>
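
The layout above can be sketched as a small Python function that serializes one frame. The function name is hypothetical, and the sketch assumes the ID3v2.3 convention of storing the frame size as a plain 32-bit big-endian integer (ID3v2.4 uses a synchsafe integer instead).

```python
import struct


def build_txxx_frame(key: str, value: str) -> bytes:
    """Serialize one TXXX key/value frame using the layout described above."""
    # Frame data: encoding byte ($00 = ISO-8859-1), key, $00 terminator, value.
    body = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the data excluding the header, two flag bytes.
    # The size is written as a plain big-endian integer (ID3v2.3 assumption).
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.50 dB")
print(frame.hex(" "))
```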

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
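
A tolerant parser along these lines might look like the following Python sketch. The function names and regular expressions are illustrative assumptions, not part of the specification; values that do not match are reported as <code>None</code> so that the caller can treat the frame as absent.

```python
import re
from typing import Optional

# Optional sign, digits with an optional decimal part, optional "dB" suffix
# in any capitalization, with or without a separating space.
_GAIN_RE = re.compile(r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*(?:db)?\s*$", re.IGNORECASE)
# Peak values are positive decimals with no suffix.
_PEAK_RE = re.compile(r"^\s*(\d+\.?\d*|\.\d+)\s*$")


def parse_gain(text: str) -> Optional[float]:
    """Return the gain in dB, or None if the value is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text: str) -> Optional[float]:
    """Return the peak amplitude, or None if the value is malformed."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


def normalize_key(key: str) -> str:
    """Fold keys to upper case so alternate capitalization still matches."""
    return key.strip().upper()


for raw in ["-6.50 dB", "+3.2db", " 07.5 DB", "loud"]:
    print(repr(raw), parse_gain(raw))
```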

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equal character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as specified for ID3v2, with values stored as ASCII text in the same format.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
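
The selection and fallback rules above, together with the conversion from decibels to a linear scale factor, can be sketched in Python as follows. The function names are hypothetical, missing metadata is represented as <code>None</code>, and unity gain (0 dB) is assumed as the default when neither value is present.

```python
def select_gain_db(track_gain_db, album_gain_db, prefer_album, default_db=0.0):
    """Pick the requested gain, fall back to the other one, then to a default."""
    if prefer_album:
        preferred, other = album_gain_db, track_gain_db
    else:
        preferred, other = track_gain_db, album_gain_db
    if preferred is not None:
        return preferred
    if other is not None:
        return other
    return default_db


def db_to_scale(gain_db):
    """Linear multiplier for sample values: 10 ** (gain / 20)."""
    return 10.0 ** (gain_db / 20.0)


# The user asked for Track mode, but only Album gain is stored in the file.
gain = select_gain_db(None, -8.0, prefer_album=False)
print(gain, db_to_scale(gain))
```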

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^{\frac{RG + G_\text{pre-amp}}{20}}, \frac{1}{\text{peak amplitude}} \right)</math>
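
A minimal Python sketch of this scale factor is given below. The function and parameter names are assumptions for illustration. A missing peak value is taken as 1.0, as described above, and the clipping limit can be switched off for players that use a limiter instead.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=1.0, prevent_clipping=True):
    """Linear scale factor combining replay gain, pre-amp and peak limiting."""
    # Gains in decibels add; the sum is converted to a linear multiplier.
    scale = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    # Reduce the gain so that peak * scale never exceeds full scale (1.0).
    if prevent_clipping and peak > 0.0:
        scale = min(scale, 1.0 / peak)
    return scale


# +6 dB would roughly double the signal, but a stored peak of 0.9 only
# leaves room for a factor of about 1.11.
print(playback_scale(6.0, peak=0.9))
# An attenuated track is not affected by the peak limit.
print(playback_scale(-6.0, peak=1.0))
```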

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analogue circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analogue gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Revised ReplayGain specification

2011-07-23T16:31:41Z

Notat: Created page with "Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all r…"

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
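
As an illustration of how tabulated coefficients might be applied, the following Python sketch runs the 48 kHz Butterworth component from Table 1a as a direct-form difference equation. Figure 2 is not reproduced here, so the sign convention is an assumption: the feedback terms are added exactly as tabulated, which is the reading under which this filter is stable. The function and constant names are hypothetical. The "yulewalk" component can be applied with the same function, feeding the output of one stage into the next.

```python
def iir_filter(samples, b, a_feedback):
    """y[n] = sum(b[k] * x[n-k]) + sum(a[k] * y[n-k]), feedback added as tabulated."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        # Feed-forward taps b(0)..b(N) on the input history.
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * samples[n - k]
        # Feedback taps a(1)..a(N) on the output history.
        for k, ak in enumerate(a_feedback, start=1):
            if n - k >= 0:
                acc += ak * out[n - k]
        out.append(acc)
    return out


# Table 1a: Butterworth high-pass coefficients for Fs = 48 kHz.
BUTTER_B_48K = [0.98621192462708, -1.97242384925416, 0.98621192462708]
BUTTER_A_48K = [1.97223372919527, -0.97261396931306]

# A constant (DC) input should be rejected by the high-pass filter.
dc_out = iir_filter([1.0] * 5000, BUTTER_B_48K, BUTTER_A_48K)
print(abs(dc_out[-1]) < 1e-9)
```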
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
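
The block-wise RMS calculation and decibel conversion can be sketched in Python as below. The function name and the list-of-channels layout are assumptions. Samples are taken to be floating-point values in the range -1.0 to 1.0, so the peak-to-peak range is 2.0, and a trailing partial block is simply dropped to keep the sketch short.

```python
import math


def block_levels_db(channels, sample_rate, block_seconds=0.05, peak_to_peak=2.0):
    """RMS level in dB for each 50 ms block, averaged across all channels."""
    block_len = int(round(sample_rate * block_seconds))
    total_len = len(channels[0])
    levels = []
    for start in range(0, total_len - block_len + 1, block_len):
        # Steps 1 and 2: square every sample, then take the mean over all
        # samples of all channels in the block.
        squares = sum(s * s for ch in channels for s in ch[start:start + block_len])
        mean_square = squares / (block_len * len(channels))
        # Step 3: square root, then conversion to decibels.
        rms = math.sqrt(mean_square)
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))  # digital silence
    return levels


# A full-scale 1 kHz sine at 48 kHz; each block should measure about -3 dB.
sine = [math.sin(2.0 * math.pi * 1000.0 * n / 48000.0) for n in range(4800)]
print(block_levels_db([sine, sine], 48000))
```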

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is 95%,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
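
The selection step can be sketched in Python as follows. The function name is hypothetical, and the exact rounding of the index is an assumption, since the text only says that the value 95% of the way up the sorted list is used.

```python
def overall_loudness_db(block_levels_db, percentile=95):
    """Sort the per-block levels and pick the value at the given percentile."""
    if not block_levels_db:
        raise ValueError("at least one block level is required")
    ordered = sorted(block_levels_db)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) - 1, len(ordered) * percentile // 100)
    return ordered[index]


# Made-up block levels: mostly quiet, with a few loud moments.
levels = [-40.0] * 60 + [-25.0] * 30 + [-12.0] * 10
print(overall_loudness_db(levels))
```

With these made-up levels the loud blocks determine the result even though they cover only a tenth of the signal, which is the behaviour described above for speech and classical music.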

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20 dB system headroom. The -14 dB headroom used by ReplayGain is 6 dB higher; it therefore corresponds to an 89 dB SPL playback level on a SMPTE-calibrated system, and so ReplayGain is said to operate with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single value can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, whether the excursion is positive or negative. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
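The scan for uncompressed audio might be sketched as follows. This assumes 16-bit signed PCM with each channel already available as a list of integers, and it assumes a full-scale divisor of 32768 so that the most negative sample maps to exactly 1.0; the function name is hypothetical.

```python
def peak_amplitude(channels, full_scale=32768):
    """Largest absolute sample value on any channel, scaled so that
    digital full scale corresponds to 1.0."""
    peak = max((abs(s) for ch in channels for s in ch), default=0)
    return peak / full_scale


# Two short made-up channels; the left channel touches negative full scale.
left = [0, 12000, -32768, 400]
right = [5, -16384, 8000, 0]
track_peak = peak_amplitude([left, right])
```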

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Audio Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>
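The header and data layout above could be assembled as in the following Python sketch. It assumes the ID3v2.3 convention of a plain 32-bit big-endian frame size (ID3v2.4 uses a synchsafe integer instead), and the function name is hypothetical. The key used in the example is one of those listed in Table 3.

```python
import struct


def build_txxx_frame(key: str, value: str) -> bytes:
    """Build one TXXX frame: 10-byte header followed by the frame data."""
    # Frame data: text encoding byte, description (key), $00 terminator, value.
    body = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the data (assumed ID3v2.3 style), flags $40 $00.
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.50 dB")
```

A complete tag also needs the ID3v2 tag header and the remaining frames, which are outside the scope of this sketch.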

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
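A scanner could produce these strings with ordinary fixed-point formatting. The following is a minimal Python sketch with hypothetical function names; it does not special-case tiny negative gains, which would be written as "-0.00 dB".

```python
def format_gain(gain_db: float) -> str:
    """Format a gain as '[-]a.bb dB': two decimals, no '+' prefix."""
    return f"{gain_db:.2f} dB"


def format_peak(peak: float) -> str:
    """Format a peak level as 'c.dddddd': six decimals, no suffix."""
    return f"{peak:.6f}"


gain_text = format_gain(-6.5)
peak_text = format_peak(0.988)
```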

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
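One way a player might tolerate the variations listed above is sketched below in Python. The regular expressions and function names are illustrative assumptions rather than part of the specification. Values that do not match return <tt>None</tt>, so the caller can treat the field as absent.

```python
import re

# Optional sign, then digits with an optional fractional part (or ".5" style).
_NUMBER = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)"
# Gain: number, optional whitespace, optional 'dB' in any capitalization.
_GAIN_RE = re.compile(rf"\s*({_NUMBER})\s*(?:dB)?\s*", re.IGNORECASE)
_PEAK_RE = re.compile(rf"\s*({_NUMBER})\s*")


def parse_gain(text):
    """Return the gain in dB, or None if the text is unusable."""
    m = _GAIN_RE.fullmatch(text)
    return float(m.group(1)) if m else None


def parse_peak(text):
    """Return the peak level, or None if the text is unusable or negative."""
    m = _PEAK_RE.fullmatch(text)
    if not m:
        return None
    value = float(m.group(1))
    return value if value >= 0 else None


def find_tag(tags, key):
    """Look up a key in a dict of tag strings, ignoring capitalization."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None
```

For example, <tt>parse_gain("+3.5dB")</tt> and <tt>parse_gain(" -06.50 DB")</tt> both succeed, while <tt>parse_gain("loud")</tt> returns <tt>None</tt>.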

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. The Vorbis comment specification requires keys and values to be separated by '=' (the equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and the same value formats as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain (in decibels) divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>
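In code, the scaling step might look like the following Python sketch. It assumes samples are held as floating-point values with 1.0 as full scale, and the function names are hypothetical. Dithering and conversion back to integer samples are left out.

```python
def gain_to_scale(gain_db: float) -> float:
    """Convert a gain in decibels to a linear multiplier: 10^(gain/20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Multiply every sample by the constant derived from the gain."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]


# A -6 dB replay gain roughly halves every sample value.
scaled = apply_gain([0.5, -0.25, 1.0], -6.0)
```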

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
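The fallback rule can be expressed compactly. In this Python sketch, missing metadata is represented by <tt>None</tt>, the mode strings "track" and "album" are arbitrary labels, and unity gain is assumed as the default when nothing is available.

```python
def select_gain_db(track_gain, album_gain, mode="track", default_db=0.0):
    """Pick the gain to apply: the requested kind if present, otherwise
    the other kind, otherwise a default (assumed here to be 0 dB)."""
    if mode == "album":
        preferred, other = album_gain, track_gain
    else:
        preferred, other = track_gain, album_gain
    if preferred is not None:
        return preferred
    if other is not None:
        return other
    return default_db


# The user asked for Track mode but only Album gain is stored in the file.
chosen = select_gain_db(None, -7.25, mode="track")
```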

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
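The equivalence between adding decibel values and multiplying linear scale factors can be written out directly. For a replay gain <math>RG</math> and a pre-amp gain <math>G_{pre-amp}</math>, both in decibels:

:<math>10^\frac{RG + G_{pre-amp}}{20} = 10^\frac{RG}{20} \times 10^\frac{G_{pre-amp}}{20}</math>

As an illustration with made-up values, a replay gain of -6.5 dB combined with a +6 dB pre-amp setting gives a net gain of -0.5 dB and a scale factor of about 0.944, which matches the product of the separate factors, roughly 0.473 and 1.995.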

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
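A direct transcription of this formula into Python might look like the sketch below. The function and parameter names are hypothetical. The peak defaults to 1.0, following the guidance above for missing peak metadata, and a zero peak (a silent track) is treated as unable to clip, which is an assumption of this sketch.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=1.0):
    """Linear scale factor: the requested gain, capped so that
    peak * scale never exceeds digital full scale (1.0)."""
    wanted = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if peak <= 0.0:
        return wanted  # assumed: a silent track cannot clip
    return min(wanted, 1.0 / peak)


# +6 dB requested on a track that already peaks at full scale:
# the gain is reduced to unity so the peaks do not clip.
capped = playback_scale(6.0, 0.0, 1.0)
# The same request on a track peaking at 0.25 is applied in full.
uncapped = playback_scale(6.0, 0.0, 0.25)
```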

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Original ReplayGain specification

2011-05-10T15:34:43Z

Notat: /* Reference level */ spell out small ordinals

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full-scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in Figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high-pass filter with a cutoff frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in Figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in Tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
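One filter stage could be sketched in Python as follows. Figure 2 is not reproduced in this text, so the sketch assumes the feedback terms are added, i.e. y[n] = Σ b(k)·x[n−k] + Σ a(k)·y[n−k]. With the signs tabulated above, this is the convention that yields a stable filter. The function and constant names are hypothetical, and only the 48 kHz Butterworth coefficients from Table 1a are included.

```python
# Table 1a coefficients (Fs = 48 kHz).
BUTTER_48K_B = [0.98621192462708, -1.97242384925416, 0.98621192462708]
BUTTER_48K_A = [1.97223372919527, -0.97261396931306]  # a(1), a(2)


def iir_filter(samples, b, a):
    """Run samples through one IIR stage.
    b holds b(0)..b(N); a holds a(1)..a(N), with feedback ADDED (assumed)."""
    x_hist = [0.0] * len(b)  # x_hist[k] holds x[n-k]
    y_hist = [0.0] * len(a)  # y_hist[k] holds y[n-1-k]
    out = []
    for x in samples:
        x_hist = [x] + x_hist[:-1]
        y = sum(bk * xk for bk, xk in zip(b, x_hist))
        y += sum(ak * yk for ak, yk in zip(a, y_hist))
        y_hist = [y] + y_hist[:-1]
        out.append(y)
    return out


# Sanity check of the high-pass behaviour: a constant (DC) input decays
# towards zero, while a signal alternating at Fs/2 passes almost unchanged.
dc_out = iir_filter([1.0] * 4800, BUTTER_48K_B, BUTTER_48K_A)
nyquist_out = iir_filter([(-1.0) ** n for n in range(4800)],
                         BUTTER_48K_B, BUTTER_48K_A)
```

The "yulewalk" stage can reuse the same function with the Table 2 coefficients, with the output of one stage fed into the other to form the cascade.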
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50 ms.<ref>The block length of 50 ms was chosen after studying the effect of values between 25 ms and 1 s. 25 ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50 ms there was little change (after statistical processing). For this reason, 50 ms was chosen.</ref>

The signal is chopped into 50 ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50 ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>
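The block-wise calculation, including the pooling of both channels into a single mean, might be sketched in Python as below. Channels are assumed to be equal-rate lists of already-filtered samples. A trailing partial block is dropped, which is a simplification of this sketch rather than something the text prescribes.

```python
import math


def block_rms(channels, sample_rate, block_seconds=0.050):
    """Return one RMS value per 50 ms block, pooling all channels."""
    block_len = int(round(sample_rate * block_seconds))
    n = min(len(ch) for ch in channels)
    values = []
    for start in range(0, n - block_len + 1, block_len):
        # Step 1: square every sample (from every channel).
        total = 0.0
        for ch in channels:
            total += sum(s * s for s in ch[start:start + block_len])
        # Step 2: mean of all squared samples from all channels.
        mean = total / (block_len * len(channels))
        # Step 3: square root of the mean.
        values.append(math.sqrt(mean))
    return values


# 100 ms of a made-up stereo signal whose channels are out of phase:
# pooling the squared samples still reports an RMS of 0.5 per block.
rms_values = block_rms([[0.5] * 4410, [-0.5] * 4410], 44100)
```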

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
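A small Python sketch of this conversion follows. The function name is hypothetical, and returning negative infinity for digital silence is an assumption made here to avoid taking the logarithm of zero.

```python
import math


def rms_to_db(rms: float, peak_to_peak: float) -> float:
    """L = 20 * log10(2 * L_RMS / L_p-p)."""
    if rms <= 0.0:
        return float("-inf")  # assumed handling of digital silence
    return 20.0 * math.log10(2.0 * rms / peak_to_peak)


# With samples in the range -1.0..1.0 the peak-to-peak range is 2.0.
# A full-scale sine wave has an RMS of 1/sqrt(2), i.e. about -3.01 dB;
# a full-scale square wave has an RMS of 1.0, i.e. 0 dB.
sine_db = rms_to_db(1.0 / math.sqrt(2.0), 2.0)
square_db = rms_to_db(1.0, 2.0)
```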

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of loudness is the one 95% of the way up the sorted list (the 95th percentile),<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif|(a) Speech
File:RG_Statistical_classic.gif|(b) Classical music
File:RG_Statistical_pop.gif|(c) Pop music
</gallery>
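The selection of the 95% value from the sorted block levels can be sketched in Python as follows. The text does not say how the position in the list should be rounded, so the integer index used here is an assumption, as is the function name.

```python
def overall_loudness(block_levels_db, percent=95):
    """Sort the per-block levels and return the value `percent` of the
    way up the list (index rounding is an assumption of this sketch)."""
    if not block_levels_db:
        raise ValueError("no blocks to analyse")
    ordered = sorted(block_levels_db)
    index = min(len(ordered) * percent // 100, len(ordered) - 1)
    return ordered[index]


# 100 made-up block levels from -99 dB up to 0 dB in 1 dB steps.
levels = [-float(i) for i in range(100)]
loudness = overall_loudness(levels)
```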

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20 dB FS system headroom. The -14 dB headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE calibrated system and so is said to be operating with an 89 dB reference level.
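Spelled out, raising the reference signal from -20 dB to -14 dB raises the corresponding playback level by the same 6 dB:

:<math>83\ \mathrm{dB\ SPL} + (20 - 14)\ \mathrm{dB} = 89\ \mathrm{dB\ SPL}</math>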

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single value can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, whether the excursion is positive or negative. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.
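As a rough illustration of the peak measurement described above, the following Python sketch scans integer PCM samples and converts the largest excursion to the floating-point scale. The function names, the per-channel list layout and the choice of 2<sup>bits-1</sup> as full scale are assumptions made for this example, not part of the specification. Taking the album peak as the largest track peak follows from treating the album as its tracks played end to end.

```python
def peak_amplitude(channels, bits_per_sample=16):
    """Return the peak amplitude of integer PCM data, with 1.0 as full scale.

    `channels` is a list of per-channel sample lists (a hypothetical layout).
    """
    # Assumed full-scale value: 32768 for 16-bit audio.
    full_scale = float(1 << (bits_per_sample - 1))
    # Largest absolute excursion, positive or negative, on any channel.
    largest = max((abs(s) for ch in channels for s in ch), default=0)
    return largest / full_scale


def album_peak(track_peaks):
    """Album peak taken as the largest of the individual track peaks."""
    return max(track_peaks)
```

For decoded lossy audio the same idea applies to the decoder's floating-point output, where the result may exceed 1.0.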

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>
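The layout above can be sketched in Python as follows. This is a minimal example assuming the ID3v2.3 convention of a plain 32-bit big-endian frame size (ID3v2.4 encodes sizes differently); the function name <tt>build_txxx_frame</tt> is hypothetical, and the enclosing ID3v2 tag header is not produced.

```python
import struct


def build_txxx_frame(key, value):
    """Build one TXXX frame (ID3v2.3-style layout assumed) for a key/value pair."""
    # Frame data: text-encoding byte $00, key, NUL terminator, then the value.
    data = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, 32-bit big-endian size excluding the header, two flag bytes.
    header = b"TXXX" + struct.pack(">I", len(data)) + b"\x40\x00"
    return header + data


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.50 dB")
```

The resulting bytes would still need to be placed inside a complete ID3v2 tag by a tagging library or equivalent code.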

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
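A tolerant parser along these lines might look like the Python sketch below. The regular expressions and function names are illustrative assumptions, not a reference implementation; values that do not match are reported as <tt>None</tt> so the caller can treat the frame as absent. This sketch accepts a missing, unspaced or differently capitalized 'dB' suffix, but not other malformed suffixes.

```python
import re

_NUMBER = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)"
# Optional surrounding spaces, optional 'dB' suffix in any capitalization.
_GAIN_RE = re.compile(r"^\s*(" + _NUMBER + r")\s*(?:dB)?\s*$", re.IGNORECASE)
_PEAK_RE = re.compile(r"^\s*(" + _NUMBER + r")\s*$")


def parse_gain(text):
    """Return the gain in dB as a float, or None if the text is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak as a float, or None if malformed or negative."""
    match = _PEAK_RE.match(text)
    if not match:
        return None
    peak = float(match.group(1))
    return peak if peak >= 0.0 else None


def read_replaygain(tags):
    """Pick ReplayGain items out of a raw key -> value mapping, ignoring key case."""
    found = {}
    for key, value in tags.items():
        name = key.upper()
        if name in ("REPLAYGAIN_TRACK_GAIN", "REPLAYGAIN_ALBUM_GAIN"):
            parsed = parse_gain(value)
        elif name in ("REPLAYGAIN_TRACK_PEAK", "REPLAYGAIN_ALBUM_PEAK"):
            parsed = parse_peak(value)
        else:
            continue
        if parsed is not None:
            found[name] = parsed
    return found
```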

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' are the same as specified for ID3v2. The Vorbis comment specification requires keys and values to be separated by '=' (the equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and value formatting as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
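The selection and scaling rules in this section can be sketched in Python as below. The function names and the use of <tt>None</tt> for missing metadata are assumptions made for the example; the default of 0 dB is only one of the possible fallbacks mentioned above.

```python
def select_gain_db(track_gain, album_gain, mode="album", default_db=0.0):
    """Choose the gain to apply; fall back to the other value, then to a default."""
    if mode == "album":
        preferred, other = album_gain, track_gain
    else:
        preferred, other = track_gain, album_gain
    if preferred is not None:
        return preferred
    if other is not None:
        return other
    return default_db


def scale_factor(gain_db):
    """Linear multiplier for a gain in decibels: 10 ** (gain / 20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Multiply every (floating-point) sample by the same constant."""
    k = scale_factor(gain_db)
    return [s * k for s in samples]
```

For example, a replay gain of -6 dB gives a multiplier of about 0.5, halving every sample value.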

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
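The equivalence between adding decibel values and multiplying linear scale factors can be checked with a few lines of Python. The gain values here are hypothetical and chosen only for illustration.

```python
import math


def db_to_scale(db):
    """Convert a gain in decibels to a linear multiplier."""
    return 10.0 ** (db / 20.0)


replay_gain_db = -8.0  # hypothetical replay gain read from metadata
pre_amp_db = 6.0       # hypothetical user pre-amp setting

# Adding in decibels first, then converting once...
combined = db_to_scale(replay_gain_db + pre_amp_db)
# ...matches converting each gain separately and multiplying.
separate = db_to_scale(replay_gain_db) * db_to_scale(pre_amp_db)
```

Combining the two gains in decibels means each sample needs only one multiplication.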

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^\frac{RG + G_\text{pre-amp}}{20}, \frac{1}{\text{peak amplitude}} \right)</math>
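A direct transcription of this formula into Python might read as follows. The function name is hypothetical. Treating a missing peak as 1.0 follows the text above, while treating a zero or negative peak the same way is an extra assumption made here to avoid dividing by zero.

```python
def playback_scale(replay_gain_db, pre_amp_db=0.0, peak=None):
    """Overall linear scale factor with clipping prevention by gain reduction."""
    if peak is None or peak <= 0.0:
        # Missing peak metadata: assume 1.0. Non-positive peaks are handled
        # the same way (an assumption of this sketch).
        peak = 1.0
    wanted = 10.0 ** ((replay_gain_db + pre_amp_db) / 20.0)
    # Never scale by more than the factor that brings the peak to full scale.
    return min(wanted, 1.0 / peak)
```

The peak passed in would normally match the selected mode: the track peak with track gain, and the album peak with album gain.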

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Original ReplayGain specification

2011-05-10T15:33:52Z

Notat: /* Reference level */ grammar

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
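As an illustration, the Butterworth stage can be sketched in Python as below. Figure 2 is not reproduced in text, so this example assumes a topology in which the feedback terms are added, i.e. y[n] = Σ b(k)·x[n−k] + Σ a(k)·y[n−k]; with the signs given in Table 1a, this is the convention under which the filter is stable and blocks DC. The function name and the placeholder a(0) entry are assumptions of the sketch.

```python
def iir_filter(samples, b, a):
    """Filter with y[n] = sum(b[k] * x[n-k]) + sum(a[k] * y[n-k]) for k >= 1.

    a[0] is a placeholder so that list indices match the table; it is not used.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k in range(len(b)):          # feed-forward terms
            if n - k >= 0:
                acc += b[k] * samples[n - k]
        for k in range(1, len(a)):       # feedback terms (added, per assumption)
            if n - k >= 0:
                acc += a[k] * out[n - k]
        out.append(acc)
    return out


# Table 1a: Butterworth high-pass coefficients for Fs = 48 kHz.
BUTTER_B_48K = [0.98621192462708, -1.97242384925416, 0.98621192462708]
BUTTER_A_48K = [1.0, 1.97223372919527, -0.97261396931306]

# A constant (DC) input should be removed by the high-pass stage.
dc_response = iir_filter([1.0] * 5000, BUTTER_B_48K, BUTTER_A_48K)
```

If the same sign convention holds for Table 2, the "yulewalk" stage can be run through the same function, with one stage's output used as the other's input.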
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
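The block-wise calculation above can be sketched in Python as follows. The function name and the per-channel list layout are assumptions, as are two details the text leaves open: a trailing partial block is dropped, and a silent block is reported as negative infinity. Samples are taken to be floats in the range -1.0 to 1.0, so the peak-to-peak range defaults to 2.0.

```python
import math


def block_levels_db(channels, sample_rate, peak_to_peak=2.0, block_seconds=0.05):
    """Return one level in dB per 50 ms block of (already filtered) audio."""
    block_len = int(round(sample_rate * block_seconds))
    n_samples = min(len(ch) for ch in channels)
    levels = []
    # Only complete blocks are measured (an assumption of this sketch).
    for start in range(0, n_samples - block_len + 1, block_len):
        # Steps 1 and 2: square every sample, then average over all channels.
        total = sum(s * s for ch in channels for s in ch[start:start + block_len])
        mean_square = total / (block_len * len(channels))
        # Step 3: the square root of the average is the RMS value.
        rms = math.sqrt(mean_square)
        # Decibel conversion: L = 20 * log10(2 * L_RMS / L_p-p).
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))
    return levels
```

With this convention a full-scale square wave measures 0 dB and a half-scale one about -6 dB.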

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of loudness is the one 95% of the way up the sorted list,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
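The selection step described in this section can be sketched in Python as below. The function name and the exact index rounding are assumptions; implementations may differ in how they pick the position in the sorted list.

```python
def overall_loudness_db(block_levels, percent=95):
    """Pick the block level found `percent` of the way up the sorted list."""
    if not block_levels:
        raise ValueError("at least one block level is required")
    ordered = sorted(block_levels)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) * percent // 100, len(ordered) - 1)
    return ordered[index]
```

For a list of 100 block levels, this returns the value at position 95 of the sorted list (counting from zero).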

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20 dB system headroom. The -14 dB headroom used by ReplayGain is 6 dB higher and therefore corresponds to an 89 dB SPL playback level on a SMPTE calibrated system, so ReplayGain is said to operate with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to 2 channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
ReplayGain achieves loudness-compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
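As a small worked example of the gain formula, the Python sketch below uses hypothetical loudness figures; real values come from running the measurement procedure on the pink noise reference and on the audio file.

```python
def replay_gain_db(reference_loudness_db, measured_loudness_db):
    """RG = L_n14 - L, with all quantities in decibels."""
    return reference_loudness_db - measured_loudness_db


# Hypothetical measurements, for illustration only.
loud_track = replay_gain_db(-20.0, -12.5)   # louder than the reference
quiet_track = replay_gain_db(-20.0, -26.0)  # quieter than the reference
```

Here the loud track receives a negative gain (attenuation) and the quiet track a positive one.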

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, whether the excursion is positive or negative. This single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>
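
The header and data layout above can be sketched in code. The following is a minimal illustration rather than a reference implementation: the function name <tt>build_txxx_frame</tt> is hypothetical, and the sketch assumes the ID3v2.3 convention in which the frame size is a plain 32-bit big-endian integer (ID3v2.4 uses a different, "synchsafe" size encoding).

```python
import struct

def build_txxx_frame(key: str, value: str) -> bytes:
    """Serialize one TXXX frame (header plus data) as laid out above.

    Assumption: ID3v2.3-style size field (plain 32-bit big-endian integer).
    """
    # Frame data: text-encoding byte ($00 = ISO-8859-1), the key as the
    # description followed by a $00 terminator, then the value string.
    data = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the data (excluding this header), two flag bytes.
    header = b"TXXX" + struct.pack(">I", len(data)) + b"\x40\x00"
    return header + data

frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.20 dB")
```

The returned bytes still have to be placed inside a complete ID3v2 tag, which is outside the scope of this sketch.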

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
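
A scanner could produce strings in these formats along the following lines. This is a sketch only; the helper names <tt>format_gain</tt> and <tt>format_peak</tt> are illustrative and not part of the specification.

```python
def format_gain(gain_db: float) -> str:
    """Format a gain as '[-]a.bb dB': two decimals, no '+' on positive values."""
    # Adding 0.0 turns a rounded negative zero into plain zero,
    # so tiny negative gains are not written as '-0.00 dB'.
    rounded = round(gain_db, 2) + 0.0
    return f"{rounded:.2f} dB"

def format_peak(peak: float) -> str:
    """Format a peak amplitude as 'c.dddddd' with no suffix."""
    return f"{peak:.6f}"
```

For example, <tt>format_gain(-6.2)</tt> yields <tt>-6.20 dB</tt> and <tt>format_peak(0.988)</tt> yields <tt>0.988000</tt>.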

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
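
One way a player might tolerate the variations listed above is sketched below. The function names and the regular expressions are assumptions made for illustration; they accept the listed variations and return <tt>None</tt> for anything else, so the caller can treat the value as absent.

```python
import re
from typing import Dict, Optional

# Optional sign, digits with optional decimal part, optional 'dB' in any case.
_GAIN_RE = re.compile(r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*(?:dB)?\s*$", re.IGNORECASE)
# Peaks are positive decimals with no suffix.
_PEAK_RE = re.compile(r"^\s*\+?(\d+\.?\d*|\.\d+)\s*$")

def parse_gain(text: str) -> Optional[float]:
    """Return the gain in dB, or None if the value is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None

def parse_peak(text: str) -> Optional[float]:
    """Return the peak amplitude, or None if the value is malformed."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None

def lookup(tags: Dict[str, str], key: str) -> Optional[str]:
    """Find a key regardless of capitalization."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None
```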

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equals sign).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as for ID3v2, with values stored as ASCII text in the same format as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain (in decibels) divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard—not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>
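
As a minimal sketch of this scaling step, assuming samples are held as floating-point values (the function names are illustrative, and dithering is omitted):

```python
from typing import List, Sequence

def gain_to_scale(gain_db: float) -> float:
    """Convert a gain in decibels to a linear sample multiplier."""
    return 10.0 ** (gain_db / 20.0)

def apply_gain(samples: Sequence[float], gain_db: float) -> List[float]:
    """Multiply every sample by the constant derived from the gain."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]
```

A gain of -20 dB therefore corresponds to multiplying each sample by 0.1, and 0 dB leaves the samples unchanged.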

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
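
The fallback order described above might be expressed as follows. This is a sketch: the function name, the <tt>"track"</tt>/<tt>"album"</tt> mode strings and the use of unity gain as the default are assumptions chosen for illustration.

```python
from typing import Optional

def select_gain(mode: str,
                track_gain: Optional[float],
                album_gain: Optional[float],
                default_gain: float = 0.0) -> float:
    """Pick the gain (in dB) to apply for the requested mode.

    mode is "track" or "album"; a gain of None means the metadata is missing.
    """
    if mode == "track":
        preferred, other = track_gain, album_gain
    else:
        preferred, other = album_gain, track_gain
    if preferred is not None:
        return preferred
    if other is not None:
        # Requested gain is missing: use whichever one is available.
        return other
    # Neither is stored: fall back to the default (unity gain here).
    return default_gain
```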

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
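
The formula above can be sketched directly in code. The function and parameter names are illustrative; the default peak of 1.0 follows the guidance above for files without peak metadata, and the guard for a non-positive peak is an added assumption to avoid division by zero on silent tracks.

```python
def playback_scale(rg_db: float, preamp_db: float = 0.0, peak: float = 1.0) -> float:
    """Linear scale factor combining replay gain, pre-amp and gain reduction."""
    wanted = 10.0 ** ((rg_db + preamp_db) / 20.0)
    if peak <= 0.0:
        # Assumption: a silent or invalid peak cannot clip, so no limit applies.
        return wanted
    # Never scale by more than the factor that brings the peak to full scale.
    return min(wanted, 1.0 / peak)
```

With a replay gain of +6 dB and a stored peak of 0.9, the requested factor of about 2 would clip, so the sketch returns 1/0.9 (about 1.11) instead.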

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Original ReplayGain specification

2011-05-10T15:31:29Z

Notat: /* Statistical processing */ punctuation around note

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
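
As an illustration of one stage of this cascade, the sketch below runs samples through the Butterworth component using the Table 1a coefficients (Fs=48 kHz). It assumes the sign convention in which the ''a'' feedback terms are added to the output, which is the convention under which the tabulated values form a stable high-pass filter. The "yulewalk" stage is omitted for brevity; it would use the same structure with ten delay taps, and the function name is illustrative.

```python
from typing import Iterable, List

# Table 1a coefficients (Fs = 48 kHz).
B = (0.98621192462708, -1.97242384925416, 0.98621192462708)  # b(0), b(1), b(2)
A = (1.97223372919527, -0.97261396931306)                    # a(1), a(2)

def butterworth_highpass(samples: Iterable[float]) -> List[float]:
    """Apply the 2nd order high-pass stage to a sequence of samples."""
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for x in samples:
        # Assumption: feedback (a) terms are added, matching the table's signs.
        y = B[0] * x + B[1] * x1 + B[2] * x2 + A[0] * y1 + A[1] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

Under this convention a constant (DC) input decays towards zero at the output, while a signal alternating at the Nyquist frequency passes at close to unity gain.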
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
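
The block-wise calculation and the decibel conversion can be sketched as follows. This assumes filtered samples stored as floating-point values in the range -1.0 to 1.0 (so the peak-to-peak range is 2.0), equal-length channels, and that a trailing partial block is discarded; the function name and those choices are illustrative rather than mandated by the specification.

```python
import math
from typing import List, Sequence

def block_levels_db(channels: Sequence[Sequence[float]],
                    sample_rate: int,
                    block_ms: int = 50,
                    peak_to_peak: float = 2.0) -> List[float]:
    """Return one level in dB per 50 ms block of the (filtered) signal."""
    block_len = int(sample_rate * block_ms / 1000)
    total_len = len(channels[0])
    levels = []
    for start in range(0, total_len - block_len + 1, block_len):
        # Steps 1 and 2: square every sample, then average over all channels.
        total = 0.0
        for channel in channels:
            total += sum(s * s for s in channel[start:start + block_len])
        mean_square = total / (block_len * len(channels))
        # Step 3: the square root gives the RMS value of the block.
        rms = math.sqrt(mean_square)
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))  # digital silence
    return levels
```

A full-scale sine wave has an RMS value of about 0.707, so each of its blocks measures roughly -3 dB with this formula.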

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is 95%,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.
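
The selection step can be sketched as below. The exact indexing convention for "95%" of the sorted list is an assumption here (integer arithmetic, rounding down), as is the function name.

```python
from typing import List, Sequence

def overall_loudness(block_levels_db: Sequence[float], percent: int = 95) -> float:
    """Pick the block level found 95% of the way up the sorted list."""
    if not block_levels_db:
        raise ValueError("no block levels to analyse")
    ordered: List[float] = sorted(block_levels_db)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) - 1, len(ordered) * percent // 100)
    return ordered[index]
```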

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20 dB system headroom. The -14 dB headroom used by ReplayGain is 6 dB higher and therefore corresponds to an 89 dB SPL playback level on a SMPTE calibrated system, so ReplayGain is said to be operating with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So ReplayGain calibrates to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
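
A short sketch of this calculation, with hypothetical loudness figures chosen only to show the sign of the result (the function name is illustrative):

```python
def replay_gain_db(reference_loudness_db: float, file_loudness_db: float) -> float:
    """RG = L_n14 - L, with both loudness values measured the same way."""
    return reference_loudness_db - file_loudness_db

# Hypothetical measurements: a file 6 dB louder than the reference is attenuated,
# and a file 6 dB quieter than the reference is boosted.
loud_file_gain = replay_gain_db(-14.0, -8.0)    # negative: attenuation
quiet_file_gain = replay_gain_db(-14.0, -20.0)  # positive: gain
```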

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single value can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, whether the excursion is positive or negative. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
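
For signed integer PCM, the scan could look like the sketch below. It assumes that full scale is 2 to the power of (bits - 1), i.e. 32768 for 16-bit audio; the specification does not state which integer value maps to 1.0, so this divisor and the function name are assumptions.

```python
from typing import Sequence

def peak_amplitude(channels: Sequence[Sequence[int]], bits_per_sample: int = 16) -> float:
    """Largest absolute sample on any channel, scaled so full scale is 1.0."""
    # Assumption: full scale is 2**(bits - 1), e.g. 32768 for 16-bit samples.
    full_scale = float(1 << (bits_per_sample - 1))
    peak = max(abs(sample) for channel in channels for sample in channel)
    return peak / full_scale
```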

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
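A tolerant parser along these lines might look like the following Python sketch. The regular expressions and function names are illustrative assumptions, not part of the specification. They accept a leading '+', extra or missing digits, leading zeros, surrounding spaces and a missing or oddly capitalized 'dB' suffix, and they return <tt>None</tt> for anything else so that the caller can treat the value as absent.

```python
import re

# Optional sign, digits with optional fraction, optional 'dB' in any case.
_GAIN_RE = re.compile(
    r"^\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))\s*(?:dB)?\s*$", re.IGNORECASE
)
# Peaks are positive decimals with no suffix.
_PEAK_RE = re.compile(r"^\s*\+?(\d+(?:\.\d*)?|\.\d+)\s*$")

def parse_gain(text):
    """Return the gain in dB as a float, or None if the text is unusable."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None

def parse_peak(text):
    """Return the peak level as a float, or None if the text is unusable."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None

def find_value(tags, key):
    """Look up a key in a dict of tags, ignoring capitalization of the key."""
    for name, value in tags.items():
        if name.upper() == key.upper():
            return value
    return None
```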

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equal character).
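As a rough illustration, already-decoded comment strings could be split into keys and values as follows. This Python sketch assumes the comments have been extracted from the file by other means; the function name is hypothetical. Keys are folded to upper case because Vorbis comment field names are case-insensitive.

```python
def parse_vorbis_comments(comments):
    """Split 'key=value' strings into a dict keyed by upper-case field name."""
    tags = {}
    for comment in comments:
        # Only the first '=' separates key from value.
        key, separator, value = comment.partition("=")
        if separator:
            tags[key.upper()] = value
    return tags

tags = parse_vorbis_comments([
    "REPLAYGAIN_TRACK_GAIN=-6.50 dB",
    "replaygain_track_peak=0.987654",
])
```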

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and the same value format as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
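The selection and fallback rules above can be sketched in Python as follows. The function names and the choice of unity gain as the default are assumptions made for illustration; a player could equally use an average of other gains as its default.

```python
def select_gain_db(track_gain, album_gain, mode="track", default_db=0.0):
    """Pick the gain for the requested mode, falling back to the other one.

    track_gain and album_gain are floats in dB, or None when the metadata
    is missing. default_db is used when neither value is available.
    """
    if mode == "album":
        preferred, fallback = album_gain, track_gain
    else:
        preferred, fallback = track_gain, album_gain
    if preferred is not None:
        return preferred
    if fallback is not None:
        return fallback
    return default_db

def scale_factor(gain_db):
    """Linear multiplier applied to every sample: 10^(gain/20)."""
    return 10.0 ** (gain_db / 20.0)
```

For example, with only an album gain of -7 dB stored and Track mode requested, <tt>select_gain_db(None, -7.0, "track")</tt> returns -7.0, and each sample is then multiplied by <tt>scale_factor(-7.0)</tt>.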

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
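The equivalence between adding gains in decibels and multiplying linear scale factors can be checked with a few lines of Python. The function names are hypothetical and the gain values are arbitrary examples.

```python
def db_to_scale(db):
    """Convert a gain in dB to a linear scale factor."""
    return 10.0 ** (db / 20.0)

def combined_scale(replay_gain_db, preamp_db=0.0):
    """Add the gains in dB first, then convert once to a linear factor."""
    return db_to_scale(replay_gain_db + preamp_db)

# A +6 dB pre-amp setting is a factor of roughly 2.
six_db = db_to_scale(6.0)

# Combining in dB matches multiplying the two linear factors.
one_step = combined_scale(-8.0, 6.0)
two_steps = db_to_scale(-8.0) * db_to_scale(6.0)
```

Combining the gains before conversion means each sample is multiplied only once.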

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower the average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type or nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
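The formula above translates directly into code. The following Python sketch uses a hypothetical function name and assumes a peak level of 1.0 when no peak metadata is available, as described earlier in this section. The guard for a non-positive peak is an added assumption to avoid division by zero on silent tracks.

```python
def playback_scale(gain_db, preamp_db=0.0, peak=None):
    """Overall linear scale with clipping prevention by gain reduction."""
    scale = 10.0 ** ((gain_db + preamp_db) / 20.0)
    if peak is None:
        peak = 1.0  # no peak metadata: assume the track reaches full scale
    if peak <= 0.0:
        return scale  # silent track cannot clip (assumption)
    # Never scale by more than the factor that brings the peak to full scale.
    return min(scale, 1.0 / peak)
```

For a track with a peak of 1.0 and a replay gain of +6 dB, the result is limited to 1.0. With a peak of 0.25, the full factor of about 2 can be applied without clipping.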

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Original ReplayGain specification

2011-05-10T15:29:06Z

Notat: /* Loudness filter */ legend

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
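A minimal Python sketch of one filter stage is given below, using the Butterworth coefficients from Table 1a. It assumes that the feedback terms are ''added'', i.e. <tt>y[n] = b(0)x[n] + b(1)x[n-1] + ... + a(1)y[n-1] + a(2)y[n-2] + ...</tt>, which is the reading under which the tabulated signs give a stable high-pass response. Figure 2 remains the authoritative description of the topology. The function and constant names are hypothetical.

```python
def iir_filter(samples, b, a):
    """Run samples through one IIR stage.

    b holds b(0)..b(N) and a holds a(1)..a(N), with signs as tabulated.
    Feedback terms are added (assumption, see the lead-in above).
    """
    order = len(a)
    x_hist = [0.0] * (order + 1)  # x[n], x[n-1], ..., x[n-N]
    y_hist = [0.0] * order        # y[n-1], ..., y[n-N]
    out = []
    for x in samples:
        x_hist = [x] + x_hist[:-1]
        y = sum(bk * xk for bk, xk in zip(b, x_hist))
        y += sum(ak * yk for ak, yk in zip(a, y_hist))
        y_hist = [y] + y_hist[:-1]
        out.append(y)
    return out

# Table 1a: Butterworth high-pass stage, Fs = 48 kHz.
BUTTER_48K_B = [0.98621192462708, -1.97242384925416, 0.98621192462708]
BUTTER_48K_A = [1.97223372919527, -0.97261396931306]

# A constant (DC) input is rejected; a Nyquist-rate input passes unchanged.
dc_out = iir_filter([1.0] * 5000, BUTTER_48K_B, BUTTER_48K_A)
nyquist_out = iir_filter([(-1.0) ** n for n in range(5000)],
                         BUTTER_48K_B, BUTTER_48K_A)
```

The cascade is obtained by feeding the output of the "yulewalk" stage (called with the coefficients from Table 2) into the Butterworth stage, or vice versa.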
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
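The block-wise RMS calculation and the decibel conversion can be sketched in Python as follows. The function name is hypothetical, and samples are assumed to be floating-point values in the range -1.0 to 1.0, so that the peak-to-peak range is 2.0. A trailing partial block is simply dropped in this sketch, which is an implementation choice rather than something the text above specifies.

```python
import math

def block_levels_db(channels, sample_rate, peak_to_peak=2.0, block_seconds=0.05):
    """Return one level in dB per 50 ms block.

    channels is a list of equal-rate sample lists (one per channel).
    Squared samples from all channels are averaged together.
    """
    block_len = int(round(sample_rate * block_seconds))
    length = min(len(ch) for ch in channels)
    levels = []
    for start in range(0, length - block_len + 1, block_len):
        total = 0.0
        for ch in channels:
            for x in ch[start:start + block_len]:
                total += x * x                      # step 1: square
        mean_square = total / (block_len * len(channels))  # step 2: mean
        rms = math.sqrt(mean_square)                # step 3: root
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))            # digital silence
    return levels

# A full-scale 1 kHz sine at 48 kHz, 100 ms long (two blocks).
sine = [math.sin(2.0 * math.pi * 1000.0 * n / 48000.0) for n in range(4800)]
mono_levels = block_levels_db([sine], 48000)
# The same sine with inverted polarity on the second channel.
stereo_levels = block_levels_db([sine, [-x for x in sine]], 48000)
```

Both calls give about -3.01 dB per block. The out-of-phase stereo signal is not reduced to silence, because the channels are squared before they are combined.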

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is 95%<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref>, so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
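Selecting the 95% value from the sorted block levels can be sketched in Python as shown below. The function name is hypothetical, and the exact indexing convention (integer arithmetic, clamped to the last element) is an assumption; implementations may instead use a histogram of levels.

```python
def loudness_from_blocks(levels_db, percent=95):
    """Sort the block levels and pick the value at the given percentile."""
    if not levels_db:
        raise ValueError("no blocks to analyse")
    ordered = sorted(levels_db)
    index = min(len(ordered) - 1, (len(ordered) * percent) // 100)
    return ordered[index]

# Speech-like example: half the blocks are near silence.
speech_like = [-60.0] * 50 + [-10.0] * 50
loudness = loudness_from_blocks(speech_like)
```

In this example the result is -10.0, the level of the louder blocks, even though half of the blocks are at -60 dB.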

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20 dB system headroom. The -14 dB headroom used by ReplayGain is 6 dB higher and therefore corresponds to an 89 dB SPL playback level on a SMPTE-calibrated system, so ReplayGain is said to be operating with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So ReplayGain calibrates to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
ReplayGain achieves loudness-compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
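As a small worked example in Python, with made-up loudness values that are not measurements of any real file:

```python
def replay_gain_db(reference_loudness_db, file_loudness_db):
    """RG = L_n14 - L, with both loudness values measured the same way."""
    return reference_loudness_db - file_loudness_db

# Hypothetical measurements: the file is 6 dB louder than the reference.
loud_file_gain = replay_gain_db(-14.0, -8.0)
# Hypothetical measurements: the file is 6 dB quieter than the reference.
quiet_file_gain = replay_gain_db(-14.0, -20.0)
```

The first file receives a gain of -6 dB (attenuation) and the second a gain of +6 dB.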

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, whether a positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
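For 16-bit integer samples, the peak scan might be sketched in Python as follows. The function name is hypothetical, and treating 32768 as digital full scale is an assumption of this sketch.

```python
def peak_amplitude(channels, full_scale=32768):
    """Largest absolute sample value on any channel, scaled so that
    digital full scale corresponds to 1.0."""
    peak = 0
    for ch in channels:
        for sample in ch:
            peak = max(peak, abs(sample))
    return peak / full_scale

track_peaks = [
    peak_amplitude([[0, 16384, -100], [-8192, 5, 0]]),
    peak_amplitude([[-32768, 12], [40, -7]]),
]
# The album peak is the largest of the track peaks.
album_peak = max(track_peaks)
```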

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Audio Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is the use of ''TXXX'' key/value pair frames. Two legacy schemes for storing ReplayGain metadata also exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID       $54 58 58 58 ("TXXX")
 Size           $xx xx xx xx (size of frame excluding this header)
 Flags          $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equal character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and the same value format as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>
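
The scaling step can be sketched as follows. This is a minimal illustration in Python, not part of the specification: the helper names <tt>gain_to_scale</tt> and <tt>apply_gain</tt> are hypothetical, and samples are assumed to be floating-point values where 1.0 is digital full scale.

```python
def gain_to_scale(gain_db):
    """Convert a gain in decibels to a linear multiplier: 10 ** (gain / 20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Multiply every sample by the constant derived from the gain."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]


# A gain of about -6.02 dB roughly halves each sample value.
print(apply_gain([0.5, -0.25, 1.0], -6.02))
```

A real player would apply the same multiplier to whatever sample representation its audio pipeline uses, and would dither before truncating to the output bit depth.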

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of 10<sup>6/20</sup>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
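
As a small sketch of combining the two adjustments (Python; the function name <tt>combined_scale</tt> and the example gains are made up for illustration), the gains are added in decibels and converted to a linear multiplier once:

```python
def combined_scale(replay_gain_db, preamp_db=0.0):
    """Add the gains in dB, then convert once to a linear multiplier."""
    return 10.0 ** ((replay_gain_db + preamp_db) / 20.0)


# Adding decibels is equivalent to multiplying the separate linear factors.
separate = (10.0 ** (-8.5 / 20.0)) * (10.0 ** (6.0 / 20.0))
print(combined_scale(-8.5, 6.0), separate)
```

The default of 0 dB for the pre-amp mirrors the recommendation above that the setting should perform no adjustment unless the user changes it.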

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower the average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
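
The formula above can be sketched in Python as shown below. The function name <tt>playback_scale</tt> is hypothetical. The default peak of 1.0 follows the rule for missing peak metadata; the handling of a zero peak (digital silence) is an assumption added only to avoid a division by zero.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=1.0):
    """Return min(10 ** ((RG + pre-amp) / 20), 1 / peak)."""
    wanted = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if peak <= 0.0:
        # Assumption: a silent track cannot clip, so no limit is applied.
        return wanted
    return min(wanted, 1.0 / peak)


# Hypothetical track: +6 dB requested, but the stored peak is 0.9,
# so the scale is capped at 1 / 0.9 to keep the peak at full scale.
print(playback_scale(6.0, 0.0, 0.9))
```

When album-style normalization is in use, the album gain would normally be paired with the album peak, so that every track of the album receives the same scale factor.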

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Original ReplayGain specification

2011-05-10T13:01:08Z

Notat: /* Acknowledgements */ update link to original proposal

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz. The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
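
A minimal Python sketch of the cascade is given below. It is an illustration only: the function names are hypothetical, and it assumes that the feedback terms of the topology in figure 2 are added, i.e. y[n] = Σ b(k)·x[n−k] + Σ a(k)·y[n−k], which is what the signs of the ''a'' coefficients in tables 1 and 2 appear to imply. Only the Butterworth coefficients from table 1a are reproduced; the "yulewalk" coefficients from table 2 would be passed in the same way.

```python
def iir_filter(x, b, a):
    """Filter the sample list x.

    b = [b(0), ..., b(N)] are the feed-forward coefficients and
    a = [a(1), ..., a(N)] are the feedback coefficients, assumed to be ADDED:
    y[n] = sum(b[k] * x[n-k]) + sum(a[k] * y[n-k]).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


# Butterworth high-pass coefficients from Table 1a (Fs = 48 kHz).
BUTTER_B_48K = [0.98621192462708, -1.97242384925416, 0.98621192462708]
BUTTER_A_48K = [1.97223372919527, -0.97261396931306]


def loudness_filter(x, yule_b, yule_a, butter_b, butter_a):
    """Run samples through the "yulewalk" stage, then the Butterworth stage."""
    return iir_filter(iir_filter(x, yule_b, yule_a), butter_b, butter_a)


# Sanity check of the high-pass stage: a constant (DC) input decays to zero.
dc_out = iir_filter([1.0] * 5000, BUTTER_B_48K, BUTTER_A_48K)
print(abs(dc_out[-1]) < 1e-6)
```

The nested loops favor clarity over speed; a production scanner would typically keep a short history buffer per channel instead of indexing back into the full signal.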
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>
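
The block-wise RMS calculation can be sketched in Python as follows. The function name <tt>block_rms</tt> is hypothetical, and dropping a trailing partial block is an assumption of this sketch, since the text does not say how the final incomplete block should be treated.

```python
import math


def block_rms(channels, sample_rate, block_ms=50):
    """Return one RMS value per 50 ms block.

    channels is a list of equal-length sample lists, one per channel.
    Squared samples from all channels are averaged together, so
    out-of-phase content does not cancel.
    """
    block_len = int(sample_rate * block_ms / 1000)
    total = len(channels[0])
    values = []
    # Assumption: a trailing partial block is ignored.
    for start in range(0, total - block_len + 1, block_len):
        squares = [
            s * s
            for ch in channels
            for s in ch[start:start + block_len]
        ]
        values.append(math.sqrt(sum(squares) / len(squares)))
    return values


# Two channels carrying opposite signals still measure an RMS of 0.5.
print(block_rms([[0.5] * 4800, [-0.5] * 4800], 48000))
```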

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
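
As a short sketch of this conversion (Python; the name <tt>rms_to_db</tt> is hypothetical), assuming floating-point samples that range from −1.0 to +1.0 so that the peak-to-peak range is 2.0. Returning negative infinity for silence is an assumption of this sketch.

```python
import math


def rms_to_db(rms, peak_to_peak=2.0):
    """L = 20 * log10(2 * L_RMS / L_p-p), in decibels."""
    if rms <= 0.0:
        # Assumption: a silent block is reported as minus infinity.
        return float("-inf")
    return 20.0 * math.log10(2.0 * rms / peak_to_peak)


# A full-scale sine wave has an RMS of 1/sqrt(2), which is about -3.01 dB.
print(round(rms_to_db(1.0 / math.sqrt(2.0)), 2))
```

For integer samples, <tt>peak_to_peak</tt> would be set to the full range of the sample format instead.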

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is 95%<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref>, so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
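
The selection of the 95% value can be sketched in Python as below. The function name <tt>overall_loudness</tt> is hypothetical, and the exact rule for turning 95% into a list index is an assumption, since the text only says to pick a value near the top of the sorted list.

```python
def overall_loudness(block_levels_db, percentile=95):
    """Sort the per-block levels and return the value 95% of the way up."""
    if not block_levels_db:
        raise ValueError("no blocks to analyse")
    ordered = sorted(block_levels_db)
    # Assumption: integer index rounding down, clamped to the last element.
    index = min(percentile * len(ordered) // 100, len(ordered) - 1)
    return ordered[index]


# With 100 blocks at levels 0..99, the 95% value is the block at level 95.
print(overall_loudness(list(range(100))))
```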

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20 dB system headroom. The -14 dB headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE calibrated system and so is said to be operating with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So ReplayGain calibrates to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
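
A worked example of the calculation, as a Python sketch. The function name <tt>replay_gain_db</tt> is hypothetical and the loudness figures are made-up numbers chosen only to show the sign convention; they are not measurements of the actual pink noise reference.

```python
def replay_gain_db(reference_loudness_db, file_loudness_db):
    """RG = L_n14 - L, with all quantities in decibels."""
    return reference_loudness_db - file_loudness_db


# Hypothetical measurements: the file is 8 dB louder than the reference,
# so the replay gain is negative (an attenuation of 8 dB).
print(replay_gain_db(-20.0, -12.0))

# A file that is 5 dB quieter than the reference gets a positive gain.
print(replay_gain_db(-20.0, -25.0))
```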

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, whether the excursion is positive or negative. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
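
A minimal Python sketch of the peak scan for uncompressed audio follows. The function name <tt>peak_amplitude</tt> is hypothetical, and the sketch assumes signed integer PCM where full scale is taken to be 2<sup>bits−1</sup> (32768 for 16-bit audio).

```python
def peak_amplitude(channels, bits_per_sample=16):
    """Largest absolute sample on any channel, scaled so full scale is 1.0."""
    # Assumption: signed integer PCM, full scale = 2 ** (bits - 1).
    full_scale = float(2 ** (bits_per_sample - 1))
    peak = max((abs(s) for ch in channels for s in ch), default=0)
    return peak / full_scale


# The negative excursion of -32768 on the first channel sets the peak.
print(peak_amplitude([[0, 16384, -32768], [100, -200, 300]]))
```

The album peak would be the largest of the track peaks across the album.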

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Audio Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID       $54 58 58 58 ("TXXX")
 Size           $xx xx xx xx (size of frame excluding this header)
 Flags          $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>
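
The layout above can be sketched in Python as shown below. The function name <tt>build_txxx_frame</tt> is hypothetical, and the sketch assumes the ID3v2.3 convention of a plain 32-bit big-endian frame size (ID3v2.4 encodes frame sizes differently). The example key and value follow the formats given in Table 3.

```python
import struct


def build_txxx_frame(key, value):
    """Build one TXXX frame: a 10-byte header followed by the frame data."""
    data = (
        b"\x00"                            # text encoding: ISO-8859-1
        + key.encode("latin-1") + b"\x00"  # description, null-terminated
        + value.encode("latin-1")          # value string
    )
    header = (
        b"TXXX"                            # frame ID $54 58 58 58
        + struct.pack(">I", len(data))     # size, excluding this header
        + b"\x40\x00"                      # flags: discard if audio is altered
    )
    return header + data


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.54 dB")
print(frame)
```

This produces a single frame only; writing it into a file also requires a valid ID3v2 tag header and correct tag size, which are outside the scope of this sketch.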

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
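
The value formats can be sketched in Python as follows. The function names are hypothetical; normalizing a rounded value of negative zero to "0.00 dB" is an assumption of this sketch rather than something the text requires.

```python
def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB' (no prefix for positive gains)."""
    rounded = round(gain_db, 2)
    if rounded == 0:
        rounded = 0.0  # assumption: avoid writing "-0.00 dB"
    return f"{rounded:.2f} dB"


def format_peak(peak):
    """Format a peak amplitude as 'c.dddddd' with no suffix."""
    return f"{peak:.6f}"


print(format_gain(-6.543))
print(format_peak(0.98765432))
```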

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
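
One way a tolerant parser might be sketched in Python is shown below. The function names and regular expressions are illustrative assumptions, not part of the specification; values that do not match are reported as <tt>None</tt> so the caller can treat the metadata as missing.

```python
import re

# Optional sign, digits with optional fraction, optional "dB" in any case.
_GAIN_RE = re.compile(
    r"^\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))\s*(?:db)?\s*$", re.IGNORECASE
)
# Peaks are positive decimals with no suffix.
_PEAK_RE = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB as a float, or None if the value is unusable."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak as a float, or None if the value is unusable."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


def find_value(tags, key):
    """Look up key in a dict of tags, ignoring capitalization of keys."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None


print(parse_gain("+3.1dB"), parse_gain(" 02.500 db"), parse_gain("loud"))
```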

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. The Vorbis comment specification requires keys and values to be separated by '=' (the equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and value formats as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
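
The fallback rule can be sketched in Python as below. The function name <tt>choose_gain</tt> is hypothetical, and missing metadata is represented as <tt>None</tt>. The default of unity gain (0 dB) is one of the choices mentioned above; a player could pass in a playlist average instead.

```python
def choose_gain(mode, track_gain_db=None, album_gain_db=None, default_db=0.0):
    """Pick the gain to apply; mode is "track" or "album".

    Gains are None when the corresponding metadata is missing.
    """
    if mode == "track":
        preferred, other = track_gain_db, album_gain_db
    else:
        preferred, other = album_gain_db, track_gain_db
    if preferred is not None:
        return preferred
    if other is not None:
        # The requested gain is missing, so use the one that is available.
        return other
    return default_db


# The user asked for track gain, but only album gain is stored.
print(choose_gain("track", None, -7.25))
```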

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
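
As a minimal sketch of combining the two adjustments (the function name <code>combined_scale</code> is hypothetical), the gains can be added in decibels and converted to a single linear factor, which gives the same result as multiplying the two linear factors:

```python
def combined_scale(replay_gain_db, preamp_db=0.0):
    """Linear scale factor for replay gain plus pre-amp gain.

    Adding gains in dB is equivalent to multiplying the linear factors.
    """
    return 10.0 ** ((replay_gain_db + preamp_db) / 20.0)


# A +6 dB pre-amp on its own is a factor of roughly 2.
six_db = combined_scale(0.0, 6.0)

# An 11 dB reduction with a +6 dB pre-amp nets out to -5 dB.
net = combined_scale(-11.0, 6.0)
```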

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^\frac{RG + G_\text{pre-amp}}{20}, \frac{1}{\text{peak amplitude}} \right)</math>
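
The formula above can be sketched in code as follows. The function name <code>playback_scale</code> is hypothetical, peak amplitude is assumed to be expressed relative to full scale (1.0), and the handling of a zero peak is an assumption not covered by the text. Which peak value to pass (track or album) would follow whichever gain mode is in use.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=None):
    """Overall linear scale factor with clipping prevention by gain reduction.

    peak is the stored peak amplitude relative to full scale (1.0 = full scale).
    If peak metadata is unavailable (None), a peak of 1.0 is assumed.
    """
    if peak is None:
        peak = 1.0
    wanted = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if peak <= 0.0:
        # Assumption: a completely silent track cannot clip, so no limit applies.
        return wanted
    # Never scale so far that the loudest sample would exceed full scale.
    return min(wanted, 1.0 / peak)
```

With a +6 dB gain and a peak of 1.0, the result is limited to 1.0 (no boost at all); with a peak of 0.25 the full +6 dB boost is applied because the largest safe factor is 4.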

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions, and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Replay Gain

2011-04-15T15:04:21Z

Notat: new specification is complete

'''Replay Gain''' is the name of a technique invented to achieve the same perceived playback loudness of audio files. It defines an algorithm to measure the '''perceived''' loudness of audio data.

Replay Gain allows the loudness of each song within a collection of songs to be consistent. This is called 'Track Gain' (or 'Radio Gain' in earlier parlance). It also allows the loudness of a specific sub-collection (an "album") to be consistent with the rest of the collection, while allowing the dynamics from song to song on the album to remain intact. This is called 'Album Gain' (or 'Audiophile Gain' in earlier parlance). This is especially important when listening to classical music albums, because quiet tracks need to remain a certain degree quieter than the louder ones.

Replay Gain is different from [[Normalization|peak normalization]]. Peak normalization merely ensures that the peak amplitude reaches a certain level. This does not ensure equal loudness. The Replay Gain technique measures the ''effective power'' of the waveform (i.e. the RMS power after applying an "equal loudness contour"), and then adjusts the amplitude of the waveform accordingly. The result is that Replay Gained waveforms are usually more uniformly amplified than peak-normalized waveforms.

==Target loudness==
The target loudness of almost all Replay Gain utilities is 89 dB SPL (an early departure from the proposal, endorsed by its author<ref>[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=83397&view=findpost&p=721854 Does Replay gain work differtly in Media monkey]</ref>); the Replay Gain proposal and SMPTE recommendation are 6 dB lower.<ref>[http://www.mars.org/mailman/public/mad-dev/2004-February/000993.html Replay Gain discussion at mad-dev]</ref>

==Clipping==
Audio is generally recorded such that the loudest sounds don't clip, but the use of Replay Gain can cause clipping if the average volume of a song is below the target level. That is, upon playback, the volume of a quiet song is increased, so the parts of the song with above-average loudness, especially in the bass frequencies, will exceed the limits of the format and will be distorted. Whether this distortion is audible depends on the sounds in question, and the listener's sensitivity.

Implementations deal with the risk of clipping in different ways. Some have a "pre-amp" feature which reduces (or boosts) the original audio's level by a certain amount before doing whatever is needed for Replay Gain. Some have a "prevent clipping" feature to reduce the amount of Replay Gain adjustment to whatever amount would keep clipping from occurring, based on peak info stored in the file's metadata (thus reducing the effectiveness of Replay Gain). Some recommend using a compressor/limiter DSP to prevent or reduce clipping, regardless of whether it was caused by Replay Gain.

== Implementations ==
There are different Replay Gain implementations, each with its own uses and strengths. Most use [[metadata]] to indicate the level of the volume change that the player should make. Some modify the audio data itself, and optionally use metadata as well. There are advantages and disadvantages to both methods.

In the metadata method, information on both types of Replay Gain (Track Gain and Album Gain) can be stored. The volume-change information can be very precise. If audio data was also changed, the metadata can contain "undo" info. Not all audio players/decoders know how to read and use Replay Gain information stored in metadata. And there's no standard for where and how Replay Gain info is stored; each implementation uses different formats and puts the info in different locations.

In the audio data method, the file's actual audio data is modified so that its natural/default playback volume is at the target level. In this scenario, only one type of Replay Gain (Track Gain or Album Gain) can be applied. If no "undo" info is saved somewhere, it may not be possible to restore the original audio data. Limitations of the audio file format may prevent precise (finely tuned) gain adjustments with this method. For example, MP3 and AAC files can only be losslessly modified in 1.5 dB steps. Depending on the audio file format, the process may also be lossy in the sense that it could irreversibly push a signal above the format's maximum amplitude (resulting in clipping) or below the minimum (resulting in silence).
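
To illustrate the precision limit of the audio data method, the following sketch rounds a desired gain to the nearest whole number of 1.5 dB steps and reports the leftover error. The function name <code>quantize_gain</code> and the choice of rounding to the nearest step are assumptions made for this example; a real tool may round differently.

```python
STEP_DB = 1.5  # step size stated above for lossless MP3 and AAC gain changes


def quantize_gain(desired_db, step_db=STEP_DB):
    """Round a desired gain to a whole number of steps.

    Returns (steps, applied_db, residual_db), where residual_db is the part
    of the desired gain that could not be applied to the audio data.
    """
    steps = round(desired_db / step_db)
    applied_db = steps * step_db
    return steps, applied_db, desired_db - applied_db


steps, applied_db, residual_db = quantize_gain(-6.4)
```

Here a desired change of -6.4 dB becomes -4 steps (-6.0 dB), leaving about -0.4 dB that could only be expressed in metadata.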

=== MP3Gain ===
[[MP3Gain]] is an implementation of Replay Gain. It can be used to just analyze files & recommend changes or to also modify the gain. If modifying the gain, it always modifies the global gain fields in the MP3 audio data. It can add somewhat precise metadata, including undo info. The gain can be modified to any target dB, or it can be changed by a specified amount. For balance correction, user-specified changes can even be made on just one channel in simple L/R stereo-mode files (not joint stereo).

* Format: [[MP3]]
* Method: Audio + Meta (in APE tag), or Audio only
* APE tag fields (ASCII bytes):
** <code>MP3GAIN_MINMAX ###,###</code> - minimum & maximum global gain values for this file. 3 digits, zero-padded if necessary.
** <code>MP3GAIN_ALBUM_MINMAX ###,###</code> - minimum & maximum global gain values across a set of files scanned as an album. Optional.
** <code>MP3GAIN_UNDO +###,+###,N</code> - the global gain adjustment to restore the original values in the left and right channels, respectively, followed by an indicator of whether to wrap at the extremes (<code>N</code> means no, <code>W</code> means yes). The adjustment values are 3 digits, zero-padded, preceded by a sign (<code>+</code> or <code>-</code>).
** <code>REPLAYGAIN_TRACK_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Examples: <code>+0.424046</code> and <code>-10.38500</code>
** <code>REPLAYGAIN_TRACK_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Example: <code>0.149923</code>
** <code>REPLAYGAIN_ALBUM_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Optional.
** <code>REPLAYGAIN_ALBUM_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Optional.
* Limitations: Although the metadata, if written, contains precise adjustment & peak values, the audio data modifications are limited to 1.5dB steps and may become irreversible (however, that's a very rare condition; see the [http://www.hydrogenaudio.org/forums/lofiversion/index.php/t34154.html "mp3gain is NOT lossless" forum thread])
* http://mp3gain.sourceforge.net/
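
A reader of these tag fields might parse them along the following lines. This is a sketch based only on the field layouts listed above; the function names are hypothetical, the parser is deliberately more lenient than the fixed-width formats, and the undo strings used in the usage note are made-up illustrations rather than values from a real file.

```python
import re

# A signed or unsigned decimal number followed by the "dB" unit.
_GAIN_RE = re.compile(r"^\s*([+-]?\d+(?:\.\d+)?)\s*dB\s*$", re.IGNORECASE)


def parse_gain(value):
    """Parse a REPLAYGAIN_*_GAIN value such as '+0.424046 dB' into a float."""
    match = _GAIN_RE.match(value)
    if match is None:
        raise ValueError("not a gain value: %r" % value)
    return float(match.group(1))


def parse_undo(value):
    """Parse an MP3GAIN_UNDO value into (left_steps, right_steps, wrap)."""
    left, right, wrap = value.split(",")
    if wrap not in ("N", "W"):
        raise ValueError("unknown wrap indicator: %r" % wrap)
    return int(left), int(right), wrap == "W"
```

For instance, <code>parse_gain("-10.38500 dB")</code> yields -10.385, and a hypothetical undo string <code>"+002,+002,N"</code> yields two steps per channel with no wrapping.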

=== AACGain ===
[[AACGain]] is a modified version of MP3Gain that works on both MP3 and AAC files.

* Format: [[MP3]], [[AAC]] (with or without MP4 container)
* Method: Audio + Meta, or Audio only
* Limitations: Limited to 1.5 dB steps; may become irreversible (same caveat as for MP3Gain)
* http://altosdesign.com/aacgain/

=== [[LAME]] ===
* Method: Header ([http://gabriel.mp3-tech.org/mp3infotag.html mp3infotag])
* Notes:
** Tags added during encoding; not supported by any player yet; Track Gain only
** Replay Gaining of MP3s is usually done using MP3Gain (see [[Replay Gain#MP3Gain|above]]) or [[Replay Gain#foobar2000 Replay Gain scanner|foobar2000]]
* http://lame.sourceforge.net/

=== [[Musepack]] Replay Gain ===
* Method: Header (similar to Meta data method)
* Notes: Replay Gain values are stored in the header and Replay Gain is part of the Musepack specifications; therefore any Musepack decoder that does not support Replay Gain can be considered broken.
* http://www.musepack.net/

=== VorbisGain ===
* Format: (Ogg) [[Vorbis]]
* Method: Meta (in [[Vorbis comment]])
* http://www.sjeng.org/vorbisgain.html
** new compiles of VorbisGain at [http://www.rarewares.org/ogg.html www.rarewares.org]
:'''''Note:''' Andavari has provided a very useful script to integrate VorbisGain, which is a CLI tool, into Windows Explorer. Please check the (Ogg) [[Vorbis#Replay Gain|Vorbis Replay Gain section]].''

=== FLAC / METAFLAC ===
* Format: [[Free Lossless Audio Codec|FLAC]]
* Method: Meta (in [[Vorbis comment]])
* http://flac.sf.net

=== WavPack / WVGAIN ===
* Format: [[WavPack]]
* Method: Meta (in [[APEv2]] tag)
* http://www.wavpack.com

=== Wavegain ===
* Format: waveform
* Method: Audio
* Limitations: Irreversible
* http://www.rarewares.org/files/others/wavegain.zip

=== [[foobar2000]] Replay Gain scanner ===
* Format:
** [[MP3]]: Values written to [[ID3v2]] (default) or [[APEv2]] tags. A separate function can be invoked to apply the tagged Track or Album Gain to the MP3 global gain fields (as MP3Gain does, but requiring tags first), and to rewrite the tags to account for the peak change and compensate for the difference from 89 dB. The 89 dB reference level for tags isn't configurable, but the reference level applied to the global gain fields is (it's under Preferences > Advanced > Tools > ReplayGain Scanner > Target MP3 alteration volume level).
** [[Musepack]]: Values written to header.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags.
** [[MP4]]: Uses its own iTunes-compatible tagging system (though iTunes does not support Replay Gain).
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** Modules ([[MOD]] etc.): Optionally saved into [[APEv2]] tags.
* http://foobar2000.org

=== [[MediaMonkey]] ===
* Format:
** [[MP3]]: Values written to [[APEv2]] or [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in MediaMonkey's MDB database.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[WAV]]: Values stored in MediaMonkey's MDB database.
** [[MPC]]: Internal gain Structure.
* In addition to tags, all Replay Gain values are also stored in MediaMonkey's MDB database
* Album/Audiophile Replay Gain not supported until v3.0 (Dec 2007); support during burning & ripping added in 3.1 (Jun 2009)
* Also capable of (irreversibly) changing the volume of MP3 tracks, similar to [[MP3Gain]]
* http://www.mediamonkey.com/

=== [[Winamp]] Replay Gain scanner===
* Format:
** [[MP3]]: Values written to [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in Windows Media Audio tags.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags.
** [[MP4]]
** [[TAK]]: Values written to [[APEv2]] tags.
* Supports Album/Track Gain

== Players support ==
Because Replay Gain is present in the specs of the FLAC, Musepack, and APE formats, any player that supports those formats usually supports Replay Gain.

The situation with MP3 is rather different, as Replay Gain is not part of the MP3 specs. The APEv2 tag metadata implementation is becoming the de facto standard.

=== Windows ===
* [[foobar2000]] supports Replay Gain in all possible aspects.
* [[Winamp]] supports Replay Gain in album or track mode.
* [[MediaMonkey]] supports track Replay Gain only.
* [[XMPlay]] recently implemented Replay Gain.

''...and probably others.''

=== Linux ===
* [[XMMS]]. Reads Replay Gain from [[Free Lossless Audio Codec|FLAC]], [[Musepack]], (Ogg) [[Vorbis]] ..
:For [[MP3]], use the CVS version of the [http://xmms-mad.sourceforge.net/ xmms-mad] MP3 plugin. It has not yet been released as a binary and is not yet available in distribution packages; in the meantime, [http://perso.crans.org/~krempp/xmms-mad/ custom binaries] are available.
* [[amarok]]. By using the amarok-script [http://kde-apps.org/content/show.php?content=26073 Replay Gain]
:Other players may follow: since [http://developer.kde.org/~wheeler/taglib.html TagLib] added support for [[APEv2]] tags in [[MP3]] files, players using this library (like [[amaroK]] and [[JuK]]) might support that kind of Replay Gain tag in the near future.
* [http://www.sacredchao.net/quodlibet Quod Libet] reads Replay Gain from (Ogg) [[Vorbis]], [[MP3]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:Requires support to be enabled (via the appropriate python bindings and libraries) for the above formats. Does not support Replay Gain values stored in [[APEv2]] tags in [[MP3]]s. Replay Gain values are stored in RVA2 id3v2.4 frames. See the [http://www.sacredchao.net/quodlibet/wiki/Development/ID3Notes Quod Libet RVA2 / Replay Gain notes].
* [http://www.musicpd.org/ Music Player Daemon] (MPD) reads Replay Gain from (Ogg) [[Vorbis]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:foobar2000-style TXXX frames in [[MP3]]s are also supported in the latest development releases.
* [http://www.mplayerhq.hu/ MPlayer]. MPlayer support for Replay Gain is codec dependent.
:Codecs that are known to support Replay Gain: vorbis
:Because of this, you need to prioritize the codecs that support it, or choose it individually on the command line. To add it to the command line, add an -ac [codec] option after each file that you want to choose the codec for, or at the beginning to make it apply to all files listed. To prioritize the codecs by default, list them in a line in mplayer.conf:
 ac=[codec],[othercodec],vorbis,mad,

=== Portable devices ===
[http://www.rockbox.org/ Rockbox] supports Replay Gain (in album or track mode) for most formats, including WMA, MP1/2/3, AAC, ALAC, Musepack, Monkey's Audio, Wavpack, FLAC and Vorbis. Note that Replay Gain is only supported when using the respective codec's native tagging format. For example: Replay Gain stored in APEv2 tags is not supported for MP3, rather ID3v2.x tags are expected.

The Sandisk Sansa Fuze (with firmware 1.02.26 and 2.02.26) and the Sandisk Sansa Clip+ also support Replay Gain.

The iPod features ''Soundcheck'', which seems to produce roughly the same normalization gains as Replay Gain, but doesn't provide an Album Gain.

=== Hi-Fi ===
Slim Devices, a company owned by Logitech Inc., supports Replay Gain on both of its high-end audiophile players, known as the [[Slim Devices Transporter|Transporter]] and the [[Slim Devices Squeezebox|Squeezebox]].

==Notes==
<references/>

== See also ==
* [[ReplayGain specification]]

== External links ==
* [http://en.wikipedia.org/wiki/Replay_Gain Replay Gain] at Wikipedia
* [http://www.replaygain.org/ Replay Gain - A Proposed Standard], the original proposal, now out of date with respect to current practice
* [http://www.bobulous.org.uk/misc/Replay-Gain.html Replay Gain using foobar2000] (how to use Replay Gain in Windows using foobar2000).
* [http://www.bobulous.org.uk/misc/Replay-Gain-in-Linux.html Replay Gain in Linux] (how to use Replay Gain in Linux using foobar2000 and Wine, or using metaflac or vorbisgain).

[[Category:Technical]]
[[Category:Metadata]]

Original ReplayGain specification

2011-03-20T17:17:08Z

Notat: add archive.org link

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full-scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz. The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
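
A minimal sketch of one filter stage is given below, using the Butterworth coefficients for 48 kHz from Table 1a. It assumes that the feedback (''a'') terms are added to the output, which is the sign convention under which the tabulated values give a stable high-pass filter; the function name <code>iir_filter</code> and the list-based history are choices made for this example only. The "yulewalk" stage would be run the same way with its own coefficients, one stage feeding the next.

```python
def iir_filter(samples, b, a):
    """Run samples through an IIR filter.

    b = [b(0), b(1), ..., b(N)] are the feed-forward coefficients.
    a = [a(1), ..., a(N)] are the feedback coefficients, ADDED to the output
    (assumed convention, matching the signs used in the tables).
    """
    order = len(a)
    x_hist = [0.0] * order  # previous inputs, most recent first
    y_hist = [0.0] * order  # previous outputs, most recent first
    out = []
    for x in samples:
        y = b[0] * x
        for k in range(order):
            y += b[k + 1] * x_hist[k] + a[k] * y_hist[k]
        x_hist = [x] + x_hist[:-1]
        y_hist = [y] + y_hist[:-1]
        out.append(y)
    return out


# Table 1a: Butterworth high-pass coefficients for Fs = 48 kHz.
BUTTER_B_48K = [0.98621192462708, -1.97242384925416, 0.98621192462708]
BUTTER_A_48K = [1.97223372919527, -0.97261396931306]

# A constant (0 Hz) input should be removed by the high-pass stage.
dc_out = iir_filter([1.0] * 5000, BUTTER_B_48K, BUTTER_A_48K)
```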
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>
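
The block-wise RMS calculation can be sketched as follows, with the mean taken over the squared samples of all channels together. The function name <code>block_rms</code>, the representation of audio as one list of samples per channel, and the decision to drop a trailing partial block are assumptions made for this example.

```python
import math


def block_rms(channels, sample_rate, block_ms=50):
    """Return the RMS level of each complete block of already-filtered audio.

    channels is a list of equal-length sample lists, one per channel.
    A trailing partial block is ignored (assumption).
    """
    block_len = sample_rate * block_ms // 1000
    total_len = len(channels[0])
    levels = []
    for start in range(0, total_len - block_len + 1, block_len):
        sum_squares = 0.0
        for channel in channels:
            for sample in channel[start:start + block_len]:
                sum_squares += sample * sample          # square every sample
        mean = sum_squares / (block_len * len(channels))  # mean over all channels
        levels.append(math.sqrt(mean))                  # square root of the mean
    return levels


# Out-of-phase channels do not cancel: both blocks measure an RMS of 1.0.
out_of_phase = block_rms([[1.0] * 100, [-1.0] * 100], sample_rate=1000)
```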

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
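
As a sketch of this conversion (the function name <code>rms_to_db</code> is hypothetical, and returning negative infinity for digital silence is an assumption not covered by the text):

```python
import math


def rms_to_db(rms, peak_to_peak):
    """Convert a block RMS value to decibels: 20 * log10(2 * rms / peak_to_peak)."""
    if rms <= 0.0:
        return float("-inf")  # assumption: silence has no finite level
    return 20.0 * math.log10(2.0 * rms / peak_to_peak)


# With samples in the range -1.0 to +1.0 the peak-to-peak range is 2.0.
square_db = rms_to_db(1.0, 2.0)                  # full-scale square wave
sine_db = rms_to_db(1.0 / math.sqrt(2.0), 2.0)   # full-scale sine wave
```

Under this formula a full-scale square wave measures 0 dB and a full-scale sine wave measures about -3 dB.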

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is 95%<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref>, so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
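
The selection of the 95% value can be sketched as below. The function name <code>overall_loudness</code> and the exact way the index into the sorted list is computed are assumptions for this example; implementations may use a histogram and slightly different rounding.

```python
def overall_loudness(block_levels_db, percent=95):
    """Pick the level found `percent` of the way up the sorted block levels."""
    if not block_levels_db:
        raise ValueError("no blocks to analyse")
    ordered = sorted(block_levels_db)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) - 1, len(ordered) * percent // 100)
    return ordered[index]


# Speech-like material: mostly quiet blocks, but the louder blocks decide.
speech_like = [-60.0] * 60 + [-20.0] * 40
speech_level = overall_loudness(speech_like)
```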

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20 dB system headroom. The -14 dB headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE calibrated system and so is said to be operating with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain calibrates to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
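
The arithmetic of the reference level and the gain formula can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the example loudness figures are hypothetical and are not part of the specification.

```python
SMPTE_PLAYBACK_SPL_DB = 83.0      # SPL at which SMPTE reproduces its pink noise
SMPTE_NOISE_LEVEL_DB = -20.0      # RMS level of the SMPTE pink noise
REPLAYGAIN_NOISE_LEVEL_DB = -14.0  # RMS level of the ReplayGain pink noise


def reference_spl_db():
    """Playback level of the ReplayGain reference on a SMPTE-calibrated system."""
    return SMPTE_PLAYBACK_SPL_DB + (REPLAYGAIN_NOISE_LEVEL_DB - SMPTE_NOISE_LEVEL_DB)


def replay_gain_db(reference_loudness_db, file_loudness_db):
    """RG = L_n14 - L, with all quantities in decibels."""
    return reference_loudness_db - file_loudness_db


# Hypothetical measurements: the file measures 5.5 dB louder than the reference,
# so the resulting gain is negative (an attenuation).
example_gain = replay_gain_db(-14.0, -8.5)
```

A file that measures quieter than the reference yields a positive value from the same function.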

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single value can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.
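
For integer PCM data, the peak scan might look like the following sketch. It assumes signed samples and treats 2<sup>bits-1</sup> (32768 for 16-bit audio) as digital full scale; some implementations divide by 32767 instead, and the function names are illustrative only.

```python
def peak_amplitude(samples, bits_per_sample=16):
    """Largest absolute sample value, scaled so that full scale equals 1.0.

    `samples` is an iterable of signed integer PCM values from all channels.
    """
    full_scale = float(2 ** (bits_per_sample - 1))  # assumed full-scale value
    return max((abs(s) for s in samples), default=0) / full_scale


def album_peak(track_peaks):
    """The album peak is the largest of the individual track peaks."""
    return max(track_peaks)
```

For decoded lossy audio held as floating-point samples, the same scan applies without the integer scaling, and the result may exceed 1.0.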

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

Frame ID $54 58 58 58 ("TXXX")
Size $xx xx xx xx (size of frame excluding this header)
Flags $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

Text encoding $00 (ISO-8859-1 encoding)
Description <key string> $00
Value <value string>
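
The layout above can be assembled as in the following sketch. It assumes the ID3v2.3 convention of a plain 32-bit big-endian frame size (ID3v2.4 uses a synchsafe integer instead), and <tt>build_txxx_frame</tt> is a hypothetical helper name rather than part of any library.

```python
import struct


def build_txxx_frame(key, value):
    """Build one TXXX frame holding a ReplayGain key/value pair."""
    # Frame data: text encoding byte, description, $00 terminator, value.
    body = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the frame excluding the header, two flag bytes.
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.48 dB")
```

The ten-byte header is followed by the encoding byte, the key, a zero terminator and the value string.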

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) is adequate to accurately represent peak values for 16-bit audio data.
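
A scanner could produce these strings as in the sketch below. The function names are illustrative, and normalizing a rounded value of zero so that it is never written as "-0.00 dB" is an assumption of this sketch, not a requirement stated above.

```python
def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB'."""
    value = round(gain_db, 2)
    if value == 0:
        value = 0.0  # avoid emitting a negative zero
    return f"{value:.2f} dB"


def format_peak(peak):
    """Format a peak amplitude as 'c.dddddd' with no suffix."""
    return f"{peak:.6f}"
```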

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
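
One way a player might tolerate the variations listed above is sketched here with regular expressions. The patterns and function names are assumptions made for illustration; returning <tt>None</tt> stands for "ignore the data as if the frame did not exist".

```python
import re

# Optional sign, digits with optional fraction, optional 'dB' in any case.
_GAIN_RE = re.compile(r"^\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))\s*(?:db)?\s*$", re.IGNORECASE)
# Peaks are positive and carry no suffix.
_PEAK_RE = re.compile(r"^\s*\+?(\d+(?:\.\d*)?|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB, or None if the value is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak amplitude, or None if the value is malformed."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


def find_value(tags, key):
    """Look up a key while ignoring its capitalization."""
    for name, value in tags.items():
        if name.upper() == key.upper():
            return value
    return None
```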

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as specified for ID3v2, with the data stored as ASCII values in the same format.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
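
The selection and scaling rules above might be implemented along these lines. The dictionary layout, the mode strings and the function names are hypothetical, and unity gain is used here as the default when no metadata is present.

```python
def select_gain_db(gains, mode, default_db=0.0):
    """Pick the requested gain, falling back to the other one if it is missing.

    `gains` maps "track" and/or "album" to a gain in dB; `mode` is either
    "track" or "album".
    """
    other = "track" if mode == "album" else "album"
    for name in (mode, other):
        gain = gains.get(name)
        if gain is not None:
            return gain
    return default_db  # neither value is available


def scale_factor(gain_db):
    """Linear multiplier for a gain in dB: ten to the power of gain / 20."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Multiply every sample by the same constant."""
    factor = scale_factor(gain_db)
    return [s * factor for s in samples]
```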

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^\frac{RG + G_\text{pre-amp}}{20}, \frac{1}{\text{peak amplitude}} \right)</math>
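
As a sketch, the overall scale factor can be computed as follows. The function name and argument names are illustrative; a missing peak is treated as 1.0 as described above, and skipping the limit for a peak of zero (a silent track) is an assumption of this sketch.

```python
def playback_scale(gain_db, preamp_db=0.0, peak=None):
    """Scale factor combining replay gain, pre-amp and clipping prevention."""
    if peak is None:
        peak = 1.0  # no peak metadata: assume the audio reaches full scale
    # Gains in dB add; the sum is then converted to a linear multiplier.
    desired = 10.0 ** ((gain_db + preamp_db) / 20.0)
    if peak <= 0.0:
        return desired  # silent audio cannot clip
    # Never scale beyond the point where the peak would exceed full scale.
    return min(desired, 1.0 / peak)
```

With a peak of 0.5, for example, the scale factor is capped at 2.0 no matter how much gain is requested.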

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analogue circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analogue gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Original ReplayGain specification

2011-03-20T17:01:36Z

Notat: correct RG spelling in caption

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz. The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
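
The cascade can be sketched in plain Python as shown below. Figure 2 is not reproduced here, so the sign convention is an assumption: the feedback coefficients ''a(k)'' from the tables are taken to be added, i.e. y[n] = Σ b(k)·x[n−k] + Σ a(k)·y[n−k]. Only the 44.1 kHz Butterworth coefficients from table 1b are included; the "yulewalk" stage would be passed to the same function with the coefficients from table 2b.

```python
# Table 1b: Butterworth high-pass coefficients for Fs = 44.1 kHz.
BUTTER_B = (0.98500175787242, -1.97000351574484, 0.98500175787242)  # b(0)..b(2)
BUTTER_A = (1.96977855582618, -0.97022847566350)                    # a(1), a(2)


def iir_filter(x, b, a):
    """Direct-form IIR filter with additive feedback (assumed convention)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):           # feed-forward taps b(0)..b(N)
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a, start=1):  # feedback taps a(1)..a(N)
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def cascade(x, stages):
    """Run the signal through each (b, a) stage in turn."""
    for b, a in stages:
        x = iir_filter(x, b, a)
    return x


# A constant (DC) input is removed by the high-pass stage, while a signal
# alternating at the Nyquist frequency passes essentially unchanged.
dc_out = cascade([1.0] * 5000, [(BUTTER_B, BUTTER_A)])
nyquist_out = cascade([(-1.0) ** n for n in range(5000)], [(BUTTER_B, BUTTER_A)])
```

Under this convention the Butterworth stage is stable and behaves as a high-pass filter, which suggests the convention matches the tables, but it should be checked against figure 2 before use.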
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
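
The block-wise calculation can be sketched as follows. The sketch assumes floating-point samples in the range −1.0 to 1.0, so that the peak-to-peak range is 2.0, and it simply drops a trailing partial block; both choices, and the function name, are assumptions made for illustration.

```python
import math


def block_levels_db(channels, sample_rate, block_seconds=0.05):
    """Return one level in dB per 50 ms block.

    `channels` is a list of equal-length sample lists, one per channel.
    """
    peak_to_peak = 2.0  # assumed range of the samples: -1.0 to 1.0
    block_len = int(round(sample_rate * block_seconds))
    total_len = len(channels[0])
    levels = []
    for start in range(0, total_len - block_len + 1, block_len):
        # Step 1: square every sample from every channel in the block.
        squares = sum(s * s for ch in channels for s in ch[start:start + block_len])
        # Step 2: take the mean over all channels.
        mean_square = squares / (block_len * len(channels))
        # Step 3: take the square root.
        rms = math.sqrt(mean_square)
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))  # digital silence
    return levels
```

Because the channels are squared before they are averaged, a stereo signal with opposite polarity on each channel produces the same level as an in-phase one.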

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is 95%<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref>, so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
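
A simplified version of this selection is sketched below. It sorts the block levels directly and picks the value 95% of the way up the list; this is an illustration of the idea rather than a reproduction of any particular scanner's implementation, and the function name is hypothetical.

```python
def overall_loudness_db(block_levels, percent=95):
    """Pick the block level found `percent` of the way up the sorted list."""
    if not block_levels:
        raise ValueError("no block levels to analyse")
    ordered = sorted(block_levels)
    # Integer arithmetic keeps the chosen index exact.
    index = min(len(ordered) * percent // 100, len(ordered) - 1)
    return ordered[index]
```

For a signal that is silent half the time, as in the speech example above, the result is still determined by the louder blocks.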

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to 20 dB of system headroom. The 14 dB of headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE-calibrated system, so ReplayGain is said to operate with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. ReplayGain therefore calibrates to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
ReplayGain achieves loudness-compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single value can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
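
The peak scan for uncompressed audio can be sketched as follows. This is a minimal illustration, not part of the specification: the function name, the flat list of signed integer samples (all channels interleaved) and the default bit depth are assumptions made for the example.

```python
def peak_amplitude(samples, bits_per_sample=16):
    """Return the peak of signed integer PCM samples, scaled so full scale is 1.0.

    `samples` is assumed to hold the samples of every channel in one flat
    sequence, since the peak is taken over all channels.
    """
    full_scale = 2 ** (bits_per_sample - 1)  # 32768 for 16-bit audio
    # Positive and negative excursions are treated alike via abs().
    return max((abs(s) for s in samples), default=0) / full_scale


# The album peak is then the largest of the per-track peaks.
track_peaks = [peak_amplitude([0, 1200, -16384]), peak_amplitude([32767, -5])]
album_peak = max(track_peaks)
```

Decoded lossy audio would instead be scanned in floating point so that values above 1.0 survive.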

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID       $54 58 58 58 ("TXXX")
 Size           $xx xx xx xx (size of frame excluding this header)
 Flags          $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>
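
As an illustration of the layout above, the following sketch assembles the bytes of one TXXX frame. It is a minimal example under stated assumptions: the function name is hypothetical, and the size field is written as a plain 32-bit big-endian integer as in ID3v2.3 (ID3v2.4 uses a synchsafe integer there, which gives the same bytes for frames shorter than 128 bytes).

```python
import struct


def txxx_frame(key, value):
    """Build one TXXX frame: 10-byte header followed by the frame data."""
    # Frame data: encoding byte, key, $00 terminator, value (ISO-8859-1).
    data = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the data (header excluded), two flag bytes.
    header = b"TXXX" + struct.pack(">I", len(data)) + b"\x40\x00"
    return header + data


frame = txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.54 dB")
```

A complete tag would also need the ID3v2 tag header and any other frames, which are outside the scope of this sketch.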

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. Integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) is adequate to accurately represent peak values for 16-bit audio data.

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
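
The value formatting and the tolerant parsing described above might look like the sketch below. The function names and the regular expression are illustrative assumptions, not part of the specification. The parser accepts a leading '+', extra or missing digits, leading zeros or spaces, and a missing, unspaced or differently capitalized 'dB' suffix, and returns <tt>None</tt> for anything else.

```python
import re

# Optional sign, digits with optional fraction, optional "dB" in any case.
_VALUE_RE = re.compile(r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*(?:dB)?\s*$", re.IGNORECASE)


def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB'."""
    return f"{gain_db:.2f} dB"


def format_peak(peak):
    """Format a peak as 'c.dddddd'."""
    return f"{peak:.6f}"


def parse_value(text):
    """Parse a gain or peak string; return None if it is malformed."""
    match = _VALUE_RE.match(text)
    return float(match.group(1)) if match else None


def lookup(tags, key):
    """Find a key regardless of capitalization in a dict of tag strings."""
    for name, value in tags.items():
        if name.upper() == key.upper():
            return parse_value(value)
    return None
```

For example, <tt>lookup({"replaygain_track_gain": "+3.20dB"}, "REPLAYGAIN_TRACK_GAIN")</tt> yields 3.2, while a value such as <tt>"loud"</tt> is treated as if the frame did not exist.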

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and the same value formatting as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>
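
A minimal sketch of this scaling step, assuming floating-point samples and hypothetical function names:

```python
def gain_to_scale(gain_db):
    """Convert a gain in decibels to a linear multiplier: 10^(gain/20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Multiply every sample by the constant derived from the gain."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]
```

A gain of -20 dB gives a multiplier of 0.1, and a gain of 0 dB leaves the samples unchanged.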

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
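
The fallback rule can be sketched as below. The dictionary keys <tt>"track_gain"</tt> and <tt>"album_gain"</tt>, the function name and the choice of unity gain as the default are assumptions for the example; a player could equally substitute an average of other gains in the playlist.

```python
def select_gain(tags, mode="album", default_db=0.0):
    """Pick the gain (in dB) for the requested mode, falling back as needed.

    `tags` maps "track_gain" / "album_gain" to floats; a missing or None
    entry means that value was not found in the file.
    """
    preferred = "album" if mode == "album" else "track"
    other = "track" if preferred == "album" else "album"
    for kind in (preferred, other):
        gain = tags.get(f"{kind}_gain")
        if gain is not None:
            return gain
    # Neither value is available: use the default (unity gain here).
    return default_db
```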

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of 10<sup>6/20</sup>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
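
The formula above translates directly into code. The sketch below is illustrative: the function and parameter names are assumptions, and the <tt>prevent_clipping</tt> switch stands in for a player that uses a limiter instead of gain reduction.

```python
def playback_scale(rg_db, preamp_db=0.0, peak=1.0, prevent_clipping=True):
    """Linear scale factor combining replay gain, pre-amp and peak limit.

    `peak` is the stored peak amplitude (1.0 = full scale); 1.0 is the
    value to assume when no peak metadata is available.
    """
    # Gains in dB add; the sum is converted to a linear multiplier.
    scale = 10.0 ** ((rg_db + preamp_db) / 20.0)
    if prevent_clipping and peak > 0.0:
        # Never scale so far that the loudest sample exceeds full scale.
        scale = min(scale, 1.0 / peak)
    return scale
```

For a track with a +6 dB replay gain and a stored peak of 1.0, the result is limited to 1.0; with a stored peak of 0.25 the full factor of about 2 can be applied.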

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analogue circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analogue gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org original ReplayGain proposal] was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Original ReplayGain specification

2011-03-20T16:57:02Z

Notat: remove broken reference from quote

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz. The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
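
As an illustration, the Butterworth component for 44.1 kHz can be run as a direct-form difference equation using the values from Table 1b. This sketch assumes that the tabulated ''a'' coefficients are added in the feedback path, which is consistent with the signs in the table and gives a stable high-pass response; many filter routines use the opposite sign convention for ''a''. The function name is hypothetical. The "yulewalk" component would be implemented the same way with ten feedback and eleven feed-forward terms, and the two run one after the other.

```python
# Table 1b values (Fs = 44.1 kHz).
B = (0.98500175787242, -1.97000351574484, 0.98500175787242)  # b(0)..b(2)
A = (1.96977855582618, -0.97022847566350)                    # a(1), a(2)


def butterworth_highpass(samples):
    """Filter a sequence of float samples with the 2nd order high-pass."""
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for x in samples:
        # Feed-forward terms plus feedback terms (added, per the assumption).
        y = B[0] * x + B[1] * x1 + B[2] * x2 + A[0] * y1 + A[1] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

With these coefficients a constant (DC) input decays to zero, while a signal alternating at the Nyquist frequency passes at unity gain.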
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
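
The block-by-block RMS calculation and the decibel conversion can be sketched as follows. The function name, the list-of-channels input and the decision to drop a final partial block are assumptions made for the example. Samples are assumed to be floats in the range -1.0 to 1.0 (already passed through the loudness filter), so the peak-to-peak range is 2.0.

```python
import math


def block_levels_db(channels, sample_rate, block_seconds=0.05, peak_to_peak=2.0):
    """Return one level in dB per 50 ms block.

    `channels` is a list of equal-rate sample sequences, one per channel.
    """
    block_len = round(sample_rate * block_seconds)
    n = min(len(ch) for ch in channels)
    levels = []
    # A trailing block shorter than block_len is ignored in this sketch.
    for start in range(0, n - block_len + 1, block_len):
        total = 0.0
        for ch in channels:
            # Step 1: square every sample (all channels pooled together).
            total += sum(s * s for s in ch[start:start + block_len])
        # Steps 2 and 3: mean over all pooled samples, then square root.
        rms = math.sqrt(total / (block_len * len(channels)))
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))  # digital silence
    return levels
```

With this definition a full-scale sine wave measures about -3 dB, since its RMS value is 1/√2 of its peak.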

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is 95%<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref>, so this value is used by ReplayGain.
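
A minimal sketch of this selection step is shown below. The function name and the exact index rule (integer arithmetic, clamped to the last element) are assumptions; implementations may round the position differently or work from a histogram rather than a sorted list.

```python
def overall_loudness_db(levels_db, percent=95):
    """Pick the block level that `percent` of the way up the sorted list."""
    if not levels_db:
        raise ValueError("no blocks to analyse")
    ordered = sorted(levels_db)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) * percent // 100, len(ordered) - 1)
    return ordered[index]
```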

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20FS dB system headroom. The -14 dB headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE calibrated system and so is said to be operating with an 89 dB reference level.

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So ReplayGain calibrates to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.
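
In code the calculation is a single subtraction. The function name and the loudness figures used in the example are hypothetical; in practice both values come from running the same measurement procedure on the audio file and on the pink noise reference.

```python
def replay_gain_db(loudness_db, reference_loudness_db):
    """RG = L_n14 - L, with all quantities in decibels."""
    return reference_loudness_db - loudness_db


# Hypothetical measurements: the file is 6 dB louder than the reference,
# so the resulting gain is an attenuation of 6 dB.
gain = replay_gain_db(loudness_db=-10.0, reference_loudness_db=-16.0)
```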

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single value can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00      (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>
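
As an illustration of the layout above, the following sketch assembles one TXXX frame as raw bytes. The function name <tt>build_txxx_frame</tt> is hypothetical. The size field is written as a plain 32-bit big-endian integer, which is the ID3v2.3 convention; this is an assumption of the sketch, and a tag written as ID3v2.4 would need a synchsafe size instead.

```python
import struct


def build_txxx_frame(key, value):
    """Assemble a single ID3v2.3-style TXXX frame for one key/value pair."""
    # Frame data: encoding byte, key string, $00 terminator, value string.
    body = (
        b"\x00"                      # text encoding $00 (ISO-8859-1)
        + key.encode("latin-1")
        + b"\x00"                    # terminator after the description
        + value.encode("latin-1")
    )
    # Header: frame ID, size of the data excluding this header, two flag bytes.
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.50 dB")
print(frame[:4], len(frame) - 10)  # frame ID and size of the frame data
```

Only the frame itself is built here. Writing it into a complete ID3v2 tag (tag header, padding, unsynchronisation) is outside the scope of the sketch.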

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
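
A scanner could produce strings in these formats along the following lines. This is a sketch only; the helper names <tt>format_gain</tt> and <tt>format_peak</tt> are hypothetical.

```python
def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB': two decimals, no '+' on positive gains."""
    value = round(gain_db, 2)
    if value == 0:
        value = 0.0  # avoid writing '-0.00 dB' for tiny negative gains
    return f"{value:.2f} dB"


def format_peak(peak):
    """Format a peak as 'c.dddddd': six decimals, no suffix."""
    return f"{peak:.6f}"


print(format_gain(-6.5))  # prints -6.50 dB
print(format_gain(3.1))   # prints 3.10 dB
print(format_peak(1.0))   # prints 1.000000
```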

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
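
One way a player might tolerate the variations listed above is sketched below. The regular expressions and the helper names (<tt>parse_gain</tt>, <tt>parse_peak</tt>, <tt>find_value</tt>) are illustrative assumptions, not part of the specification. Values that do not match are reported as <tt>None</tt>, so that the caller can treat the frame as absent.

```python
import re

# Optional sign, then digits with an optional fraction (or a bare fraction).
_NUMBER = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)"
# Gain: number, optional whitespace, optional 'dB' in any capitalization.
_GAIN_RE = re.compile(rf"\s*({_NUMBER})\s*(?:dB)?\s*", re.IGNORECASE)
# Peak: number only, no suffix.
_PEAK_RE = re.compile(rf"\s*({_NUMBER})\s*")


def parse_gain(text):
    """Return the gain in dB, or None if the value is malformed."""
    match = _GAIN_RE.fullmatch(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak level, or None if malformed or negative."""
    match = _PEAK_RE.fullmatch(text)
    if not match:
        return None
    peak = float(match.group(1))
    return peak if peak >= 0 else None


def find_value(tags, key):
    """Look up a key in a dict of tags, ignoring capitalization of keys."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None


print(parse_gain("+3.1dB"), parse_gain(" 02.5 DB"), parse_gain("loud"))
```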

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equals character).
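
A minimal sketch of the <tt>key=value</tt> representation follows. The helper names are hypothetical, and only the comment strings are handled, not the binary framing of the comment header. Keys are upper-cased on reading because Vorbis comment field names are case-insensitive.

```python
def to_vorbis_comments(tags):
    """Turn a dict of ReplayGain tags into 'KEY=value' comment strings."""
    return [f"{key}={value}" for key, value in tags.items()]


def from_vorbis_comments(comments):
    """Parse 'KEY=value' strings into a dict keyed by upper-cased field name."""
    tags = {}
    for comment in comments:
        # Split on the first '=' only; the value itself may contain '='.
        key, separator, value = comment.partition("=")
        if separator:  # entries without '=' are skipped
            tags[key.upper()] = value
    return tags


comments = to_vorbis_comments({
    "REPLAYGAIN_TRACK_GAIN": "-6.50 dB",
    "REPLAYGAIN_TRACK_PEAK": "0.988311",
})
print(comments)
```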

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as specified for ID3v2, with the values stored as ASCII text in the same format.

==Player requirements==
[[File:RG_Player_control.gif|frame|Figure 8: Possible Replay Gain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard rather than the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>
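
The scaling step can be sketched as follows, assuming samples are already held as floating-point values with full scale at 1.0. The function names are hypothetical, and dithering and conversion back to integer output are omitted.

```python
def gain_to_scale(gain_db):
    """Convert a gain in decibels to a linear scale factor: 10^(gain/20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Multiply every sample by the constant scale factor for this gain."""
    scale = gain_to_scale(gain_db)
    return [sample * scale for sample in samples]


# A replay gain of -20 dB scales every sample by one tenth.
print(f"{gain_to_scale(-20.0):.3f}")  # prints 0.100
```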

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
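
The fallback rule described above might be expressed as follows. This is a sketch: the function name <tt>select_gain</tt> and the <tt>mode</tt> strings are hypothetical, a missing value is represented by <tt>None</tt>, and unity gain (0 dB) is used as the default, which is only one of the choices the text allows.

```python
def select_gain(track_gain, album_gain, mode="track", default_gain=0.0):
    """Pick the gain in dB to apply, given the user's normalization mode.

    `track_gain` and `album_gain` are floats, or None when the metadata
    is missing from the file.
    """
    if mode == "track":
        preferred, fallback = track_gain, album_gain
    else:
        preferred, fallback = album_gain, track_gain
    if preferred is not None:
        return preferred
    if fallback is not None:
        return fallback  # use whichever adjustment the file does contain
    return default_gain  # neither is present: assumed default of 0 dB


# The user asked for Track mode, but only the Album gain is stored.
print(select_gain(None, -7.25, mode="track"))  # prints -7.25
```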

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If the pre-amp is enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
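
As a worked example of combining the two factors, take an illustrative (assumed) replay gain of -8 dB and a pre-amp setting of +6 dB. Multiplying the linear scale factors gives the same result as adding the gains in decibels first:

:<math>10^\frac{-8}{20} \times 10^\frac{6}{20} = 10^\frac{-8 + 6}{20} = 10^\frac{-2}{20} \approx 0.794</math>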

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower the average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^\frac{RG + G_\text{pre-amp}}{20}, \frac{1}{\text{peak amplitude}} \right)</math>
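
The formula above can be sketched in code as follows. The function name <tt>playback_scale</tt> is hypothetical. A missing peak defaults to 1.0 as described earlier in this section. Skipping the limit for a zero peak (a silent track) is an assumption of the sketch, made to avoid a division by zero.

```python
def playback_scale(gain_db, preamp_db=0.0, peak=1.0):
    """Overall linear scale factor with clipping prevention by reduced gain.

    `gain_db` is the selected replay gain, `preamp_db` the user's pre-amp
    setting, and `peak` the stored peak amplitude (1.0 if unavailable).
    """
    scale = 10.0 ** ((gain_db + preamp_db) / 20.0)
    if peak > 0:
        # Never scale so far that the loudest sample would exceed full scale.
        scale = min(scale, 1.0 / peak)
    return scale


# A +6 dB request on a track that already peaks at full scale is held at 1.0.
print(playback_scale(6.0, 0.0, 1.0))  # prints 1.0
```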

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org original ReplayGain proposal] was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />