Hydrogenaudio Knowledgebase - User contributions [en]

LossyWAV

2026-01-24T14:43:39Z

Skamp: removed lossyWV support from caudec

{{Software Infobox
| name = lossyWAV
| logo =
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaud.io/forums/index.php?showuser=42400 Nick.C]
| stable_release = [https://hydrogenaud.io/index.php/topic,112649 1.4.2]
| preview_release = -
| operating_system = [[Wikipedia:Microsoft Windows|Windows]], [[Wikipedia:Linux|Linux]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaud.io/forums/index.php?showtopic=107081 1.4.0 release thread] [http://www.hydrogenaud.io/forums/index.php?showtopic=109239 1.5.0 development thread]
}}
lossyWAV is a [[Wikipedia:Free software|free]], [[lossy]] pre-processor for [[PCM]] audio contained in the [[RIFF WAVE|WAV]] file format. Proposed by [http://www.hydrogenaud.io/forums/index.php?showuser=409 David Robinson], it reduces the [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, significantly reduces the bitrate of the encoded file compared with encoding the unprocessed audio.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyWAV is based on the lossyFLAC idea proposed by [http://www.hydrogenaud.io/forums/index.php?showuser=409 David Robinson] at Hydrogenaudio: a method of carefully reducing the bit depth of blocks of samples so that the FLAC lossless encoder can make use of its wasted bits feature. The aim is to transparently reduce audio bit depth by setting some least significant bits ([[Wikipedia:Least significant bit|lsb]]s) to zero, taking advantage of FLAC's detection of consistently zeroed least significant bits within each frame and significantly increasing coding efficiency.[http://www.hydrogenaud.io/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded using the same codec (which may be all-important from a hardware compatibility perspective) at a reduced bitrate compared with the lossless version.

[http://www.hydrogenaud.io/forums/index.php?showuser=42400 Nick Currie] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (Many thanks [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer!) with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] Assembly Language for speed.

Subsequently, lossyFLAC proved itself to work with other lossless codecs, so the application name was changed to lossyWAV.

Since then, Nick has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaud.io/forums/index.php?showuser=25015 Horst Albrecht] at Hydrogenaudio. Although the current lossyWAV implementation has built on David's original method, the method itself still very much belongs to its author.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a pure variable bit-depth pre-processor: the overall sample size remains the same after processing, but the number of significant bits used for the samples in a codec-block can change on a block-by-block basis. Bits-to-remove are calculated for each codec-block (codec-block length = 512 samples, 11.6 ms at 44.1 kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier transform]] (FFT) analyses of at least two lengths (default quality preset (-q 2.5) = 32, 64 and 1024 [[Wikipedia:Sampling (signal processing)|samples]]). After some manipulation, the results of each FFT analysis for a specific codec-block are grouped and the minimum value is used to determine bits-to-remove for the whole codec-block.

Bit removal adds noise to the output. The level of added noise associated with removing a given number of bits has been pre-calculated, so the number of bits to remove depends on the level of the noise floor of the codec-block in question. The added noise is adaptively shaped by default, but the user can select parameters to make the added noise fixed-shaped or simply [[Wikipedia:white noise|white noise]]. Each sample in the codec-block is then rounded such that the lowest <bits-to-remove> lsbs are zero. In this way the wasted bits feature of [[FLAC]] et al. is exploited.
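The final rounding step can be illustrated with a minimal Python sketch. It assumes plain rounding to the nearest multiple of 2^bits_to_remove with no noise shaping, takes the number of bits to remove as given rather than deriving it from FFT analysis, and handles samples near full scale by stepping down one multiple. The function names are hypothetical and are not taken from lossyWAV's source.

```python
def remove_bits(samples, bits_to_remove, bit_depth=16):
    """Round each sample to the nearest multiple of 2**bits_to_remove."""
    if bits_to_remove <= 0:
        return list(samples)
    step = 1 << bits_to_remove
    half = step >> 1
    highest = (1 << (bit_depth - 1)) - 1
    out = []
    for s in samples:
        # Python's >> floors, so this rounds to the nearest multiple of step.
        r = ((s + half) >> bits_to_remove) << bits_to_remove
        if r > highest:
            # Assumption of this sketch: avoid clipping by stepping down
            # one multiple, which keeps the low bits zero.
            r -= step
        out.append(r)
    return out


def wasted_bits(block):
    """Count the low-order bits that are zero in every sample of a block."""
    combined = 0
    for s in block:
        combined |= s
    if combined == 0:
        return 0  # an all-zero block has no set bit to measure against
    return (combined & -combined).bit_length() - 1


block = [1001, -2047, 514, 12, -7]
processed = remove_bits(block, 3)
print(processed, wasted_bits(processed))
```

A lossless codec that detects wasted bits can then store each sample of the block with that many fewer bits.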

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set (16 bit / 44.1kHz)
!Codec
!lossless
!--insane
!--extreme
!--high
!--standard
!--economic
!--portable
!--extraportable
|-
!10 Album Test Set
| FLAC
| 854 kbit/s
| 627 kbit/s
| 548 kbit/s
| 477 kbit/s
| 442 kbit/s
| 407 kbit/s
| 353 kbit/s
| 311 kbit/s
|-
!Nick.C's Full Collection
| FLAC
| 882 kbit/s
| -
| -
| -
| -
| -
| -
| 307 kbit/s
|}
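To put the table in relative terms, the following short Python sketch expresses each preset's bitrate as a percentage of the lossless FLAC bitrate for the 10 Album Test Set. The figures are copied from the table; the variable and function names are illustrative only.

```python
# Average FLAC bitrates in kbit/s for the 10 Album Test Set (from the table).
LOSSLESS_KBPS = 854
PRESET_KBPS = {
    "insane": 627,
    "extreme": 548,
    "high": 477,
    "standard": 442,
    "economic": 407,
    "portable": 353,
    "extraportable": 311,
}


def relative_bitrate(preset):
    """Return the preset's bitrate as a percentage of the lossless bitrate."""
    return 100.0 * PRESET_KBPS[preset] / LOSSLESS_KBPS


for name in PRESET_KBPS:
    print(f"--{name}: {relative_bitrate(name):.1f}% of lossless")
```

For this test set, --standard comes out at roughly half the lossless bitrate.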

==File identification==
lossyWAV-processed WAV files are named with a double filename extension, .lossy.wav, to make them instantly identifiable. Likewise, ".lossy.flac" indicates an audio file which was processed using lossyWAV and subsequently encoded using FLAC.[http://www.hydrogenaud.io/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter is used when processing to create a correction file which is named with the .lwcdf.wav double filename extension. When "added" to the corresponding .lossy.wav, using the --merge parameter, the original file will be reconstituted.
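The relationship between the three files can be sketched in Python as follows. This assumes the correction data is simply the sample-by-sample difference between the original and the processed audio, which is what "added" suggests; WAV headers and any scaling are not modelled, and the function names are hypothetical.

```python
def make_correction(original, lossy):
    """Correction samples: what must be added to the lossy samples."""
    if len(original) != len(lossy):
        raise ValueError("blocks must have the same length")
    return [o - l for o, l in zip(original, lossy)]


def merge(lossy, correction):
    """Add the correction back to the lossy samples."""
    if len(lossy) != len(correction):
        raise ValueError("blocks must have the same length")
    return [l + c for l, c in zip(lossy, correction)]


original = [1001, -2047, 514]
lossy = [1000, -2048, 512]
correction = make_correction(original, lossy)
print(merge(lossy, correction) == original)
```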

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Codec compatibility|Codec compatibility]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. This takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, date and time, and user settings are recorded. Additionally, if a lossyWAV 'fact' chunk is found in an input file, processing is halted (exit code = 16) to prevent re-processing of an already processed file.
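As an illustration, a 'fact' chunk of this form could be located by walking the RIFF chunk list. The Python sketch below assumes the chunk payload begins with the ASCII text "lossyWAV", as in the form shown above, and that the chunk appears before the 'data' chunk. The function name is hypothetical and this is not lossyWAV's own detection code.

```python
import struct


def find_lossywav_fact(data):
    """Return the text of a lossyWAV 'fact' chunk in WAV bytes, or None."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        payload = data[pos + 8:pos + 8 + size]
        if chunk_id == b"fact" and payload.startswith(b"lossyWAV"):
            return payload.rstrip(b"\x00").decode("ascii", errors="replace")
        if chunk_id == b"data":
            break  # assumption: the chunk of interest precedes the audio data
        pos += 8 + size + (size & 1)  # RIFF chunks are padded to an even size
    return None
```

Given the bytes of a processed file, the function returns the recorded version and settings string; for an unprocessed file it returns None.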

The --check parameter can be used to determine whether a file has previously been processed without trying to process it: the exit code is 16 if the file has already been processed and 0 if not.
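A script can act on these exit codes. The following Python sketch wraps the check in a function; it assumes the executable is reachable as "lossyWAV" on the path, and the injectable run parameter is a hypothetical convenience for testing rather than anything lossyWAV provides.

```python
import subprocess

ALREADY_PROCESSED = 16  # exit code documented for an already processed file


def already_processed(wav_path, exe="lossyWAV", run=subprocess.run):
    """Return True if `lossyWAV <file> --check` reports a processed file."""
    result = run([exe, wav_path, "--check"])
    if result.returncode == ALREADY_PROCESSED:
        return True
    if result.returncode == 0:
        return False
    # Any other exit code is treated as an error by this sketch.
    raise RuntimeError(f"unexpected exit code {result.returncode}")
```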

==Quality presets==
*--quality insane: (-q I or -q 10) Highest quality preset, generally considered to be excessive;
*--quality extreme: (-q E or -q 7.5) Higher quality preset, disc space-saving alternative to lossless archiving for large audio collections, considered to be suitable for transcoding to other lossy codecs;
*--quality high: (-q H or -q 5.0) High quality preset, midway between extreme and standard;
*--quality standard: (-q S or -q 2.5) Default preset, generally accepted to be transparent;
*--quality economic: (-q C or -q 0.0) Intermediate preset midway between standard and portable;
*--quality portable: (-q P or -q -2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaud.io/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
*--quality extraportable: (-q X or -q -5.0) Lowest quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaud.io/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
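The correspondence between preset names, letters and numeric values listed above can be captured in a small lookup table. The Python sketch below is illustrative only: the names PRESETS and quality_value are hypothetical, and matching is exact, since this page does not state whether lossyWAV treats preset letters case-insensitively.

```python
# Preset name -> (letter, numeric quality), as listed above.
PRESETS = {
    "insane": ("I", 10.0),
    "extreme": ("E", 7.5),
    "high": ("H", 5.0),
    "standard": ("S", 2.5),
    "economic": ("C", 0.0),
    "portable": ("P", -2.5),
    "extraportable": ("X", -5.0),
}


def quality_value(q):
    """Resolve a preset name, preset letter or number to a numeric quality."""
    text = str(q).strip()
    for name, (letter, value) in PRESETS.items():
        if text == name or text == letter:
            return value
    value = float(text)  # raises ValueError for unknown names
    if not -5.0 <= value <= 10.0:
        raise ValueError("quality must be between -5.0 and 10.0")
    return value


print(quality_value("S"), quality_value("extreme"), quality_value(-2.5))
```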

All tuning for version 1.0.0 was performed on quality preset --standard, with higher presets being more conservative. For versions 1.1.0, 1.2.0 and 1.3.0, tuning effort has been focused on the lowest quality preset in an effort to achieve an effective compromise between resultant bitrate and perceived quality. Quality preset --standard is generally accepted to be (and from testing so far is) transparent. If you find a track for which --standard fails to achieve transparency, please post a sample (no more than 30 seconds) in the development thread.

The upper frequency limit used in the calculation of minimum signal power varies with the quality preset, in the range 15.159 kHz to 16.682 kHz.

==Supported input formats==
*[[WAV]]: 9-bit to 32-bit integer; 1 to 8 channels; sample rate ≥ 32kHz [[Pulse Code Modulation|PCM]]. Very high sample rates (>48kHz) have not been extensively tested. Tunings have been focussed on 16-bit, 44.1kHz samples (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512 --'''merge-blocks'''
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|}

* Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name.

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] — so-called "Bit Shifting" — to suggest that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested due to prohibitive prices of encoders. At least one [http://www.hydrogenaud.io/forums/index.php?showtopic=98609&hl= commercial DVD-A] uses constant bit-depth reduction with lower bit-depth on rear channels.

A comparison of portable media players is [[Wikipedia:Comparison of portable media players#Audio Formats|here]], which shows FLAC and WMA Lossless compatibility among listed players.
Any player supported by [http://www.rockbox.org Rockbox] can use FLAC or WavPack files after installing Rockbox.
===Important note===
'''NB: when encoding using a lossless codec, please ensure that the block size of the lossless codec matches that of lossyWAV (default = 512 samples). This is achieved by adding the "Encoder parameters" in the table above to the command line of the lossless codec in question. If this is not done then the lossless encoding of the processed WAV file will (almost certainly) be larger than it would otherwise have been.'''
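The effect of a block size mismatch can be shown with a simplified Python model. It assumes a codec frame can only drop the low-order bits that are zero in every sample of that frame, and it counts the bits dropped this way before any entropy coding. The sample data and function names are invented for illustration.

```python
def wasted_bits(frame):
    """Count the low-order bits that are zero in every sample of a frame."""
    combined = 0
    for s in frame:
        combined |= s
    if combined == 0:
        return 0
    return (combined & -combined).bit_length() - 1


def bits_saved(samples, frame_size):
    """Total bits a codec could drop when framing at frame_size samples."""
    total = 0
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        total += wasted_bits(frame) * len(frame)
    return total


# Two 512-sample blocks: 4 bits removed from the first, 1 bit from the second.
block_a = [16 * (i % 7 + 1) for i in range(512)]
block_b = [2 * (i % 5 + 1) for i in range(512)]
samples = block_a + block_b

print(bits_saved(samples, 512))   # codec frames aligned with the blocks
print(bits_saved(samples, 1024))  # one codec frame spanning both blocks
```

In this model the aligned 512-sample frames drop 2560 bits, while a single 1024-sample frame drops only 1024, because the frame is limited by the block with the fewest zeroed bits.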
===Bonus feature===
Another, possibly not obvious, feature of lossyWAV is that the processed output can be "transcoded" from one lossless codec to another lossless codec with absolutely no loss of quality whatsoever. This is solely due to the fact that lossyWAV output is designed to be losslessly encoded - something that lossless codecs do very well indeed.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.4.2, Copyright (C) 2007-2016 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Process Description:

lossyWAV is a near lossless audio processor which dynamically reduces the
bitdepth of the signal on a block-by-block basis. Bitdepth reduction adds noise
to the processed output. The amount of permissible added noise is based on
analysis of the signal levels in the default frequency range 20Hz to 16kHz.

If signals above the upper limiting frequency are at an even lower level, they
can be swamped by the added noise. This is usually inaudible, but the behaviour
can be changed by specifying a different --limit (in the range 10kHz to 20kHz).

For many audio signals there is little content at very high frequencies and
forcing lossyWAV to keep the added noise level lower than the content at these
frequencies can increase the bitrate of the losslessly compressed output
dramatically for no perceptible benefit.

The noise added by the process is shaped using an adaptive method provided by
Sebastian Gesemann. This method, as implemented in lossyWAV, aims to use the
signal itself as the basis of the filter used for noise shaping. Adaptive noise
shaping is enabled by default.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <t> where t is one of the following (default = standard):
I, insane highest quality output, suitable for transcoding;
E, extreme higher quality output, suitable for transcoding;
H, high high quality output, suitable for transcoding;
S, standard default quality output, considered to be transparent;
C, economic intermediate quality output, likely to be transparent;
P, portable good quality output for DAP use, may not be transparent;
X, extraportable lowest quality output, probably not transparent.

Standard Options:

-C, --correction write correction file for processed WAV file; default=off.
-f, --force forcibly over-write output file if it exists; default=off.
-h, --help display help.
-L, --longhelp display extended help.
-M, --merge merge existing lossy.wav and lwcdf.wav files.
-o, --outdir <t> destination directory for the output file(s).
-v, --version display the lossyWAV version number.
-w, --writetolog create (or add to) lossyWAV.log in the output directory.

Advanced Options:

- take WAV input from STDIN.
-c, --check check if WAV file has already been processed; default=off.
errorlevel=16 if already processed, 0 if not.
-q, --quality <n> quality preset (-5.0<=n<=10.0); (-5=lowest, 10=highest;
default=2.5; I=10.0; E=7.5; H=5.0; S=2.5; C=0.0; P=-2.5;
X=-5.0).
--, --stdout write WAV output to STDOUT.
--stdinname <t> pseudo filename to use when input from STDIN.

Advanced Quality Options:

-A, --altspread [n] disables 'old' spreading mechanism in favour of 'new'
mechanism (default spreading uses both 'old' and 'new'
mechanisms). Takes an optional parameter, n, which relates
to the proportion of adjacent bins taken into account when
calculating spread average for a particular bin (0<=n<=1;
default = 0.768544).
-a, --analyses <n> set number of FFT analysis lengths, (2<=n<=7; default=3,
i.e. 32, 64 & 1024 samples. n = 2, remove 32 sample FFT;
n > 3 add 16; n > 4, add 128; n > 5, add 256, n > 6, add
512) n.b. FFT lengths stated are for 44.1/48kHz audio,
higher sample rates will automatically increase all FFT
lengths as required.
-D, --dynamic <n> select minimum_bits_to_keep_dynamic to n bits (default
2.71 at -q X and 5.00 at -q I; 1.0 <= n <= 7.0).
--feedback [n] enable experimental bit removal / adaptive noise shaping
noise limiter. Tuning has been carried out at -q X and
should have a negligible effect at -q S. Optional setting
(0.0 <= n <= 10.0, default = 0.0) automatically selects
the following parameters (0 = least effect, 10 = most):
r, round <n> limit deviation from expected added noise due to rounding
(-2.0 <= n <= 2.0, default = 0.0).
n, noise <n> limit added noise due to adaptive noise shaping
(-2.5 <= n <= 7.5, default = 0.0).
a, aclips <n> number of permissible exceedences of adaptive noise
shaping level limit (0 <= n <= 64, default = 32).
A, alevel <n> adaptive noise shaping level limit (-2.0 <= n <= 2.5,
default = 0.0).
V, verbose enable more detailed feedback information in output.
-I, --ignore-chunk-sizes.
ignore 'RIFF' and 'data' chunk sizes in input.
-l, --limit <n> set upper frequency limit to be used in analyses to n Hz;
(12500 <= n <= 20000*; default=16000).
*: for 44.1/48 kHz audio. Upper limit for audio of
other sampling rates is limited to sample-rate x 45.35%
--linkchannels revert to original single bits-to-remove value for all
channels rather than channel dependent bits-to-remove.
--maxclips <n> set max. number of acceptable clips per channel per block;
(0 <= n <= 16; default = 3,3,3,3,3,2,2,2,2,2,1,1,1,0,0,0).
-m, --midside analyse 2 channel audio for mid/side content.
--nodccorrect disable DC correction of audio data prior to FFT analysis,
default=on; (DC offset calculated per FFT data set).
-n, --noskew disable application of low frequency level reduction prior
to determination of bits-to-remove.
--scale <n> factor to scale audio by; (0.03125 < n <= 8.0; default=1).
-s, --shaping modify settings for noise shaping used in bit-removal:
a, altfilter enable alternative adaptive shaping filter method.
A, average set factor of shape modification above upper calculation
frequency limit (0.00000 <= n <= 1.00000)
c, cubic enable cubic interpolation when defining filter shape
e, extra additional white noise to add during creation of filter
f, fixed disable adaptive noise shaping (use fixed shaping)
h, hybrid enable hybrid alternative to default adaptive noise shaping
method. Uses all available calculated analyses to create
the desired noise filter shape rather than only those for
1.5ms and 20ms FFT analyses.
n, nowarp disable warped noise shaping (use linear frequency shaping)
o, off disable noise shaping altogether (use simple rounding)
s, scale <n> change effectiveness of noise shaping (0 < n <= 2; default
= 1.0)
t, taps <n> select number of taps to use in FIR filter (8 <= n <= 256;
default = 64)
w, warp enable cubic interpolation when creating warped filter
--static <n> set minimum-bits-to-keep-static to n bits (default=6;
3<=n<=28, limited to bits-per-sample - 3).
-U, --underlap <n> enable underlap mode to increase number of FFT analyses
performed at each FFT length, (n = 2, 4 or 8, default=2).

Output Options:

--bitdist show distribution of bits to remove.
--blockdist show distribution of lowest / highest significant bit of
input codec-blocks and bit-removed codec-blocks.
-d, --detail enable per block per channel bits-to-remove data display.
-F, --freqdist [all] enable frequency analysis display of input data. Use of
'all' parameter displays all calculated analyses.
-H, --histogram show sample value histogram (input, lossy and correction).
--perchannel show selected distribution data per channel.
-p, --postanalyse enable frequency analysis display of output and
correction data in addition to input data.
--sampledist show distribution of lowest / highest significant bit of
input samples and bit-removed samples.
--spread [full] show detailed [more detailed] results from the spreading/
averaging algorithm.
-W, --width <n> select width of output options (79<=n<=255).

System Options:

-B, --below set process priority to below normal.
--low set process priority to low.
-N, --nowarnings suppress lossyWAV warnings.
-Q, --quiet significantly reduce screen output.
-S, --silent no screen output.

Special thanks go to:

David Robinson       for the publication of his lossyFLAC method, guidance, and
                     the motivation to implement his method as lossyWAV.

Horst Albrecht       for ABX testing, valuable support in tuning the internal
                     presets, constructive criticism and all the feedback.

Sebastian Gesemann   for the adaptive noise shaping method and the amount of
                     help received in implementing it and also for the basis of
                     the fixed noise shaping method.

Tyge Lovset          for the C++ translation initiative.

Matteo Frigo and     for libfftw3-3.dll contained in the FFTW distribution
Steven G Johnson     (v3.2.1 or v3.2.2).

Mark G Beckett       for the Delphi unit that provides an interface to the
(Univ. of Edinburgh) relevant fftw routines in libfftw3-3.dll.

Don Cross            for the Complex-FFT algorithm originally used.</pre>
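The description of the --analyses option in the help text above can be restated as a small Python function. This is only a reading of the help text for 44.1/48 kHz audio, where "n > 3" is taken to mean n of 4 or more, and so on; it does not model the automatic increase of FFT lengths at higher sample rates, and the function name is hypothetical.

```python
def fft_lengths(n):
    """FFT analysis lengths (in samples) selected by --analyses n."""
    if not 2 <= n <= 7:
        raise ValueError("n must be between 2 and 7")
    lengths = {64, 1024}      # n = 2: the 32-sample FFT is removed
    if n >= 3:
        lengths.add(32)       # default: 32, 64 and 1024 samples
    if n >= 4:
        lengths.add(16)
    if n >= 5:
        lengths.add(128)
    if n >= 6:
        lengths.add(256)
    if n >= 7:
        lengths.add(512)
    return sorted(lengths)


for n in range(2, 8):
    print(n, fft_lengths(n))
```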

===Example drag 'n' drop batch file===
Simply drag the FLAC files onto this batch file and it will process, recode in FLAC and copy ALL of the tags from the input FLAC file, placing the output lossyFLAC file in the same directory as the input FLAC file. Requires flac.exe and [http://www.synthetic-soul.co.uk/tag/ tag.exe] to be somewhere on the path.
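<pre>@echo off
rem Process each file dropped onto this batch file, one argument at a time.
:repeat
if %1.==. goto end
rem Decode to WAV on stdout, process with lossyWAV, re-encode with a
rem 512-sample block size, then copy the tags from the source file.
if exist "%~1" flac -d "%~1" --stdout --silent|lossywav - --stdout --quality standard ^
--stdinname "%~1"|flac - -b 512 -o "%~dpn1.lossy.flac" --silent && tag ^
--fromfile "%~1" "%~dpn1.lossy.flac"
shift
goto repeat
:end</pre>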

===lossyWAV and FFTW===
Since version 1.2.0, lossyWAV has been compatible with [[Wikipedia:FFTW|FFTW]] although not dependent on it. Should the user wish to take advantage of the increased processing speed available when using FFTW (from superior FFT implementations), libfftw3-3.dll should be placed in a directory on the host computer which features on the path.

===Linux / OS X support: lossyWAV and WINE===
The cause of lossyWAV's WINE incompatibility was found and removed during the development of 1.2.0 and retrospectively amended for 1.1.0b in a maintenance release (1.1.0c). The latest stable version (1.3.0 at the time of writing) is fully supported.

[https://github.com/gcocatre/caudec caudec] is a command-line tool that can encode and decode lossyWAV / lossyFLAC files using a Linux or macOS build of lossyWAV (see the POSIX version below).

There is also a [http://github.com/MoSal/lossywav-for-posix lossyWAV for POSIX] port available on GitHub that does not require WINE.

===lossyWAV and [[foobar2000]]===
Example [[foobar2000]] converter settings:

lossyFLAC settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension: lossy.flac
Parameters: /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|c:\"program files"\bin\flac - -b 512 -5 -f -o%d --ignore-chunk-sizes
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyTAK settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension: lossy.tak
Parameters: /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|c:\"program files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWV settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension: lossy.wv
Parameters: /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|c:\"program files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWMALSL* settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension: lossy.wma
Parameters: /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|c:\"program files"\bin\wmaencode.exe - %d --codec lsl --ignorelength
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

Enclose the element of the path containing spaces within double quotation marks ("), e.g. C:\"Program Files"\directory_where_executable_is\executable_name. This is a Windows limitation.

lossyWMALSL conversion uses WMAEncode.exe by lvqcl found [http://www.hydrogenaud.io/forums/index.php?s=&showtopic=90519&view=findpost&p=767754 here].

===lossyWAV and EAC===
:''For example settings, see [[EAC and LossyWAV]].''

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor and not a codec. No decoding is required for any program to play a WAV file which has been processed with lossyWAV as it remains compliant with the RIFF WAVE format.

*'''Question:''' Why create a processor which means that I cannot be sure that a lossless file is truly lossless?
*'''Answer:''' Unless one creates the lossless file personally, one can '''never''' be completely sure that the file is indeed lossless. For example, a lossless file you receive could have been transcoded from [[MP3]] without your knowledge. To distinguish a lossyWAV file from lossless files, it is recommended to use the extension .lossy.EXT, where EXT is the original extension, e.g. .lossy.flac.

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[Transparency|transparent]]?
*'''Short answer:''' At preset --standard, almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Will it low-pass filter my audio?
*'''Short answer:''' No. The frequency limit is for the analysis only. lossyWAV cannot low-pass filter your audio.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact]]s
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaud.io/forums/index.php?showtopic=55522 Original lossyFLAC thread] - Introduction of the concept by David Robinson (Replay Gain developer) and initial development
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=109239 lossyWAV 1.5.0 development thread]
*[http://www.hydrogenaud.io/forums/index.php?showtopic=107081 lossyWAV 1.4.0 release thread] - Release of version 1.4.0 on 02 October 2014
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=96635 lossyWAV 1.3.1 Delphi to C++ translation thread]
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=81002 lossyWAV 1.3.0 development thread]
*[http://www.hydrogenaud.io/forums/index.php?showtopic=90104 lossyWAV 1.3.0 release thread] - Release of version 1.3.0 on 06 August 2011
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=65499 lossyWAV 1.2.0 development thread]
*[http://www.hydrogenaud.io/forums/index.php?showtopic=77042 lossyWAV 1.2.0 release thread] - Release of version 1.2.0 on 16 December 2009
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=63254 lossyWAV 1.1.0 development thread]
*[http://www.hydrogenaud.io/forums/index.php?showtopic=64617 lossyWAV 1.1.0 release thread] - Release of version 1.1.0 on 12 July 2008
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=56129 lossyWAV Development thread] - Conversion of the original MATLAB script to Delphi and evolution of the method
*[http://www.hydrogenaud.io/forums/index.php?showtopic=63225 lossyWAV 1.0.0 release thread] - Release of version 1.0.0b on 12 May 2008

[[Category:Software]]

LossyWAV

2026-01-24T14:39:15Z

Skamp: Updated lossyWAV support with caudec

{{Software Infobox
| name = lossyWAV
| logo =
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaud.io/forums/index.php?showuser=42400 Nick.C]
| stable_release = [https://hydrogenaud.io/index.php/topic,112649 1.4.2]
| preview_release = -
| operating_system = [[Wikipedia:Microsoft Windows|Windows]], [[Wikipedia:Linux|Linux]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaud.io/forums/index.php?showtopic=107081 1.4.0 release thread] [http://www.hydrogenaud.io/forums/index.php?showtopic=109239 1.5.0 development thread]
}}
lossyWAV is a [[Wikipedia:Free software|free]], [[lossy]] pre-processor for [[PCM]] audio contained in the [[RIFF WAVE|WAV]] file format. Proposed by [http://www.hydrogenaud.io/forums/index.php?showuser=409 David Robinson], it reduces [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, reduces the bitrate of the encoded file significantly compared to unpreprocessed compression.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyWAV is based on the lossyFLAC idea proposed by [http://www.hydrogenaud.io/forums/index.php?showuser=409 David Robinson] at Hydrogenaudio, which is a method of carefully reducing the bitdepth of (blocks of) samples which will then allow the FLAC lossless encoder to make use of its wasted bits feature. The aim is to transparently reduce audio bit depth (by making some lower significant bits ([[Wikipedia:Least significant bit|lsb]]'s) zero), consequently taking advantage of FLAC's detection of consistently-zeroed lower significant bits within each single frame and significantly increasing coding efficiency.[http://www.hydrogenaud.io/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded using the same codec (which may be all important from a hardware compatibility perspective) at a reduced bitrate compared to the lossless version.

[http://www.hydrogenaud.io/forums/index.php?showuser=42400 Nick Currie] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (Many thanks [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer!) with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] Assembly Language for speed.

Subsequently, lossyFLAC proved itself to work with other lossless codecs, so the application name was changed to lossyWAV.

Since then, Nick has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaud.io/forums/index.php?showuser=25015 Horst Albrecht] at Hydrogenaudio. Although the current lossyWAV implementation has built on David's original method, the method itself still very much belongs to its author.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a pure variable bit-depth pre-processor in that the overall sample size remains the same after processing but the number of significant bits used for the samples in a codec-block can change on a block-by-block basis. Bits-to-remove from the audio data are calculated on a block-by-block basis (codec-block length = 512 samples, 11.6msec @ 44.1kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier Transform]] (FFT) analyses of at least two lengths (default quality preset (-q 5) = 32, 64 & 1024 [[Wikipedia:Sampling (signal processing)|samples]]). After some manipulation, the results of each FFT analysis for a specific codec-block are then grouped and the minimum value used to determine bits-to-remove for the whole codec-block. Bit removal adds noise to the output, however the level of the added noise associated with the removal of a number of bits has been pre-calculated and the number of bits to remove will depend on the level of the noise floor of the codec-block in question. The added noise is adaptively shaped by default, however the user can select parameters to make the added noise fixed shaped or simply [[Wikipedia:white noise|white noise]]. Each sample in the codec-block is then rounded such that the first <bits-to-remove> lsb's are zero. In this way the wasted bits feature of [[FLAC]] et al. is exploited.

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set (16 bit / 44.1kHz)
!Codec
!lossless
!--insane
!--extreme
!--high
!--standard
!--economic
!--portable
!--extraportable
|-
!10 Album Test Set
| FLAC
| 854 kbit/s
| 627 kbit/s
| 548 kbit/s
| 477 kbit/s
| 442 kbit/s
| 407 kbit/s
| 353 kbit/s
| 311 kbit/s
|-
!Nick.C's Full Collection
| FLAC
| 882 kbit/s
| -
| -
| -
| -
| -
| -
| 307 kbit/s
|}

==File identification==
lossyWAV-processed WAV files are named with a double filename extension, .lossy.wav, to make them instantly identifiable. e.g. ".lossy.flac" would indicate an audio file which was processed using lossyWAV, and subsequently encoded using FLAC.[http://www.hydrogenaud.io/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter is used when processing to create a correction file which is named with the .lwcdf.wav double filename extension. When "added" to the corresponding .lossy.wav, using the --merge parameter, the original file will be reconstituted.

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Codec compatibility|codec compatibility]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. This takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, the processing date and time, and the user settings are recorded. Additionally, if a lossyWAV 'fact' chunk is found in an input file, processing is halted (exit code = 16) to prevent re-processing of an already processed file.

The --check parameter can be used to determine whether a file has previously been processed without processing it: the exit code is 16 if the file has already been processed and 0 if not.
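The detection described above can be approximated outside lossyWAV with a small Python sketch. It assumes the 'fact' chunk body begins with the ASCII text "lossyWAV", as in the form shown above, and that the chunk sits before the 'data' chunk; the function names are hypothetical and this is not the code lossyWAV itself uses.

```python
import struct


def find_lossywav_fact(data: bytes):
    """Return the text of a lossyWAV 'fact' chunk, or None if absent."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"fact" and body.startswith(b"lossyWAV"):
            return body.rstrip(b"\x00").decode("ascii", errors="replace")
        if chunk_id == b"data":
            break  # assumption: the marker precedes the audio data
        pos += 8 + size + (size & 1)  # RIFF chunks are word-aligned
    return None


def check_exit_code(data: bytes) -> int:
    """Mirror the documented --check result: 16 if processed, else 0."""
    return 16 if find_lossywav_fact(data) is not None else 0
```

Reading the whole file into memory keeps the sketch short; a real tool would only need to read the header chunks.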

==Quality presets==
*--quality insane: (-q I or -q 10) Highest quality preset, generally considered to be excessive;
*--quality extreme: (-q E or -q 7.5) Higher quality preset, disc space-saving alternative to lossless archiving for large audio collections, considered to be suitable for transcoding to other lossy codecs;
*--quality high: (-q H or -q 5.0) High quality preset, midway between extreme and standard;
*--quality standard: (-q S or -q 2.5) Default preset, generally accepted to be transparent;
*--quality economic: (-q C or -q 0.0) Intermediate preset midway between standard and portable;
*--quality portable: (-q P or -q -2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaud.io/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
*--quality extraportable: (-q X or -q -5.0) Lowest quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaud.io/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
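The relationship between preset names, single-letter forms and numeric values can be summarised in a short Python sketch. The table is taken from the list above and the numeric range from the program's help text; the function name is hypothetical, and treating the single letters as upper-case only is an assumption.

```python
# Preset names and their numeric -q equivalents, as listed above.
PRESETS = {
    "insane": 10.0,
    "extreme": 7.5,
    "high": 5.0,
    "standard": 2.5,
    "economic": 0.0,
    "portable": -2.5,
    "extraportable": -5.0,
}

# Single-letter forms accepted by -q.
LETTERS = {
    "I": "insane", "E": "extreme", "H": "high", "S": "standard",
    "C": "economic", "P": "portable", "X": "extraportable",
}


def resolve_quality(arg: str) -> float:
    """Map a -q argument (letter, name or number) to its numeric value."""
    if arg in LETTERS:
        return PRESETS[LETTERS[arg]]
    if arg in PRESETS:
        return PRESETS[arg]
    value = float(arg)  # raises ValueError for unknown names
    if not -5.0 <= value <= 10.0:
        raise ValueError("quality must be in the range -5.0 to 10.0")
    return value
```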

All tuning for version 1.0.0 was performed on quality preset --standard, with higher presets being more conservative. For versions 1.1.0, 1.2.0 and 1.3.0, tuning effort has been focused on the lowest quality preset in an effort to achieve an effective compromise between resultant bitrate and perceived quality. Quality preset --standard is generally accepted to be (and from testing so far is) transparent. If you find a track for which --standard fails to achieve transparency after processing, please post a sample (no more than 30 seconds) in the development thread.

The upper frequency limit used in the calculation of minimum signal power varies with the quality preset, in the range 15.159 kHz to 16.682 kHz.

==Supported input formats==
*[[WAV]]: 9-bit to 32-bit integer; 1 to 8 channels; sample rate ≥ 32kHz [[Pulse Code Modulation|PCM]]. Very high sample rates (>48kHz) have not been extensively tested. Tunings have been focussed on 16-bit, 44.1kHz samples (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512 --'''merge-blocks'''
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|}

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] — so-called "Bit Shifting" — to suggest that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested due to prohibitive prices of encoders. At least one [http://www.hydrogenaud.io/forums/index.php?showtopic=98609&hl= commercial DVD-A] uses constant bit-depth reduction with lower bit-depth on rear channels.

A comparison of portable media players is [[Wikipedia:Comparison of portable media players#Audio Formats|here]], which shows FLAC and WMA Lossless compatibility among listed players.
Any player supported by [http://www.rockbox.org Rockbox] can use FLAC or WavPack files after installing Rockbox.
===Important note===
'''NB: when encoding using a lossless codec, please ensure that the block size of the lossless codec matches that of lossyWAV (default = 512 samples). If this is not done then the lossless encoding of the processed WAV file will (almost certainly) be larger than it would otherwise have been. This is achieved by adding the "Encoder Parameters" in the table above to the command line of the lossless codec in question.'''
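As an illustration of the note above, the following Python sketch assembles an encoder command line from the parameters in the compatibility table. It only builds the argument list and does not run anything; the dictionary, the function name and the assumption that a <code>flac</code> executable is on the path are illustrative.

```python
import shlex

# Encoder parameters copied from the compatibility table; 512 matches the
# default lossyWAV codec-block length.
ENCODER_PARAMS = {
    "FLAC": ["-5", "-b", "512", "--keep-foreign-metadata"],
    "LPAC": ["-b512"],
    "ALS": ["-l", "-n512"],
    "TAK": ["-fsl512"],
    "WavPack": ["--blocksize=512", "--merge-blocks"],
}


def flac_command(wav_in, flac_out):
    """Build (but do not run) a flac command for a lossyWAV-processed file."""
    return ["flac", *ENCODER_PARAMS["FLAC"], "-o", flac_out, wav_in]


print(shlex.join(flac_command("music.lossy.wav", "music.lossy.flac")))
```

Keeping the parameters in one table makes it harder to forget the block-size option when switching between lossless codecs.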
===Bonus feature===
Another, possibly not obvious, feature of lossyWAV is that the processed output can be "transcoded" from one lossless codec to another lossless codec with absolutely no loss of quality whatsoever. This is solely due to the fact that lossyWAV output is designed to be losslessly encoded - something that lossless codecs do very well indeed.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.4.2, Copyright (C) 2007-2016 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Process Description:

lossyWAV is a near lossless audio processor which dynamically reduces the
bitdepth of the signal on a block-by-block basis. Bitdepth reduction adds noise
to the processed output. The amount of permissible added noise is based on
analysis of the signal levels in the default frequency range 20Hz to 16kHz.

If signals above the upper limiting frequency are at an even lower level, they
can be swamped by the added noise. This is usually inaudible, but the behaviour
can be changed by specifying a different --limit (in the range 10kHz to 20kHz).

For many audio signals there is little content at very high frequencies and
forcing lossyWAV to keep the added noise level lower than the content at these
frequencies can increase the bitrate of the losslessly compressed output
dramatically for no perceptible benefit.

The noise added by the process is shaped using an adaptive method provided by
Sebastian Gesemann. This method, as implemented in lossyWAV, aims to use the
signal itself as the basis of the filter used for noise shaping. Adaptive noise
shaping is enabled by default.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <t> where t is one of the following (default = standard):
I, insane highest quality output, suitable for transcoding;
E, extreme higher quality output, suitable for transcoding;
H, high high quality output, suitable for transcoding;
S, standard default quality output, considered to be transparent;
C, economic intermediate quality output, likely to be transparent;
P, portable good quality output for DAP use, may not be transparent;
X, extraportable lowest quality output, probably not transparent.

Standard Options:

-C, --correction write correction file for processed WAV file; default=off.
-f, --force forcibly over-write output file if it exists; default=off.
-h, --help display help.
-L, --longhelp display extended help.
-M, --merge merge existing lossy.wav and lwcdf.wav files.
-o, --outdir <t> destination directory for the output file(s).
-v, --version display the lossyWAV version number.
-w, --writetolog create (or add to) lossyWAV.log in the output directory.

Advanced Options:

- take WAV input from STDIN.
-c, --check check if WAV file has already been processed; default=off.
errorlevel=16 if already processed, 0 if not.
-q, --quality <n> quality preset (-5.0<=n<=10.0); (-5=lowest, 10=highest;
default=2.5; I=10.0; E=7.5; H=5.0; S=2.5; C=0.0; P=-2.5;
X=-5.0.
--, --stdout write WAV output to STDOUT.
--stdinname <t> pseudo filename to use when input from STDIN.

Advanced Quality Options:

-A, --altspread [n] disables 'old' sperading mechanism in favour of 'new'
mechanism (default spreading uses both 'old' and 'new'
mechanisms). Takes an optional parameter, n, which relates
to the proportion of adjacent bins taken into account when
calculating spread average for a particular bin (0<=n<=1;
default = 0.768544).
-a, --analyses <n> set number of FFT analysis lengths, (2<=n<=7; default=3,
i.e. 32, 64 & 1024 samples. n = 2, remove 32 sample FFT;
n > 3 add 16; n > 4, add 128; n > 5, add 256, n > 6, add
512) n.b. FFT lengths stated are for 44.1/48kHz audio,
higher sample rates will automatically increase all FFT
lengths as required.
-D, --dynamic <n> select minimum_bits_to_keep_dynamic to n bits (default
2.71 at -q X and 5.00 at -q I, 1.0 <= n <= 7.0.
--feedback [n] enable experimental bit removal / adaptive noise shaping
noise limiter. Tuning has been carried out at -q X and
should have a negligible effect at -q S. Optional setting
(0.0 <= n <= 10.0, default = 0.0) automatically selects
the following parameters (0 = least effect, 10 = most):
r, round <n> limit deviation from expected added noise due to rounding
(-2.0 <= n <= 2.0, default = 0.0).
n, noise <n> limit added noise due to adaptive noise shaping
(-2.5 <= n <= 7.5, default = 0.0).
a, aclips <n> number of permissible exceedences of adaptive noise
shaping level limit (0 <= n <= 64, default = 32).
A, alevel <n> adaptive noise shaping level limit (-2.0 <= n <= 2.5,
default = 0.0).
V, verbose enable more detailed feedback information in output.
-I, --ignore-chunk-sizes.
ignore 'RIFF' and 'data' chunk sizes in input.
-l, --limit <n> set upper frequency limit to be used in analyses to n Hz;
(12500 <= n <= 20000*; default=16000).
*: for 44.1/48 kHz audio. Upper limit for audio of
other sampling rates is limited to sample-rate x 45.35%
--linkchannels revert to original single bits-to-remove value for all
channels rather than channel dependent bits-to-remove.
--maxclips <n> set max. number of acceptable clips per channel per block;
(0 <= n <= 16; default = 3,3,3,3,3,2,2,2,2,2,1,1,1,0,0,0).
-m, --midside analyse 2 channel audio for mid/side content.
--nodccorrect disable DC correction of audio data prior to FFT analysis,
default=on; (DC offset calculated per FFT data set).
-n, --noskew disable application of low frequency level reduction prior
to determination of bits-to-remove.
--scale <n> factor to scale audio by; (0.03125 < n <= 8.0; default=1).
-s, --shaping modify settings for noise shaping used in bit-removal:
a, altfilter enable alternative adaptive shaping filter method.
A, average set factor of shape modification above upper calculation
frequency limit (0.00000 <= n <= 1.00000)
c, cubic enable cubic interpolation when defining filter shape
e, extra additional white noise to add during creation of filter
f, fixed disable adaptive noise shaping (use fixed shaping)
h, hybrid enable hybrid alternative to default adaptive noise shaping
method. Uses all available calculated analyses to create
the desired noise filter shape rather than only those for
1.5ms and 20ms FFT analyses.
n, nowarp disable warped noise shaping (use linear frequency shaping)
o, off disable noise shaping altogether (use simple rounding)
s, scale <n> change effectiveness of noise shaping (0 < n <= 2; default
= 1.0)
t, taps <n> select number of taps to use in FIR filter (8 <= n <= 256;
default = 64)
w, warp enable cubic interpolation when creating warped filter
--static <n> set minimum-bits-to-keep-static to n bits (default=6;
3<=n<=28, limited to bits-per-sample - 3).
-U, --underlap <n> enable underlap mode to increase number of FFT analyses
performed at each FFT length, (n = 2, 4 or 8, default=2).

Output Options:

--bitdist show distrubution of bits to remove.
--blockdist show distribution of lowest / highest significant bit of
input codec-blocks and bit-removed codec-blocks.
-d, --detail enable per block per channel bits-to-remove data display.
-F, --freqdist [all] enable frequency analysis display of input data. Use of
'all' parameter displays all calculated analyses.
-H, --histogram show sample value histogram (input, lossy and correction).
--perchannel show selected distribution data per channel.
-p, --postanalyse enable frequency analysis display of output and
correction data in addition to input data.
--sampledist show distribution of lowest / highest significant bit of
input samples and bit-removed samples.
--spread [full] show detailed [more detailed] results from the spreading/
averaging algorithm.
-W, --width <n> select width of output options (79<=n<=255).

System Options:

-B, --below set process priority to below normal.
--low set process priority to low.
-N, --nowarnings suppress lossyWAV warnings.
-Q, --quiet significantly reduce screen output.
-S, --silent no screen output.

Special thanks go to:

David Robinson for the publication of his lossyFLAC method, guidance, and
the motivation to implement his method as lossyWAV.

Horst Albrecht for ABX testing, valuable support in tuning the internal
presets, constructive criticism and all the feedback.

Sebastian Gesemann for the adaptive noise shaping method and the amount of
help received in implementing it and also for the basis of
the fixed noise shaping method.

Tyge Lovset for the C++ translation initiative.

Matteo Frigo and for libfftw3-3.dll contained in the FFTW distribution
Steven G Johnson (v3.2.1 or v3.2.2).

Mark G Beckett for the Delphi unit that provides an interface to the
(Univ. of Edinburgh) relevant fftw routines in libfftw3-3.dll.

Don Cross for the Complex-FFT algorithm originally used.</pre>

===Example drag 'n' drop batch file===
Simply drag the FLAC files onto this batch file and it will process, recode in FLAC and copy ALL of the tags from the input FLAC file, placing the output lossyFLAC file in the same directory as the input FLAC file. Requires flac.exe and [http://www.synthetic-soul.co.uk/tag/ tag.exe] to be somewhere on the path.
<pre>@echo off
:repeat
if %1.==. goto end
if exist "%~1" flac -d "%~1" --stdout --silent|lossywav - --stdout --quality standard ^
--stdinname "%~1"|flac - -b 512 -o "%~dpn1.lossy.flac" --silent && tag ^
--fromfile "%~1" "%~dpn1.lossy.flac"
shift
goto repeat
:end</pre>

===lossyWAV and FFTW===
Since version 1.2.0, lossyWAV has been compatible with [[Wikipedia:FFTW|FFTW]] although not dependent on it. Should the user wish to take advantage of the increased processing speed available when using FFTW (from superior FFT implementations), libfftw3-3.dll should be placed in a directory on the host computer which features on the path.

===Linux / OS X support: lossyWAV and WINE===
The cause of lossyWAV's WINE incompatibility was found and removed during the development of 1.2.0 and retrospectively amended for 1.1.0b in a maintenance release (1.1.0c). The latest stable version (1.3.0 at the time of writing) is fully supported.

[https://github.com/gcocatre/caudec caudec] is a command-line tool that can encode and decode lossyWAV files (lossyFLAC, lossyWV), using a Linux or macOS build (see the POSIX version below).

There is also a [http://github.com/MoSal/lossywav-for-posix lossyWAV for POSIX] port available on GitHub that does not require any Wine emulation.

===lossyWAV and [[foobar2000]]===
Example [[foobar2000]] converter settings:

lossyFLAC settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension: lossy.flac
Parameters: /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|c:\"program files"\bin\flac - -b 512 -5 -f -o%d --ignore-chunk-sizes
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyTAK settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension: lossy.tak
Parameters: /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|c:\"program files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWV settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension: lossy.wv
Parameters: /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|c:\"program files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWMALSL* settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension: lossy.wma
Parameters: /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|c:\"program files"\bin\wmaencode.exe - %d --codec lsl --ignorelength
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

Enclose the element of the path containing spaces within double quotation marks ("), e.g. C:\"Program Files"\directory_where_executable_is\executable_name. This is a Windows limitation.

lossyWMALSL conversion uses WMAEncode.exe by lvqcl found [http://www.hydrogenaud.io/forums/index.php?s=&showtopic=90519&view=findpost&p=767754 here].

===lossyWAV and EAC===
:''For example settings, see [[EAC and LossyWAV]].''

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor and not a codec. No decoding is required for any program to play a WAV file which has been processed with lossyWAV as it remains compliant with the RIFF WAVE format.

*'''Question:''' Why create a processor which means that I cannot be sure that a lossless file is truly lossless?
*'''Answer:''' Unless one creates the lossless file personally, one can '''never''' be completely sure that the file is indeed lossless. For example, a lossless file you receive could have been transcoded from [[MP3]] without your knowledge. To distinguish a lossyWAV file from lossless files, it is recommended to use the extension .lossy.EXT, where EXT is the original extension, e.g. .lossy.flac.

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[Transparency|transparent]]?
*'''Short answer:''' At preset --standard, almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Will it low-pass filter my audio?
*'''Short answer:''' No. The frequency limit is for the analysis only. LossyWAV cannot low-pass filter your audio.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact]]s
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaud.io/forums/index.php?showtopic=55522 Original lossyFLAC thread] - Introduction of the concept by David Robinson (Replay Gain developer) and initial development
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=109239 lossyWAV 1.5.0 development thread]
*[http://www.hydrogenaud.io/forums/index.php?showtopic=107081 lossyWAV 1.4.0 release thread] - Release of version 1.4.0 on 02 October 2014
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=96635 lossyWAV 1.3.1 Delphi to C++ translation thread]
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=81002 lossyWAV 1.3.0 development thread]
*[http://www.hydrogenaud.io/forums/index.php?showtopic=90104 lossyWAV 1.3.0 release thread] - Release of version 1.3.0 on 06 August 2011
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=65499 lossyWAV 1.2.0 development thread]
*[http://www.hydrogenaud.io/forums/index.php?showtopic=77042 lossyWAV 1.2.0 release thread] - Release of version 1.2.0 on 16 December 2009
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=63254 lossyWAV 1.1.0 development thread]
*[http://www.hydrogenaud.io/forums/index.php?showtopic=64617 lossyWAV 1.1.0 release thread] - Release of version 1.1.0 on 12 July 2008
----
*[http://www.hydrogenaud.io/forums/index.php?showtopic=56129 lossyWAV Development thread] - Conversion of the original MATLAB script to Delphi and evolution of the method
*[http://www.hydrogenaud.io/forums/index.php?showtopic=63225 lossyWAV 1.0.0 release thread] - Release of version 1.0.0b on 12 May 2008

[[Category:Software]]

Revised ReplayGain specification

2026-01-23T19:07:28Z

Skamp: Make it clear that the REPLAYGAIN_REFERENCE_LOUDNESS tag goes against the ReplayGain philosophy of having a single target, not a moveable one.

''This is a proposed update to the [[ReplayGain 1.0 specification|original ReplayGain specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [https://hydrogenaudio.org/index.php/topic,129032.0.html dedicated thread on Hydrogenaudio].'' --[[User:Skamp|Skamp]] 22:10 CET, January 22, 2026

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system.

The [[ReplayGain 1.0 specification|original ReplayGain specification]] described a loudness measurement system which included a weighting filter, root mean square (RMS) measurement and statistical processing that model human perception of loudness in both the frequency and time domains.

Since the original ReplayGain proposal in 2001, the science, practice and standards for loudness normalization have advanced significantly. The current industry standard approach to loudness measurement is described by the International Telecommunication Union<ref>http://www.itu.int/en/Pages/default.aspx</ref> (ITU) as BS.1770. The most recent version of this standard is known as ITU BS.1770-5<ref>https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en</ref> and was published in December 2023. The ITU work is freely available and is not believed to be encumbered by any patent issues. The ITU BS.1770-2 standard has been adopted in the United States by the [http://www.atsc.org ATSC] as [http://www.atsc.org/cms/standards/a_85-2011a.pdf A/85] and in Europe by the [http://www.ebu.ch European Broadcasting Union] as [http://tech.ebu.ch/docs/tech/tech3343.pdf EBU R-128] for broadcast audio.

BS.1770 uses a "K-weighted" RMS measurement. This weighting function is significantly less complex than the inverted Fletcher-Munson weighting used by the original ReplayGain algorithm. A gating function is designed to measure the loudness of foreground components in the audio program. The gate in BS.1770 performs a similar function to the statistical processing in the original RG specification.

The computation required for BS.1770 loudness measurement is reduced compared to the original RG technique. Nevertheless, BS.1770 has been shown in several academic studies to be equally or more effective than the RG algorithm in modeling human loudness perception on music program as well as other material such as podcasts, television programs and movies.<ref>Paul Nygren. [http://www.speech.kth.se/prod/publications/files/3319.pdf Achieving equal loudness between audio files]. KTH Royal Institute of Technology</ref><ref>Martin Wolters; Harald Mundt; Jeffrey Riedmiller (May 2010). [http://www.aes.org/e-lib/browse.cfm?elib=15341 Loudness Normalization In The Age Of Portable Media Players]. Audio Engineering Society.</ref><ref>Esben Skovenborg; Søren H. Nielsen (October 2004). [http://web.archive.org/web/20120208024743/http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf Evaluation of Different Loudness Models with Music and Speech Material]. Audio Engineering Society. Archived from [http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf the original] on 2012-02-08.</ref>

Recent RG implementations use BS.1770 for loudness measurement. It is expected the ITU standard will evolve over time to meet the needs of broadcasters and governments. It is the intent of the ReplayGain community that RG follow any future backwards-compatible improvements to loudness measurement using the BS.1770 standard.

==Reference level==

Classic ReplayGain is calibrated to a pink noise reference signal with an RMS level 14 dB below a full-scale sinusoid. This reference signal is used to establish a reference level. ReplayGain will apply no gain or attenuation to the reference signal or any program material which has the same loudness measurements as the reference signal.

BS.1770 defines a loudness scale for program material. The units of BS.1770 loudness measurements are Loudness Units [relative to] Full Scale (LUFS). A difference of 1 LU corresponds to 1 dB, so LUFS values can be treated like decibels.

In order to maintain backwards compatibility with classic RG, newer RG uses a -18 LUFS reference which, across a large body of music, yields playback loudness similar to that of classic RG.


==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{r}-L</math>
Where:
:<math>RG</math> is the replay gain adjustment in decibels,
:<math>L_{r}</math> is the -18 LUFS reference level,
:<math>L</math> is the measured loudness of the audio file in LUFS.

Gain is positive if the loudness of the audio file is lower than the reference level. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than the reference level. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[ReplayGain 2.0 specification#Player requirements|Player requirements]] below.
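The calculation can be written out as a minimal Python sketch. The function names are illustrative and not part of the specification; the decibel-to-linear conversion is the standard amplitude formula a player would apply to the samples.

```python
REFERENCE_LUFS = -18.0  # reference level L_r defined above


def replay_gain_db(measured_lufs, reference_lufs=REFERENCE_LUFS):
    """RG = L_r - L, in decibels."""
    return reference_lufs - measured_lufs


def gain_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)


# A loud track measured at -10 LUFS is attenuated; a quiet one is boosted.
loud = replay_gain_db(-10.0)
quiet = replay_gain_db(-23.0)
print(loud, quiet, gain_to_linear(loud))
```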

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom); the resulting peak amplitude values may be greater than 1.0.
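A minimal Python sketch of peak measurement and clipping prediction follows. It assumes integer PCM samples and takes 2<sup>bits-1</sup> as digital full scale (32768 for 16-bit audio); the function names are illustrative and not part of the specification.

```python
def peak_amplitude(samples, bits_per_sample=16):
    """Maximum absolute sample value, scaled so full scale equals 1.0."""
    full_scale = float(1 << (bits_per_sample - 1))  # assumed full scale
    return max((abs(s) for s in samples), default=0) / full_scale


def album_peak(track_peaks):
    """The album peak is the largest of the per-track peaks."""
    return max(track_peaks)


def will_clip(peak, gain_db):
    """True if applying gain_db would push the peak beyond full scale."""
    return peak * 10.0 ** (gain_db / 20.0) > 1.0


track = [0, 16384, -8192]
p = peak_amplitude(track)
print(p, will_clip(p, 6.0), will_clip(p, 6.1))
```

Because the stored peak is a single number, a player can make this check before playback starts, without scanning the audio again.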

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments''', same for other Ogg codecs
* .opus (Ogg Opus) – '''Vorbis comments''' available
** standard {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} (MUST adjust for -23 LUFS) comment keys may be preferable (used by {{code|loudgain}})
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .asf, .wma (Windows Media audio) – '''Vorbis comments''' in Extended Content Description Object
** {{code|loudgain}} instead uses native ASF/WMA attributes (itself a key-value storage) via TagLib, which is more sensible
* .wav (Windows PCM) – No metadata support (use .bwf instead)
** ID3 RIFF chunk possible (used by {{code|loudgain}})
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain legacy metadata formats#ID3v2 RGAD|RGAD]] and [[ReplayGain legacy metadata formats#ID3v2 RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID       $54 58 58 58 ("TXXX")
 Size           $xx xx xx xx (size of frame excluding this header)
 Flags          $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>
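
The frame layout above can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and it assumes the ID3v2.3 convention of a plain 32-bit big-endian frame size (ID3v2.4 uses a different, "synchsafe" size encoding, which this sketch does not handle).

```python
import struct


def build_txxx_frame(key, value):
    """Build one ID3v2.3 TXXX frame (10-byte header plus data) as bytes."""
    # Frame data: encoding byte $00, null-terminated description, then value.
    data = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the data (header excluded), two flag bytes.
    header = b"TXXX" + struct.pack(">I", len(data)) + b"\x40\x00"
    return header + data


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.48 dB")
print(frame[:10].hex(" "))  # the 10 header bytes in hexadecimal
```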

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
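
As a rough illustration of the tolerant parsing described above, the following Python sketch accepts the listed variations and returns <code>None</code> for anything else, so that the caller can treat the value as missing. The function names and regular expressions are illustrative assumptions, not part of the specification.

```python
import re

# A decimal number with optional sign; leading zeros and any digit count allowed.
_NUMBER = r"[+-]?(?:\d+\.?\d*|\.\d+)"
# Gain: number, optional spaces, optional 'dB' suffix in any capitalization.
_GAIN_RE = re.compile(rf"^\s*({_NUMBER})\s*(?:dB)?\s*$", re.IGNORECASE)
# Peak: non-negative number with no suffix.
_PEAK_RE = re.compile(r"^\s*\+?(\d+\.?\d*|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB as a float, or None if the value is malformed."""
    m = _GAIN_RE.match(text)
    return float(m.group(1)) if m else None


def parse_peak(text):
    """Return the peak (1.0 = full scale) as a float, or None if malformed."""
    m = _PEAK_RE.match(text)
    return float(m.group(1)) if m else None


def find_tag(tags, key):
    """Look up a key in a dict of tags, ignoring capitalization."""
    for k, v in tags.items():
        if k.upper() == key.upper():
            return v
    return None


print(parse_gain("+3.5dB"), parse_gain(" 007.20 DB"), parse_gain("loud"))
```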

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (the equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and the same value formatting as specified for ID3v2.

===De-Facto extensions===
MusicBrainz Picard and [http://github.com/Moonbase59/loudgain#tags-written-andor-deleted LoudGain] also support the following additional tags, named using the same conventions:

{| class="wikitable"
|+Extension metadata keys
|-
!Metadata
!Key
!Value format
!Purpose
|-
|Track range
|REPLAYGAIN_TRACK_RANGE
|[-]a.bb dB
|rowspan=2 | Indicates dynamics (R-128 Loudness Range / LRA), may guide pre-amplification
|-
|Album range
|REPLAYGAIN_ALBUM_RANGE
|[-]a.bb dB
|-
|Reference loudness
|<s>REPLAYGAIN_REFERENCE_LOUDNESS</s>
|[-]a.bb LUFS
| Use alternative reference levels; change ref levels without re-scanning the file.
--
This field is harmful and deviates from the standard, which has only one reference target. The effective playback level is changed within the player and does not require any form of rescanning.
|}

==== Proposed extension(s) ====

{| class="wikitable"
|+Extension metadata keys
|-
!Metadata
!Key
!Value format
!Purpose
|-
|Algorithm
|REPLAYGAIN_ALGORITHM
|ITU-R BS.1770
|Method to produce values
|}

==Player requirements==
[[File:RG_Player_control.gif|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain (in dB) divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard—not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>
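
The scaling step can be sketched in a few lines of Python. The function names are hypothetical, samples are assumed to be floating-point values, and dithering and truncation to the output bit depth are deliberately left out.

```python
def gain_to_scale(gain_db):
    """Convert a gain in decibels to a linear amplitude scale factor."""
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain_db):
    """Multiply every sample by the constant derived from the gain."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]


# A replay gain of -6 dB roughly halves the amplitude of every sample.
print(round(gain_to_scale(-6.0), 4))
```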

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
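
One possible way to express this fallback logic is shown below. It is a sketch only: the dictionary keys <code>track_gain</code> and <code>album_gain</code> are hypothetical names for already-parsed values in dB, and unity gain is used as the default although the text above also allows other choices.

```python
def select_gain(tags, mode="track", default_db=0.0):
    """Pick the gain (in dB) to apply, falling back as described above.

    `tags` maps the hypothetical keys 'track_gain' and 'album_gain' to
    floats; a missing key or a None value means the metadata is absent.
    """
    if mode == "track":
        order = ("track_gain", "album_gain")
    else:
        order = ("album_gain", "track_gain")
    for key in order:
        value = tags.get(key)
        if value is not None:
            return value
    return default_db  # neither gain is available: use unity gain


# The user asked for Track mode, but only Album gain is stored.
print(select_gain({"album_gain": -7.25}, mode="track"))  # -7.25
```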

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[ReplayGain 2.0 specification#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
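
As a worked check of the +6 dB example, and of combining the two adjustments (the −8 dB replay gain below is an assumed value chosen only for illustration):

:<math>10^{6/20} = 10^{0.3} \approx 1.995</math>

:<math>10^{-8/20} \times 10^{6/20} = 10^{(-8+6)/20} = 10^{-0.1} \approx 0.794</math>

Adding the gains in decibels first and converting once therefore gives the same scale factor as converting each gain separately and multiplying.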

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type or nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^{\frac{RG + G_\text{pre-amp}}{20}}, \frac{1}{\text{peak amplitude}} \right)</math>
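
A direct transcription of this formula into Python might look as follows. The function name is hypothetical; the default peak of 1.0 follows the rule above for missing peak metadata, and the handling of a zero peak (a silent track) is an added assumption to avoid division by zero.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=1.0):
    """Overall linear scale factor with clipping prevention by gain reduction.

    `peak` is the stored peak amplitude (1.0 = digital full scale).
    """
    desired = 10 ** ((replay_gain_db + preamp_db) / 20)
    if peak <= 0.0:
        return desired  # assumption: a silent track cannot clip
    # Never scale by more than the factor that brings the peak to full scale.
    return min(desired, 1.0 / peak)


# +6 dB requested, but the peak is already at 0.9: the scale is capped at 1/0.9.
print(round(playback_scale(6.0, 0.0, 0.9), 4))
```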


===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

ReplayGain

2026-01-23T09:12:44Z

Skamp:

'''ReplayGain''' is the name of a technique invented to achieve the same perceived playback loudness of audio files. It defines an algorithm to measure the '''perceived''' loudness of audio data.

ReplayGain allows the loudness of each song within a collection of songs to be consistent. This is called 'Track Gain' (or 'Radio Gain' in earlier parlance). It also allows the loudness of a specific sub-collection (an "album") to be consistent with the rest of the collection, while allowing the dynamics from song to song on the album to remain intact. This is called 'Album Gain' (or 'Audiophile Gain' in earlier parlance). This is especially important when listening to classical music albums, because quiet tracks need to remain a certain degree quieter than the louder ones.

ReplayGain is different from [[Normalization|peak normalization]]. Peak normalization merely ensures that the peak amplitude reaches a certain level. This does not ensure equal loudness. The ReplayGain technique measures the ''effective power'' of the waveform (i.e. the RMS power after applying an "equal loudness contour"), and then adjusts the amplitude of the waveform accordingly. The result is that Replay Gained waveforms are usually more uniformly amplified than peak-normalized waveforms.

==Target loudness==
The target loudness of all ReplayGain utilities is 89 dB SPL when replayed in an SMPTE RP 200 calibrated system (an early departure from the proposal, endorsed by its author<ref>[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=83397&view=findpost&p=721854 Does Replay gain work differtly in Media monkey]</ref>) — the ReplayGain proposal and SMPTE recommendation are 6 dB lower.<ref>[http://www.mars.org/mailman/public/mad-dev/2004-February/000993.html ReplayGain discussion at mad-dev]</ref> The target loudness may be more commonly known and understood as '''-18''' '''[https://en.wikipedia.org/wiki/LUFS LUFS]''' (''Loudness Units relative to Full Scale'').

Some utilities have realized the inadequacies of the classic ReplayGain loudness calculation, switching to a more modern algorithm ([https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en ITU-R BS.1770]). However, the way it was integrated was extremely ''ad hoc'', at least until a draft of a [[ReplayGain 2.0 specification|revised specification]] started being written.

==Clipping==
Audio is generally recorded such that the loudest sounds don't clip, but the use of ReplayGain can cause clipping if the average volume of a song is below the target level. That is, upon playback, the volume of a quiet song is increased, so the parts of the song with above-average loudness, especially in the bass frequencies, will exceed the limits of the format and will be distorted. Whether this distortion is audible depends on the sounds in question, and the listener's sensitivity.

Implementations deal with the risk of clipping in different ways. Some have a "pre-amp" feature which reduces (or boosts) the original audio's level by a certain amount before doing whatever is needed for ReplayGain. Some have a "prevent clipping" feature to reduce the amount of ReplayGain adjustment to whatever amount would keep clipping from occurring, based on peak info stored in the file's metadata (thus reducing the effectiveness of ReplayGain). Some recommend using a compressor/limiter DSP to prevent or reduce clipping, regardless of whether it was caused by ReplayGain.

An alternative that may reduce the risk of clipping is the [https://tech.ebu.ch/docs/r/r128.pdf EBU R 128] recommendation of a -23 LUFS target, but that's a different standard, and some may find the additional reduction in volume excessive, particularly if it leads to maxing out volume on user hardware. [[Opus]] in particular has adopted that standard, using a global gain as well as <code>R128_TRACK_GAIN</code> / <code>R128_ALBUM_GAIN</code> tags. With ReplayGain, a simple (though somewhat radical) solution is to lower the preamp value by 5 dB (or whatever one feels comfortable with) in the playback software. The RG target will always be -18 LUFS, regardless of a user's volume or preamp settings.
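
The relationship between the two targets can be illustrated with a small conversion sketch. It assumes the encoding defined for Opus in RFC 7845, where <code>R128_TRACK_GAIN</code> and <code>R128_ALBUM_GAIN</code> hold a signed integer in 1/256 dB units (Q7.8) relative to -23 LUFS; the function names are hypothetical, and the separate output gain in the Opus header is ignored here.

```python
RG_OFFSET_DB = 5.0  # ReplayGain targets -18 LUFS, R128 tags target -23 LUFS


def r128_to_replaygain_db(r128_value):
    """Convert an R128_*_GAIN tag value to a gain in dB for a -18 LUFS target."""
    return int(r128_value) / 256 + RG_OFFSET_DB


def replaygain_db_to_r128(gain_db):
    """Convert a -18 LUFS-referenced gain in dB to an R128_*_GAIN tag value."""
    return str(round((gain_db - RG_OFFSET_DB) * 256))


print(r128_to_replaygain_db("-1280"))  # 0.0
print(replaygain_db_to_r128(0.0))      # -1280
```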

== Implementations ==
There are different ReplayGain implementations, each with its own uses and strength. Most use [[metadata]] to indicate the level of the volume change that the player should make. Some modify the audio data itself, and optionally use metadata as well. There are advantages and disadvantages to both methods.

In the metadata method, information on both types of ReplayGain (Track Gain and Album Gain) can be stored. The volume-change information can be very precise. If audio data was also changed, the metadata can contain "undo" info. Not all audio players/decoders know how to read and use ReplayGain information stored in metadata. And there's no standard for where and how ReplayGain info is stored; each implementation uses different formats and puts the info in different locations.

In the audio data method, the file's actual audio data is modified so that its natural/default playback volume is at the target level. In this scenario, only one type of ReplayGain (Track Gain or Album Gain) can be applied. If no "undo" info is saved somewhere, it may not be possible to restore the original audio data. Limitations of the audio file format may prevent precise (finely tuned) gain adjustments with this method. For example, MP3 and AAC files can only be losslessly modified in 1.5 dB steps. Depending on the audio file format, the process may also be lossy in the sense that it could irreversibly push a signal above the format's maximum amplitude (resulting in clipping) or below the minimum (resulting in silence).
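
To illustrate the precision limit of the audio data method, the sketch below rounds a desired gain to the nearest 1.5 dB step and reports what is left over. The function name is hypothetical and this is not the code of any particular tool; the rounding rule (nearest step) is an assumption.

```python
STEP_DB = 1.5  # size of one lossless gain step in MP3 and AAC files


def quantize_gain(gain_db):
    """Return (steps, applied_db, residual_db) for a desired gain in dB."""
    steps = round(gain_db / STEP_DB)   # whole number of 1.5 dB steps
    applied_db = steps * STEP_DB       # gain that can actually be applied
    residual_db = gain_db - applied_db # error left after quantization
    return steps, applied_db, residual_db


steps, applied, residual = quantize_gain(-6.48)
print(steps, applied)  # -4 -6.0
```

With nearest-step rounding the residual never exceeds 0.75 dB in magnitude, which is the part that only the metadata method can express.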

=== Old versus new algorithms ===
Since the ReplayGain standard does not define tags to specify which algorithm was used (classic or ITU-R BS.1770) or what target was set (RG's -18 LUFS, EBU R 128's -23 LUFS, or any other target set by the user or some piece of software), there may be confusion as to how the results were produced. Typically, utilities that ship with reference encoders (FLAC / metaflac, Vorbis / vorbisgain, WavPack / wvgain, Musepack / mpcgain…) use the original RG algorithm, which can produce values that differ from newer tools by several decibels in certain cases. Generally speaking, it is recommended to run other utilities or players that implement the ITU-R BS.1770 algorithm, although it may not be obvious which algorithm they use at first glance. Their documentation may provide that information.

==== RG1, RG2 ====
Although there are many references online, and within ReplayGain scanners, about version 1 vs. version 2 of the ReplayGain standard, at the time of writing this, there is (admittedly) only one ReplayGain standard. The core principle, as well as the 4 tags from the original specification, have not changed. More tags have been proposed but they are still subject to debate. As a rule of thumb, tools that advertise "ReplayGain 2" compliance employ the newer, more accurate ITU-R BS.1770 algorithm. [[foobar2000]] labels it "EBU R128" but it essentially means the same thing. Should a better algorithm be developed in the future, it will still work towards fulfilling ReplayGain's original goals, and probably write the same tags.

=== MP3Gain ===
[[MP3Gain]] is an implementation of classic ReplayGain. It can be used to just analyze files & recommend changes or to also modify the gain. If modifying the gain, it always modifies the global gain fields in the MP3 audio data. It can add somewhat precise metadata, including undo info. The gain can be modified to any target dB, or it can be changed by a specified amount. For balance correction, user-specified changes can even be made on just one channel in simple L/R stereo-mode files (not joint stereo).

* Format: [[MP3]]
* Method: Audio + Meta (in APE tag), or Audio only
* APE tag fields (ASCII bytes):
** <code>MP3GAIN_MINMAX ###,###</code> - minimum & maximum global gain values for this file. 3 digits, zero-padded if necessary.
** <code>MP3GAIN_ALBUM_MINMAX ###,###</code> - minimum & maximum global gain values across a set of files scanned as an album. Optional.
** <code>MP3GAIN_UNDO +###,+###,N</code> - the global gain adjustment to restore the original values in the left and right channels, respectively, followed by an indicator of whether to wrap at the extremes (<code>N</code> means no, <code>W</code> means yes). The adjustment values are 3 digits, zero-padded, preceded by a sign (<code>+</code> or <code>-</code>).
** <code>REPLAYGAIN_TRACK_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Examples: <code>+0.424046</code> and <code>-10.38500</code>
** <code>REPLAYGAIN_TRACK_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Example: <code>0.149923</code>
** <code>REPLAYGAIN_ALBUM_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Optional.
** <code>REPLAYGAIN_ALBUM_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Optional.
* Limitations: Although the metadata, if written, contains precise adjustment & peak values, the audio data modifications are limited to 1.5dB steps and may become irreversible (however, that's a very rare condition; see the [https://hydrogenaud.io/index.php/topic,34154.0.html "mp3gain is NOT lossless" forum thread])
* http://mp3gain.sourceforge.net/
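
Based only on the field description above, the <code>MP3GAIN_UNDO</code> value could be parsed as sketched below. The function name and the strictness of the pattern are assumptions; MP3Gain's own parser may accept other variations.

```python
import re

# Two signed, zero-padded 3-digit adjustments, then the wrap indicator.
_UNDO_RE = re.compile(r"^([+-]\d{3}),([+-]\d{3}),([NW])$")


def parse_mp3gain_undo(value):
    """Return (left_steps, right_steps, wrap) or None if the value is malformed.

    The adjustments are in global gain steps, not decibels.
    """
    m = _UNDO_RE.match(value.strip())
    if m is None:
        return None
    left, right, wrap = m.groups()
    return int(left), int(right), wrap == "W"


print(parse_mp3gain_undo("+002,+002,N"))  # (2, 2, False)
```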

=== AACGain ===
[[AACGain]] is a modified version of MP3Gain that works on both MP3 and AAC files.

* Format: [[MP3]], [[AAC]] (with or without MP4 container)
* Method: Audio + Meta, or Audio only
* Limitations: Limited to 1.5 dB steps; may become irreversible (same caveat as for MP3Gain)
* http://aacgain.altosdesign.com/

=== [[LAME]] ===
* Method: Header ([http://gabriel.mp3-tech.org/mp3infotag.html mp3infotag])
* Notes:
** Uses the classic RG algorithm.
** Tags added during encoding; not supported by any player yet; Track Gain only
** Replay Gaining MP3s is usually done using MP3Gain (see [[ReplayGain#MP3Gain|above]]) or [[ReplayGain#foobar2000 ReplayGain scanner|foobar2000]]
* http://lame.sourceforge.net/

=== [[Musepack]] ReplayGain ===
* Method: Header (similar to the metadata method)
* Notes: Uses the classic RG algorithm. ReplayGain values are stored in the header and ReplayGain is part of the Musepack specifications; therefore any Musepack decoder that does not support ReplayGain can be considered broken.
* http://www.musepack.net/

=== VorbisGain ===
* Format: (Ogg) [[Vorbis]]
* Method: Meta (in [[Vorbis comment]])
* Uses the classic RG algorithm.
* http://www.sjeng.org/vorbisgain.html
** new compiles of VorbisGain at [http://www.rarewares.org/ogg.html www.rarewares.org]
:'''''Note:''' Andavari has provided a very useful script to integrate VorbisGain, which is a CLI tool, into Windows Explorer. Please check the [[Vorbis#ReplayGain|ReplayGain section]] of the (Ogg) Vorbis article.''

=== FLAC / METAFLAC ===
* Format: [[Free Lossless Audio Codec|FLAC]]
* Method: Meta (in [[Vorbis comment]])
* Uses the classic RG algorithm.
* http://flac.sf.net

=== WavPack / WVGAIN ===
* Format: [[WavPack]]
* Method: Meta (in [[APEv2]] tag)
* Uses the classic RG algorithm.
* http://www.wavpack.com

=== Wavegain ===
* Format: waveform
* Method: Audio
* Uses the classic RG algorithm.
* Limitations: Irreversible
* http://www.rarewares.org/others.php#wavegain

=== MusicPlayer ===
* Custom implementation, inspired by MP3Gain.
* Format: any that FFmpeg supports
* Method: Audio
* Limitations: Doesn't modify the files at all. Stores the value in own database. Used only for playback.
* https://github.com/albertz/music-player

=== [[foobar2000]] ReplayGain scanner ===
* Since v1.1.6, defaults to ITU-R BS.1770 analysis (although it labels it EBU R128), but can be configured to use the "Classic ReplayGain" algorithm instead. The ITU-R BS.1770 implementation uses a reference level of -18 LUFS instead of -23, in order to retain compatibility with the ReplayGain standard.
* Format:
** [[MP3]]: Values written to [[ID3v2]] (default) or [[APEv2]] tags.
** [[Musepack]]: Values written to header.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[Opus]]: Values written to [[Vorbis comment]] and/or file header.
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags. As with MP3, it is also an option to apply gain via a separate function.
** [[MP4]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain). Can write the chosen gain value to Apple's SoundCheck tag.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[TAK]]: Values written to [[APEv2]] tags.
** [[TTA]]: Values written to [[APEv2]] or [[ID3v2]] tags.
** [[WMA]]: Values written to WMA tags.
** Modules ([[MOD]] etc.): Optionally saved into [[APEv2]] tags.
** Any non-taggable format or format supported by FFmpeg can store RG values in a database or external tag with a component.
** A separate function can be invoked to apply the tagged Track or Album Gain to the global gain fields in MP3, MP4 (AAC), or Opus files, and rewrite any existing tags to account for the peak change and compensate for the difference from 89 dB. The 89 dB reference level for tags isn't configurable, but the reference level applied to the global gain fields is.
** Can automatically copy Track or Album gain values to Apple's SoundCheck tag in MP4 files or any format that supports ID3v2 to effectively add ReplayGain support to Apple's players.
* https://foobar2000.org/

=== [[MediaMonkey]] ===
* Format:
** [[MP3]]: Values written to [[APEv2]] or [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in MediaMonkey's MDB database.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[WAV]]: Values stored in MediaMonkey's MDB database.
** [[MPC]]: Internal gain structure.
* In addition to tags, all ReplayGain values are also stored in MediaMonkey's MDB database
* Album/Audiophile ReplayGain not supported until v3.0 (Dec 2007); support during burning & ripping added in 3.1 (Jun 2009)
* Also capable of (irreversibly) changing the volume of MP3 tracks, similar to [[MP3Gain]]
* http://www.mediamonkey.com/

=== [[Winamp]] ReplayGain scanner===
* Format:
** [[MP3]]: Values written to [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in Windows Media Audio tags.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags.
** [[MP4]]
** [[TAK]]: Values written to [[APEv2]] tags.
* Supports Album/Track Gain

=== [[loudgain]] ===
* Format:
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** MP2, [[MP3]]: Values written to [[ID3v2]] tags (ID3v2.3/ID3v2.4 selectable).
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** (Ogg) [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** (Ogg) [[Speex]]: Values written to [[Vorbis comment]].
** [[Opus]]: Values written to [[Vorbis comment]], based on -23 LUFS Opus standard. Only <code>R128_TRACK_GAIN</code> and <code>R128_ALBUM_GAIN</code> are written, but the calculated ''true peak'' value can still be used to reduce the gain values ([[Clipping]] prevention).
** [[MP4]], [[M4A]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain). ReplayGain values are stored under <code>----:com.apple.iTunes:…</code>. This is for [[AAC]] and [[ALAC]] in [[MPEG-4]] containers.
** [[ASF]], [[Windows Media Audio|WMA]]: Values written to WMA tags, no prefix.
** [[WAV]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] (ID3v2.3/ID3v2.4 selectable) format. Using the <code>bext</code> chunk (for BWF v2) isn’t (yet) supported, but won’t be destroyed on writing.
** [[Audio Interchange File Format|AIFF]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] format.
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[Monkey's Audio]] (APE): Values written to [[APEv2]] tags.
* Follows EBU R128, ITU BS.1770 and the [[ReplayGain 2.0 specification|revised ReplayGain specification]].
* ''Never'' touches the actual audio data but ''only writes RG2 tags''.
* Uses ''true peak'' values calculated by oversampling to 192 kHz, using a custom polyphase FIR interpolator that will oversample 4x for sample rates < 96 kHz, 2x for sample rates < 192 kHz and leave the signal unchanged for 192 kHz.
* ''Clipping prevention'' can be used to lower the ReplayGain values to a safe margin (default -1 dBTP, can be changed).
* Many options for special cases: force RG tags upper-/lowercase, add extra tags (LRA, Reference loudness), strip unwanted tag types (APEv2 from MP2/MP3, ID3 from WavPack), tab-delimited table output for analysis with CSV file.
* Free and open-source software for ''Linux''; it can also be installed on ''macOS'' using ''Homebrew'' and on ''Windows 10'' using the Linux ''bash'' shell.
* Also installs a <code>rgbpm</code> bash script for mass-tagging, which can be adapted to the user’s needs.
* '''Warning:''' Loudgain relies on standard libraries like ''TagLib''. Linux distros (except rolling releases) sometimes ship outdated libraries, so be sure to use the latest version of ''TagLib''. For a while, version 1.11.1 had a nasty bug that [https://hydrogenaud.io/index.php/topic,118085.msg974957.html#msg974957 could corrupt Ogg Vorbis files]. This has since been fixed, but the TagLib version was not updated. Loudgain comes with a (slower) static version called <code>loudgain.static</code> in the repo’s <code>/bin</code> folder that is not affected by the bug and can also be used on older Linux versions (like Ubuntu 14.04, Linux Mint 17).
* https://github.com/Moonbase59/loudgain
* Bug tracker: https://github.com/Moonbase59/loudgain/issues

=== [[rsgain]] ===
rsgain is a newer ReplayGain command-line utility designed with a "batteries included" philosophy. It uses the newer ITU-R BS.1770 algorithm.

Features:
* Cross-platform Windows / macOS / Linux
* Supports all popular audio formats
* Simplified "Easy Mode" command line syntax supports recursive, directory-based scanning
* Multithreaded scanning option that provides significant speed improvement with full library scans
* Option to skip files with existing ReplayGain metadata
* Scan presets allow the user to save advanced settings for consistent use

== Players support ==
Because ReplayGain is part of the specifications of the FLAC, Musepack, and APE formats, any player that supports those formats usually supports ReplayGain.

The situation with MP3 is rather different, as ReplayGain was not part of the MP3 specification. Storing the values in APEv2 tags is gradually becoming the de facto standard.

=== Windows ===
* [[foobar2000]] supports ReplayGain in all possible aspects.
* [[Winamp]] supports ReplayGain in album or track mode.
* [[MediaMonkey]] supports ReplayGain, with many configuration options.
* [[XMPlay]] recently implemented ReplayGain
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.

''...and probably others.''

=== Linux ===
* [[XMMS]]. Reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], [[Musepack]], (Ogg) [[Vorbis]], and others.
:For [[MP3]], use the CVS version of the [http://xmms-mad.sourceforge.net/ xmms-mad] MP3 plugin. It has not yet been released as a binary and is not available in distributions' packages for now; meanwhile, [http://perso.crans.org/~krempp/xmms-mad/ custom binaries] are available.
* [[amarok]], by using the amarok script [http://kde-apps.org/content/show.php?content=26073 ReplayGain].
:And possibly others: since [http://developer.kde.org/~wheeler/taglib.html TagLib] added support for [[APEv2]] tags in [[MP3]] files, players using this library (like [[amaroK]] and [[JuK]]) might support ReplayGain values stored in such tags in the near future.
* [http://www.sacredchao.net/quodlibet Quod Libet] reads ReplayGain from (Ogg) [[Vorbis]], [[MP3]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:Requires support to be enabled (via the appropriate python bindings and libraries) for the above formats. Does not support ReplayGain values stored in [[APEv2]] tags in [[MP3]]s. ReplayGain values are stored in RVA2 id3v2.4 frames. See the [http://www.sacredchao.net/quodlibet/wiki/Development/ID3Notes Quod Libet RVA2 / ReplayGain notes].
* [http://www.musicpd.org/ Music Player Daemon] (MPD) reads ReplayGain from (Ogg) [[Vorbis]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:foobar2000-style TXXX frames in [[MP3]]s are also supported in the latest development releases.
* [http://www.mplayerhq.hu/ MPlayer]. MPlayer support for ReplayGain is codec-dependent.
:Codecs that are known to support ReplayGain: vorbis
:Because of this, you need to prioritize the codecs that support it, or choose one individually on the command line. On the command line, add an <code>-ac [codec]</code> option after each file whose codec you want to choose, or at the beginning to make it apply to all files listed. To prioritize the codecs by default, list them in a line in <code>mplayer.conf</code>:
:<code>ac=[codec],[othercodec],vorbis,mad,</code>
* [http://idjc.sourceforge.net/ IDJC] (Internet DJ Console) reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], (Ogg) FLAC, (Ogg) [[Vorbis]], MP2 (audio), [[MP3]], [[Opus]], but only the ''lowercase'' tags. There is a [https://sourceforge.net/p/idjc/bugs/100/ ticket] open to handle tags case-insensitively.
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.
* [https://www.videolan.org/vlc/ VLC] supports ReplayGain in many file formats, but usually only the ''uppercase'' variant of the tags.
* [https://kodi.tv/ KODI] reads ReplayGain from nearly all formats, but usually only the ''lowercase'' variant of the tags.

=== Portable devices ===
[http://www.rockbox.org/ Rockbox] supports ReplayGain (in album or track mode) for most formats, including WMA, MP1/2/3, AAC, ALAC, Musepack, Monkey's Audio, WavPack, FLAC and Vorbis. Note that ReplayGain is only supported when using the respective codec's native tagging format. For example, ReplayGain stored in APEv2 tags is not supported for MP3; ID3v2.x tags are expected instead.

The SanDisk Sansa Fuze (with firmware 1.02.26 and 2.02.26) and the SanDisk Sansa Clip+ also support ReplayGain.

The iPod features ''Soundcheck'', which seems to produce roughly the same normalization gains as ReplayGain, but doesn't provide an Album Gain.

=== Hi-Fi ===
Slim Devices, a company owned by Logitech Inc, supports ReplayGain on both of its high-end audiophile players, known as the [[Slim Devices Transporter|Transporter]] and the [[Slim Devices Squeezebox|Squeezebox]].

BluOS also supports ReplayGain, offering a choice of album or track gain and a so-called Smart option that chooses between the two automatically.
NAD devices that use BluOS consequently also support ReplayGain.

==Notes==
<references/>

== See also ==
* [[ReplayGain specification]]

== External links ==
* [http://en.wikipedia.org/wiki/Replay_Gain ReplayGain] at Wikipedia
* [http://www.bobulous.org.uk/misc/Replay-Gain.html ReplayGain using foobar2000] (how to use ReplayGain in Windows using foobar2000).
* [http://www.bobulous.org.uk/misc/Replay-Gain-in-Linux.html ReplayGain in Linux] (how to use ReplayGain in Linux using foobar2000 and Wine, or using metaflac or vorbisgain).

[[Category:Technical]]
[[Category:Metadata]]

ReplayGain

2026-01-23T09:09:58Z

Skamp: Clarification about possible clipping with the ReplayGain target level.

'''ReplayGain''' is the name of a technique invented to achieve the same perceived playback loudness of audio files. It defines an algorithm to measure the '''perceived''' loudness of audio data.

ReplayGain allows the loudness of each song within a collection of songs to be consistent. This is called 'Track Gain' (or 'Radio Gain' in earlier parlance). It also allows the loudness of a specific sub-collection (an "album") to be consistent with the rest of the collection, while allowing the dynamics from song to song on the album to remain intact. This is called 'Album Gain' (or 'Audiophile Gain' in earlier parlance). This is especially important when listening to classical music albums, because quiet tracks need to remain a certain degree quieter than the louder ones.

ReplayGain is different from [[Normalization|peak normalization]]. Peak normalization merely ensures that the peak amplitude reaches a certain level. This does not ensure equal loudness. The ReplayGain technique measures the ''effective power'' of the waveform (i.e. the RMS power after applying an "equal loudness contour"), and then adjusts the amplitude of the waveform accordingly. The result is that Replay Gained waveforms are usually more uniformly amplified than peak-normalized waveforms.
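
The difference between the two approaches can be illustrated with a minimal Python sketch. It is not the ReplayGain algorithm: it uses plain RMS without the equal-loudness filter, and the two test signals and all helper names are made up for illustration.

```python
import math

def peak(samples):
    """Largest absolute sample value (1.0 = full scale)."""
    return max(abs(s) for s in samples)

def rms(samples):
    """Root mean square of the samples (no loudness weighting applied)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def to_db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20.0 * math.log10(ratio)

# Two hypothetical signals with the same peak but very different power.
sparse = [1.0] + [0.0] * 99   # a single full-scale click
dense = [1.0, -1.0] * 50      # a full-scale square wave

# Peak normalization to full scale treats both signals identically...
peak_gain_sparse = to_db(1.0 / peak(sparse))
peak_gain_dense = to_db(1.0 / peak(dense))

# ...although their average power differs considerably.
rms_difference_db = to_db(rms(dense) / rms(sparse))
```

Both signals get the same peak-normalization gain, yet their RMS levels are far apart, which is why a power-based measurement is needed to match loudness.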

==Target loudness==
The target loudness of all ReplayGain utilities is 89 dB SPL when replayed in an SMPTE RP 200 calibrated system (an early departure from the proposal, endorsed by its author<ref>[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=83397&view=findpost&p=721854 Does Replay gain work differtly in Media monkey]</ref>) — the ReplayGain proposal and SMPTE recommendation are 6 dB lower.<ref>[http://www.mars.org/mailman/public/mad-dev/2004-February/000993.html ReplayGain discussion at mad-dev]</ref> The target loudness may be more commonly known and understood as '''-18''' '''[https://en.wikipedia.org/wiki/LUFS LUFS]''' (''Loudness Units relative to Full Scale'').
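
Expressed in LUFS, the gain value is simply the distance between the target and the measured loudness. The following sketch assumes a scanner that already reports integrated loudness in LUFS; the function and constant names are hypothetical.

```python
REFERENCE_LUFS = -18.0  # ReplayGain target expressed in LUFS, as stated above

def replaygain_db(measured_lufs, reference_lufs=REFERENCE_LUFS):
    """Gain in dB that moves the measured loudness to the reference level."""
    return reference_lufs - measured_lufs

loud_track_gain = replaygain_db(-9.5)    # a loud, heavily compressed track
quiet_track_gain = replaygain_db(-24.0)  # a quiet track
```

A loud track gets a negative gain (attenuation), while a quiet track gets a positive one (amplification).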

Some utilities have addressed the inadequacies of the classic ReplayGain loudness calculation by switching to a more modern algorithm ([https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en ITU-R BS.1770]). However, the way it was integrated was extremely ''ad hoc'', at least until a draft of a [[ReplayGain 2.0 specification|revised specification]] started being written.

==Clipping==
Audio is generally recorded such that the loudest sounds don't clip, but the use of ReplayGain can cause clipping if the average volume of a song is below the target level. That is, upon playback, the volume of a quiet song is increased, so the parts of the song with above-average loudness, especially in the bass frequencies, will exceed the limits of the format and will be distorted. Whether this distortion is audible depends on the sounds in question, and the listener's sensitivity.

Implementations deal with the risk of clipping in different ways. Some have a "pre-amp" feature which reduces (or boosts) the original audio's level by a certain amount before doing whatever is needed for ReplayGain. Some have a "prevent clipping" feature to reduce the amount of ReplayGain adjustment to whatever amount would keep clipping from occurring, based on peak info stored in the file's metadata (thus reducing the effectiveness of ReplayGain). Some recommend using a compressor/limiter DSP to prevent or reduce clipping, regardless of whether it was caused by ReplayGain.
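
A "prevent clipping" feature of this kind can be sketched as follows. This is an illustration only, not the behaviour of any particular player: it assumes the stored peak is a linear value where 1.0 is full scale, and the function name is hypothetical.

```python
import math

def clip_safe_gain_db(gain_db, peak, preamp_db=0.0):
    """Return the gain to apply, reduced if needed so that peak * gain <= 1.0.

    peak is the linear sample peak stored in the metadata (1.0 = full scale).
    """
    wanted_db = gain_db + preamp_db
    if peak <= 0.0:
        # Digital silence: nothing can clip.
        return wanted_db
    headroom_db = -20.0 * math.log10(peak)  # largest gain that avoids clipping
    return min(wanted_db, headroom_db)
```

For example, a track with a stored gain of +6 dB and a peak of 0.9 has only about 0.9 dB of headroom, so the adjustment is limited to that amount and the track ends up quieter than the target.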

An alternative that may reduce the risk of clipping is the [https://tech.ebu.ch/docs/r/r128.pdf EBU R 128] recommendation of a '''-23''' '''LUFS''' target, but that's a different standard, and some may find the additional reduction in volume excessive, particularly if it leads to maxing out volume on user hardware. [[Opus]] in particular has adopted that standard, using a global gain as well as <code>R128_TRACK_GAIN</code> / <code>R128_ALBUM_GAIN</code> tags. With ReplayGain, a simple (though somewhat radical) solution is to lower the preamp value by 5 dB (or whatever one feels comfortable with) in the playback software.
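
Because the two targets differ by 5 dB, a ReplayGain value can be converted to the Opus convention with simple arithmetic. The sketch below assumes the Ogg Opus encapsulation rules, where the <code>R128_*_GAIN</code> tags hold a signed Q7.8 fixed-point integer (gain in dB multiplied by 256) written as ASCII text, and it assumes the header output gain is zero. The function name is hypothetical.

```python
def replaygain_to_r128_tag(rg_gain_db):
    """Convert a gain relative to -18 LUFS into an R128_*_GAIN tag value."""
    r128_db = rg_gain_db - 5.0            # shift the reference from -18 to -23 LUFS
    q78 = int(round(r128_db * 256))       # Q7.8 fixed point: 1/256 dB resolution
    q78 = max(-32768, min(32767, q78))    # clamp to the signed 16-bit range
    return str(q78)
```

A ReplayGain value of -3.00 dB thus corresponds to -8 dB relative to -23 LUFS, which is written as <code>-2048</code>.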

== Implementations ==
There are different ReplayGain implementations, each with its own uses and strengths. Most use [[metadata]] to indicate the level of the volume change that the player should make. Some modify the audio data itself, and optionally use metadata as well. There are advantages and disadvantages to both methods.

In the metadata method, information on both types of ReplayGain (Track Gain and Album Gain) can be stored. The volume-change information can be very precise. If audio data was also changed, the metadata can contain "undo" info. Not all audio players/decoders know how to read and use ReplayGain information stored in metadata. And there's no standard for where and how ReplayGain info is stored; each implementation uses different formats and puts the info in different locations.

In the audio data method, the file's actual audio data is modified so that its natural/default playback volume is at the target level. In this scenario, only one type of ReplayGain (Track Gain or Album Gain) can be applied. If no "undo" info is saved somewhere, it may not be possible to restore the original audio data. Limitations of the audio file format may prevent precise (finely tuned) gain adjustments with this method. For example, MP3 and AAC files can only be losslessly modified in 1.5 dB steps. Depending on the audio file format, the process may also be lossy in the sense that it could irreversibly push a signal above the format's maximum amplitude (resulting in clipping) or below the minimum (resulting in silence).
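
The effect of the 1.5 dB step size can be sketched as below. The sketch assumes that one step of the global gain field scales the amplitude by a factor of 2<sup>1/4</sup> (about 1.505 dB); the function name is hypothetical.

```python
import math

# One global gain step, assumed to scale the amplitude by 2 ** (1/4).
STEP_DB = 20.0 * math.log10(2.0 ** 0.25)  # about 1.505 dB

def quantize_gain(gain_db):
    """Round a desired gain to whole steps.

    Returns (steps, applied_db, residual_db), where residual_db is the part
    of the desired gain that cannot be applied to the audio data.
    """
    steps = round(gain_db / STEP_DB)
    applied_db = steps * STEP_DB
    return steps, applied_db, gain_db - applied_db

steps, applied_db, residual_db = quantize_gain(-6.2)
```

Here a desired change of -6.2 dB becomes 4 steps down (about -6.02 dB), leaving a residual error of roughly 0.18 dB that only metadata can express.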

=== Old versus new algorithms ===
Since the ReplayGain standard does not define tags to specify which algorithm was used (classic or ITU-R BS.1770) or what target was set (RG's -18 LUFS, EBU R 128's -23 LUFS, or any other target set by the user or some piece of software), there may be confusion as to how the results were produced. Typically, utilities that ship with reference encoders (FLAC / metaflac, Vorbis / vorbisgain, WavPack / wvgain, Musepack / mpcgain…) use the original RG algorithm, which can produce values that differ from newer tools by several decibels in certain cases. Generally speaking, it is recommended to run other utilities or players that implement the ITU-R BS.1770 algorithm, although it may not be obvious which algorithm they use at first glance. Their documentation may provide that information.

==== RG1, RG2 ====
Although there are many references online, and within ReplayGain scanners, about version 1 vs. version 2 of the ReplayGain standard, at the time of writing this, there is (admittedly) only one ReplayGain standard. The core principle, as well as the 4 tags from the original specification, have not changed. More tags have been proposed but they are still subject to debate. As a rule of thumb, tools that advertise "ReplayGain 2" compliance employ the newer, more accurate ITU-R BS.1770 algorithm. [[foobar2000]] labels it "EBU R128" but it essentially means the same thing. Should a better algorithm be developed in the future, it will still work towards fulfilling ReplayGain's original goals, and probably write the same tags.

=== MP3Gain ===
[[MP3Gain]] is an implementation of classic ReplayGain. It can be used to just analyze files & recommend changes or to also modify the gain. If modifying the gain, it always modifies the global gain fields in the MP3 audio data. It can add somewhat precise metadata, including undo info. The gain can be modified to any target dB, or it can be changed by a specified amount. For balance correction, user-specified changes can even be made on just one channel in simple L/R stereo-mode files (not joint stereo).

* Format: [[MP3]]
* Method: Audio + Meta (in APE tag), or Audio only
* APE tag fields (ASCII bytes):
** <code>MP3GAIN_MINMAX ###,###</code> - minimum & maximum global gain values for this file. 3 digits, zero-padded if necessary.
** <code>MP3GAIN_ALBUM_MINMAX ###,###</code> - minimum & maximum global gain values across a set of files scanned as an album. Optional.
** <code>MP3GAIN_UNDO +###,+###,N</code> - the global gain adjustment to restore the original values in the left and right channels, respectively, followed by an indicator of whether to wrap at the extremes (<code>N</code> means no, <code>W</code> means yes). The adjustment values are 3 digits, zero-padded, preceded by a sign (<code>+</code> or <code>-</code>).
** <code>REPLAYGAIN_TRACK_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Examples: <code>+0.424046</code> and <code>-10.38500</code>
** <code>REPLAYGAIN_TRACK_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Example: <code>0.149923</code>
** <code>REPLAYGAIN_ALBUM_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Optional.
** <code>REPLAYGAIN_ALBUM_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Optional.
* Limitations: Although the metadata, if written, contains precise adjustment and peak values, the audio data modifications are limited to 1.5 dB steps and may become irreversible (however, that's a very rare condition; see the [https://hydrogenaud.io/index.php/topic,34154.0.html "mp3gain is NOT lossless" forum thread]).
* http://mp3gain.sourceforge.net/
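
The field layouts listed above are simple enough to handle with a few lines of Python. The helpers below are a hypothetical sketch based only on the formats described in this section, not code taken from MP3Gain.

```python
def parse_mp3gain_undo(value):
    """Parse an MP3GAIN_UNDO value such as '+002,+002,N'.

    Returns (left_steps, right_steps, wrap), where wrap is True for 'W'.
    """
    left, right, wrap = value.strip().split(",")
    if wrap not in ("N", "W"):
        raise ValueError("unexpected wrap indicator: %r" % wrap)
    return int(left), int(right), wrap == "W"

def parse_mp3gain_minmax(value):
    """Parse an MP3GAIN_MINMAX value such as '089,200'."""
    minimum, maximum = value.strip().split(",")
    return int(minimum), int(maximum)

def format_gain_value(gain_db):
    """Format a gain as the 9-character value described above."""
    # Signed, six decimals, then cut to 9 characters (an assumption that
    # reproduces the two examples given in the list).
    return ("%+.6f" % gain_db)[:9]
```

With these helpers, <code>format_gain_value(-10.385)</code> yields <code>-10.38500</code>, matching the example in the list.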

=== AACGain ===
[[AACGain]] is a modified version of MP3Gain that works on both MP3 and AAC files.

* Format: [[MP3]], [[AAC]] (with or without MP4 container)
* Method: Audio + Meta, or Audio only
* Limitations: Limited to 1.5 dB steps; may become irreversible (same caveat as for MP3Gain)
* http://aacgain.altosdesign.com/

=== [[LAME]] ===
* Method: Header ([http://gabriel.mp3-tech.org/mp3infotag.html mp3infotag])
* Notes:
** Uses the classic RG algorithm.
** Tags added during encoding; not supported by any player yet; Track Gain only
** Replay Gaining MP3s is usually done using MP3Gain (see [[ReplayGain#MP3Gain|above]]) or [[ReplayGain#foobar2000 ReplayGain scanner|foobar2000]]
* http://lame.sourceforge.net/

=== [[Musepack]] ReplayGain ===
* Method: Header (similar to Meta data method)
* Notes: Uses the classic RG algorithm. ReplayGain values are stored in the header and ReplayGain is part of the Musepack specifications; therefore any Musepack decoder that does not support ReplayGain can be considered broken.
* http://www.musepack.net/

=== VorbisGain ===
* Format: (Ogg) [[Vorbis]]
* Method: Meta (in [[Vorbis comment]])
* Uses the classic RG algorithm.
* http://www.sjeng.org/vorbisgain.html
** newer builds of VorbisGain at [http://www.rarewares.org/ogg.html www.rarewares.org]
:'''''Note:''' Andavari has provided a very useful script to integrate VorbisGain, which is a CLI tool, into Windows Explorer. Please check [[Vorbis#ReplayGain|this section]] of the (Ogg) Vorbis page.''

=== FLAC / METAFLAC ===
* Format: [[Free Lossless Audio Codec|FLAC]]
* Method: Meta (in [[Vorbis comment]])
* Uses the classic RG algorithm.
* http://flac.sf.net

=== WavPack / WVGAIN ===
* Format: [[WavPack]]
* Method: Meta (in [[APEv2]] tag)
* Uses the classic RG algorithm.
* http://www.wavpack.com

=== Wavegain ===
* Format: waveform
* Method: Audio
* Uses the classic RG algorithm.
* Limitations: Irreversible
* http://www.rarewares.org/others.php#wavegain

=== MusicPlayer ===
* Custom implementation, inspired by MP3Gain.
* Format: any that FFmpeg supports
* Method: Audio
* Limitations: Doesn't modify the files at all. Stores the value in its own database. Used only for playback.
* https://github.com/albertz/music-player

=== [[foobar2000]] ReplayGain scanner ===
* Since v1.1.6, defaults to ITU-R BS.1770 analysis (although it labels it EBU R128), but can be configured to use the "Classic ReplayGain" algorithm instead. The ITU-R BS.1770 implementation uses a reference level of -18 LUFS instead of -23, in order to retain compatibility with the ReplayGain standard.
* Format:
** [[MP3]]: Values written to [[ID3v2]] (default) or [[APEv2]] tags.
** [[Musepack]]: Values written to header.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[Opus]]: Values written to [[Vorbis comment]] and/or file header.
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags. As with MP3, it is also an option to apply gain via a separate function.
** [[MP4]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain). Can write chosen Gain value to Apple's SoundCheck
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[TAK]]: Values written to [[APEv2]] tags.
** [[TTA]]: Values written to [[APEv2]] or [[ID3v2]] tags.
** [[WMA]]: Values written to WMA tags.
** Modules ([[MOD]] etc.): Optionally saved into [[APEv2]] tags.
** Any non-taggable format or format supported by FFmpeg can store RG values in a database or external tag with a component.
** A separate function can be invoked to apply the tagged Track or Album Gain to the global gain fields in MP3, MP4 (AAC), or Opus files, and rewrite any existing tags to account for the peak change and compensate for the difference from 89 dB. The 89 dB reference level for tags isn't configurable, but the reference level applied to the global gain fields is.
** Can automatically copy Track or Album gain values to Apple's SoundCheck tag in MP4 files or any format that supports ID3v2 to effectively add ReplayGain support to Apple's players.
* https://foobar2000.org/

=== [[MediaMonkey]] ===
* Format:
** [[MP3]]: Values written to [[APEv2]] or [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in MediaMonkey's MDB database.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[WAV]]: Values stored in MediaMonkey's MDB database.
** [[MPC]]: Internal gain Structure.
* In addition to tags, all ReplayGain values are also stored in MediaMonkey's MDB database
* Album/Audiophile ReplayGain not supported until v3.0 (Dec 2007); support during burning & ripping added in 3.1 (Jun 2009)
* Also capable of (irreversibly) changing the volume of MP3 tracks, similar to [[MP3Gain]]
* http://www.mediamonkey.com/

=== [[Winamp]] ReplayGain scanner===
* Format:
** [[MP3]]: Values written to [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in Windows Media Audio tags.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags.
** [[MP4]]
** [[TAK]]: Values written to [[APEv2]] tags.
* Supports Album and Track Gain

=== [[loudgain]] ===
* Format:
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** MP2, [[MP3]]: Values written to [[ID3v2]] tags (ID3v2.3/ID3v2.4 selectable).
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** (Ogg) [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** (Ogg) [[Speex]]: Values written to [[Vorbis comment]].
** [[Opus]]: Values written to [[Vorbis comment]], based on -23 LUFS Opus standard. Only <code>R128_TRACK_GAIN</code> and <code>R128_ALBUM_GAIN</code> are written, but the calculated ''true peak'' value can still be used to reduce the gain values ([[Clipping]] prevention).
** [[MP4]], [[M4A]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain). ReplayGain values are stored under <code>----:com.apple.iTunes:…</code>. This is for [[AAC]] and [[ALAC]] in [[MPEG-4]] containers.
** [[ASF]], [[Windows Media Audio|WMA]]: Values written to WMA tags, no prefix.
** [[WAV]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] (ID3v2.3/ID3v2.4 selectable) format. Using the <code>bext</code> chunk (for BWF v2) isn’t (yet) supported, but won’t be destroyed on writing.
** [[Audio Interchange File Format|AIFF]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] format.
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[Monkey's Audio]] (APE): Values written to [[APEv2]] tags.
* Follows EBU R128, ITU BS.1770 and the [[ReplayGain 2.0 specification|revised ReplayGain specification]].
* ''Never'' touches the actual audio data but ''only writes RG2 tags''.
* Uses ''true peak'' values calculated by oversampling to 192 kHz, using a custom polyphase FIR interpolator that will oversample 4x for sample rates < 96 kHz, 2x for sample rates < 192 kHz and leave the signal unchanged for 192 kHz.
* ''Clipping prevention'' can be used to lower the ReplayGain values to a safe margin (default -1 dBTP, can be changed).
* Many options for special cases: force RG tags upper-/lowercase, add extra tags (LRA, Reference loudness), strip unwanted tag types (APEv2 from MP2/MP3, ID3 from WavPack), tab-delimited table output for analysis with CSV file.
* Free and open-source software for ''Linux''; it can also be installed on ''macOS'' using ''Homebrew'' and on ''Windows 10'' using the Linux ''bash'' shell.
* Also installs a <code>rgbpm</code> bash script for mass-tagging, which can be adapted to the user’s needs.
* '''Warning:''' Loudgain relies on standard libraries like ''TagLib''. Linux distros (except rolling releases) sometimes ship outdated libraries, so be sure to use the latest version of ''TagLib''. For a while, version 1.11.1 had a nasty bug that [https://hydrogenaud.io/index.php/topic,118085.msg974957.html#msg974957 could corrupt Ogg Vorbis files]. This has since been fixed, but the TagLib version was not updated. Loudgain comes with a (slower) static version called <code>loudgain.static</code> in the repo’s <code>/bin</code> folder that is not affected by the bug and can also be used on older Linux versions (like Ubuntu 14.04, Linux Mint 17).
* https://github.com/Moonbase59/loudgain
* Bug tracker: https://github.com/Moonbase59/loudgain/issues
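
The oversampling rule and the clipping margin described in the list above can be sketched in Python. This is an illustration of the stated behaviour only, not loudgain's source code; both function names are hypothetical, and the way the margin is applied is an assumption.

```python
def true_peak_oversampling_factor(sample_rate_hz):
    """Oversampling factor for true-peak measurement, per the rule above."""
    if sample_rate_hz < 96000:
        return 4
    if sample_rate_hz < 192000:
        return 2
    return 1  # signal left unchanged

def limit_gain_for_true_peak(gain_db, true_peak_dbtp, max_true_peak_dbtp=-1.0):
    """Lower the gain so the adjusted true peak stays at or below the margin.

    Assumption: the gain is capped at (margin - measured true peak).
    """
    return min(gain_db, max_true_peak_dbtp - true_peak_dbtp)
```

For a 44.1 kHz file the factor is 4. A calculated gain of +3 dB on a track whose true peak is -2 dBTP would be reduced to +1 dB to respect the default -1 dBTP margin.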

=== [[rsgain]] ===
rsgain is a newer ReplayGain command-line utility designed with a "batteries included" philosophy. It uses the newer ITU-R BS.1770 algorithm.

Features:
* Cross-platform Windows / macOS / Linux
* Supports all popular audio formats
* Simplified "Easy Mode" command line syntax supports recursive, directory-based scanning
* Multithreaded scanning option that provides significant speed improvement with full library scans
* Option to skip files with existing ReplayGain metadata
* Scan presets allow the user to save advanced settings for consistent use

== Players support ==
Because ReplayGain is part of the specifications of the FLAC, Musepack, and APE formats, any player that supports those formats usually supports ReplayGain.

The situation with MP3 is rather different, as ReplayGain was not part of the MP3 specification. Storing the values in APEv2 tags is gradually becoming the de facto standard.

=== Windows ===
* [[foobar2000]] supports ReplayGain in all possible aspects.
* [[Winamp]] supports ReplayGain in album or track mode.
* [[MediaMonkey]] supports ReplayGain, with many configuration options.
* [[XMPlay]] recently implemented ReplayGain
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.

''...and probably others.''

=== Linux ===
* [[XMMS]]. Reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], [[Musepack]], (Ogg) [[Vorbis]], and others.
:For [[MP3]], use the CVS version of the [http://xmms-mad.sourceforge.net/ xmms-mad] MP3 plugin. It has not yet been released as a binary and is not available in distributions' packages for now; meanwhile, [http://perso.crans.org/~krempp/xmms-mad/ custom binaries] are available.
* [[amarok]], by using the amarok script [http://kde-apps.org/content/show.php?content=26073 ReplayGain].
:And possibly others: since [http://developer.kde.org/~wheeler/taglib.html TagLib] added support for [[APEv2]] tags in [[MP3]] files, players using this library (like [[amaroK]] and [[JuK]]) might support ReplayGain values stored in such tags in the near future.
* [http://www.sacredchao.net/quodlibet Quod Libet] reads ReplayGain from (Ogg) [[Vorbis]], [[MP3]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:Requires support to be enabled (via the appropriate python bindings and libraries) for the above formats. Does not support ReplayGain values stored in [[APEv2]] tags in [[MP3]]s. ReplayGain values are stored in RVA2 id3v2.4 frames. See the [http://www.sacredchao.net/quodlibet/wiki/Development/ID3Notes Quod Libet RVA2 / ReplayGain notes].
* [http://www.musicpd.org/ Music Player Daemon] (MPD) reads ReplayGain from (Ogg) [[Vorbis]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:foobar2000-style TXXX frames in [[MP3]]s are also supported in the latest development releases.
* [http://www.mplayerhq.hu/ MPlayer]. MPlayer support for ReplayGain is codec-dependent.
:Codecs that are known to support ReplayGain: vorbis
:Because of this, you need to prioritize the codecs that support it, or choose one individually on the command line. On the command line, add an <code>-ac [codec]</code> option after each file whose codec you want to choose, or at the beginning to make it apply to all files listed. To prioritize the codecs by default, list them in a line in <code>mplayer.conf</code>:
:<code>ac=[codec],[othercodec],vorbis,mad,</code>
* [http://idjc.sourceforge.net/ IDJC] (Internet DJ Console) reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], (Ogg) FLAC, (Ogg) [[Vorbis]], MP2 (audio), [[MP3]], [[Opus]], but only the ''lowercase'' tags. There is a [https://sourceforge.net/p/idjc/bugs/100/ ticket] open to handle tags case-insensitively.
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.
* [https://www.videolan.org/vlc/ VLC] supports ReplayGain in many file formats, but usually only the ''uppercase'' variant of the tags.
* [https://kodi.tv/ KODI] reads ReplayGain from nearly all formats, but usually only the ''lowercase'' variant of the tags.

=== Portable devices ===
[http://www.rockbox.org/ Rockbox] supports ReplayGain (in album or track mode) for most formats, including WMA, MP1/2/3, AAC, ALAC, Musepack, Monkey's Audio, WavPack, FLAC and Vorbis. Note that ReplayGain is only supported when using the respective codec's native tagging format. For example, ReplayGain stored in APEv2 tags is not supported for MP3; ID3v2.x tags are expected instead.

The SanDisk Sansa Fuze (with firmware 1.02.26 and 2.02.26) and the SanDisk Sansa Clip+ also support ReplayGain.

The iPod features ''Soundcheck'', which seems to produce roughly the same normalization gains as ReplayGain, but doesn't provide an Album Gain.

=== Hi-Fi ===
Slim Devices, a company owned by Logitech Inc, supports ReplayGain on both of its high-end audiophile players, known as the [[Slim Devices Transporter|Transporter]] and the [[Slim Devices Squeezebox|Squeezebox]].

BluOS also supports ReplayGain, offering a choice of album or track gain and a so-called Smart option that chooses between the two automatically.
NAD devices that use BluOS consequently also support ReplayGain.

==Notes==
<references/>

== See also ==
* [[ReplayGain specification]]

== External links ==
* [http://en.wikipedia.org/wiki/Replay_Gain ReplayGain] at Wikipedia
* [http://www.bobulous.org.uk/misc/Replay-Gain.html ReplayGain using foobar2000] (how to use ReplayGain in Windows using foobar2000).
* [http://www.bobulous.org.uk/misc/Replay-Gain-in-Linux.html ReplayGain in Linux] (how to use ReplayGain in Linux using foobar2000 and Wine, or using metaflac or vorbisgain).

[[Category:Technical]]
[[Category:Metadata]]

Revised ReplayGain specification

2026-01-22T21:42:32Z

Skamp: Proposed tag: REPLAYGAIN_ALGORITHM

''This is a proposed update to the [[ReplayGain 1.0 specification|original ReplayGain specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [https://hydrogenaudio.org/index.php/topic,129032.0.html dedicated thread on Hydrogenaudio].'' --[[User:Skamp|Skamp]] 22:10 CET, January 22, 2026

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system.

The [[ReplayGain 1.0 specification|original ReplayGain specification]] described a loudness measurement system which included a weighting filter, root mean square (RMS) measurement and statistical processing that model human perception of loudness in both the frequency and time domains.

Since the original ReplayGain proposal in 2001, the science, practice and standards for loudness normalization have advanced significantly. The current industry standard approach to loudness measurement is described by the International Telecommunication Union<ref>http://www.itu.int/en/Pages/default.aspx</ref> (ITU) as BS.1770. The most recent version of this standard is known as ITU BS.1770-5<ref>https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en</ref> and was published in December 2023. The ITU work is freely available and is not believed to be encumbered by any patent issues. The ITU BS.1770-2 standard has been adopted in the United States by the [http://www.atsc.org ATSC] as [http://www.atsc.org/cms/standards/a_85-2011a.pdf A/85] and in Europe by the [http://www.ebu.ch European Broadcasting Union] as [http://tech.ebu.ch/docs/tech/tech3343.pdf EBU R-128] for broadcast audio.

BS.1770 uses a "K-weighted" RMS measurement. This weighting function is significantly less complex than the inverted Fletcher-Munson weighting used by the original ReplayGain algorithm. A gating function is designed to measure the loudness of foreground components in the audio program. The gate in BS.1770 performs a function similar to that of the statistical processing in the original RG specification.

The computation required for BS.1770 loudness measurement is reduced compared to the original RG technique. Nevertheless, BS.1770 has been shown in several academic studies to be equally or more effective than the RG algorithm in modeling human loudness perception on music program as well as other material such as podcasts, television programs and movies.<ref>Paul Nygren. [http://www.speech.kth.se/prod/publications/files/3319.pdf Achieving equal loudness between audio files]. KTH Royal Institute of Technology</ref><ref>Martin Wolters; Harald Mundt; Jeffrey Riedmiller (May 2010). [http://www.aes.org/e-lib/browse.cfm?elib=15341 Loudness Normalization In The Age Of Portable Media Players]. Audio Engineering Society.</ref><ref>Esben Skovenborg; Søren H. Nielsen (October 2004). [http://web.archive.org/web/20120208024743/http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf Evaluation of Different Loudness Models with Music and Speech Material]. Audio Engineering Society. Archived from [http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf the original] on 2012-02-08.</ref>

Recent RG implementations use BS.1770 for loudness measurement. It is expected the ITU standard will evolve over time to meet the needs of broadcasters and governments. It is the intent of the ReplayGain community that RG follow any future backwards-compatible improvements to loudness measurement using the BS.1770 standard.

==Reference level==

Classic ReplayGain is calibrated to a pink noise reference signal with an RMS level 14 dB below a full-scale sinusoid. This reference signal is used to establish a reference level. ReplayGain will apply no gain or attenuation to the reference signal or any program material which has the same loudness measurements as the reference signal.

BS.1770 defines a loudness scale for program material. The units of BS.1770 loudness measurements are in Loudness Units [relative to] Full Scale (LUFS). LUFS can be treated like decibels.

In order to maintain backwards compatibility with classic RG, newer RG uses a -18 LUFS reference, which, across a large body of music, yields loudness similar to that of classic RG.


==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{r}-L</math>
Where:
:<math>RG</math> is the replay gain adjustment in decibels,
:<math>L_{r}</math> is the -18 LUFS reference level
:<math>L</math> is the measured loudness of the audio file in LUFS.

Gain is positive if the loudness of the audio file is lower than the reference level. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than the reference level. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[ReplayGain 2.0 specification#Player requirements|Player requirements]] below.
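
The formula above can be illustrated with a minimal sketch. The function name and the example loudness values are hypothetical; only the -18 LUFS reference and the relation <math>RG=L_{r}-L</math> come from the text.

```python
REFERENCE_LUFS = -18.0  # reference level L_r from this section


def replay_gain_db(measured_lufs: float, reference_lufs: float = REFERENCE_LUFS) -> float:
    """Return RG = L_r - L in decibels for a measured loudness L in LUFS."""
    return reference_lufs - measured_lufs


# A track louder than the reference gets a negative gain (attenuation).
loud_track_gain = replay_gain_db(-9.5)
# A track quieter than the reference gets a positive gain.
quiet_track_gain = replay_gain_db(-23.0)
print(loud_track_gain, quiet_track_gain)
```

Because LUFS can be treated like decibels, the result can be stored directly as the gain value in dB.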

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, whether the excursion is positive or negative. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
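
As a rough sketch of the peak scan for uncompressed audio, assuming signed integer PCM samples already read into a Python sequence (all channels interleaved) and assuming that full scale is taken as 2 to the power of (bits - 1); some implementations may use a slightly different divisor. The function names are hypothetical.

```python
def peak_amplitude(samples, bits_per_sample=16):
    """Return the largest absolute sample value, scaled so full scale is 1.0.

    `samples` holds signed integer PCM values from every channel.
    """
    full_scale = float(1 << (bits_per_sample - 1))  # 32768.0 for 16-bit audio
    largest = max((abs(s) for s in samples), default=0)
    return largest / full_scale


def album_peak(track_peaks):
    """The album peak is the largest of the per-track peaks."""
    return max(track_peaks, default=0.0)


print(peak_amplitude([16384, -8192, 120]))
```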

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. To measure peak amplitude, the coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom); the resulting peak amplitude values may be greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Audio Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments''', same for other Ogg codecs
* .opus (Ogg Opus) – '''Vorbis comments''' available
** standard {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} (MUST adjust for -23 LUFS) comment keys may be preferable (used by {{code|loudgain}})
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .asf, .wma (Windows Media audio) – '''Vorbis comments''' in Extended Content Description Object
** {{code|loudgain}} instead uses native ASF/WMA attributes (itself a key-value storage) via TagLib, which is more sensible
* .wav (Windows PCM) – No metadata support (use .bwf instead)
** ID3 RIFF chunk possible (used by {{code|loudgain}})
* .wv (WavPack) – '''APEv2'''
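
The -23 LUFS adjustment mentioned for .opus can be sketched as follows. This assumes the Q7.8 fixed-point encoding that RFC 7845 defines for {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} (the gain in dB multiplied by 256, stored as a signed 16-bit integer). The function name is hypothetical, and any interaction with the output gain field of the Opus header is ignored here.

```python
RG_REFERENCE_LUFS = -18.0    # reference used by this specification
R128_REFERENCE_LUFS = -23.0  # reference expected by the R128_* comment keys


def replaygain_to_r128(gain_db: float) -> int:
    """Convert a -18 LUFS based gain in dB to a Q7.8 integer for R128_* tags."""
    shifted_db = gain_db + (R128_REFERENCE_LUFS - RG_REFERENCE_LUFS)  # 5 dB lower
    q78 = round(shifted_db * 256)
    # Clamp to the signed 16-bit range of a Q7.8 value.
    return max(-32768, min(32767, q78))


print(replaygain_to_r128(-3.0))
```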

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain legacy metadata formats#ID3v2 RGAD|RGAD]] and [[ReplayGain legacy metadata formats#ID3v2 RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>
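
The frame layout above can be assembled as in the following sketch. It assumes the size field is a plain 32-bit big-endian integer, as in ID3v2.3 (ID3v2.4 uses a synchsafe integer instead), and it builds a single frame only, not the enclosing ID3v2 tag. The function name is hypothetical; the key used in the example is one of the ReplayGain keys defined in this section.

```python
import struct


def build_txxx_frame(key: str, value: str) -> bytes:
    """Build one TXXX frame: 10-byte header followed by the frame data."""
    body = (
        b"\x00"                      # text encoding: ISO-8859-1
        + key.encode("latin-1")      # description (the key string)
        + b"\x00"                    # terminator after the description
        + value.encode("latin-1")    # value string
    )
    header = (
        b"TXXX"
        + struct.pack(">I", len(body))  # size of frame excluding the header
        + b"\x40\x00"                   # flags: discard if audio data is altered
    )
    return header + body


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.50 dB")
print(frame)
```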

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
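
A scanner might produce these strings as in the sketch below. The function names are hypothetical, and the handling of values that round to negative zero is an assumption made here so that "-0.00 dB" is never written.

```python
def format_gain(gain_db: float) -> str:
    """Format a gain as '[-]a.bb dB': two decimals, no '+' on positive gains."""
    rounded = round(gain_db, 2) + 0.0  # adding 0.0 turns -0.0 into 0.0
    return f"{rounded:.2f} dB"


def format_peak(peak: float) -> str:
    """Format a peak as 'c.dddddd': six decimals, no suffix."""
    return f"{peak:.6f}"


print(format_gain(-6.5), format_gain(2.0), format_peak(0.987654321))
```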

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
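
One way a player could tolerate the variations listed above is sketched here. The regular expressions and function names are illustrative assumptions, not part of the specification; anything that does not match is reported as <code>None</code> so the caller can behave as if the value were absent.

```python
import re

# Optional sign, digits with optional fraction, optional 'dB' in any case,
# with or without a separating space.
_GAIN_RE = re.compile(r"^\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))\s*(?:dB)?\s*$", re.IGNORECASE)
_PEAK_RE = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*$")


def parse_gain(text: str):
    """Return the gain in dB, or None if the text is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text: str):
    """Return the peak as a non-negative float, or None if malformed."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


def find_value(tags: dict, key: str):
    """Look up a key regardless of capitalization."""
    upper = {k.upper(): v for k, v in tags.items()}
    return upper.get(key.upper())


print(parse_gain("+3.5dB"), parse_gain(" -06.50 DB"), parse_peak("0.98"))
```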

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equal character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as for ID3v2, with the values stored as ASCII text in the same format as specified for ID3v2.

===De-Facto extensions===
MusicBrainz Picard and [http://github.com/Moonbase59/loudgain#tags-written-andor-deleted LoudGain] also support the following additional tags, named using the same conventions:

{| class="wikitable"
|+Extension metadata keys
|-
!Metadata
!Key
!Value format
!Purpose
|-
|Track range
|REPLAYGAIN_TRACK_RANGE
|[-]a.bb dB
|rowspan=2 | Indicates dynamics (R-128 Loudness Range / LRA), may guide pre-amplification
|-
|Album range
|REPLAYGAIN_ALBUM_RANGE
|[-]a.bb dB
|-
|Reference loudness
|REPLAYGAIN_REFERENCE_LOUDNESS
|[-]a.bb LUFS
| Use alternative reference levels; change ref levels without re-scanning the file
|}

==== Proposed extension(s) ====

{| class="wikitable"
|+Extension metadata keys
|-
!Metadata
!Key
!Value format
!Purpose
|-
|Algorithm
|REPLAYGAIN_ALGORITHM
|ITU-R BS.1770
|Method to produce values
|}

==Player requirements==
[[File:RG_Player_control.gif|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard—not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>
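
A minimal sketch of this scaling, assuming floating-point samples held in a Python list; the function names are hypothetical, and dithering and conversion back to integer output are left out.

```python
def gain_to_scale(gain_db: float) -> float:
    """Convert a gain in decibels to a linear scale factor, 10^(gain/20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db: float):
    """Multiply every sample by the constant derived from the gain."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]


# A gain of -6 dB roughly halves the amplitude.
print(gain_to_scale(-6.0))
print(apply_gain([0.5, -0.25, 1.0], -6.0))
```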

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
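
The fallback rule can be sketched as follows, with <code>None</code> standing for missing metadata. The function name and the choice of unity gain as the default are assumptions for illustration; a player could equally supply an average of other gains as the default.

```python
def select_gain(track_gain, album_gain, mode="track", default_db=0.0):
    """Pick the gain to apply, falling back to whichever value is available.

    `mode` is "track" or "album"; a gain of None means it is not stored.
    """
    if mode == "album":
        preferred, fallback = album_gain, track_gain
    else:
        preferred, fallback = track_gain, album_gain
    if preferred is not None:
        return preferred
    if fallback is not None:
        return fallback
    return default_db


# Track mode requested, but only an album gain is stored.
print(select_gain(None, -7.25, mode="track"))
```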

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[ReplayGain 2.0 specification#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^\frac{RG + G_\text{pre-amp}}{20}, \frac{1}{\text{peak amplitude}} \right)</math>
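
The combined scale factor might be computed as in this sketch. The function name is hypothetical; the default peak of 1.0 follows the rule above for missing peak metadata, and treating a zero peak (digital silence) as imposing no limit is an assumption made here.

```python
def playback_scale(rg_db: float, preamp_db: float = 0.0, peak: float = 1.0) -> float:
    """Return min(10^((RG + pre-amp) / 20), 1 / peak amplitude)."""
    wanted = 10.0 ** ((rg_db + preamp_db) / 20.0)
    if peak <= 0.0:
        return wanted  # silent track: nothing can clip
    return min(wanted, 1.0 / peak)


# +6 dB requested on a track that already peaks at full scale:
# the scale is held at 1.0 so the peaks do not clip.
print(playback_scale(6.0, 0.0, 1.0))
# The same request on a track peaking at 0.25 fits without reduction.
print(playback_scale(3.0, 3.0, 0.25))
```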


===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Revised ReplayGain specification

2026-01-22T21:12:21Z

Skamp: changed the disclaimer to my name, and linked to the updated HA thread


''This is a proposed update to the [[ReplayGain 1.0 specification|original ReplayGain specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [https://hydrogenaudio.org/index.php/topic,129032.0.html dedicated thread on Hydrogenaudio].'' --[[User:Skamp|Skamp]] 22:10 CET, January 22, 2026

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system.

The [[ReplayGain 1.0 specification|original ReplayGain specification]] described a loudness measurement system which included a weighting filter, root mean square (RMS) measurement and statistical processing that model human perception of loudness in both the frequency and time domains.

Since the original ReplayGain proposal in 2001, the science, practice and standards for loudness normalization have advanced significantly. The current industry standard approach to loudness measurement is described by the International Telecommunication Union<ref>http://www.itu.int/en/Pages/default.aspx</ref> (ITU) as BS.1770. The most recent version of this standard is known as ITU BS.1770-5<ref>https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en</ref> and was published in December 2023. The ITU work is freely available and is not believed to be encumbered by any patent issues. The ITU BS.1770-2 standard has been adopted in the United States by the [http://www.atsc.org ATSC] as [http://www.atsc.org/cms/standards/a_85-2011a.pdf A/85] and in Europe by the [http://www.ebu.ch European Broadcasting Union] as [http://tech.ebu.ch/docs/tech/tech3343.pdf EBU R-128] for broadcast audio.

BS.1770 uses a "K-weighted" RMS measurement. This weighting function is significantly less complex than the inverted Fletcher-Munson weighting used by the original ReplayGain algorithm. A gating function is designed to measure the loudness of foreground components in the audio program. The gate in BS.1770 performs a function similar to that of the statistical processing in the original RG specification.

The computation required for BS.1770 loudness measurement is reduced compared to the original RG technique. Nevertheless, BS.1770 has been shown in several academic studies to be equally or more effective than the RG algorithm in modeling human loudness perception on music program as well as other material such as podcasts, television programs and movies.<ref>Paul Nygren. [http://www.speech.kth.se/prod/publications/files/3319.pdf Achieving equal loudness between audio files]. KTH Royal Institute of Technology</ref><ref>Martin Wolters; Harald Mundt; Jeffrey Riedmiller (May 2010). [http://www.aes.org/e-lib/browse.cfm?elib=15341 Loudness Normalization In The Age Of Portable Media Players]. Audio Engineering Society.</ref><ref>Esben Skovenborg; Søren H. Nielsen (October 2004). [http://web.archive.org/web/20120208024743/http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf Evaluation of Different Loudness Models with Music and Speech Material]. Audio Engineering Society. Archived from [http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf the original] on 2012-02-08.</ref>

Recent RG implementations use BS.1770 for loudness measurement. It is expected the ITU standard will evolve over time to meet the needs of broadcasters and governments. It is the intent of the ReplayGain community that RG follow any future backwards-compatible improvements to loudness measurement using the BS.1770 standard.

==Reference level==

Classic ReplayGain is calibrated to a pink noise reference signal with an RMS level 14 dB below a full-scale sinusoid. This reference signal is used to establish a reference level. ReplayGain will apply no gain or attenuation to the reference signal or any program material which has the same loudness measurements as the reference signal.

BS.1770 defines a loudness scale for program material. The units of BS.1770 loudness measurements are in Loudness Units [relative to] Full Scale (LUFS). LUFS can be treated like decibels.

In order to maintain backwards compatibility with classic RG, newer RG uses a -18 LUFS reference, which, across a large body of music, yields loudness similar to that of classic RG.


==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{r}-L</math>
Where:
:<math>RG</math> is the replay gain adjustment in decibels,
:<math>L_{r}</math> is the -18 LUFS reference level
:<math>L</math> is the measured loudness of the audio file in LUFS.

Gain is positive if the loudness of the audio file is lower than the reference level. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than the reference level. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[ReplayGain 2.0 specification#Player requirements|Player requirements]] below.

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.
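For the uncompressed case described above, the peak scan might look like the following sketch. It assumes signed integer PCM held as one Python list per channel; the function name and the choice of 2<sup>bits-1</sup> as full scale are assumptions of the example, not something the specification prescribes.

```python
def peak_amplitude(channels, bits_per_sample=16):
    """Largest absolute sample on any channel, scaled so full scale is 1.0.

    `channels` is a list of per-channel lists of signed integer samples
    (an assumed layout chosen for this illustration).
    """
    full_scale = float(1 << (bits_per_sample - 1))  # 32768 for 16-bit audio
    largest = max((abs(s) for ch in channels for s in ch), default=0)
    return largest / full_scale


left = [0, 16384, -8192]
right = [100, -32768, 5]
track_peak = peak_amplitude([left, right])  # the -32768 sample is full scale
```

For decoded lossy audio the same scan would run on the decoder's floating-point output, where values above 1.0 are possible.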

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments''', same for other Ogg codecs
* .opus (Ogg Opus) – '''Vorbis comments''' available
** standard {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} (MUST adjust for -23 LUFS) comment keys may be preferable (used by {{code|loudgain}})
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .asf, .wma (Windows Media audio) – '''Vorbis comments''' in Extended Content Description Object
** {{code|loudgain}} instead uses native ASF/WMA attributes (itself a key-value storage) via TagLib, which is more sensible
* .wav (Windows PCM) – No metadata support (use .bwf instead)
** ID3 RIFF chunk possible (used by {{code|loudgain}})
* .wv (WavPack) – '''APEv2'''
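The -23 LUFS adjustment mentioned for the Opus {{code|R128_*}} keys can be illustrated as follows. This is a sketch under stated assumptions: that the ReplayGain value was computed against the -18 LUFS reference, and that the R128 keys hold a Q7.8 fixed-point value written as a decimal integer, as defined for Ogg Opus in RFC 7845. The function name is hypothetical.

```python
RG_REFERENCE_LUFS = -18.0    # ReplayGain 2.0 reference
R128_REFERENCE_LUFS = -23.0  # reference implied by the R128_* comment keys


def replaygain_to_r128(gain_db: float) -> str:
    """Convert a -18 LUFS ReplayGain value to an R128_*_GAIN string (Q7.8)."""
    # Moving the target from -18 to -23 LUFS lowers the gain by 5 dB.
    r128_db = gain_db + (R128_REFERENCE_LUFS - RG_REFERENCE_LUFS)
    q78 = round(r128_db * 256)           # Q7.8: 1/256 dB steps
    q78 = max(-32768, min(32767, q78))   # clamp to signed 16-bit range
    return str(q78)


r128_value = replaygain_to_r128(-8.5)  # -13.5 dB expressed in 1/256 dB steps
```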

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is the use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain legacy metadata formats#ID3v2 RGAD|RGAD]] and [[ReplayGain legacy metadata formats#ID3v2 RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00      (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
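As an illustration of the layout and value formatting described above, the following sketch formats a gain and a peak and assembles one TXXX frame. It assumes the ID3v2.3 convention of a plain big-endian 32-bit size field (ID3v2.4 uses a synchsafe integer instead); all function names are hypothetical and no tagging library is involved.

```python
import struct


def format_gain(gain_db: float) -> str:
    """'[-]a.bb dB': two decimals, a space, then 'dB'."""
    return f"{gain_db:.2f} dB"


def format_peak(peak: float) -> str:
    """'c.dddddd': six decimals, no suffix."""
    return f"{peak:.6f}"


def txxx_frame(key: str, value: str) -> bytes:
    """Build one TXXX frame (ID3v2.3-style size field assumed)."""
    # Frame data: encoding byte $00, key, $00 terminator, value.
    body = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the data that follows, flags $40 $00.
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body


frame = txxx_frame("REPLAYGAIN_TRACK_GAIN", format_gain(-6.5))
```

A real scanner would normally delegate this to a tagging library, which also handles the enclosing tag header and padding.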

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
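A tolerant reader along the lines listed above might be sketched as follows. The regular expressions and function names are assumptions of this example; they accept a leading '+', extra or missing digits, leading zeros, surrounding spaces, a missing or differently capitalized 'dB' suffix and differently capitalized keys, and return <code>None</code> for anything else so the caller can treat the value as absent.

```python
import re

# Optional sign, digits with optional fraction, optional 'dB' in any case.
_GAIN_RE = re.compile(r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*(?:db)?\s*$", re.IGNORECASE)
# Peaks are unsigned and carry no suffix.
_PEAK_RE = re.compile(r"^\s*(\d+\.?\d*|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB, or None if the value is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak (1.0 = full scale), or None if malformed."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


def lookup(tags, key):
    """Case-insensitive key lookup in a dict of tag strings."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None


tags = {"replaygain_track_gain": "+3.1dB", "REPLAYGAIN_TRACK_PEAK": "0.988312"}
gain = parse_gain(lookup(tags, "REPLAYGAIN_TRACK_GAIN"))
```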

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and value formatting as specified for ID3v2.

===De-Facto extensions===
MusicBrainz Picard and [http://github.com/Moonbase59/loudgain#tags-written-andor-deleted LoudGain] also support the following additional tags, named using the same conventions:

{| class="wikitable"
|+Extension metadata keys
|-
!Metadata
!Key
!Value format
! Purpose
|-
|Track range
|REPLAYGAIN_TRACK_RANGE
|[-]a.bb dB
|rowspan=2 | Indicates dynamics (R-128 Loudness Range / LRA), may guide pre-amplification
|-
|Album range
|REPLAYGAIN_ALBUM_RANGE
|[-]a.bb dB
|-
|Reference loudness
|REPLAYGAIN_REFERENCE_LOUDNESS
|[-]a.bb LUFS
| Use alternative reference levels; change ref levels without re-scanning the file
|}

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain (in decibels) divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
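The selection and scaling rules of this section can be sketched as below. The function names and the use of <code>None</code> for a missing value are assumptions of the example, and unity gain is used as the default only because it is one of the choices mentioned above.

```python
def select_gain_db(track_gain, album_gain, mode="track", default_db=0.0):
    """Pick the requested gain, fall back to the other, then to a default."""
    if mode == "album":
        preferred, fallback = album_gain, track_gain
    else:
        preferred, fallback = track_gain, album_gain
    if preferred is not None:
        return preferred
    if fallback is not None:
        return fallback
    return default_db


def scale_factor(gain_db):
    """Linear multiplier 10^(gain/20) applied to every sample."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Scale floating-point samples by the gain (no dither, no clipping check)."""
    k = scale_factor(gain_db)
    return [s * k for s in samples]


# Track mode was requested, but only album gain is present in the file.
gain = select_gain_db(track_gain=None, album_gain=-7.2, mode="track")
scaled = apply_gain([0.5, -0.25], gain)
```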

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[ReplayGain 2.0 specification#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in Decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
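The +6 dB example and the combination of the two factors can be checked numerically. In this small sketch the helper name and the -8.5 dB replay gain are illustrative assumptions.

```python
import math


def db_to_linear(db):
    """Convert a gain in decibels to a linear sample multiplier."""
    return 10.0 ** (db / 20.0)


replay_gain_db = -8.5  # example value, not taken from a real file
pre_amp_db = 6.0

# Adding the decibel values first ...
combined = db_to_linear(replay_gain_db + pre_amp_db)
# ... is equivalent to multiplying the two linear factors.
separate = db_to_linear(replay_gain_db) * db_to_linear(pre_amp_db)

six_db = db_to_linear(6.0)  # close to, but not exactly, 2
```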

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^\frac{RG + G_\text{pre-amp}}{20}, \frac{1}{\text{peak amplitude}} \right)</math>
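The formula above translates directly into code. The sketch below assumes the peak is given as a linear value with 1.0 as full scale, and follows the text in assuming a peak of 1.0 when no peak metadata is available; the function name is hypothetical.

```python
def playback_scale(rg_db, pre_amp_db=0.0, peak=1.0):
    """Overall linear scale: min(10^((RG + pre-amp)/20), 1/peak)."""
    wanted = 10.0 ** ((rg_db + pre_amp_db) / 20.0)
    if peak <= 0.0:
        # A silent track cannot clip, so no limit applies.
        return wanted
    return min(wanted, 1.0 / peak)


# Quiet, dynamic track: +6 dB replay gain, +3 dB pre-amp, peak at half scale.
# The requested scale (about 2.82) would clip, so it is reduced to 1/0.5 = 2.
limited = playback_scale(6.0, 3.0, peak=0.5)

# Loud track: attenuation never needs limiting when the peak is at most 1.0.
unlimited = playback_scale(-8.5)
```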


===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Original ReplayGain specification

2026-01-22T20:27:50Z

Skamp:

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
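The cascade can be sketched in plain Python as follows. Figure 2 is not reproduced in this text, so the sketch assumes the feedback terms ''a(k)'' are added to the output sum; this is the sign convention under which the tabulated Butterworth coefficients give a stable high-pass filter. Only the 48 kHz Butterworth coefficients from Table 1a are embedded, and the "yulewalk" coefficients from Table 2 would be passed in the same way. Function names are hypothetical.

```python
import math

# Table 1a (Fs = 48 kHz): b(0..2) and a(1..2)
BUTTER_48K_B = [0.98621192462708, -1.97242384925416, 0.98621192462708]
BUTTER_48K_A = [1.97223372919527, -0.97261396931306]


def iir_filter(x, b, a):
    """y[n] = sum b(k) x[n-k] + sum a(k) y[n-k], with a = [a(1), a(2), ...].

    Note the added feedback terms (assumed sign convention, see lead-in).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * y[n - k]
        y.append(acc)
    return y


def loudness_filter(x, yule_b, yule_a, butter_b=BUTTER_48K_B, butter_a=BUTTER_48K_A):
    """Run samples through the "yulewalk" stage, then the Butterworth stage."""
    return iir_filter(iir_filter(x, yule_b, yule_a), butter_b, butter_a)


# The high-pass stage alone: a constant (DC) input decays towards zero,
# while a 1 kHz sine passes almost unchanged.
dc_out = iir_filter([1.0] * 4000, BUTTER_48K_B, BUTTER_48K_A)
sine = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]
sine_out = iir_filter(sine, BUTTER_48K_B, BUTTER_48K_A)
```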
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
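The block-wise calculation can be sketched as follows. It assumes floating-point samples in the range -1.0 to 1.0 (so the peak-to-peak range is 2.0), one Python list per channel, and that an incomplete final block is simply dropped; these choices, like the function name, are assumptions of the example.

```python
import math


def block_levels_db(channels, sample_rate, peak_to_peak=2.0, block_seconds=0.050):
    """Return one level in dB per 50 ms block: 20*log10(2*RMS / L_p-p)."""
    block = int(round(sample_rate * block_seconds))
    length = min(len(ch) for ch in channels)
    levels = []
    for start in range(0, length - block + 1, block):
        # Square every sample and average over all channels together.
        total = 0.0
        for ch in channels:
            total += sum(s * s for s in ch[start:start + block])
        rms = math.sqrt(total / (block * len(channels)))
        if rms > 0.0:
            levels.append(20.0 * math.log10(2.0 * rms / peak_to_peak))
        else:
            levels.append(float("-inf"))  # digital silence
    return levels


# 0.1 s of a full-scale 1 kHz sine at 48 kHz, identical on both channels.
sine = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]
levels = block_levels_db([sine, sine], 48000)
```

With this normalization a full-scale sine measures about -3 dB and a full-scale square wave 0 dB.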

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of loudness is 95%,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
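The sort-and-pick step can be sketched as below. The exact index rule (how the 95% position is rounded) is an assumption of this example, and actual implementations may use a histogram rather than a full sort.

```python
def overall_loudness_db(levels_db, percent=95):
    """Sort the block levels and return the value at the given percentile."""
    if not levels_db:
        raise ValueError("no block levels to analyse")
    ordered = sorted(levels_db)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) * percent // 100, len(ordered) - 1)
    return ordered[index]


# Speech-like material: 60% of the blocks are near silence, 40% carry speech.
speech_levels = [-70.0] * 60 + [-20.0] * 40
speech_loudness = overall_loudness_db(speech_levels)
```

For the speech-like example the result is the level of the spoken blocks, not an average dragged down by the silences.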

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[Original ReplayGain specification#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to -20 dB system headroom. The -14 dB headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE-calibrated system, and ReplayGain is therefore said to operate with an 89 dB reference level.
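As a worked check of these figures, assuming the playback SPL tracks the reference level decibel for decibel, raising the reference from -20 dB to -14 dB raises the calibrated playback level by the same 6 dB:

:<math>83\ \mathrm{dB\ SPL} + \left( -14 - (-20) \right)\ \mathrm{dB} = 89\ \mathrm{dB\ SPL}</math>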

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[Original ReplayGain specification#Player requirements|Player requirements]] below.

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
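
The peak scan for uncompressed audio can be sketched as follows. This is a minimal illustration, not part of the specification: the function name <code>peak_amplitude</code>, the sample list, and the choice of 2<sup>bits-1</sup> as the full-scale divisor are assumptions made for the example.

```python
def peak_amplitude(samples, bits_per_sample=16):
    """Return the peak of integer PCM samples as a fraction of full scale.

    Assumption: full scale is taken as 2**(bits - 1), i.e. 32768 for 16-bit audio.
    """
    full_scale = float(1 << (bits_per_sample - 1))
    if not samples:
        return 0.0
    # Positive and negative excursions are treated alike.
    return max(abs(s) for s in samples) / full_scale


# Hypothetical data: interleaved samples from all channels in one sequence.
track = [0, 1200, -16384, 8000, 16000]
print(f"{peak_amplitude(track):.6f}")  # prints 0.500000
```

An album peak would then simply be the largest of the track peaks.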

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4, also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – '''Vorbis comments''' in Extended Content Description Object
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain legacy metadata formats#ID3v2 RGAD|RGAD]] and [[ReplayGain legacy metadata formats#ID3v2 RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

Frame ID $54 58 58 58 ("TXXX")
Size $xx xx xx xx (size of frame excluding this header)
Flags $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

Text encoding $00 (ISO-8859-1 encoding)
Description <key string> $00
Value <value string>
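
As an illustration of the layout above, the following sketch assembles one such frame as raw bytes. It assumes ID3v2.3 conventions (a plain 32-bit big-endian size field; ID3v2.4 uses a synchsafe integer instead), and the function name <code>build_txxx_frame</code> is hypothetical. It builds a single frame only, not a complete tag.

```python
import struct


def build_txxx_frame(key, value):
    """Build one TXXX frame (ID3v2.3-style header assumed) as bytes."""
    # Frame data: encoding byte $00, key string, $00 terminator, value string.
    data = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the frame excluding the header, two flag bytes.
    header = b"TXXX" + struct.pack(">I", len(data)) + b"\x40\x00"
    return header + data


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.50 dB")
print(frame.hex(" "))
```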

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
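
A scanner could produce these value strings roughly as shown here. The helper names <code>format_gain</code> and <code>format_peak</code> are hypothetical, and the handling of a rounded "-0.00" is a choice made for this sketch rather than something the text prescribes.

```python
def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB': two decimals, no '+' on positive values."""
    text = f"{gain_db:.2f}"
    if text == "-0.00":  # avoid a negative zero after rounding (sketch's choice)
        text = "0.00"
    return text + " dB"


def format_peak(peak):
    """Format a peak as 'c.dddddd': six decimals, no suffix."""
    return f"{peak:.6f}"


print(format_gain(-6.5))        # -6.50 dB
print(format_gain(3.127))       # 3.13 dB
print(format_peak(0.98765432))  # 0.987654
```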

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
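
One way a player might tolerate the variations listed above is sketched below. The regular expressions and the names <code>parse_gain</code>, <code>parse_peak</code> and <code>find_tag</code> are assumptions for illustration. Anything that does not match is reported as <code>None</code>, so the caller can treat the value as absent.

```python
import re

# Optional sign, digits with optional decimal part, optional 'dB' in any case.
_GAIN_RE = re.compile(r"^\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*(?:db)?\s*$", re.IGNORECASE)
# Peaks are positive decimals with no suffix.
_PEAK_RE = re.compile(r"^\s*(\d+\.?\d*|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB as a float, or None if the text is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak as a float, or None if the text is malformed."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


def find_tag(tags, key):
    """Look up a key in a dict of tags, ignoring capitalization of keys."""
    for name, value in tags.items():
        if name.upper() == key.upper():
            return value
    return None


tags = {"replaygain_track_gain": "+3.2dB", "REPLAYGAIN_TRACK_PEAK": " 0.9876 "}
print(parse_gain(find_tag(tags, "REPLAYGAIN_TRACK_GAIN")))  # 3.2
print(parse_peak(find_tag(tags, "REPLAYGAIN_TRACK_PEAK")))  # 0.9876
```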

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' is the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equal character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys and data as ASCII values in the same format as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of one-twentieth of replay gain.<ref> After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard—not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>
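
The scaling step can be sketched as follows. Floating-point samples in the range -1.0 to 1.0 are assumed, the function names are hypothetical, and dithering and conversion back to integer output are left out.

```python
def gain_to_scale(gain_db):
    """Convert a gain in dB to a linear multiplier: 10 ** (gain / 20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Multiply every sample by the constant derived from the gain."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]


print(round(gain_to_scale(-6.0), 4))  # 0.5012
print(apply_gain([0.5, -0.25], 0.0))  # [0.5, -0.25]
```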

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
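
The fallback rule described above might look like this in a player. The function <code>select_gain</code> and its parameters are hypothetical; <code>None</code> stands for a missing metadata value, and unity gain (0 dB) is used as the default, which is only one of the choices the text allows.

```python
def select_gain(mode, track_gain=None, album_gain=None, default_gain=0.0):
    """Pick the gain (in dB) for the requested mode, falling back as needed.

    mode is "track" or "album"; a gain of None means the tag is missing.
    """
    if mode == "track":
        preferred, fallback = track_gain, album_gain
    else:
        preferred, fallback = album_gain, track_gain
    if preferred is not None:
        return preferred
    if fallback is not None:
        return fallback
    return default_gain


print(select_gain("track", track_gain=None, album_gain=-7.2))  # -7.2
print(select_gain("album"))                                    # 0.0
```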

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[Original ReplayGain specification#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in Decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
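
The formula above translates directly into code. In this sketch the function name <code>playback_scale</code> is hypothetical, the peak defaults to 1.0 when metadata is unavailable (as suggested earlier), and a peak of zero is treated as a silent track that needs no reduction, which is an assumption of the example.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=1.0):
    """Linear scale factor with clipping prevention through gain reduction."""
    # Gains in dB add; the sum is converted to a linear multiplier.
    wanted = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if peak <= 0.0:
        return wanted  # silent track: nothing can clip (assumption)
    # Never scale the stored peak past digital full scale (1.0).
    return min(wanted, 1.0 / peak)


print(round(playback_scale(-6.0, 0.0, 1.0), 4))  # 0.5012, no reduction needed
print(round(playback_scale(6.0, 0.0, 0.9), 4))   # 1.1111, limited by the peak
```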

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal to noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

== See also ==
: ''This is not a normative part of the specification.''
* [[Revised ReplayGain specification]] (draft)

ReplayGain

2026-01-22T20:24:53Z

Skamp: More clarification about mentions of RG1 and RG2, online and elsewhere.

'''ReplayGain''' is the name of a technique invented to achieve the same perceived playback loudness of audio files. It defines an algorithm to measure the '''perceived''' loudness of audio data.

ReplayGain allows the loudness of each song within a collection of songs to be consistent. This is called 'Track Gain' (or 'Radio Gain' in earlier parlance). It also allows the loudness of a specific sub-collection (an "album") to be consistent with the rest of the collection, while allowing the dynamics from song to song on the album to remain intact. This is called 'Album Gain' (or 'Audiophile Gain' in earlier parlance). This is especially important when listening to classical music albums, because quiet tracks need to remain a certain degree quieter than the louder ones.

ReplayGain is different from [[Normalization|peak normalization]]. Peak normalization merely ensures that the peak amplitude reaches a certain level. This does not ensure equal loudness. The ReplayGain technique measures the ''effective power'' of the waveform (i.e. the RMS power after applying an "equal loudness contour"), and then adjusts the amplitude of the waveform accordingly. The result is that Replay Gained waveforms are usually more uniformly amplified than peak-normalized waveforms.

==Target loudness==
The target loudness of almost all ReplayGain utilities is 89 dB SPL when replayed in an SMPTE RP 200 calibrated system (an early departure from the proposal, endorsed by its author<ref>[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=83397&view=findpost&p=721854 Does Replay gain work differtly in Media monkey]</ref>) — the ReplayGain proposal and SMPTE recommendation are 6 dB lower.<ref>[http://www.mars.org/mailman/public/mad-dev/2004-February/000993.html ReplayGain discussion at mad-dev]</ref> The target loudness may be more commonly known and understood as '''-18''' '''[https://en.wikipedia.org/wiki/LUFS LUFS]''' (''Loudness Units relative to Full Scale'').

Some utilities have realized the inadequacies of the classic ReplayGain loudness calculation, switching to a more modern algorithm ([https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en ITU-R BS.1770]). However, the way it was integrated was extremely ''ad hoc'', at least until a draft of a [[ReplayGain 2.0 specification|revised specification]] started being written.

==Clipping==
Audio is generally recorded such that the loudest sounds don't clip, but the use of ReplayGain can cause clipping if the average volume of a song is below the target level. That is, upon playback, the volume of a quiet song is increased, so the parts of the song with above-average loudness, especially in the bass frequencies, will exceed the limits of the format and will be distorted. Whether this distortion is audible depends on the sounds in question, and the listener's sensitivity.

Implementations deal with the risk of clipping in different ways. Some have a "pre-amp" feature which reduces (or boosts) the original audio's level by a certain amount before doing whatever is needed for ReplayGain. Some have a "prevent clipping" feature to reduce the amount of ReplayGain adjustment to whatever amount would keep clipping from occurring, based on peak info stored in the file's metadata (thus reducing the effectiveness of ReplayGain). Some recommend using a compressor/limiter DSP to prevent or reduce clipping, regardless of whether it was caused by ReplayGain.

An alternative that may reduce the risk of clipping is the [https://tech.ebu.ch/docs/r/r128.pdf EBU R 128] recommendation of a '''-23''' '''LUFS''' target, although some may find the additional reduction in volume excessive, particularly if it leads to maxing out volume on user hardware. [[Opus]] in particular has adopted that recommendation.
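
If the same loudness measurement is used for both targets, moving a gain value from one target to the other is a fixed offset. The sketch below illustrates that arithmetic only; the function name <code>retarget_gain</code> is hypothetical, and it does not apply to gains computed with different algorithms.

```python
RG_TARGET_LUFS = -18.0    # ReplayGain target as described above
R128_TARGET_LUFS = -23.0  # EBU R 128 target


def retarget_gain(gain_db, from_target=RG_TARGET_LUFS, to_target=R128_TARGET_LUFS):
    """Shift a gain computed for one loudness target to another target."""
    return gain_db + (to_target - from_target)


# A track needing -2 dB to reach -18 LUFS needs -7 dB to reach -23 LUFS.
print(retarget_gain(-2.0))  # -7.0
```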

== Implementations ==
There are different ReplayGain implementations, each with its own uses and strength. Most use [[metadata]] to indicate the level of the volume change that the player should make. Some modify the audio data itself, and optionally use metadata as well. There are advantages and disadvantages to both methods.

In the metadata method, information on both types of ReplayGain (Track Gain and Album Gain) can be stored. The volume-change information can be very precise. If audio data was also changed, the metadata can contain "undo" info. Not all audio players/decoders know how to read and use ReplayGain information stored in metadata. And there's no standard for where and how ReplayGain info is stored; each implementation uses different formats and puts the info in different locations.

In the audio data method, the file's actual audio data is modified so that its natural/default playback volume is at the target level. In this scenario, only one type of ReplayGain (Track Gain or Album Gain) can be applied. If no "undo" info is saved somewhere, it may not be possible to restore the original audio data. Limitations of the audio file format may prevent precise (finely tuned) gain adjustments with this method. For example, MP3 and AAC files can only be losslessly modified in 1.5 dB steps. Depending on the audio file format, the process may also be lossy in the sense that it could irreversibly push a signal above the format's maximum amplitude (resulting in clipping) or below the minimum (resulting in silence).
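
The effect of the 1.5 dB step limitation can be illustrated with a short calculation. This is a sketch using the step size quoted above; the function name <code>quantize_gain</code> is hypothetical, and it is not the code of any particular tool.

```python
STEP_DB = 1.5  # step size quoted in the text for MP3 and AAC


def quantize_gain(gain_db, step_db=STEP_DB):
    """Round a desired gain to whole steps.

    Returns (steps, applied_db, residual_db), where residual_db is the part
    of the desired gain that cannot be applied to the audio data itself.
    """
    steps = round(gain_db / step_db)
    applied = steps * step_db
    return steps, applied, gain_db - applied


steps, applied, residual = quantize_gain(-6.2)
print(steps, applied, round(residual, 2))  # -4 -6.0 -0.2
```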

=== Old versus new algorithms ===
Since the ReplayGain standard does not define tags to specify which algorithm was used (classic or ITU-R BS.1770) or what target was set (RG's -18 LUFS, EBU R 128's -23 LUFS, or any other target set by the user or some piece of software), there may be confusion as to how the results were produced. Typically, utilities that ship with reference encoders (FLAC / metaflac, Vorbis / vorbisgain, WavPack / wvgain, Musepack / mpcgain…) use the original RG algorithm, which can produce values that differ from newer tools by several decibels in certain cases. Generally speaking, it is recommended to run other utilities or players that implement the ITU-R BS.1770 algorithm, although it may not be obvious which algorithm they use at first glance. Their documentation may provide that information.

==== RG1, RG2 ====
Although there are many references online, and within ReplayGain scanners, about version 1 vs. version 2 of the ReplayGain standard, at the time of writing this, there is (admittedly) only one ReplayGain standard. The core principle, as well as the 4 tags from the original specification, have not changed. More tags have been proposed but they are still subject to debate. As a rule of thumb, tools that advertise "ReplayGain 2" compliance employ the newer, more accurate ITU-R BS.1770 algorithm. [[foobar2000]] labels it "EBU R128" but it essentially means the same thing. Should a better algorithm be developed in the future, it will still work towards fulfilling ReplayGain's original goals, and probably write the same tags.

=== MP3Gain ===
[[MP3Gain]] is an implementation of classic ReplayGain. It can be used to just analyze files & recommend changes or to also modify the gain. If modifying the gain, it always modifies the global gain fields in the MP3 audio data. It can add somewhat precise metadata, including undo info. The gain can be modified to any target dB, or it can be changed by a specified amount. For balance correction, user-specified changes can even be made on just one channel in simple L/R stereo-mode files (not joint stereo).

* Format: [[MP3]]
* Method: Audio + Meta (in APE tag), or Audio only
* APE tag fields (ASCII bytes):
** <code>MP3GAIN_MINMAX ###,###</code> - minimum & maximum global gain values for this file. 3 digits, zero-padded if necessary.
** <code>MP3GAIN_ALBUM_MINMAX ###,###</code> - minimum & maximum global gain values across a set of files scanned as an album. Optional.
** <code>MP3GAIN_UNDO +###,+###,N</code> - the global gain adjustment to restore the original values in the left and right channels, respectively, followed by an indicator of whether to wrap at the extremes (<code>N</code> means no, <code>W</code> means yes). The adjustment values are 3 digits, zero-padded, preceded by a sign (<code>+</code> or <code>-</code>).
** <code>REPLAYGAIN_TRACK_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Examples: <code>+0.424046</code> and <code>-10.38500</code>
** <code>REPLAYGAIN_TRACK_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Example: <code>0.149923</code>
** <code>REPLAYGAIN_ALBUM_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Optional.
** <code>REPLAYGAIN_ALBUM_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Optional.
* Limitations: Although the metadata, if written, contains precise adjustment & peak values, the audio data modifications are limited to 1.5dB steps and may become irreversible (however, that's a very rare condition; see the [https://hydrogenaud.io/index.php/topic,34154.0.html "mp3gain is NOT lossless" forum thread])
* http://mp3gain.sourceforge.net/
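
Reading the undo field described above could be done along these lines. The function name <code>parse_mp3gain_undo</code> and the sample values are hypothetical, and the sketch assumes the value has exactly the three comma-separated parts listed.

```python
def parse_mp3gain_undo(value):
    """Split an MP3GAIN_UNDO value into (left, right, wrap).

    left and right are signed global gain adjustments; wrap is True for 'W'.
    """
    left, right, wrap = value.strip().split(",")
    return int(left), int(right), wrap.strip().upper() == "W"


print(parse_mp3gain_undo("+002,+002,N"))  # (2, 2, False)
print(parse_mp3gain_undo("-004,-004,W"))  # (-4, -4, True)
```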

=== AACGain ===
[[AACGain]] is a modified version of MP3Gain that works on both MP3 and AAC files.

* Format: [[MP3]], [[AAC]] (with or without MP4 container)
* Method: Audio + Meta, or Audio only
* Limitations: Limited to 1.5 dB steps; may become irreversible (same caveat as for MP3Gain)
* http://aacgain.altosdesign.com/

=== [[LAME]] ===
* Method: Header ([http://gabriel.mp3-tech.org/mp3infotag.html mp3infotag])
* Notes:
** Uses the classic RG algorithm.
** Tags added during encoding; not supported by any player yet; Track Gain only
** Replay Gaining MP3s is usually done using MP3Gain (see [[ReplayGain#MP3Gain|above]]) or [[ReplayGain#foobar2000 ReplayGain scanner|foobar2000]]
* http://lame.sourceforge.net/

=== [[Musepack]] ReplayGain ===
* Method: Header (similar to the metadata method)
* Notes: Uses the classic RG algorithm. ReplayGain values are stored in the header and ReplayGain is part of the Musepack specifications; therefore any Musepack decoder that does not support ReplayGain can be considered broken.
* http://www.musepack.net/

=== VorbisGain ===
* Format: (Ogg) [[Vorbis]]
* Method: Meta (in [[Vorbis comment]])
* Uses the classic RG algorithm.
* http://www.sjeng.org/vorbisgain.html
** new compiles of VorbisGain at [http://www.rarewares.org/ogg.html www.rarewares.org]
:'''''Note:''' Andavari has provided a very useful script to integrate VorbisGain, which is a CLI tool, into Windows Explorer. Please (Ogg) [[Vorbis#ReplayGain|check this section]].''

=== FLAC / METAFLAC ===
* Format: [[Free Lossless Audio Codec|FLAC]]
* Method: Meta (in [[Vorbis comment]])
* Uses the classic RG algorithm.
* http://flac.sf.net

=== WavPack / WVGAIN ===
* Format: [[WavPack]]
* Method: Meta (in [[APEv2]] tag)
* Uses the classic RG algorithm.
* http://www.wavpack.com

=== Wavegain ===
* Format: waveform
* Method: Audio
* Uses the classic RG algorithm.
* Limitations: Irreversible
* http://www.rarewares.org/others.php#wavegain

=== MusicPlayer ===
* Custom implementation, not derived from the original MP3Gain one (but inspired by it). As far as I know, all other implementations are directly derived from the MP3Gain (gain_analysis.c, which is GPL) source.
* Format: any that FFmpeg supports
* Method: Audio
* Limitations: Doesn't modify the files at all. Stores the value in own database. Used only for playback.
* https://github.com/albertz/music-player

=== [[foobar2000]] ReplayGain scanner ===
* Since v1.1.6, defaults to ITU-R BS.1770 analysis (although it labels it EBU R128), but can be configured to use the "Classic ReplayGain" algorithm instead. The ITU-R BS.1770 implementation uses a reference level of -18 LUFS instead of -23, in order to retain compatibility with the ReplayGain standard.
* Format:
** [[MP3]]: Values written to [[ID3v2]] (default) or [[APEv2]] tags. A separate function can be invoked to apply the tagged Track or Album Gain to the MP3 global gain fields (as MP3Gain does), and rewrite any existing tags to account for the peak change and compensate for the difference from 89 dB. The 89 dB reference level for tags isn't configurable, but the reference level applied to the global gain fields is (it's under Preferences > Advanced > Tools > ReplayGain Scanner > Target MP3 alteration volume level).
** [[Musepack]]: Values written to header.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags. As with MP3, it is also an option to apply gain via a separate function.
** [[MP4]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain).
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** Modules ([[MOD]] etc.): Optionally saved into [[APEv2]] tags.
* https://foobar2000.org/

=== [[MediaMonkey]] ===
* Format:
** [[MP3]]: Values written to [[APEv2]] or [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in MediaMonkey's MDB database.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[WAV]]: Values stored in MediaMonkey's MDB database.
** [[MPC]]: Internal gain Structure.
* In addition to tags, all ReplayGain values are also stored in MediaMonkey's MDB database
* Album/Audiophile ReplayGain not supported until v3.0 (Dec 2007); support during burning & ripping added in 3.1 (Jun 2009)
* Also capable of (irreversibly) changing the volume of MP3 tracks, similar to [[MP3Gain]]
* http://www.mediamonkey.com/

=== [[Winamp]] ReplayGain scanner===
* Format:
** [[MP3]]: Values written to [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in Windows Media Audio tags.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags.
** [[MP4]]
** [[TAK]]: Values written to [[APEv2]] tags.
* Supports Album/Track Gain

=== [[loudgain]] ===
* Format:
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** MP2, [[MP3]]: Values written to [[ID3v2]] tags (ID3v2.3/ID3v2.4 selectable).
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** (Ogg) [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** (Ogg) [[Speex]]: Values written to [[Vorbis comment]].
** [[Opus]]: Values written to [[Vorbis comment]], based on -23 LUFS Opus standard. Only <code>R128_TRACK_GAIN</code> and <code>R128_ALBUM_GAIN</code> are written, but the calculated ''true peak'' value can still be used to reduce the gain values ([[Clipping]] prevention).
** [[MP4]], [[M4A]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain). ReplayGain values are stored under <code>----:com.apple.iTunes:…</code>. This is for [[AAC]] and [[ALAC]] in [[MPEG-4]] containers.
** [[ASF]], [[Windows Media Audio|WMA]]: Values written to WMA tags, no prefix.
** [[WAV]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] (ID3v2.3/ID3v2.4 selectable) format. Using the <code>bext</code> chunk (for BWF v2) isn’t (yet) supported, but won’t be destroyed on writing.
** [[Audio Interchange File Format|AIFF]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] format.
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[Monkey's Audio]] (APE): Values written to [[APEv2]] tags.
* Follows EBU R128, ITU BS.1770 and the [[ReplayGain 2.0 specification|revised ReplayGain specification]].
* ''Never'' touches the actual audio data but ''only writes RG2 tags''.
* Uses ''true peak'' values calculated by oversampling to 192 kHz, using a custom polyphase FIR interpolator that will oversample 4x for sample rates < 96 kHz, 2x for sample rates < 192 kHz and leave the signal unchanged for 192 kHz.
* ''Clipping prevention'' can be used to lower the ReplayGain values to a safe margin (default -1 dBTP, can be changed).
* Many options for special cases: force RG tags upper-/lowercase, add extra tags (LRA, Reference loudness), strip unwanted tag types (APEv2 from MP2/MP3, ID3 from WavPack), tab-delimited table output for analysis with CSV file.
* ''Linux'' Free and Open Source software, can be installed on ''MacOS'' using ''HomeBrew'', on ''Windows 10'' using the Linux ''bash''.
* Also installs a <code>rgbpm</code> bash script for mass-tagging, which can be adapted to the user’s needs.
* '''Warning:''' Loudgain relies on standard libraries like ''TagLib''. Linux distros (except rolling releases) sometimes deliver outdated libraries, so be sure you use the latest version of ''TagLib''. Version 1.11.1 had a nasty bug for a while that [https://hydrogenaud.io/index.php/topic,118085.msg974957.html#msg974957 could corrupt Ogg Vorbis files]. This has been fixed in the meantime, but the TagLib version was not updated. Loudgain comes with a (slower) static version called <code>loudgain.static</code> in the repo’s <code>/bin</code> folder that doesn’t expose the bug and can also be used on older Linux versions (like Ubuntu 14.04, Linux Mint 17).
* https://github.com/Moonbase59/loudgain
* Bug tracker: https://github.com/Moonbase59/loudgain/issues

=== [[rsgain]] ===
rsgain is a newer ReplayGain command line utility designed with a "batteries included" philosophy. It uses the newer ITU-R BS.1770 algorithm.

Features:
* Cross-platform Windows / macOS / Linux
* Supports all popular audio formats
* Simplified "Easy Mode" command line syntax supports recursive, directory-based scanning
* Multithreaded scanning option that provides significant speed improvement with full library scans
* Option to skip files with existing ReplayGain metadata
* Scan presets allow the user to save advanced settings for consistent use

== Players support ==
Because ReplayGain is part of the specs of the FLAC, Musepack, and APE formats, any player that supports those formats usually supports ReplayGain.

The situation with MP3 is rather different, as ReplayGain was not part of the MP3 specs. The APEv2 tag metadata implementation appears to be becoming the de facto standard.

=== Windows ===
* [[foobar2000]] supports ReplayGain in all possible aspects.
* [[Winamp]] supports ReplayGain in album or track mode.
* [[MediaMonkey]] supports ReplayGain, with many configuration options.
* [[XMPlay]] recently implemented ReplayGain
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.

''...and probably others.''

=== Linux ===
* [[XMMS]]. Reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], [[Musepack]], (Ogg) [[Vorbis]] ..
:For [[MP3]], use the CVS version of the [http://xmms-mad.sourceforge.net/ xmms-mad] mp3 plugin (it's not yet released as binary, furthermore not available in distribs' versions for now. Meanwhile binaries are available here: [http://perso.crans.org/~krempp/xmms-mad/ custom binaries])
* [[amarok]]. By using the amarok-script [http://kde-apps.org/content/show.php?content=26073 ReplayGain]
:And possibly others, since [http://developer.kde.org/~wheeler/taglib.html TagLib] added support for [[APEv2]] tags in [[MP3]] files, players using this library (like [[amaroK]] and [[JuK]]) might support that kind of ReplayGain tags in the near future.
* [http://www.sacredchao.net/quodlibet Quod Libet] reads ReplayGain from (Ogg) [[Vorbis]], [[MP3]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:Requires support to be enabled (via the appropriate python bindings and libraries) for the above formats. Does not support ReplayGain values stored in [[APEv2]] tags in [[MP3]]s. ReplayGain values are stored in RVA2 id3v2.4 frames. See the [http://www.sacredchao.net/quodlibet/wiki/Development/ID3Notes Quod Libet RVA2 / ReplayGain notes].
* [http://www.musicpd.org/ Music Player Daemon] (MPD) reads ReplayGain from (Ogg) [[Vorbis]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:foobar2000-style TXXX frames in [[MP3]]s are also supported in the latest development releases.
* [http://www.mplayerhq.hu/ MPlayer]. MPlayer's support for ReplayGain is codec dependent.
:Codecs that are known to support ReplayGain: vorbis
:Because of this, you need to prioritize the codecs that support it, or choose it individually on the command line. To add it to the command line, add an -ac [codec] option after each file that you want to choose the codec for, or at the beginning to make it apply to all files listed. To prioritize the codecs by default, list them in a line in mplayer.conf:
:<code>ac=[codec],[othercodec],vorbis,mad,</code>
* [http://idjc.sourceforge.net/ IDJC] (Internet DJ Console) reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], (Ogg) FLAC, (Ogg) [[Vorbis]], MP2 (audio), [[MP3]], [[Opus]], but only the ''lowercase'' tags. There is a [https://sourceforge.net/p/idjc/bugs/100/ ticket] open to handle tags case-insensitively.
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.
* [https://www.videolan.org/vlc/ VLC] supports ReplayGain in many file formats, but usually only the ''uppercase'' variant of the tags.
* [https://kodi.tv/ KODI] reads ReplayGain from nearly all formats, but usually only the ''lowercase'' variant of the tags.

=== Portable devices ===
[http://www.rockbox.org/ Rockbox] supports ReplayGain (in album or track mode) for most formats, including WMA, MP1/2/3, AAC, ALAC, Musepack, Monkey's Audio, Wavpack, FLAC and Vorbis. Note that ReplayGain is only supported when using the respective codec's native tagging format. For example: ReplayGain stored in APEv2 tags is not supported for MP3, rather ID3v2.x tags are expected.

Sandisk Sansa Fuze with firmware 1.02.26 and 2.02.26

Sandisk Sansa Clip+

The iPod features ''Soundcheck'', which seems to produce roughly the same normalization gains as ReplayGain, but doesn't provide an Album Gain.

=== Hi-Fi ===
Slim Devices, a company owned by Logitech Inc, supports ReplayGain on both of their high-end audiophile players, known as the [[Slim Devices Transporter|Transporter]] and the [[Slim Devices Squeezebox|Squeezebox]].

BluOS also supports ReplayGain with the selection of album- or track-gain and a so-called Smart option that decides between the two by itself.
NAD devices that use BluOS consequently also support ReplayGain.

==Notes==
<references/>

== See also ==
* [[ReplayGain specification]]

== External links ==
* [http://en.wikipedia.org/wiki/Replay_Gain ReplayGain] at Wikipedia
* [http://www.bobulous.org.uk/misc/Replay-Gain.html ReplayGain using foobar2000] (how to use ReplayGain in Windows using foobar2000).
* [http://www.bobulous.org.uk/misc/Replay-Gain-in-Linux.html ReplayGain in Linux] (how to use ReplayGain in Linux using foobar2000 and Wine, or using metaflac or vorbisgain).

[[Category:Technical]]
[[Category:Metadata]]

ReplayGain

2026-01-22T20:09:01Z

Skamp: Added a paragraph to alleviate the confusion between reference tools using the classic RG algorithm, versus newer tools that use the newer algorithm.

'''ReplayGain''' is the name of a technique invented to achieve the same perceived playback loudness of audio files. It defines an algorithm to measure the '''perceived''' loudness of audio data.

ReplayGain allows the loudness of each song within a collection of songs to be consistent. This is called 'Track Gain' (or 'Radio Gain' in earlier parlance). It also allows the loudness of a specific sub-collection (an "album") to be consistent with the rest of the collection, while allowing the dynamics from song to song on the album to remain intact. This is called 'Album Gain' (or 'Audiophile Gain' in earlier parlance). This is especially important when listening to classical music albums, because quiet tracks need to remain a certain degree quieter than the louder ones.
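
The difference between the two modes can be illustrated with a few lines of arithmetic. The following is only a sketch: the loudness figures are hypothetical, the album loudness is assumed to come from measuring the album as a whole, and the target level is assumed to be -18 on the same scale.

```python
TARGET = -18.0  # assumed target level, in the same units as the measurements


def track_gains(track_loudness):
    """Gain (dB) that brings each track individually to the target."""
    return [TARGET - loudness for loudness in track_loudness]


def album_gain(album_loudness):
    """Single gain (dB) applied to every track of the album."""
    return TARGET - album_loudness


# Hypothetical measurements: a loud track and a quiet track from one album.
loudness = [-12.0, -20.0]
album_loudness = -14.0  # assumed result of measuring the whole album

after_track_mode = [l + g for l, g in zip(loudness, track_gains(loudness))]
after_album_mode = [l + album_gain(album_loudness) for l in loudness]

print(after_track_mode)  # both tracks end up at the target
print(after_album_mode)  # the 8 dB difference between the tracks is preserved
```

In track mode every track lands on the target; in album mode all tracks are shifted by the same amount, so the quiet track stays quieter than the loud one.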

ReplayGain is different from [[Normalization|peak normalization]]. Peak normalization merely ensures that the peak amplitude reaches a certain level. This does not ensure equal loudness. The ReplayGain technique measures the ''effective power'' of the waveform (i.e. the RMS power after applying an "equal loudness contour"), and then adjusts the amplitude of the waveform accordingly. The result is that Replay Gained waveforms are usually more uniformly amplified than peak-normalized waveforms.
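
A small sketch can show why equal peaks do not imply equal loudness. It compares two synthetic signals using a plain RMS level; the equal loudness filter and the statistical processing used by ReplayGain are deliberately left out, so this is not a ReplayGain measurement, only an illustration of the peak versus power distinction.

```python
import math


def peak(samples):
    """Largest absolute sample value (what peak normalization looks at)."""
    return max(abs(s) for s in samples)


def rms_db(samples):
    """RMS level in dB relative to full scale (no loudness filter applied)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


# Two synthetic signals with the same peak (1.0) but very different energy.
n = 1000
dense = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]  # full-scale square wave
sparse = [1.0 if i == 0 else 0.01 for i in range(n)]     # one click over a quiet bed

print(peak(dense), peak(sparse))      # identical peaks
print(rms_db(dense), rms_db(sparse))  # RMS levels differ by almost 30 dB
```

A peak normalizer would treat both signals as equally loud, whereas a power-based measurement separates them clearly.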

==Target loudness==
The target loudness of almost all ReplayGain utilities is 89 dB SPL when replayed in an SMPTE RP 200 calibrated system (an early departure from the proposal, endorsed by its author<ref>[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=83397&view=findpost&p=721854 Does Replay gain work differtly in Media monkey]</ref>) — the ReplayGain proposal and SMPTE recommendation are 6 dB lower.<ref>[http://www.mars.org/mailman/public/mad-dev/2004-February/000993.html ReplayGain discussion at mad-dev]</ref> The target loudness may be more commonly known and understood as '''-18''' '''[https://en.wikipedia.org/wiki/LUFS LUFS]''' (''Loudness Units relative to Full Scale'').

Because of the inadequacies of the classic ReplayGain loudness calculation, some utilities have switched to a more modern algorithm ([https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en ITU-R BS.1770]). However, the way it was integrated was extremely ''ad hoc'', at least until a draft of a [[ReplayGain 2.0 specification|revised specification]] started being written.

==Clipping==
Audio is generally recorded such that the loudest sounds don't clip, but the use of ReplayGain can cause clipping if the average volume of a song is below the target level. That is, upon playback, the volume of a quiet song is increased, so the parts of the song with above-average loudness, especially in the bass frequencies, will exceed the limits of the format and will be distorted. Whether this distortion is audible depends on the sounds in question, and the listener's sensitivity.

Implementations deal with the risk of clipping in different ways. Some have a "pre-amp" feature which reduces (or boosts) the original audio's level by a certain amount before doing whatever is needed for ReplayGain. Some have a "prevent clipping" feature to reduce the amount of ReplayGain adjustment to whatever amount would keep clipping from occurring, based on peak info stored in the file's metadata (thus reducing the effectiveness of ReplayGain). Some recommend using a compressor/limiter DSP to prevent or reduce clipping, regardless of whether it was caused by ReplayGain.

An alternative that may reduce the risk of clipping is the [https://tech.ebu.ch/docs/r/r128.pdf EBU R 128] recommendation of a '''-23''' '''LUFS''' target, although some may find the additional reduction in volume excessive, particularly if it leads to maxing out volume on user hardware. [[Opus]] in particular has adopted that recommendation.
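
The interaction between target level, gain and clipping can be sketched as follows. The track loudness and peak are hypothetical, the function names are illustrative, and full scale is assumed to be a sample value of 1.0; real implementations differ in how they obtain the peak and in the safety margin they apply.

```python
import math


def replay_gain_db(measured_lufs, target_lufs=-18.0):
    """Gain needed to move a track from its measured loudness to the target."""
    return target_lufs - measured_lufs


def peak_after_gain(peak, gain_db):
    """Linear peak value after applying a gain given in dB."""
    return peak * 10.0 ** (gain_db / 20.0)


def clip_safe_gain_db(gain_db, peak):
    """Limit the gain so that the scaled peak does not exceed full scale (1.0)."""
    max_gain_db = -20.0 * math.log10(peak)
    return min(gain_db, max_gain_db)


# Hypothetical quiet track: -24 LUFS with a stored peak of 0.85.
measured, peak = -24.0, 0.85
gain_rg = replay_gain_db(measured, -18.0)    # ReplayGain target
gain_r128 = replay_gain_db(measured, -23.0)  # EBU R 128 target

print(peak_after_gain(peak, gain_rg))    # above 1.0: would clip
print(peak_after_gain(peak, gain_r128))  # below 1.0: no clipping
print(clip_safe_gain_db(gain_rg, peak))  # reduced gain that just avoids clipping
```

With the lower target the same track needs 5 dB less gain, which is why it is less likely to clip; the "prevent clipping" approach instead keeps the higher target and gives up part of the adjustment.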

== Implementations ==
There are different ReplayGain implementations, each with its own uses and strengths. Most use [[metadata]] to indicate the level of the volume change that the player should make. Some modify the audio data itself, and optionally use metadata as well. There are advantages and disadvantages to both methods.

In the metadata method, information on both types of ReplayGain (Track Gain and Album Gain) can be stored. The volume-change information can be very precise. If audio data was also changed, the metadata can contain "undo" info. Not all audio players/decoders know how to read and use ReplayGain information stored in metadata. And there's no standard for where and how ReplayGain info is stored; each implementation uses different formats and puts the info in different locations.

In the audio data method, the file's actual audio data is modified so that its natural/default playback volume is at the target level. In this scenario, only one type of ReplayGain (Track Gain or Album Gain) can be applied. If no "undo" info is saved somewhere, it may not be possible to restore the original audio data. Limitations of the audio file format may prevent precise (finely tuned) gain adjustments with this method. For example, MP3 and AAC files can only be losslessly modified in 1.5 dB steps. Depending on the audio file format, the process may also be lossy in the sense that it could irreversibly push a signal above the format's maximum amplitude (resulting in clipping) or below the minimum (resulting in silence).
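
The effect of the 1.5 dB step limitation can be sketched as a simple rounding problem. This is only an illustration that assumes the nominal step size quoted above; the function name is hypothetical and no file is touched.

```python
STEP_DB = 1.5  # step size quoted in the text for MP3 and AAC


def quantize_gain(desired_db, step_db=STEP_DB):
    """Round a desired gain to the nearest whole number of steps.

    Returns (steps, applied_db, residual_db), where residual_db is the part
    of the desired gain that cannot be expressed in whole steps.
    """
    steps = round(desired_db / step_db)
    applied_db = steps * step_db
    return steps, applied_db, desired_db - applied_db


steps, applied, residual = quantize_gain(-6.2)
print(steps, applied, residual)  # -4 steps apply -6.0 dB, leaving about -0.2 dB
```

The residual is never larger than half a step (0.75 dB). A tool that also writes metadata can record the precise value there, which the audio-only method cannot do.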

=== Old versus new algorithms ===
Since the ReplayGain standard does not define tags to specify which algorithm was used (classic or ITU-R BS.1770) or what target was set (RG's -18 LUFS, EBU R 128's -23 LUFS, or any other target set by the user or some piece of software), there may be confusion as to how the results were produced. Typically, utilities that ship with reference encoders (FLAC / metaflac, Vorbis / vorbisgain, WavPack / wvgain, Musepack / mpcgain…) use the original RG algorithm, which can produce values that differ from newer tools by several decibels in certain cases. Generally speaking, it is recommended to run other utilities or players that implement the ITU-R BS.1770 algorithm, although it may not be obvious which algorithm they use at first glance. Their documentation may provide that information.

=== MP3Gain ===
[[MP3Gain]] is an implementation of classic ReplayGain. It can be used to just analyze files & recommend changes or to also modify the gain. If modifying the gain, it always modifies the global gain fields in the MP3 audio data. It can add somewhat precise metadata, including undo info. The gain can be modified to any target dB, or it can be changed by a specified amount. For balance correction, user-specified changes can even be made on just one channel in simple L/R stereo-mode files (not joint stereo).

* Format: [[MP3]]
* Method: Audio + Meta (in APE tag), or Audio only
* APE tag fields (ASCII bytes):
** <code>MP3GAIN_MINMAX ###,###</code> - minimum & maximum global gain values for this file. 3 digits, zero-padded if necessary.
** <code>MP3GAIN_ALBUM_MINMAX ###,###</code> - minimum & maximum global gain values across a set of files scanned as an album. Optional.
** <code>MP3GAIN_UNDO +###,+###,N</code> - the global gain adjustment to restore the original values in the left and right channels, respectively, followed by an indicator of whether to wrap at the extremes (<code>N</code> means no, <code>W</code> means yes). The adjustment values are 3 digits, zero-padded, preceded by a sign (<code>+</code> or <code>-</code>).
** <code>REPLAYGAIN_TRACK_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Examples: <code>+0.424046</code> and <code>-10.38500</code>
** <code>REPLAYGAIN_TRACK_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Example: <code>0.149923</code>
** <code>REPLAYGAIN_ALBUM_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Optional.
** <code>REPLAYGAIN_ALBUM_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Optional.
* Limitations: Although the metadata, if written, contains precise adjustment & peak values, the audio data modifications are limited to 1.5dB steps and may become irreversible (however, that's a very rare condition; see the [https://hydrogenaud.io/index.php/topic,34154.0.html "mp3gain is NOT lossless" forum thread])
* http://mp3gain.sourceforge.net/
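
The fixed-width values listed above can be reproduced with a short sketch. The helper names are hypothetical, and the approach of formatting with six decimals and then cutting the text to the stated width is an assumption chosen because it matches the examples given; it is not taken from the MP3Gain source. Only the numeric part of each value is produced.

```python
def format_gain_value(gain_db):
    """Signed gain, always 9 characters including sign and decimal point."""
    return f"{gain_db:+.6f}"[:9]


def format_peak_value(peak):
    """Unsigned peak, always 8 characters including the decimal point."""
    return f"{peak:.6f}"[:8]


def format_minmax(min_gain, max_gain):
    """Two 3-digit, zero-padded global gain values."""
    return f"{min_gain:03d},{max_gain:03d}"


def format_undo(left, right, wrap=False):
    """Signed 3-digit adjustments for left and right, then the wrap flag."""
    return f"{left:+04d},{right:+04d},{'W' if wrap else 'N'}"


print(format_gain_value(0.424046))  # +0.424046
print(format_gain_value(-10.385))   # -10.38500
print(format_peak_value(0.149923))  # 0.149923
print(format_minmax(89, 210))       # 089,210
print(format_undo(2, 2))            # +002,+002,N
```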

=== AACGain ===
[[AACGain]] is a modified version of MP3Gain that works on both MP3 and AAC files.

* Format: [[MP3]], [[AAC]] (with or without MP4 container)
* Method: Audio + Meta, or Audio only
* Limitations: Limited to 1.5 dB steps; may become irreversible (same caveat as for MP3Gain)
* http://aacgain.altosdesign.com/

=== [[LAME]] ===
* Method: Header ([http://gabriel.mp3-tech.org/mp3infotag.html mp3infotag])
* Notes:
** Uses the classic RG algorithm.
** Tags added during encoding; not supported by any player yet; Track Gain only
** Replay Gaining MP3s is usually done using MP3Gain (see [[ReplayGain#MP3Gain|above]]) or [[ReplayGain#foobar2000 ReplayGain scanner|foobar2000]]
* http://lame.sourceforge.net/

=== [[Musepack]] ReplayGain ===
* Method: Header (similar to the metadata method)
* Notes: Uses the classic RG algorithm. ReplayGain values are stored in the header and ReplayGain is part of the Musepack specifications; therefore any Musepack decoder that does not support ReplayGain can be considered broken.
* http://www.musepack.net/

=== VorbisGain ===
* Format: (Ogg) [[Vorbis]]
* Method: Meta (in [[Vorbis comment]])
* Uses the classic RG algorithm.
* http://www.sjeng.org/vorbisgain.html
** new compiles of VorbisGain at [http://www.rarewares.org/ogg.html www.rarewares.org]
:'''''Note:''' Andavari has provided a very useful script to integrate VorbisGain, which is a CLI tool, into Windows Explorer. Please (Ogg) [[Vorbis#ReplayGain|check this section]].''

=== FLAC / METAFLAC ===
* Format: [[Free Lossless Audio Codec|FLAC]]
* Method: Meta (in [[Vorbis comment]])
* Uses the classic RG algorithm.
* http://flac.sf.net

=== WavPack / WVGAIN ===
* Format: [[WavPack]]
* Method: Meta (in [[APEv2]] tag)
* Uses the classic RG algorithm.
* http://www.wavpack.com

=== Wavegain ===
* Format: waveform
* Method: Audio
* Uses the classic RG algorithm.
* Limitations: Irreversible
* http://www.rarewares.org/others.php#wavegain

=== MusicPlayer ===
* Custom implementation, not derived from the original MP3Gain one (but inspired by it). As far as I know, all other implementations are directly derived from the MP3Gain source (gain_analysis.c, which is GPL).
* Format: any that FFmpeg supports
* Method: Audio
* Limitations: Doesn't modify the files at all. Stores the value in own database. Used only for playback.
* https://github.com/albertz/music-player

=== [[foobar2000]] ReplayGain scanner ===
* Since v1.1.6, defaults to ITU-R BS.1770 analysis (although it labels it EBU R128), but can be configured to use the "Classic ReplayGain" algorithm instead. The ITU-R BS.1770 implementation uses a reference level of -18 LUFS instead of -23, in order to retain compatibility with the ReplayGain standard.
* Format:
** [[MP3]]: Values written to [[ID3v2]] (default) or [[APEv2]] tags. A separate function can be invoked to apply the tagged Track or Album Gain to the MP3 global gain fields (as MP3Gain does), and rewrite any existing tags to account for the peak change and compensate for the difference from 89 dB. The 89 dB reference level for tags isn't configurable, but the reference level applied to the global gain fields is (it's under Preferences > Advanced > Tools > ReplayGain Scanner > Target MP3 alteration volume level).
** [[Musepack]]: Values written to header.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags. As with MP3, it is also an option to apply gain via a separate function.
** [[MP4]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain).
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** Modules ([[MOD]] etc.): Optionally saved into [[APEv2]] tags.
* https://foobar2000.org/

=== [[MediaMonkey]] ===
* Format:
** [[MP3]]: Values written to [[APEv2]] or [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in MediaMonkey's MDB database.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[WAV]]: Values stored in MediaMonkey's MDB database.
** [[MPC]]: Internal gain Structure.
* In addition to tags, all ReplayGain values are also stored in MediaMonkey's MDB database
* Album/Audiophile ReplayGain not supported until v3.0 (Dec 2007); support during burning & ripping added in 3.1 (Jun 2009)
* Also capable of (irreversibly) changing the volume of MP3 tracks, similar to [[MP3Gain]]
* http://www.mediamonkey.com/

=== [[Winamp]] ReplayGain scanner===
* Format:
** [[MP3]]: Values written to [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in Windows Media Audio tags.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags.
** [[MP4]]
** [[TAK]]: Values written to [[APEv2]] tags.
* Supports Album and Track Gain

=== [[loudgain]] ===
* Format:
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** MP2, [[MP3]]: Values written to [[ID3v2]] tags (ID3v2.3/ID3v2.4 selectable).
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** (Ogg) [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** (Ogg) [[Speex]]: Values written to [[Vorbis comment]].
** [[Opus]]: Values written to [[Vorbis comment]], based on -23 LUFS Opus standard. Only <code>R128_TRACK_GAIN</code> and <code>R128_ALBUM_GAIN</code> are written, but the calculated ''true peak'' value can still be used to reduce the gain values ([[Clipping]] prevention).
** [[MP4]], [[M4A]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain). ReplayGain values are stored under <code>----:com.apple.iTunes:…</code>. This is for [[AAC]] and [[ALAC]] in [[MPEG-4]] containers.
** [[ASF]], [[Windows Media Audio|WMA]]: Values written to WMA tags, no prefix.
** [[WAV]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] (ID3v2.3/ID3v2.4 selectable) format. Using the <code>bext</code> chunk (for BWF v2) isn’t (yet) supported, but won’t be destroyed on writing.
** [[Audio Interchange File Format|AIFF]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] format.
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[Monkey's Audio]] (APE): Values written to [[APEv2]] tags.
* Follows EBU R128, ITU BS.1770 and the [[ReplayGain 2.0 specification|revised ReplayGain specification]].
* ''Never'' touches the actual audio data but ''only writes RG2 tags''.
* Uses ''true peak'' values calculated by oversampling to 192 kHz, using a custom polyphase FIR interpolator that will oversample 4x for sample rates < 96 kHz, 2x for sample rates < 192 kHz and leave the signal unchanged for 192 kHz.
* ''Clipping prevention'' can be used to lower the ReplayGain values to a safe margin (default -1 dBTP, can be changed).
* Many options for special cases: force RG tags upper-/lowercase, add extra tags (LRA, Reference loudness), strip unwanted tag types (APEv2 from MP2/MP3, ID3 from WavPack), tab-delimited table output for analysis with CSV file.
* ''Linux'' Free and Open Source software, can be installed on ''MacOS'' using ''HomeBrew'', on ''Windows 10'' using the Linux ''bash''.
* Also installs a <code>rgbpm</code> bash script for mass-tagging, which can be adapted to the user’s needs.
* '''Warning:''' Loudgain relies on standard libraries like ''TagLib''. Linux distros (except rolling releases) sometimes deliver outdated libraries, so be sure you use the latest version of ''TagLib''. Version 1.11.1 had a nasty bug for a while that [https://hydrogenaud.io/index.php/topic,118085.msg974957.html#msg974957 could corrupt Ogg Vorbis files]. This has been fixed in the meantime, but the TagLib version was not updated. Loudgain comes with a (slower) static version called <code>loudgain.static</code> in the repo’s <code>/bin</code> folder that doesn’t expose the bug and can also be used on older Linux versions (like Ubuntu 14.04, Linux Mint 17).
* https://github.com/Moonbase59/loudgain
* Bug tracker: https://github.com/Moonbase59/loudgain/issues
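
Two of the rules in the list above lend themselves to a short sketch: the choice of oversampling factor for the true peak measurement, and the gain written to the Opus <code>R128_*</code> tags relative to -23 LUFS. This is not loudgain's code. The function names are hypothetical, treating rates above 192 kHz like 192 kHz is an assumption, and the integer encoding (gain in units of 1/256 dB, the Q7.8 format) is taken from the Ogg Opus specification (RFC 7845) rather than from this article.

```python
def oversampling_factor(sample_rate_hz):
    """Oversampling factor so that the peak is measured at up to 192 kHz."""
    if sample_rate_hz < 96_000:
        return 4
    if sample_rate_hz < 192_000:
        return 2
    return 1  # 192 kHz (and, as an assumption here, anything above) is left as is


def r128_tag_value(loudness_lufs, target_lufs=-23.0):
    """Gain relative to the target, as a Q7.8 integer string (1/256 dB units)."""
    gain_db = target_lufs - loudness_lufs
    q78 = round(gain_db * 256)
    q78 = max(-32768, min(32767, q78))  # keep within the signed 16-bit range
    return str(q78)


print(oversampling_factor(44_100))  # 4
print(oversampling_factor(96_000))  # 2
print(r128_tag_value(-18.0))        # -1280, i.e. -5 dB
```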

=== [[rsgain]] ===
rsgain is a newer ReplayGain command line utility designed with a "batteries included" philosophy. It uses the newer ITU-R BS.1770 algorithm.

Features:
* Cross-platform Windows / macOS / Linux
* Supports all popular audio formats
* Simplified "Easy Mode" command line syntax supports recursive, directory-based scanning
* Multithreaded scanning option that provides significant speed improvement with full library scans
* Option to skip files with existing ReplayGain metadata
* Scan presets allow the user to save advanced settings for consistent use

== Players support ==
Because ReplayGain is part of the specs of the FLAC, Musepack, and APE formats, any player that supports those formats usually supports ReplayGain.

The situation with MP3 is rather different, as ReplayGain was not part of the MP3 specs. The APEv2 tag metadata implementation appears to be becoming the de facto standard.

=== Windows ===
* [[foobar2000]] supports ReplayGain in all possible aspects.
* [[Winamp]] supports ReplayGain in album or track mode.
* [[MediaMonkey]] supports ReplayGain, with many configuration options.
* [[XMPlay]] recently implemented ReplayGain
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.

''...and probably others.''

=== Linux ===
* [[XMMS]]. Reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], [[Musepack]], (Ogg) [[Vorbis]] ..
:For [[MP3]], use the CVS version of the [http://xmms-mad.sourceforge.net/ xmms-mad] mp3 plugin (it's not yet released as binary, furthermore not available in distribs' versions for now. Meanwhile binaries are available here: [http://perso.crans.org/~krempp/xmms-mad/ custom binaries])
* [[amarok]]. By using the amarok-script [http://kde-apps.org/content/show.php?content=26073 ReplayGain]
:And possibly others, since [http://developer.kde.org/~wheeler/taglib.html TagLib] added support for [[APEv2]] tags in [[MP3]] files, players using this library (like [[amaroK]] and [[JuK]]) might support that kind of ReplayGain tags in the near future.
* [http://www.sacredchao.net/quodlibet Quod Libet] reads ReplayGain from (Ogg) [[Vorbis]], [[MP3]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:Requires support to be enabled (via the appropriate python bindings and libraries) for the above formats. Does not support ReplayGain values stored in [[APEv2]] tags in [[MP3]]s. ReplayGain values are stored in RVA2 id3v2.4 frames. See the [http://www.sacredchao.net/quodlibet/wiki/Development/ID3Notes Quod Libet RVA2 / ReplayGain notes].
* [http://www.musicpd.org/ Music Player Daemon] (MPD) reads ReplayGain from (Ogg) [[Vorbis]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:foobar2000-style TXXX frames in [[MP3]]s are also supported in the latest development releases.
* [http://www.mplayerhq.hu/ MPlayer]. MPlayer's support for ReplayGain is codec dependent.
:Codecs that are known to support ReplayGain: vorbis
:Because of this, you need to prioritize the codecs that support it, or choose it individually on the command line. To add it to the command line, add an -ac [codec] option after each file that you want to choose the codec for, or at the beginning to make it apply to all files listed. To prioritize the codecs by default, list them in a line in mplayer.conf:
:<code>ac=[codec],[othercodec],vorbis,mad,</code>
* [http://idjc.sourceforge.net/ IDJC] (Internet DJ Console) reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], (Ogg) FLAC, (Ogg) [[Vorbis]], MP2 (audio), [[MP3]], [[Opus]], but only the ''lowercase'' tags. There is a [https://sourceforge.net/p/idjc/bugs/100/ ticket] open to handle tags case-insensitively.
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.
* [https://www.videolan.org/vlc/ VLC] supports ReplayGain in many file formats, but usually only the ''uppercase'' variant of the tags.
* [https://kodi.tv/ KODI] reads ReplayGain from nearly all formats, but usually only the ''lowercase'' variant of the tags.

=== Portable devices ===
[http://www.rockbox.org/ Rockbox] supports ReplayGain (in album or track mode) for most formats, including WMA, MP1/2/3, AAC, ALAC, Musepack, Monkey's Audio, Wavpack, FLAC and Vorbis. Note that ReplayGain is only supported when using the respective codec's native tagging format. For example: ReplayGain stored in APEv2 tags is not supported for MP3, rather ID3v2.x tags are expected.

Sandisk Sansa Fuze with firmware 1.02.26 and 2.02.26

Sandisk Sansa Clip+

The iPod features ''Soundcheck'', which seems to produce roughly the same normalization gains as ReplayGain, but doesn't provide an Album Gain.

=== Hi-Fi ===
Slim Devices, a company owned by Logitech Inc, supports ReplayGain on both of their high-end audiophile players, known as the [[Slim Devices Transporter|Transporter]] and the [[Slim Devices Squeezebox|Squeezebox]].

BluOS also supports ReplayGain with the selection of album- or track-gain and a so-called Smart option that decides between the two by itself.
NAD devices that use BluOS consequently also support ReplayGain.

==Notes==
<references/>

== See also ==
* [[ReplayGain specification]]

== External links ==
* [http://en.wikipedia.org/wiki/Replay_Gain ReplayGain] at Wikipedia
* [http://www.bobulous.org.uk/misc/Replay-Gain.html ReplayGain using foobar2000] (how to use ReplayGain in Windows using foobar2000).
* [http://www.bobulous.org.uk/misc/Replay-Gain-in-Linux.html ReplayGain in Linux] (how to use ReplayGain in Linux using foobar2000 and Wine, or using metaflac or vorbisgain).

[[Category:Technical]]
[[Category:Metadata]]

Revised ReplayGain specification

2026-01-22T19:30:35Z

Skamp:

''This is a proposed update to the [[ReplayGain 1.0 specification|original ReplayGain specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio forum].'' --[[User:Notat|Notat]] 23:42, 8 October 2012 (CEST)

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system.

The [[ReplayGain 1.0 specification|original ReplayGain specification]] described a loudness measurement system which included a weighting filter, root mean square (RMS) measurement and statistical processing that model human perception of loudness in both the frequency and time domains.

Since the original ReplayGain proposal in 2001, the science, practice and standards for loudness normalization have advanced significantly. The current industry standard approach to loudness measurement is described by the International Telecommunication Union<ref>http://www.itu.int/en/Pages/default.aspx</ref> (ITU) as BS.1770. The most recent version of this standard is known as ITU BS.1770-5<ref>https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en</ref> and was published in December 2023. The ITU work is freely available and is not believed to be encumbered by any patent issues. The ITU BS.1770-2 standard has been adopted in the United States by the [http://www.atsc.org ATSC] as [http://www.atsc.org/cms/standards/a_85-2011a.pdf A/85] and in Europe by the [http://www.ebu.ch European Broadcasting Union] as [http://tech.ebu.ch/docs/tech/tech3343.pdf EBU R-128] for broadcast audio.

BS.1770 uses a "K-weighted" RMS measurement. This weighting function is significantly less complex than the inverted Fletcher-Munson weighting used by the original ReplayGain algorithm. A gating function is designed to measure the loudness of foreground components in the audio program. The gate in BS.1770 performs a function similar to that of the statistical processing in the original RG specification.

The computation required for BS.1770 loudness measurement is reduced compared to the original RG technique. Nevertheless, BS.1770 has been shown in several academic studies to be equally or more effective than the RG algorithm in modeling human loudness perception on music program as well as other material such as podcasts, television programs and movies.<ref>Paul Nygren. [http://www.speech.kth.se/prod/publications/files/3319.pdf Achieving equal loudness between audio files]. KTH Royal Institute of Technology</ref><ref>Martin Wolters; Harald Mundt; Jeffrey Riedmiller (May 2010). [http://www.aes.org/e-lib/browse.cfm?elib=15341 Loudness Normalization In The Age Of Portable Media Players]. Audio Engineering Society.</ref><ref>Esben Skovenborg; Søren H. Nielsen (October 2004). [http://web.archive.org/web/20120208024743/http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf Evaluation of Different Loudness Models with Music and Speech Material]. Audio Engineering Society. Archived from [http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf the original] on 2012-02-08.</ref>

Recent RG implementations use BS.1770 for loudness measurement. It is expected the ITU standard will evolve over time to meet the needs of broadcasters and governments. It is the intent of the ReplayGain community that RG follow any future backwards-compatible improvements to loudness measurement using the BS.1770 standard.

==Reference level==

Classic ReplayGain is calibrated to a pink noise reference signal with an RMS level 14 dB below a full-scale sinusoid. This reference signal is used to establish a reference level. ReplayGain will apply no gain or attenuation to the reference signal or any program material which has the same loudness measurements as the reference signal.

BS.1770 defines a loudness scale for program material. BS.1770 loudness measurements are expressed in Loudness Units [relative to] Full Scale (LUFS). Differences in LUFS can be treated like decibels: a change of 1 LU corresponds to a change of 1 dB.

In order to maintain backwards compatibility with classic RG, newer RG implementations use a -18 LUFS reference level, which, for a wide range of music, yields loudness similar to that of classic RG.


==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{r}-L</math>
Where:
:<math>RG</math> is the replay gain adjustment in decibels,
:<math>L_{r}</math> is the -18 LUFS reference level,
:<math>L</math> is the measured loudness of the audio file in LUFS.

Gain is positive if the loudness of the audio file is lower than the reference level. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than the reference level. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[ReplayGain 2.0 specification#Player requirements|Player requirements]] below.
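
As a minimal illustration of the formula above, the gain calculation can be sketched in Python. The function and constant names are hypothetical and chosen only for this example; the loudness value is assumed to have already been measured in LUFS.

```python
REFERENCE_LUFS = -18.0  # reference level L_r used by newer RG implementations


def replay_gain_db(measured_lufs, reference_lufs=REFERENCE_LUFS):
    """Return RG = L_r - L in decibels.

    A positive result means the file is quieter than the reference and
    needs gain; a negative result means it needs attenuation.
    """
    return reference_lufs - measured_lufs


# A loud pop track measured at -9.5 LUFS is attenuated by 8.5 dB,
# a quiet recording measured at -23 LUFS is raised by 5 dB.
loud_track_gain = replay_gain_db(-9.5)
quiet_track_gain = replay_gain_db(-23.0)
```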

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.
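
The peak scan for uncompressed integer PCM could look roughly like the following Python sketch. The function name is hypothetical, and the choice of <math>2^{bits-1}</math> as digital full scale for signed samples is an assumption of this example rather than something the text above fixes. Decoded lossy audio delivered as floating-point samples would skip the division and may legitimately yield values above 1.0.

```python
def peak_amplitude(channels, bits_per_sample=16):
    """Return the largest absolute sample on any channel, scaled so that
    digital full scale corresponds to 1.0.

    `channels` is a list of per-channel lists of signed integer samples.
    """
    # Assumption: full scale is 2^(bits-1), e.g. 32768 for 16-bit audio.
    full_scale = float(1 << (bits_per_sample - 1))
    largest = max((abs(s) for ch in channels for s in ch), default=0)
    return largest / full_scale


# Track peak: scan one file. Album peak: take the maximum over all tracks.
track_a = [[0, 16384, -8192], [100, -12000, 5]]
track_b = [[300, -32768], [2, 7]]
album_peak = max(peak_amplitude(track_a), peak_amplitude(track_b))
```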

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments''', same for other Ogg codecs
* .opus (Ogg Opus) – '''Vorbis comments''' available
** standard {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} (MUST adjust for -23 LUFS) comment keys may be preferable (used by {{code|loudgain}})
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .asf, .wma (Windows Media audio) – '''Vorbis comments''' in Extended Content Description Object
** {{code|loudgain}} instead uses native ASF/WMA attributes (itself a key-value storage) via TagLib, which is more sensible
* .wav (Windows PCM) – No metadata support (use .bwf instead)
** ID3 RIFF chunk possible (used by {{code|loudgain}})
* .wv (WavPack) – '''APEv2'''
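
For the Opus entry in the list above, the adjustment to the -23 LUFS reference can be sketched as follows. This is only an illustration: it assumes that the {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} values are ASCII integers in Q7.8 fixed point (1/256 dB steps), which is how the Ogg Opus specification is understood to define them, and the function name is hypothetical.

```python
def replaygain_to_r128(gain_db, rg_reference_lufs=-18.0, r128_reference_lufs=-23.0):
    """Convert a ReplayGain value (dB, -18 LUFS reference) to the text of an
    R128_*_GAIN comment (assumed Q7.8 fixed point, -23 LUFS reference)."""
    # Moving the target from -18 LUFS to -23 LUFS lowers the gain by 5 dB.
    adjusted_db = gain_db + (r128_reference_lufs - rg_reference_lufs)
    q78 = round(adjusted_db * 256)
    # Clamp to the range of a signed 16-bit integer.
    q78 = max(-32768, min(32767, q78))
    return str(q78)


# A ReplayGain track gain of -3.00 dB becomes -8 dB relative to -23 LUFS.
r128_value = replaygain_to_r128(-3.0)
```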

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain legacy metadata formats#ID3v2 RGAD|RGAD]] and [[ReplayGain legacy metadata formats#ID3v2 RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID       $54 58 58 58 ("TXXX")
 Size           $xx xx xx xx (size of frame excluding this header)
 Flags          $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>
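
The header and data layout above can be assembled in a few lines of Python. This is a sketch under stated assumptions, not a complete tag writer: the function name is hypothetical, the size field is written as a plain 32-bit big-endian integer as in ID3v2.3 (ID3v2.4 uses a synchsafe integer instead), and the surrounding ID3v2 tag header is not produced.

```python
import struct


def build_txxx_frame(key, value):
    """Build one TXXX frame carrying a key/value pair.

    Assumes ID3v2.3 framing: 4-byte frame ID, 4-byte big-endian size
    (excluding the 10-byte header) and 2 flag bytes.
    """
    body = (
        b"\x00"                        # text encoding: ISO-8859-1
        + key.encode("latin-1")        # description (the key string)
        + b"\x00"                      # terminator after the description
        + value.encode("latin-1")      # value string
    )
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.54 dB")
```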

The four frames associated with ReplayGain metadata use the following key/value pairs

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) is adequate to accurately represent peak values for 16-bit audio data.
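
A scanner might produce these value strings along the lines of the following Python sketch. The function names are hypothetical; the rounding step that avoids writing "-0.00 dB" is a design choice of this example, not a requirement stated above.

```python
def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB' (two decimals, space, 'dB')."""
    rounded = round(gain_db, 2) + 0.0  # adding 0.0 turns -0.0 into 0.0
    return f"{rounded:.2f} dB"


def format_peak(peak):
    """Format a peak level as 'c.dddddd' (six decimals, no suffix)."""
    return f"{peak:.6f}"


tags = {
    "REPLAYGAIN_TRACK_GAIN": format_gain(-6.5),
    "REPLAYGAIN_TRACK_PEAK": format_peak(0.987654321),
}
```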

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
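
One way a player could tolerate the variations listed above is with a permissive regular expression, as in this Python sketch. The function names and the exact patterns are assumptions of this example; returning <code>None</code> stands for "ignore the data as if the frame did not exist".

```python
import re

# Optional sign, digits with optional fraction, optional 'dB' in any case,
# optional whitespace anywhere around the number and the suffix.
_GAIN_RE = re.compile(r"^\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))\s*(?:db)?\s*$", re.IGNORECASE)
_PEAK_RE = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB as a float, or None if the value is unusable."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak level as a float, or None if the value is unusable."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None


def find_tag(tags, key):
    """Look up a key while ignoring its capitalization."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None
```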

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equals character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and the same value formatting as specified for ID3v2.

===De-Facto extensions===
MusicBrainz Picard and [http://github.com/Moonbase59/loudgain#tags-written-andor-deleted LoudGain] also support the following additional tags, named using the same conventions:

{| class="wikitable"
|+Extension metadata keys
|-
!Metadata
!Key
!Value format
! Purpose
|-
|Track range
|REPLAYGAIN_TRACK_RANGE
|[-]a.bb dB
|rowspan=2 | Indicates dynamics (R-128 Loudness Range / LRA), may guide pre-amplification
|-
|Album range
|REPLAYGAIN_ALBUM_RANGE
|[-]a.bb dB
|-
|Reference loudness
|REPLAYGAIN_REFERENCE_LOUDNESS
|[-]a.bb LUFS
| Use alternative reference levels; change ref levels without re-scanning the file
|}

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
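
The selection and scaling logic described in this section can be sketched in Python as follows. The function names are hypothetical, the gains are assumed to have been parsed to floating-point decibel values already, and unity gain is used as the default, which is only one of the choices mentioned above.

```python
def select_gain_db(gains, mode="album", default_db=0.0):
    """Pick the gain for the requested mode, falling back to the other
    gain if only one is present, and to `default_db` if neither is."""
    if mode == "album":
        order = ("REPLAYGAIN_ALBUM_GAIN", "REPLAYGAIN_TRACK_GAIN")
    else:
        order = ("REPLAYGAIN_TRACK_GAIN", "REPLAYGAIN_ALBUM_GAIN")
    for key in order:
        if key in gains:
            return gains[key]
    return default_db


def gain_to_scale(gain_db):
    """Convert a gain in dB to the constant each sample is multiplied by."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Scale floating-point samples by the linear factor for `gain_db`."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]


# Only album gain is stored, but the user asked for track mode.
gain = select_gain_db({"REPLAYGAIN_ALBUM_GAIN": -6.0}, mode="track")
scaled = apply_gain([1.0, -0.5], gain)
```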

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[ReplayGain 2.0 specification#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^\frac{RG + G_{\text{pre-amp}}}{20}, \frac{1}{\text{peak amplitude}} \right)</math>
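
A direct Python transcription of this scale factor is shown below as a sketch. The function name is hypothetical; the default peak of 1.0 follows the rule above for missing peak metadata, and the guard against a zero peak (digital silence) is an addition of this example.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=1.0):
    """Return the linear scale factor for playback with clipping prevention
    through gain reduction.

    `peak` is the stored peak amplitude (1.0 = digital full scale); use 1.0
    when no peak metadata is available.
    """
    wanted = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if peak <= 0.0:
        # Silent file: nothing can clip, so no limit applies.
        return wanted
    return min(wanted, 1.0 / peak)


# A quiet, dynamic track: +6 dB would push a 0.9 peak over full scale,
# so the scale is limited to 1/0.9 instead of about 2.
limited = playback_scale(6.0, preamp_db=0.0, peak=0.9)
# A loud track: attenuation never needs limiting.
unlimited = playback_scale(-6.0, preamp_db=0.0, peak=1.0)
```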


===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal to noise ratio is maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Talk:ReplayGain 1.0 specification

2026-01-22T19:29:14Z

Skamp: Skamp moved page Talk:ReplayGain 1.0 specification to Talk:Original ReplayGain specification: Confusion about having two distinct, numbered versions of ReplayGain

#REDIRECT [[Talk:Original ReplayGain specification]]

Talk:Original ReplayGain specification

2026-01-22T19:29:14Z

Skamp: Skamp moved page Talk:ReplayGain 1.0 specification to Talk:Original ReplayGain specification: Confusion about having two distinct, numbered versions of ReplayGain

== Musepack ==

Hi, there is an inaccuracy about Musepack files. Although they use APEv2 for metadata, replaygain is stored in the file header by specification, see [http://trac.musepack.net/trac/wiki/SV8Specification here]. Actually, this is the first format introducing APEv2 tags and native replaygain support.
So, every musepack compliant player must read the RG data from the header rather then APEv2.
[[User:Antonski|Antonski]] 14:19, 4 May 2011 (UTC)

==Development discussion threads==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=1709 Flaw in ReplayGain spec]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=62374 Replay Gain Site, Why does it look like a museum?]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=83397 Does Replay gain work differtly in Media monkey, Foobar and Media Monkey given 2 differnt Results]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=85536 Replay Gain specification, update in progress]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=69568 ReplayGain equal loudness filter]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=85834 Replay Gain tagging, ID3, LAME, Others?]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=86745 ReplayGain player recommendations]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=87442 ReplayGain specification complete, official launch 25 March proposed]

ReplayGain 1.0 specification

2026-01-22T19:29:14Z

Skamp: Skamp moved page ReplayGain 1.0 specification to Original ReplayGain specification: Confusion about having two distinct, numbered versions of ReplayGain

#REDIRECT [[Original ReplayGain specification]]

Original ReplayGain specification

2026-01-22T19:29:14Z

Skamp: Skamp moved page ReplayGain 1.0 specification to Original ReplayGain specification: Confusion about having two distinct, numbered versions of ReplayGain

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (Fs=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (Fs=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be passed through both of these filters in cascade (the output of one feeding the input of the other) before being analysed further.
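
The cascade can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The function names are hypothetical, and it assumes the tabulated ''a(k)'' values are added in the feedback path, which is the only reading under which the Butterworth coefficients of Table 1b give a stable filter.

```python
def iir_filter(samples, b, a):
    """Apply y[n] = sum(b[k] * x[n-k]) + sum(a[k] * y[n-k]).

    b holds b(0)..b(N); a holds a(1)..a(N) exactly as tabulated.
    Assumption: the feedback terms are added, not subtracted.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * samples[n - k]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * out[n - k]
        out.append(acc)
    return out


# Table 1b: Butterworth coefficients for Fs = 44.1 kHz
BUTTER_B = [0.98500175787242, -1.97000351574484, 0.98500175787242]
BUTTER_A = [1.96977855582618, -0.97022847566350]


def equal_loudness_filter(samples, yule_b, yule_a,
                          butter_b=BUTTER_B, butter_a=BUTTER_A):
    """Run the samples through the Yulewalk filter, then the Butterworth."""
    return iir_filter(iir_filter(samples, yule_b, yule_a), butter_b, butter_a)
```

With these coefficients the Butterworth stage removes a constant (DC) input and passes the highest representable frequency at unity gain, as expected of a high-pass filter. The Yulewalk coefficients from Table 2a or 2b would be passed in as <tt>yule_b</tt> and <tt>yule_a</tt>.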
 

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, in step 2, the mean average of all squared samples from both channels over the 50ms measurement interval is taken.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>
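
The three steps, including the stereo case, might look like this in Python. It is a sketch under stated assumptions: the function name is hypothetical, samples are taken to be already filtered, and a trailing partial block is simply dropped (the text does not say how to treat it).

```python
import math


def block_rms(channels, sample_rate, block_seconds=0.05):
    """Return one RMS value per full 50 ms block.

    channels is a list of equal-length sample lists (one per channel).
    Assumption: a trailing partial block is discarded.
    """
    block_len = int(round(sample_rate * block_seconds))
    total_len = len(channels[0])
    values = []
    for start in range(0, total_len - block_len + 1, block_len):
        sum_squares = 0.0
        for channel in channels:
            for x in channel[start:start + block_len]:
                sum_squares += x * x                              # step 1: square
        mean_square = sum_squares / (block_len * len(channels))  # step 2: mean
        values.append(math.sqrt(mean_square))                     # step 3: root
    return values
```

Because the squares from both channels are pooled before averaging, an out-of-phase stereo signal yields the same RMS level as an in-phase one, rather than cancelling to silence.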

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
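
As an illustration, the conversion can be written as a small Python function. The name is hypothetical, and returning negative infinity for a silent block is an assumption, since the formula is undefined when the RMS value is zero.

```python
import math


def rms_to_db(rms, peak_to_peak):
    """Convert an RMS value to decibels using L = 20 * log10(2 * rms / p-p)."""
    if rms <= 0.0:
        return float("-inf")  # assumption: digital silence maps to -infinity
    return 20.0 * math.log10(2.0 * rms / peak_to_peak)
```

For samples scaled to the range -1.0 to 1.0 the peak-to-peak range is 2.0, so a full-scale square wave (RMS of 1.0) measures 0 dB and a full-scale sine wave measures about -3 dB.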

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order, and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), the choice makes a huge difference. The value which most accurately matches human perception of loudness is the one 95% of the way up the sorted list,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
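
The sort-and-pick step can be sketched as below. The function name and the exact index rule (integer arithmetic, clamped to the last element) are assumptions. The text only says that the value 95% of the way up the sorted list is used, and real scanners may use a histogram instead of a full sort.

```python
def perceived_loudness(block_levels, percent=95):
    """Pick the block level that sits `percent` of the way up the sorted list."""
    if not block_levels:
        raise ValueError("no blocks to analyse")
    ordered = sorted(block_levels)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) - 1, (len(ordered) * percent) // 100)
    return ordered[index]
```

For a speech-like signal in which 60 of 100 blocks are near silence (say -60 dB) and 40 are at -20 dB, this returns -20 dB, so the result reflects the level between the silences rather than the long-term average.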

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to a signal level of -20 dB relative to full scale, i.e. 20 dB of system headroom. The -14 dB level used by ReplayGain is 6 dB higher and therefore corresponds to an 89 dB SPL playback level on a SMPTE calibrated system, so ReplayGain is said to operate with an 89 dB reference level.
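
The arithmetic behind the 89 dB figure, using only the levels stated above, is:

:<math>83\ \mathrm{dB\ SPL} + \left( -14 - (-20) \right)\ \mathrm{dB} = 89\ \mathrm{dB\ SPL}</math>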

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
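
In code the calculation is a single subtraction. The sketch below uses a hypothetical function name, and any loudness values passed to it would come from the measurement procedure described earlier.

```python
def replay_gain_db(reference_loudness_db, measured_loudness_db):
    """RG = L_n14 - L, with every quantity expressed in decibels."""
    return reference_loudness_db - measured_loudness_db
```

For example, if the pink noise reference measured -23.0 dB and a track measured -15.0 dB (illustrative numbers only), the track is 8 dB louder than the reference and the function returns -8.0, an attenuation.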

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
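
A peak scan for integer PCM might be sketched as follows. The function name is hypothetical, and the sketch assumes signed integer samples whose full scale is 2 to the power of (bits - 1), e.g. 32768 for 16-bit audio.

```python
def peak_amplitude(channels, bits_per_sample=16):
    """Return the largest absolute sample on any channel, scaled so 1.0 is full scale.

    Assumption: samples are signed integers and full scale is 2 ** (bits - 1).
    """
    full_scale = float(1 << (bits_per_sample - 1))
    peak = 0
    for channel in channels:
        for sample in channel:
            peak = max(peak, abs(sample))
    return peak / full_scale
```

The album peak is then simply the largest of the track peaks.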

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media audio) – '''Vorbis comments''' in Extended Content Description Object
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>
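
To make the byte layout concrete, the following Python sketch assembles one such frame. The function name is hypothetical, and the size field is written as a plain 32-bit big-endian integer, which is an assumption matching ID3v2.3 (ID3v2.4 encodes frame sizes differently).

```python
import struct


def build_txxx_frame(key, value):
    """Assemble a TXXX frame: 10-byte header followed by the frame data."""
    body = (
        b"\x00"                      # text encoding: ISO-8859-1
        + key.encode("latin-1")      # description (the key string)
        + b"\x00"                    # terminator after the description
        + value.encode("latin-1")    # value string, not terminated
    )
    header = (
        b"TXXX"                          # frame ID $54 58 58 58
        + struct.pack(">I", len(body))   # size, excluding this header (v2.3 style)
        + b"\x40\x00"                    # flags: discard if audio data is altered
    )
    return header + body
```

For instance, <tt>build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.20 dB")</tt> produces a 41-byte frame: 10 header bytes plus 31 bytes of frame data.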

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
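
A scanner could produce these strings with ordinary fixed-point formatting. The helper names below are hypothetical, and rewriting a rounded "-0.00" as "0.00" is an assumption that the text above does not address.

```python
def format_gain(gain_db):
    """Format a gain as '[-]a.bb dB' with two decimal digits."""
    text = "%.2f" % gain_db
    if text == "-0.00":
        text = "0.00"  # assumption: avoid writing a negative zero
    return text + " dB"


def format_peak(peak):
    """Format a peak amplitude as 'c.dddddd' with six decimal digits."""
    return "%.6f" % peak
```

With these helpers a gain of -6.2 dB is written as <tt>-6.20 dB</tt> and a full-scale peak as <tt>1.000000</tt>.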

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
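
One possible way to implement such tolerant parsing is with regular expressions, as sketched below. The function names and the exact patterns are assumptions. They accept the variations listed above and return <tt>None</tt> for anything else, so the caller can treat the value as missing.

```python
import re

# Optional sign, digits with optional fraction, optional "dB" in any case.
_GAIN_RE = re.compile(
    r"^\s*([+-]?(?:\d+(?:\.\d*)?|\.\d+))\s*(?:db)?\s*$", re.IGNORECASE
)
# Unsigned decimal number with no suffix.
_PEAK_RE = re.compile(r"^\s*((?:\d+(?:\.\d*)?|\.\d+))\s*$")


def parse_gain(text):
    """Return the gain in dB as a float, or None if the text is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak amplitude as a float, or None if the text is malformed."""
    match = _PEAK_RE.match(text)
    return float(match.group(1)) if match else None
```

Keys would be compared case-insensitively in the same spirit, for example by upper-casing them before lookup.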

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' is the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equal character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored as ASCII values, using the same keys and the same value formatting as specified for ID3v2.

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of one-twentieth of replay gain.<ref> After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard—not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
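
The selection and scaling logic described in this section could be sketched as follows. The function names and the <tt>"track"</tt>/<tt>"album"</tt> mode strings are hypothetical, and unity gain (0 dB) is used as the default, which is only one of the choices mentioned above.

```python
def select_gain_db(mode, track_gain_db, album_gain_db, default_db=0.0):
    """Pick the requested gain, fall back to the other one, then to a default.

    A gain of None means the corresponding metadata is missing.
    """
    if mode == "album":
        candidates = (album_gain_db, track_gain_db)
    else:
        candidates = (track_gain_db, album_gain_db)
    for gain in candidates:
        if gain is not None:
            return gain
    return default_db


def scale_factor(gain_db):
    """Convert a gain in dB to the constant each sample is multiplied by."""
    return 10.0 ** (gain_db / 20.0)
```

A gain of -6 dB, for instance, gives a scale factor of roughly 0.5, so every sample is approximately halved.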

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^\frac{6}{20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in Decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>min( 10^\frac{RG + G_{pre-amp}}{20}, \frac{1}{peak amplitude} )</math>
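
As a sketch, the formula translates directly into Python. The function name and the <tt>prevent_clipping</tt> switch are hypothetical, and the peak defaults to 1.0 as recommended above for files without peak metadata.

```python
def playback_scale(replay_gain_db, preamp_db=0.0, peak=1.0, prevent_clipping=True):
    """Overall linear scale factor: min(10^((RG + pre-amp) / 20), 1 / peak)."""
    scale = 10.0 ** ((replay_gain_db + preamp_db) / 20.0)
    if prevent_clipping and peak > 0.0:
        # Never scale so far that the stored peak would exceed full scale.
        scale = min(scale, 1.0 / peak)
    return scale
```

With a replay gain of +6 dB and a stored peak of 0.9, the unrestricted factor would be about 2.0, but the result is limited to 1/0.9 (about 1.11) so that the loudest sample just reaches full scale.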

===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal to noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

== See also ==
: ''This is not a normative part of the specification.''
* [[ReplayGain 2.0 specification]] (draft)

Talk:ReplayGain 2.0 specification

2026-01-22T19:28:27Z

Skamp: Skamp moved page Talk:ReplayGain 2.0 specification to Talk:Revised ReplayGain specification: There is confusion about having two distinct, numbered specifications for ReplayGain

#REDIRECT [[Talk:Revised ReplayGain specification]]

Talk:Revised ReplayGain specification

2026-01-22T19:28:27Z

Skamp: Skamp moved page Talk:ReplayGain 2.0 specification to Talk:Revised ReplayGain specification: There is confusion about having two distinct, numbered specifications for ReplayGain

==Improvement discussion threads==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=15445 Improving ReplayGain, some ideas for Devs etc]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=89841 ReplayGain2, ReplayGain2 proposal]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=85614 ReplayGain album gain problem]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=84769 ReplayGain when converting 5.1 to 2]

===R128===
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=85978 R128GAIN: An EBU R128 compliant loudness scanner]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=86116 libebur128 - (yet another) EBU R 128 implementation]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=86424 R128 versus ReplayGain, The cage match begins here.]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=88498 ReplayGain: Foobar2000 results differ from MP3Gain and MetaFLAC ones]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=88778 replaygain and R 128]

==External resources==
*[http://www.dolby.com/uploadedFiles/Assets/US/Doc/Professional/AES128-Loudness-Normalization-Portable-Media-Players.pdf Loudness Normalization in the Age of Portable Media Players]
*[http://music-loudness.com/PDFs/Loudness_Alliance_White_Paper_final_v1.pdf Loudness Normalization: The Future of File-Based Playback]

ReplayGain 2.0 specification

2026-01-22T19:28:27Z

Skamp: Skamp moved page ReplayGain 2.0 specification to Revised ReplayGain specification: There is confusion about having two distinct, numbered specifications for ReplayGain

#REDIRECT [[Revised ReplayGain specification]]

Revised ReplayGain specification

2026-01-22T19:28:27Z

Skamp: Skamp moved page ReplayGain 2.0 specification to Revised ReplayGain specification: There is confusion about having two distinct, numbered specifications for ReplayGain


''This is a proposed update to the [[ReplayGain 1.0 specification|original ReplayGain specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio forum].'' --[[User:Notat|Notat]] 23:42, 8 October 2012 (CEST)

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system.

The original [[ReplayGain 1.0 specification|ReplayGain specification]] described a loudness measurement system which included a weighting filter, root mean square (RMS) measurement and statistical processing that model human perception of loudness in both the frequency and time domains.

Since the original ReplayGain proposal in 2001, the science, practice and standards for loudness normalization have advanced significantly. The current industry standard approach to loudness measurement is described by the International Telecommunication Union<ref>http://www.itu.int/en/Pages/default.aspx</ref> (ITU) as BS.1770. The most recent version of this standard is known as ITU BS.1770-5<ref>https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en</ref> and was published in December 2023. The ITU work is freely available and is not believed to be encumbered by any patent issues. The ITU BS.1770-2 standard has been adopted in the United States by the [http://www.atsc.org ATSC] as [http://www.atsc.org/cms/standards/a_85-2011a.pdf A/85] and in Europe by the [http://www.ebu.ch European Broadcasting Union] as [http://tech.ebu.ch/docs/tech/tech3343.pdf EBU R-128] for broadcast audio.

BS.1770 uses a "K-weighted" RMS measurement. This weighting function is significantly less complex than the inverted Fletcher-Munson weighting used by the original ReplayGain algorithm. A gating function is designed to measure the loudness of foreground components in the audio program. The gate in BS.1770 performs a function similar to the statistical processing in the original RG specification.

The computation required for BS.1770 loudness measurement is reduced compared to the original RG technique. Nevertheless, BS.1770 has been shown in several academic studies to be equally or more effective than the RG algorithm in modeling human loudness perception on music program as well as other material such as podcasts, television programs and movies.<ref>Paul Nygren. [http://www.speech.kth.se/prod/publications/files/3319.pdf Achieving equal loudness between audio files]. KTH Royal Institute of Technology</ref><ref>Martin Wolters; Harald Mundt; Jeffrey Riedmiller (May 2010). [http://www.aes.org/e-lib/browse.cfm?elib=15341 Loudness Normalization In The Age Of Portable Media Players]. Audio Engineering Society.</ref><ref>Esben Skovenborg; Søren H. Nielsen (October 2004). [http://web.archive.org/web/20120208024743/http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf Evaluation of Different Loudness Models with Music and Speech Material]. Audio Engineering Society. Archived from [http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf the original] on 2012-02-08.</ref>

Recent RG implementations use BS.1770 for loudness measurement. It is expected the ITU standard will evolve over time to meet the needs of broadcasters and governments. It is the intent of the ReplayGain community that RG follow any future backwards-compatible improvements to loudness measurement using the BS.1770 standard.

==Reference level==

Classic ReplayGain is calibrated to a pink noise reference signal with an RMS level 14 dB below a full-scale sinusoid. This reference signal is used to establish a reference level. ReplayGain will apply no gain or attenuation to the reference signal or any program material which has the same loudness measurements as the reference signal.

BS.1770 defines a loudness scale for program material. The units of BS.1770 loudness measurements are Loudness Units [relative to] Full Scale (LUFS). LUFS can be treated like decibels.

In order to maintain backwards compatibility with classic RG, newer RG uses a -18 LUFS reference, which, across a large body of music, yields loudness similar to that of classic RG.


==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{r}-L</math>
Where:
:<math>RG</math> is the replay gain adjustment in decibels,
:<math>L_{r}</math> is the -18 LUFS reference level
:<math>L</math> is the measured loudness of the audio file in LUFS.

Gain is positive if the loudness of the audio file is lower than the reference level. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than the reference level. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[ReplayGain 2.0 specification#Player requirements|Player requirements]] below.
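
The formula above can be illustrated with a minimal Python sketch. The function name <code>replay_gain_db</code> and the example loudness values are illustrative assumptions, not part of the specification; measuring the loudness itself (BS.1770) is out of scope here.

```python
REFERENCE_LUFS = -18.0  # reference level L_r used by newer ReplayGain


def replay_gain_db(measured_lufs: float, reference_lufs: float = REFERENCE_LUFS) -> float:
    """Return the replay gain adjustment in dB: RG = L_r - L."""
    return reference_lufs - measured_lufs


# A loud track measured at -9.5 LUFS is attenuated (negative gain).
print(replay_gain_db(-9.5))   # -8.5
# A quiet track measured at -23 LUFS is amplified (positive gain).
print(replay_gain_db(-23.0))  # 5.0
```

A file whose measured loudness equals the reference level receives a gain of 0 dB, matching the behaviour described under Reference level.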

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Audio Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments''', same for other Ogg codecs
* .opus (Ogg Opus) – '''Vorbis comments''' available
** standard {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} (MUST adjust for -23 LUFS) comment keys may be preferable (used by {{code|loudgain}})
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .asf, .wma (Windows Media audio) – '''Vorbis comments''' in Extended Content Description Object
** {{code|loudgain}} instead uses native ASF/WMA attributes (itself a key-value storage) via TagLib, which is more sensible
* .wav (Windows PCM) – No metadata support (use .bwf instead)
** ID3 RIFF chunk possible (used by {{code|loudgain}})
* .wv (WavPack) – '''APEv2'''
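
For the Opus case in the list above, the adjustment for -23 LUFS can be sketched as follows. This assumes the tag value is the Q7.8 fixed-point integer (gain in dB multiplied by 256) described for <code>R128_TRACK_GAIN</code> in RFC 7845; the function name is hypothetical, and the interaction with the output gain field of the Opus header is deliberately ignored.

```python
def replaygain_to_r128(gain_db: float) -> int:
    """Convert a gain relative to -18 LUFS into a Q7.8 value relative to -23 LUFS."""
    # The -23 LUFS target is 5 LU quieter than the -18 LUFS reference.
    r128_db = gain_db - 5.0
    # Q7.8 fixed point: dB value scaled by 256 and rounded to an integer.
    q78 = round(r128_db * 256)
    # Clamp to the signed 16-bit range (assumed limit of the tag value).
    return max(-32768, min(32767, q78))


# A track needing -3.5 dB against -18 LUFS needs -8.5 dB against -23 LUFS.
print(f"R128_TRACK_GAIN={replaygain_to_r128(-3.5)}")  # R128_TRACK_GAIN=-2176
```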

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain legacy metadata formats#ID3v2 RGAD|RGAD]] and [[ReplayGain legacy metadata formats#ID3v2 RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>
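
The header and data layout above can be assembled as in the following Python sketch. It assumes the ID3v2.3 convention of a plain 32-bit big-endian frame size (ID3v2.4 uses a synchsafe size instead), and the helper name <code>txxx_frame</code> is hypothetical. It builds a single frame only, not a complete ID3v2 tag.

```python
import struct


def txxx_frame(key: str, value: str) -> bytes:
    """Build one ID3v2.3-style TXXX frame holding a key/value pair."""
    # Frame data: text encoding byte, key, $00 terminator, value.
    body = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the data (header excluded), flags $40 $00.
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body


frame = txxx_frame("REPLAYGAIN_TRACK_GAIN", "-8.50 dB")
print(frame[:4], len(frame))  # b'TXXX' 41
```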

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
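
A tolerant parser along these lines might look like the Python sketch below. The function names and the exact regular expression are assumptions chosen for illustration; a real player may accept more or fewer variations. Values that do not match are reported as <code>None</code>, so the caller can treat the frame as absent.

```python
import re

# Optional sign, then digits with an optional decimal part (or a bare ".5").
_NUMBER = r"[+-]?(?:\d+\.?\d*|\.\d+)"
# Gain: number, optional whitespace, optional 'dB' suffix in any capitalization.
_GAIN_RE = re.compile(rf"^\s*({_NUMBER})\s*(?:dB)?\s*$", re.IGNORECASE)
# Peak: a bare number with no suffix.
_PEAK_RE = re.compile(rf"^\s*({_NUMBER})\s*$")


def parse_gain(text):
    """Return the gain in dB, or None if the value is malformed."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text):
    """Return the peak amplitude, or None if malformed or negative."""
    match = _PEAK_RE.match(text)
    if not match:
        return None
    peak = float(match.group(1))
    return peak if peak >= 0.0 else None


def find_tag(tags, key):
    """Look up a key in a dict of tags, ignoring capitalization of keys."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None


tags = {"replaygain_track_gain": "+3.2dB", "REPLAYGAIN_TRACK_PEAK": "0.988831"}
print(parse_gain(find_tag(tags, "REPLAYGAIN_TRACK_GAIN")))  # 3.2
print(parse_peak(find_tag(tags, "REPLAYGAIN_TRACK_PEAK")))  # 0.988831
```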

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equal character).
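
As a rough illustration, the four comments could be produced as in the Python sketch below, using the value formats given for ID3v2 (<code>[-]a.bb dB</code> and <code>c.dddddd</code>). The helper names are hypothetical and no tagging library is involved; the sketch only builds the <tt>key=value</tt> strings.

```python
def format_gain(gain_db: float) -> str:
    """Format a gain as '[-]a.bb dB' (two decimals, space, 'dB')."""
    # Adding 0.0 turns a rounded negative zero into plain zero ("0.00", not "-0.00").
    value = round(gain_db, 2) + 0.0
    return f"{value:.2f} dB"


def format_peak(peak: float) -> str:
    """Format a peak amplitude as 'c.dddddd' (six decimals, no suffix)."""
    return f"{peak:.6f}"


def replaygain_comments(track_gain, track_peak, album_gain, album_peak):
    """Return the four ReplayGain items as Vorbis-comment style strings."""
    return [
        f"REPLAYGAIN_TRACK_GAIN={format_gain(track_gain)}",
        f"REPLAYGAIN_TRACK_PEAK={format_peak(track_peak)}",
        f"REPLAYGAIN_ALBUM_GAIN={format_gain(album_gain)}",
        f"REPLAYGAIN_ALBUM_PEAK={format_peak(album_peak)}",
    ]


for comment in replaygain_comments(-8.5, 0.988831, -7.25, 1.0):
    print(comment)
```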

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys as for ID3v2, with values stored as ASCII text in the same format as specified for ID3v2.

===De-Facto extensions===
MusicBrainz Picard and [http://github.com/Moonbase59/loudgain#tags-written-andor-deleted LoudGain] also support the following additional tags, named using the same conventions:

{| class="wikitable"
|+Extension metadata keys
|-
!Metadata
!Key
!Value format
! Purpose
|-
|Track range
|REPLAYGAIN_TRACK_RANGE
|[-]a.bb dB
|rowspan=2 | Indicates dynamics (R-128 Loudness Range / LRA), may guide pre-amplification
|-
|Album range
|REPLAYGAIN_ALBUM_RANGE
|[-]a.bb dB
|-
|Reference loudness
|REPLAYGAIN_REFERENCE_LOUDNESS
|[-]a.bb LUFS
| Use alternative reference levels; change ref levels without re-scanning the file
|}

==Player requirements==
[[File:RG_Player_control.gif|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>
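
A minimal Python sketch of this scaling step follows. It assumes samples are already available as floating-point values with 1.0 as digital full scale; the function names are illustrative, and dithering and clipping prevention are left out.

```python
def gain_to_scale(gain_db: float) -> float:
    """Convert a gain in dB to a linear multiplier: 10 ** (gain / 20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db: float):
    """Multiply every sample by the constant derived from the gain."""
    scale = gain_to_scale(gain_db)
    return [sample * scale for sample in samples]


print(round(gain_to_scale(-6.0), 4))    # 0.5012
print(apply_gain([0.5, -0.25], 20.0))   # [5.0, -2.5]
```

The second call shows why clipping prevention is needed: a large positive gain can push samples well beyond full scale.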

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
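
The fallback rule can be sketched in Python as follows. The function name, the <code>mode</code> strings and the use of <code>None</code> for missing metadata are assumptions for illustration; the default shown is unity gain, one of the potential choices mentioned above.

```python
def select_gain(track_gain, album_gain, mode="track", default_db=0.0):
    """Pick the gain to apply, given values that may be None when missing."""
    if mode == "track":
        preferred, other = track_gain, album_gain
    else:
        preferred, other = album_gain, track_gain
    if preferred is not None:
        return preferred
    if other is not None:
        # Requested type is missing: use the one that is available.
        return other
    # Neither gain is stored: fall back to the chosen default.
    return default_db


print(select_gain(-8.5, -7.0, mode="album"))   # -7.0
print(select_gain(None, -7.0, mode="track"))   # -7.0
print(select_gain(None, None, mode="track"))   # 0.0
```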

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[ReplayGain 2.0 specification#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^\frac{RG + G_{\text{pre-amp}}}{20}, \frac{1}{\text{peak amplitude}} \right)</math>
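
The expression above translates directly into the following Python sketch. The function and parameter names are assumptions; <code>peak</code> defaults to 1.0, as suggested for files without peak metadata, and a zero peak (digital silence) is treated as needing no reduction.

```python
def playback_scale(gain_db, preamp_db=0.0, peak=1.0, prevent_clipping=True):
    """Overall linear scale factor from replay gain, pre-amp and peak amplitude."""
    # Gains in dB add; the sum is converted to a linear multiplier.
    scale = 10.0 ** ((gain_db + preamp_db) / 20.0)
    if prevent_clipping and peak > 0.0:
        # Never scale the stored peak above digital full scale (1.0).
        scale = min(scale, 1.0 / peak)
    return scale


# +6 dB would roughly double the signal, but a peak of 0.9 limits it to 1/0.9.
print(round(playback_scale(6.0, peak=0.9), 4))    # 1.1111
# An attenuated track is unaffected by the peak check.
print(round(playback_scale(-8.5, peak=1.0), 4))   # 0.3758
```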


===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal to noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Revised ReplayGain specification

2026-01-22T19:26:43Z

Skamp: Changed "ReplayGain 1.0 specification" to "the original ReplayGain specification"

DISPLAYTITLE

''This is a proposed update to the [[ReplayGain 1.0 specification|original ReplayGain specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio forum].'' --[[User:Notat|Notat]] 23:42, 8 October 2012 (CEST)

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system.

The original [[ReplayGain 1.0 specification|ReplayGain specification]] described a loudness measurement system which included a weighting filter, root mean square (RMS) measurement and statistical processing that model human perception of loudness in both the frequency and time domains.

Since the original ReplayGain proposal in 2001, the science, practice and standards for loudness normalization have advanced significantly. The current industry standard approach to loudness measurement is described by the International Telecommunication Union<ref>http://www.itu.int/en/Pages/default.aspx</ref> (ITU) as BS.1770. The most recent version of this standard is known as ITU BS.1770-5<ref>https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en</ref> and was published in December 2023. The ITU work is freely available and is not believed to be encumbered by any patent issues. The ITU BS.1770-2 standard has been adopted in the United States by the [http://www.atsc.org ATSC] as [http://www.atsc.org/cms/standards/a_85-2011a.pdf A/85] and in Europe by the [http://www.ebu.ch European Broadcasting Union] as [http://tech.ebu.ch/docs/tech/tech3343.pdf EBU R-128] for broadcast audio.

BS.1770 uses a "K-weighted" RMS measurement. This weighting function is significantly less complex than the inverted Fletcher-Munson weighting used by the original ReplayGain algorithm. A gating function is designed to measure the loudness of foreground components in the audio program. The gate in BS.1770 performs a similar function to the statistical processing in the original RG specification.

The computation required for BS.1770 loudness measurement is reduced compared to the original RG technique. Nevertheless, BS.1770 has been shown in several academic studies to be equally or more effective than the RG algorithm in modeling human loudness perception on music program as well as other material such as podcasts, television programs and movies.<ref>Paul Nygren. [http://www.speech.kth.se/prod/publications/files/3319.pdf Achieving equal loudness between audio files]. KTH Royal Institute of Technology</ref><ref>Martin Wolters; Harald Mundt; Jeffrey Riedmiller (May 2010). [http://www.aes.org/e-lib/browse.cfm?elib=15341 Loudness Normalization In The Age Of Portable Media Players]. Audio Engineering Society.</ref><ref>Esben Skovenborg; Søren H. Nielsen (October 2004). [http://web.archive.org/web/20120208024743/http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf Evaluation of Different Loudness Models with Music and Speech Material]. Audio Engineering Society. Archived from [http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf the original] on 2012-02-08.</ref>

Recent RG implementations use BS.1770 for loudness measurement. It is expected the ITU standard will evolve over time to meet the needs of broadcasters and governments. It is the intent of the ReplayGain community that RG follow any future backwards-compatible improvements to loudness measurement using the BS.1770 standard.

==Reference level==

Classic ReplayGain is calibrated to a pink noise reference signal with an RMS level 14 dB below a full-scale sinusoid. This reference signal is used to establish a reference level. ReplayGain will apply no gain or attenuation to the reference signal or any program material which has the same loudness measurements as the reference signal.

BS.1770 defines a loudness scale for program material. The units of BS.1770 loudness measurements are Loudness Units [relative to] Full Scale (LUFS). LUFS can be treated like decibels.

In order to maintain backwards compatibility with classic RG, newer RG uses a -18 LUFS reference, which, across a large body of music, yields loudness similar to that of classic RG.


==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{r}-L</math>
Where:
:<math>RG</math> is the replay gain adjustment in decibels,
:<math>L_{r}</math> is the -18 LUFS reference level
:<math>L</math> is the measured loudness of the audio file in LUFS.

Gain is positive if the loudness of the audio file is lower than the reference level. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than the reference level. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[ReplayGain 2.0 specification#Player requirements|Player requirements]] below.

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
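
For integer PCM, the peak scan might be sketched in Python as below. The sketch assumes signed samples from all channels gathered in one sequence, and it assumes that full scale corresponds to 2<sup>bits-1</sup> (32768 for 16-bit audio); other scanners may normalize slightly differently. The function names are hypothetical.

```python
def peak_amplitude(samples, bit_depth=16):
    """Return the largest absolute sample value, scaled so full scale is 1.0."""
    full_scale = float(2 ** (bit_depth - 1))  # assumed: 32768 for 16-bit PCM
    largest = max((abs(sample) for sample in samples), default=0)
    return largest / full_scale


def album_peak(track_peaks):
    """The album peak is the largest of the per-track peaks."""
    return max(track_peaks, default=0.0)


track_a = [0, 8192, -4096]
track_b = [100, -16384, 12000]
peaks = [peak_amplitude(track_a), peak_amplitude(track_b)]
print(peaks)              # [0.25, 0.5]
print(album_peak(peaks))  # 0.5
```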

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Audio Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments''', same for other Ogg codecs
* .opus (Ogg Opus) – '''Vorbis comments''' available
** standard {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} (MUST adjust for -23 LUFS) comment keys may be preferable (used by {{code|loudgain}})
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .asf, .wma (Windows Media audio) – '''Vorbis comments''' in Extended Content Description Object
** {{code|loudgain}} instead uses native ASF/WMA attributes (itself a key-value storage) via TagLib, which is more sensible
* .wav (Windows PCM) – No metadata support (use .bwf instead)
** ID3 RIFF chunk possible (used by {{code|loudgain}})
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain legacy metadata formats#ID3v2 RGAD|RGAD]] and [[ReplayGain legacy metadata formats#ID3v2 RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>
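The header and data layout above can be sketched in a few lines of Python. This is a minimal illustration, not a complete tagger: the function name <tt>build_txxx_frame</tt> is hypothetical, and the size field is written as a plain 32-bit big-endian integer, which is an assumption that matches ID3v2.3 (ID3v2.4 uses synchsafe integers for frame sizes instead).

```python
import struct


def build_txxx_frame(key: str, value: str) -> bytes:
    """Build one TXXX frame (ID3v2.3-style size field, assumed) for a key/value pair."""
    # Frame data: text-encoding byte ($00 = ISO-8859-1), the key as a
    # null-terminated description, then the value string.
    data = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size of the data excluding the header, two flag bytes
    # ($40 $00 = discard the frame if the audio data is altered).
    header = b"TXXX" + struct.pack(">I", len(data)) + b"\x40\x00"
    return header + data


frame = build_txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.54 dB")
```

The resulting bytes would still need to be placed inside a complete ID3v2 tag, which is outside the scope of this sketch.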

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
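A tolerant parser along these lines might look like the following sketch. The function names (<tt>parse_gain</tt>, <tt>parse_peak</tt>, <tt>find_tag</tt>) and the use of a plain dictionary for the tags are assumptions made for illustration. The "malformed suffix" case is handled only for a missing suffix, missing space, or alternate capitalization; anything else returns <tt>None</tt>, so the caller can treat the value as absent.

```python
import re
from typing import Optional

# A signed decimal number: optional sign, digits, optional fraction.
_NUMBER = r"([+-]?(?:\d+(?:\.\d*)?|\.\d+))"
# Gain: number, then an optional 'dB' suffix in any capitalization.
_GAIN_RE = re.compile(r"^\s*" + _NUMBER + r"\s*(?:dB)?\s*$", re.IGNORECASE)
# Peak: a bare number with no suffix.
_PEAK_RE = re.compile(r"^\s*" + _NUMBER + r"\s*$")


def parse_gain(text: str) -> Optional[float]:
    """Return the gain in dB, or None if the value is unusable."""
    match = _GAIN_RE.match(text)
    return float(match.group(1)) if match else None


def parse_peak(text: str) -> Optional[float]:
    """Return the peak (1.0 = full scale), or None if the value is unusable."""
    match = _PEAK_RE.match(text)
    if not match:
        return None
    peak = float(match.group(1))
    return peak if peak >= 0.0 else None


def find_tag(tags: dict, key: str) -> Optional[str]:
    """Look up a metadata key while ignoring capitalization."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None
```

For example, <tt>parse_gain("+3.1dB")</tt> and <tt>parse_gain(" 07.500 DB")</tt> both yield usable numbers, while <tt>parse_gain("loud")</tt> yields <tt>None</tt>.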

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equal character).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys, with the data held as ASCII values in the same format as specified for ID3v2.

===De-Facto extensions===
MusicBrainz Picard and [http://github.com/Moonbase59/loudgain#tags-written-andor-deleted LoudGain] also support the following additional tags, named using the same conventions:

{| class="wikitable"
|+Extension metadata keys
|-
!Metadata
!Key
!Value format
! Purpose
|-
|Track range
|REPLAYGAIN_TRACK_RANGE
|[-]a.bb dB
|rowspan=2 | Indicates dynamics (R-128 Loudness Range / LRA), may guide pre-amplification
|-
|Album range
|REPLAYGAIN_ALBUM_RANGE
|[-]a.bb dB
|-
|Reference loudness
|REPLAYGAIN_REFERENCE_LOUDNESS
|[-]a.bb LUFS
| Use alternative reference levels; change ref levels without re-scanning the file
|}

==Player requirements==
[[File:RG_Player_control.gif|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
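The selection and scaling steps described in this section can be sketched as follows. The function names and the use of <tt>None</tt> for missing metadata are assumptions for illustration; the default of 0 dB is just one of the possible fallback choices mentioned above.

```python
from typing import Optional, Sequence


def select_gain_db(track_gain: Optional[float], album_gain: Optional[float],
                   prefer_album: bool, default_db: float = 0.0) -> float:
    """Pick the requested gain, falling back to the other one, then to a default."""
    if prefer_album:
        preferred, other = album_gain, track_gain
    else:
        preferred, other = track_gain, album_gain
    if preferred is not None:
        return preferred
    if other is not None:
        return other
    return default_db  # neither gain is stored: assumed fallback of unity gain


def gain_to_scale(gain_db: float) -> float:
    """Convert a gain in dB to a linear multiplier: 10^(gain/20)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples: Sequence[float], gain_db: float) -> list:
    """Multiply every sample by the same constant."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]
```

A gain of -6 dB, for instance, corresponds to a multiplier of roughly 0.5.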

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[ReplayGain 2.0 specification#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of the limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^\frac{RG + G_{\text{pre-amp}}}{20}, \frac{1}{\text{peak amplitude}} \right)</math>
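As a minimal sketch of this formula, the function below combines the replay gain and pre-amp setting in decibels and then caps the result so that the stored peak cannot exceed full scale. The name <tt>playback_scale</tt> and the handling of a zero peak are assumptions; the default peak of 1.0 follows the guidance above for missing peak metadata.

```python
def playback_scale(gain_db: float, preamp_db: float = 0.0, peak: float = 1.0) -> float:
    """Return min(10^((RG + G_pre-amp)/20), 1/peak) as a linear multiplier."""
    # Gains in dB add; the sum is converted to a linear factor once.
    requested = 10.0 ** ((gain_db + preamp_db) / 20.0)
    if peak <= 0.0:
        # Assumed handling: a silent track cannot clip, so no cap applies.
        return requested
    # Largest factor that keeps the peak sample at or below full scale.
    return min(requested, 1.0 / peak)
```

With no replay gain, a +6 dB pre-amp alone gives a factor of about 2. With a replay gain of +4 dB, a pre-amp of +6 dB and a stored peak of 0.9, the requested factor of about 3.16 is reduced to 1/0.9, roughly 1.11.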


===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analogue signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Revised ReplayGain specification

2026-01-22T19:20:25Z

Skamp:

DISPLAYTITLE

''This is a proposed update to the [[ReplayGain 1.0 specification|original ReplayGain specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio forum].'' --[[User:Notat|Notat]] 23:42, 8 October 2012 (CEST)

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system.

The original [http://wiki.hydrogenaudio.org/index.php?title=Replaygain ReplayGain 1.0 specification] described a loudness measurement system which included a weighting filter, root mean square (RMS) measurement and statistical processing that model human perception of loudness in both the frequency and time domains.

Since the original ReplayGain proposal in 2001, the science, practice and standards for loudness normalization have advanced significantly. The current industry standard approach to loudness measurement is described by the International Telecommunication Union<ref>http://www.itu.int/en/Pages/default.aspx</ref> (ITU) as BS.1770. The most recent version of this standard is known as ITU BS.1770-5<ref>https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en</ref> and was published in November 2023. The ITU work is freely available and is not believed to be encumbered by any patent issues. The ITU BS.1770-2 standard has been adopted in the United States by the [http://www.atsc.org ATSC] as [http://www.atsc.org/cms/standards/a_85-2011a.pdf A/85] and in Europe by the [http://www.ebu.ch European Broadcasting Union] as [http://tech.ebu.ch/docs/tech/tech3343.pdf EBU R-128] for broadcast audio.

BS.1770 uses a "K-weighted" RMS measurement. This weighting function is significantly less complex than the inverted Fletcher-Munson weighting used by the original ReplayGain algorithm. A gating function is designed to measure the loudness of foreground components in the audio program. The gate in BS.1770 performs a function similar to that of the statistical processing in the original RG specification.

The computation required for BS.1770 loudness measurement is reduced compared to the original RG technique. Nevertheless, BS.1770 has been shown in several academic studies to be equally or more effective than the RG algorithm in modeling human loudness perception of music as well as other program material such as podcasts, television programs and movies.<ref>Paul Nygren. [http://www.speech.kth.se/prod/publications/files/3319.pdf Achieving equal loudness between audio files]. KTH Royal Institute of Technology</ref><ref>Martin Wolters; Harald Mundt; Jeffrey Riedmiller (May 2010). [http://www.aes.org/e-lib/browse.cfm?elib=15341 Loudness Normalization In The Age Of Portable Media Players]. Audio Engineering Society.</ref><ref>Esben Skovenborg; Søren H. Nielsen (October 2004). [http://web.archive.org/web/20120208024743/http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf Evaluation of Different Loudness Models with Music and Speech Material]. Audio Engineering Society. Archived from [http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf the original] on 2012-02-08.</ref>

Recent RG implementations use BS.1770 for loudness measurement. It is expected the ITU standard will evolve over time to meet the needs of broadcasters and governments. It is the intent of the ReplayGain community that RG follow any future backwards-compatible improvements to loudness measurement using the BS.1770 standard.

==Reference level==

Classic ReplayGain is calibrated to a pink noise reference signal with an RMS level 14 dB below a full-scale sinusoid. This reference signal is used to establish a reference level. ReplayGain will apply no gain or attenuation to the reference signal or any program material which has the same loudness measurements as the reference signal.

BS.1770 defines a loudness scale for program material. The units of BS.1770 loudness measurements are Loudness Units [relative to] Full Scale (LUFS). LUFS can be treated like decibels: a change of 1 LU corresponds to a level change of 1 dB.

In order to maintain backwards compatibility with classic RG, newer RG uses a -18 LUFS reference which, based on measurements of a large amount of music, gives loudness similar to that of classic RG.


==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{r}-L</math>
Where:
:<math>RG</math> is the replay gain adjustment in decibels,
:<math>L_{r}</math> is the -18 LUFS reference level, and
:<math>L</math> is the measured loudness of the audio file in LUFS.

Gain is positive if the loudness of the audio file is lower than the reference level. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than the reference level. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[ReplayGain 2.0 specification#Player requirements|Player requirements]] below.
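A short worked example of the gain formula, with hypothetical names: a track measured at -9.5 LUFS needs an attenuation of 8.5 dB, while a quiet track measured at -23 LUFS needs a gain of 5 dB.

```python
REFERENCE_LUFS = -18.0  # reference level used by newer ReplayGain


def replay_gain_db(measured_lufs: float, reference_lufs: float = REFERENCE_LUFS) -> float:
    """RG = L_r - L, in decibels."""
    return reference_lufs - measured_lufs


loud_track_gain = replay_gain_db(-9.5)    # louder than the reference: negative gain
quiet_track_gain = replay_gain_db(-23.0)  # quieter than the reference: positive gain
```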

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value found on any channel of the file, whether the excursion is positive or negative. This single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
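For integer PCM data, the peak scan can be sketched as below. The function name and the choice of 2<sup>bits-1</sup> as full scale (32768 for 16-bit audio) are assumptions; an implementation could equally normalize by the largest positive sample value.

```python
def peak_amplitude(channels, bits_per_sample: int = 16) -> float:
    """Return the largest absolute sample over all channels, with 1.0 = full scale."""
    # Assumed full-scale value: 2^(bits-1), e.g. 32768 for 16-bit samples.
    full_scale = float(1 << (bits_per_sample - 1))
    # Positive and negative excursions are treated alike via abs().
    peak = max((abs(sample) for channel in channels for sample in channel), default=0)
    return peak / full_scale


stereo = [[0, 16384, -8192], [-32768, 100, 5]]
track_peak = peak_amplitude(stereo)
```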

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Apple Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments''', same for other Ogg codecs
* .opus (Ogg Opus) – '''Vorbis comments''' available
** standard {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} (MUST adjust for -23 LUFS) comment keys may be preferable (used by {{code|loudgain}})
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .asf, .wma (Windows Media audio) – '''Vorbis comments''' in Extended Content Description Object
** {{code|loudgain}} instead uses native ASF/WMA attributes (itself a key-value storage) via TagLib, which is more sensible
* .wav (Windows PCM) – No metadata support (use .bwf instead)
** ID3 RIFF chunk possible (used by {{code|loudgain}})
* .wv (WavPack) – '''APEv2'''
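The Opus entry in the list above notes that the {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} keys must be adjusted for a -23 LUFS reference. The sketch below shows one way to derive such a value from a replay gain computed against -18 LUFS. It assumes the value is stored as a Q7.8 fixed-point integer (dB multiplied by 256) written as a decimal string, as described in the Ogg Opus specification, and it ignores the separate output gain field in the Opus header. The function name is hypothetical.

```python
RG_REFERENCE_LUFS = -18.0    # ReplayGain reference level
R128_REFERENCE_LUFS = -23.0  # reference level expected by the R128_* keys


def rg_to_r128_tag(rg_gain_db: float) -> str:
    """Convert a ReplayGain value in dB to an R128_*_GAIN string (assumed Q7.8)."""
    # Moving the reference from -18 to -23 LUFS lowers the gain by 5 dB.
    r128_db = rg_gain_db + (R128_REFERENCE_LUFS - RG_REFERENCE_LUFS)
    # Q7.8 fixed point: 256 steps per dB, clamped to the signed 16-bit range.
    q78 = round(r128_db * 256)
    q78 = max(-32768, min(32767, q78))
    return str(q78)
```

A track with a replay gain of 0 dB would therefore be written as <tt>-1280</tt>, and one with a replay gain of +5 dB as <tt>0</tt>.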

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain legacy metadata formats#ID3v2 RGAD|RGAD]] and [[ReplayGain legacy metadata formats#ID3v2 RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00 (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding   $00 (ISO-8859-1 encoding)
 Description     <key string> $00
 Value           <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and formatting for ''values'' are the same as specified for ID3v2. Keys and values are required by the Vorbis comment specification to be separated by '=' (equal character).
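On the scanner side, producing the four comments is mostly a matter of number formatting. The following sketch uses hypothetical helper names and writes gains with two decimals plus a <tt> dB</tt> suffix and peaks with six decimals, following the value formats given for ID3v2.

```python
def format_gain(gain_db: float) -> str:
    """Format a gain as '[-]a.bb dB' (no '+' prefix for positive gains)."""
    return f"{gain_db:.2f} dB"


def format_peak(peak: float) -> str:
    """Format a peak as 'c.dddddd' with no suffix."""
    return f"{peak:.6f}"


def replaygain_comments(track_gain: float, track_peak: float,
                        album_gain: float, album_peak: float) -> list:
    """Return the four ReplayGain items as separate key=value comments."""
    return [
        "REPLAYGAIN_TRACK_GAIN=" + format_gain(track_gain),
        "REPLAYGAIN_TRACK_PEAK=" + format_peak(track_peak),
        "REPLAYGAIN_ALBUM_GAIN=" + format_gain(album_gain),
        "REPLAYGAIN_ALBUM_PEAK=" + format_peak(album_peak),
    ]


comments = replaygain_comments(-6.543, 0.98765432, -7.1, 1.0)
```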

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys, with the data held as ASCII values in the same format as specified for ID3v2.

===De-Facto extensions===
MusicBrainz Picard and [http://github.com/Moonbase59/loudgain#tags-written-andor-deleted LoudGain] also support the following additional tags, named using the same conventions:

{| class="wikitable"
|+Extension metadata keys
|-
!Metadata
!Key
!Value format
! Purpose
|-
|Track range
|REPLAYGAIN_TRACK_RANGE
|[-]a.bb dB
|rowspan=2 | Indicates dynamics (R-128 Loudness Range / LRA), may guide pre-amplification
|-
|Album range
|REPLAYGAIN_ALBUM_RANGE
|[-]a.bb dB
|-
|Reference loudness
|REPLAYGAIN_REFERENCE_LOUDNESS
|[-]a.bb LUFS
| Use alternative reference levels; change ref levels without re-scanning the file
|}

==Player requirements==
[[File:RG_Player_control.gif|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard—not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[ReplayGain 2.0 specification#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.
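The following short sketch checks the arithmetic: a +6 dB gain corresponds to a linear scale of roughly 2, and adding gains in decibels gives the same result as multiplying their linear scale factors. The helper name <tt>db_to_scale</tt> and the example gain values are assumptions chosen for illustration.

```python
def db_to_scale(gain_db: float) -> float:
    """Convert a gain in decibels to a linear sample multiplier."""
    return 10.0 ** (gain_db / 20.0)


replay_gain_db = -8.5  # example value, not taken from any real file
pre_amp_db = 6.0       # example user pre-amp setting

# Adding the decibel values first ...
combined = db_to_scale(replay_gain_db + pre_amp_db)
# ... matches multiplying the two linear scale factors.
separate = db_to_scale(replay_gain_db) * db_to_scale(pre_amp_db)

print(round(db_to_scale(6.0), 4))
print(abs(combined - separate) < 1e-12)
```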

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^{\frac{RG + G_{\text{pre-amp}}}{20}}, \frac{1}{\text{peak amplitude}} \right)</math>


===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions, and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

Revised ReplayGain specification

2026-01-22T19:18:11Z

Skamp: Clarified the confusion between ReplayGain having two different versions


''This is a proposed update to the [[ReplayGain 1.0 specification]]. This proposal is currently '''Under Construction'''. Please discuss this proposal on the [[Talk:ReplayGain 2.0 specification|discussion page]] or the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio forum].'' --[[User:Notat|Notat]] 23:42, 8 October 2012 (CEST)

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system.

The original [http://wiki.hydrogenaudio.org/index.php?title=Replaygain ReplayGain 1.0 specification] described a loudness measurement system which included a weighting filter, root mean square (RMS) measurement and statistical processing that model human perception of loudness in both the frequency and time domains.

Since the original ReplayGain proposal in 2001, the science, practice and standards for loudness normalization have advanced significantly. The current industry-standard approach to loudness measurement is described by the International Telecommunication Union<ref>http://www.itu.int/en/Pages/default.aspx</ref> (ITU) as BS.1770. The most recent version of this standard is known as ITU BS.1770-5<ref>https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en</ref> and was published in December 2023. The ITU work is freely available and is not believed to be encumbered by any patent issues. The ITU BS.1770-2 standard has been adopted in the United States by the [http://www.atsc.org ATSC] as [http://www.atsc.org/cms/standards/a_85-2011a.pdf A/85] and in Europe by the [http://www.ebu.ch European Broadcasting Union] as [http://tech.ebu.ch/docs/tech/tech3343.pdf EBU R-128] for broadcast audio.

BS.1770 uses a "K-weighted" RMS measurement. This weighting function is significantly less complex than the inverted Fletcher-Munson weighting used by the original ReplayGain algorithm. A gating function is designed to measure the loudness of foreground components in the audio program. The gate in BS.1770 performs a function similar to that of the statistical processing in the original RG specification.

The computation required for BS.1770 loudness measurement is reduced compared to the original RG technique. Nevertheless, BS.1770 has been shown in several academic studies to be equally or more effective than the RG algorithm in modeling human loudness perception on music program as well as other material such as podcasts, television programs and movies.<ref>Paul Nygren. [http://www.speech.kth.se/prod/publications/files/3319.pdf Achieving equal loudness between audio files]. KTH Royal Institute of Technology</ref><ref>Martin Wolters; Harald Mundt; Jeffrey Riedmiller (May 2010). [http://www.aes.org/e-lib/browse.cfm?elib=15341 Loudness Normalization In The Age Of Portable Media Players]. Audio Engineering Society.</ref><ref>Esben Skovenborg; Søren H. Nielsen (October 2004). [http://web.archive.org/web/20120208024743/http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf Evaluation of Different Loudness Models with Music and Speech Material]. Audio Engineering Society. Archived from [http://www.tcelectronic.com/media/skovenborg_2004_loudness_m.pdf the original] on 2012-02-08.</ref>

Recent RG implementations use BS.1770 for loudness measurement. It is expected the ITU standard will evolve over time to meet the needs of broadcasters and governments. It is the intent of the ReplayGain community that RG follow any future backwards-compatible improvements to loudness measurement using the BS.1770 standard.

==Reference level==

Classic ReplayGain is calibrated to a pink noise reference signal with an RMS level 14 dB below a full-scale sinusoid. This reference signal is used to establish a reference level. ReplayGain will apply no gain or attenuation to the reference signal or any program material which has the same loudness measurements as the reference signal.

BS.1770 defines a loudness scale for program material. The units of BS.1770 loudness measurements are in Loudness Units [relative to] Full Scale (LUFS). LUFS can be treated like decibels.

In order to maintain backwards compatibility with classic RG, newer RG uses a -18 LUFS reference level, which, judged over a large amount of music, gives loudness similar to that of classic RG.


==Gain calculation==
RG achieves loudness compensated playback by applying gain (or attenuation) dependent on the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{r}-L</math>
Where:
:<math>RG</math> is the replay gain adjustment in decibels,
:<math>L_{r}</math> is the -18 LUFS reference level,
:<math>L</math> is the measured loudness of the audio file in LUFS.

Gain is positive if the loudness of the audio file is lower than the reference level. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than the reference level. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[ReplayGain 2.0 specification#Player requirements|Player requirements]] below.
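The calculation can be sketched in a few lines. The function name <tt>replay_gain_db</tt> and the example loudness values are assumptions for illustration; the loudness itself would come from a BS.1770 measurement, which is not shown here.

```python
REFERENCE_LUFS = -18.0  # reference level L_r given in the text


def replay_gain_db(measured_lufs: float,
                   reference_lufs: float = REFERENCE_LUFS) -> float:
    """Return RG = L_r - L, the replay gain adjustment in decibels."""
    return reference_lufs - measured_lufs


# A loud track (above the reference) gets a negative gain (attenuation).
print(replay_gain_db(-9.5))
# A quiet track (below the reference) gets a positive gain.
print(replay_gain_db(-23.0))
```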

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that comprise an album, and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall—so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, for positive or negative excursion. The single sample value should be converted to a floating-point representation, such that digital full scale is equivalent to a value of 1.0.
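A minimal sketch of this scan for signed-integer PCM follows. The function name <tt>peak_amplitude</tt> is hypothetical, and the choice of <math>2^{bits-1}</math> as digital full scale (32768 for 16-bit audio) is an assumption; some scanners may divide by 32767 instead.

```python
def peak_amplitude(samples, bits_per_sample=16):
    """Return the peak of signed-integer PCM samples, with 1.0 = full scale.

    `samples` holds the sample values of all channels (interleaved or not);
    the largest positive or negative excursion on any channel is used.
    Assumption: full scale is 2**(bits_per_sample - 1).
    """
    full_scale = float(1 << (bits_per_sample - 1))
    return max((abs(s) for s in samples), default=0) / full_scale


print(peak_amplitude([0, 16384, -8192, 100]))
```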

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom) and may result in peak amplitude values greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Audio Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments''', same for other Ogg codecs
* .opus (Ogg Opus) – '''Vorbis comments''' available
** standard {{code|R128_TRACK_GAIN}} and {{code|R128_ALBUM_GAIN}} (MUST adjust for -23 LUFS) comment keys may be preferable (used by {{code|loudgain}})
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .asf, .wma (Windows Media audio) – '''Vorbis comments''' in Extended Content Description Object
** {{code|loudgain}} instead uses native ASF/WMA attributes (itself a key-value storage) via TagLib, which is more sensible
* .wav (Windows PCM) – No metadata support (use .bwf instead)
** ID3 RIFF chunk possible (used by {{code|loudgain}})
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain legacy metadata formats#ID3v2 RGAD|RGAD]] and [[ReplayGain legacy metadata formats#ID3v2 RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

 Frame ID   $54 58 58 58 ("TXXX")
 Size       $xx xx xx xx (size of frame excluding this header)
 Flags      $40 $00      (discard frame if audio data is altered)

Frame data is coded as follows:

 Text encoding  $00 (ISO-8859-1 encoding)
 Description    <key string> $00
 Value          <value string>

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits. If there is no integral portion the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
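The value formatting and the TXXX frame layout described above can be sketched as follows. The helper names are hypothetical. The sketch assumes the ID3v2.3 convention of a plain 32-bit big-endian frame size; ID3v2.4 uses a synchsafe size instead, which is byte-identical for frame bodies shorter than 128 bytes.

```python
import struct


def format_gain(gain_db: float) -> str:
    """Format a gain as '[-]a.bb dB' (positive gains carry no prefix)."""
    text = f"{gain_db:.2f}"
    if text == "-0.00":  # avoid a signed zero after rounding
        text = "0.00"
    return f"{text} dB"


def format_peak(peak: float) -> str:
    """Format a peak level as 'c.dddddd' with no suffix."""
    return f"{peak:.6f}"


def txxx_frame(key: str, value: str) -> bytes:
    """Build one TXXX frame: 10-byte header followed by the frame data."""
    # Frame data: text encoding $00, key string, $00 terminator, value string.
    body = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Header: frame ID, size excluding the header (assumed ID3v2.3 style),
    # and flags $40 $00 (discard frame if audio data is altered).
    header = b"TXXX" + struct.pack(">I", len(body)) + b"\x40\x00"
    return header + body


frame = txxx_frame("REPLAYGAIN_TRACK_GAIN", format_gain(-6.5))
print(frame)
```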

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
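A tolerant parser along these lines might look like the sketch below. The function names and the exact regular expressions are assumptions, not part of the specification; values that do not match are reported as <tt>None</tt> so the caller can treat the frame as absent.

```python
import re

# Optional sign, digits with optional fraction, optional 'dB' in any case,
# with or without a separating space.
_GAIN_RE = re.compile(
    r"^\s*([+-]?)\s*(\d+(?:\.\d*)?|\.\d+)\s*(?:dB)?\s*$", re.IGNORECASE
)
_PEAK_RE = re.compile(r"^\s*\+?(\d+(?:\.\d*)?|\.\d+)\s*$")


def parse_gain(text):
    """Return the gain in dB as a float, or None if the text is malformed."""
    m = _GAIN_RE.match(text)
    if m is None:
        return None
    value = float(m.group(2))
    return -value if m.group(1) == "-" else value


def parse_peak(text):
    """Return the peak level as a float, or None if the text is malformed."""
    m = _PEAK_RE.match(text)
    return float(m.group(1)) if m else None


def find_tag(tags, key):
    """Look up a key in a dict of tags, ignoring capitalization of keys."""
    wanted = key.upper()
    for name, value in tags.items():
        if name.upper() == wanted:
            return value
    return None


print(parse_gain("+3.1 dB"), parse_gain("-6.50dB"), parse_gain("loud"))
```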

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of ''values'' are the same as specified for ID3v2. The Vorbis comment specification requires keys and values to be separated by '=' (the equals sign).

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys, with values stored as ASCII text in the same format as specified for ID3v2.

===De-Facto extensions===
MusicBrainz Picard and [http://github.com/Moonbase59/loudgain#tags-written-andor-deleted LoudGain] also support the following additional tags, named using the same conventions:

{| class="wikitable"
|+Extension metadata keys
|-
!Metadata
!Key
!Value format
! Purpose
|-
|Track range
|REPLAYGAIN_TRACK_RANGE
|[-]a.bb dB
|rowspan=2 | Indicates dynamics (R-128 Loudness Range / LRA), may guide pre-amplification
|-
|Album range
|REPLAYGAIN_ALBUM_RANGE
|[-]a.bb dB
|-
|Reference loudness
|REPLAYGAIN_REFERENCE_LOUDNESS
|[-]a.bb LUFS
| Use alternative reference levels; change ref levels without re-scanning the file
|}

==Player requirements==
[[File:RG_Player_control.gif‎|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard—not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16-bits).</ref>

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
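The selection and scaling steps can be sketched as below. The function names are hypothetical, unity gain (0 dB) is assumed as the default when no metadata exists, and samples are assumed to be floating-point values with 1.0 as full scale. Dithering and conversion back to integer output are omitted.

```python
def select_gain_db(track_gain_db, album_gain_db, mode="track", default_db=0.0):
    """Pick the gain for the requested mode, falling back as described.

    A missing gain is passed as None. If the requested gain is missing,
    the other one is used; if both are missing, default_db is returned.
    """
    if mode == "album":
        preferred, fallback = album_gain_db, track_gain_db
    else:
        preferred, fallback = track_gain_db, album_gain_db
    if preferred is not None:
        return preferred
    if fallback is not None:
        return fallback
    return default_db


def apply_gain(samples, gain_db):
    """Multiply every sample by the constant 10 ** (gain / 20)."""
    scale = 10.0 ** (gain_db / 20.0)
    return [s * scale for s in samples]


gain = select_gain_db(None, -7.25, mode="track")  # only album gain present
print(gain)
```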

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[ReplayGain 2.0 specification#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain, and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of <math>10^{6/20}</math>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment and it may be undesirable to lower average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files) a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over full scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To give them their wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music, without distorting. The exact type and nature of limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left( 10^{\frac{RG + G_{\text{pre-amp}}}{20}}, \frac{1}{\text{peak amplitude}} \right)</math>
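A direct transcription of this formula is sketched below. The function name <tt>playback_scale</tt> is hypothetical; a missing peak defaults to 1.0 as recommended above, and a peak of zero (digital silence) is assumed to need no limiting.

```python
def playback_scale(replay_gain_db, pre_amp_db=0.0, peak=1.0):
    """Return the linear scale factor with clipping prevention by reduced gain.

    replay_gain_db and pre_amp_db are in decibels; peak is the stored peak
    amplitude, with 1.0 representing digital full scale.
    """
    wanted = 10.0 ** ((replay_gain_db + pre_amp_db) / 20.0)
    if peak <= 0.0:
        # Assumption: silent audio cannot clip, so no limit applies.
        return wanted
    # Never scale so far that the highest peak exceeds full scale.
    return min(wanted, 1.0 / peak)


# +6 dB gain plus +6 dB pre-amp on a track peaking at 0.5: limited to 2.0.
print(playback_scale(6.0, 6.0, 0.5))
```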


===Hardware implementation===
The above three steps are appropriate to software players operating on the digital signal in order to scale it. However, it is possible to send the digital signal to the DAC without level correction, and to place an attenuator in the analog signal path. The attenuator can then be driven by the Replay Gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal: "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally, David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

ReplayGain

2026-01-22T19:04:42Z

Skamp: fixed typo

'''ReplayGain''' is the name of a technique invented to achieve the same perceived playback loudness of audio files. It defines an algorithm to measure the '''perceived''' loudness of audio data.

ReplayGain allows the loudness of each song within a collection of songs to be consistent. This is called 'Track Gain' (or 'Radio Gain' in earlier parlance). It also allows the loudness of a specific sub-collection (an "album") to be consistent with the rest of the collection, while allowing the dynamics from song to song on the album to remain intact. This is called 'Album Gain' (or 'Audiophile Gain' in earlier parlance). This is especially important when listening to classical music albums, because quiet tracks need to remain a certain degree quieter than the louder ones.

ReplayGain is different from [[Normalization|peak normalization]]. Peak normalization merely ensures that the peak amplitude reaches a certain level. This does not ensure equal loudness. The ReplayGain technique measures the ''effective power'' of the waveform (i.e. the RMS power after applying an "equal loudness contour"), and then adjusts the amplitude of the waveform accordingly. The result is that Replay Gained waveforms are usually more uniformly amplified than peak-normalized waveforms.

==Target loudness==
The target loudness of almost all ReplayGain utilities is 89 dB SPL when replayed in an SMPTE RP 200 calibrated system (an early departure from the proposal, endorsed by its author<ref>[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=83397&view=findpost&p=721854 Does Replay gain work differtly in Media monkey]</ref>) — the ReplayGain proposal and SMPTE recommendation are 6 dB lower.<ref>[http://www.mars.org/mailman/public/mad-dev/2004-February/000993.html ReplayGain discussion at mad-dev]</ref> The target loudness may be more commonly known and understood as '''-18''' '''[https://en.wikipedia.org/wiki/LUFS LUFS]''' (''Loudness Units relative to Full Scale'').

Some utilities have realized the inadequacies of the classic ReplayGain loudness calculation, switching to a more modern algorithm ([https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en ITU-R BS.1770]). However, the way it was integrated was extremely ''ad hoc'', at least until a draft of a [[ReplayGain 2.0 specification|revised specification]] started being written.

==Clipping==
Audio is generally recorded such that the loudest sounds don't clip, but the use of ReplayGain can cause clipping if the average volume of a song is below the target level. That is, upon playback, the volume of a quiet song is increased, so the parts of the song with above-average loudness, especially in the bass frequencies, will exceed the limits of the format and will be distorted. Whether this distortion is audible depends on the sounds in question, and the listener's sensitivity.

Implementations deal with the risk of clipping in different ways. Some have a "pre-amp" feature which reduces (or boosts) the original audio's level by a certain amount before doing whatever is needed for ReplayGain. Some have a "prevent clipping" feature to reduce the amount of ReplayGain adjustment to whatever amount would keep clipping from occurring, based on peak info stored in the file's metadata (thus reducing the effectiveness of ReplayGain). Some recommend using a compressor/limiter DSP to prevent or reduce clipping, regardless of whether it was caused by ReplayGain.
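
As a rough illustration of the "prevent clipping" approach, the sketch below lowers a tagged gain just enough that the stored peak stays at or below full scale. The function name and parameters are hypothetical, and real players may apply additional margins or use a different rule.

```python
import math


def clip_safe_gain_db(gain_db, peak):
    """Reduce gain_db so that peak * 10^(gain / 20) does not exceed 1.0.

    gain_db -- ReplayGain adjustment in dB (pre-amp already included)
    peak    -- peak amplitude from the file's metadata (1.0 = full scale)
    """
    if peak <= 0.0:
        # No usable peak information (or silence): leave the gain alone.
        return gain_db
    # Headroom in dB between the stored peak and digital full scale.
    headroom_db = -20.0 * math.log10(peak)
    return min(gain_db, headroom_db)
```

A track with peak 0.9 has about 0.92 dB of headroom, so a requested gain of +6 dB would be cut to roughly +0.92 dB, which is why this method reduces the effectiveness of ReplayGain.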

An alternative that may reduce the risk of clipping is the [https://tech.ebu.ch/docs/r/r128.pdf EBU R 128] recommendation of a '''-23''' '''LUFS''' target, although some may find the additional reduction in volume excessive, particularly if it leads to maxing out volume on user hardware. [[Opus]] in particular has adopted that recommendation.
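
The relationship between the two targets can be shown with a small sketch, assuming loudness is measured in LUFS (as with ITU-R BS.1770) and that the gain is simply the target minus the measured loudness. The constant and function names are illustrative only.

```python
RG_TARGET_LUFS = -18.0    # ReplayGain reference level
R128_TARGET_LUFS = -23.0  # EBU R 128 target


def gain_for_target(measured_lufs, target_lufs):
    """Gain in dB that moves a measured loudness to the target loudness."""
    return target_lufs - measured_lufs


def r128_gain_to_replaygain(r128_gain_db):
    """Convert a gain computed for -23 LUFS into one for -18 LUFS."""
    return r128_gain_db + (RG_TARGET_LUFS - R128_TARGET_LUFS)
```

For a track measured at -10 LUFS, the -18 LUFS target gives a gain of -8 dB and the -23 LUFS target gives -13 dB; the two always differ by 5 dB.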

== Implementations ==
There are different ReplayGain implementations, each with its own uses and strengths. Most use [[metadata]] to indicate the level of the volume change that the player should make. Some modify the audio data itself, and optionally use metadata as well. There are advantages and disadvantages to both methods.

In the metadata method, information on both types of ReplayGain (Track Gain and Album Gain) can be stored. The volume-change information can be very precise. If audio data was also changed, the metadata can contain "undo" info. Not all audio players/decoders know how to read and use ReplayGain information stored in metadata. And there's no standard for where and how ReplayGain info is stored; each implementation uses different formats and puts the info in different locations.

In the audio data method, the file's actual audio data is modified so that its natural/default playback volume is at the target level. In this scenario, only one type of ReplayGain (Track Gain or Album Gain) can be applied. If no "undo" info is saved somewhere, it may not be possible to restore the original audio data. Limitations of the audio file format may prevent precise (finely tuned) gain adjustments with this method. For example, MP3 and AAC files can only be losslessly modified in 1.5 dB steps. Depending on the audio file format, the process may also be lossy in the sense that it could irreversibly push a signal above the format's maximum amplitude (resulting in clipping) or below the minimum (resulting in silence).
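
The effect of the 1.5 dB step size can be illustrated as follows. This is a sketch under the assumption that the desired gain is rounded to the nearest whole step; it is not the exact rounding rule of any particular tool, and the names are hypothetical.

```python
STEP_DB = 1.5  # smallest lossless gain change for MP3 and AAC


def quantize_gain(desired_db, step_db=STEP_DB):
    """Round a desired gain to whole steps.

    Returns (steps, applied_db, residual_db), where residual_db is the
    part of the desired gain that the audio-data method cannot express.
    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    steps = round(desired_db / step_db)
    applied_db = steps * step_db
    return steps, applied_db, desired_db - applied_db
```

A desired change of -6.2 dB becomes -4 steps (-6.0 dB), leaving about 0.2 dB that only metadata can carry.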

=== MP3Gain ===
[[MP3Gain]] is an implementation of ReplayGain. It can be used to just analyze files & recommend changes or to also modify the gain. If modifying the gain, it always modifies the global gain fields in the MP3 audio data. It can add somewhat precise metadata, including undo info. The gain can be modified to any target dB, or it can be changed by a specified amount. For balance correction, user-specified changes can even be made on just one channel in simple L/R stereo-mode files (not joint stereo).

* Format: [[MP3]]
* Method: Audio + Meta (in APE tag), or Audio only
* APE tag fields (ASCII bytes):
** <code>MP3GAIN_MINMAX ###,###</code> - minimum & maximum global gain values for this file. 3 digits, zero-padded if necessary.
** <code>MP3GAIN_ALBUM_MINMAX ###,###</code> - minimum & maximum global gain values across a set of files scanned as an album. Optional.
** <code>MP3GAIN_UNDO +###,+###,N</code> - the global gain adjustment to restore the original values in the left and right channels, respectively, followed by an indicator of whether to wrap at the extremes (<code>N</code> means no, <code>W</code> means yes). The adjustment values are 3 digits, zero-padded, preceded by a sign (<code>+</code> or <code>-</code>).
** <code>REPLAYGAIN_TRACK_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Examples: <code>+0.424046</code> and <code>-10.38500</code>
** <code>REPLAYGAIN_TRACK_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Example: <code>0.149923</code>
** <code>REPLAYGAIN_ALBUM_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Optional.
** <code>REPLAYGAIN_ALBUM_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Optional.
* Limitations: Although the metadata, if written, contains precise adjustment & peak values, the audio data modifications are limited to 1.5dB steps and may become irreversible (however, that's a very rare condition; see the [https://hydrogenaud.io/index.php/topic,34154.0.html "mp3gain is NOT lossless" forum thread])
* http://mp3gain.sourceforge.net/
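
A minimal sketch of reading the <code>MP3GAIN_UNDO</code> and <code>MP3GAIN_MINMAX</code> values described above, assuming the tag text has already been extracted from the APE tag as a string. The function names are hypothetical and the patterns are based only on the field layouts listed here.

```python
import re

# "+###,+###,N": signed, zero-padded 3-digit values, then N (no wrap) or W (wrap).
UNDO_RE = re.compile(r"^([+-]\d{3}),([+-]\d{3}),([NW])$")
# "###,###": zero-padded 3-digit minimum and maximum global gain.
MINMAX_RE = re.compile(r"^(\d{3}),(\d{3})$")


def parse_mp3gain_undo(value):
    """Return (left_adjustment, right_adjustment, wrap) from MP3GAIN_UNDO."""
    match = UNDO_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected MP3GAIN_UNDO value: {value!r}")
    left, right, wrap = match.groups()
    return int(left), int(right), wrap == "W"


def parse_mp3gain_minmax(value):
    """Return (minimum, maximum) global gain from MP3GAIN_MINMAX."""
    match = MINMAX_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected MP3GAIN_MINMAX value: {value!r}")
    return int(match.group(1)), int(match.group(2))
```

For example, <code>parse_mp3gain_undo("+002,+002,N")</code> yields <code>(2, 2, False)</code>.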

=== AACGain ===
[[AACGain]] is a modified version of MP3Gain that works on both MP3 and AAC files.

* Format: [[MP3]], [[AAC]] (with or without MP4 container)
* Method: Audio + Meta, or Audio only
* Limitations: Limited to 1.5 dB steps; may become irreversible (same caveat as for MP3Gain)
* http://aacgain.altosdesign.com/

=== [[LAME]] ===
* Method: Header ([http://gabriel.mp3-tech.org/mp3infotag.html mp3infotag])
* Notes:
** Tags added during encoding; not supported by any player yet; Track Gain only
** Replay Gaining MP3s is usually done using MP3Gain (see [[ReplayGain#MP3Gain|above]]) or [[ReplayGain#foobar2000 ReplayGain scanner|foobar2000]]
* http://lame.sourceforge.net/

=== [[Musepack]] ReplayGain ===
* Method: Header (similar to the metadata method)
* Notes: ReplayGain values are stored in the header and ReplayGain is part of the Musepack specifications; therefore any Musepack decoder that does not support ReplayGain can be considered broken.
* http://www.musepack.net/

=== VorbisGain ===
* Format: (Ogg) [[Vorbis]]
* Method: Meta (in [[Vorbis comment]])
* http://www.sjeng.org/vorbisgain.html
** New builds of VorbisGain are available at [http://www.rarewares.org/ogg.html www.rarewares.org]
:'''''Note:''' Andavari has provided a very useful script to integrate VorbisGain, which is a CLI tool, into Windows Explorer. Please check [[Vorbis#ReplayGain|this section]] of the (Ogg) Vorbis article.''

=== FLAC / METAFLAC ===
* Format: [[Free Lossless Audio Codec|FLAC]]
* Method: Meta (in [[Vorbis comment]])
* http://flac.sf.net

=== WavPack / WVGAIN ===
* Format: [[WavPack]]
* Method: Meta (in [[APEv2]] tag)
* http://www.wavpack.com

=== Wavegain ===
* Format: waveform
* Method: Audio
* Limitations: Irreversible
* http://www.rarewares.org/others.php#wavegain

=== MusicPlayer ===
* Custom implementation, not derived from the original MP3Gain one (but inspired by it). As far as I know, all other implementations are directly derived from the MP3Gain source (gain_analysis.c, which is GPL).
* Format: any that FFmpeg supports
* Method: Audio
* Limitations: Doesn't modify the files at all. Stores the values in its own database. Used only for playback.
* https://github.com/albertz/music-player

=== [[foobar2000]] ReplayGain scanner ===
* Since v1.1.6, defaults to ITU-R BS.1770 analysis (although it labels it EBU R128), but can be configured to use the "Classic ReplayGain" algorithm instead. The ITU-R BS.1770 implementation uses a reference level of -18 LUFS instead of -23, in order to retain compatibility with the ReplayGain standard.
* Format:
** [[MP3]]: Values written to [[ID3v2]] (default) or [[APEv2]] tags. A separate function can be invoked to apply the tagged Track or Album Gain to the MP3 global gain fields (as MP3Gain does), and rewrite any existing tags to account for the peak change and compensate for the difference from 89 dB. The 89 dB reference level for tags isn't configurable, but the reference level applied to the global gain fields is (it's under Preferences > Advanced > Tools > ReplayGain Scanner > Target MP3 alteration volume level).
** [[Musepack]]: Values written to header.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags. As with MP3, it is also an option to apply gain via a separate function.
** [[MP4]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain).
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** Modules ([[MOD]] etc.): Optionally saved into [[APEv2]] tags.
* https://foobar2000.org/

=== [[MediaMonkey]] ===
* Format:
** [[MP3]]: Values written to [[APEv2]] or [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in MediaMonkey's MDB database.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[WAV]]: Values stored in MediaMonkey's MDB database.
** [[MPC]]: Internal gain Structure.
* In addition to tags, all ReplayGain values are also stored in MediaMonkey's MDB database
* Album/Audiophile ReplayGain not supported until v3.0 (Dec 2007); support during burning & ripping added in 3.1 (Jun 2009)
* Also capable of (irreversibly) changing the volume of MP3 tracks, similar to [[MP3Gain]]
* http://www.mediamonkey.com/

=== [[Winamp]] ReplayGain scanner===
* Format:
** [[MP3]]: Values written to [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in Windows Media Audio tags.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags.
** [[MP4]]
** [[TAK]]: Values written to [[APEv2]] tags.
* Supports Album/Track Gain

=== [[loudgain]] ===
* Format:
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** MP2, [[MP3]]: Values written to [[ID3v2]] tags (ID3v2.3/ID3v2.4 selectable).
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** (Ogg) [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** (Ogg) [[Speex]]: Values written to [[Vorbis comment]].
** [[Opus]]: Values written to [[Vorbis comment]], based on -23 LUFS Opus standard. Only <code>R128_TRACK_GAIN</code> and <code>R128_ALBUM_GAIN</code> are written, but the calculated ''true peak'' value can still be used to reduce the gain values ([[Clipping]] prevention).
** [[MP4]], [[M4A]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain). ReplayGain values are stored under <code>----:com.apple.iTunes:…</code>. This is for [[AAC]] and [[ALAC]] in [[MPEG-4]] containers.
** [[ASF]], [[Windows Media Audio|WMA]]: Values written to WMA tags, no prefix.
** [[WAV]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] (ID3v2.3/ID3v2.4 selectable) format. Using the <code>bext</code> chunk (for BWF v2) isn’t (yet) supported, but won’t be destroyed on writing.
** [[Audio Interchange File Format|AIFF]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] format.
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[Monkey's Audio]] (APE): Values written to [[APEv2]] tags.
* Follows EBU R128, ITU BS.1770 and the [[ReplayGain 2.0 specification|revised ReplayGain specification]].
* ''Never'' touches the actual audio data but ''only writes RG2 tags''.
* Uses ''true peak'' values calculated by oversampling to 192 kHz, using a custom polyphase FIR interpolator that will oversample 4x for sample rates < 96 kHz, 2x for sample rates < 192 kHz and leave the signal unchanged for 192 kHz.
* ''Clipping prevention'' can be used to lower the ReplayGain values to a safe margin (default -1 dBTP, can be changed).
* Many options for special cases: force RG tags upper-/lowercase, add extra tags (LRA, Reference loudness), strip unwanted tag types (APEv2 from MP2/MP3, ID3 from WavPack), tab-delimited table output for analysis with CSV file.
* Free and open-source software for ''Linux''; can also be installed on ''macOS'' using ''Homebrew'' and on ''Windows 10'' using the Linux ''bash''.
* Also installs a <code>rgbpm</code> bash script for mass-tagging, which can be adapted to the user’s needs.
* '''Warning:''' Loudgain relies on standard libraries like ''TagLib''. Linux distros (except rolling releases) sometimes ship outdated libraries, so be sure to use the latest version of ''TagLib''. Version 1.11.1 had a nasty bug for a while that [https://hydrogenaud.io/index.php/topic,118085.msg974957.html#msg974957 could corrupt Ogg Vorbis files]. The bug has since been fixed, but the TagLib version was not updated. Loudgain comes with a (slower) static build called <code>loudgain.static</code> in the repo’s <code>/bin</code> folder that doesn’t expose the bug and can also be used on older Linux versions (like Ubuntu 14.04, Linux Mint 17).
* https://github.com/Moonbase59/loudgain
* Bug tracker: https://github.com/Moonbase59/loudgain/issues
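
The oversampling rule and the clipping-prevention margin described in the list above can be sketched as follows. This is not loudgain's actual code: the function names are hypothetical, and returning a factor of 1 for sample rates above 192 kHz is an assumption.

```python
import math


def oversampling_factor(sample_rate_hz):
    """Oversampling factor used before measuring the true peak."""
    if sample_rate_hz < 96_000:
        return 4
    if sample_rate_hz < 192_000:
        return 2
    return 1  # 192 kHz (and, by assumption, anything higher) is left unchanged


def limit_gain_to_true_peak(gain_db, true_peak, max_true_peak_dbtp=-1.0):
    """Lower gain_db so the true peak after gain stays at or below the limit.

    true_peak is a linear amplitude (1.0 = full scale); the default limit
    of -1 dBTP matches the default margin mentioned above.
    """
    if true_peak <= 0.0:
        return gain_db
    peak_dbtp = 20.0 * math.log10(true_peak)
    return min(gain_db, max_true_peak_dbtp - peak_dbtp)
```

For a file whose true peak already sits at full scale, any positive gain is reduced to -1 dB so that the peak lands at -1 dBTP.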

=== [[rsgain]] ===
rsgain is a newer ReplayGain command line utility designed with a "batteries included" philosophy.

Features:
* Cross-platform Windows / macOS / Linux
* Supports all popular audio formats
* Simplified "Easy Mode" command line syntax supports recursive, directory-based scanning
* Multithreaded scanning option that provides significant speed improvement with full library scans
* Option to skip files with existing ReplayGain metadata
* Scan presets allow the user to save advanced settings for consistent use

== Players support ==
Because ReplayGain is part of the specifications of the FLAC, Musepack, and APE formats, any player that supports those formats usually supports ReplayGain.

The situation with MP3 is rather different, as ReplayGain was not part of the MP3 specification. The APEv2 tag metadata implementation is gradually becoming the de facto standard.

=== Windows ===
* [[foobar2000]] supports ReplayGain in all possible aspects.
* [[Winamp]] supports ReplayGain in album or track mode.
* [[MediaMonkey]] supports ReplayGain, with many configuration options.
* [[XMPlay]] recently implemented ReplayGain.
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.

''...and probably others.''

=== Linux ===
* [[XMMS]] reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], [[Musepack]], (Ogg) [[Vorbis]] and other formats.
:For [[MP3]], use the CVS version of the [http://xmms-mad.sourceforge.net/ xmms-mad] MP3 plugin. It has not yet been released as a binary and is not yet available in distribution packages; in the meantime, [http://perso.crans.org/~krempp/xmms-mad/ custom binaries] are available.
* [[amarok]], by using the amarok script [http://kde-apps.org/content/show.php?content=26073 ReplayGain].
:And possibly others: since [http://developer.kde.org/~wheeler/taglib.html TagLib] added support for [[APEv2]] tags in [[MP3]] files, players using this library (like [[amaroK]] and [[JuK]]) might support that kind of ReplayGain tag in the near future.
* [http://www.sacredchao.net/quodlibet Quod Libet] reads ReplayGain from (Ogg) [[Vorbis]], [[MP3]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:Requires support to be enabled (via the appropriate python bindings and libraries) for the above formats. Does not support ReplayGain values stored in [[APEv2]] tags in [[MP3]]s. ReplayGain values are stored in RVA2 id3v2.4 frames. See the [http://www.sacredchao.net/quodlibet/wiki/Development/ID3Notes Quod Libet RVA2 / ReplayGain notes].
* [http://www.musicpd.org/ Music Player Daemon] (MPD) reads ReplayGain from (Ogg) [[Vorbis]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:foobar2000-style TXXX frames in [[MP3]]s are also supported in the latest development releases.
* [http://www.mplayerhq.hu/ MPlayer]. MPlayer support for ReplayGain is codec-dependent.
:Codecs that are known to support ReplayGain: vorbis
:Because of this, you need to prioritize the codecs that support it, or choose it individually on the command line. To add it to the command line, add an -ac [codec] option after each file that you want to choose the codec for, or at the beginning to make it apply to all files listed. To prioritize the codecs by default, list them in a line in mplayer.conf:
ac=[codec],[othercodec],vorbis,mad,
* [http://idjc.sourceforge.net/ IDJC] (Internet DJ Console) reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], (Ogg) FLAC, (Ogg) [[Vorbis]], MP2 (audio), [[MP3]], [[Opus]], but only the ''lowercase'' tags. There is a [https://sourceforge.net/p/idjc/bugs/100/ ticket] open to handle tags case-insensitively.
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.
* [https://www.videolan.org/vlc/ VLC] supports ReplayGain in many file formats, but usually only the ''uppercase'' variant of the tags.
* [https://kodi.tv/ KODI] reads ReplayGain from nearly all formats, but usually only the ''lowercase'' variant of the tags.

=== Portable devices ===
[http://www.rockbox.org/ Rockbox] supports ReplayGain (in album or track mode) for most formats, including WMA, MP1/2/3, AAC, ALAC, Musepack, Monkey's Audio, Wavpack, FLAC and Vorbis. Note that ReplayGain is only supported when using the respective codec's native tagging format. For example: ReplayGain stored in APEv2 tags is not supported for MP3, rather ID3v2.x tags are expected.

The SanDisk Sansa Fuze (with firmware 1.02.26 and 2.02.26) and the SanDisk Sansa Clip+ also support ReplayGain.

The iPod features ''Soundcheck'', which seems to produce roughly the same normalization gains as ReplayGain, but doesn't provide an Album Gain.

=== Hi-Fi ===
Slim Devices, a company owned by Logitech Inc, supports ReplayGain on both of its high-end audiophile players, known as the [[Slim Devices Transporter|Transporter]] and the [[Slim Devices Squeezebox|Squeezebox]].

BluOS also supports ReplayGain, with a choice of album or track gain and a so-called Smart option that decides between the two by itself.
NAD devices that use BluOS consequently also support ReplayGain.

==Notes==
<references/>

== See also ==
* [[ReplayGain specification]]

== External links ==
* [http://en.wikipedia.org/wiki/Replay_Gain ReplayGain] at Wikipedia
* [http://www.bobulous.org.uk/misc/Replay-Gain.html ReplayGain using foobar2000] (how to use ReplayGain in Windows using foobar2000).
* [http://www.bobulous.org.uk/misc/Replay-Gain-in-Linux.html ReplayGain in Linux] (how to use ReplayGain in Linux using foobar2000 and Wine, or using metaflac or vorbisgain).

[[Category:Technical]]
[[Category:Metadata]]

ReplayGain

2026-01-22T19:01:38Z

Skamp: Clarified the differences between the new ITU-R BS.1770 algorithm and the EBU R 128 recommendations. Addition of non breakable spaces between numbers and units.

'''ReplayGain''' is the name of a technique invented to achieve the same perceived playback loudness of audio files. It defines an algorithm to measure the '''perceived''' loudness of audio data.

ReplayGain allows the loudness of each song within a collection of songs to be consistent. This is called 'Track Gain' (or 'Radio Gain' in earlier parlance). It also allows the loudness of a specific sub-collection (an "album") to be consistent with the rest of the collection, while allowing the dynamics from song to song on the album to remain intact. This is called 'Album Gain' (or 'Audiophile Gain' in earlier parlance). This is especially important when listening to classical music albums, because quiet tracks need to remain a certain degree quieter than the louder ones.

ReplayGain is different from [[Normalization|peak normalization]]. Peak normalization merely ensures that the peak amplitude reaches a certain level. This does not ensure equal loudness. The ReplayGain technique measures the ''effective power'' of the waveform (i.e. the RMS power after applying an "equal loudness contour"), and then adjusts the amplitude of the waveform accordingly. The result is that Replay Gained waveforms are usually more uniformly amplified than peak-normalized waveforms.

==Target loudness==
The target loudness of almost all ReplayGain utilities is 89 dB SPL when replayed in an SMPTE RP 200 calibrated system (an early departure from the proposal, endorsed by its author<ref>[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=83397&view=findpost&p=721854 Does Replay gain work differtly in Media monkey]</ref>) — the ReplayGain proposal and SMPTE recommendation are 6 dB lower.<ref>[http://www.mars.org/mailman/public/mad-dev/2004-February/000993.html ReplayGain discussion at mad-dev]</ref> The target loudness may be more commonly known and understood as '''-18''' '''[https://en.wikipedia.org/wiki/LUFS LUFS]''' (''Loudness Units relative to Full Scale'').

Some utilities have realized the inadequacies of the classic ReplayGain loudness calculation, switching to a more modern algorithm ([https://www.itu.int/rec/R-REC-BS.1770-5-202311-I/en ITU-R BS.1770]). However, the way it was integrated was extremely ''ad hoc'', at least until a draft of a [[ReplayGain 2.0 specification|revised specification]] started being written.

==Clipping==
Audio is generally recorded such that the loudest sounds don't clip, but the use of ReplayGain can cause clipping if the average volume of a song is below the target level. That is, upon playback, the volume of a quiet song is increased, so the parts of the song with above-average loudness, especially in the bass frequencies, will exceed the limits of the format and will be distorted. Whether this distortion is audible depends on the sounds in question, and the listener's sensitivity.

Implementations deal with the risk of clipping in different ways. Some have a "pre-amp" feature which reduces (or boosts) the original audio's level by a certain amount before doing whatever is needed for ReplayGain. Some have a "prevent clipping" feature to reduce the amount of ReplayGain adjustment to whatever amount would keep clipping from occurring, based on peak info stored in the file's metadata (thus reducing the effectiveness of ReplayGain). Some recommend using a compressor/limiter DSP to prevent or reduce clipping, regardless of whether it was caused by ReplayGain.

An alternative that may reduce the risk of clipping is the [https://tech.ebu.ch/docs/r/r128.pdf EBU R 128] recommendation of a '''-23''' '''LUFS''' target, although some may find the additional reduction in volume excessive, particularly if it leads to maxing out volume on user hardware. [[Opus]] in particular has adopted that recommendation.

== Implementations ==
There are different ReplayGain implementations, each with its own uses and strengths. Most use [[metadata]] to indicate the level of the volume change that the player should make. Some modify the audio data itself, and optionally use metadata as well. There are advantages and disadvantages to both methods.

In the metadata method, information on both types of ReplayGain (Track Gain and Album Gain) can be stored. The volume-change information can be very precise. If audio data was also changed, the metadata can contain "undo" info. Not all audio players/decoders know how to read and use ReplayGain information stored in metadata. And there's no standard for where and how ReplayGain info is stored; each implementation uses different formats and puts the info in different locations.

In the audio data method, the file's actual audio data is modified so that its natural/default playback volume is at the target level. In this scenario, only one type of ReplayGain (Track Gain or Album Gain) can be applied. If no "undo" info is saved somewhere, it may not be possible to restore the original audio data. Limitations of the audio file format may prevent precise (finely tuned) gain adjustments with this method. For example, MP3 and AAC files can only be losslessly modified in 1.5 dB steps. Depending on the audio file format, the process may also be lossy in the sense that it could irreversibly push a signal above the format's maximum amplitude (resulting in clipping) or below the minimum (resulting in silence).

=== MP3Gain ===
[[MP3Gain]] is an implementation of ReplayGain. It can be used to just analyze files & recommend changes or to also modify the gain. If modifying the gain, it always modifies the global gain fields in the MP3 audio data. It can add somewhat precise metadata, including undo info. The gain can be modified to any target dB, or it can be changed by a specified amount. For balance correction, user-specified changes can even be made on just one channel in simple L/R stereo-mode files (not joint stereo).

* Format: [[MP3]]
* Method: Audio + Meta (in APE tag), or Audio only
* APE tag fields (ASCII bytes):
** <code>MP3GAIN_MINMAX ###,###</code> - minimum & maximum global gain values for this file. 3 digits, zero-padded if necessary.
** <code>MP3GAIN_ALBUM_MINMAX ###,###</code> - minimum & maximum global gain values across a set of files scanned as an album. Optional.
** <code>MP3GAIN_UNDO +###,+###,N</code> - the global gain adjustment to restore the original values in the left and right channels, respectively, followed by an indicator of whether to wrap at the extremes (<code>N</code> means no, <code>W</code> means yes). The adjustment values are 3 digits, zero-padded, preceded by a sign (<code>+</code> or <code>-</code>).
** <code>REPLAYGAIN_TRACK_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Examples: <code>+0.424046</code> and <code>-10.38500</code>
** <code>REPLAYGAIN_TRACK_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Example: <code>0.149923</code>
** <code>REPLAYGAIN_ALBUM_GAIN +#.###### dB</code> - The value is always 9 characters including the sign and decimal point. Optional.
** <code>REPLAYGAIN_ALBUM_PEAK #.###### dB</code> - The value is always 8 characters including the decimal point. Optional.
* Limitations: Although the metadata, if written, contains precise adjustment & peak values, the audio data modifications are limited to 1.5dB steps and may become irreversible (however, that's a very rare condition; see the [https://hydrogenaud.io/index.php/topic,34154.0.html "mp3gain is NOT lossless" forum thread])
* http://mp3gain.sourceforge.net/

=== AACGain ===
[[AACGain]] is a modified version of MP3Gain that works on both MP3 and AAC files.

* Format: [[MP3]], [[AAC]] (with or without MP4 container)
* Method: Audio + Meta, or Audio only
* Limitations: Limited to 1.5 dB steps; may become irreversible (same caveat as for MP3Gain)
* http://aacgain.altosdesign.com/

=== [[LAME]] ===
* Method: Header ([http://gabriel.mp3-tech.org/mp3infotag.html mp3infotag])
* Notes:
** Tags added during encoding; not supported by any player yet; Track Gain only
** Replay Gaining MP3s is usually done using MP3Gain (see [[ReplayGain#MP3Gain|above]]) or [[ReplayGain#foobar2000 ReplayGain scanner|foobar2000]]
* http://lame.sourceforge.net/

=== [[Musepack]] ReplayGain ===
* Method: Header (similar to the metadata method)
* Notes: ReplayGain values are stored in the header and ReplayGain is part of the Musepack specifications; therefore any Musepack decoder that does not support ReplayGain can be considered broken.
* http://www.musepack.net/

=== VorbisGain ===
* Format: (Ogg) [[Vorbis]]
* Method: Meta (in [[Vorbis comment]])
* http://www.sjeng.org/vorbisgain.html
** New builds of VorbisGain are available at [http://www.rarewares.org/ogg.html www.rarewares.org]
:'''''Note:''' Andavari has provided a very useful script to integrate VorbisGain, which is a CLI tool, into Windows Explorer. Please check [[Vorbis#ReplayGain|this section]] of the (Ogg) Vorbis article.''

=== FLAC / METAFLAC ===
* Format: [[Free Lossless Audio Codec|FLAC]]
* Method: Meta (in [[Vorbis comment]])
* http://flac.sf.net

=== WavPack / WVGAIN ===
* Format: [[WavPack]]
* Method: Meta (in [[APEv2]] tag)
* http://www.wavpack.com

=== Wavegain ===
* Format: waveform
* Method: Audio
* Limitations: Irreversible
* http://www.rarewares.org/others.php#wavegain

=== MusicPlayer ===
* Custom implementation, inspired by but not derived from the original MP3Gain one. As far as is known, all other implementations are directly derived from the MP3Gain source (gain_analysis.c, which is GPL).
* Format: any that FFmpeg supports
* Method: Audio
* Limitations: Doesn't modify the files at all. Stores the value in own database. Used only for playback.
* https://github.com/albertz/music-player

=== [[foobar2000]] ReplayGain scanner ===
* Since v1.1.6, defaults to ITU-R BS.1770 analysis (although it labels it as EBU R128), but can be configured to use the "Classic ReplayGain" algorithm instead. The ITU-R BS.1770 implementation uses a reference level of -18 LUFS instead of -23, in order to retain compatibility with the ReplayGain standard.
* Format:
** [[MP3]]: Values written to [[ID3v2]] (default) or [[APEv2]] tags. A separate function can be invoked to apply the tagged Track or Album Gain to the MP3 global gain fields (as MP3Gain does), and rewrite any existing tags to account for the peak change and compensate for the difference from 89 dB. The 89 dB reference level for tags isn't configurable, but the reference level applied to the global gain fields is (it's under Preferences > Advanced > Tools > ReplayGain Scanner > Target MP3 alteration volume level).
** [[Musepack]]: Values written to header.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags. As with MP3, it is also an option to apply gain via a separate function.
** [[MP4]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain).
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** Modules ([[MOD]] etc.): Optionally saved into [[APEv2]] tags.
* https://foobar2000.org/
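
The effect of the -18 LUFS reference level mentioned above can be illustrated with a minimal Python sketch. It assumes the usual loudness-normalisation relation (gain = reference level minus measured loudness); the function name and the example loudness value are hypothetical, and this is not foobar2000's actual code.

```python
# Reference levels quoted in the text above.
REPLAYGAIN_REFERENCE_LUFS = -18.0   # level the text attributes to foobar2000's scanner
EBU_R128_REFERENCE_LUFS = -23.0     # EBU R128 target level


def gain_db(measured_lufs: float, reference_lufs: float = REPLAYGAIN_REFERENCE_LUFS) -> float:
    """Gain (in dB) that moves a track from its measured loudness to the reference level."""
    return reference_lufs - measured_lufs


# A hypothetical, fairly loud track measured at -9.5 LUFS.
measured = -9.5
print(gain_db(measured))                           # -8.5
print(gain_db(measured, EBU_R128_REFERENCE_LUFS))  # -13.5
```

With the same measurement, the two reference levels always differ by 5 dB, which is why a scanner using -23 LUFS would produce gain values that are not comparable with classic ReplayGain tags.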

=== [[MediaMonkey]] ===
* Format:
** [[MP3]]: Values written to [[APEv2]] or [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in MediaMonkey's MDB database.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[WAV]]: Values stored in MediaMonkey's MDB database.
** [[MPC]]: Internal gain Structure.
* In addition to tags, all ReplayGain values are also stored in MediaMonkey's MDB database
* Album/Audiophile ReplayGain not supported until v3.0 (Dec 2007); support during burning & ripping added in 3.1 (Jun 2009)
* Also capable of (irreversibly) changing the volume of MP3 tracks, similar to [[MP3Gain]]
* http://www.mediamonkey.com/

=== [[Winamp]] ReplayGain scanner===
* Format:
** [[MP3]]: Values written to [[ID3v2]] tags.
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** [[WMA]]: Values stored in Windows Media Audio tags.
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** [[APE]]: Values written to [[APEv2]] tags.
** [[AAC]]: Values written to [[APEv2]] tags.
** [[MP4]]
** [[TAK]]: Values written to [[APEv2]] tags.
* Support Album/Track Gain

=== [[loudgain]] ===
* Format:
** [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** MP2, [[MP3]]: Values written to [[ID3v2]] tags (ID3v2.3/ID3v2.4 selectable).
** (Ogg) [[Vorbis]]: Values written to [[Vorbis comment]].
** (Ogg) [[Free Lossless Audio Codec|FLAC]]: Values written to [[Vorbis comment]].
** (Ogg) [[Speex]]: Values written to [[Vorbis comment]].
** [[Opus]]: Values written to [[Vorbis comment]], based on -23 LUFS Opus standard. Only <code>R128_TRACK_GAIN</code> and <code>R128_ALBUM_GAIN</code> are written, but the calculated ''true peak'' value can still be used to reduce the gain values ([[Clipping]] prevention).
** [[MP4]], [[M4A]]: Uses its own iTunes-compatible tagging system (though iTunes does not support ReplayGain). ReplayGain values are stored under <code>----:com.apple.iTunes:…</code>. This is for [[AAC]] and [[ALAC]] in [[MPEG-4]] containers.
** [[ASF]], [[Windows Media Audio|WMA]]: Values written to WMA tags, no prefix.
** [[WAV]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] (ID3v2.3/ID3v2.4 selectable) format. Using the <code>bext</code> chunk (for BWF v2) isn’t (yet) supported, but won’t be destroyed on writing.
** [[Audio Interchange File Format|AIFF]]: Values written to the <code>ID3 </code> chunk, in [[ID3v2]] format.
** [[WavPack]]: Values written to [[APEv2]] tags.
** [[Monkey's Audio]] (APE): Values written to [[APEv2]] tags.
* Follows EBU R128, ITU BS.1770 and the [[ReplayGain 2.0 specification|revised ReplayGain specification]].
* ''Never'' touches the actual audio data but ''only writes RG2 tags''.
* Uses ''true peak'' values calculated by oversampling to 192 kHz, using a custom polyphase FIR interpolator that will oversample 4x for sample rates < 96 kHz, 2x for sample rates < 192 kHz and leave the signal unchanged for 192 kHz.
* ''Clipping prevention'' can be used to lower the ReplayGain values to a safe margin (default -1 dBTP, can be changed).
* Many options for special cases: force RG tags upper-/lowercase, add extra tags (LRA, Reference loudness), strip unwanted tag types (APEv2 from MP2/MP3, ID3 from WavPack), tab-delimited table output for analysis with CSV file.
* ''Linux'' Free and Open Source software, can be installed on ''MacOS'' using ''HomeBrew'', on ''Windows 10'' using the Linux ''bash''.
* Also installs a <code>rgbpm</code> bash script for mass-tagging, which can be adapted to the user’s needs.
* '''Warning:''' Loudgain relies on standard libraries like ''TagLib''. Linux distros (except rolling releases) sometimes deliver outdated libraries, so be sure you use the latest version of ''TagLib''. Version 1.11.1 had a nasty bug for a while that [https://hydrogenaud.io/index.php/topic,118085.msg974957.html#msg974957 could corrupt Ogg Vorbis files]. This has been fixed in the meantime, but the TagLib version was not updated. Loudgain comes with a (slower) static version called <code>loudgain.static</code> in the repo’s <code>/bin</code> folder that doesn’t exhibit the bug and can also be used on older Linux versions (like Ubuntu 14.04, Linux Mint 17).
* https://github.com/Moonbase59/loudgain
* Bug tracker: https://github.com/Moonbase59/loudgain/issues
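
The oversampling rule described in the list above (4x below 96 kHz, 2x below 192 kHz, unchanged at 192 kHz) can be written as a small Python sketch. This is only an illustration of the stated rule, not loudgain's code; the function name is hypothetical, and treating sample rates above 192 kHz as "unchanged" is an assumption, since the text does not cover them.

```python
def oversampling_factor(sample_rate_hz: int) -> int:
    """Oversampling factor for true-peak measurement, following the rule stated above."""
    if sample_rate_hz < 96_000:
        return 4
    if sample_rate_hz < 192_000:
        return 2
    # 192 kHz is left unchanged; higher rates are assumed to behave the same way.
    return 1


for rate in (44_100, 48_000, 96_000, 192_000):
    print(rate, "Hz ->", oversampling_factor(rate), "x")
```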

=== [[rsgain]] ===
rsgain is a newer ReplayGain command line utility designed with a "batteries included" philosophy.

Features:
* Cross-platform Windows / macOS / Linux
* Supports all popular audio formats
* Simplified "Easy Mode" command line syntax supports recursive, directory-based scanning
* Multithreaded scanning option that provides significant speed improvement with full library scans
* Option to skip files with existing ReplayGain metadata
* Scan presets allow the user to save advanced settings for consistent use

== Players support ==
Because ReplayGain is part of the specifications of the FLAC, Musepack, and APE formats, any player that supports those formats usually supports ReplayGain.

The situation with MP3 is rather different, as ReplayGain was not part of the MP3 specs. The APEv2 tag metadata implementation is gradually becoming the de facto standard.

=== Windows ===
* [[foobar2000]] supports ReplayGain in all possible aspects.
* [[Winamp]] supports ReplayGain in album or track mode.
* [[MediaMonkey]] supports ReplayGain, with many configuration options.
* [[XMPlay]] recently implemented ReplayGain
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.

''...and probably others.''

=== Linux ===
* [[XMMS]]. Reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], [[Musepack]], (Ogg) [[Vorbis]], etc.
:For [[MP3]], use the CVS version of the [http://xmms-mad.sourceforge.net/ xmms-mad] MP3 plugin. It has not yet been released as a binary and is not available in distributions' packaged versions for now; in the meantime, [http://perso.crans.org/~krempp/xmms-mad/ custom binaries] are available.
* [[amarok]]. By using the amarok-script [http://kde-apps.org/content/show.php?content=26073 ReplayGain]
:And possibly others, since [http://developer.kde.org/~wheeler/taglib.html TagLib] added support for [[APEv2]] tags in [[MP3]] files, players using this library (like [[amaroK]] and [[JuK]]) might support that kind of ReplayGain tags in the near future.
* [http://www.sacredchao.net/quodlibet Quod Libet] reads ReplayGain from (Ogg) [[Vorbis]], [[MP3]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:Requires support to be enabled (via the appropriate python bindings and libraries) for the above formats. Does not support ReplayGain values stored in [[APEv2]] tags in [[MP3]]s. ReplayGain values are stored in RVA2 id3v2.4 frames. See the [http://www.sacredchao.net/quodlibet/wiki/Development/ID3Notes Quod Libet RVA2 / ReplayGain notes].
* [http://www.musicpd.org/ Music Player Daemon] (MPD) reads ReplayGain from (Ogg) [[Vorbis]], [[Free Lossless Audio Codec|FLAC]], and [[Musepack]].
:foobar2000-style TXXX frames in [[MP3]]s are also supported in the latest development releases.
* [http://www.mplayerhq.hu/ MPlayer]. MPlayer support for ReplayGain is codec dependent.
:Codecs that are known to support ReplayGain: vorbis
:Because of this, you need to prioritize the codecs that support it, or choose it individually on the command line. To add it to the command line, add an -ac [codec] option after each file that you want to choose the codec for, or at the beginning to make it apply to all files listed. To prioritize the codecs by default, list them in a line in mplayer.conf:
:<code>ac=[codec],[othercodec],vorbis,mad,</code>
* [http://idjc.sourceforge.net/ IDJC] (Internet DJ Console) reads ReplayGain from [[Free Lossless Audio Codec|FLAC]], (Ogg) FLAC, (Ogg) [[Vorbis]], MP2 (audio), [[MP3]], [[Opus]], but only the ''lowercase'' tags. There is a [https://sourceforge.net/p/idjc/bugs/100/ ticket] open to handle tags case-insensitively.
* [https://picard.musicbrainz.org/ MusicBrainz Picard] is a tagger (and player) that tags using metadata from the MusicBrainz.org database. Picard supports ReplayGain tags for files tagged with APE, ASF, ID3, MP4 and Vorbis tags. There is a ReplayGain plugin that can be used to calculate the ReplayGain values for both Albums and Tracks.
* [https://www.videolan.org/vlc/ VLC] supports ReplayGain in many file formats, but usually only the ''uppercase'' variant of the tags.
* [https://kodi.tv/ KODI] reads ReplayGain from nearly all formats, but usually only the ''lowercase'' variant of the tags.

=== Portable devices ===
[http://www.rockbox.org/ Rockbox] supports ReplayGain (in album or track mode) for most formats, including WMA, MP1/2/3, AAC, ALAC, Musepack, Monkey's Audio, Wavpack, FLAC and Vorbis. Note that ReplayGain is only supported when using the respective codec's native tagging format. For example: ReplayGain stored in APEv2 tags is not supported for MP3, rather ID3v2.x tags are expected.

The SanDisk Sansa Fuze (with firmware 1.02.26 and 2.02.26) and the SanDisk Sansa Clip+ also support ReplayGain.

The iPod features ''Soundcheck'', which seems to produce roughly the same normalization gains as ReplayGain, but doesn't provide an Album Gain.

=== Hi-Fi ===
Slim Devices, a company owned by Logitech Inc, supports ReplayGain on both of its high-end audiophile players, known as the [[Slim Devices Transporter|Transporter]] and the [[Slim Devices Squeezebox|Squeezebox]].

BluOS also supports ReplayGain, with a choice of album gain or track gain and a so-called Smart option that decides between the two by itself.
NAD devices that use BluOS consequently also support ReplayGain.

==Notes==
<references/>

== See also ==
* [[ReplayGain specification]]

== External links ==
* [http://en.wikipedia.org/wiki/Replay_Gain ReplayGain] at Wikipedia
* [http://www.bobulous.org.uk/misc/Replay-Gain.html ReplayGain using foobar2000] (how to use ReplayGain in Windows using foobar2000).
* [http://www.bobulous.org.uk/misc/Replay-Gain-in-Linux.html ReplayGain in Linux] (how to use ReplayGain in Linux using foobar2000 and Wine, or using metaflac or vorbisgain).

[[Category:Technical]]
[[Category:Metadata]]

LossyWAV

2013-10-22T10:13:57Z

Skamp: /* Linux / OS X support: lossyWAV and WINE */ updated caudec URL

{{Software Infobox
| name = lossyWAV
| logo =
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.3.0
| preview_release = <none>
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 1.3.0 release thread] [http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 1.3.0 development thread]
}}
lossyWAV is a [[Wikipedia:Free software|free]], [[lossy]] pre-processor for [[PCM]] audio contained in the [[RIFF_WAVE|WAV]] file format. Proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson], it reduces [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, reduces the bitrate of the encoded file significantly compared to unpreprocessed compression.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyWAV is based on the lossyFLAC idea proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson] at Hydrogenaudio, which is a method of carefully reducing the bitdepth of (blocks of) samples which will then allow the FLAC lossless encoder to make use of its wasted bits feature. The aim is to transparently reduce audio bit depth (by making some lower significant bits ([[Wikipedia:Least_significant_bit|lsb]]'s) zero), consequently taking advantage of FLAC's detection of consistently-zeroed lower significant bits within each single frame and significantly increasing coding efficiency.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded using the same codec (which may be all important from a hardware compatibility perspective) at a reduced bitrate compared to the lossless version.

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick Currie] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (Many thanks [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer!) with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] Assembly Language for speed.

Subsequently, lossyFLAC proved itself to work with other lossless codecs, so the application name was changed to lossyWAV.

Since then, Nick has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 Horst Albrecht] at Hydrogenaudio. Although the current lossyWAV implementation has built on David's original method, the method itself still very much belongs to its author.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a pure variable bit-depth pre-processor: the overall sample size remains the same after processing, but the number of significant bits used for the samples in a codec-block can change on a block-by-block basis. Bits-to-remove from the audio data are calculated on a block-by-block basis (codec-block length = 512 samples, 11.6 ms @ 44.1kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier transform]] (FFT) analyses of at least two lengths (default quality preset (--quality standard) = 32, 64 & 1024 [[Wikipedia:Sampling %28signal processing%29|samples]]). After some manipulation, the results of each FFT analysis for a specific codec-block are grouped and the minimum value is used to determine bits-to-remove for the whole codec-block. Bit removal adds noise to the output; however, the level of the added noise associated with the removal of a given number of bits has been pre-calculated, and the number of bits to remove depends on the level of the noise floor of the codec-block in question. The added noise is adaptively shaped by default, but the user can select parameters to make the added noise fixed-shaped or simply [[Wikipedia:white noise|white noise]]. Each sample in the codec-block is then rounded such that the first <bits-to-remove> lsb's are zero. In this way the wasted bits feature of [[FLAC]] et al. is exploited.
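
The block arithmetic described above can be sketched in a few lines of Python. This is a simplified illustration only, not lossyWAV's implementation: it omits the FFT analysis, noise shaping and clipping checks, all function names are hypothetical, and the per-analysis results in the example are made-up numbers.

```python
CODEC_BLOCK_SAMPLES = 512  # codec-block length quoted in the text


def block_duration_ms(sample_rate_hz: int, block_samples: int = CODEC_BLOCK_SAMPLES) -> float:
    """Duration of one codec-block in milliseconds."""
    return 1000.0 * block_samples / sample_rate_hz


def bits_to_remove_for_block(per_analysis_results: list[int]) -> int:
    """The minimum over all FFT analyses decides bits-to-remove for the whole block."""
    return min(per_analysis_results)


def remove_bits(sample: int, bits: int) -> int:
    """Round a sample to the nearest multiple of 2**bits, zeroing the lowest `bits` lsb's."""
    if bits <= 0:
        return sample
    half_step = 1 << (bits - 1)
    return ((sample + half_step) >> bits) << bits


print(round(block_duration_ms(44_100), 1))         # 11.6

bits = bits_to_remove_for_block([5, 3, 4])         # hypothetical results for 32, 64 and 1024 sample FFTs
block = [1003, 1004, -5, 12]
print(bits, [remove_bits(s, bits) for s in block])  # 3 [1000, 1008, -8, 16]
```

Because every output sample in the block is a multiple of 2**bits, a lossless encoder with a wasted-bits feature can store the block at a reduced effective bit depth.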

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set (16 bit / 44.1kHz)
!Codec
!lossless
!--insane
!--extreme
!--high
!--standard
!--economic
!--portable
!--extraportable
|-
!10 Album Test Set
| FLAC
| 854 kbit/s
| 627 kbit/s
| 548 kbit/s
| 477 kbit/s
| 442 kbit/s
| 407 kbit/s
| 353 kbit/s
| 311 kbit/s
|-
!Nick.C's Full Collection
| FLAC
| 882 kbit/s
| -
| -
| -
| -
| -
| -
| 307 kbit/s
|}

==File identification==
lossyWAV-processed WAV files are named with a double filename extension, .lossy.wav, to make them instantly identifiable. For example, ".lossy.flac" indicates an audio file which was processed using lossyWAV and subsequently encoded using FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter is used when processing to create a correction file which is named with the .lwcdf.wav double filename extension. When "added" to the corresponding .lossy.wav, using the --merge parameter, the original file will be reconstituted.
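
The relationship between the processed file and its correction file can be illustrated with plain sample arithmetic. The Python sketch below assumes only what is stated above, namely that adding the correction data to the lossy data reconstitutes the original; how lossyWAV actually stores the correction data inside the .lwcdf.wav file is not described here, and the function names and sample values are hypothetical.

```python
def make_correction(original: list[int], lossy: list[int]) -> list[int]:
    """Per-sample difference between the original and the processed audio."""
    if len(original) != len(lossy):
        raise ValueError("original and lossy must have the same number of samples")
    return [o - l for o, l in zip(original, lossy)]


def merge(lossy: list[int], correction: list[int]) -> list[int]:
    """Add the correction back onto the processed audio."""
    if len(lossy) != len(correction):
        raise ValueError("lossy and correction must have the same number of samples")
    return [l + c for l, c in zip(lossy, correction)]


original = [1003, -517, 42, 0]
lossy = [1000, -520, 40, 0]            # same samples with the lowest bits rounded away
correction = make_correction(original, lossy)
print(correction)                       # [3, 3, 2, 0]
print(merge(lossy, correction) == original)  # True
```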

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Known supported codecs|known supported codecs]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. This takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, date and time, and user settings of the processing run are recorded. Additionally, if a lossyWAV 'fact' chunk is found in an input file, processing is halted (exit code = 16) to prevent re-processing of an already processed file.

The --check parameter can be used to determine whether a file has previously been processed without trying to process it, exit code = 16 if already processed; exit code = 0 if not.
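
A script can perform a similar check without running lossyWAV by scanning the RIFF chunks for a 'fact' chunk whose data begins with "lossyWAV". The Python sketch below is an illustration under that assumption, not a reimplementation of --check: the helper names are hypothetical, the in-memory test file is synthetic, and only the exit-code convention (16 or 0) is taken from the text above.

```python
import io
import struct

LOSSYWAV_MARKER = b"lossyWAV"


def build_wav(fact_text=None):
    """Build a tiny in-memory WAV (no audio) with an optional 'fact' chunk after 'fmt '."""
    fmt = struct.pack("<HHIIHH", 1, 2, 44100, 44100 * 4, 4, 16)  # 16-bit stereo PCM
    chunks = b"fmt " + struct.pack("<I", len(fmt)) + fmt
    if fact_text is not None:
        data = fact_text.encode("ascii")
        pad = b"\x00" if len(data) % 2 else b""  # RIFF chunks are word-aligned
        chunks += b"fact" + struct.pack("<I", len(data)) + data + pad
    chunks += b"data" + struct.pack("<I", 0)
    return b"RIFF" + struct.pack("<I", 4 + len(chunks)) + b"WAVE" + chunks


def already_processed(stream) -> bool:
    """Return True if a 'fact' chunk starting with the lossyWAV marker precedes 'data'."""
    header = stream.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF WAVE stream")
    while True:
        head = stream.read(8)
        if len(head) < 8:
            return False
        chunk_id, size = struct.unpack("<4sI", head)
        if chunk_id == b"data":
            return False  # the marker chunk is expected before the audio data
        body = stream.read(size + (size & 1))  # skip the pad byte of odd-sized chunks
        if chunk_id == b"fact" and body.startswith(LOSSYWAV_MARKER):
            return True


def check_exit_code(stream) -> int:
    """Mirror the exit codes described above: 16 if already processed, 0 if not."""
    return 16 if already_processed(stream) else 0


processed = build_wav("lossyWAV 1.3.0 @ 01/01/2011 00:00:00, -q 2.5")  # made-up stamp
print(check_exit_code(io.BytesIO(processed)))    # 16
print(check_exit_code(io.BytesIO(build_wav())))  # 0
```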

==Quality presets==
*--quality insane: (-q I or -q 10) Highest quality preset, generally considered to be excessive;
*--quality extreme: (-q E or -q 7.5) Higher quality preset, disc space-saving alternative to lossless archiving for large audio collections, considered to be suitable for transcoding to other lossy codecs;
*--quality high: (-q H or -q 5.0) High quality preset, midway between extreme and standard;
*--quality standard: (-q S or -q 2.5) Default preset, generally accepted to be transparent;
*--quality economic: (-q C or -q 0.0) Intermediate preset midway between standard and portable;
*--quality portable: (-q P or -q -2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
*--quality extraportable: (-q X or -q -5.0) Lowest quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
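
For scripting purposes, the preset names, single-letter abbreviations and numeric values listed above can be held in a small lookup table. The Python sketch below only restates that list; the function name is hypothetical, and accepting names and letters case-insensitively is an assumption rather than documented lossyWAV behaviour.

```python
# (letter, numeric value) for each named preset, as listed above.
PRESETS = {
    "insane": ("I", 10.0),
    "extreme": ("E", 7.5),
    "high": ("H", 5.0),
    "standard": ("S", 2.5),
    "economic": ("C", 0.0),
    "portable": ("P", -2.5),
    "extraportable": ("X", -5.0),
}
BY_LETTER = {letter: value for letter, value in PRESETS.values()}


def quality_value(arg: str) -> float:
    """Translate a preset name, preset letter or number into the numeric quality value."""
    key = arg.strip()
    if key.lower() in PRESETS:
        return PRESETS[key.lower()][1]
    if key.upper() in BY_LETTER:
        return BY_LETTER[key.upper()]
    value = float(key)  # raises ValueError for anything that is not a number
    if not -5.0 <= value <= 10.0:
        raise ValueError("quality must be between -5.0 and 10.0")
    return value


print(quality_value("standard"), quality_value("X"), quality_value("7.5"))  # 2.5 -5.0 7.5
```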

All tuning for version 1.0.0 was performed on quality preset --standard, with higher presets being more conservative. For versions 1.1.0, 1.2.0 and 1.3.0, tuning effort has been focused on the lowest quality preset in an effort to achieve an effective compromise between resultant bitrate and perceived quality. Quality preset --standard is generally accepted to be (and from testing so far is) transparent. If you find a track for which --standard fails to achieve transparency after processing, please post a sample (no more than 30 seconds) in the development thread.

The upper frequency limit used in the calculation of minimum signal power varies, dependent on quality preset, in the range 15.159kHz to 16.682kHz.

==Supported input formats==
*[[WAV]]: 9-bit to 32-bit integer; 1 to 8 channels; sample rate ≥ 32kHz [[Pulse Code Modulation|PCM]]. Very high sample rates (>48kHz) have not been extensively tested. Tunings have been focussed on 16-bit, 44.1kHz samples (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512 --'''merge-blocks'''
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|}

* Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name.

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] — so-called "Bit Shifting" — to suggest that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested due to prohibitive prices of encoders. At least one [http://www.hydrogenaudio.org/forums/index.php?showtopic=98609&hl= commercial DVD-A] uses constant bit-depth reduction with lower bit-depth on rear channels.

A comparison of portable media players is [[Wikipedia:Comparison of portable media players#Audio Formats|here]], which shows FLAC and WMA Lossless compatibility among listed players.
Any player supported by [http://www.rockbox.org Rockbox] can use FLAC or WavPack files after installing Rockbox.
===Important note===
'''NB: when encoding using a lossless codec, please ensure that the block size of the lossless codec matches that of lossyWAV (default = 512 samples). If this is not done then the lossless encoding of the processed WAV file will (almost certainly) be larger than it would otherwise have been. This is achieved by adding the "Encoder Parameters" in the table above to the command line of the lossless codec in question.'''
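
When driving the encoders from a script, the block-size parameters from the table above can be generated from a single value so that the encoder always matches lossyWAV. The Python sketch below is a hypothetical helper: the dictionary keys are illustrative labels, and only the parameter strings themselves are taken from the table.

```python
# Parameter templates from the "Encoder parameters" column; {bs} is the block size.
ENCODER_PARAMS = {
    "flac": ["-5", "-b", "{bs}", "--keep-foreign-metadata"],
    "lpac": ["-b{bs}"],
    "als": ["-l", "-n{bs}"],
    "tak": ["-fsl{bs}"],
    "wavpack": ["--blocksize={bs}", "--merge-blocks"],
}


def encoder_args(codec: str, block_size: int = 512) -> list[str]:
    """Return the encoder parameters that match the given lossyWAV block size."""
    try:
        template = ENCODER_PARAMS[codec.lower()]
    except KeyError:
        raise ValueError(f"no known block-size parameters for {codec!r}") from None
    return [part.format(bs=block_size) for part in template]


print(encoder_args("flac"))     # ['-5', '-b', '512', '--keep-foreign-metadata']
print(encoder_args("wavpack"))  # ['--blocksize=512', '--merge-blocks']
```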
===Bonus feature===
Another, possibly not obvious, feature of lossyWAV is that the processed output can be "transcoded" from one lossless codec to another lossless codec with absolutely no loss of quality whatsoever. This is solely due to the fact that lossyWAV output is designed to be losslessly encoded - something that lossless codecs do very well indeed.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.3.0, Copyright (C) 2007-2011 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Process Description:

lossyWAV is a near lossless audio processor which dynamically reduces the
bitdepth of the signal on a block-by-block basis. Bitdepth reduction adds noise
to the processed output. The amount of permissible added noise is based on
analysis of the signal levels in the default frequency range 20Hz to 16kHz.

If signals above the upper limiting frequency are at an even lower level, they
can be swamped by the added noise. This is usually inaudible, but the behaviour
can be changed by specifying a different --limit (in the range 10kHz to 20kHz).

For many audio signals there is little content at very high frequencies and
forcing lossyWAV to keep the added noise level lower than the content at these
frequencies can increase the bitrate dramatically for no perceptible benefit.

The noise added by the process is shaped using an adaptive method provided by
Sebastian Gesemann. This method, as implemented in lossyWAV, aims to use the
signal itself as the basis of the filter used for noise shaping. Adaptive noise
shaping is enabled by default.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <t> where t is one of the following (default = standard):
I, insane highest quality output, suitable for transcoding;
E, extreme higher quality output, suitable for transcoding;
H, high high quality output, suitable for transcoding;
S, standard default quality output, considered to be transparent;
C, economic intermediate quality output, likely to be transparent;
P, portable good quality output for DAP use, may not be transparent;
X, extraportable lowest quality output, not fully transparent.

Standard Options:

-C, --correction write correction file for processed WAV file; default=off.
-f, --force forcibly over-write output file if it exists; default=off.
-h, --help display help.
-L, --longhelp display extended help.
-M, --merge merge existing lossy.wav and lwcdf.wav files.
-o, --outdir <t> destination directory for the output file(s).
-v, --version display the lossyWAV version number.
-w, --writetolog create (or add to) lossyWAV.log in the output directory.

Advanced Options:

- take WAV input from STDIN.
-c, --check check if WAV file has already been processed; default=off.
errorlevel=16 if already processed, 0 if not.
-q, --quality <n> quality preset (-5.0<=n<=10.0); (-5=lowest, 10=highest;
default=2.5; I=10; E=7.5; H=5; S=2.5; C=0; P=-2.5; X=-5).
--, --stdout write WAV output to STDOUT.
--stdinname <t> pseudo filename to use when input from STDIN.

Advanced Quality Options:

-A, --adaptive <n/t> modify settings for Sebastian Gesemann's adaptive noise
shaping method. takes a parameter to set the order of the
FIR filter, (32<=n<=96; default=64; multiple of 8 only);
"OFF" to disable adaptive shaping; "NOWARP" to disable
default frequency warping;
-a, --analyses <n> set number of FFT analysis lengths, (2<=n<=6; default=3,
i.e. 32, 64 & 1024 samples. n=2, remove 32 sample FFT;
n>3 add 512; n>4, add 256; n>6, add 128) nb. FFT lengths.
stated are for 44.1/48kHz audio, higher sample rates will
automatically increase all FFT lengths as required.
-l, --limit <n> set upper frequency limit to be used in analyses to n Hz;
(10000<=n<=20000; default=16000).
--linkchannels revert to original single bits-to-remove value for all
channels rather than channel dependent bits-to-remove.
--maxclips <n> set max. number of acceptable clips per channel per block;
(0<=n<=16; default=3,3,3,3,3,2,2,2,2,2,1,1,1,0,0,0).
-m, --midside analyse 2 channel audio for mid/side content.
--nodccorrect disable DC correction of audio data prior to FFT analysis,
default=on; (DC offset calculated per FFT data set).
--scale <n> factor to scale audio by; (0.0625<n<=8.0; default=1).
-s, --shaping [n] enable fixed noise shaping, takes optional parameter [n]
to allow user defined shaping proportion (0.0<=n<=1.0),
otherwise default to quality setting dependent value.
Disables adaptive noise shaping.
--static <n> set minimum-bits-to-keep-static to n bits (default=6;
7<=n<=28, limited to bits-per-sample - 4).
-U, --underlap <n> enable underlap mode to increase number of FFT analyses
performed at each FFT length, (n = 2, 4 or 8, default=2).

Output Options:

--bitdist show distrubution of bits to remove.
--blockdist show distribution of lowest / highest significant bit of
input codec-blocks and bit-removed codec-blocks.
-d, --detail enable per block per channel bits-to-remove data display.
-F, --freqdist enable frequency analysis display of input data.
-H, --histogram show sample value histogram (input, lossy and correction).
--longdist show long frequency distribution data (input/lossy/lwcdf).
--perchannel show selected distribution data per channel.
-p, --postanalyse enable frequency analysis display of output and
correction data in addition to input data.
--sampledist show distribution of lowest / highest significant bit of
input samples and bit-removed samples.
--spread [full] show detailed [more detailed] results from the spreading/
averaging algorithm.
-W, --width <n> select width of output options (79<=n<=255).

System Options:

-B, --below set process priority to below normal.
--low set process priority to low.
-N, --nowarnings suppress lossyWAV warnings.
-Q, --quiet significantly reduce screen output.
-S, --silent no screen output.

Special thanks go to:

David Robinson for the publication of his lossyFLAC method, guidance, and
the motivation to implement his method as lossyWAV.

Horst Albrecht for ABX testing, valuable support in tuning the internal
presets, constructive criticism and all the feedback.

Sebastian Gesemann for the adaptive noise shaping method and the amount of
help received in implementing it and also for the basis of
the fixed noise shaping method.

Matteo Frigo and for libfftw3-3.dll contained in the FFTW distribution
Steven G Johnson (v3.2.1 or v3.2.2).

Mark G Beckett for the Delphi unit that provides an interface to the
(Univ. of Edinburgh) relevant fftw routines in libfftw3-3.dll.

Don Cross for the Complex-FFT algorithm originally used.</pre>

===Example drag 'n' drop batch file===
Simply drag the FLAC files onto this batch file and it will process, recode in FLAC and copy ALL of the tags from the input FLAC file, placing the output lossyFLAC file in the same directory as the input FLAC file. Requires flac.exe and [http://www.synthetic-soul.co.uk/tag/ tag.exe] to be somewhere on the path.
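<pre>@echo off
rem Process each FLAC file dropped onto this batch file in turn.
:repeat
if "%~1"=="" goto end
rem Decode to STDOUT, process with lossyWAV, re-encode with a 512-sample block size, then copy the tags.
if exist "%~1" flac -d "%~1" --stdout --silent|lossywav - --stdout --quality standard --stdinname "%~1"|flac - -b 512 -o "%~dpn1.lossy.flac" --silent && tag --fromfile "%~1" "%~dpn1.lossy.flac"
shift
goto repeat
:end</pre>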

===lossyWAV and FFTW===
Since version 1.2.0, lossyWAV has been compatible with [[Wikipedia:FFTW|FFTW]] although not dependent on it. Should the user wish to take advantage of the increased processing speed available when using FFTW (from superior FFT implementations), libfftw3-3.dll should be placed in a directory on the host computer which features on the path.

===Linux / OS X support: lossyWAV and WINE===
The cause of lossyWAV's WINE incompatibility was found and removed during the development of 1.2.0 and retrospectively amended for 1.1.0b in a maintenance release (1.1.0c). The latest stable version (1.3.0 at the time of writing) is fully supported.

[http://caudec.net/ caudec] is a command-line tool that can encode and decode lossyWAV files (lossyFLAC, lossyWV, lossyTAK), using the official binary (lossyWAV.exe) with Wine (see: [http://caudec.net/documentation/windowscodecs/ installation instructions]). Caudec can also test file integrity and compute (and tag) ReplayGain data. While it hasn't been tested at the time of writing, it is possible that lossyWAV support in caudec works on OS X as well.

===lossyWAV and [[foobar2000]]===
Example [[foobar2000]] converter settings:

lossyFLAC settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.flac
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\flac - -b 512 -5 -f -o%d --ignore-chunk-sizes
Format is : lossless or hybrid
Highest BPS mode supported: 24 </pre>

lossyTAK settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.tak
Parameters : /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWV settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.wv
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is : lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWMALSL* settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.wma
Parameters : /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|
c:\"program files"\bin\wmaencode - %d --codec lsl --ignorelength
Format is : lossless or hybrid
Highest BPS mode supported: 24</pre>

Enclose any element of a path that contains spaces in double quotation marks ("), e.g. C:\"Program Files"\directory_where_executable_is\executable_name. Without the quotation marks, the Windows command interpreter splits the path at the space.

lossyWMALSL conversion uses WMAEncode.exe by lvqcl found [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=90519&view=findpost&p=767754 here].

===lossyWAV and EAC===
:''For example settings, see [[EAC and LossyWAV]].''

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor and not a codec. No decoding is required for any program to play a WAV file which has been processed with lossyWAV as it remains compliant with the RIFF WAVE format.

*'''Question:''' Why create a processor which means that I cannot be sure that a lossless file is truly lossless?
*'''Answer:''' Unless one creates the lossless file personally, one can '''never''' be completely sure that the file is indeed lossless. E.g. a lossless file you receive could be transcoded from [[MP3]] without your knowledge. To distinguish a lossyWAV file from lossless files it is recommended to use the extension .lossy.EXT where EXT is the original extension e.g. .lossy.flac

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' At preset --standard, almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Will it low-pass filter my audio?
*'''Short answer:''' No. The frequency limit is for the analysis only. lossyWAV cannot low-pass filter your audio.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact]]s
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522 Original lossyFLAC thread] - Introduction of the concept by David Robinson (Replay Gain developer) and initial development
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=96635 lossyWAV 1.3.1 Delphi to C++ translation thread]
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 lossyWAV 1.3.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 lossyWAV 1.3.0 release thread] - Release of version 1.3.0 on 06 August 2011
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=65499 lossyWAV 1.2.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=77042 lossyWAV 1.2.0 release thread] - Release of version 1.2.0 on 16 December 2009
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 lossyWAV 1.1.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=64617 lossyWAV 1.1.0 release thread] - Release of version 1.1.0 on 12 July 2008
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 lossyWAV Development thread] - Conversion of the original MATLAB script to Delphi and evolution of the method
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0 release thread] - Release of version 1.0.0b on 12 May 2008

[[Category:Software]]

TAK

2013-10-22T10:13:11Z

Skamp: /* Linux */ updated caudec URL

{{Codec Infobox
| name = Tom's lossless Audio Kompressor
| logo =
| type = lossless
| purpose = lossless audio compression.
| maintainer = Thomas Becker
| recommended_encoder = TAK encoder
| recommended_text = TAK v2.3.0
| website = [http://thbeck.de/Tak/Tak.html ThBeck.de/Tak/Tak.html] ''(german)''
}}

== Description ==
'''Tom's lossless Audio Kompressor''' ('''TAK''') is a lossless audio compressor which promises compression performance similar to [[Monkey's Audio]] “High” and decompression speed similar to [[Free Lossless Audio Codec|FLAC]].

=== Features ===
* High compression
* Fast compression and decompression speed
* Streaming support (necessary headers for decompressing the audio are written to the stream every 2 seconds)
* Piping support for encoding
* Error tolerance (single bit error will never affect more than 250 ms)
* Error detection (each frame protected by a 24-bit checksum (CRC))
* High-resolution (up to 24-bit/channel) audio support
* Support for up to 192 kHz audio
* Seeking without seek table
* APEv2 tags supported at end of file

=== Pros ===
* Fast encoding speed (while providing better compression, TAK encodes as fast as [[Free Lossless Audio Codec|FLAC]] -8 in its “Insane” mode and several times faster in its “Turbo” mode)
* Fast decompression speed (on par with FLAC / [[WavPack]])
* Good compression levels (on par with [[Monkey's Audio]] High)
* Error Robustness
* Fast Seeking

=== Cons ===
* Closed Source
* No hardware support

== Software support ==
=== Windows ===
* [http://thbeck.de/Download/TAK_2.3.0.zip TAK 2.3.0] - Official release which consists of a CLI, a GUI, a [[Winamp]] plugin, the SDK, and the decoding library.
* [http://www.foobar2000.org/components/view/foo_input_tak TAK Decoder 0.4.7] - Plugin for [[foobar2000]] (supports tagging and [[ReplayGain]]).
* [http://www.liviocavallo.altervista.org/ dsfTAKSource 0.0.1.6] - DirectShow source filter which uses the official decoding library to play TAK files in Windows Media Player, Media Player Classic - Home Cinema, Zoom Player and the like.
* [http://reino.degeelebosch.nl/ DC-Bass Source Mod 1.5.2.0] - DirectShow source filter which uses the official decoding library to play TAK-files, amongst many others, in any DirectShow media player (as mentioned above).
* [http://code.google.com/p/lavfilters/ LAV Filters] - Set of open-source DirectShow filters which uses [http://www.hydrogenaudio.org/forums/index.php?showtopic=96976&view=findpost&p=810355 FFMpeg's reverse-engineered decoder] to play TAK-files in any DirectShow media player.
* [http://sourceforge.net/projects/mpcbe/ Media Player Classic - BE] - DirectShow media player with an internal TAK source filter which uses FFMpeg's reverse-engineered decoder to play TAK-files. The internal TAK source filter also supports embedded cue-sheets.
* [[Mp3tag]] – universal tag editor with support for TAK
* [http://etree.org/shnutils/shntool/ shntool] (since version 3.0.6)

=== Linux ===
* ffmpeg can demux, decode and parse TAK since commit d7a473926504e2acfa6ae3bead0938e1f4e03441:[http://git.videolan.org/?p=ffmpeg.git;a=commit;h=d7a473926504e2acfa6ae3bead0938e1f4e03441]. First official release that supports TAK decoding is 1.1.
* The GUI program (Tak.exe) and the command-line program (Takc.exe) work with [http://www.winehq.org/ Wine].
* [http://caudec.net/ caudec] is a command-line tool that can encode and decode TAK files, using the official binary (Takc.exe) with Wine (see: [http://caudec.net/documentation/windowscodecs/ installation instructions]). Caudec can also test file integrity and compute (and tag) Replaygain data. While it hasn't been tested at the time of writing, it is possible that TAK support in caudec works on OS X as well.

== Hardware support ==
* None

== Recommended Settings ==
* Default compression: “-p2” (formerly ''Normal'') is the most attractive setting, providing an excellent compromise between compression and encoding speed. (At compression levels close to [[Monkey's Audio]] High (<0.4% difference), it is able to encode more quickly.)
takc -e [input file]
* Highest compression: “-pMax” (same as -p4m) (This will create files which are comparable in size to files created using [[Monkey's Audio]] High. Decompression speed is comparable to [[WavPack]] Normal.)
takc -e -pMax [input file]
* Fastest compression: “-p0” (This will create files which are comparable in size to [[Monkey's Audio]] Fast or [[WavPack]] High. Decompression speed is comparable to [[Free Lossless Audio Codec|FLAC]] 0.)
takc -e -p0 [input file]

=== TAK Performance Graph ===
[[Image:TAK_performance_graph_1-0-4.png|frame|center|Graph showing encoding and decoding rate against compression, using data from Synthetic Soul's test on TAK 1.0.4 (see [[TAK#External Links|External Links]])]]

== Using TAK ==
=== TAK with [[foobar2000]] ===
* Copy the takc.exe to your [[foobar2000]] directory
* Go to File → Preferences → Tools → Converter
* Set it up as shown:
[[Image:Tak_foobar_converter.png|frame|center|Screenshot of foobar 0.9.5 Converter settings for TAK 1.0.3]]
'''Note:''' replace -p2 with the desired compression level.

* TAK introduced encoding from STDIN in version 1.0.3, eliminating the need for a temporary file and greatly improving overall compression time. If you are using an earlier version of TAK use the following command line instead:
-e -p2 %s %d
* Use [[APEv2 specification|APEv2]] tagging (will be used as internal tagging)

=== TAK with EAC ===
Please read the [[EAC and TAK|wiki guide]], which details how to create TAK files with [[Exact Audio Copy|EAC]].

=== Converting TAK using pipe ===
Takc.exe -d input.tak - | lame.exe -V 6 - output.mp3
Takc.exe -d input.tak - | opusenc.exe --bitrate 64 - output.opus
Takc.exe -d input.tak - | flac.exe -8 - -o output.flac
Takc.exe -d input.tak - | wavpack.exe -hhx - output.wv

flac.exe -dc input.flac | Takc.exe -e -pMax - output.tak
wvunpack.exe input.wv - | Takc.exe -e -pMax - output.tak
ffmpeg.exe -i input.xxx -f wav - | Takc.exe -e -pMax '''-ihs''' - output.tak

== Future Features ==
* Unicode support
* MD5 audio checksums for verification and identification
* A German version
* Embedded cue sheets
* Embedded cover art
* Multichannel audio

== Frequently Asked Questions ==
; Is the codec safe for use/definitely lossless?
: Yes, TAK is verified as being lossless, as determined through rigorous testing by the author and satisfied users. To check, convert a WAV to TAK and back and compare the two, for instance using [[Foobar2000:Foobar2000|foobar2000]]'s [[Foobar2000:Components/Binary Comparator (foo_bitcompare)|Binary Comparator]].
; Why should I use TAK?
: TAK offers high ratios of compression but also great decoding speeds.
; What can I compress with TAK?
: TAK 1.0 can compress any integer-format (up to 24 bits per channel) PCM RIFF WAVE file (.WAV). Piping support is implemented as of v1.0.3, so converting lossless files to WAV first is not necessary: users can simply pipe the decompressed output from their decoder of choice directly into TAK's encoder.
; What about hardware support?
: There is none at the moment. However, ''-p0'', ''-p1'' and ''-p2'' are the most likely candidates for hardware-friendly settings.
; Will the source be opened?
: The official encoder and decoder are currently closed-source. Thomas has expressed an intention to open the source of the decoder at some point in time, stipulating preconditions of its first being further refined, ported to C or C++, and documented. This may or may not lead to releases of other code. However, as of June of 2013, he feels that “a lot of (not very exciting) work is required” until the decoding source would be ready to be published, and that may or may not happen in the foreseeable future. Such questions generally generate more noise than fruitful discussion, so it is best to wait and see what happens. In any case, there is an independently implemented open source decoder available, bundled with ffmpeg.
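
The round-trip check suggested in the first answer above (convert a WAV to TAK and back, then compare) can also be scripted. The following is only a sketch using Python's standard <code>wave</code> module; the function name <code>same_audio</code> and the chunk size are made up for illustration, and it assumes both files are plain PCM WAV files that the module can read. It compares the audio format and the sample data, ignoring any other RIFF chunks.

```python
import wave


def same_audio(path_a, path_b, chunk_frames=65536):
    """Return True if two WAV files hold identical PCM audio."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        format_a = (a.getnchannels(), a.getsampwidth(),
                    a.getframerate(), a.getnframes())
        format_b = (b.getnchannels(), b.getsampwidth(),
                    b.getframerate(), b.getnframes())
        if format_a != format_b:
            return False
        # Compare the sample data piece by piece to keep memory use low.
        while True:
            frames_a = a.readframes(chunk_frames)
            frames_b = b.readframes(chunk_frames)
            if frames_a != frames_b:
                return False
            if not frames_a:  # both files are exhausted
                return True
```

Calling <code>same_audio("original.wav", "decoded.wav")</code> on the source file and the file decoded from the TAK should return <code>True</code> if the round trip was lossless.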

== External Links ==
* [http://thbeck.de/Tak/Tak.html thbeck.de/Tak/Tak.html] – Official Website ''(german)''
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=101386 TAK 2.3.0 Discussion Thread on HA] ''(english)''
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=89610 TAK 2.2.0 Discussion Thread on HA] ''(english)''
* [http://synthetic-soul.co.uk/comparison/lossless/ synthetic-soul.co.uk/comparison/lossless] – Comparison with Other Codecs (by Synthetic Soul)
* [http://flac.sourceforge.net/comparison.html flac.sourceforge.net/comparison.html] – An Updated Comparison (from FLAC Homepage)

[[Category:Lossless]]
[[Category:Encoder/Decoder]]

LossyWAV

2013-06-04T17:47:09Z

Skamp: added caudec to the list of Linux software that supports lossyWAV

{{Software Infobox
| name = lossyWAV
| logo =
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.3.0
| preview_release = <none>
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 1.3.0 release thread] [http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 1.3.0 development thread]
}}
lossyWAV is a [[Wikipedia:Free software|free]], [[lossy]] pre-processor for [[PCM]] audio contained in the [[RIFF_WAVE|WAV]] file format. Proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson], it reduces [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, reduces the bitrate of the encoded file significantly compared to unpreprocessed compression.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyWAV is based on the lossyFLAC idea proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson] at Hydrogenaudio: a method of carefully reducing the bit depth of blocks of samples so that the FLAC lossless encoder can make use of its wasted bits feature. The aim is to reduce audio bit depth transparently by setting some of the least significant bits ([[Wikipedia:Least_significant_bit|lsb]]s) to zero, taking advantage of FLAC's detection of consistently zeroed least significant bits within each frame and thereby significantly increasing coding efficiency.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded using the same codec (which may be all-important from a hardware compatibility perspective) at a reduced bitrate compared to the lossless version.

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick Currie] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (Many thanks [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer!!) with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] Assembly Language for speed.

Subsequently, lossyFLAC proved itself to work with other lossless codecs, so the application name was changed to lossyWAV.

Since then, Nick has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 Horst Albrecht] at Hydrogenaudio. Although the current lossyWAV implementation has built on David's original method, the method itself still very much belongs to its author.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a pure variable bit-depth pre-processor: the overall sample size remains the same after processing, but the number of significant bits used for the samples in a codec-block can change on a block-by-block basis. Bits-to-remove from the audio data are calculated for each codec-block (codec-block length = 512 samples, 11.6 ms at 44.1 kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier Transform]] (FFT) analyses of at least two lengths (the default quality preset, --quality standard, uses 32, 64 and 1024 [[Wikipedia:Sampling %28signal processing%29|samples]]). After some manipulation, the results of the FFT analyses for a specific codec-block are grouped and the minimum value is used to determine bits-to-remove for the whole codec-block. Bit removal adds noise to the output; the level of added noise associated with removing a given number of bits has been pre-calculated, and the number of bits to remove depends on the level of the noise floor of the codec-block in question. The added noise is adaptively shaped by default, but the user can select parameters to make the added noise fixed shaped or simply [[Wikipedia:white noise|white noise]]. Each sample in the codec-block is then rounded such that the lowest <bits-to-remove> lsbs are zero. In this way the wasted bits feature of [[FLAC]] et al. is exploited.
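
As a rough illustration of the final rounding step only (not of the FFT analysis or the noise shaping), the Python sketch below zeroes the lowest bits of every sample in a block. The function name <code>remove_bits</code>, the round-to-nearest rule and the simple clamping of values that would overflow are assumptions made for this example; lossyWAV's own rounding and clip handling (see --maxclips) may differ.

```python
# Duration of one 512-sample codec-block at 44.1 kHz, in milliseconds.
BLOCK_MS = 512 / 44100 * 1000  # roughly 11.6 ms


def remove_bits(block, bits_to_remove, bits_per_sample=16):
    """Round each sample to the nearest multiple of 2**bits_to_remove."""
    if bits_to_remove <= 0:
        return list(block)
    half = 1 << (bits_to_remove - 1)
    lowest = -(1 << (bits_per_sample - 1))
    highest = (1 << (bits_per_sample - 1)) - 1
    # Largest representable value whose removed bits are all zero.
    highest_rounded = (highest >> bits_to_remove) << bits_to_remove
    out = []
    for sample in block:
        # Add half a step, then clear the low bits (arithmetic shift floors).
        rounded = ((sample + half) >> bits_to_remove) << bits_to_remove
        # Clamp instead of wrapping around (an assumption for this sketch).
        out.append(max(lowest, min(highest_rounded, rounded)))
    return out
```

After this step every sample in the block is a multiple of 2<sup>bits-to-remove</sup>, which is the property that the wasted bits detection of the lossless codec relies on.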

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set (16 bit / 44.1kHz)
!Codec
!lossless
!--insane
!--extreme
!--high
!--standard
!--economic
!--portable
!--extraportable
|-
!10 Album Test Set
| FLAC
| 854 kbit/s
| 627 kbit/s
| 548 kbit/s
| 477 kbit/s
| 442 kbit/s
| 407 kbit/s
| 353 kbit/s
| 311 kbit/s
|-
!Nick.C's Full Collection
| FLAC
| 882 kbit/s
| -
| -
| -
| -
| -
| -
| 307 kbit/s
|}

==File identification==
lossyWAV-processed WAV files are named with a double filename extension, .lossy.wav, to make them instantly identifiable. For example, ".lossy.flac" would indicate an audio file which was processed using lossyWAV and subsequently encoded using FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter is used when processing to create a correction file which is named with the .lwcdf.wav double filename extension. When "added" to the corresponding .lossy.wav, using the --merge parameter, the original file will be reconstituted.
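
The relationship between the three files can be pictured with a small Python sketch. It assumes that the correction file simply holds the sample-by-sample difference between the original and the processed audio, which is what "adding" it back implies; the function names are hypothetical and the real file handling is not shown.

```python
def make_correction(original, lossy):
    """Correction samples: what must be added to lossy to get original."""
    if len(original) != len(lossy):
        raise ValueError("sample counts differ")
    return [o - l for o, l in zip(original, lossy)]


def merge(lossy, correction):
    """Add the correction samples back onto the lossy samples."""
    if len(lossy) != len(correction):
        raise ValueError("sample counts differ")
    return [l + c for l, c in zip(lossy, correction)]


original = [1000, 1003, 1004, -5]
lossy = [1000, 1000, 1008, -8]      # e.g. after removing 3 bits
correction = make_correction(original, lossy)
restored = merge(lossy, correction)  # equals original again
```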

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Known supported codecs|known supported codecs]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. This takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, the date and time, and the user settings are recorded. Additionally, if a lossyWAV 'fact' chunk is found in an input file, processing is halted (exit code = 16) to prevent re-processing of an already processed file.

The --check parameter can be used to determine whether a file has previously been processed without attempting to process it: the exit code is 16 if the file has already been processed and 0 if it has not.
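
A similar check can be sketched outside lossyWAV by walking the RIFF chunks of a file. The Python example below is not lossyWAV's own code: the function name is hypothetical, and it assumes that the chunk payload begins with the text "lossyWAV" (the slashes in the form shown above are taken to be notation only) and that the chunk appears before the 'data' chunk.

```python
import struct


def find_lossywav_fact(path):
    """Return the lossyWAV 'fact' text of a WAV file, or None if absent."""
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12 or header[:4] != b"RIFF" or header[8:] != b"WAVE":
            raise ValueError("not a RIFF WAVE file")
        while True:
            chunk_header = f.read(8)
            if len(chunk_header) < 8:
                return None  # end of file, nothing found
            chunk_id, size = struct.unpack("<4sI", chunk_header)
            if chunk_id == b"data":
                return None  # assumed: the chunk precedes the audio data
            if chunk_id == b"fact":
                payload = f.read(size)
                if payload.startswith(b"lossyWAV"):
                    return payload.rstrip(b"\x00").decode("ascii", "replace")
            else:
                f.seek(size, 1)
            if size % 2:
                f.seek(1, 1)  # RIFF chunks are padded to an even length
```

A return value other than <code>None</code> corresponds to the case in which lossyWAV itself would exit with code 16.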

==Quality presets==
*--quality insane: (-q I or -q 10) Highest quality preset, generally considered to be excessive;
*--quality extreme: (-q E or -q 7.5) Higher quality preset, disc space-saving alternative to lossless archiving for large audio collections, considered to be suitable for transcoding to other lossy codecs;
*--quality high: (-q H or -q 5.0) High quality preset, midway between extreme and standard;
*--quality standard: (-q S or -q 2.5) Default preset, generally accepted to be transparent;
*--quality economic: (-q C or -q 0.0) Intermediate preset midway between standard and portable;
*--quality portable: (-q P or -q -2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
*--quality extraportable: (-q X or -q -5.0) Lowest quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]

All tuning for version 1.0.0 was performed on quality preset --standard, with higher presets being more conservative. For versions 1.1.0, 1.2.0 and 1.3.0, tuning effort has been focused on the lowest quality preset in an effort to achieve an effective compromise between resultant bitrate and perceived quality. Quality preset --standard is generally accepted to be (and from testing so far is) transparent. If you find a track for which --standard fails to achieve transparency after processing, please post a sample (no more than 30 seconds) in the development thread.

The upper frequency limit used in the calculation of minimum signal power varies with the quality preset, in the range 15.159 kHz to 16.682 kHz.

==Supported input formats==
*[[WAV]]: 9-bit to 32-bit integer; 1 to 8 channels; sample rate ≥ 32kHz [[Pulse Code Modulation|PCM]]. Very high sample rates (>48kHz) have not been extensively tested. Tunings have been focussed on 16-bit, 44.1kHz samples (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512 --'''merge-blocks'''
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|}

* Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name.

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] — so-called "Bit Shifting" — to suggest that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested due to prohibitive prices of encoders. At least one [http://www.hydrogenaudio.org/forums/index.php?showtopic=98609&hl= commercial DVD-A] uses constant bit-depth reduction with lower bit-depth on rear channels.

A comparison of portable media players is [[Wikipedia:Comparison of portable media players#Audio Formats|here]], which shows FLAC and WMA Lossless compatibility among listed players.
Any player supported by [http://www.rockbox.org Rockbox] can use FLAC or WavPack files after installing Rockbox.
===Important note===
'''NB: when encoding using a lossless codec, please ensure that the block size of the lossless codec matches that of lossyWAV (default = 512 samples). If this is not done then the lossless encoding of the processed WAV file will (almost certainly) be larger than it would otherwise have been. To match the block size, add the "Encoder parameters" from the table above to the command line of the lossless codec in question.'''
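
For scripted encoding, the parameters from the table can be kept in one place so that the block size is never forgotten. The Python sketch below only assembles an argument list; the helper name <code>encoder_parameters</code> is hypothetical, and the rest of the command line for each encoder (executable name, input and output files) is deliberately left out because it differs between encoders.

```python
# Block-size parameters copied from the codec compatibility table above.
ENCODER_PARAMETERS = {
    "FLAC": ["-5", "-b", "512", "--keep-foreign-metadata"],
    "LPAC": ["-b512"],
    "MPEG-4 ALS": ["-l", "-n512"],
    "TAK": ["-fsl512"],
    "WavPack": ["--blocksize=512", "--merge-blocks"],
    "WMA Lossless": [],  # the table lists no extra parameters
}


def encoder_parameters(codec):
    """Return the table's parameters for a supported lossless codec."""
    try:
        return list(ENCODER_PARAMETERS[codec])
    except KeyError:
        raise ValueError(f"{codec!r} is not listed as supported") from None
```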
===Bonus feature===
Another, possibly not obvious, feature of lossyWAV is that the processed output can be "transcoded" from one lossless codec to another lossless codec with absolutely no loss of quality whatsoever. This is solely due to the fact that lossyWAV output is designed to be losslessly encoded - something that lossless codecs do very well indeed.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.3.0, Copyright (C) 2007-2011 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Process Description:

lossyWAV is a near lossless audio processor which dynamically reduces the
bitdepth of the signal on a block-by-block basis. Bitdepth reduction adds noise
to the processed output. The amount of permissible added noise is based on
analysis of the signal levels in the default frequency range 20Hz to 16kHz.

If signals above the upper limiting frequency are at an even lower level, they
can be swamped by the added noise. This is usually inaudible, but the behaviour
can be changed by specifying a different --limit (in the range 10kHz to 20kHz).

For many audio signals there is little content at very high frequencies and
forcing lossyWAV to keep the added noise level lower than the content at these
frequencies can increase the bitrate dramatically for no perceptible benefit.

The noise added by the process is shaped using an adaptive method provided by
Sebastian Gesemann. This method, as implemented in lossyWAV, aims to use the
signal itself as the basis of the filter used for noise shaping. Adaptive noise
shaping is enabled by default.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <t> where t is one of the following (default = standard):
I, insane highest quality output, suitable for transcoding;
E, extreme higher quality output, suitable for transcoding;
H, high high quality output, suitable for transcoding;
S, standard default quality output, considered to be transparent;
C, economic intermediate quality output, likely to be transparent;
P, portable good quality output for DAP use, may not be transparent;
X, extraportable lowest quality output, not fully transparent.

Standard Options:

-C, --correction write correction file for processed WAV file; default=off.
-f, --force forcibly over-write output file if it exists; default=off.
-h, --help display help.
-L, --longhelp display extended help.
-M, --merge merge existing lossy.wav and lwcdf.wav files.
-o, --outdir <t> destination directory for the output file(s).
-v, --version display the lossyWAV version number.
-w, --writetolog create (or add to) lossyWAV.log in the output directory.

Advanced Options:

- take WAV input from STDIN.
-c, --check check if WAV file has already been processed; default=off.
errorlevel=16 if already processed, 0 if not.
-q, --quality <n> quality preset (-5.0<=n<=10.0); (-5=lowest, 10=highest;
default=2.5; I=10; E=7.5; H=5; S=2.5; C=0; P=-2.5; X=-5).
--, --stdout write WAV output to STDOUT.
--stdinname <t> pseudo filename to use when input from STDIN.

Advanced Quality Options:

-A, --adaptive <n/t> modify settings for Sebastian Gesemann's adaptive noise
shaping method. takes a parameter to set the order of the
FIR filter, (32<=n<=96; default=64; multiple of 8 only);
"OFF" to disable adaptive shaping; "NOWARP" to disable
default frequency warping;
-a, --analyses <n> set number of FFT analysis lengths, (2<=n<=6; default=3,
i.e. 32, 64 & 1024 samples. n=2, remove 32 sample FFT;
n>3 add 512; n>4, add 256; n>6, add 128) nb. FFT lengths.
stated are for 44.1/48kHz audio, higher sample rates will
automatically increase all FFT lengths as required.
-l, --limit <n> set upper frequency limit to be used in analyses to n Hz;
(10000<=n<=20000; default=16000).
--linkchannels revert to original single bits-to-remove value for all
channels rather than channel dependent bits-to-remove.
--maxclips <n> set max. number of acceptable clips per channel per block;
(0<=n<=16; default=3,3,3,3,3,2,2,2,2,2,1,1,1,0,0,0).
-m, --midside analyse 2 channel audio for mid/side content.
--nodccorrect disable DC correction of audio data prior to FFT analysis,
default=on; (DC offset calculated per FFT data set).
--scale <n> factor to scale audio by; (0.0625<n<=8.0; default=1).
-s, --shaping [n] enable fixed noise shaping, takes optional parameter [n]
to allow user defined shaping proportion (0.0<=n<=1.0),
otherwise default to quality setting dependent value.
Disables adaptive noise shaping.
--static <n> set minimum-bits-to-keep-static to n bits (default=6;
7<=n<=28, limited to bits-per-sample - 4).
-U, --underlap <n> enable underlap mode to increase number of FFT analyses
performed at each FFT length, (n = 2, 4 or 8, default=2).

Output Options:

--bitdist show distrubution of bits to remove.
--blockdist show distribution of lowest / highest significant bit of
input codec-blocks and bit-removed codec-blocks.
-d, --detail enable per block per channel bits-to-remove data display.
-F, --freqdist enable frequency analysis display of input data.
-H, --histogram show sample value histogram (input, lossy and correction).
--longdist show long frequency distribution data (input/lossy/lwcdf).
--perchannel show selected distribution data per channel.
-p, --postanalyse enable frequency analysis display of output and
correction data in addition to input data.
--sampledist show distribution of lowest / highest significant bit of
input samples and bit-removed samples.
--spread [full] show detailed [more detailed] results from the spreading/
averaging algorithm.
-W, --width <n> select width of output options (79<=n<=255).

System Options:

-B, --below set process priority to below normal.
--low set process priority to low.
-N, --nowarnings suppress lossyWAV warnings.
-Q, --quiet significantly reduce screen output.
-S, --silent no screen output.

Special thanks go to:

David Robinson for the publication of his lossyFLAC method, guidance, and
the motivation to implement his method as lossyWAV.

Horst Albrecht for ABX testing, valuable support in tuning the internal
presets, constructive criticism and all the feedback.

Sebastian Gesemann for the adaptive noise shaping method and the amount of
help received in implementing it and also for the basis of
the fixed noise shaping method.

Matteo Frigo and for libfftw3-3.dll contained in the FFTW distribution
Steven G Johnson (v3.2.1 or v3.2.2).

Mark G Beckett for the Delphi unit that provides an interface to the
(Univ. of Edinburgh) relevant fftw routines in libfftw3-3.dll.

Don Cross for the Complex-FFT algorithm originally used.</pre>

===Example drag 'n' drop batch file===
Drag one or more FLAC files onto this batch file. Each file is decoded, processed with lossyWAV and re-encoded as FLAC, and ALL of the tags are copied from the input FLAC file; the output lossyFLAC file is placed in the same directory as the input FLAC file. Requires flac.exe, lossyWAV.exe and [http://www.synthetic-soul.co.uk/tag/ tag.exe] to be somewhere on the path.
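<pre>@echo off
rem Handle every file dropped onto this batch file, one argument at a time.
:repeat
if "%~1"=="" goto end
rem %~1 strips any surrounding quotes, so paths with spaces are quoted only once.
rem Decode, process with lossyWAV, re-encode with 512-sample blocks, then copy the tags.
if exist "%~1" flac -d "%~1" --stdout --silent|lossywav - --stdout --quality standard --stdinname "%~1"|flac - -b 512 -o "%~dpn1.lossy.flac" --silent && tag --fromfile "%~1" "%~dpn1.lossy.flac"
shift
goto repeat
:end</pre>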

===lossyWAV and FFTW===
Since version 1.2.0, lossyWAV has been compatible with [[Wikipedia:FFTW|FFTW]], although it does not depend on it. To take advantage of the increased processing speed available from FFTW's faster FFT implementations, place libfftw3-3.dll in a directory that is on the path of the host computer.

===Linux / OS X support: lossyWAV and WINE===
The cause of lossyWAV's WINE incompatibility was found and removed during the development of 1.2.0 and retrospectively amended for 1.1.0b in a maintenance release (1.1.0c). The latest stable version (1.3.0 at the time of writing) is fully supported.

[http://caudec.outpost.fr/ caudec] is a command-line tool that can encode and decode lossyWAV files (lossyFLAC, lossyWV, lossyTAK), using the official binary (lossyWAV.exe) with Wine (see: [http://caudec.outpost.fr/documentation/windowscodecs/ installation instructions]). caudec can also test file integrity and compute (and tag) [[ReplayGain]] data. While it hasn't been tested at the time of writing, it is possible that lossyWAV support in caudec works on OS X as well.

===lossyWAV and [[foobar2000]]===
Example [[foobar2000]] converter settings:

lossyFLAC settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.flac
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\flac - -b 512 -5 -f -o%d --ignore-chunk-sizes
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyTAK settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.tak
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWV settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.wv
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWMALSL* settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.wma
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\wmaencode - %d --codec lsl --ignorelength
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

Enclose any element of the path that contains spaces within double quotation marks ("), e.g. C:\"Program Files"\directory_where_executable_is\executable_name. This is necessary because the Windows command interpreter otherwise treats a space as a separator between arguments.
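The quoting rule can be illustrated with a small Python helper. The function name <code>quote_path_elements</code> is hypothetical and the sketch only covers the convention used in the settings above, namely wrapping each backslash-separated element that contains a space.

```python
def quote_path_elements(path):
    """Wrap each backslash-separated element that contains a space in quotes."""
    parts = path.split("\\")
    return "\\".join(f'"{p}"' if " " in p else p for p in parts)


print(quote_path_elements(r"C:\Program Files\bin\lossywav"))
# prints: C:\"Program Files"\bin\lossywav
```

Elements without spaces are returned unchanged, so the helper can be applied to any executable path before it is pasted into the Parameters field.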

lossyWMALSL conversion uses WMAEncode.exe by lvqcl found [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=90519&view=findpost&p=767754 here].

===lossyWAV and EAC===
:''For example settings, see [[EAC and LossyWAV]].''

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor and not a codec. No decoding is required for any program to play a WAV file which has been processed with lossyWAV as it remains compliant with the RIFF WAVE format.

*'''Question:''' Why create a processor which means that I cannot be sure that a lossless file is truly lossless?
*'''Answer:''' Unless one creates the lossless file personally, one can '''never''' be completely sure that the file is indeed lossless. For example, a lossless file you receive could have been transcoded from [[MP3]] without your knowledge. To distinguish a file processed with lossyWAV from lossless files, it is recommended to use the extension .lossy.EXT, where EXT is the original extension, e.g. .lossy.flac.

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' At preset --standard, almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Will it low-pass filter my audio?
*'''Short answer:''' No. The frequency limit is for the analysis only. lossyWAV cannot low-pass filter your audio.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact]]s
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.
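The .lossy.EXT naming convention recommended in the answers above is easy to apply and to check in a script. The following Python sketch uses hypothetical helper names; note that it only inspects the file name, which says nothing about the audio content itself.

```python
from pathlib import PurePath


def lossy_name(filename):
    """Insert '.lossy' before the final extension: a.flac -> a.lossy.flac."""
    p = PurePath(filename)
    return str(p.with_name(p.stem + ".lossy" + p.suffix))


def is_marked_lossy(filename):
    """True when the suffix before the final extension is '.lossy'."""
    suffixes = PurePath(filename).suffixes
    return len(suffixes) >= 2 and suffixes[-2].lower() == ".lossy"
```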

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522 Original lossyFLAC thread] - Introduction of the concept by David Robinson (Replay Gain developer) and initial development
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=96635 lossyWAV 1.3.1 Delphi to C++ translation thread]
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 lossyWAV 1.3.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 lossyWAV 1.3.0 release thread] - Release of version 1.3.0 on 06 August 2011
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=65499 lossyWAV 1.2.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=77042 lossyWAV 1.2.0 release thread] - Release of version 1.2.0 on 16 December 2009
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 lossyWAV 1.1.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=64617 lossyWAV 1.1.0 release thread] - Release of version 1.1.0 on 12 July 2008
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 lossyWAV Development thread] - Conversion of the original MATLAB script to Delphi and evolution of the method
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0 release thread] - Release of version 1.0.0b on 12 May 2008

[[Category:Software]]

TAK

2013-06-04T17:38:50Z

Skamp: added caudec to the list of Linux software that supports TAK

{{Codec Infobox
| name = Tom's lossless Audio Kompressor
| logo =
| type = lossless
| purpose = lossless audio compression.
| maintainer = Thomas Becker
| recommended_encoder = TAK encoder
| recommended_text = TAK v2.2.0
| website = [http://thbeck.de/Tak/Tak.html ThBeck.de/Tak/Tak.html] ''(German)''
}}

== Description ==
'''Tom's lossless Audio Kompressor''' ('''TAK''') is a lossless audio compressor which promises compression performance similar to [[Monkey's Audio]] “High” and decompression speed similar to [[Free Lossless Audio Codec|FLAC]].

=== Features ===
* High compression
* Fast compression and decompression speed
* Streaming support (necessary headers for decompressing the audio are written to the stream every 2 seconds)
* Piping support for encoding
* Error tolerance (single bit error will never affect more than 250 ms)
* Error detection (each frame protected by a 24-bit checksum (CRC))
* High-resolution (up to 24-bit/channel) audio support
* Support for up to 192 kHz audio
* Seeking without seek table
* APEv2 tags supported at end of file

=== Pros ===
* Fast encoding speed (while providing better compression, TAK's “Insane” mode encodes as fast as [[Free Lossless Audio Codec|FLAC]] -8, and its “Turbo” mode is several times faster)
* Fast decompression speed (on par with FLAC / [[WavPack]])
* Good compression levels (on par with [[Monkey's Audio]] High)
* Error Robustness
* Fast Seeking

=== Cons ===
* Closed Source
* No hardware support
* Limited software support

== Software support ==
=== Windows ===
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=89610 TAK 2.2.0] (official release which consists of a CLI, a GUI, a [[Winamp]] plugin, the SDK, and the decoding library)
* [http://foosion.foobar2000.org/components/ TAK Decoder 0.4.4] - plugin for [[foobar2000]] (supports tagging and [[ReplayGain]])
* [http://www.liviocavallo.altervista.org/ dsfTAKSource 0.0.1.6] - DirectShow source filter to play TAK files in Windows Media Player, Media Player Classic - Home Cinema, Zoom Player and the like
* [http://reino.degeelebosch.nl/ DC-Bass Source Mod 1.5.0.0] - DirectShow source filter to play TAK files, among many others, in any DirectShow media player (as mentioned above)
* [[Mp3tag]] – universal tag editor with support for TAK
* [http://etree.org/shnutils/shntool/ shntool] (since version 3.0.6)

=== Linux ===
* ffmpeg can demux, decode and parse TAK since commit d7a473926504e2acfa6ae3bead0938e1f4e03441:[http://git.videolan.org/?p=ffmpeg.git;a=commit;h=d7a473926504e2acfa6ae3bead0938e1f4e03441]. The first official release that supports TAK decoding is 1.1.
* The GUI program (Tak.exe) and the command-line program (Takc.exe) work with [http://www.winehq.org/ Wine].
* [http://caudec.outpost.fr caudec] is a command-line tool that can encode and decode TAK files, using the official binary (Takc.exe) with Wine (see: [http://caudec.outpost.fr/documentation/windowscodecs/ installation instructions]). caudec can also test file integrity and compute (and tag) [[ReplayGain]] data. While it hasn't been tested at the time of writing, it is possible that TAK support in caudec works on OS X as well.

== Hardware support ==
* None

== Recommended Settings ==
* Default compression: “-p2” (formerly ''Normal'') is the most attractive setting, providing an excellent compromise between compression and encoding speed. (It compresses to within 0.4% of [[Monkey's Audio]] High while encoding more quickly.)
 takc -e [input file]
* Highest compression: “-pMax” (same as -p4m). (This will create files which are comparable in size to files created using [[Monkey's Audio]] High. Decompression speed is comparable to [[WavPack]] Normal.)
 takc -e -pMax [input file]
* Fastest compression: “-p0”. (This will create files which are comparable in size to [[Monkey's Audio]] Fast or [[WavPack]] High. Decompression speed is comparable to [[Free Lossless Audio Codec|FLAC]] 0.)
 takc -e -p0 [input file]
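When TAK is driven from a script, the three recommendations above can be wrapped in a small helper. This is a sketch only: the function name and the goal labels ("default", "highest", "fastest") are hypothetical, while the switches are the ones listed above.

```python
PRESETS = {
    "default": None,     # no switch: takc applies -p2
    "highest": "-pMax",  # same as -p4m
    "fastest": "-p0",
}


def takc_encode_command(input_file, goal="default"):
    """Build the takc argument list for one of the recommended settings."""
    command = ["takc", "-e"]
    preset = PRESETS[goal]
    if preset is not None:
        command.append(preset)
    command.append(input_file)
    return command
```

The returned list can be passed directly to <code>subprocess.run</code>, provided takc is on the path (or is started through Wine on Linux).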

=== TAK Performance Graph ===
[[Image:TAK_performance_graph_1-0-4.png|frame|center|Graph showing encoding and decoding rate against compression, using data from Synthetic Soul's test on TAK 1.0.4 (see [[TAK#External Links|External Links]])]]

== Using TAK ==
=== TAK with [[foobar2000]] ===
* Copy takc.exe to your [[foobar2000]] directory
* Go to File → Preferences → Tools → Converter
* Set it up as shown:
[[Image:Tak_foobar_converter.png|frame|center|Screenshot of foobar 0.9.5 Converter settings for TAK 1.0.3]]
'''Note:''' replace -p2 with the desired compression level.

* TAK introduced encoding from STDIN in version 1.0.3, eliminating the need for a temporary file and greatly improving overall compression time. If you are using an earlier version of TAK, use the following command line instead:
 -e -p2 %s %d
* Use [[APEv2 specification|APEv2]] tagging (will be used as internal tagging)
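The choice between the two parameter forms can be expressed as a short Python sketch. The function name is hypothetical, and the STDIN form is an assumption modelled on the <code>takc -e ... -ihs - %d</code> parameters shown in the lossyTAK converter settings of the lossyWAV article; the temporary-file form is the one given above.

```python
def converter_parameters(tak_version, preset="-p2"):
    """Return foobar2000 converter parameters for a TAK version tuple."""
    if tak_version >= (1, 0, 3):
        # Encode from STDIN: "-" stands for the piped input, %d for the output.
        return f"-e {preset} -ihs - %d"
    # Older encoders need foobar2000 to write a temporary file (%s) first.
    return f"-e {preset} %s %d"


print(converter_parameters((1, 0, 2)))
# prints: -e -p2 %s %d
```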

=== TAK with EAC ===
Please read the [[EAC and TAK|wiki guide]], which details how to create TAK files with [[Exact Audio Copy|EAC]].

== Future Features ==
* Unicode support
* MD5 audio checksums for verification and identification
* A German version
* Embedded cue sheets
* Embedded cover art
* Multichannel audio

== Frequently Asked Questions ==
; Is the codec safe for use?
: Yes. To check, convert a WAVE file to TAK and back and compare the two files (or use [[foobar2000]]'s bit-compare tool).
; Why should I use TAK?
: TAK offers high compression ratios with great decoding rates.
; What can I compress with TAK?
: TAK 1.0 can compress any integer-format (up to 24 bits per channel) PCM RIFF WAVE file (.wav). Piping support has been implemented since v1.0.3, so converting lossless files to WAV first is not necessary.
; What about hardware support?
: None at the moment, although ''-p0'', ''-p1'' and ''-p2'' are candidates for hardware playback.
; Will the source be opened?
: Yes, TAK will be open-source, as soon as the code is ported to C or C++ and documented. However, Thomas has mentioned that he would like to improve the codec before opening the source.
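The round-trip check suggested in the first answer (convert a WAVE file to TAK and back, then compare) can be automated once both WAV files exist. The sketch below uses Python's standard <code>wave</code> module; the function name is hypothetical, and it compares only the audio format and sample data, not any extra RIFF chunks. It reads both files fully into memory, which is fine for single tracks.

```python
import wave


def same_audio(path_a, path_b):
    """Compare format and sample data of two PCM WAV files."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        format_a = (a.getnchannels(), a.getsampwidth(), a.getframerate())
        format_b = (b.getnchannels(), b.getsampwidth(), b.getframerate())
        if format_a != format_b or a.getnframes() != b.getnframes():
            return False
        return a.readframes(a.getnframes()) == b.readframes(b.getnframes())
```

A result of <code>True</code> for the original file and the decoded file indicates that the encode and decode cycle preserved every sample.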

== External Links ==
* [http://thbeck.de/Tak/Tak.html thbeck.de/Tak/Tak.html] – Official Website ''(German)''
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=89610 TAK 2.2.0 Discussion Thread on HA] ''(English)''
* [http://synthetic-soul.co.uk/comparison/lossless/ synthetic-soul.co.uk/comparison/lossless] – Comparison with Other Codecs (by Synthetic Soul)
* [http://flac.sourceforge.net/comparison.html flac.sourceforge.net/comparison.html] – An Updated Comparison (from FLAC Homepage)

[[Category:Lossless]]
[[Category:Encoder/Decoder]]