ReplayGain legacy metadata formats

From Hydrogenaudio Knowledgebase
Revision as of 03:16, 19 January 2011 by Notat (talk | contribs) (move from ReplayGain specification)

To allow for future expansion, if more than three values are stored, players should ignore those they do not recognise but process those that they do. If additional Replay Gain adjustments other than Track and Album are stored, they should come after Track and Album. The Peak Amplitude must always occupy the first 4 bytes of the Replay Gain header frame. The three values listed above (or at least fields to hold them, should the values themselves be unknown) are required in all Replay Gain headers.

Range

The replay gain adjustment must be between -51.0 dB and +51.0 dB. Values outside this range must be limited to the nearest end of the range; such values are almost certainly in error, and should probably be recalculated or stored as "not set". For example, trying to cause a silent 24-bit file to play at 83 dB will yield a replay gain adjustment of +57 dB.
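As a minimal sketch of the limiting rule above, the following Python function clamps a computed adjustment to the permitted range. The names `MAX_ADJUSTMENT_DB` and `clamp_gain` are illustrative and are not part of the specification.

```python
# Largest magnitude the specification allows for a replay gain adjustment.
MAX_ADJUSTMENT_DB = 51.0  # illustrative constant name

def clamp_gain(gain_db):
    """Limit a gain adjustment to the range -51.0 dB to +51.0 dB."""
    return max(-MAX_ADJUSTMENT_DB, min(MAX_ADJUSTMENT_DB, gain_db))

print(clamp_gain(57.0))   # 51.0  (the silent 24-bit file case)
print(clamp_gain(-12.5))  # -12.5 (already within range)
```

A writer that hits the limit may prefer to store "not set" instead, since the clamped value is unlikely to be meaningful.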

In practice, adjustment values from -23 dB to +17 dB are the likely extremes, and values from -18 dB to +2 dB are more usual.

Bit format

Each Replay Gain value should be stored in a Replay Gain Adjustment field consisting of two bytes (16 bits). Here are two example Replay Gain Adjustment fields:

Track gain adjustment

0 0 1 0 1 1 1 0 0 1 1 1 1 1 0 1
\___/ \___/ | \_______________/
  |     |   |         |        
name    |  sign       |        
code    |  bit        |        
        |             |        
   originator         |        
      code            |        
                 Replay Gain   
                  Adjustment   

Album gain adjustment

0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0
\___/ \___/ | \_______________/
  |     |   |         |
name    |  sign       |
code    |  bit        |
        |             |
   originator         |
      code            |
                 Replay Gain
                  Adjustment

In the above example, the Track Gain Adjustment is -12.5 dB, and was calculated automatically. The Album Gain Adjustment is +2.0 dB, and was set by the user.
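The two example fields can be checked with a short Python sketch that splits a 16-bit field into its four parts, following the layout in the diagrams (3-bit name code, 3-bit originator code, 1 sign bit, 9-bit adjustment, most significant bit first). The function names `split_field` and `gain_db` are hypothetical.

```python
def split_field(field):
    """Split a 16-bit field into (name code, originator code, sign bit, magnitude)."""
    name_code = (field >> 13) & 0b111        # top 3 bits
    originator_code = (field >> 10) & 0b111  # next 3 bits
    sign_bit = (field >> 9) & 0b1            # 1 = negative gain
    magnitude = field & 0x1FF                # low 9 bits, in tenths of a dB
    return name_code, originator_code, sign_bit, magnitude

def gain_db(field):
    """Return the signed adjustment in dB stored in a field."""
    _, _, sign_bit, magnitude = split_field(field)
    gain = magnitude / 10.0
    return -gain if sign_bit else gain

track = 0b0010111001111101
album = 0b0100100000010100
print(split_field(track), gain_db(track))  # (1, 3, 1, 125) -12.5
print(split_field(album), gain_db(album))  # (2, 2, 0, 20) 2.0
```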

Name code
000 = not set
001 = Track Gain Adjustment
010 = Album Gain Adjustment
other = reserved for future use

If space has been reserved for the Replay Gain in the file header, but no replay gain calculation has been carried out, then all bits (including the Name code) may be zero.

For each Replay Gain Adjustment field, if the name code = 000 (not set), then players should ignore the rest of that individual field.

For each Replay Gain Adjustment field, if the name code is an unrecognised value (i.e. not 001-Track or 010-Album), then players should ignore the rest of that individual field.

If no valid Replay Gain Adjustment fields are found (i.e. all name codes are either 000 or unknown), then the player should proceed as if the file contained no Replay Gain Adjustment information (see player requirements).

Originator code
000 = Replay Gain unspecified
001 = Replay Gain pre-set by artist/producer/mastering engineer
010 = Replay Gain set by user
011 = Replay Gain determined automatically, as described in Calculating (above)
other = reserved for future use

For each Replay Gain Adjustment field, if the name code is valid, but the Originator code is 000 (Replay Gain unspecified), then the player should ignore that Replay Gain adjustment field.

For each Replay Gain Adjustment field, if the name code is valid, but the Originator code is unknown, then the player should still use the information within that Replay Gain Adjustment field. This is because, even if we are unsure as to how the adjustment was determined, any valid Replay Gain adjustment is more useful than none at all.

If no valid Replay Gain Adjustment fields are found (i.e. all originator codes are 000), then the player should proceed as if the file contained no Replay Gain Adjustment information (see player requirements).
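The name code and originator code rules can be combined into one selection routine. The following Python sketch is one possible reading of those rules, not a reference implementation. The function name `usable_adjustments` and the shape of its return value are assumptions, and the range and illegal-value checks are left out for brevity.

```python
# Name codes a player recognises; all other values are ignored.
RECOGNISED_NAME_CODES = {0b001: "track", 0b010: "album"}

def usable_adjustments(fields):
    """Return a dict such as {"track": -12.5} for the fields a player should use."""
    result = {}
    for field in fields:
        name_code = (field >> 13) & 0b111
        originator_code = (field >> 10) & 0b111
        name = RECOGNISED_NAME_CODES.get(name_code)
        if name is None:
            continue  # name code 000 (not set) or unrecognised: ignore the field
        if originator_code == 0b000:
            continue  # originator unspecified: ignore the field
        # Unknown (reserved) originator codes are still used.
        gain = (field & 0x1FF) / 10.0
        result[name] = -gain if (field >> 9) & 1 else gain
    return result

print(usable_adjustments([0b0010111001111101, 0b0100100000010100]))
# {'track': -12.5, 'album': 2.0}
```

An empty result corresponds to the case where the player should proceed as if the file contained no Replay Gain Adjustment information.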

Sign bit
0 = positive gain (boost)
1 = negative gain (attenuation)

Replay Gain Adjustment

The magnitude of the adjustment in dB, multiplied by ten, is stored as a 9-bit unsigned integer; the + or - is stored separately in the "sign" bit. For example, -3.1 dB becomes 31 = 000011111.
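A writer could assemble a complete field along the following lines. This is a sketch under stated assumptions: the function name `encode_field` is hypothetical, rounding to the nearest tenth of a dB is a choice the text does not specify, and the sign bit is cleared when the magnitude rounds to zero so that the result never carries a set sign bit with a zero magnitude.

```python
def encode_field(name_code, originator_code, gain_db):
    """Pack a name code, originator code and gain in dB into a 16-bit field."""
    tenths = round(abs(gain_db) * 10)  # magnitude in tenths of a dB (assumed rounding)
    if tenths > 510:
        raise ValueError("adjustment outside -51.0 dB to +51.0 dB")
    # Assumption: clear the sign bit when the magnitude is zero.
    sign_bit = 1 if (gain_db < 0 and tenths != 0) else 0
    return ((name_code & 0b111) << 13) | ((originator_code & 0b111) << 10) \
        | (sign_bit << 9) | tenths

field = encode_field(0b001, 0b011, -3.1)
print(format(field & 0x1FF, "09b"))  # 000011111
print(format(field, "016b"))         # 0010111000011111
```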

Default Value

$00 $00 (0000000000000000) should be used where no Replay Gain has been calculated or set. This value will be interpreted by players in the same manner as a file without a Replay Gain field in the header (see player requirements).

The values of xxxyyy0000000000 (where xxx is any name code, and yyy is any originator code) are all valid, but indicate that the Replay Gain is to be left at 83 dB (0 dB Replay Gain Adjustment). These are not default values, and should only be used where appropriate (e.g. where the user, producer, or Replay Gain calculation has indicated that the correct Replay Gain is 83 dB).

Illegal Values

The values xxxyyy1000000000 are all illegal. If encountered, players should treat them in the same manner as $00 $00 (the default value).

The value $xx $ff is not illegal, but it would give a false synch value within an mp3 file. The problems this may cause should be investigated, and a solution (e.g. unsynchronisation) sought. Maybe this is a use for negative zero?
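The default and illegal bit patterns can be recognised with simple masks. The Python sketch below is illustrative only: the names `classify_field` and `second_byte_is_ff` are hypothetical, and it assumes the two bytes are stored most significant byte first, so that the second byte holds the low 8 bits of the adjustment.

```python
def classify_field(field):
    """Classify a 16-bit field as 'default', 'illegal' or 'valid'."""
    if field == 0x0000:
        return "default"  # $00 $00: no Replay Gain calculated or set
    if field & 0x03FF == 0x0200:
        return "illegal"  # sign bit set with zero magnitude (xxxyyy1000000000)
    return "valid"

def second_byte_is_ff(field):
    """True if the second byte is $ff (assumes most significant byte first)."""
    return field & 0xFF == 0xFF

print(classify_field(0b0010111000000000))     # illegal
print(classify_field(0b0010110000000000))     # valid (0 dB, set automatically)
print(second_byte_is_ff(0b0010110011111111))  # True
```

A player following the rule above would handle an "illegal" result exactly as it handles "default".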

Peak amplitude data format

Scanning the file for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored within the file header. This can be used to check if the required replay gain adjustment will cause the file to clip.
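The clipping check mentioned above can be sketched in a few lines of Python, using the standard conversion from decibels to a linear amplitude factor, 10^(dB/20). The function name `will_clip` is an assumption, and the peak is taken to be expressed relative to digital full scale (1.0).

```python
def will_clip(peak, gain_db):
    """Return True if applying gain_db to a file with this peak exceeds full scale."""
    scale = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude factor
    return peak * scale > 1.0

print(will_clip(0.9, 2.0))    # True:  0.9 * 1.259 is above full scale
print(will_clip(0.9, -12.5))  # False: attenuation cannot push the peak over 1.0
```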

Data Format

The maximum peak amplitude (a single value) should be stored as a 32-bit floating point number, where 1=digital full scale.

Uncompressed Files

Simply store the maximum absolute sample value held in the file (on any channel). The single sample value should be converted to a 32-bit float, such that digital full scale is equivalent to a value of 1.
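For signed integer PCM, the conversion might look like the following Python sketch. Several details are assumptions rather than statements from the text: full scale is taken as 2^(bits-1) (32768 for 16-bit audio), the float is packed big-endian, and the names `peak_amplitude` and `pack_peak` are hypothetical.

```python
import struct

def peak_amplitude(samples, bits_per_sample):
    """Return the largest absolute sample value, scaled so full scale is 1.0."""
    full_scale = 1 << (bits_per_sample - 1)  # assumed: 32768 for 16-bit PCM
    return max((abs(s) for s in samples), default=0) / full_scale

def pack_peak(peak):
    """Pack the peak as a 32-bit float (byte order is an assumption)."""
    return struct.pack(">f", peak)

peak = peak_amplitude([0, -16384, 8192], 16)
print(peak)                  # 0.5
print(len(pack_peak(peak)))  # 4
```

For multi-channel files, the samples from every channel would be passed in together, since a single value covers the whole file.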

Compressed files

Compressed audio does not exist as a waveform until it is decoded. Unfortunately, psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. However, it is likely that such values will be brought back within range after scaling by the replay level. Even so, it is necessary to store the peak value of a compressed file as a 32-bit floating-point representation, where +/-1 represent digital full scale, and values outside this range would usually clip.

Implementation

For uncompressed files, the maximum values must be found and stored. For compressed files, the files must be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom), and the maximum value stored.