Hydrogenaudio Knowledgebase - User contributions [en]

Opus

2018-02-22T15:47:09Z

Dynamic: /* Speech encoding quality */ Mention of VOIP mode signal modification

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.2.1
| preview_release = 1.2 rc1
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) and designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech. It is also very competitive as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps] and [http://listening-test.coresv.net/results.htm also at 96 kbps]. Opus is an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} and a high-quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general-purpose and embedded (fixed-point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real-Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8 kHz spectrum and the CELT layer encodes only the frequencies above 8 kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it equally viable as a storage and playback format. Unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, which also makes it preferable for short audio clips, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more detail on the history and potential applications of Opus is available in the ''Wikipedia'' article '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''.

==Characteristics==
Opus supports bitrates from 6 kbps to 510 kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30 kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the reference release, libopus, the target bitrate is calibrated against the internal constant-quality targets so that, over a typical music collection, something very close to the target bitrate will be achieved. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate falls where it may. Future versions can be expected to offer better quality at the same setting. Independent implementations may adopt a different approach.
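The 510 kbps stereo ceiling matches RFC 6716's rule that no single compressed frame may exceed 1275 bytes: one maximum-size frame every 20 ms works out to exactly 510 kbps. The minimal sketch below only does that arithmetic. Reading the roughly 255 kbps per-channel multichannel figure as the same ceiling shared by a coupled stereo pair is an assumption, and libopus's own internal bitrate clamping is not modelled; `max_bitrate_kbps` is a hypothetical helper.

```python
MAX_FRAME_BYTES = 1275  # RFC 6716: no single Opus frame may exceed 1275 bytes


def max_bitrate_kbps(frame_ms: float, frame_bytes: int = MAX_FRAME_BYTES) -> float:
    """Bitrate (kbps) implied by sending one frame of `frame_bytes` every `frame_ms`."""
    bits = frame_bytes * 8
    return bits / frame_ms  # bits per millisecond is the same as kilobits per second


if __name__ == "__main__":
    ceiling = max_bitrate_kbps(20.0)  # default 20 ms frame
    print(f"20 ms frame ceiling: {ceiling:.0f} kbps")
    # Assumption: the per-channel figure is this ceiling split across a coupled stereo pair.
    print(f"per channel of a coupled stereo pair: {ceiling / 2:.0f} kbps")
```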

Opus is able to adapt its mode of operation seamlessly, without glitches or interruptions in the sound (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This is particularly useful for mixed-content audio or varying network conditions, and makes the unified Opus codec superior to a suite of separate codecs that might otherwise cover the same range of bitrates and quality settings, which would require out-of-band signalling to switch codecs. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both; the use of different audio bandwidths (4 kHz, 6 kHz, 8 kHz, 12 kHz, 20 kHz); and the quality adjustments within the same operating mode that are available in most VBR-capable codecs.
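For reference, each audio bandwidth in that list corresponds to a named bandwidth and an effective sample rate in RFC 6716 (for example, 20 kHz 'fullband' audio is coded at 48 kHz). The lookup below is a small illustrative helper; the names `OPUS_BANDWIDTHS` and `describe_bandwidth` are hypothetical and not part of any Opus API.

```python
# (audio bandwidth in kHz, RFC 6716 name, effective sample rate in kHz)
OPUS_BANDWIDTHS = [
    (4, "narrowband", 8),
    (6, "medium-band", 12),
    (8, "wideband", 16),
    (12, "super-wideband", 24),
    (20, "fullband", 48),
]


def describe_bandwidth(khz: int) -> tuple[str, int]:
    """Return (name, effective sample rate in kHz) for one of the Opus audio bandwidths."""
    for bandwidth, name, rate in OPUS_BANDWIDTHS:
        if bandwidth == khz:
            return name, rate
    raise ValueError(f"{khz} kHz is not one of the Opus audio bandwidths")


if __name__ == "__main__":
    for bandwidth, _, _ in OPUS_BANDWIDTHS:
        name, rate = describe_bandwidth(bandwidth)
        print(f"{bandwidth:>2} kHz audio: {name}, sampled at {rate} kHz")
```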

Of importance mainly to interactive uses, but potentially also useful in time-delayed audio streaming, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes, where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected packet loss rate can be indicated to the encoder by the user or by application software, and a low-bitrate redundant copy of critical frames (e.g. consonant sounds) can be sent in the following packet to preserve intelligibility.
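A simplified model of how a receiver might combine the two mechanisms: when a packet is missing but the next one has arrived and carries in-band FEC (only possible while SILK is active), the lost frame can be rebuilt from that redundancy; otherwise the decoder falls back to PLC. The `Packet` type and `choose_recovery` helper are hypothetical. In libopus this corresponds roughly to calling `opus_decode()` on the following packet with its `decode_fec` flag set, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    seq: int
    has_fec: bool  # True if the encoder embedded low-bitrate redundancy (SILK/hybrid only)


def choose_recovery(received: dict[int, Packet], first: int, last: int) -> list[str]:
    """For each sequence number, report how a hypothetical receiver would obtain audio."""
    actions = []
    for seq in range(first, last + 1):
        if seq in received:
            actions.append("decode")  # packet arrived: decode it normally
            continue
        nxt: Optional[Packet] = received.get(seq + 1)
        if nxt is not None and nxt.has_fec:
            actions.append("fec")  # rebuild the lost frame from redundancy in the next packet
        else:
            actions.append("plc")  # no redundancy available: conceal the gap
    return actions


if __name__ == "__main__":
    arrived = {0: Packet(0, True), 2: Packet(2, True), 4: Packet(4, False)}
    print(choose_recovery(arrived, 0, 5))
```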

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development and ensures, as a primary goal, that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this, thereby eliminating many bitrate-starvation artifacts such as the 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0 ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.
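The 'preserve the energy, quantize the shape' idea can be illustrated with a toy quantizer. This is not CELT's real algorithm (CELT codes each band's energy explicitly and the normalized shape with pyramid vector quantization); `quantize_band` and its rounding scheme are assumptions made for illustration. What it shows is that however coarse the quantized shape becomes, the rescaled band keeps its original energy rather than collapsing towards silence.

```python
import math


def band_energy(coeffs: list[float]) -> float:
    return sum(c * c for c in coeffs)


def quantize_band(coeffs: list[float], pulses: int) -> list[float]:
    """Coarsely quantize a band's shape, then rescale it to the band's original energy."""
    energy = band_energy(coeffs)
    if energy == 0.0:
        return [0.0] * len(coeffs)
    norm = math.sqrt(energy)
    shape = [c / norm for c in coeffs]  # unit-norm "shape" of the band
    # Toy shape quantizer: round each component to an integer number of pulses.
    q = [round(s * pulses) for s in shape]
    if not any(q):  # too few pulses: keep at least the dominant component
        peak = max(range(len(shape)), key=lambda i: abs(shape[i]))
        q[peak] = 1 if shape[peak] > 0 else -1
    q_norm = math.sqrt(sum(v * v for v in q))
    # Rescale so the decoded band has exactly the original energy.
    return [norm * v / q_norm for v in q]


if __name__ == "__main__":
    band = [0.9, -0.1, 0.05, 0.3]
    for k in (1, 2, 8):
        out = quantize_band(band, k)
        print(k, [round(v, 3) for v in out], round(band_energy(out), 6))
```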

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
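In Ogg Opus files (the encapsulation defined separately in RFC 7845), both features are carried in the 19-byte fixed part of the 'OpusHead' identification header: a pre-skip count of 48 kHz samples to discard at the start of decoding, and an output gain in Q7.8 decibels that players are expected to apply. The sketch below parses only that fixed part; `parse_opus_head` and `gain_factor` are hypothetical helpers, and end-of-stream trimming (which uses the final page's granule position) is not covered.

```python
import struct

# Fixed 19-byte prefix of the OpusHead packet: magic, version, channel count,
# pre-skip, input sample rate, output gain, channel mapping family (little-endian).
OPUS_HEAD = struct.Struct("<8sBBHIhB")


def parse_opus_head(packet: bytes) -> dict:
    magic, version, channels, pre_skip, input_rate, gain_q8, mapping = OPUS_HEAD.unpack_from(packet)
    if magic != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,               # 48 kHz samples to drop at the start (gapless)
        "input_sample_rate": input_rate,    # informational; decoding runs at 48 kHz
        "output_gain_db": gain_q8 / 256.0,  # Q7.8 fixed point, in dB
        "mapping_family": mapping,
    }


def gain_factor(output_gain_db: float) -> float:
    """Linear factor a player multiplies decoded samples by to apply the header gain."""
    return 10.0 ** (output_gain_db / 20.0)
```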

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 14 kbps in encoder version 1.2 (was 21 kbps in v1.1, 29 kbps in v1.0). Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As the bitrate increases, the hybrid mode is adopted, extending bandwidth first to 12 kHz (comparable with compact cassette) and then to the full 20 kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens around the transition from 12 kHz to 20 kHz. Encoder version 1.2 includes great improvements to music encoding in the 32-64 kbps range, allowing full-band stereo at 32 kbps and providing acceptable quality at 48 kbps, where artifacts are audible but rarely annoying. Version 1.3 is expected to improve quality in this range further.

Multi-format stereo music listening tests have demonstrated the superiority of Opus at 64 kbps and 96 kbps compared to the best AAC-LC, HE-AAC and Ogg Vorbis encoders, and at 96 kbps also to 128 kbps MP3 encoded using LAME -V 5.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

Encoder version 1.1 introduced automatic speech/music detection and bandwidth detection to improve mode decisions, and made VBR less constrained, all with the aim of maximizing the quality/bitrate trade-off; these improvements are further enhanced in version 1.2 and the forthcoming 1.3. These tables are likely to require updates as the encoder is improved, especially in low-bitrate regions.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate), but mentions stereo compatibility for 40 kbps and above. The default 20 ms frame size (22.5 ms latency) is assumed. Note that selecting ''VOIP'' mode will deliberately modify the sound with a high-pass filter and emphasis of formants and harmonics to improve the intelligibility of speech, especially in noisy environments, much as telephones do. ''Auto'' mode does not modify the sound prior to encoding, so it is usually better for high-quality speech recordings or mixed speech and music.

{| class="wikitable" style="text-align:center"
|-
!Bitrate Target
!Bandwidth
!Typical Mode Used
!Speech Quality
!Use Cases / Competitive Codecs
|-
!Less than 6 kbps
| -
| -
| Bitrates lower than 6 kbps not supported by Opus
| Try [http://codec2.org/ codec2] for 0.7-3.2 kbps mono speech
|-
!6 kbps
|6 kHz medium-band
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, [[Speex]] also competitive
|-
!8 kbps
|6 kHz medium-band
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. [[Speex]] competitive.
|-
!12 kbps
|12 kHz super-wideband
|hybrid
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|20 kHz
|hybrid/CELT
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|20 kHz
|hybrid/CELT
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|CELT
|Essentially transparent speech plus moderately good stereo music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps or more
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}
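The speech table can be read as a simple step lookup from a bitrate target to the bandwidth and mode one would typically expect. This is a rough sketch of the table only: the encoder makes its own decisions frame by frame, newer versions move these boundaries, and `expected_speech_mode` is a hypothetical helper, not a libopus function.

```python
from bisect import bisect_right

# Rows of the speech table: (minimum target kbps, bandwidth, typical mode).
SPEECH_TABLE = [
    (6, "6 kHz medium-band", "SILK"),
    (8, "6 kHz medium-band", "SILK"),
    (12, "12 kHz super-wideband", "hybrid"),
    (16, "20 kHz", "hybrid/CELT"),
    (24, "20 kHz", "hybrid/CELT"),
    (32, "20 kHz", "CELT"),
    (40, "20 kHz", "CELT"),
    (48, "20 kHz", "CELT"),
]
_THRESHOLDS = [row[0] for row in SPEECH_TABLE]


def expected_speech_mode(target_kbps: float) -> tuple[str, str]:
    """Return (bandwidth, mode) from the nearest table row at or below the target."""
    if target_kbps < 6:
        raise ValueError("Opus does not support targets below 6 kbps")
    i = bisect_right(_THRESHOLDS, target_kbps) - 1
    _, bandwidth, mode = SPEECH_TABLE[i]
    return bandwidth, mode


if __name__ == "__main__":
    for kbps in (6, 12, 20, 64):
        print(kbps, expected_speech_mode(kbps))
```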

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate). Opus will automatically use mono at very low bitrates, though, depending on content, a certain amount of stereo encoding may still be used even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!Typical SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|6 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|6 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|20 kHz
|hybrid/CELT
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|20 kHz
|hybrid/CELT
|Fair but OK for bitrate
|OK for incidental music
|-
!32 to 40 kbps
|stereo
|20 kHz
|CELT
|Moderately good stereo, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, may have problems with cymbals
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming. Beat LC-AAC, Vorbis, MP3 in [http://listening-test.coresv.net/results.htm listening test]
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!160 to 192 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the Opus minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above the Opus bitrate range for stereo sources
|Settle for 510 kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}
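Because libopus calibrates its VBR so that a typical collection averages close to the target, the bitrate target gives a quick storage estimate. The sketch below assumes a file averages exactly its target and ignores Ogg container overhead and metadata; `megabytes_per_hour` is a hypothetical helper.

```python
def megabytes_per_hour(kbps: float) -> float:
    """Approximate audio payload per hour at an average bitrate (1 MB = 10**6 bytes)."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * 3600 / 1_000_000


if __name__ == "__main__":
    for kbps in (32, 64, 96, 128, 192):
        print(f"{kbps:>3} kbps: about {megabytes_per_hour(kbps):.1f} MB per hour")
```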

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, total bandwidth is subject to packet overhead: the more packet headers transmitted every second, the greater the overhead. For this reason Opus, while defaulting to 20.0 ms frames, supports 40.0 ms and 60.0 ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0 ms SILK frames to reduce latency somewhat at the expense of more packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0 ms frames are the default, but frames of 10.0 ms, 5.0 ms and 2.5 ms are also possible. These achieve lower latency by sending more packets per second, which directly increases the packet overhead. As described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
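To put the packet overhead in numbers, the sketch below assumes RTP over UDP over IPv4 with no header options or extensions (20 + 8 + 12 = 40 bytes per packet) and one Opus frame per packet; IPv6, other transports or header compression would change the figures. Under those assumptions, 20 ms frames cost 16 kbps of headers, 60 ms frames about 5.3 kbps, and 2.5 ms frames 128 kbps. `header_overhead_kbps` is a hypothetical helper.

```python
# Assumed header sizes (bytes) for RTP over UDP over IPv4, without options or extensions.
IPV4_HEADER, UDP_HEADER, RTP_HEADER = 20, 8, 12
HEADER_BYTES = IPV4_HEADER + UDP_HEADER + RTP_HEADER


def header_overhead_kbps(frame_ms: float, header_bytes: int = HEADER_BYTES) -> float:
    """Header bitrate when each packet carries one frame of `frame_ms` milliseconds."""
    packets_per_second = 1000.0 / frame_ms
    return header_bytes * 8 * packets_per_second / 1000.0


if __name__ == "__main__":
    payload_kbps = 12.0  # e.g. a low-bitrate SILK speech stream
    for frame_ms in (60.0, 40.0, 20.0, 10.0, 5.0, 2.5):
        overhead = header_overhead_kbps(frame_ms)
        print(f"{frame_ms:>5} ms frames: {overhead:6.2f} kbps headers, "
              f"{payload_kbps + overhead:6.2f} kbps total")
```

For a 12 kbps SILK stream, for example, the total drops from about 28 kbps with 20 ms packets to about 17.3 kbps with 60 ms packets.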

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on 10.0 ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can change its encoding block length to allow its use with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms and 2.5ms frames instead of the default 20.0ms, it must use smaller transform block sizes to achieve this, thereby reducing frequency resolution in the MDCT compared to the default transform window, thus reducing encoding efficiency for tonal signals. To obtain the same frequency precision for a sound divided into shorter transform windows, improved amplitude precision is necessary, resulting in increased bitrate to obtain the same perceptual quality (or conversely lower quality at the same bitrate).

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.
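The loss of frequency resolution can be quantified roughly. Assuming 48 kHz audio and one MDCT per frame that produces as many coefficients as there are new samples, the coefficients are spaced 25 Hz apart for a 20 ms frame but 200 Hz apart for a 2.5 ms frame. The sketch ignores window overlap and CELT's option of splitting a frame into several short transforms for transients; `mdct_bin_width_hz` is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 48_000


def mdct_bin_width_hz(frame_ms: float, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Coefficient spacing when one MDCT covers one frame of new samples."""
    n = round(sample_rate * frame_ms / 1000)  # new samples per frame = MDCT coefficients
    nyquist = sample_rate / 2                 # the coefficients span 0 Hz .. Nyquist
    return nyquist / n


if __name__ == "__main__":
    for frame_ms in (20.0, 10.0, 5.0, 2.5):
        print(f"{frame_ms:>4} ms frame: {mdct_bin_width_hz(frame_ms):.0f} Hz per coefficient")
```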

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms delay. The CELT layer requires 2.5ms for MDCT window overlap.

Xiph.org used matched PEAQ scores (an approximate perceptual quality assessment made in software) for the CELT 0.10 codec, which was used as the basis of the CELT layer in the Opus reference release. These indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.
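The derived columns of the table can be recomputed from its bitrate column, taking the algorithmic delay as the frame size plus 2.5 ms and the increase relative to 64 kbps at the default 20 ms frame. The values are the PEAQ-matched CELT 0.10 figures quoted in the table, not fresh measurements; the helper names are hypothetical.

```python
LOOKAHEAD_MS = 2.5      # delay added on top of the frame size
REFERENCE_KBPS = 64.0   # bitrate at the default 20 ms frame

# (frame size in ms, PEAQ-matched bitrate in kbps) taken from the table
PEAQ_MATCHED = [(20.0, 64.0), (10.0, 70.4), (5.0, 84.8), (2.5, 112.0)]


def algorithmic_delay_ms(frame_ms: float) -> float:
    return frame_ms + LOOKAHEAD_MS


def bitrate_increase_percent(kbps: float, reference: float = REFERENCE_KBPS) -> float:
    return (kbps / reference - 1.0) * 100.0


if __name__ == "__main__":
    for frame_ms, kbps in PEAQ_MATCHED:
        print(f"{frame_ms:>4} ms frame: {algorithmic_delay_ms(frame_ms):>4} ms delay, "
              f"{kbps:5.1f} kbps (+{bitrate_increase_percent(kbps):.1f} %)")
```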

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The commandline tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server], plus in the foobar2000 free encoders pack and some alternative compiles through the hydrogenaud.io opus forum. The libopus commandline tools include encoder ''opusenc'', decoder ''opusdec'', and with a different license, the ''opusinfo'' opus stream & metadata analyzer.

The '''latest stable release''' is recommended for general use and as of mid 2014 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 ====
Released 11 Sep 2012, when RFC 6716 was standardized, though mostly developed by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, implements constrained VBR mode by default (with a bitrate boost used mainly for transients), plus true CBR.

==== libopus v1.1 ====

The alpha source code was released on 21 Dec 2012 for testing and user feedback. Following a beta release and further testing, the stable 1.1 version, considered well tested enough for general release, was released on 5 December 2013.

[http://jmspeex.livejournal.com/11737.html Quality improvements] to the CELT layer introduce '''unconstrained VBR''', which boosts the rate not just for transients but now also for highly tonal signals, and reduces it when the stereo image is narrow. The '''transient detection''' and '''time-frequency analysis''' code has also been rewritten, as has the '''dynamic allocation''' code (HF/LF tilt and Band Boost), allowing more aggressive departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

A '''DC-rejection''' filter below 3 Hz also aids quality if an inaudible DC offset is present, with no effect on deep bass notes.
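A DC-rejection stage of this kind can be sketched as a generic first-order DC-blocking high-pass filter with its corner near 3 Hz. This is an illustration under that assumption, not the exact libopus filter: a constant offset decays away within about a second at 48 kHz, while a 40 Hz bass tone passes at almost full level. `dc_reject` here is a hypothetical helper.

```python
import math


def dc_reject(samples: list[float], cutoff_hz: float = 3.0, sample_rate: int = 48_000) -> list[float]:
    """First-order DC-blocking high-pass: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    r = 1.0 - 2.0 * math.pi * cutoff_hz / sample_rate  # pole just inside the unit circle
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


if __name__ == "__main__":
    offset = dc_reject([0.5] * 48_000)  # one second of pure DC offset
    bass = [math.sin(2 * math.pi * 40 * n / 48_000) for n in range(96_000)]
    filtered = dc_reject(bass)
    print(f"DC left after 1 s: {abs(offset[-1]):.2e}")
    print(f"40 Hz peak after settling: {max(abs(v) for v in filtered[48_000:]):.4f}")
```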

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially near the bitrate range (presumably around 24-40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. The detector works without look-ahead and is not perfect, but it is usually undecided only on audio where either mode will work well.

'''Automatic bandwidth detection''' is also introduced to save wasted bits allocated to absent frequencies.

'''Surround sound improvements''' were introduced since the beta release with considerable advances in coding efficiency, bitrate allocation and quality.

==== libopus v1.1.3 ====
Released July 15th, 2016. This version contains:

* Neon optimizations improving performance on ARMv7 and ARMv8 by up to 15%
* Fixes for some issues with 16-bit platforms (e.g. TI C55x)
* Fixes to comfort noise generation (CNG)
* Documenting that PLC packets can also be 2 bytes
* Experimental ambisonics work (--enable-ambisonics)

==== libopus v1.2.1 ====
Released June 26th, 2017. This version contains:

* Speech quality improvements, especially in the 12-20 kbit/s range
* Improved VBR encoding for hybrid mode
* More aggressive use of wider speech bandwidth, including fullband speech starting at 14 kbit/s
* Music quality improvements in the 32-48 kbit/s range
* Generic and SSE CELT optimizations
* Support for directly encoding packets up to 120 ms
* DTX support for CELT mode
* SILK CBR improvements
* Support for all of the fixes in draft-ietf-codec-opus-update-06 (the mono downmix and the folding fixes need --enable-update-draft)
* Many bug fixes, including integer wrap-arounds discovered through fuzzing (no security implications)

=== Ports ===

==== Concentus ====

The libopus reference library (fixed-point variant) has successfully been ported to both '''C#''' and '''Java''' as part of a project called '''Concentus'''. The project specifically targets cross-platform applications where native C interop is relatively difficult. The code is available on [https://github.com/lostromb/concentus GitHub] and distributed via standard package managers.

==== Emscripten ports ====

At least one implementation of Opus in JavaScript has been made using the automated tool [https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Emscripten Emscripten]. See [https://blog.rillke.com/opusenc.js/ opusenc.js], [https://github.com/kazuki/opus.js-sample opus.js-sample] and [https://github.com/audiocogs/opus.js opus.js].

=== VoIP software ===
* The open source virtual PBX Freeswitch supports Opus transcoding.
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus.
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its current codec with Opus. Its iOS app will be the first to be released with Opus; the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome have Opus audio support as of version 33.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.
* Android 5.0 and above supports Opus natively if encapsulated in the Ogg container, but .opus filename extension is not recognized by Android, so the use of double filename extension .opus.ogg is recommended as a workaround to allow apps to recognize files as playable audio.

=== Hardware support ===
* Support in [[Rockbox]] is available. This brings Opus playback to a range of portable media players (including some Apple iPod models and Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also to Android devices.

=== Player software ===

* Windows/Mac/Linux (Cross-Platform)
*# [[VLC]] media player supports Opus as of version 2.0.4
*#[[Amarok]] 2.8 has transcoding support for Opus codec if ffmpeg is compiled with support for the libopus library & support for playback of Opus encoded files if Amarok is compiled against TagLib (newer than V1.8)
*# Clementine has Opus support
*# Audacious player
*# [[MPD]] as of version 0.18 if compiled against libopus (supports both encoding for http streams and decoding)

* Windows Exclusive
*# AIMP supports Opus natively as of version 3.20 build 1125 beta 1
*# [[foobar2000]] supports Opus natively as of v1.1.14 beta 1
*# Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
*# [[Winamp]] supports Opus using a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in]
*# MPC-HC

* iOS/Android (Cross-Platform)
*#Capriccio [https://itunes.apple.com/us/app/capriccio-free-ultimate-music/id434829018?mt=8 iOS]/[https://play.google.com/store/apps/details?id=me.ideariboso.capriccio Android]
*#foobar2000 [https://itunes.apple.com/us/app/foobar2000/id1072807669?mt=8 iOS]/[https://play.google.com/store/apps/details?id=com.foobar2000.foobar2000&hl=en Android]

* Android Exclusive
*# [http://gonemadmusicplayer.blogspot.com/ GoneMAD Music Player]
*# [http://neutronmp.com/ Neutron Music Player]
*# [http://www.videolan.org/vlc/download-android.html VLC Media Player for Android]
*# [https://play.google.com/store/apps/details?id=ru.recoilme.freeamp FreeMP]
*# [https://play.google.com/store/apps/details?id=net.mderezynski.youki3 Youki]
*# [https://play.google.com/store/apps/details?id=com.aimp.player AIMP for Android]
*# [https://play.google.com/store/apps/details?id=com.acmeandroid.listen Listen Audiobook Player]
*# [https://play.google.com/store/apps/details?id=com.mxtech.videoplayer.ad MX Player]
*# [https://play.google.com/store/apps/details?id=org.tomahawk.tomahawk_android Tomahawk Player Beta]
*# [https://play.google.com/store/apps/details?id=com.maxmpz.audioplayer&hl=en Poweramp Music Player]

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]
* [http://www.xdlab.ru/en/ TagScanner]
* [http://www.xmedia-recode.de/ XMedia Recode]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2018-02-05T12:22:42Z

Dynamic: /* Speech encoding quality */ Updated codec2 bitrate range

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.2.1
| preview_release = 1.2 rc1
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech, yet it is also very competitive for use as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps] and [http://listening-test.coresv.net/results.htm also at 96 kbps]. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many Software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typ 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage & playback format. Also unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more details of the history and potential applications for Opus are included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that over a typical music collection, something very close to the target bitrate will be achieved. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack) where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate will fall where it may. Improved future versions can be expected to offer improved quality at the same setting. Independent implementations may adopt a different approach.

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This is particularly useful for mixed-content audio or varying network conditions, and gives the unified Opus codec an advantage over a suite of different codecs that might otherwise cover the same range of bitrates and quality settings but would require out-of-band signalling to switch codecs. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both; and the use of different audio bandwidths (4 kHz, 6 kHz, 8 kHz, 12 kHz, 20 kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially useful in time-delayed audio streaming as well, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports in-band Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and a low-bitrate redundant copy of critical frames (e.g. consonant sounds) is then carried in the following packet to preserve intelligibility.
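On the receiving side, the usual way to combine the two mechanisms with libopus, as we understand its API, is: when a packet is lost but the next one has already arrived, call `opus_decode()` on the next packet with its `decode_fec` flag set to recover the lost frame from the redundant data; otherwise call `opus_decode()` with no packet to trigger PLC. On the encoder side, in-band FEC is enabled with the `OPUS_SET_INBAND_FEC` and `OPUS_SET_PACKET_LOSS_PERC` controls. The Python sketch below only models that per-frame decision and does not call libopus; `plan_recovery` is a hypothetical helper. A real receiver needs a jitter buffer at least one packet deep to look ahead like this.

```python
def plan_recovery(arrived, fec_available=True):
    """Decide, per frame, how a receiver would obtain audio (hypothetical helper).

    arrived: sequence of booleans, True where the packet for that frame arrived.
    fec_available: whether the sender enabled in-band FEC (SILK-based modes only).
    Returns a list of "decode", "fec" or "plc", one entry per frame.
    """
    actions = []
    for i, ok in enumerate(arrived):
        if ok:
            actions.append("decode")  # normal decoding of this packet
        elif fec_available and i + 1 < len(arrived) and arrived[i + 1]:
            actions.append("fec")  # redundancy is carried in the next packet
        else:
            actions.append("plc")  # nothing to recover from: conceal
    return actions


print(plan_recovery([True, False, True, False, False, True]))
# ['decode', 'fec', 'decode', 'plc', 'fec', 'decode']
```

Note that two consecutive losses leave the first of them to PLC, since the redundancy for a frame travels only in the packet immediately after it.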

For music and general audio, the CELT layer of Opus builds on knowledge gained during Xiph.Org's Vorbis development and has as a primary goal preserving the total energy in each spectral band, while requiring only a modest bitrate overhead to achieve this. This eliminates many bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. The same technique also increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0 ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.
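The idea of preserving each band's energy separately from its fine spectral shape can be shown with a toy example. The Python sketch below splits a band of transform coefficients into a gain (the square root of its energy) and a unit-norm shape, quantizes the shape very coarsely, then rescales it so that the band's energy matches the original. This is a simplified stand-in chosen for illustration, not CELT's actual pyramid vector quantizer or energy quantization; the function names and the `steps` parameter are hypothetical.

```python
import math


def band_energy(coeffs):
    """Sum of squares of the coefficients in one band."""
    return sum(c * c for c in coeffs)


def quantize_band_keep_energy(coeffs, steps=1):
    """Toy gain/shape coding: coarse shape, exact band energy (not CELT's PVQ)."""
    energy = band_energy(coeffs)
    if energy == 0.0:
        return [0.0] * len(coeffs)
    gain = math.sqrt(energy)
    shape = [c / gain for c in coeffs]  # unit-norm shape
    q = [round(s * steps) / steps for s in shape]  # very coarse shape quantization
    if not any(q):
        # Everything rounded to zero: keep the strongest component so the
        # band is not left as a spectral hole.
        k = max(range(len(shape)), key=lambda i: abs(shape[i]))
        q[k] = math.copysign(1.0, shape[k])
    norm = math.sqrt(band_energy(q))
    return [gain * v / norm for v in q]  # rescale to the original energy


band = [0.3, -0.2, 0.4, 0.1]
out = quantize_band_keep_energy(band)
print(out, band_energy(band), band_energy(out))
```

However crude the shape becomes, the band never comes out quieter or louder than the original, which is the property that helps avoid the holes and 'birdies' described above.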

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Applying the playback (output) gain stored in the stream header is also required of players, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
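Both features rely on fields in the Ogg Opus identification header ('OpusHead', specified in RFC 7845): the pre-skip field gives the number of decoded samples (at 48 kHz) to discard at the start of the stream, and the output gain field gives a gain in dB stored as a signed Q7.8 fixed-point number. The following Python sketch parses those fields from a raw header packet and converts the gain to a linear factor, based on our reading of RFC 7845. It handles only the fixed 19-byte part and ignores any channel mapping table; the function names are illustrative.

```python
import struct


def parse_opus_head(packet):
    """Parse the fixed 19-byte part of an Ogg Opus 'OpusHead' header (RFC 7845)."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    version, channels, pre_skip, input_rate, gain_q78, mapping_family = struct.unpack_from(
        "<BBHIhB", packet, 8
    )
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,  # samples at 48 kHz to drop at the start
        "input_sample_rate": input_rate,  # informational; decoding runs at 48 kHz
        "output_gain_db": gain_q78 / 256.0,  # Q7.8 fixed point -> dB
        "mapping_family": mapping_family,
    }


def gain_factor(gain_db):
    """Linear amplitude factor for a gain expressed in dB."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical header: stereo, 312 samples of pre-skip, -6 dB output gain.
header = struct.pack("<8sBBHIhB", b"OpusHead", 1, 2, 312, 44100, -1536, 0)
info = parse_opus_head(header)
print(info["pre_skip"], info["output_gain_db"], gain_factor(info["output_gain_db"]))
```

A player would multiply every decoded sample by `gain_factor(...)` and discard the first `pre_skip` samples per channel; trimming at the end of the stream is handled separately via the final page's granule position.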

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 14 kbps in encoder version 1.2 (was 21 kbps in v1.1, 29 kbps in v1.0). Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. The hybrid mode is adopted as bitrate increases, extending bandwidth first to 12 kHz (comparable with compact cassette) and then to the full 20 kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens during the transition from 12 kHz to 20 kHz bandwidth. Encoder version 1.2 includes great improvements to music encoding in the 32-64 kbps range, allowing full-band stereo at 32 kbps and providing acceptable quality at 48 kbps, where artifacts are audible but rarely annoying. Version 1.3 is expected to further improve quality in this range.

Multi-format stereo music listening tests have demonstrated the superiority of Opus at 64 kbps and 96 kbps compared to the best AAC-LC, HE-AAC and Ogg Vorbis encoders, and at 96 kbps also to 128 kbps MP3 encoded using LAME -V 5.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

Encoder version 1.1 introduced automatic speech/music detection and bandwidth detection to improve mode decisions, along with less constrained VBR, all with the aim of maximizing the quality/bitrate trade-off; these improvements are further enhanced in version 1.2 and the forthcoming 1.3. These tables are therefore likely to require updates as the encoder is improved, especially in low-bitrate regions.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate) but mentions stereo compatibility for 40 kbps and above. The default 20 ms frame size (26.5 ms algorithmic delay with libopus's default settings) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate Target
!Bandwidth
!Typical Mode Used
!Speech Quality
!Use Cases / Competitive Codecs
|-
!Less than 6 kbps
| -
| -
| Bitrates lower than 6 kbps not supported by Opus
| Try [http://codec2.org/ codec2] for 0.7-3.2 kbps mono speech
|-
!6 kbps
|6 kHz medium-band
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, [[Speex]] also competitive
|-
!8 kbps
|6 kHz medium-band
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. [[Speex]] competitive.
|-
!12 kbps
|12 kHz super-wideband
|hybrid
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|20 kHz
|hybrid/CELT
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|20 kHz
|hybrid/CELT
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|CELT
|Essentially transparent speech plus moderately good stereo music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps or more
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate). Opus automatically switches to mono at very low bitrates, although, depending on the content, some stereo information may still be encoded even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!Typical SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|6 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|6 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|20 kHz
|hybrid/CELT
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|20 kHz
|hybrid/CELT
|Fair but OK for bitrate
|OK for incidental music
|-
!32 to 40 kbps
|stereo
|20 kHz
|CELT
|Moderately good stereo, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, may have problems with cymbals
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming. Beat LC-AAC, Vorbis, MP3 in [http://listening-test.coresv.net/results.htm listening test]
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!160 to 192 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above the bitrate range Opus allows for stereo sources
|Settle for 510 kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used is subject to packet overhead: the more packet headers are transmitted every second, the greater the overhead. For this reason Opus, while defaulting to 20.0 ms frames, supports 40.0 ms and 60.0 ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0 ms SILK frames to reduce latency somewhat at the expense of greater packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0 ms frames are the default, but frames of 10.0 ms, 5.0 ms and 2.5 ms are also possible; these achieve lower latency by transmitting more packets per second, which directly increases the packet overhead. In addition, as described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
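As a worked example of how frame size drives overhead, the Python sketch below assumes uncompressed IPv4 (20 bytes), UDP (8 bytes) and RTP (12 bytes) headers on every packet, with one Opus frame per packet and no header compression, RTP extensions or SRTP authentication tags. These header sizes are our assumption about a typical VoIP setup, not figures from this article.

```python
# Assumed per-packet header sizes in bytes (no options, extensions or SRTP tag).
IPV4_UDP_RTP_BYTES = 20 + 8 + 12
IPV6_UDP_RTP_BYTES = 40 + 8 + 12


def overhead_kbps(frame_ms, header_bytes=IPV4_UDP_RTP_BYTES):
    """Header overhead in kbit/s when one frame of frame_ms is sent per packet."""
    packets_per_second = 1000.0 / frame_ms
    return packets_per_second * header_bytes * 8 / 1000.0


for frame_ms in (2.5, 5, 10, 20, 40, 60):
    print(f"{frame_ms:>4} ms frames: {overhead_kbps(frame_ms):6.1f} kbit/s of headers")
```

Under these assumptions a 24 kbps speech stream with 20 ms frames needs about 40 kbps on the wire (24 + 16), 60 ms frames cut the header cost to about 5.3 kbps, and 2.5 ms CELT frames cost 128 kbps in headers alone.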

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which codes 10.0 ms or 20.0 ms frames (a 40.0 ms or 60.0 ms Opus frame carries two or three 20.0 ms SILK frames), the CELT layer can shorten its encoding blocks to enable its use with shorter frames.

When the CELT layer uses 10.0 ms, 5.0 ms or 2.5 ms frames instead of the default 20.0 ms, it must use smaller transform sizes, which reduces the frequency resolution of the MDCT compared to the default transform window and thus reduces encoding efficiency for tonal signals. To obtain the same frequency precision when a sound is divided into shorter transform windows, greater amplitude precision is necessary, resulting in increased bitrate to obtain the same perceptual quality (or, conversely, lower quality at the same bitrate).

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.
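The loss of frequency resolution can be quantified roughly. Assuming a 48 kHz sampling rate and a single MDCT per frame that produces one coefficient per input sample (so N coefficients spanning 0 to 24 kHz), the spacing between coefficients is (48000 / 2) / N Hz. The Python sketch below evaluates this for the CELT frame sizes; it ignores the window overlap and CELT's splitting of transient frames into several short MDCTs, so the numbers are indicative only.

```python
def mdct_bin_spacing_hz(frame_ms, sample_rate=48000):
    """Approximate spacing of MDCT coefficients, assuming one MDCT per frame."""
    n = round(sample_rate * frame_ms / 1000)  # coefficients per frame
    return (sample_rate / 2) / n


for frame_ms in (20, 10, 5, 2.5):
    print(f"{frame_ms:>4} ms frame: {mdct_bin_spacing_hz(frame_ms):5.0f} Hz per coefficient")
```

Each halving of the frame size doubles the spacing, from about 25 Hz at 20 ms to about 200 Hz at 2.5 ms, which is why tonal material suffers most in the low-latency modes.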

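In CELT-only operation with libopus's restricted low-delay setting, the algorithmic delay consists of the frame size plus an additional 2.5 ms, which the CELT layer requires for MDCT window overlap. In libopus's other (default) modes the encoder adds a further 4 ms of look-ahead, giving 26.5 ms at the default 20 ms frame size.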
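The delay arithmetic can be written out as follows. This Python sketch assumes libopus behaviour as we understand it: its `OPUS_GET_LOOKAHEAD` control reports Fs/400 samples of look-ahead in the restricted low-delay application and Fs/400 + Fs/250 samples otherwise (2.5 ms or 6.5 ms). The function name is illustrative, and other implementations may differ.

```python
def algorithmic_delay_ms(frame_ms, restricted_lowdelay=False, sample_rate=48000):
    """Frame duration plus encoder look-ahead in ms (assumes libopus look-ahead rules)."""
    overlap = sample_rate // 400  # 2.5 ms MDCT window overlap
    extra = 0 if restricted_lowdelay else sample_rate // 250  # further 4 ms look-ahead
    frame = sample_rate * frame_ms / 1000  # frame length in samples
    return (frame + overlap + extra) * 1000 / sample_rate


for frame_ms in (2.5, 5, 10, 20):
    print(frame_ms, algorithmic_delay_ms(frame_ms, restricted_lowdelay=True))
print("default 20 ms:", algorithmic_delay_ms(20))
```

The restricted low-delay values (5.0, 7.5, 12.5 and 22.5 ms) match the algorithmic delay column of the table below, while the default 20 ms setting gives 26.5 ms.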

Xiph.Org used matched PEAQ scores (an approximate perceptual quality assessment made in software) for the CELT 0.10 codec, which was used as the basis of the CELT layer in the Opus reference release. These indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64kbps@22.5ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The commandline tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server], plus in the foobar2000 free encoders pack and some alternative compiles through the hydrogenaud.io opus forum. The libopus commandline tools include encoder ''opusenc'', decoder ''opusdec'', and with a different license, the ''opusinfo'' opus stream & metadata analyzer.
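For scripting, the encoder can be driven from another program. The Python sketch below only assembles an ''opusenc'' command line from options that opus-tools documents (`--bitrate` in kbit/s, `--vbr`, `--cvbr` or `--hard-cbr`, and `--framesize` in milliseconds); check `opusenc --help` for the version you have installed. The helper name and its validation rules are our own, not part of opus-tools.

```python
# Hypothetical helper around the opus-tools encoder; it only builds the arguments.
RATE_CONTROL_FLAGS = {"vbr": "--vbr", "cvbr": "--cvbr", "cbr": "--hard-cbr"}
FRAME_SIZES_MS = (2.5, 5, 10, 20, 40, 60)


def opusenc_command(src, dst, bitrate_kbps, rate_control="vbr", framesize_ms=20):
    """Build an opusenc argument list without running it."""
    if rate_control not in RATE_CONTROL_FLAGS:
        raise ValueError(f"unknown rate control: {rate_control}")
    if framesize_ms not in FRAME_SIZES_MS:
        raise ValueError(f"unsupported frame size: {framesize_ms}")
    return [
        "opusenc",
        "--bitrate", f"{bitrate_kbps:g}",
        RATE_CONTROL_FLAGS[rate_control],
        "--framesize", f"{framesize_ms:g}",
        src,
        dst,
    ]


print(" ".join(opusenc_command("input.wav", "output.opus", 96)))
```

The resulting list can be passed unchanged to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with unusual file names.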

The '''latest stable release''' is recommended for general use and as of mid 2014 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 ====
Released 11 Sep 2012, when RFC 6716 was standardized, though largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, implements constrained VBR mode by default (with a bitrate boost used mainly for transients), plus true CBR.

==== libopus v1.1 ====

The alpha source code was released on 21 Dec 2012 for testing and user feedback. Following a beta release and further testing, the stable 1.1 version, considered well enough tested for general release, was released on 5 December 2013.

CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduced to provide '''unconstrained VBR''' include a rate boost not just for transients but now also for highly tonal signals, and a rate reduction when the stereo image is narrow. Its '''transient detection''' and '''time-frequency analysis''' code were also rewritten, as was its '''dynamic allocation''' code (HF/LF tilt and Band Boost), to allow more aggressive departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC rejection''' below 3 Hz also aids quality when an inaudible DC offset is present, with no effect on deep bass notes.

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially near the bitrate range (presumably around 24-40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. The detection works without look-ahead and is not perfect, but it is usually undecided only on audio where either mode works well.

'''Automatic bandwidth detection''' is also introduced to save wasted bits allocated to absent frequencies.

'''Surround sound improvements''' were introduced since the beta release with considerable advances in coding efficiency, bitrate allocation and quality.

==== libopus v1.1.3 ====
Released July 15th, 2016. This version contains:

* NEON optimizations improving performance on ARMv7 and ARMv8 by up to 15%
* Fixes for some issues with 16-bit platforms (e.g. TI C55x)
* Fixes to comfort noise generation (CNG)
* Documentation that PLC packets can also be 2 bytes
* Experimental ambisonics work (--enable-ambisonics)

==== libopus v1.2.1 ====
Released June 26th, 2017. This version contains:

* Speech quality improvements, especially in the 12-20 kbit/s range
* Improved VBR encoding for hybrid mode
* More aggressive use of wider speech bandwidth, including fullband speech starting at 14 kbit/s
* Music quality improvements in the 32-48 kbit/s range
* Generic and SSE CELT optimizations
* Support for directly encoding packets up to 120 ms
* DTX support for CELT mode
* SILK CBR improvements
* Support for all of the fixes in draft-ietf-codec-opus-update-06 (the mono downmix and the folding fixes need --enable-update-draft)
* Many bug fixes, including integer wrap-arounds discovered through fuzzing (no security implications)

=== Ports ===

==== Concentus ====

The libopus reference library (fixed-point variant) has successfully been ported to both '''C#''' and '''Java''' as part of a project called '''Concentus'''. The project specifically targets cross-platform applications where native C interop is relatively difficult. The code is available on [https://github.com/lostromb/concentus GitHub] and distributed via standard package managers.

==== Emscripten ports ====

Implementations of Opus in JavaScript have been made using the automated tool [https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Emscripten Emscripten]. See [https://blog.rillke.com/opusenc.js/ here], [https://github.com/kazuki/opus.js-sample here] and [https://github.com/audiocogs/opus.js here].

=== VoIP software ===
* The open source virtual PBX Freeswitch supports Opus transcoding.
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus.
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video.
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its previous codec with Opus. Its iOS app will be the first to be released with Opus; the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10.

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, as well as SeaMonkey, which shares their codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome support Opus audio as of version 33.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] with 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.
* Android 5.0 and above supports Opus natively if encapsulated in the Ogg container, but the .opus filename extension is not recognized by Android, so the double extension .opus.ogg is recommended as a workaround to allow apps to recognize files as playable audio.

=== Hardware support ===
* Support in [[Rockbox]] is available. This brings Opus playback to a range of portable media players (including some Apple iPod models and Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also to Android devices.

=== Player software ===

* Windows/Mac/Linux (Cross-Platform)
*# [[VLC]] media player supports Opus as of version 2.0.4
*# [[Amarok]] 2.8 supports transcoding to Opus if FFmpeg is compiled with libopus support, and playback of Opus files if Amarok is compiled against TagLib newer than v1.8
*# Clementine has Opus support
*# Audacious player
*# [[MPD]] as of version 0.18 if compiled against libopus (supports both encoding for http streams and decoding)

* Windows Exclusive
*# AIMP supports Opus natively as of version 3.20 build 1125 beta 1
*# [[foobar2000]] supports Opus natively as of v1.1.14 beta 1
*# Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
*# [[Winamp]] supports Opus using a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in]
*# MPC-HC

* iOS/Android (Cross-Platform)
*#Capriccio [https://itunes.apple.com/us/app/capriccio-free-ultimate-music/id434829018?mt=8 iOS]/[https://play.google.com/store/apps/details?id=me.ideariboso.capriccio Android]
*#foobar2000 [https://itunes.apple.com/us/app/foobar2000/id1072807669?mt=8 iOS]/[https://play.google.com/store/apps/details?id=com.foobar2000.foobar2000&hl=en Android]

* Android Exclusive
*# [http://gonemadmusicplayer.blogspot.com/ GoneMAD Music Player]
*# [http://neutronmp.com/ Neutron Music Player]
*# [http://www.videolan.org/vlc/download-android.html VLC Media Player for Android]
*# [https://play.google.com/store/apps/details?id=ru.recoilme.freeamp FreeMP]
*# [https://play.google.com/store/apps/details?id=net.mderezynski.youki3 Youki]
*# [https://play.google.com/store/apps/details?id=com.aimp.player AIMP for Android]
*# [https://play.google.com/store/apps/details?id=com.acmeandroid.listen Listen Audiobook Player]
*# [https://play.google.com/store/apps/details?id=com.mxtech.videoplayer.ad MX Player]
*# [https://play.google.com/store/apps/details?id=org.tomahawk.tomahawk_android Tomahawk Player Beta]
*# [https://play.google.com/store/apps/details?id=com.maxmpz.audioplayer&hl=en Poweramp Music Player]

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]
* [http://www.xdlab.ru/en/ TagScanner]
* [http://www.xmedia-recode.de/ XMedia Recode]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2018-01-31T15:55:31Z

Dynamic: /* Indicative bitrate and quality */ In preamble, noted improvements from later encoders at low-bitrates, but have yet to update tables

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.2.1
| preview_release = 1.2 rc1
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF), designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech. It is also very competitive as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps] and [http://listening-test.coresv.net/results.htm also at 96 kbps]. As Opus is an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high-quality reference implementation is provided under the 3-clause BSD license,{{ref|homepage|a}} which compiles and runs on the vast majority of general-purpose and embedded (fixed-point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real-Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.Org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition between high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two over a short overlapping range in which SILK encodes the 0-8 kHz spectrum and the CELT layer encodes only the frequencies above 8 kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (26.5 ms with libopus's default settings and 20 ms frames, or 22.5 ms in its restricted low-delay mode) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it a viable storage and playback format as well. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, which makes it preferable for short audio clips, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more detail on the history and potential applications of Opus is available on the ''Wikipedia'' page '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''.

==Characteristics==
Opus supports bitrates from 6 kbps to 510 kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio at around 30 kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are available where required. In the reference release, libopus, the target bitrate is calibrated against the encoder's internal constant-quality targets, so that over a typical music collection the average bitrate comes very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, Helix MP3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate falls where it may. Future versions of the encoder can be expected to offer improved quality at the same setting. Independent implementations may adopt a different approach.

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This is particularly useful for mixed-content audio or varying network conditions, and gives the unified Opus codec an advantage over a suite of different codecs that might otherwise cover the same range of bitrates and quality settings but would require out-of-band signalling to switch codecs. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both; and the use of different audio bandwidths (4 kHz, 6 kHz, 8 kHz, 12 kHz, 20 kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially useful in time-delayed audio streaming as well, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports in-band Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and a low-bitrate redundant copy of critical frames (e.g. consonant sounds) is then carried in the following packet to preserve intelligibility.

For music and general audio, the CELT layer of Opus builds on knowledge gained during Xiph.Org's Vorbis development and has as a primary goal preserving the total energy in each spectral band, while requiring only a modest bitrate overhead to achieve this. This eliminates many bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. The same technique also increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0 ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Applying the playback (output) gain stored in the stream header is also required of players, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 14 kbps in encoder version 1.2 (was 21 kbps in v1.1, 29 kbps in v1.0). Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. The hybrid mode is adopted as bitrate increases, extending bandwidth first to 12 kHz (comparable with compact cassette) and then to the full 20 kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens during the transition from 12 kHz to 20 kHz bandwidth. Encoder version 1.2 includes great improvements to music encoding in the 32-64 kbps range, allowing full-band stereo at 32 kbps and providing acceptable quality at 48 kbps, where artifacts are audible but rarely annoying. Version 1.3 is expected to further improve quality in this range.

Multi-format stereo music listening tests have demonstrated the superiority of Opus at 64 kbps and 96 kbps compared to the best AAC-LC, HE-AAC and Ogg Vorbis encoders, and at 96 kbps also to 128 kbps MP3 encoded using LAME -V 5.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

Encoder version 1.1 introduced automatic speech/music detection and bandwidth detection to improve mode decisions, and made VBR less constrained, all with the aim of maximizing the quality/bitrate tradeoff. These improvements are further enhanced in version 1.2 and the forthcoming 1.3, so these tables are likely to require updates as the encoder improves, especially in low-bitrate regions.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically at a 48 kHz sampling rate), though it also notes stereo compatibility at 40 kbps and above. The default 20 ms frame size (22.5 ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate Target
!Bandwidth
!Typical Mode Used
!Speech Quality
!Use Cases / Competitive Codecs
|-
!Less than 6 kbps
| -
| -
| Bitrates lower than 6 kbps are not supported by Opus
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|6 kHz medium-band
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, [[Speex]] also competitive
|-
!8 kbps
|6 kHz medium-band
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. [[Speex]] competitive.
|-
!12 kbps
|12 kHz super-wideband
|hybrid
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|20 kHz
|hybrid/CELT
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|20 kHz
|hybrid/CELT
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|CELT
|Essentially transparent speech plus moderately good stereo music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps or more
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically at a 48 kHz sampling rate). Opus automatically uses mono at very low bitrates, though, depending on content, some stereo encoding may still be used even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|6 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|6 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|20 kHz
|hybrid/CELT
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|20 kHz
|hybrid/CELT
|Fair but OK for bitrate
|OK for incidental music
|-
!32 to 40 kbps
|stereo
|20 kHz
|CELT
|Moderately good stereo, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, may have problems with cymbals
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming. Beat LC-AAC, Vorbis, MP3 in [http://listening-test.coresv.net/results.htm listening test]
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see the latency table below)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!160 to 192 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above the Opus bitrate range for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}
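For quick reference, the rows of the music table can be encoded as data and looked up by target bitrate. This hypothetical Python helper simply rounds a target down to the nearest listed row, so intermediate bitrates (e.g. 12 kbps, which the table does not list) inherit the more conservative row. It reflects only the indicative figures in the table, not the encoder's actual mode decisions.

```python
import bisect

# (row bitrate in kbps, stereo mode, bandwidth in kHz, typical layer) from the music table
MUSIC_ROWS = [
    (6, "mono", 6, "SILK"),
    (8, "mono", 6, "SILK"),
    (14, "mono", 20, "hybrid/CELT"),
    (22, "mono", 20, "hybrid/CELT"),
    (32, "stereo", 20, "CELT"),
    (48, "stereo", 20, "CELT"),
    (64, "stereo", 20, "CELT"),
    (96, "stereo", 20, "CELT"),
    (112, "stereo", 20, "CELT"),
    (128, "stereo", 20, "CELT"),
    (160, "stereo", 20, "CELT"),
    (510, "stereo", 20, "CELT"),
]
_KEYS = [row[0] for row in MUSIC_ROWS]


def indicative_row(kbps):
    """Return the table row at or just below a stereo target bitrate."""
    if not 6 <= kbps <= 510:
        raise ValueError(f"{kbps} kbps is outside the 6-510 kbps stereo range")
    return MUSIC_ROWS[bisect.bisect_right(_KEYS, kbps) - 1]


print(indicative_row(40))  # falls in the 32 to 40 kbps row
print(indicative_row(12))  # rounded down to the 8 kbps row
```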

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used includes packet overhead: the more packets (each with its own headers) transmitted every second, the greater the overhead. For this reason Opus, while defaulting to 20.0 ms frames, supports 60.0 ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0 ms SILK frames to reduce latency somewhat at the expense of more packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0 ms frames are the default, but frames of 10.0 ms, 5.0 ms and 2.5 ms are also possible. These achieve lower latency by sending more packets per second, which directly increases the packet overhead. In addition, as described below, shorter frames worsen the quality/bitrate tradeoff of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
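As a rough illustration of how frame size drives packet overhead, the Python sketch below assumes one Opus frame per packet carried over IPv4/UDP/RTP, i.e. 20 + 8 + 12 = 40 bytes of headers per packet with no IP options, RTP header extensions, SRTP or link-layer framing. Those header sizes are an assumption about the transport, not part of Opus.

```python
HEADER_BYTES = 20 + 8 + 12  # assumed IPv4 + UDP + RTP headers per packet


def packets_per_second(frame_ms):
    """One frame per packet: a 20 ms frame means 50 packets per second."""
    return 1000.0 / frame_ms


def overhead_kbps(frame_ms, header_bytes=HEADER_BYTES):
    """Header bitrate added on top of the codec bitrate."""
    return header_bytes * 8 * packets_per_second(frame_ms) / 1000.0


def total_kbps(codec_kbps, frame_ms, header_bytes=HEADER_BYTES):
    """Codec bitrate plus header overhead, as seen on the network."""
    return codec_kbps + overhead_kbps(frame_ms, header_bytes)


if __name__ == "__main__":
    for ms in (2.5, 5, 10, 20, 60):
        print(f"{ms:>4} ms frames: {overhead_kbps(ms):6.1f} kbps of header overhead")
```

Under these assumptions a 12 kbps SILK stream in 20 ms frames needs about 28 kbps on the wire, 60 ms frames cut the header cost to roughly 5.3 kbps, and 2.5 ms CELT frames would add 128 kbps of headers.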

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0 ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame (10, 20, 40 or 60 ms), the CELT layer can change the lengths of its encoding blocks, which enables its use with shorter frames.
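The single-frame durations allowed for each Opus mode are fixed by the TOC byte configurations in RFC 6716 (a packet may also carry several such frames). The Python sketch below records them and converts a duration into samples per channel at 48 kHz, which is the kind of frame-size value an encoder API such as libopus's `opus_encode()` expects; the helper names are hypothetical.

```python
SAMPLE_RATE = 48000

# Single-frame durations in ms per operating mode (RFC 6716 TOC configurations).
FRAME_MS = {
    "SILK": (10, 20, 40, 60),
    "hybrid": (10, 20),
    "CELT": (2.5, 5, 10, 20),
}


def is_allowed(mode, frame_ms):
    """True if a single frame of this duration exists in the given mode."""
    return frame_ms in FRAME_MS[mode]


def samples_per_frame(frame_ms, sample_rate=SAMPLE_RATE):
    """Samples per channel in one frame of frame_ms at sample_rate."""
    return round(sample_rate * frame_ms / 1000)


for mode, durations in FRAME_MS.items():
    print(mode, [samples_per_frame(ms) for ms in durations])
```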

When the CELT layer uses 10.0 ms, 5.0 ms or 2.5 ms frames instead of the default 20.0 ms, it must use smaller transform block sizes. This reduces the frequency resolution of the MDCT compared to the default transform window, and thus reduces coding efficiency for tonal signals. Achieving the same perceptual accuracy with shorter transform windows requires greater amplitude precision, which increases the bitrate needed for the same perceptual quality (or, conversely, lowers quality at the same bitrate).

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.
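The loss of frequency resolution can be estimated with idealized arithmetic: a transform block covering N new samples yields N MDCT coefficients spread over 0 to fs/2, so each coefficient spans about fs/(2N) Hz. The Python sketch below ignores CELT's window overlap and band structure, so its output is only an order-of-magnitude guide.

```python
SAMPLE_RATE = 48000


def mdct_bin_spacing_hz(block_ms, sample_rate=SAMPLE_RATE):
    """Idealized MDCT bin spacing for a block of block_ms milliseconds.

    N = samples per block = number of coefficients; they span 0..fs/2,
    so each bin is fs / (2 * N) Hz wide.
    """
    n = sample_rate * block_ms / 1000.0
    return sample_rate / (2.0 * n)


for ms in (20.0, 10.0, 5.0, 2.5):
    print(f"{ms:>4} ms block: {mdct_bin_spacing_hz(ms):5.0f} Hz per coefficient")
```

Going from 20 ms to 2.5 ms blocks widens each coefficient from 25 Hz to 200 Hz, an eightfold loss of frequency resolution, which is consistent with tonal material needing extra bits in the low-latency modes.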

For the CELT layer, the algorithmic delay consists of the frame size plus an additional 2.5 ms, which is required for MDCT window overlap.

Xiph.Org used matched PEAQ scores (an approximate perceptual quality assessment made in software) for CELT 0.10, the codec used as the basis of the CELT layer in the Opus reference release. These scores indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64 kbps at 22.5 ms delay
!Fractional bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
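The delay and percentage columns of this table follow directly from the frame size and the matched bitrates: delay = frame size + 2.5 ms of MDCT overlap, and increase = (bitrate / 64 kbps - 1) x 100 %. The Python sketch below recomputes them from the table's own bitrate figures; it adds no new measurements.

```python
MDCT_OVERLAP_MS = 2.5
REFERENCE_KBPS = 64.0  # matched bitrate at the default 20 ms frame

# Matched-PEAQ bitrates (kbps) from the table, keyed by frame size (ms).
MATCHED_KBPS = {20.0: 64.0, 10.0: 70.4, 5.0: 84.8, 2.5: 112.0}


def algorithmic_delay_ms(frame_ms):
    """CELT algorithmic delay: frame size plus the MDCT overlap."""
    return frame_ms + MDCT_OVERLAP_MS


def percent_increase(kbps, reference=REFERENCE_KBPS):
    """Bitrate increase relative to the 20 ms reference, in percent."""
    return (kbps / reference - 1.0) * 100.0


if __name__ == "__main__":
    for frame_ms, kbps in MATCHED_KBPS.items():
        print(f"{frame_ms:4} ms frame: {algorithmic_delay_ms(frame_ms):4} ms delay, "
              f"{kbps:5.1f} kbps (+{percent_increase(kbps):.1f} %)")
```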

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept up to date and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented, and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of a separate encoder and decoder, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The command-line tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server], in the foobar2000 free encoders pack, and as some alternative compiles through the hydrogenaud.io Opus forum. The tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream and metadata analyzer.
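As an illustration of typical use of the command-line encoder, the Python sketch below builds an ''opusenc'' argument list and runs it only if the program is found on the PATH. The options used (`--bitrate` in kbps, `--vbr`, `--cvbr`, `--hard-cbr` and `--framesize` in ms) are those listed by `opusenc --help` in opus-tools; the wrapper functions are hypothetical, and option sets may differ between opus-tools versions.

```python
import shlex
import shutil
import subprocess

# opusenc rate-control switches (VBR is opusenc's default).
RATE_MODES = {"vbr": "--vbr", "cvbr": "--cvbr", "cbr": "--hard-cbr"}


def opusenc_command(src, dst, kbps=96, mode="vbr", framesize_ms=None):
    """Build an opusenc argument list: opusenc [options] input output.opus"""
    cmd = ["opusenc", "--bitrate", str(kbps), RATE_MODES[mode]]
    if framesize_ms is not None:
        cmd += ["--framesize", str(framesize_ms)]
    return cmd + [src, dst]


def run_if_available(cmd):
    """Run the command if its program is installed; otherwise just show it."""
    if shutil.which(cmd[0]) is None:
        print("opusenc not found; would run:", shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)


# A 96 kbps VBR music encode, and a 32 kbps hard-CBR encode with 20 ms frames.
print(shlex.join(opusenc_command("album.wav", "album.opus")))
print(shlex.join(opusenc_command("talk.wav", "talk.opus", kbps=32, mode="cbr", framesize_ms=20)))
```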

The '''latest stable release''' is recommended for general use and as of mid 2014 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 ====
Released 11 Sep 2012, when RFC 6716 was standardized, though it was largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, implements constrained VBR mode by default (with the bitrate boost used mainly for transients), plus true CBR.

==== libopus v1.1 ====

Alpha source code was released on 21 Dec 2012 for testing and user feedback. Following a beta release and further testing, the stable 1.1 version was released on 5 December 2013, once it was considered well tested enough for general release.

CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduce '''unconstrained VBR''', with a rate boost now applied not just to transients but also to highly tonal signals, and a rate reduction when the stereo image is narrow. The '''transient detection''' and '''time-frequency analysis''' code has also been rewritten, as has the '''dynamic allocation''' code (HF/LF tilt and Band Boost), to allow more aggressive departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC rejection''' below 3 Hz also aids quality when an inaudible DC offset is present, with no effect on deep bass notes.
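DC rejection amounts to a high-pass filter with a very low cutoff. The Python sketch below uses the generic one-pole/one-zero DC blocker y[n] = x[n] - x[n-1] + r * y[n-1], with r chosen for a cutoff of roughly 3 Hz at 48 kHz. It illustrates the idea only and is not the filter libopus actually uses.

```python
import math


def dc_blocker(samples, cutoff_hz=3.0, sample_rate=48000):
    """Generic DC-blocking high-pass: y[n] = x[n] - x[n-1] + r * y[n-1].

    For r close to 1 the -3 dB cutoff is about (1 - r) * fs / (2 * pi),
    so r is derived from the requested cutoff with that approximation.
    """
    r = 1.0 - 2.0 * math.pi * cutoff_hz / sample_rate
    out = []
    x_prev = y_prev = 0.0
    for x in samples:
        y = x - x_prev + r * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out


fs = 48000
# 2 seconds of a 1 kHz tone riding on a DC offset of 0.2
signal = [0.2 + math.sin(2 * math.pi * 1000 * n / fs) for n in range(2 * fs)]
filtered = dc_blocker(signal)
tail = filtered[-fs // 10:]  # last 100 ms, after the filter has settled
print(round(sum(tail) / len(tail), 4), round(max(tail), 3))
```

After the filter settles, the 0.2 offset is gone while the 1 kHz tone keeps essentially its full amplitude, matching the claim that bass notes are unaffected.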

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially near the bitrate range (presumably around 24-40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. The detection works without look-ahead and is not perfect, but when it is undecided, it is usually on audio where either mode works well.

'''Automatic bandwidth detection''' is also introduced to save wasted bits allocated to absent frequencies.

'''Surround sound improvements''' were introduced after the beta release, with considerable advances in coding efficiency, bitrate allocation and quality.

==== libopus v1.1.3 ====
Released July 15th, 2016. This version contains:

* Neon optimizations improving performance on ARMv7 and ARMv8 by up to 15%
* Fixes for some issues with 16-bit platforms (e.g. TI C55x)
* Fixes to comfort noise generation (CNG)
* Documentation that PLC packets can also be 2 bytes
* Experimental ambisonics work (--enable-ambisonics)

==== libopus v1.2.1 ====
Released June 26th, 2017. This version contains:

* Speech quality improvements, especially in the 12-20 kbit/s range
* Improved VBR encoding for hybrid mode
* More aggressive use of wider speech bandwidth, including fullband speech starting at 14 kbit/s
* Music quality improvements in the 32-48 kbit/s range
* Generic and SSE CELT optimizations
* Support for directly encoding packets up to 120 ms
* DTX support for CELT mode
* SILK CBR improvements
* Support for all of the fixes in draft-ietf-codec-opus-update-06 (the mono downmix and the folding fixes need --enable-update-draft)
* Many bug fixes, including integer wrap-arounds discovered through fuzzing (no security implications)

=== Ports ===

==== Concentus ====

The libopus reference library (fixed-point variant) has successfully been ported to both '''C#''' and '''Java''' as part of a project called '''Concentus'''. The project specifically targets cross-platform applications where native C interop is relatively difficult. The code is available on [https://github.com/lostromb/concentus GitHub] and distributed via standard package managers.

==== Emscripten ports ====

JavaScript implementations of Opus have been made using the automated tool [https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Emscripten Emscripten]. See [https://blog.rillke.com/opusenc.js/ here], [https://github.com/kazuki/opus.js-sample here] and [https://github.com/audiocogs/opus.js here].

=== VoIP software ===
* The open source virtual PBX Freeswitch supports Opus transcoding.
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus.
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video.
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its previous codec with Opus. Its iOS app will be the first to be released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10.

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome support Opus audio as of version 33.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.
* Android 5.0 and above supports Opus natively if it is encapsulated in the Ogg container, but the .opus filename extension is not recognized by Android, so the double filename extension .opus.ogg is recommended as a workaround to allow apps to recognize files as playable audio.

=== Hardware support ===
* Support in [[Rockbox]] is available. This provides support on a range of portable media players (including some Apple iPod models and Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also on Android devices.

=== Player software ===

* Windows/Mac/Linux (Cross-Platform)
*# [[VLC]] media player supports Opus as of version 2.0.4
*# [[Amarok]] 2.8 has transcoding support for Opus if FFmpeg is compiled with support for the libopus library, and supports playback of Opus-encoded files if Amarok is compiled against TagLib (newer than v1.8)
*# Clementine has Opus support
*# Audacious player
*# [[MPD]] as of version 0.18 if compiled against libopus (supports both encoding for http streams and decoding)

* Windows Exclusive
*# AIMP supports Opus natively as of version 3.20 build 1125 beta 1
*# [[foobar2000]] supports Opus natively as of v1.1.14 beta 1
*# Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
*# [[Winamp]] supports Opus using a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in]
*# MPC-HC

* iOS/Android (Cross-Platform)
*#Capriccio [https://itunes.apple.com/us/app/capriccio-free-ultimate-music/id434829018?mt=8 iOS]/[https://play.google.com/store/apps/details?id=me.ideariboso.capriccio Android]
*#foobar2000 [https://itunes.apple.com/us/app/foobar2000/id1072807669?mt=8 iOS]/[https://play.google.com/store/apps/details?id=com.foobar2000.foobar2000&hl=en Android]

* Android Exclusive
*# [http://gonemadmusicplayer.blogspot.com/ GoneMAD Music Player]
*# [http://neutronmp.com/ Neutron Music Player]
*# [http://www.videolan.org/vlc/download-android.html VLC Media Player for Android]
*# [https://play.google.com/store/apps/details?id=ru.recoilme.freeamp FreeMP]
*# [https://play.google.com/store/apps/details?id=net.mderezynski.youki3 Youki]
*# [https://play.google.com/store/apps/details?id=com.aimp.player AIMP for Android]
*# [https://play.google.com/store/apps/details?id=com.acmeandroid.listen Listen Audiobook Player]
*# [https://play.google.com/store/apps/details?id=com.mxtech.videoplayer.ad MX Player]
*# [https://play.google.com/store/apps/details?id=org.tomahawk.tomahawk_android Tomahawk Player Beta]
*# [https://play.google.com/store/apps/details?id=com.maxmpz.audioplayer&hl=en Poweramp Music Player]

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]
* [http://www.xdlab.ru/en/ TagScanner]
* [http://www.xmedia-recode.de/ XMedia Recode]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2018-01-31T14:37:36Z

Dynamic: /* Bitrate performance */ Modified full-band speech bitrates and stereo music range in light of v1.2 improvements and mentioned listening tests.

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.2.1
| preview_release = 1.2 rc1
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF). It is designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} for music as well as speech, yet it is also very competitive as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps] and [http://listening-test.coresv.net/results.htm also at 96 kbps]. As Opus is an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high-quality reference implementation is provided under the 3-clause BSD license,{{ref|homepage|a}} which compiles and runs on the vast majority of general-purpose and embedded (fixed-point) processors. Many software patents that cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real-Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.Org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8 kHz spectrum and the CELT layer encodes only the frequencies above 8 kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet it performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage and playback format. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, which makes it preferable for short audio clips, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more detail on the history and potential applications of Opus is included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''.

==Characteristics==
Opus supports bitrates from 6 kbps to 510 kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio at around 30 kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the reference release, libopus, the target bitrate is calibrated against the internal constant-quality targets so that, over a typical music collection, the achieved bitrate is very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, Helix MP3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is chosen and the bitrate falls where it may. Future versions can be expected to offer improved quality at the same setting. Independent implementations may adopt a different approach.
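Because the libopus target is calibrated against average bitrate, file sizes are easy to estimate. The Python sketch below clamps a requested target to the ranges quoted above (6-510 kbps for a typical stereo source, about 255 kbps per channel for multichannel audio; applying the 510 kbps cap to mono as well is an assumption) and estimates the size of the audio data from the duration. The helper names are hypothetical and container overhead is ignored.

```python
MIN_KBPS = 6
STEREO_MAX_KBPS = 510
MULTICHANNEL_MAX_KBPS_PER_CHANNEL = 255  # approximate figure quoted in the text


def clamp_target(kbps, channels=2):
    """Clamp a requested target bitrate (kbps) to the quoted Opus range."""
    if channels <= 2:
        upper = STEREO_MAX_KBPS
    else:
        upper = MULTICHANNEL_MAX_KBPS_PER_CHANNEL * channels
    return max(MIN_KBPS, min(kbps, upper))


def estimated_size_bytes(kbps, seconds):
    """Approximate audio data size of a calibrated-VBR encode averaging kbps."""
    return kbps * 1000 * seconds / 8


# A 4-minute track at a 64 kbps target comes to roughly 1.9 MB of audio data.
print(clamp_target(700), clamp_target(2000, channels=6), estimated_size_bytes(64, 240))
```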

Opus can seamlessly adapt its mode of operation without glitches or sound interruptions (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This is particularly useful for mixed-content audio or varying network conditions, and makes the unified Opus codec superior to a suite of separate codecs that might otherwise cover the same range of bitrates and quality settings but would require out-of-band signalling to switch between codecs. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both; and the use of different audio bandwidths (4 kHz, 6 kHz, 8 kHz, 12 kHz, 20 kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.
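The five audio bandwidths listed above have standard names and effective sampling rates in RFC 6716, and not every layer can code every bandwidth. The Python sketch below tabulates both (per RFC 6716, Tables 1 and 2) and adds a hypothetical helper that picks the narrowest bandwidth covering a given highest frequency of interest.

```python
# name: (audio bandwidth in kHz, effective sample rate in kHz), RFC 6716 Table 1
BANDWIDTHS = {
    "NB": (4, 8),     # narrowband
    "MB": (6, 12),    # medium-band
    "WB": (8, 16),    # wideband
    "SWB": (12, 24),  # super-wideband
    "FB": (20, 48),   # fullband
}

# Layers that can code each bandwidth, RFC 6716 Table 2 (hybrid = SILK + CELT)
LAYERS = {
    "NB": ("SILK", "CELT"),
    "MB": ("SILK",),
    "WB": ("SILK", "CELT"),
    "SWB": ("hybrid", "CELT"),
    "FB": ("hybrid", "CELT"),
}


def narrowest_covering(max_freq_khz):
    """Narrowest Opus bandwidth whose audio bandwidth reaches max_freq_khz."""
    for name, (bw_khz, _rate_khz) in BANDWIDTHS.items():  # dicts keep NB..FB order
        if bw_khz >= max_freq_khz:
            return name
    return "FB"  # content above 20 kHz is simply cut off


for f in (3.4, 7.0, 11.0, 16.0):
    name = narrowest_covering(f)
    print(f, "kHz ->", name, LAYERS[name])
```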

Of importance mainly to interactive uses, but potentially also useful in time-delayed audio streaming, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and a low-bitrate redundant copy of critical frames (e.g. consonant sounds) can be sent in the following packet to preserve intelligibility.
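In libopus, in-band FEC is enabled on the encoder with the `OPUS_SET_INBAND_FEC` and `OPUS_SET_PACKET_LOSS_PERC` controls, and a receiver recovers a lost frame by decoding the next packet with the `decode_fec` flag of `opus_decode()` set. The Python model below is a simplification that assumes every packet carries a redundant copy of the previous frame (in practice the encoder decides this); it classifies each frame as normally decoded, recovered via FEC, or concealed via PLC. The function is hypothetical.

```python
def recovery_plan(received):
    """Classify each frame as 'decode', 'fec' or 'plc'.

    received[n] is True if the packet carrying frame n arrived. This simplified
    model assumes every packet n+1 carries a low-bitrate redundant copy of
    frame n, so a lost frame is recoverable whenever the next packet arrives;
    otherwise the decoder has to fall back to packet loss concealment.
    """
    plan = []
    for n, ok in enumerate(received):
        if ok:
            plan.append("decode")
        elif n + 1 < len(received) and received[n + 1]:
            plan.append("fec")
        else:
            plan.append("plc")
    return plan


arrivals = [True, False, True, False, False, True]
print(recovery_plan(arrivals))
```

The model also shows the receiver-side cost: to use the redundant copy, playback must wait for the following packet, adding one frame of delay, and consecutive losses still fall back to PLC.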

For music and general audio, the CELT layer of Opus builds on knowledge gained during Xiph.Org's Vorbis development. A primary goal is to preserve the total energy in each spectral band while requiring only a modest bitrate overhead to do so. This eliminates many bitrate-starvation artifacts, such as the 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. The same technique also increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can even be used exclusively when very low algorithmic delay (5.0 ms) is required for very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a higher bitrate is then needed to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides further advanced tools for encoder tuning.

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Players are also required to apply the playback gain stored in the file header, so some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control is possible in any compliant player.

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and super-wideband speech, reaching full-band speech by around 14 kbps in encoder version 1.2 (was 21 kbps in v1.1, 29 kbps in v1.0). Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As the bitrate increases, the hybrid mode is adopted, extending the bandwidth first to 12 kHz (comparable with compact cassette) and then to the full 20 kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens during the transition from 12 kHz to 20 kHz bandwidth. Encoder version 1.2 greatly improves music encoding in the 32-64 kbps range, allowing full-band stereo at 32 kbps and providing acceptable quality at 48 kbps, where artifacts are audible but rarely annoying. Version 1.3 is expected to improve quality in this range further.

Multi-format stereo music listening tests have demonstrated the superiority of Opus at 64 kbps and 96 kbps compared to the best AAC-LC, HE-AAC and Ogg Vorbis encoders, and at 96 kbps also to 128 kbps MP3 encoded using LAME -V 5.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate tradeoff. Further changes are therefore likely, and this table may require small updates as the encoder improves.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically at a 48 kHz sampling rate), though it also notes stereo compatibility at 40 kbps and above. The default 20 ms frame size (22.5 ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate Target
!Bandwidth
!Typical Mode Used
!Speech Quality
!Use Cases / Competitive Codecs
|-
!Less than 6 kbps
| -
| -
| Bitrates lower than 6 kbps are not supported by Opus
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|6 kHz medium-band
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, [[Speex]] also competitive
|-
!8 kbps
|6 kHz medium-band
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. [[Speex]] competitive.
|-
!12 kbps
|12 kHz super-wideband
|hybrid
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|20 kHz
|hybrid/CELT
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|20 kHz
|hybrid/CELT
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|CELT
|Essentially transparent speech plus moderately good stereo music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps or more
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically at a 48 kHz sampling rate). Opus automatically uses mono at very low bitrates, though, depending on content, some stereo encoding may still be used even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|6 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|6 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|20 kHz
|hybrid/CELT
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|20 kHz
|hybrid/CELT
|Fair but OK for bitrate
|OK for incidental music
|-
!32 to 40 kbps
|stereo
|20 kHz
|CELT
|Moderately good stereo, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, may have problems with cymbals
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming. Beat LC-AAC, Vorbis, MP3 in [http://listening-test.coresv.net/results.htm listening test]
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see the latency table below)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!160 to 192 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy with --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used includes packet overhead: the more packets (and hence packet headers) sent every second, the greater the overhead. For this reason, although Opus defaults to 20.0ms frames, it supports 60.0ms frames to reduce overhead when carrying low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0ms SILK frames to reduce latency somewhat at the cost of more packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These lower the latency but send more packets per second, directly increasing packet overhead. As described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
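The overhead arithmetic can be sketched as follows. This is a rough illustration under assumptions that are not taken from this article: one Opus frame per packet, carried over IPv4/UDP/RTP with 20 + 8 + 12 = 40 bytes of headers, and no header compression, SRTP or RTP extensions. Real deployments (IPv6, SRTP, tunnelling) will see different figures.

```python
# Assumed per-packet header sizes in bytes: IPv4 + UDP + RTP, no options or extensions.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12
PER_PACKET_BYTES = IPV4_HEADER + UDP_HEADER + RTP_HEADER  # 40 bytes


def overhead_kbps(frame_ms: float, header_bytes: int = PER_PACKET_BYTES) -> float:
    """Header overhead in kbit/s when exactly one Opus frame is sent per packet."""
    packets_per_second = 1000.0 / frame_ms
    return header_bytes * 8 * packets_per_second / 1000.0


def total_kbps(payload_kbps: float, frame_ms: float) -> float:
    """Opus payload bitrate plus the assumed network header overhead."""
    return payload_kbps + overhead_kbps(frame_ms)


if __name__ == "__main__":
    for frame_ms in (2.5, 5.0, 10.0, 20.0, 40.0, 60.0):
        print(f"{frame_ms:>4} ms frames: +{overhead_kbps(frame_ms):6.2f} kbps overhead, "
              f"{total_kbps(12.0, frame_ms):6.2f} kbps total for a 12 kbps stream")
```

Under these assumptions a 12 kbps SILK stream sent as 20 ms frames carries 16 kbps of headers, more than the audio itself, which is why 60 ms frames are attractive for low-bitrate speech.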

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which is limited to Opus frame sizes of 10, 20, 40 and 60ms, the CELT layer can shorten its encoding block lengths, allowing it to be used with frames as short as 2.5ms.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use shorter transform blocks, which reduces the frequency resolution of the MDCT compared to the default transform window and so lowers coding efficiency for tonal signals. With coarser frequency resolution, more bits must be spent on amplitude precision to reach the same perceptual quality, so the bitrate rises for equal quality (or, conversely, quality falls at the same bitrate).
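The loss of frequency resolution can be estimated with simple arithmetic. The sketch below assumes an idealized MDCT at 48 kHz with one transform spanning the whole frame (one coefficient per new input sample), and ignores CELT's window shape and band layout; it only shows how coefficient spacing scales with frame size.

```python
SAMPLE_RATE = 48_000  # Hz, Opus's full-band internal sampling rate


def mdct_coefficients(frame_ms: float, sample_rate: int = SAMPLE_RATE) -> float:
    """Number of MDCT coefficients if a single transform covers the whole frame."""
    return sample_rate * frame_ms / 1000.0


def mdct_bin_spacing_hz(frame_ms: float, sample_rate: int = SAMPLE_RATE) -> float:
    """Approximate spacing between MDCT coefficients (Nyquist / coefficient count)."""
    return (sample_rate / 2.0) / mdct_coefficients(frame_ms, sample_rate)


if __name__ == "__main__":
    for frame_ms in (20.0, 10.0, 5.0, 2.5):
        print(f"{frame_ms:>4} ms: {mdct_coefficients(frame_ms):5.0f} coefficients, "
              f"{mdct_bin_spacing_hz(frame_ms):5.1f} Hz apart")
```

Each halving of the frame doubles the spacing between coefficients (about 25 Hz at 20 ms versus 200 Hz at 2.5 ms in this idealized model), so closely spaced tonal partials share coefficients and cost more bits to represent accurately.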

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay is the frame size plus an additional 2.5ms, which the CELT layer needs for MDCT window overlap.

Xiph.org compared settings using matched PEAQ scores (an objective perceptual quality assessment computed in software) for CELT 0.10, the codec on which the CELT layer of the Opus reference release is based. The results indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
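The delay and percentage columns of the table follow from simple arithmetic on the frame size and the matched bitrate. This sketch reproduces them; the matched bitrates are copied from the table, and the 2.5 ms overlap is the figure given in the text above.

```python
OVERLAP_MS = 2.5        # MDCT window overlap added to every frame
REFERENCE_KBPS = 64.0   # bitrate at the default 20 ms frame size

# Frame size (ms) -> bitrate (kbps) matching 64 kbps at 22.5 ms delay, from the table
MATCHED_KBPS = {20.0: 64.0, 10.0: 70.4, 5.0: 84.8, 2.5: 112.0}


def algorithmic_delay_ms(frame_ms: float) -> float:
    """Frame size plus the fixed overlap delay."""
    return frame_ms + OVERLAP_MS


def increase_percent(kbps: float, reference: float = REFERENCE_KBPS) -> float:
    """Bitrate increase relative to the 20 ms reference, in percent."""
    return (kbps / reference - 1.0) * 100.0


if __name__ == "__main__":
    for frame_ms, kbps in MATCHED_KBPS.items():
        print(f"{frame_ms:>4} ms frame: {algorithmic_delay_ms(frame_ms):4.1f} ms delay, "
              f"{kbps:5.1f} kbps (+{increase_percent(kbps):4.1f} %)")
```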

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based on the 12 January 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept up to date and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including the standard compliance of the bitstream format. Because it is based on ogginfo from vorbis-tools, it is, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The command-line tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server], in the foobar2000 free encoders pack, and as alternative compiles through the hydrogenaud.io Opus forum. The opus-tools command-line package includes the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream & metadata analyzer.

The '''latest stable release''' is recommended for general use and, as of mid-2014, is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 ====
Released 11 Sep 2012, when RFC 6716 was standardized, though largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, uses constrained VBR by default (with the bitrate boost used mainly for transients) and also offers true CBR.
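With the ''opusenc'' tool from opus-tools, the rate-control mode is selected with the --vbr (default), --cvbr or --hard-cbr options, the target with --bitrate (in kbit/s) and the maximum frame size with --framesize. The helper below is a hypothetical convenience function that only assembles such a command line (it does not run the encoder); the accepted frame sizes are limited here to the 2.5 to 60 ms values discussed in this article, and newer builds may accept more.

```python
# Hypothetical helper: builds (but does not execute) an opusenc command line.
VALID_FRAMESIZES_MS = (2.5, 5, 10, 20, 40, 60)
RATE_MODE_FLAGS = {"vbr": "--vbr", "cvbr": "--cvbr", "cbr": "--hard-cbr"}


def opusenc_args(infile: str, outfile: str, bitrate_kbps: float,
                 mode: str = "vbr", framesize_ms: float = 20) -> list:
    """Return an argv list for opusenc with the chosen rate mode and frame size."""
    if mode not in RATE_MODE_FLAGS:
        raise ValueError(f"unknown rate mode {mode!r}")
    if framesize_ms not in VALID_FRAMESIZES_MS:
        raise ValueError(f"unsupported frame size {framesize_ms} ms")
    return [
        "opusenc",
        "--bitrate", f"{bitrate_kbps:g}",
        RATE_MODE_FLAGS[mode],
        "--framesize", f"{framesize_ms:g}",
        infile,
        outfile,
    ]
```

The resulting list can be passed to subprocess.run(), e.g. opusenc_args("in.wav", "out.opus", 64, mode="cvbr") for constrained VBR at 64 kbps with the default 20 ms frames.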

==== libopus v1.1 ====

Alpha source code was released on 21 Dec 2012 for testing and user feedback. After a beta release and further testing, the stable 1.1 version was released on 5 December 2013, having been judged well tested enough for general release.

CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] provide '''unconstrained VBR''', boosting the rate not only for transients but now also for highly tonal signals, and reducing it when the stereo image is narrow. The '''transient detection''' and '''time-frequency analysis''' code has also been rewritten, as has the '''dynamic allocation''' code (HF/LF tilt and Band Boost), allowing larger departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC rejection''' below 3 Hz also improves quality when an inaudible DC offset is present, without affecting deep bass notes.
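The idea of DC rejection can be illustrated with a generic first-order DC-blocking high-pass filter, y[n] = x[n] - x[n-1] + r*y[n-1], with its corner placed near 3 Hz. This is a textbook filter chosen for illustration, not a reproduction of the libopus code; the cutoff and 48 kHz rate are the only parameters taken from the text.

```python
import math


def dc_reject(samples, cutoff_hz: float = 3.0, sample_rate: int = 48_000):
    """Generic one-pole DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    r = 1.0 - 2.0 * math.pi * cutoff_hz / sample_rate  # pole just inside the unit circle
    out = []
    x_prev = 0.0
    y_prev = 0.0
    for x in samples:
        y = x - x_prev + r * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out


if __name__ == "__main__":
    offset = dc_reject([0.5] * 48_000)  # pure DC offset, one second
    print(f"residual DC after 1 s: {offset[-1]:.2e}")
```

A constant offset decays away within a fraction of a second, while audible frequencies, even deep bass well above 3 Hz, pass with essentially unchanged amplitude.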

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially in the bitrate range (presumably around 24-40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. The detection works without look-ahead and is not perfect, but it is usually undecided only on audio where either mode works well.
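One way to picture how a music probability can steer the mode decision is to blend a speech crossover point and a music crossover point. The sketch below is hypothetical: the thresholds are simply the ends of the presumed 24-40 kbps range mentioned above, not the values used inside libopus, and the real encoder weighs further factors.

```python
# Hypothetical thresholds from the article's presumed 24-40 kbps range (not libopus values).
MUSIC_THRESHOLD_KBPS = 24.0   # clearly music: prefer CELT from here up
SPEECH_THRESHOLD_KBPS = 40.0  # clearly speech: keep SILK/hybrid up to here


def choose_layer(bitrate_kbps: float, music_prob: float) -> str:
    """Pick a layer by interpolating the crossover bitrate with the music probability."""
    music_prob = min(max(music_prob, 0.0), 1.0)  # clamp the detector output to [0, 1]
    threshold = (music_prob * MUSIC_THRESHOLD_KBPS
                 + (1.0 - music_prob) * SPEECH_THRESHOLD_KBPS)
    return "CELT" if bitrate_kbps >= threshold else "SILK/hybrid"
```

An undecided detector (probability 0.5) puts the crossover in the middle of the range, where, as noted above, either mode tends to work acceptably.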

'''Automatic bandwidth detection''' is also introduced to avoid wasting bits on frequencies absent from the input.

'''Surround sound improvements''' introduced after the beta release bring considerable advances in coding efficiency, bitrate allocation and quality.

==== libopus v1.1.3 ====
Released July 15th, 2016. This version contains:

* NEON optimizations improving performance on ARMv7 and ARMv8 by up to 15%
* Fixes for some issues with 16-bit platforms (e.g. TI C55x)
* Fixes to comfort noise generation (CNG)
* Documentation that PLC packets can also be 2 bytes
* Experimental ambisonics work (--enable-ambisonics)

==== libopus v1.2.1 ====
Released June 26th, 2017. This version contains:

* Speech quality improvements, especially in the 12-20 kbit/s range
* Improved VBR encoding for hybrid mode
* More aggressive use of wider speech bandwidth, including fullband speech starting at 14 kbit/s
* Music quality improvements in the 32-48 kbit/s range
* Generic and SSE CELT optimizations
* Support for directly encoding packets up to 120 ms
* DTX support for CELT mode
* SILK CBR improvements
* Support for all of the fixes in draft-ietf-codec-opus-update-06 (the mono downmix and the folding fixes need --enable-update-draft)
* Many bug fixes, including integer wrap-arounds discovered through fuzzing (no security implications)

=== Ports ===

==== Concentus ====

The libopus reference library (fixed-point variant) has been successfully ported to both '''C#''' and '''Java''' as part of a project called '''Concentus'''. The project specifically targets cross-platform applications where native C interop is relatively difficult. The code is available on [https://github.com/lostromb/concentus GitHub] and distributed via standard package managers.

==== Emscripten ports ====

JavaScript implementations of Opus have been made using the automated tool [https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Emscripten Emscripten]; see [https://blog.rillke.com/opusenc.js/ opusenc.js], [https://github.com/kazuki/opus.js-sample opus.js-sample] and [https://github.com/audiocogs/opus.js opus.js].

=== VoIP software ===
* The open source virtual PBX Freeswitch supports Opus transcoding.
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its previous codec with Opus. Its iOS app will be the first released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC are on the development roadmap.
* Chromium and Google Chrome have Opus audio support as of version 33.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] with 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.
* Android 5.0 and above supports Opus natively if encapsulated in the Ogg container, but .opus filename extension is not recognized by Android, so the use of double filename extension .opus.ogg is recommended as a workaround to allow apps to recognize files as playable audio.
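The .opus.ogg workaround for Android mentioned above amounts to appending a second extension. A minimal sketch using Python's pathlib (the function name is hypothetical) that computes the new name and leaves other files alone:

```python
from pathlib import Path


def android_friendly_path(path: Path) -> Path:
    """Append '.ogg' to '.opus' files so Android apps treat them as Ogg audio."""
    if path.suffix.lower() == ".opus":
        return path.with_name(path.name + ".ogg")
    return path  # already .opus.ogg, or not an Opus file


if __name__ == "__main__":
    for original in Path(".").glob("*.opus"):
        original.rename(android_friendly_path(original))
```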

=== Hardware support ===
* [[Rockbox]] supports Opus. This brings support to a range of portable media players (including some Apple iPod models and SanDisk Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), to Android devices.

=== Player software ===

* Windows/Mac/Linux (Cross-Platform)
*# [[VLC]] media player supports Opus as of version 2.0.4
*#[[Amarok]] 2.8 has transcoding support for Opus codec if ffmpeg is compiled with support for the libopus library & support for playback of Opus encoded files if Amarok is compiled against TagLib (newer than V1.8)
*# Clementine has Opus support
*# Audacious player
*# [[MPD]] as of version 0.18 if compiled against libopus (supports both encoding for http streams and decoding)

* Windows Exclusive
*# AIMP supports Opus natively as of version 3.20 build 1125 beta 1
*# [[foobar2000]] supports Opus natively as of v1.1.14 beta 1
*# Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
*# [[Winamp]] supports Opus using a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in]
*# MPC-HC

* iOS/Android (Cross-Platform)
*#Capriccio [https://itunes.apple.com/us/app/capriccio-free-ultimate-music/id434829018?mt=8 iOS]/[https://play.google.com/store/apps/details?id=me.ideariboso.capriccio Android]
*#foobar2000 [https://itunes.apple.com/us/app/foobar2000/id1072807669?mt=8 iOS]/[https://play.google.com/store/apps/details?id=com.foobar2000.foobar2000&hl=en Android]

* Android Exclusive
*# [http://gonemadmusicplayer.blogspot.com/ GoneMAD Music Player]
*# [http://neutronmp.com/ Neutron Music Player]
*# [http://www.videolan.org/vlc/download-android.html VLC Media Player for Android]
*# [https://play.google.com/store/apps/details?id=ru.recoilme.freeamp FreeMP]
*# [https://play.google.com/store/apps/details?id=net.mderezynski.youki3 Youki]
*# [https://play.google.com/store/apps/details?id=com.aimp.player AIMP for Android]
*# [https://play.google.com/store/apps/details?id=com.acmeandroid.listen Listen Audiobook Player]
*# [https://play.google.com/store/apps/details?id=com.mxtech.videoplayer.ad MX Player]
*# [https://play.google.com/store/apps/details?id=org.tomahawk.tomahawk_android Tomahawk Player Beta]
*# [https://play.google.com/store/apps/details?id=com.maxmpz.audioplayer&hl=en Poweramp Music Player]

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]
* [http://www.xdlab.ru/en/ TagScanner]
* [http://www.xmedia-recode.de/ XMedia Recode]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2018-01-31T14:24:07Z

Dynamic: /* Bitrate performance */

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.2.1
| preview_release = 1.2 rc1
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech, yet it is also very competitive for use as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps] and [http://listening-test.coresv.net/results.htm also at 96 kbps]. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage & playback format. Also unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more details of the history and potential applications for Opus are included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that, over a typical music collection, the average bitrate comes very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is chosen and the bitrate falls where it may. Future versions can be expected to offer better quality at the same setting. Independent implementations may adopt a different approach.
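Because libopus calibrates its VBR target against a typical music collection, the target bitrate can be used to estimate storage needs in advance. A minimal sketch, assuming the average bitrate lands on the target and ignoring Ogg container overhead and per-track deviation:

```python
def estimated_size_bytes(target_kbps: float, duration_s: float) -> int:
    """Approximate audio payload size for a bitrate-calibrated VBR target."""
    return round(target_kbps * 1000 * duration_s / 8)


if __name__ == "__main__":
    # A 4-minute track and a 1-hour album at 96 kbps
    print(estimated_size_bytes(96, 240))
    print(estimated_size_bytes(96, 3600))
```

Individual tracks will come out larger or smaller than this depending on how demanding their content is; only the collection-wide average is expected to approach the target.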

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page), which can be particularly useful for mixed-content audio or varying network conditions. This makes the unified Opus codec superior to a suite of separate codecs covering the same range of bitrates and quality settings, which would require out-of-band signalling to trigger codec switching. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both; and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.
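The five audio bandwidths listed above correspond to the named modes and effective sample rates defined in RFC 6716. The sketch below tabulates them and picks the narrowest mode that still covers a given content bandwidth, as a toy illustration of why bandwidth detection saves bits; libopus's actual detector analyses the signal itself and is not reproduced here.

```python
# (name, audio bandwidth in kHz, effective sample rate in kHz), per RFC 6716
OPUS_BANDWIDTHS = [
    ("narrowband", 4, 8),
    ("medium-band", 6, 12),
    ("wideband", 8, 16),
    ("super-wideband", 12, 24),
    ("fullband", 20, 48),
]


def narrowest_bandwidth_for(content_khz: float):
    """Return (mode name, effective sample rate in kHz) covering content up to content_khz."""
    for name, bandwidth_khz, rate_khz in OPUS_BANDWIDTHS:
        if content_khz <= bandwidth_khz:
            return name, rate_khz
    # Anything above 20 kHz is still coded as fullband
    name, _, rate_khz = OPUS_BANDWIDTHS[-1]
    return name, rate_khz
```

For example, telephone-band content up to 3.4 kHz fits narrowband, so coding it as fullband would spend bits on empty spectrum.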

Of importance mainly for interactive use, but potentially useful in time-delayed audio streaming too, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the user or application software can indicate the expected packet loss rate to the encoder, and critical frames (e.g. consonant sounds) can be encoded again at a low bitrate within the following packet to preserve intelligibility.
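How a receiver combines in-band FEC and PLC can be sketched with a simplified model in which every packet carries a low-bitrate copy of the previous frame. In libopus the encoder side is enabled with the OPUS_SET_INBAND_FEC and OPUS_SET_PACKET_LOSS_PERC controls, and the decoder recovers a lost frame by calling opus_decode on the next packet with decode_fec set; the planning function below is a hypothetical illustration, not part of that API, and real encoders only add redundancy when they judge it worthwhile.

```python
def recovery_plan(received):
    """Classify each frame as 'normal', 'fec' (rebuilt from the next packet) or 'plc'.

    Simplified model: packet n+1 always carries a low-bitrate copy of frame n.
    """
    plan = []
    for i, ok in enumerate(received):
        if ok:
            plan.append("normal")
        elif i + 1 < len(received) and received[i + 1]:
            plan.append("fec")   # next packet arrived: decode its redundant copy
        else:
            plan.append("plc")   # no redundancy available: conceal the gap
    return plan


if __name__ == "__main__":
    print(recovery_plan([True, False, True, False, False, True]))
```

Isolated losses are repaired from redundant data at slightly reduced quality, while bursts of consecutive losses still fall back to concealment.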

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development and, as a primary goal, ensures that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this. This eliminates many bitrate-starvation artifacts, such as the 'birdies' common in low-bitrate MP3, especially during transients, applause and cymbal sounds. The same technique also increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.
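The energy-preservation principle can be sketched as gain-shape coding: each band's energy is sent explicitly, its shape is normalized to unit length and quantized coarsely, and the decoder rescales the quantized shape back to the transmitted energy. In this hypothetical sketch plain rounding stands in for CELT's pyramid vector quantizer, and the energy is kept unquantized; even so, it shows that a crude shape cannot starve a band of energy.

```python
import math


def encode_band(coeffs, pulses: int = 4):
    """Gain-shape coding of one band: exact energy plus a coarsely quantized shape."""
    energy = sum(c * c for c in coeffs)
    norm = math.sqrt(energy)
    if norm == 0.0:
        return 0.0, [0] * len(coeffs)
    shape = [round(c / norm * pulses) for c in coeffs]
    if not any(shape):
        # Keep at least one non-zero entry so the band is never silenced
        k = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
        shape[k] = 1 if coeffs[k] > 0 else -1
    return energy, shape


def decode_band(energy, shape):
    """Normalize the quantized shape, then scale it to the transmitted energy."""
    shape_norm = math.sqrt(sum(s * s for s in shape))
    if shape_norm == 0.0:
        return [0.0] * len(shape)
    gain = math.sqrt(energy) / shape_norm
    return [s * gain for s in shape]
```

The reconstructed band differs in detail from the original, but its total energy matches, which is what prevents the hollow, 'birdie'-prone sound of bands that are simply zeroed when bits run short.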

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] may itself cause interruptions during playback). The Ogg Opus header also carries an output gain that players are expected to apply, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
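In the Ogg Opus header (RFC 7845) the output gain is stored as a signed 16-bit Q7.8 value in dB, i.e. the stored integer divided by 256. A minimal sketch of converting it to the linear factor a player would multiply decoded samples by:

```python
def output_gain_factor(q78: int) -> float:
    """Linear scale factor for an Ogg Opus output gain stored as Q7.8 dB."""
    if not -32768 <= q78 <= 32767:
        raise ValueError("output gain must fit in a signed 16-bit field")
    gain_db = q78 / 256.0            # Q7.8: 8 fractional bits
    return 10.0 ** (gain_db / 20.0)  # dB to amplitude ratio
```

For example, a stored value of -256 means -1 dB, a factor of about 0.891, and 0 leaves the output unchanged.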

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 16 kbps (for encoder version 1.2 and later, nearer to 32 kbps in prior versions). Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As the bitrate increases, the hybrid mode is adopted, extending bandwidth first to 12kHz (comparable with compact cassette) and then to the full 20kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens around the same point as the bandwidth change from 12kHz to 20kHz.

==Indicative bitrate and quality==
The table below gives illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection were introduced to improve mode decisions, and VBR became less constrained, all with the aim of maximizing the quality/bitrate trade-off. Changes are therefore likely, and this table may need small updates as the encoder improves.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate), but notes stereo capability from 40kbps upward. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate Target
!Bandwidth
!Typical Mode Used
!Speech Quality
!Use Cases / Competitive Codecs
|-
!Less than 6 kbps
| -
| -
| Bitrates lower than 6 kbps are not supported by Opus
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|6 kHz medium-band
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, [[Speex]] also competitive
|-
!8 kbps
|6 kHz medium-band
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. [[Speex]] competitive.
|-
!12 kbps
|12 kHz super-wideband
|hybrid
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|20 kHz
|hybrid/CELT
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|20 kHz
|hybrid/CELT
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|CELT
|Essentially transparent speech plus moderately good stereo music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps or more
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate). Opus automatically switches to mono at very low bitrates, although, depending on the content, some stereo information may still be encoded even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!Typical SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|6 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|6 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|20 kHz
|hybrid/CELT
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|20 kHz
|hybrid/CELT
|Fair but OK for bitrate
|OK for incidental music
|-
!32 to 40 kbps
|stereo
|20 kHz
|CELT
|Moderately good stereo, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, may have problems with cymbals
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming. Beat LC-AAC, Vorbis, MP3 in [http://listening-test.coresv.net/results.htm listening test]
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!160 to 192 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy with --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used includes packet overhead: the more packets (and hence packet headers) sent every second, the greater the overhead. For this reason, although Opus defaults to 20.0ms frames, it supports 60.0ms frames to reduce overhead when carrying low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0ms SILK frames to reduce latency somewhat at the cost of more packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These lower the latency but send more packets per second, directly increasing packet overhead. As described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which is limited to Opus frame sizes of 10, 20, 40 and 60ms, the CELT layer can shorten its encoding block lengths, allowing it to be used with frames as short as 2.5ms.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use shorter transform blocks, which reduces the frequency resolution of the MDCT compared to the default transform window and so lowers coding efficiency for tonal signals. With coarser frequency resolution, more bits must be spent on amplitude precision to reach the same perceptual quality, so the bitrate rises for equal quality (or, conversely, quality falls at the same bitrate).

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay is the frame size plus an additional 2.5ms, which the CELT layer needs for MDCT window overlap.

Xiph.org compared settings using matched PEAQ scores (an objective perceptual quality assessment computed in software) for CELT 0.10, the codec on which the CELT layer of the Opus reference release is based. The results indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based on the 12 January 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept up to date and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including the standard compliance of the bitstream format. Because it is based on ogginfo from vorbis-tools, it is, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The command-line tools of the reference implementation are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server], as well as in the foobar2000 Free Encoder Pack, and some alternative builds are available through the Hydrogenaudio Opus forum. The command-line tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream and metadata analyzer.
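As a minimal sketch of driving ''opusenc'' from a script, the Python helper below only assembles an argument list; it does not run anything. It assumes opus-tools is installed and that the options used (--bitrate in kbit/s, --cvbr, --hard-cbr, --framesize, --expect-loss) match the output of opusenc --help for your version. The helper build_opusenc_cmd is hypothetical and not part of opus-tools.

```python
# Sketch: assemble an opusenc command line (does not execute it).
# Assumptions: opus-tools' opusenc is on PATH and accepts the options below
# as listed by `opusenc --help`; build_opusenc_cmd is a hypothetical helper.
import shlex
from typing import List

VALID_FRAMESIZES_MS = (2.5, 5, 10, 20, 40, 60)
MODES = ("vbr", "cvbr", "hard-cbr")  # vbr is opusenc's default


def build_opusenc_cmd(src: str, dst: str, bitrate_kbps: float = 96.0,
                      framesize_ms: float = 20, mode: str = "vbr",
                      expect_loss_pct: int = 0) -> List[str]:
    if framesize_ms not in VALID_FRAMESIZES_MS:
        raise ValueError(f"unsupported frame size: {framesize_ms} ms")
    if mode not in MODES:
        raise ValueError(f"unknown rate-control mode: {mode}")
    cmd = ["opusenc", "--bitrate", f"{bitrate_kbps:g}"]
    if mode != "vbr":
        cmd.append(f"--{mode}")  # --cvbr or --hard-cbr
    if framesize_ms != 20:
        # 20 ms is the default; shorter frames only help interactive use
        cmd += ["--framesize", f"{framesize_ms:g}"]
    if expect_loss_pct:
        cmd += ["--expect-loss", str(expect_loss_pct)]
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_opusenc_cmd("in.wav", "out.opus")))
    print(shlex.join(build_opusenc_cmd("in.wav", "live.opus", 112, 2.5, "hard-cbr")))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Leaving out --framesize keeps the default 20 ms frames, which suit storage and non-interactive playback.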

The '''latest stable release''' is recommended for general use and, as of mid-2014, was considered competitive with or superior to the best alternative speech and general music encoders at most supported bitrates.

==== libopus v1.0 ====
Released 11 Sep 2012, when RFC 6716 was standardized, though it was largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, uses constrained VBR by default (with a bitrate boost used mainly for transients) and also offers true CBR.

==== libopus v1.1 ====

Alpha source code was released on 21 Dec 2012 for testing and user feedback. After a beta release and further testing, the stable 1.1 version, considered well tested enough for general release, was released on 5 December 2013.

The CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduce '''unconstrained VBR''', which boosts the rate not only for transients but now also for highly tonal signals, and reduces it when the stereo image is narrow. The '''transient detection''' and '''time-frequency analysis''' code has also been rewritten, as has the '''dynamic allocation''' code (HF/LF tilt and Band Boost), to allow more aggressive departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC rejection''' below 3 Hz improves quality when an inaudible DC offset is present, without affecting deep bass notes.

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially in the bitrate range (presumably around 24-40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. The detection works without look-ahead and is not perfect, but it is usually undecided only on audio where either mode will work well.

'''Automatic bandwidth detection''' is also introduced to save wasted bits allocated to absent frequencies.

'''Surround sound improvements''' were introduced after the beta release, with considerable advances in coding efficiency, bitrate allocation and quality.

==== libopus v1.1.3 ====
Released July 15th, 2016. This version contains:

* NEON optimizations improving performance on ARMv7 and ARMv8 by up to 15%
* Fixes for some issues with 16-bit platforms (e.g. TI C55x)
* Fixes to comfort noise generation (CNG)
* Documentation that PLC packets can also be 2 bytes
* Experimental ambisonics work (--enable-ambisonics)

==== libopus v1.2.1 ====
Released June 26th, 2017. This version contains:

* Speech quality improvements, especially in the 12-20 kbit/s range
* Improved VBR encoding for hybrid mode
* More aggressive use of wider speech bandwidth, including fullband speech starting at 14 kbit/s
* Music quality improvements in the 32-48 kbit/s range
* Generic and SSE CELT optimizations
* Support for directly encoding packets up to 120 ms
* DTX support for CELT mode
* SILK CBR improvements
* Support for all of the fixes in draft-ietf-codec-opus-update-06 (the mono downmix and the folding fixes need --enable-update-draft)
* Many bug fixes, including integer wrap-arounds discovered through fuzzing (no security implications)

=== Ports ===

==== Concentus ====

The libopus reference library (fixed-point variant) has successfully been ported to both '''C#''' and '''Java''', as part of a project called '''Concentus'''. The aim of the project is specifically to target cross-platform applications where native C interop is relatively difficult. The code is available on [https://github.com/lostromb/concentus Github] and distributed via standard package managers.

==== Emscripten ports ====

Several JavaScript builds of Opus have been made using the automated tool [https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Emscripten emscripten], for example [https://blog.rillke.com/opusenc.js/ opusenc.js], [https://github.com/kazuki/opus.js-sample opus.js-sample] and [https://github.com/audiocogs/opus.js opus.js].

=== VoIP software ===
* The open source virtual PBX Freeswitch supports Opus transcoding.
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus.
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video.
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its previous codec with Opus. Its iOS app will be the first to be released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music as of pre-release server 3.0.7-pre2 and beta client version 3.0.10.

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, as well as SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome have Opus audio support as of version 33.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux, the Opus development tools and supporting libraries can be installed from the preconfigured repositories starting with Debian 7 ("wheezy"), released in 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.
* Android 5.0 and above supports Opus natively when it is encapsulated in the Ogg container, but Android does not recognize the .opus filename extension, so the double extension .opus.ogg is recommended as a workaround to let apps recognize the files as playable audio.
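A minimal sketch of applying the .opus.ogg workaround to an existing library, assuming you simply want to rename files in place (the helper name add_ogg_suffix is hypothetical). It never overwrites an existing file; try it on a copy first, since renaming can break playlists that refer to the old names.

```python
# Sketch: rename every *.opus file under a folder to *.opus.ogg so that
# Android apps recognise it as Ogg audio. add_ogg_suffix is a hypothetical helper.
import sys
from pathlib import Path
from typing import List, Union


def add_ogg_suffix(folder: Union[str, Path]) -> List[Path]:
    """Rename each regular *.opus file to *.opus.ogg; return the new paths."""
    renamed: List[Path] = []
    # Collect matches first so renaming does not disturb the directory walk.
    candidates = sorted(Path(folder).rglob("*.opus"))
    for path in candidates:
        if not path.is_file():
            continue
        target = path.with_name(path.name + ".ogg")  # "song.opus" -> "song.opus.ogg"
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: add_ogg_suffix.py MUSIC_FOLDER")
    else:
        for new_path in add_ogg_suffix(sys.argv[1]):
            print(new_path)
```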

=== Hardware support ===
* [[Rockbox]] supports Opus. This brings hardware support to a range of portable media players (including some Apple iPod models and SanDisk Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), to Android devices.

=== Player software ===

* Windows/Mac/Linux (Cross-Platform)
*# [[VLC]] media player supports Opus as of version 2.0.4.
*# [[Amarok]] 2.8 can transcode to Opus if FFmpeg is compiled with support for the libopus library, and can play Opus files if Amarok is compiled against a TagLib version newer than 1.8.
*# Clementine has Opus support
*# Audacious player
*# [[MPD]] as of version 0.18 if compiled against libopus (supports both encoding for http streams and decoding)

* Windows Exclusive
*# AIMP supports Opus natively as of version 3.20 build 1125 beta 1
*# [[foobar2000]] supports Opus natively as of v1.1.14 beta 1
*# Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
*# [[Winamp]] supports Opus using a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in]
*# MPC-HC

* iOS/Android (Cross-Platform)
*#Capriccio [https://itunes.apple.com/us/app/capriccio-free-ultimate-music/id434829018?mt=8 iOS]/[https://play.google.com/store/apps/details?id=me.ideariboso.capriccio Android]
*#foobar2000 [https://itunes.apple.com/us/app/foobar2000/id1072807669?mt=8 iOS]/[https://play.google.com/store/apps/details?id=com.foobar2000.foobar2000&hl=en Android]

* Android Exclusive
*# [http://gonemadmusicplayer.blogspot.com/ GoneMAD Music Player]
*# [http://neutronmp.com/ Neutron Music Player]
*# [http://www.videolan.org/vlc/download-android.html VLC Media Player for Android]
*# [https://play.google.com/store/apps/details?id=ru.recoilme.freeamp FreeMP]
*# [https://play.google.com/store/apps/details?id=net.mderezynski.youki3 Youki]
*# [https://play.google.com/store/apps/details?id=com.aimp.player AIMP for Android]
*# [https://play.google.com/store/apps/details?id=com.acmeandroid.listen Listen Audiobook Player]
*# [https://play.google.com/store/apps/details?id=com.mxtech.videoplayer.ad MX Player]
*# [https://play.google.com/store/apps/details?id=org.tomahawk.tomahawk_android Tomahawk Player Beta]
*# [https://play.google.com/store/apps/details?id=com.maxmpz.audioplayer&hl=en Poweramp Music Player]

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]
* [http://www.xdlab.ru/en/ TagScanner]
* [http://www.xmedia-recode.de/ XMedia Recode]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2016-09-27T16:17:52Z

Dynamic: /* Operating systems and desktop multimedia frameworks */ mention support in Android 5.0 and above and .opus.ogg extension workaround

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.1.3
| preview_release = 1.1.1-rc
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF), designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech, yet it is also very competitive for use as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps] and [http://listening-test.coresv.net/results.htm also at 96 kbps]. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each so that they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two over an intermediate bitrate range, in which SILK encodes the 0-8 kHz spectrum and the CELT layer encodes only the frequencies above 8 kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it equally viable as a storage and playback format. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, which makes it preferable for short audio clips, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more detail on the history and potential applications of Opus is included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''.

==Characteristics==
Opus supports bitrates from 6 kbps to 510 kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio at around 30 kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are available where required. In the reference release, libopus, the target bitrate is calibrated against the internal constant-quality targets so that, over a typical music collection, the average bitrate achieved will be very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, Helix MP3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is chosen and the bitrate falls where it may. Future versions can be expected to offer better quality at the same setting. Independent implementations may adopt a different approach.
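Because libopus calibrates its target bitrate, the average bitrate over a typical collection lands close to the target, so file sizes are predictable. The Python sketch below shows that arithmetic using the range quoted above. The limits are the article's approximations, not values read from libopus; applying the 255 kbps per-channel cap to mono is an extrapolation, and the function names are hypothetical.

```python
# Sketch using the figures quoted in this article (not values read from libopus):
# 6 kbps minimum, about 255 kbps per channel maximum (510 kbps for stereo).
MIN_KBPS = 6.0
MAX_PER_CHANNEL_KBPS = 255.0  # approximation; 2 x 255 = 510 kbps for stereo


def clamp_target_kbps(kbps: float, channels: int = 2) -> float:
    """Clamp a requested target bitrate into the range quoted in the article."""
    if channels < 1:
        raise ValueError("channels must be >= 1")
    upper = MAX_PER_CHANNEL_KBPS * channels  # mono cap is an extrapolation
    return max(MIN_KBPS, min(kbps, upper))


def estimate_size_bytes(target_kbps: float, seconds: float) -> float:
    """Expected size if the calibrated average bitrate equals the target.

    Ignores Ogg container and header overhead.
    """
    return target_kbps * 1000.0 * seconds / 8.0


if __name__ == "__main__":
    kbps = clamp_target_kbps(96)
    mb = estimate_size_bytes(kbps, 4 * 60) / 1e6
    print(f"{kbps:g} kbps for 4 minutes -> about {mb:.2f} MB")
```

For a 4-minute track at a 96 kbps target this gives about 2.88 MB, before container overhead. With a 'constant quality' encoder the same estimate is only possible after encoding.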

Opus can seamlessly adapt its mode of operation without glitches or interruptions in the sound (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This is particularly useful for mixed-content audio or varying network conditions, and makes the unified Opus codec superior to a suite of separate codecs that might otherwise cover the same range of bitrates and quality settings, which would require out-of-band signalling to switch codecs. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or a hybrid of both; and the use of different audio bandwidths (4 kHz, 6 kHz, 8 kHz, 12 kHz and 20 kHz), as well as the quality adjustments within a single operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially useful in time-delayed audio streaming as well, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected packet loss rate can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can then be re-sent as low-bitrate redundant copies in subsequent packets to preserve intelligibility.

For music and general audio, the CELT layer of Opus builds on knowledge gained during Xiph.org's Vorbis development. Its primary goal is to preserve the total energy in each spectral band while requiring only a modest bitrate overhead to do so, which eliminates many bitrate-starvation artifacts, such as the 'birdies' common in low-bitrate MP3, especially during transients, applause and cymbal sounds. The same technique also increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also available for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0 ms) is required for very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a higher bitrate is then needed to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT also uses a number of additional techniques and provides advanced tools for encoder tuning.

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Support for playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
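In the Ogg Opus encapsulation (RFC 7845), both features are carried by the identification header. A pre-skip count tells the player how many decoded samples (at 48 kHz) to discard at the start, the granule position of the final page marks where the audio ends, and an output gain in dB (Q7.8 fixed point) is applied to the decoded samples. The Python sketch below parses the fixed 19-byte part of that header with the struct module. It ignores the optional channel mapping table and does no Ogg page parsing; the helper names are hypothetical.

```python
# Sketch: parse the Ogg Opus identification header ("OpusHead", RFC 7845)
# and derive the values a gapless, gain-aware player needs.
# Only the fixed 19-byte part is parsed; the channel mapping table is ignored.
import struct
from typing import NamedTuple

# magic, version, channel count, pre-skip, input sample rate, output gain, mapping family
OPUS_HEAD_FMT = "<8sBBHIhB"
OPUS_HEAD_SIZE = struct.calcsize(OPUS_HEAD_FMT)  # 19 bytes, little-endian, no padding
GRANULE_RATE_HZ = 48000  # Ogg Opus granule positions count 48 kHz samples


class OpusHead(NamedTuple):
    version: int
    channels: int
    pre_skip: int           # samples (at 48 kHz) to discard at the start
    input_sample_rate: int  # informational only; decoding runs at 48 kHz
    output_gain_q8: int     # gain in dB, Q7.8 fixed point (value / 256 = dB)
    mapping_family: int


def parse_opus_head(data: bytes) -> OpusHead:
    if len(data) < OPUS_HEAD_SIZE:
        raise ValueError("too short for an OpusHead packet")
    magic, *fields = struct.unpack_from(OPUS_HEAD_FMT, data)
    if magic != b"OpusHead":
        raise ValueError("missing OpusHead magic signature")
    return OpusHead(*fields)


def output_gain_factor(head: OpusHead) -> float:
    """Linear factor to multiply decoded samples by."""
    return 10.0 ** (head.output_gain_q8 / (20.0 * 256.0))


def playable_seconds(head: OpusHead, final_granule_pos: int) -> float:
    """Length after trimming pre-skip, for a stream starting at granule position 0."""
    return max(0, final_granule_pos - head.pre_skip) / GRANULE_RATE_HZ


if __name__ == "__main__":
    demo = struct.pack(OPUS_HEAD_FMT, b"OpusHead", 1, 2, 312, 44100, -256, 0)
    head = parse_opus_head(demo)
    print(head)
    print(f"gain factor {output_gain_factor(head):.3f}")
    print(f"{playable_seconds(head, 48000 * 3 + 312):.3f} s")
```

An output gain field of -256 means -1 dB, a linear factor of about 0.891. Trimming both the pre-skip samples and any samples past the final granule position is what lets consecutive tracks join without gaps.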

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As the bitrate increases, the hybrid mode is adopted, extending the bandwidth first to 12 kHz (comparable with compact cassette) and then to the full 20 kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens between the 12 kHz and 20 kHz bandwidth transitions.

==Indicative bitrate and quality==
The table below gives illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate trade-off. As a result, this table may need small updates as the encoder improves.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate), but notes where stereo becomes usable (40 kbps and above). The default 20 ms frame size (22.5 ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate Target
!Bandwidth
!Typical Mode Used
!Speech Quality
!Use Cases / Competitive Codecs
|-
!Less than 6 kbps
| -
| -
| Bitrates lower than 6 kbps not supported by Opus
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, [[Speex]] also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. [[Speex]] competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps or more
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}
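To make the table's thresholds concrete, here is a small Python lookup built only from its rows. It is an illustration, not libopus's decision logic: the real encoder chooses modes itself (and, from v1.1, uses its own speech/music and bandwidth detection). Targets that fall between rows take the lower row, and the names are hypothetical.

```python
# Sketch: indicative bandwidth/mode for a mono speech bitrate target, taken
# from the rows of the speech table. NOT libopus's actual mode decision.
from bisect import bisect_right
from typing import Tuple

# (lowest target in kbps, bandwidth, typical mode), ordered by bitrate
SPEECH_ROWS = [
    (6, "4 kHz narrowband", "SILK"),
    (12, "6 kHz medium-band", "SILK"),
    (16, "8 kHz wideband", "SILK"),
    (24, "12 kHz super-wideband", "hybrid"),
    (32, "20 kHz", "hybrid / possibly CELT"),
    (40, "20 kHz", "CELT"),
]
_THRESHOLDS = [row[0] for row in SPEECH_ROWS]


def indicative_speech_mode(target_kbps: float) -> Tuple[str, str]:
    """Return (bandwidth, typical mode) for the highest row not above the target."""
    if target_kbps < _THRESHOLDS[0]:
        raise ValueError("Opus does not support bitrates below 6 kbps")
    i = bisect_right(_THRESHOLDS, target_kbps) - 1
    _, bandwidth, mode = SPEECH_ROWS[i]
    return bandwidth, mode


if __name__ == "__main__":
    for kbps in (6, 8, 12, 20, 24, 32, 64):
        print(kbps, "kbps ->", *indicative_speech_mode(kbps))
```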

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate). Opus automatically uses mono at very low bitrates, though, depending on content, some stereo encoding may still be used even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!Typical SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (cf. mono cassette)
|Good for podcasts and audiobooks; CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!36 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (cf. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming. Beat LC-AAC, Vorbis, MP3 in [http://listening-test.coresv.net/results.htm listening test]
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!192 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 for default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above the Opus bitrate range for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used is subject to packet overhead: the more packets (and therefore packet headers) transmitted every second, the greater the overhead. For this reason Opus, while defaulting to 20.0 ms frames, supports 60.0 ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0 ms SILK frames to reduce latency somewhat at the expense of greater packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0 ms frames are the default, but frames of 10.0 ms, 5.0 ms and 2.5 ms are also possible. These achieve lower latency but increase packet overhead, because more packets are transmitted per second. In addition, as described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
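The size of that overhead is easy to estimate. The Python sketch below assumes, as an illustration not taken from this article, that each Opus frame travels in its own RTP packet over IPv4 with uncompressed headers: 20 bytes IPv4 + 8 bytes UDP + 12 bytes RTP = 40 bytes per packet, with no RTP header extensions or tunnelling. The function names are hypothetical.

```python
# Sketch: estimate header overhead for Opus sent over RTP.
# Assumption (not from the article): one Opus frame per packet, uncompressed
# IPv4 (20 bytes) + UDP (8 bytes) + RTP (12 bytes) headers, no extensions.
IPV4_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def packets_per_second(frame_ms: float) -> float:
    return 1000.0 / frame_ms


def overhead_kbps(frame_ms: float, header_bytes: int = IPV4_UDP_RTP_HEADER_BYTES) -> float:
    """Header bits per second, in kbps, with one frame per packet."""
    return header_bytes * 8 * packets_per_second(frame_ms) / 1000.0


def on_wire_kbps(codec_kbps: float, frame_ms: float,
                 header_bytes: int = IPV4_UDP_RTP_HEADER_BYTES) -> float:
    """Codec bitrate plus header overhead."""
    return codec_kbps + overhead_kbps(frame_ms, header_bytes)


if __name__ == "__main__":
    for frame_ms in (60.0, 20.0, 10.0, 5.0, 2.5):
        print(f"{frame_ms:4.1f} ms frames: {packets_per_second(frame_ms):6.1f} packets/s, "
              f"+{overhead_kbps(frame_ms):6.1f} kbps of headers")
```

Under these assumptions a 12 kbps SILK stream in 20 ms frames uses about 28 kbps on the wire (5.3 kbps of headers with 60 ms frames instead of 16 kbps), while 2.5 ms CELT frames add 128 kbps of headers alone.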

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0 ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can change its encoding block lengths to support shorter frames.

When the CELT layer uses 10.0 ms, 5.0 ms or 2.5 ms frames instead of the default 20.0 ms, it must use smaller transform block sizes. This reduces the frequency resolution of the MDCT compared with the default transform window, and thus reduces coding efficiency for tonal signals. To describe a sound with the same frequency precision using shorter transform windows, greater amplitude precision is needed, so a higher bitrate is required to obtain the same perceptual quality (or, conversely, quality is lower at the same bitrate).
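The loss of frequency resolution can be put into numbers with a simplified model. The Python sketch below assumes a 48 kHz sample rate and one long MDCT per frame that produces as many coefficients as the frame has samples, spread evenly from 0 to 24 kHz. It ignores the window shape, CELT's band grouping and its time-frequency resolution switching, so the figures are nominal. The names are illustrative.

```python
# Sketch: nominal MDCT bin spacing for CELT-style frames at 48 kHz.
# Assumptions: one long MDCT per frame producing as many coefficients as the
# frame has samples, spread evenly over 0..fs/2. Window shape, band grouping
# and time-frequency resolution switching are ignored.
FS_HZ = 48000


def mdct_coefficients(frame_ms: float, fs: int = FS_HZ) -> int:
    """Samples per frame, which equals the number of MDCT coefficients here."""
    return round(fs * frame_ms / 1000.0)


def bin_spacing_hz(frame_ms: float, fs: int = FS_HZ) -> float:
    """Width of one MDCT bin in Hz."""
    return (fs / 2.0) / mdct_coefficients(frame_ms, fs)


if __name__ == "__main__":
    for frame_ms in (20.0, 10.0, 5.0, 2.5):
        print(f"{frame_ms:4.1f} ms: {mdct_coefficients(frame_ms):4d} coefficients, "
              f"{bin_spacing_hz(frame_ms):5.1f} Hz per bin")
```

Each halving of the frame length doubles the bin spacing, from 25 Hz at 20 ms to 200 Hz at 2.5 ms. A single steady tone is then smeared across a wider bin, so more bits are needed to reproduce it with the same accuracy.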

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms delay. The CELT layer requires 2.5ms for MDCT window overlap.

Xiph.org matched PEAQ scores (an approximate perceptual quality assessment made in software) for CELT 0.10, the codec on which the CELT layer of the Opus reference release was based. These scores indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64kbps@22.5ms delay
!Bitrate increase over 64 kbps
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
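The derived columns of the table above can be reproduced with simple arithmetic. The Python sketch below assumes only what the text states: algorithmic delay is the frame size plus 2.5 ms of MDCT overlap, and the matched bitrates are the PEAQ-derived CELT 0.10 figures from the table (not necessarily what current libopus needs). The function and constant names are illustrative.

```python
# Sketch: reproduce the derived columns of the CELT latency table.
# Assumptions: algorithmic delay = frame size + 2.5 ms MDCT overlap (as stated
# in the text); matched bitrates are the PEAQ-derived CELT 0.10 figures.

OVERLAP_MS = 2.5        # extra delay for the CELT MDCT window overlap
REFERENCE_KBPS = 64.0   # bitrate at the default 20 ms frame size

# frame size (ms) -> bitrate (kbps) needed to match 64 kbps @ 20 ms, from the table
MATCHED_KBPS = {20.0: 64.0, 10.0: 70.4, 5.0: 84.8, 2.5: 112.0}


def algorithmic_delay_ms(frame_ms: float) -> float:
    """Frame size plus the fixed 2.5 ms overlap."""
    return frame_ms + OVERLAP_MS


def bitrate_increase_pct(kbps: float, reference: float = REFERENCE_KBPS) -> float:
    """Percentage increase over the reference bitrate, rounded to 0.1 %."""
    return round((kbps / reference - 1.0) * 100.0, 1)


if __name__ == "__main__":
    for frame_ms, kbps in MATCHED_KBPS.items():
        print(f"{frame_ms:5.1f} ms frame -> {algorithmic_delay_ms(frame_ms):5.1f} ms delay, "
              f"{kbps:6.1f} kbps (+{bitrate_increase_pct(kbps):.1f} %)")
```

Halving the frame size from 5 ms to 2.5 ms saves only 2.5 ms of delay but costs over 40 percentage points of extra bitrate, which is why the short frames are only worth using when latency really matters.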

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of a separate encoder and decoder, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including whether the bitstream complies with the standard. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The command-line tools of the reference implementation are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server], as well as in the foobar2000 Free Encoder Pack, and some alternative builds are available through the Hydrogenaudio Opus forum. The command-line tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream and metadata analyzer.
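As a minimal sketch of driving ''opusenc'' from a script, the Python helper below only assembles an argument list; it does not run anything. It assumes opus-tools is installed and that the options used (--bitrate in kbit/s, --cvbr, --hard-cbr, --framesize, --expect-loss) match the output of opusenc --help for your version. The helper build_opusenc_cmd is hypothetical and not part of opus-tools.

```python
# Sketch: assemble an opusenc command line (does not execute it).
# Assumptions: opus-tools' opusenc is on PATH and accepts the options below
# as listed by `opusenc --help`; build_opusenc_cmd is a hypothetical helper.
import shlex
from typing import List

VALID_FRAMESIZES_MS = (2.5, 5, 10, 20, 40, 60)
MODES = ("vbr", "cvbr", "hard-cbr")  # vbr is opusenc's default


def build_opusenc_cmd(src: str, dst: str, bitrate_kbps: float = 96.0,
                      framesize_ms: float = 20, mode: str = "vbr",
                      expect_loss_pct: int = 0) -> List[str]:
    if framesize_ms not in VALID_FRAMESIZES_MS:
        raise ValueError(f"unsupported frame size: {framesize_ms} ms")
    if mode not in MODES:
        raise ValueError(f"unknown rate-control mode: {mode}")
    cmd = ["opusenc", "--bitrate", f"{bitrate_kbps:g}"]
    if mode != "vbr":
        cmd.append(f"--{mode}")  # --cvbr or --hard-cbr
    if framesize_ms != 20:
        # 20 ms is the default; shorter frames only help interactive use
        cmd += ["--framesize", f"{framesize_ms:g}"]
    if expect_loss_pct:
        cmd += ["--expect-loss", str(expect_loss_pct)]
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_opusenc_cmd("in.wav", "out.opus")))
    print(shlex.join(build_opusenc_cmd("in.wav", "live.opus", 112, 2.5, "hard-cbr")))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Leaving out --framesize keeps the default 20 ms frames, which suit storage and non-interactive playback.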

The '''latest stable release''' is recommended for general use and, as of mid-2014, was considered competitive with or superior to the best alternative speech and general music encoders at most supported bitrates.

==== libopus v1.0 ====
Released 11 Sep 2012, when RFC 6716 was standardized, though it was largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, uses constrained VBR by default (with a bitrate boost used mainly for transients) and also offers true CBR.

==== libopus v1.1 ====

Alpha source code was released on 21 Dec 2012 for testing and user feedback. After a beta release and further testing, the stable 1.1 version, considered well tested enough for general release, was released on 5 December 2013.

The CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduce '''unconstrained VBR''', which boosts the rate not only for transients but now also for highly tonal signals, and reduces it when the stereo image is narrow. The '''transient detection''' and '''time-frequency analysis''' code has also been rewritten, as has the '''dynamic allocation''' code (HF/LF tilt and Band Boost), to allow more aggressive departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC rejection''' below 3 Hz improves quality when an inaudible DC offset is present, without affecting deep bass notes.

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially in the bitrate range (presumably around 24-40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. The detection works without look-ahead and is not perfect, but it is usually undecided only on audio where either mode will work well.

'''Automatic bandwidth detection''' is also introduced to save wasted bits allocated to absent frequencies.

'''Surround sound improvements''' were introduced after the beta release, with considerable advances in coding efficiency, bitrate allocation and quality.

==== libopus v1.1.3 ====
Released July 15th, 2016. This version contains:

* NEON optimizations improving performance on ARMv7 and ARMv8 by up to 15%
* Fixes for some issues with 16-bit platforms (e.g. TI C55x)
* Fixes to comfort noise generation (CNG)
* Documentation that PLC packets can also be 2 bytes
* Experimental ambisonics work (--enable-ambisonics)

=== Ports ===

==== Concentus ====

A project called '''Concentus''' was started with the aim of porting the libopus reference library to managed languages such as C# and Java. As of June 2016 it had succeeded in creating an Opus encoder/decoder in pure managed C#, though it runs significantly slower than the C library. A Java port is also planned. The code is available on [https://github.com/lostromb/concentus GitHub].

==== Emscripten ports ====

Several JavaScript builds of Opus have been made using the automated tool [https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Emscripten emscripten], for example [https://blog.rillke.com/opusenc.js/ opusenc.js], [https://github.com/kazuki/opus.js-sample opus.js-sample] and [https://github.com/audiocogs/opus.js opus.js].

=== VoIP software ===
* The open source virtual PBX Freeswitch supports Opus transcoding.
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced their current codec with Opus. Their iOS app will be the first to be released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, as well as SeaMonkey, which uses the shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC are on the development roadmap.
* Chromium and Google Chrome have audio support as of version 33.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] (7 stations at 24, 64 and 96 kbps), [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.
* Android 5.0 and above supports Opus natively if it is encapsulated in the Ogg container. However, Android does not recognize the .opus filename extension, so the double extension .opus.ogg is recommended as a workaround so that apps recognize the files as playable audio.
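The rename itself is trivial. As a minimal sketch (the helper names are hypothetical, and only the file name changes, not the Ogg Opus data inside), Python's standard <code>pathlib</code> can apply the double extension to a folder of files, defaulting to a dry run:

```python
from pathlib import Path


def android_friendly_name(path: Path) -> Path:
    """Append '.ogg' to a '.opus' file name, e.g. song.opus -> song.opus.ogg.

    Names that do not end in '.opus' (any letter case) are returned unchanged.
    """
    if path.suffix.lower() == ".opus":
        return path.with_name(path.name + ".ogg")
    return path


def rename_for_android(folder: Path, dry_run: bool = True) -> list:
    """List (old, new) renames for the *.opus files directly inside folder.

    Nothing is renamed unless dry_run is False; existing targets are skipped.
    """
    planned = []
    for old in sorted(folder.glob("*.opus")):
        new = android_friendly_name(old)
        if new != old and not new.exists():
            planned.append((old, new))
            if not dry_run:
                old.rename(new)
    return planned
```

Running it first with the default <code>dry_run=True</code> shows what would change before anything is touched.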

=== Hardware support ===
* [[Rockbox]] supports Opus. This brings Opus playback to a range of portable media players (including some Apple iPod models and some SanDisk Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also to Android devices.

=== Player software ===

* Windows/Mac/Linux (Cross-Platform)
*# [[VLC]] media player supports Opus as of version 2.0.4
*# [[Amarok]] 2.8 supports transcoding to Opus if FFmpeg is compiled with libopus support, and playback of Opus files if Amarok is compiled against TagLib newer than version 1.8
*# Clementine has Opus support
*# Audacious player

* Windows Exclusive
*# AIMP supports Opus natively as of version 3.20 build 1125 beta 1
*# [[foobar2000]] supports Opus natively as of v1.1.14 beta 1
*# Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
*# [[Winamp]] supports Opus using a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in]
*# MPC-HC

* iOS/Android (Cross-Platform)
*#Capriccio [https://itunes.apple.com/us/app/capriccio-free-ultimate-music/id434829018?mt=8 iOS]/[https://play.google.com/store/apps/details?id=me.ideariboso.capriccio Android]

* Android Exclusive
*# [http://gonemadmusicplayer.blogspot.com/ GoneMAD Music Player]
*# [http://neutronmp.com/ Neutron Music Player]
*# [http://www.videolan.org/vlc/download-android.html VLC Media Player for Android]
*# [https://play.google.com/store/apps/details?id=ru.recoilme.freeamp FreeMP]
*# [https://play.google.com/store/apps/details?id=net.mderezynski.youki3 Youki]
*# [https://play.google.com/store/apps/details?id=com.aimp.player AIMP for Android]
*# [https://play.google.com/store/apps/details?id=com.acmeandroid.listen Listen Audiobook Player]
*# [https://play.google.com/store/apps/details?id=com.mxtech.videoplayer.ad MX Player]

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]
* [http://www.xdlab.ru/en/ TagScanner]
* [http://www.xmedia-recode.de/ XMedia Recode]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

LAME

2016-06-16T14:52:57Z

Dynamic: /* Maximum quality and archiving */ "other" -> "higher". Clearly, significantly "lower bitrate" settings can affect perceived quality.

{{Software Infobox
| name = LAME
| logo = [[Image:Lamelogo.png|250px|LAME official logo]]
| screenshot =
| caption = LAME ain't an MP3 encoder
| maintainer = The LAME project
| stable_release = 3.99
| preview_release = 3.100
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = LGPL
| website = [http://lame.sourceforge.net/ LAME website], Download site: [http://www.rarewares.org/mp3-lame-bundle.php Rarewares LAME-bundle]
}}{{featured}}
'''LAME''' (Lame Ain't an MP3 Encoder) is the [[Hydrogenaudio]] recommended [[MP3]] encoder. It has been developed by the open-source community since 1998, and has become the highest quality MP3 encoder for most purposes.

Some benefits of using LAME:
* Highly optimised presets
* Fast encoding
* [[CBR]], [[ABR]] and quality-optimized [[VBR]] encoding methods
* [[Gapless]] playback with LAME-header compliant decoders
* Supported by recommended CD rippers [[Exact Audio Copy]] and [[CDex]]
* Highly tunable



==History==
LAME development began around mid-1998. Mike Cheng started it as a patch against the 8hz-MP3 encoder sources. After some quality concerns raised by others, he decided to start from scratch based on the dist10 sources.<ref>dist10 is the rudimentary "demonstration" MP3 encoder described in the MPEG-2 standard, ISO/IEC 13818.</ref> That branch (a patch against the reference sources) became LAME 2.0. By the release of LAME 3.81, all dist10 code was removed, making LAME a completely new program, not a mere patch of an existing encoder.

The project quickly became a team effort. Mike Cheng eventually left leadership and started working on [http://toolame.sourceforge.net/ tooLAME], an [[MP2]] encoder. Mark Taylor became leader and released version 3.0 featuring gpsycho, a new psychoacoustic model developed by him.

Nowadays LAME is considered the best MP3 encoder at mid & high bitrates, and features the best VBR model among MP3 implementations, mostly thanks to the dedicated work of talented developers Takehiro Tominaga, Naoki Shibata, Darin Morrison, Gabriel Bouvigne, Robert Hegemann, and others. Development is ongoing.

Although LAME is generally considered to be an encoder, according to the LAME technical FAQ, it's technically not an encoder, but rather is officially just "a development project which uses the open source model to improve MP3 technology." This improved technology is only released in source code form in order to minimize the risk of violating patents. When the source code is compiled and distributed, it ''may'' require a license from Thomson, depending on where and how it's to be used. The LAME project's position is "Source code is considered as speech, which may contain descriptions of patented technology. Descriptions of patents are in the public domain."

LAME source code is maintained in a CVS repository, and the only official codebase for public use is the trunk code tagged "MAIN". There are also numerous experimental branches of this code in which the developers test new ideas. One of these branches was started after the release of LAME 3.92 in 2002. To keep it from being confused with LAME 3.93 alpha versions, the code was made to self-identify as LAME 4.0 alpha 1 (in late 2002) through 4.0 alpha 14 (since 2005). This code is mainly for the developers to test optimizations and architectural changes in LAME's foundational code, ideas that may eventually be used in the main branch if and when development actually begins on LAME 4.0. However, some members of the public used this code to build working copies of "LAME 4.0" alpha versions in 2003-2005. These should not be considered actual LAME 4.0 releases and the developers do not want public feedback on them, nor do they want any more public builds to be made from this branch.

==Recommended encoder compiles and source code==

Unless noted otherwise, the recommended LAME compile for optimal quality is always the '''latest stable version'''.

'''Download the latest LAME from these links:'''
* [http://www.rarewares.org/mp3.php RareWares MP3 Page] - Compiles for Win32, Mac OS X universal binary, Linux etc.
* [http://sourceforge.net/project/showfiles.php?group_id=290&package_id=309 LAME source code on SourceForge]

Avoid using alpha versions of LAME. These versions have "a" in their version string and are usually only for testing changes and new features, and may result in lower quality MP3s. Use them only if you want to help the developers and provide feedback.

==Recommended encoder settings==
This section describes the [[Hydrogenaudio]] recommended settings to be used with LAME for highest quality MP3 encoding. These settings require LAME 3.98 or later (the latest stable version is recommended).

<div style="background-color: #F0F0F0; color: black; border: 1px solid black; margin: 1em; padding: 1em 2em 1em 2em;">
====Maximum quality and archiving====

Maximum quality is achieved when, regardless of listening conditions, you are unable to detect a difference between the MP3 and the original. As demonstrated by blind [[ABX]] tests, LAME-encoded MP3s typically achieve this level of [[transparency]] when encoded with the default settings, at bitrates well below maximum. Beyond that point, encoding with higher-bitrate settings will not improve the perceived quality.

For archiving, only [[lossless]] formats like [[WavPack]], [[FLAC]], etc. are ideal; they will preserve the audio with no changes, sample-for-sample, regardless of encoder settings. In contrast, lossy formats like MP3 are designed to save space by changing the audio in subtle, often imperceptible ways, even at the encoder's maximum settings.

====Very high quality: <font style="color:red">HiFi, home, or quiet listening, with best file size</font>====

<code><font style="color:green">-V0</font></code> (~245 kbps), <code><font style="color:green">-V1</font></code> (~225 kbps), <code><font style="color:green">-V2</font></code> (~190 kbps) or <code><font style="color:green">-V3</font></code> (~175 kbps) are recommended.

These [[VBR]] settings will normally produce [[transparency|transparent]] results. Audible differences between these presets may exist, but are rare.

====Very high quality: <font style="color:red">HiFi, home, or quiet listening, with maximum file size</font>====

<code><font style="color:red">-b 320</font></code> is an alternative to the VBR settings above.

This [[CBR]] mode will maximize the MP3's bitrate and overall file size. The extra space may allow for some parts of the audio to be compressed with fewer sacrifices, but to date, no one has produced ABX test results demonstrating that perceived quality is ever better than the highest VBR profiles described above.<ref>Prior to version 3.99, CBR and VBR modes were encoded differently by LAME. In some unusual problem samples, these differences were sometimes audible, even at very high bitrates. Current versions of LAME encode CBR and VBR with the same psychoacoustic model, so such differences shouldn't arise from normal use.</ref>

====Portable: <font style="color:purple">listening in noisy conditions, lower bitrate, smaller file size</font>====

<code><font style="color:purple">-V4</font></code> (~165 kbps), <code><font style="color:purple">-V5</font></code> (~130 kbps) or <code><font style="color:purple">-V6</font></code> (~115 kbps) are recommended.

<code><font style="color:purple">-V6</font></code> produces an "acceptable" quality, while <code><font style="color:purple">-V4</font></code> should be close to perceptual [[transparency]].

====Very low bitrate, small sizes: <font style="color:blue">e.g. for voice, radio, [[mono]] encoding etc.</font>====

For very low bitrates, up to 100kbps, [[ABR]] is most often the best solution.
Use <code><font style="color:blue">--abr <bitrate></font></code> (e.g. --abr 80).

'''--preset voice''' is only available in the command-line front-end, and is there for compatibility.
It is currently mapped to '''''--abr 56 -mm''''', which implies that the recommendation is to encode in mono and use ABR.
</div>

==Understanding the bitrate settings==
MP3s are divided into frames, each frame being a particular size, expressed as a [[bitrate]]. If the bitrate of every frame is the same throughout the file, then the file is considered to be ''constant bit rate'' ([[CBR]]). Otherwise, it is ''variable bit rate'' ([[VBR]]). LAME offers CBR and VBR encoding modes, as well as a special VBR encoding mode called [[ABR]] (''average bit rate'').
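The size of each frame follows directly from its bitrate. A Layer III frame holds 1152 samples at the MPEG-1 sample rates (32, 44.1 and 48 kHz) or 576 samples at the lower MPEG-2/2.5 rates, so its length in bytes is (samples / 8) × bitrate / sample rate, rounded down, plus one byte when the header's padding bit is set. A minimal Python sketch (the function name is illustrative; free-format streams are not covered):

```python
def layer3_frame_bytes(bitrate_kbps: int, sample_rate_hz: int,
                       padding: bool = False) -> int:
    """Length in bytes of one MPEG Layer III frame, header included."""
    # MPEG-1 sample rates (32/44.1/48 kHz) use 1152 samples per frame;
    # MPEG-2 and MPEG-2.5 rates use 576.
    samples_per_frame = 1152 if sample_rate_hz >= 32000 else 576
    # samples/8 * bits per second / samples per second = bytes, rounded down.
    slot_bytes = samples_per_frame // 8 * bitrate_kbps * 1000 // sample_rate_hz
    return slot_bytes + (1 if padding else 0)
```

At 128 kbps and 44.1 kHz this gives 417 bytes, or 418 with padding. Because the division is not exact at 44.1 kHz, the encoder sets the padding bit on some frames so that the long-run average matches the nominal bitrate.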

===VBR (variable bitrate) settings===
'''[[VBR]]:''' ''variable bitrate mode. Use variable bitrate modes when the goal is to achieve a fixed level of quality using the lowest possible bitrate.''

VBR is best used to target a specific quality level, instead of a specific bitrate. The final file size of a VBR encode is less predictable than with [[ABR]], but the quality is usually better.

Unlike other MP3 encoders which do VBR encoding based on predictions of output quality, LAME's default VBR method tests the ''actual'' output quality to ensure the desired quality level is always achieved.

'''Usage:''' <code>-V <number></code> where <number> is between 0 and 9, 0 being highest quality, 9 being the lowest. (Note: The "V" has to be a capital letter.)

'''Example:''' <code>-V 2</code>

Fractional values are also accepted, with 9.999 being the absolute lowest quality.

'''Example:''' <code>-V 2.75</code>
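Because the quality level is just a number in the range given above, a front-end can validate it before calling LAME. A minimal Python sketch (the helper and file names are hypothetical; it only builds the argument list and does not run the encoder):

```python
def vbr_switch(quality: float) -> list:
    """Build the LAME arguments for VBR quality `quality` (0 = best, 9.999 = lowest)."""
    if not 0 <= quality < 10:
        raise ValueError("LAME -V quality must be between 0 and 9.999")
    # Format without trailing zeros: 2 -> "2", 2.75 -> "2.75".
    return ["-V", f"{quality:g}"]


# Example: arguments for a hypothetical encode of in.wav to out.mp3.
command = ["lame", *vbr_switch(2.75), "in.wav", "out.mp3"]
```

The resulting list can be handed to a process launcher such as Python's <code>subprocess.run</code> without any shell quoting.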

'''Note:''' The switch <code>--vbr-new</code>, which enabled a superior VBR mode in LAME 3.97 and some previous versions, is no longer needed with LAME 3.98 and higher, as it is now the default VBR mode. However, if you're still using LAME 3.97 or older, you have to add <code>--vbr-new</code> to your command line to use that mode.

The target bitrate and actual typical bitrate for each VBR quality level are shown in the [[#Technical information|Technical details for recommended LAME settings]] section below.

If you need a predictable bitrate (in a streaming application, for example), use ABR or CBR modes, described below.

===ABR (average bitrate) settings===
'''[[ABR]]:''' ''average bitrate mode. A compromise between VBR and CBR modes, ABR encoding varies bits around a specified target bitrate.''

Use ABR when you need to know the final size of the file but still want to allow the encoder some flexibility to decide which passages need more bits. The output is an ordinary VBR file compatible with all MP3 players that support VBR; ABR is not a special type of file, just a LAME-specific strategy for producing VBR.

'''Usage:''' <code>--preset <bitrate></code> where <bitrate> (desired averaged bitrate in kbit/s) is a value between 8 and 320.

'''Example:''' <code>--preset 200</code>

'''Important:''' ''ABR setting is tuned from 320 kbit/s down to 80 kbit/s.''
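Because ABR holds the average close to the requested bitrate, the final size can be estimated before encoding as bitrate × duration / 8. A minimal Python sketch (the function name is illustrative; real files will differ somewhat because the achieved average is not exactly the target, and tags such as embedded cover art add to the size):

```python
def estimated_size_bytes(average_kbps: float, duration_s: float) -> int:
    """Rough output size of an ABR (or CBR) encode: bitrate x duration / 8."""
    # Ignores metadata tags and the VBR/Info header frame.
    return round(average_kbps * 1000 * duration_s / 8)
```

For example, a 4-minute track at <code>--preset 200</code> comes to roughly 200 000 × 240 / 8 = 6 000 000 bytes.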

===CBR (constant bitrate) settings===
'''[[CBR]]:''' ''constant bitrate mode. CBR encoding is not efficient. Whereas VBR and ABR modes can supply more bits to complex music passages and save bits on simpler ones, CBR encodes every frame at the same bitrate.''

CBR is only recommended for usage in streaming situations where the upper bitrate must be strictly enforced. There is still some variability in bitrate behind the scenes, through LAME's use of the [[bit reservoir]] feature of the MP3 format, but it is much less flexible than actual VBR.

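'''Usage:''' <code>-b <bitrate></code> where <bitrate> (bitrate in kbit/s) must be chosen from the following values: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 192, 224, 256, or 320. Which values are available depends on the output sample rate: 32 to 320 (excluding 144) at 32, 44.1 and 48 kHz, and 8 to 160 at 16, 22.05 and 24 kHz.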

'''Example:''' <code>-b 192</code>

'''Important:''' ''CBR setting is tuned from 320 kbit/s down to 80 kbit/s.''
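When the upper limit is fixed (for example by a stream's bandwidth cap), the CBR value must still be one of the bitrates the MP3 format defines for the output sample rate. The sketch below is a hypothetical helper that only covers the MPEG-1 (32, 44.1, 48 kHz) and MPEG-2 (16, 22.05, 24 kHz) Layer III tables and leaves MPEG-2.5 out. It picks the highest legal value that does not exceed a given cap:

```python
# Layer III bitrates (kbit/s) defined by the MP3 format, by MPEG version.
MPEG1_BITRATES = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)
MPEG2_BITRATES = (8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160)


def cbr_bitrate_for_cap(cap_kbps: int, sample_rate_hz: int = 44100) -> int:
    """Highest legal Layer III bitrate that does not exceed cap_kbps."""
    if sample_rate_hz in (32000, 44100, 48000):
        table = MPEG1_BITRATES
    elif sample_rate_hz in (16000, 22050, 24000):
        table = MPEG2_BITRATES
    else:
        raise ValueError(f"sample rate {sample_rate_hz} Hz is not covered by this sketch")
    candidates = [b for b in table if b <= cap_kbps]
    if not candidates:
        raise ValueError(f"no legal bitrate at or below {cap_kbps} kbit/s")
    return candidates[-1]  # tables are sorted ascending
```

For instance, a 200 kbit/s cap at 44.1 kHz yields <code>-b 192</code>, while a 150 kbit/s cap at 22.05 kHz yields <code>-b 144</code>.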

===Remarks===
* The rule of thumb when considering encoding options: at a given bitrate, [[VBR]] is higher quality than [[ABR]], which is higher quality than [[CBR]] (VBR > ABR > CBR in terms of quality). However, [[ABX]] tests demonstrate that as bitrate increases, the perceptual differences diminish, with all modes generally reaching [[transparency]] well before their maximum settings; when you can't tell the difference, the modes are qualitatively the same.

* In terms of file size, [[VBR]] tends to produce the smallest files down to -V7. For lower quality (e.g. for non-music audio such as speech), [[ABR]] will produce smaller files than [[VBR]], starting from --abr 115.

* All modes and settings mentioned in this topic belong to the specifications of the MP3 standard, and the resulting MP3s should be playable by every MP3 decoder that conforms with the standard. If your decoder or device does not play MP3s produced by LAME, blame the manufacturer or developer, not LAME.

* Prior to LAME 3.98, the <code>--vbr-new</code> switch enabled the new VBR mode. This is now the default VBR mode, with the old mode being available via <code>--vbr-old</code>. In terms of quality, the new mode appears to be better than the old, but reports of artifacts when using the new mode do exist. Despite these possible issues, the new mode is currently recommended due to both the speed and quality increases afforded by the new algorithm.

==Technical information==
===Recommended settings details===

{| class="wikitable" style="margin: 1em auto 1em auto;"
|+'''Technical details of the recommended settings'''
! style="vertical-align: bottom" | Switch !! style="vertical-align: bottom" | Preset !! style="width: 4em; vertical-align: bottom" | Target Kbps !! style="width: 4em; vertical-align: bottom" | Typical Kbps<ref>Typical bitrates are mostly based on the results of testing with LAME 3.98.2.</ref> !! style="width: 4em; vertical-align: bottom" | [[LAME Y switch|Y Switch]] !! style="vertical-align: bottom" | Lowpass<ref>This range is the transition band of the lowpass filter. Signal components are at full intensity at the lower frequency. Higher frequencies are attenuated on a slope which reaches zero at (and beyond) the high end of the given range. Further info can be found [http://www.hydrogenaud.io/forums/index.php?s=&showtopic=106868&view=findpost&p=874354 in the HA forum].</ref> !! style="vertical-align: bottom" | Resample
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-b 320</code> || <code>--preset insane</code> || style="text-align: right" | 320 || style="text-align: center" | 320 || style="text-align: center" | Y<ref>CBR mode uses <code>-Y</code> in effect; see the [[LAME Y switch]] article.</ref> || ||
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-V 0</code> || <code>--preset extreme</code> || style="text-align: right" | ~240 || style="text-align: center" | 220–260 || || ||
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-V 1</code> || || style="text-align: right" | ~220 || style="text-align: center" | 190–250 || || style="text-align: center" | 19383 Hz – 19916 Hz ||
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-V 2</code> || <code>--preset standard</code> || style="text-align: right" | ~190 || style="text-align: center" | 170–210 || || style="text-align: center" | 18671 Hz – 19205 Hz ||
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-V 3</code> || || style="text-align: right" | ~170 || style="text-align: center" | 150–195 || style="text-align: center" | Y || style="text-align: center" | 17960 Hz – 18494 Hz ||
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-V 4</code> || <code>--preset medium</code> || style="text-align: right" | ~160 || style="text-align: center" | 140–185 || style="text-align: center" | Y || style="text-align: center" | 17249 Hz – 17782 Hz ||
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-V 5</code> || || style="text-align: right" | ~130 || style="text-align: center" | 120–150 || style="text-align: center" | Y || style="text-align: center" | 16538 Hz – 17071 Hz ||
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-V 6</code> || || style="text-align: right" | ~120 || style="text-align: center" | 100–130 || style="text-align: center" | Y || style="text-align: center" | 15115 Hz – 15648 Hz ||
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-V 7</code> || || style="text-align: right" | ~100 || style="text-align: center" | 80–120 || style="text-align: center" | Y || style="text-align: center" | 14581 Hz – 14968 Hz || style="text-align: right" | 32000 Hz
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-V 8</code> || || style="text-align: right" | ~80 || style="text-align: center" | 70–105 || style="text-align: center" | Y || style="text-align: center" | 12516 Hz – 12903 Hz || style="text-align: right" | 32000 Hz
|-
|- style="background:white;color:black"
| style="text-align: center" | <code>-V 9</code> || || style="text-align: right" | ~70 || style="text-align: center" | 45–85 || style="text-align: center" | Y || style="text-align: center" | 9336 Hz – 9602 Hz || style="text-align: right" | 24000 Hz
|}

The default lowpass settings were not chosen at random; for general use, they are as high as they can be without putting quality at risk. Raising the cutoff via command-line options is not recommended. See the [[high-frequency content in MP3s]] article for more info.
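For rough size planning, the "Target Kbps" column of the table can be treated as data. The following Python sketch is a hypothetical helper that picks the highest <code>-V</code> number whose nominal target still meets a requested bitrate. Since VBR targets quality rather than bitrate, the actual bitrate of a given file can fall anywhere in the "Typical Kbps" range, or outside it.

```python
# (-V level, approximate target kbit/s) from the "Target Kbps" column above.
VBR_TARGETS = [
    (0, 240), (1, 220), (2, 190), (3, 170), (4, 160),
    (5, 130), (6, 120), (7, 100), (8, 80), (9, 70),
]


def vbr_level_for_target(target_kbps: float) -> int:
    """Highest -V number (smallest files) whose table target is >= target_kbps.

    Falls back to -V 0 when even the best level's target is below the request.
    """
    for level, kbps in reversed(VBR_TARGETS):
        if kbps >= target_kbps:
            return level
    return 0
```

For example, asking for about 165 kbit/s selects <code>-V 3</code> (target ~170), the smallest setting whose nominal target is not below the request.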

===Fraunhofer decoder incompatibility===
Differing interpretations of an unclear portion of the MP3 spec led to a Windows-specific version of the Fraunhofer IIS MP3 decoder being unable to properly play certain MP3s created with certain versions of LAME.

In order to demonstrate the problem, the problematic MP3 must have been created with LAME 3.97 or earlier, and must contain a frame with certain parameters and a very large amount of data, such as a 320-kbps frame which makes heavy use of the [[bit reservoir]]. The decoder must be the DirectShow filter <code>l3codecx.ax</code> version 1.5.0 or lower, as used by Windows Media Player on versions of Windows prior to Windows Vista. An [http://support.microsoft.com/kb/2115168/en-us August 2010 security update] for Windows XP and Server 2003 upgraded this filter to version 1.6.0, which can play the problematic MP3s. Windows Vista shipped with the older version but Windows Media Player uses a different filter, and later versions of Windows don't have the old filter at all.

A workaround was implemented in LAME 3.98.0 beta 1 through LAME 3.98.2, and in LAME 3.99 alpha 1, whereby 320-kbps frames were limited in how much of the bit reservoir they could use. This resulted in wasted space when the bit reservoir would grow beyond the limit. In LAME 3.98.3 and beyond, and in LAME 3.99 alpha 2 and beyond, the method was changed such that the bit reservoir can't grow beyond the limit.

Related discussion threads:
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=40308 LAME high bitrate files in l3codeca.ax]
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=78114 Lame 3.98 wastes bits]

===VBR header and LAME tag===

LAME supports the ''de facto'' standard of adding an extra frame of silence to the beginning of MP3 files. This "VBR header" or "Info tag" provides a home for precise info about the audio duration and a table of seek points. It is mainly for the benefit of players working with VBR files. Decoders usually treat the frame as informational, rather than playing the audio.
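The seek table is the 100-entry TOC of the Xing header. Entry i gives the position in the file, scaled to 0 to 255, at which i percent of the playing time is reached. A minimal Python sketch of how a player might turn it into a byte offset, interpolating linearly between entries (one common approach, not necessarily what any particular player does):

```python
from typing import Sequence


def toc_seek_offset(toc: Sequence[int], percent: float, file_bytes: int) -> int:
    """Approximate byte offset for seeking to `percent` (0-100) of the duration.

    Entry i of the 100-entry Xing TOC is the file position reached after i percent
    of the duration, scaled so that 256 corresponds to the whole file.
    """
    if len(toc) != 100:
        raise ValueError("a Xing TOC has exactly 100 entries")
    percent = min(max(percent, 0.0), 100.0)
    index = min(int(percent), 99)
    start = toc[index]
    end = toc[index + 1] if index < 99 else 256
    # Interpolate linearly between neighbouring entries.
    scaled = start + (end - start) * (percent - index)
    return int(scaled / 256 * file_bytes)
```

Without such a table, a player can only assume a constant bitrate and seek proportionally, which lands in the wrong place in most VBR files.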

LAME uses the Xing format for this header, and extends it by embedding a 36-byte "LAME tag" with additional info:
* a short version string (9 ASCII bytes; see [[LAME version string]])
* audio and info tag CRCs (since LAME 3.90)
* separate delay & padding values for gapless playback (since LAME 3.90)
* various encoder settings (since LAME 3.90, expanded in 3.94 to include presets)

Prior to LAME 3.94, the VBR header was only written in VBR files. Since 3.94, it is written to CBR files, too, with "Info" instead of "Xing" at the beginning.
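As an illustration of that layout, the Python sketch below reads the version string and the gapless-playback delay and padding from the bytes of a file's first frame. It assumes the field layout in the MP3 Info Tag documentation referenced in this section. It also finds the header by a simple byte search, whereas a real parser would compute the header's position from the frame header and side-information size. The function and dictionary key names are illustrative only.

```python
import struct
from typing import Optional

# Xing header flag bits: which optional fields follow the 4-byte flags word.
FRAMES_FLAG, BYTES_FLAG, TOC_FLAG, QUALITY_FLAG = 0x1, 0x2, 0x4, 0x8


def parse_lame_tag(first_frame: bytes) -> Optional[dict]:
    """Read a few LAME-tag fields from the bytes of an MP3's first frame."""
    for magic in (b"Xing", b"Info"):  # "Info" marks the header of a CBR file
        pos = first_frame.find(magic)
        if pos != -1:
            break
    else:
        return None
    if len(first_frame) < pos + 8:
        return None
    (flags,) = struct.unpack(">I", first_frame[pos + 4:pos + 8])
    offset = pos + 8
    offset += 4 if flags & FRAMES_FLAG else 0
    offset += 4 if flags & BYTES_FLAG else 0
    offset += 100 if flags & TOC_FLAG else 0
    offset += 4 if flags & QUALITY_FLAG else 0
    tag = first_frame[offset:offset + 36]  # the 36-byte LAME extension
    if len(tag) < 36:
        return None
    # Bytes 21-23 pack two 12-bit values: encoder delay, then end padding.
    b0, b1, b2 = tag[21], tag[22], tag[23]
    return {
        "header": magic.decode("ascii"),
        "version": tag[:9].decode("ascii", errors="replace"),
        "encoder_delay": (b0 << 4) | (b1 >> 4),
        "end_padding": ((b1 & 0x0F) << 8) | b2,
    }
```

A gapless-capable player uses the last two values to trim that many samples from the start and end of the decoded audio.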

Details are in this wiki's [[MP3#VBRI, XING, and LAME headers|MP3 article]] and [[LAME version string]] article, and in LAME developer Gabriel Bouvigne's [http://gabriel.mp3-tech.org/mp3infotag.html MP3 Info Tag] documentation.

===Hey! What happened to "--alt-preset"?===

The revolutionary <code>--alt-preset</code> system was introduced in LAME 3.90. It was replaced by the <code>--preset</code> flags in later versions.

Starting with version 3.94, the <code>-Vx</code> quality system was introduced, allowing finer control over the desired quality level and bitrate. The <code>--preset</code> switches were made into aliases to the corresponding <code>-V</code> flags for the sake of backward compatibility. '''There is no difference between the output you get if you use <code>-V2</code> or <code>--alt-preset standard</code>.'''

Recent LAME versions feature more streamlined command-line options, and it's recommended to stick to one of the values described in the text or shown in the table above.

For example, the following command-line options will all produce the same output:

* <code>--alt-preset insane</code>
* <code>--preset insane</code>
* <code>-b 320</code>
* <code>--preset 320</code>
* <code>--preset cbr 320</code>
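These equivalences amount to a simple rewriting rule for old command lines. The hypothetical helper below covers only the named presets listed in this article. It leaves a numeric <code>--preset <n></code> alone because that form selects ABR in general; the one exception noted above, <code>--preset 320</code>, is therefore not rewritten.

```python
# Named presets and their equivalent switches, per the table and list above.
PRESET_EQUIVALENTS = {
    "insane": ["-b", "320"],
    "extreme": ["-V", "0"],
    "standard": ["-V", "2"],
    "medium": ["-V", "4"],
}


def canonical_switches(args: list) -> list:
    """Rewrite legacy --alt-preset / --preset NAME forms into -V / -b switches."""
    out, i = [], 0
    while i < len(args):
        arg = args[i]
        if arg in ("--alt-preset", "--preset") and i + 1 < len(args):
            name = args[i + 1]
            if name in PRESET_EQUIVALENTS:
                out.extend(PRESET_EQUIVALENTS[name])
                i += 2
                continue
            if name == "cbr" and i + 2 < len(args):
                out.extend(["-b", args[i + 2]])  # --preset cbr N == -b N
                i += 3
                continue
        out.append(arg)  # anything else passes through unchanged
        i += 1
    return out
```

For example, <code>--alt-preset standard in.wav</code> becomes <code>-V 2 in.wav</code>.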

==See also==
* [[LAME Y switch|The -Y switch]]
* [[MP3]]
* [[CBR]]
* [[VBR]]
* [[ABR]]
* [[Exact Audio Copy]]
* [[EAC and Lame | Configuring EAC and LAME]]

==Notes and references==
<references/>

==External links==
* [http://lame.sourceforge.net LAME official homepage]


[[Category:Software]]
[[Category:Encoder/Decoder]]
[[Category:MP3]]

Opus

2014-10-22T10:34:02Z

Dynamic: /* Music encoding quality */ Add 96kbps listening test to music quality table

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = 1.1 beta
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech, yet it is also very competitive for use as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps] and [http://listening-test.coresv.net/results.htm also at 96 kbps]. Opus is an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} and a high-quality reference implementation is provided under the 3-clause BSD license,{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage & playback format. Also unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more details of the history and potential applications for Opus are included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that over a typical music collection, something very close to the target bitrate will be achieved. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack) where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate will fall where it may. Improved future versions can be expected to offer improved quality at the same setting. Independent implementations may adopt a different approach.

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This is particularly useful for mixed-content audio or varying network conditions, and makes the unified Opus codec superior to a suite of different codecs covering the same range of bitrates and quality settings, which would require out-of-band signalling to switch between codecs. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both; and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.
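The five audio bandwidths correspond to the named bandwidths of RFC 6716, each with an effective sample rate. As a rough illustration of the idea behind automatic bandwidth selection, the sketch below picks the narrowest bandwidth whose upper edge covers the highest frequency assumed to be present in the input. It is not libopus's actual detector, which analyses the signal itself, and the function name is hypothetical.

```python
# Opus audio bandwidths from RFC 6716:
# (abbreviation, upper edge in Hz, effective sample rate in Hz).
OPUS_BANDWIDTHS = [
    ("NB", 4000, 8000),     # narrowband
    ("MB", 6000, 12000),    # medium-band
    ("WB", 8000, 16000),    # wideband
    ("SWB", 12000, 24000),  # super-wideband
    ("FB", 20000, 48000),   # fullband
]


def narrowest_bandwidth(highest_content_hz: float) -> str:
    """Smallest Opus bandwidth whose upper edge is at or above highest_content_hz."""
    for name, edge_hz, _sample_rate in OPUS_BANDWIDTHS:
        if highest_content_hz <= edge_hz:
            return name
    return "FB"  # nothing above 20 kHz is coded, so fullband is the ceiling
```

For example, telephone-band speech with nothing above 3.4 kHz maps to narrowband, while a CD source with content up to 15 kHz needs fullband.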

Of importance mainly to interactive uses, but potentially also useful in time-delayed audio streaming, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and a low-bitrate redundant copy of critical frames (e.g. consonant sounds) can then be included in the following packet to preserve intelligibility.

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development and ensures as a primary goal that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this, thereby eliminating many bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
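In Ogg Opus files, both gapless playback and the playback gain rely on fields of the identification header ("OpusHead", specified in RFC 7845): a pre-skip count of 48 kHz samples to discard at the start, and an output gain stored as a signed Q7.8 value in dB. The Python sketch below parses the fixed 19-byte part of that header; the helper names are illustrative, and the extra mapping table used by non-zero channel mapping families is ignored.

```python
import struct
from typing import NamedTuple

# Magic, version, channel count, pre-skip, input sample rate, output gain,
# channel mapping family; all little-endian, 19 bytes in total.
OPUS_HEAD = struct.Struct("<8sBBHIhB")


class OpusHead(NamedTuple):
    version: int
    channels: int
    pre_skip: int           # samples at 48 kHz to drop at the start of decoding
    input_sample_rate: int  # informational only; Opus always decodes at 48 kHz here
    output_gain_q78: int    # gain in dB, fixed point with 8 fractional bits
    mapping_family: int


def parse_opus_head(data: bytes) -> OpusHead:
    if len(data) < OPUS_HEAD.size:
        raise ValueError("packet too short for an OpusHead header")
    magic, *fields = OPUS_HEAD.unpack_from(data)
    if magic != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    return OpusHead(*fields)


def output_gain_factor(head: OpusHead) -> float:
    """Linear factor a player multiplies the decoded samples by."""
    gain_db = head.output_gain_q78 / 256.0
    return 10.0 ** (gain_db / 20.0)


def playable_samples(head: OpusHead, final_granule_pos: int) -> int:
    """Gapless length: granule positions count 48 kHz samples, including pre-skip."""
    return final_granule_pos - head.pre_skip


# Example header with arbitrary values: stereo, 312 samples pre-skip, +6 dB gain.
example = OPUS_HEAD.pack(b"OpusHead", 1, 2, 312, 44100, 6 * 256, 0)
head = parse_opus_head(example)
print(head.pre_skip, output_gain_factor(head), playable_samples(head, 48312))
```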

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps, through medium-band, wideband and super-wideband speech, to full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As the bitrate increases, the hybrid mode is adopted, extending the bandwidth first to 12kHz (comparable with compact cassette) and then to the full 20kHz, after which CELT takes over. For stereo sources, the switch from mono to stereo typically happens between the move to 12kHz and the move to 20kHz.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on the typical modes used internally by Opus and on a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection were introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate tradeoff. Further changes are therefore likely, and these tables may need small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically 48 kHz), but notes stereo compatibility from 40kbps upward. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically 48 kHz). Opus will automatically use mono at very low bitrates, though, depending on content, some stereo encoding may still be used even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly Poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (cf. mono cassette)
|Good for podcasts and audiobooks; CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!36 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (cf. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming. Beat LC-AAC, Vorbis, MP3 in [http://listening-test.coresv.net/results.htm listening test]
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used includes packet overhead: the more packets (and therefore packet headers) transmitted every second, the greater the overhead. For this reason Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency (which may still be acceptable for speech). It also supports 10.0ms SILK frames, which reduce latency somewhat at the expense of more packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but 10.0ms, 5.0ms and 2.5ms frames are also possible. These achieve lower latency by transmitting more packets per second, which directly increases packet overhead. As described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
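To give a feel for the size of this overhead, the Python sketch below computes the header bitrate added per stream for several frame sizes. It assumes 40 bytes of headers per packet (a 20-byte IPv4 header without options, an 8-byte UDP header and a 12-byte RTP header without CSRCs or extensions) and one Opus frame per packet; IPv6, tunnels or link-layer framing would add more.

```python
# Assumed per-packet header sizes in bytes (IPv4 without options, UDP,
# RTP without CSRC list or header extensions).
IPV4_BYTES, UDP_BYTES, RTP_BYTES = 20, 8, 12
HEADER_BYTES = IPV4_BYTES + UDP_BYTES + RTP_BYTES  # 40 bytes


def overhead_kbps(frame_ms: float, header_bytes: int = HEADER_BYTES) -> float:
    """Header bitrate when every packet carries exactly one frame."""
    packets_per_second = 1000.0 / frame_ms
    return packets_per_second * header_bytes * 8 / 1000.0


for frame_ms in (2.5, 5, 10, 20, 60):
    print(f"{frame_ms:>5} ms frames: {overhead_kbps(frame_ms):6.1f} kbps of headers")

# e.g. a 12 kbps SILK stream sent in 20 ms packets needs about 28 kbps on the wire
print(12 + overhead_kbps(20))
```

At low bitrates the headers can cost more than the audio itself, which is why longer frames are attractive for SILK, while 2.5 ms CELT frames add roughly 128 kbps of headers under these assumptions.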

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on 10.0ms or 20.0ms frames that can be combined into Opus frames of up to 60.0ms, the CELT layer can shorten its encoding block lengths so that it can be used with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use smaller transform block sizes, which reduces the frequency resolution of the MDCT compared to the default transform window and thus lowers coding efficiency for tonal signals. To obtain the same frequency precision for a sound divided into shorter transform windows, greater amplitude precision is needed, so a higher bitrate is required for the same perceptual quality (or, conversely, quality is lower at the same bitrate).
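The loss of frequency resolution can be quantified with a short calculation. An MDCT producing N coefficients spreads them evenly over 0 to fs/2, so the bin spacing is fs/(2N); with N equal to the number of new samples per frame, halving the frame doubles the spacing. The Python sketch below assumes a 48 kHz sample rate and a single transform per frame (the long-block case), ignoring the details of CELT's low-overlap window.

```python
SAMPLE_RATE = 48000  # Hz, the usual Opus internal rate


def mdct_bin_spacing_hz(frame_ms: float, sample_rate: int = SAMPLE_RATE) -> float:
    """Frequency spacing of MDCT coefficients for one transform per frame."""
    n = round(sample_rate * frame_ms / 1000)  # new samples = coefficients per frame
    return sample_rate / (2 * n)              # n coefficients span 0 .. fs/2


for frame_ms in (20.0, 10.0, 5.0, 2.5):
    print(f"{frame_ms:4} ms frame: {mdct_bin_spacing_hz(frame_ms):5.1f} Hz per coefficient")
```

Going from 20 ms to 2.5 ms frames widens each coefficient from 25 Hz to 200 Hz, which is why closely spaced tonal components become harder to code efficiently.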

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

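The algorithmic delay consists of the frame size plus an additional 2.5ms, which the CELT layer requires for MDCT window overlap. Outside its restricted low-delay application mode, the libopus reference encoder adds a further 4.0ms of look-ahead (6.5ms in total) so that it can switch between layers seamlessly.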

Using matched PEAQ scores (an approximate perceptual quality assessment made in software), Xiph.Org derived the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music with CELT 0.10, the codec that was used as the basis of the CELT layer in the Opus reference release.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64kbps@22.5ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
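The relationships in this table can be checked with a little arithmetic: the delay column is the frame size plus the 2.5ms CELT overlap, and the last column is each matched bitrate divided by 64 kbps. The Python sketch below reproduces both. The `scaled_kbps` helper is hypothetical: it assumes the same ratios hold at other bitrates, which the PEAQ measurements above (made only at the 64 kbps quality level) do not establish.

```python
OVERLAP_MS = 2.5        # CELT MDCT overlap only (no extra encoder look-ahead)
REFERENCE_KBPS = 64.0   # bitrate at 20 ms frames that the others were matched to

# Frame size (ms) -> bitrate (kbps) giving matched PEAQ scores, from the table.
MATCHED_KBPS = {20.0: 64.0, 10.0: 70.4, 5.0: 84.8, 2.5: 112.0}


def algorithmic_delay_ms(frame_ms: float) -> float:
    return frame_ms + OVERLAP_MS


def bitrate_increase_percent(frame_ms: float) -> float:
    return (MATCHED_KBPS[frame_ms] / REFERENCE_KBPS - 1.0) * 100.0


def scaled_kbps(base_kbps: float, frame_ms: float) -> float:
    """Hypothetical: apply the measured 64 kbps ratio to another target bitrate."""
    return base_kbps * MATCHED_KBPS[frame_ms] / REFERENCE_KBPS


for frame_ms, kbps in MATCHED_KBPS.items():
    print(f"{frame_ms:4} ms frame: {algorithmic_delay_ms(frame_ms):4} ms delay, "
          f"{kbps:5.1f} kbps (+{bitrate_increase_percent(frame_ms):.1f} %)")
```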

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented, and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including the standard compliance of the bitstream format. Because it is based on ogginfo from vorbis-tools, it is, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The commandline tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server], as well as in the foobar2000 free encoders pack and as some alternative compiles through the hydrogenaud.io Opus forum. No other implementations of Opus are currently known. The libopus commandline tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream and metadata analyzer.

The '''latest stable release''' is recommended for general use and as of mid 2014 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 ====
Released 11 Sep 2012, when RFC 6716 was standardized, though largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, implements constrained VBR by default (with a bitrate boost used mainly for transients), plus true CBR.

==== libopus v1.1 (recommended latest stable release) ====

Alpha source code was released on 21 Dec 2012 for testing and user feedback. Following a beta release and further testing, the stable 1.1 version was released on 5 December 2013, having been judged well tested enough for general release.

The CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduced to provide '''unconstrained VBR''' include a rate boost not only for transients but now also for highly tonal signals, and a rate reduction when the stereo image is narrow. The '''transient detection''' and '''time-frequency analysis''' code has also been rewritten, as has the '''dynamic allocation''' code (HF/LF tilt and Band Boost), allowing more aggressive departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC-rejection''' below 3 Hz also improves quality when an inaudible DC offset is present, without affecting deep bass notes.
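DC rejection of this kind is usually a very low-cutoff high-pass filter. The Python sketch below is a generic first-order DC blocker, y[n] = x[n] - x[n-1] + r*y[n-1], with its pole placed for a cutoff of about 3 Hz at 48 kHz; it illustrates the principle only and is not the filter libopus actually uses.

```python
import math
from typing import Iterable, List


def dc_reject(samples: Iterable[float], cutoff_hz: float = 3.0,
              sample_rate: int = 48000) -> List[float]:
    """First-order high-pass: a zero at DC and a pole just inside the unit circle."""
    r = 1.0 - 2.0 * math.pi * cutoff_hz / sample_rate  # pole radius near 1
    out: List[float] = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


offset = dc_reject([0.5] * 48000)  # one second of pure DC offset
bass = dc_reject([math.sin(2 * math.pi * 40 * n / 48000) for n in range(96000)])
print(f"DC after 1 s: {offset[-1]:.2e}, 40 Hz peak: {max(bass[48000:]):.3f}")
```

A constant offset decays away within a fraction of a second, while a 40 Hz bass tone passes with well under 1 dB of attenuation.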

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially near the bitrate range (presumably around 24~40kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. The detection works without look-ahead and is not perfect, but it usually remains undecided only on audio where either mode works well.

'''Automatic bandwidth detection''' is also introduced to save wasted bits allocated to absent frequencies.

'''Surround sound improvements''' were introduced after the beta release, with considerable advances in coding efficiency, bitrate allocation and quality.

=== VoIP software ===
* The open source virtual PBX Freeswitch supports Opus transcoding.
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced their previous codec with Opus. Their iOS app will be the first to be released with Opus; the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC are on the development roadmap.
* Chromium and Google Chrome support Opus audio as of version 33.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available. This brings Opus playback to a range of portable media players (including some Apple iPod models and Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also to Android devices.

=== Player software ===

* Windows/Mac/Linux (Cross-Platform)
*# [[VLC]] media player supports Opus as of version 2.0.4
*# [[Amarok]] 2.8 has transcoding support for Opus if FFmpeg is compiled with libopus support, and supports playback of Opus files if Amarok is compiled against TagLib newer than v1.8
*# Clementine has Opus support

* Windows Exclusive
*# AIMP supports Opus natively as of version 3.20 build 1125 beta 1
*# [[foobar2000]] supports Opus natively as of v1.1.14 beta 1
*# Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
*# [[Winamp]] supports Opus using a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in]

* iOS/Android (Cross-Platform)
*#Capriccio [https://itunes.apple.com/us/app/capriccio-free-ultimate-music/id434829018?mt=8 iOS]/[https://play.google.com/store/apps/details?id=me.ideariboso.capriccio Android]

* Android Exclusive
*# [http://gonemadmusicplayer.blogspot.com/ GoneMAD Music Player]
*# [http://neutronmp.com/ Neutron Music Player]
*# [http://www.videolan.org/vlc/download-android.html VLC Media Player for Android]
*# [https://play.google.com/store/apps/details?id=ru.recoilme.freeamp FreeMP]
*# [https://play.google.com/store/apps/details?id=net.mderezynski.youki3 Youki]
*# [https://play.google.com/store/apps/details?id=com.aimp.player AIMP for Android]

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]
* [http://www.xdlab.ru/en/ TagScanner]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2014-10-22T10:31:17Z

Dynamic: Added link to 96k listening test in Lead Section

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = 1.1 beta
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech, yet it is also very competitive for use as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps] and [http://listening-test.coresv.net/results.htm also at 96 kbps]. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typ 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage & playback format. Also unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more details of the history and potential applications for Opus are included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that over a typical music collection, something very close to the target bitrate will be achieved. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack) where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate will fall where it may. Improved future versions can be expected to offer improved quality at the same setting. Independent implementations may adopt a different approach.

Opus can adapt its mode of operation seamlessly, without glitches or interruptions in the sound (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This is particularly useful for mixed-content audio or varying network conditions, and makes the unified Opus codec superior to a suite of separate codecs that might otherwise cover the same range of bitrates and quality settings but would require out-of-band signalling to switch between them. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or a hybrid of both; the audio bandwidth (4kHz, 6kHz, 8kHz, 12kHz or 20kHz); and the quality adjustments within a single operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially useful in time-delayed audio streaming as well, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the user or application software can tell the encoder the expected rate of packet loss, and a low-bitrate redundant copy of critical frames (e.g. consonant sounds) can then be carried in the following packet to preserve intelligibility.

For music and general audio, the CELT layer of Opus builds on knowledge gained during Xiph.Org's Vorbis development. A primary goal is to preserve the total energy in each spectral band, which costs only a modest bitrate overhead and eliminates many bitrate-starvation artifacts, such as the 'birdies' common in low-bitrate MP3, especially during transients, applause and cymbal sounds. The same technique also increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are available for efficient transient handling. Short blocks can also be used exclusively when very low algorithmic delay (5.0ms) is required for very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a higher bitrate is then needed to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT also uses a number of additional techniques and provides advanced tools for encoder tuning.

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps, through medium-band, wideband and super-wideband speech, to full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As the bitrate increases, the hybrid mode is adopted, extending the bandwidth first to 12kHz (comparable with compact cassette) and then to the full 20kHz, after which CELT takes over. For stereo sources, the switch from mono to stereo typically happens between the move to 12kHz and the move to 20kHz.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on the typical modes used internally by Opus and on a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection were introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate tradeoff. Further changes are therefore likely, and these tables may need small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically 48 kHz), but notes stereo compatibility from 40kbps upward. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically 48 kHz). Opus will automatically use mono at very low bitrates, though, depending on content, some stereo encoding may still be used even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly Poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (cf. mono cassette)
|Good for podcasts and audiobooks; CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!36 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (cf. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used includes packet overhead: the more packets (and therefore packet headers) transmitted every second, the greater the overhead. For this reason Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency (which may still be acceptable for speech). It also supports 10.0ms SILK frames, which reduce latency somewhat at the expense of more packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but 10.0ms, 5.0ms and 2.5ms frames are also possible. These achieve lower latency by transmitting more packets per second, which directly increases packet overhead. As described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
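
As a rough illustration of why frame size matters, the sketch below estimates header overhead when each Opus frame is sent in its own packet. It assumes minimal IPv4 (20 bytes), UDP (8 bytes) and RTP (12 bytes) headers with no options, extensions or link-layer framing; real networks add more, and applications may pack several frames into one packet.

```python
# Rough header overhead for one Opus stream over RTP/UDP/IPv4.
# Assumption: minimal headers only, one Opus frame per packet.
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP


def packets_per_second(frame_ms: float) -> float:
    return 1000.0 / frame_ms


def overhead_kbps(frame_ms: float, header_bytes: int = HEADER_BYTES) -> float:
    """Header overhead in kbit/s, excluding the Opus payload itself."""
    return header_bytes * 8 * packets_per_second(frame_ms) / 1000.0


for frame_ms in (2.5, 5.0, 10.0, 20.0, 40.0, 60.0):
    print(f"{frame_ms:5.1f} ms frames: {packets_per_second(frame_ms):6.1f} packets/s, "
          f"{overhead_kbps(frame_ms):6.1f} kbps of headers")
```

Under these assumptions, 20.0ms frames add 16 kbps of headers and 60.0ms frames about 5.3 kbps, while 2.5ms frames add 128 kbps, more than the audio payload at most CELT bitrates.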

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can shorten the encoding block lengths it uses, which allows it to operate with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use smaller transform block sizes, which reduces the frequency resolution of the MDCT compared with the default transform window and so reduces coding efficiency for tonal signals. Compensating for the coarser frequency resolution of shorter transform windows requires more precise amplitude coding, so a higher bitrate is needed to obtain the same perceptual quality (or, conversely, quality is lower at the same bitrate).

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.
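
The loss of frequency resolution can be put in numbers. This sketch assumes Opus's 48 kHz internal sampling rate and one long MDCT per frame producing as many coefficients as the frame has new samples, spread evenly from 0 Hz to half the sampling rate; it ignores the window overlap and the actual band layout used by libopus.

```python
# Approximate MDCT bin spacing for each CELT frame size.
# Assumptions: 48 kHz sampling, one long MDCT per frame, N coefficients
# for N new samples, covering 0 Hz to fs/2 evenly.
FS = 48_000  # Hz


def frame_samples(frame_ms: float, fs: int = FS) -> int:
    return round(fs * frame_ms / 1000.0)


def bin_spacing_hz(frame_ms: float, fs: int = FS) -> float:
    n = frame_samples(frame_ms, fs)  # number of MDCT coefficients
    return (fs / 2) / n


for frame_ms in (20.0, 10.0, 5.0, 2.5):
    print(f"{frame_ms:4.1f} ms: {frame_samples(frame_ms)} coefficients, "
          f"{bin_spacing_hz(frame_ms):.0f} Hz per bin")
```

Each halving of the frame size doubles the bin spacing, from 25 Hz at 20.0ms to 200 Hz at 2.5ms, which is why closely spaced tonal components cost more bits at short frame sizes. Transients, by contrast, are already coded with short blocks at the default frame size.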

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms delay. The CELT layer requires 2.5ms for MDCT window overlap.

Xiph.org used matched PEAQ scores (an approximate perceptual quality assessment made in software) for CELT 0.10, the codec that served as the basis of the CELT layer in the Opus reference release. These indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
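
The derived columns of the table follow directly from the rules above: the delay is the frame size plus the 2.5ms overlap, and the percentage is the matched bitrate relative to 64 kbps. The short sketch below recomputes them; the PEAQ-matched bitrates themselves are copied from the table, not computed.

```python
# Recompute the delay and percentage columns of the table above.
OVERLAP_MS = 2.5        # MDCT window overlap added to every frame size
REFERENCE_KBPS = 64.0   # bitrate at the default 20 ms frame size

# PEAQ-matched bitrates taken from the table (frame size in ms -> kbps).
matched_kbps = {20.0: 64.0, 10.0: 70.4, 5.0: 84.8, 2.5: 112.0}


def algorithmic_delay_ms(frame_ms: float) -> float:
    return frame_ms + OVERLAP_MS


def bitrate_increase_percent(kbps: float, reference: float = REFERENCE_KBPS) -> float:
    return (kbps / reference - 1.0) * 100.0


for frame_ms, kbps in matched_kbps.items():
    print(f"{frame_ms:4.1f} ms frame: {algorithmic_delay_ms(frame_ms):4.1f} ms delay, "
          f"{kbps:5.1f} kbps (+{bitrate_increase_percent(kbps):.1f} %)")
```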

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The command-line tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server], as well as in the foobar2000 free encoders pack and in some alternative compiles posted in the hydrogenaud.io Opus forum. No other implementations of Opus are currently known. The libopus command-line tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream and metadata analyzer.

The '''latest stable release''' is recommended for general use and as of mid 2014 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 ====
Released on 11 Sep 2012, when RFC 6716 was standardized, though largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, implements constrained VBR mode by default (with a bitrate boost used mainly for transients), plus true CBR.

==== libopus v1.1 (recommended latest stable release) ====

The alpha source code was released on 21 Dec 2012 for testing and user feedback. Following a beta release and further testing, the stable 1.1 version was released on 5 December 2013, having been tested thoroughly enough for general release.

CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] provide '''unconstrained VBR''', including a rate boost not just for transients but now also for highly tonal signals, and a rate reduction when the stereo image is narrow. The '''transient detection''' and '''time-frequency analysis''' code has also been rewritten, as has the '''dynamic allocation''' code (HF/LF tilt and Band Boost), to allow more aggressive departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC rejection''' below 3 Hz also improves quality when an inaudible DC offset is present, with no effect on deep bass notes.
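
For illustration, a DC-rejection stage can be as simple as a first-order high-pass filter. The sketch below is a generic DC blocker, not the filter libopus actually uses; the 48 kHz sampling rate and 3 Hz cutoff are assumptions chosen to match the figure above, and the pole radius uses the common approximation r = 1 - 2π·fc/fs.

```python
import math


def dc_block(samples, fs=48_000, cutoff_hz=3.0):
    """Generic first-order DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    r = 1.0 - 2.0 * math.pi * cutoff_hz / fs  # pole just inside the unit circle
    out = []
    x_prev = 0.0
    y_prev = 0.0
    for x in samples:
        y = x - x_prev + r * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out


# A constant 0.5 offset and a 100 Hz tone, each one second long at 48 kHz.
offset = dc_block([0.5] * 48_000)
tone = dc_block([math.sin(2 * math.pi * 100 * n / 48_000) for n in range(48_000)])
print(abs(offset[-1]), max(abs(v) for v in tone[-480:]))
```

A constant offset decays to essentially zero within about a second, while a 100 Hz tone passes with almost no change in level.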

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially in the bitrate range (presumably around 24~40kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. The detection, which has no look-ahead, is not perfect, but it is usually undecided only for audio where either mode works well.

'''Automatic bandwidth detection''' is also introduced to avoid wasting bits on absent frequencies.

'''Surround sound improvements''' were introduced after the beta release, with considerable advances in coding efficiency, bitrate allocation and quality.

=== VoIP software ===
* The open source virtual PBX Freeswitch supports Opus transcoding.
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus.
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video.
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its previous codec with Opus. Its iOS app will be the first to be released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10.

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC are on the development roadmap.
* Chromium and Google Chrome support Opus audio as of version 33.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available. This brings Opus playback to a range of portable media players (including some Apple iPod models and Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), to Android devices.

=== Player software ===

* Windows/Mac/Linux (Cross-Platform)
*# [[VLC]] media player supports Opus as of version 2.0.4
*# [[Amarok]] 2.8 supports transcoding to Opus if FFmpeg is compiled with support for the libopus library, and playback of Opus files if Amarok is compiled against TagLib newer than v1.8
*# Clementine has Opus support

* Windows Exclusive
*# AIMP supports Opus natively as of version 3.20 build 1125 beta 1
*# [[foobar2000]] supports Opus natively as of v1.1.14 beta 1
*# Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
*# [[Winamp]] supports Opus using a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in]

* iOS/Android (Cross-Platform)
*#Capriccio [https://itunes.apple.com/us/app/capriccio-free-ultimate-music/id434829018?mt=8 iOS]/[https://play.google.com/store/apps/details?id=me.ideariboso.capriccio Android]

* Android Exclusive
*# [http://gonemadmusicplayer.blogspot.com/ GoneMAD Music Player]
*# [http://neutronmp.com/ Neutron Music Player]
*# [http://www.videolan.org/vlc/download-android.html VLC Media Player for Android]
*# [https://play.google.com/store/apps/details?id=ru.recoilme.freeamp FreeMP]
*# [https://play.google.com/store/apps/details?id=net.mderezynski.youki3 Youki]
*# [https://play.google.com/store/apps/details?id=com.aimp.player AIMP for Android]

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]
* [http://www.xdlab.ru/en/ TagScanner]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Hydrogenaudio

2014-09-11T09:11:13Z

Dynamic: typo

Hydrogenaudio is an internet community designed to be a focal point for information related to all facets of audio technology.
Hydrogenaudio features several [http://www.hydrogenaudio.org/forums/index.php?act=idx forums] for many audio-related topics, a [http://www.hydrogenaudio.org/forums/index.php forum portal] and this [[Main Page|Knowledgebase]].

In mid 2014 the original hydrogenaudio.org domain was due to expire and the forums, wiki and other features were duplicated at the new domain hydrogenaud.io allowing links and citations on other sites to be updated using a simple search & replace.

== Logo ==
Here you can find various versions of the logo for linking on your own page or for any other use you might have for the Hydrogenaudio logo.

Standard Hydrogenaudio logo as seen on the forum:
[[Image:Logo2b.png|frame|center|Normal HA logo]]

===Meaning===
''Taken from [http://www.hydrogenaud.io/forums/index.php?showtopic=15916 this HA thread].''

The logo symbolizes the spreading of information as well as expanding sound waves. It's easily recognizable and can also be modified to a smaller logo for sites that want to link to HA. The text is simple and non-distracting. It was decided not to have headphones in the logo, as it wouldn't be individual and original enough.

===Downloads===

; Adobe Illustrator version of the Logo
: [[Media:Logo AI.zip|Download]] (205kB).
: This is the original vector version of the logo. This will render correctly in Adobe Illustrator.

; EPS version of the Logo
: [[Media:Logo eps.zip|Download]] (245kB).
: Alternate vector version. Might not render correctly.

; SVG version of the Logo
: [[Media:Logo svg.zip|Download]] (2kB).
: Alternate vector version. Might not render correctly.

; Oversized PNG version of the Logo
: If you don't have a vector program you can still get a decent resolution with [[Media:Logo_large.png|this oversized version]].

Hydrogenaudio

2014-09-11T09:10:18Z

Dynamic: Mentioned domain name migration in 2014

Hydrogenaudio is an internet community designed to be a focal point for information related to all facets of audio technology.
Hydrogenaudio features several [http://www.hydrogenaudio.org/forums/index.php?act=idx forums] for many audio-related topics, a [http://www.hydrogenaudio.org/forums/index.php forum portal] and this [[Main Page|Knowledgebase]].

In mid 2014 the original hydrogenaudio.org domain was due to expire and the forum, wiki and other features were duplicated at the new domain hydrogenaud.io allowing links and citations on other sites to be updated using a simple search & replace.

== Logo ==
Here you can find various versions of the logo for linking on your own page or for any other use you might have for the Hydrogenaudio logo.

Standard Hydrogenaudio logo as seen on the forum:
[[Image:Logo2b.png|frame|center|Normal HA logo]]

===Meaning===
''Taken from [http://www.hydrogenaud.io/forums/index.php?showtopic=15916 this HA thread].''

The logo symbolizes the spreading of information as well as expanding sound waves. It's easily recognizable and can also be modified to a smaller logo for sites that want to link to HA. The text is simple and non-distracting. It was decided not to have headphones in the logo, as it wouldn't be individual and original enough.

===Downloads===

; Adobe Illustrator version of the Logo
: [[Media:Logo AI.zip|Download]] (205kB).
: This is the original vector version of the logo. This will render correctly in Adobe Illustrator.

; EPS version of the Logo
: [[Media:Logo eps.zip|Download]] (245kB).
: Alternate vector version. Might not render correctly.

; SVG version of the Logo
: [[Media:Logo svg.zip|Download]] (2kB).
: Alternate vector version. Might not render correctly.

; Oversized PNG version of the Logo
: If you don't have a vector program you can still get a decent resolution with [[Media:Logo_large.png|this oversized version]].

Opus

2014-07-20T06:38:39Z

Dynamic: /* libopus v1.1 (recommended latest stable release) */ Tidied up. mentioned surround improvements

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = 1.1 beta
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech, yet it is also very competitive for use as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps]. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typ 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage & playback format. Also unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more detail on the history and potential applications of Opus is included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''.

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that, over a typical music collection, the average bitrate comes very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate falls where it may. Future versions can be expected to offer better quality at the same setting. Independent implementations may adopt a different approach.

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page), which can be particularly useful for mixed-content audio or varying network conditions, making the unified Opus codec superior to a suite of different codecs that might otherwise cover the same range of bitrate and quality settings and would require out-of-band signalling to instigate codec switching. The switching includes the choice of mono, stereo and other channel mappings, the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both, and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz) as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially useful in time-delayed audio streaming also, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can be sent again at low bitrate inside the following packet to preserve intelligibility.

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development and ensures, as a primary goal, that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this, thereby eliminating many bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a higher bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.
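
The energy-preservation idea can be sketched as gain-shape coding: each band's gain (the square root of its energy) is coded separately from its normalized shape, so even a very coarsely quantized shape is rescaled to the correct energy when decoded. This is a simplified, hypothetical illustration: the gain is left unquantized, the rounding step stands in for CELT's pyramid vector quantization, and `encode_band`/`decode_band` are illustrative names, not libopus functions.

```python
import math


def encode_band(coeffs, pulses):
    """Return (gain, quantized_shape) for one band of transform coefficients."""
    gain = math.sqrt(sum(c * c for c in coeffs))  # square root of band energy
    if gain == 0.0:
        return 0.0, [0] * len(coeffs)
    # Scale so the absolute values sum to roughly `pulses`, then round.
    scale = pulses / sum(abs(c) for c in coeffs)
    shape = [round(c * scale) for c in coeffs]
    if not any(shape):
        # Keep at least one nonzero entry so the band is never silenced
        # (real CELT handles such bands differently, e.g. spectral folding).
        peak = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
        shape[peak] = 1 if coeffs[peak] > 0 else -1
    return gain, shape


def decode_band(gain, shape):
    """Rescale the quantized shape so the band has exactly the coded energy."""
    norm = math.sqrt(sum(q * q for q in shape))
    if norm == 0.0:
        return [0.0] * len(shape)
    return [gain * q / norm for q in shape]


band = [0.9, -0.3, 0.05, 0.2]
gain, shape = encode_band(band, pulses=2)
decoded = decode_band(gain, shape)
print(shape, decoded)
```

With only two pulses the shape collapses to a single spike, yet the decoded band carries the same energy as the original, which is the property that avoids the dropouts and 'birdies' of bitrate-starved bands.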

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
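
As a concrete example of the header gain, RFC 7845 (the Ogg Opus encapsulation) stores an output gain in the OpusHead identification header at byte offset 16, as a little-endian signed 16-bit Q7.8 value in dB (steps of 1/256 dB). The sketch below reads it and converts it to a linear scale factor; the function names are illustrative only, and the header built at the end is hypothetical test data.

```python
import struct


def opushead_output_gain(header: bytes) -> int:
    """Return the raw Q7.8 output gain from an OpusHead packet."""
    if len(header) < 19 or header[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    (raw,) = struct.unpack_from("<h", header, 16)
    return raw


def q78_to_db(raw: int) -> float:
    """Q7.8 fixed point: the integer value divided by 256 gives dB."""
    return raw / 256.0


def db_to_linear(gain_db: float) -> float:
    return 10.0 ** (gain_db / 20.0)


def apply_output_gain(samples, raw_gain: int):
    factor = db_to_linear(q78_to_db(raw_gain))
    return [s * factor for s in samples]


# Hypothetical minimal header: version 1, stereo, pre-skip 312,
# 44.1 kHz input rate, -6 dB output gain, channel mapping family 0.
header = (b"OpusHead" + bytes([1, 2]) + struct.pack("<H", 312)
          + struct.pack("<I", 44100) + struct.pack("<h", -6 * 256) + bytes([0]))
raw = opushead_output_gain(header)
print(raw, q78_to_db(raw), db_to_linear(q78_to_db(raw)))
```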

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As bitrate increases, the hybrid mode is adopted, extending bandwidth first to 12kHz (comparable with compact cassette) and then to the full 20kHz, after which CELT takes over. Assuming the source is stereo, the transition from mono to stereo typically happens partway through the bitrate range in which bandwidth grows from 12kHz to 20kHz.

==Indicative bitrate and quality==
The table below gives illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic detection of speech/music and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate tradeoff. Mode choices may therefore change, and this table is likely to require small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typ 48 kHz sampling rate) but mentions stereo compatibility for 40kbps+. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}
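
The indicative table can also be expressed as a simple lookup, for instance to pick a starting bitrate in an application. This is only a restatement of the guidance above: the thresholds are the table's bitrate targets, the function name is hypothetical, and libopus makes its own content-dependent mode decisions internally.

```python
# Indicative speech guidance from the table above (mono source, 20 ms frames).
# Each row: (minimum target kbps, bandwidth, typical SILK/CELT use, quality note)
SPEECH_GUIDE = [
    (6, "4 kHz", "SILK", "Fair, intelligible"),
    (8, "4 kHz narrowband", "SILK", "Close to telephone quality"),
    (12, "6 kHz medium-band", "SILK", "Better than telephone quality"),
    (16, "8 kHz wideband", "SILK", "Wideband speech quality"),
    (24, "12 kHz super-wideband", "hybrid", "Near transparent speech"),
    (32, "20 kHz", "hybrid / possibly CELT", "Essentially transparent speech"),
    (40, "20 kHz", "CELT", "Essentially transparent mono or stereo speech"),
]


def speech_guidance(target_kbps: float):
    """Return the row with the highest bitrate target not above target_kbps."""
    if target_kbps < SPEECH_GUIDE[0][0]:
        raise ValueError("Opus does not support bitrates below 6 kbps")
    chosen = SPEECH_GUIDE[0]
    for row in SPEECH_GUIDE:
        if target_kbps >= row[0]:
            chosen = row
    return chosen


print(speech_guidance(20))
```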

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typ 48 kHz sampling rate). Opus will automatically use mono at very low bitrates, though, depending on content, a certain amount of stereo encoding can still be used even where the table below gives mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly Poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (c.f. mono cassette)
|Good for podcasts, audiobooks, CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (c.f. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 for default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum latency high quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum latency mode.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used includes packet overhead: the more packets transmitted every second, the more header data must be sent. For this reason Opus, while defaulting to 20.0ms frames, also supports 40.0ms and 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0ms SILK frames to reduce latency somewhat at the expense of greater packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These achieve lower latency but directly increase overhead, because more packets are transmitted per second. As described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can shorten the encoding block lengths it uses, which allows it to operate with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use smaller transform block sizes, which reduces the frequency resolution of the MDCT compared with the default transform window and so reduces coding efficiency for tonal signals. Compensating for the coarser frequency resolution of shorter transform windows requires more precise amplitude coding, so a higher bitrate is needed to obtain the same perceptual quality (or, conversely, quality is lower at the same bitrate).

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms delay. The CELT layer requires 2.5ms for MDCT window overlap.

Xiph.org used matched PEAQ scores (an approximate perceptual quality assessment made in software) for CELT 0.10, the codec that served as the basis of the CELT layer in the Opus reference release. These indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The command-line tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server], as well as in the foobar2000 free encoders pack and in some alternative compiles posted in the hydrogenaud.io Opus forum. No other implementations of Opus are currently known. The libopus command-line tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream and metadata analyzer.

The '''latest stable release''' is recommended for general use and as of mid 2014 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 ====
Released 11 Sep 2012, when RFC 6716 was standardized, although it was largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, implements constrained VBR mode by default (with a bitrate boost used mainly for transients), plus true CBR.

==== libopus v1.1 (recommended latest stable release) ====

The alpha source code was released on 21 Dec 2012 for testing & user feedback. Following a beta release and further testing, the stable 1.1 version, considered well tested enough for general release, was released on 5 December 2013.

CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduced to provide '''unconstrained VBR''' include a rate boost not only for transients but now also for highly tonal signals, plus a rate reduction when the stereo image is narrow. The '''transient detection''' and '''time-frequency analysis''' code has also been rewritten, as has the '''dynamic allocation''' code (HF/LF tilt and Band Boost), to allow more aggressive departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC-rejection''' below 3 Hz also aids quality when an inaudible DC offset is present, with no effect on deep bass notes.
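
The general idea of DC rejection can be shown with a one-pole DC-blocking high-pass filter whose cutoff is set to 3 Hz at 48 kHz. The Python sketch below is a generic textbook filter written for illustration, not a copy of the libopus 1.1 code, and the function name is hypothetical: a constant offset decays away within a fraction of a second, while bass tones at tens of Hz pass almost unchanged.

```python
import math


def dc_reject(samples, sample_rate=48000, cutoff_hz=3.0):
    """One-pole DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1].

    The pole r sits just inside the unit circle, giving a high-pass response
    with a cutoff of roughly cutoff_hz; content well above it passes
    almost unchanged.
    """
    r = 1.0 - 2.0 * math.pi * cutoff_hz / sample_rate
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


fs = 48000
offset = dc_reject([0.1] * fs)  # one second of pure DC offset
bass = dc_reject([math.sin(2 * math.pi * 40 * n / fs) for n in range(2 * fs)])  # 40 Hz tone
print(abs(offset[-1]), max(abs(v) for v in bass[fs:]))
```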

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially near the bitrate target range (presumably around 24-40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music & speech, and above it CELT performs best for both. The detection, which works without look-ahead, is not perfect, but it is usually undecided only on audio where either mode works well.

'''Automatic bandwidth detection''' is also introduced to save wasted bits allocated to absent frequencies.

'''Surround sound improvements''' were introduced after the beta release, with considerable advances in coding efficiency, bitrate allocation and quality.

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced their current codec with Opus. Their iOS app will be the first to be released with Opus; the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome support Opus audio as of version 33.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available. This means hardware support for a series of portable media players (including some Apple iPod models and some SanDisk Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also on Android devices.

=== Player software ===

* Windows/Mac/Linux (Cross-Platform)
*# [[VLC]] media player supports Opus as of version 2.0.4
*#[[Amarok]] 2.8 supports transcoding to Opus if ffmpeg is compiled with libopus support, and playback of Opus files if Amarok is compiled against TagLib newer than v1.8

* Windows Exclusive
*# AIMP supports Opus natively as of version 3.20 build 1125 beta 1
*# [[foobar2000]] supports Opus natively as of v1.1.14 beta 1
*# Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
*# [[Winamp]] supports Opus using a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in]

* iOS/Android (Cross-Platform)
*#Capriccio [https://itunes.apple.com/us/app/capriccio-free-ultimate-music/id434829018?mt=8 iOS]/[https://play.google.com/store/apps/details?id=me.ideariboso.capriccio Android]

* Android Exclusive
*# [http://gonemadmusicplayer.blogspot.com/ GoneMAD Music Player]
*# [http://neutronmp.com/ Neutron Music Player]
*# [http://www.videolan.org/vlc/download-android.html VLC Media Player for Android]
*# [https://play.google.com/store/apps/details?id=ru.recoilme.freeamp FreeMP]
*# [https://play.google.com/store/apps/details?id=net.mderezynski.youki3 Youki]
*# [https://play.google.com/store/apps/details?id=com.aimp.player AIMP for Android]

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2014-07-20T06:27:48Z

Dynamic: /* Commandline binaries & libopus versions */ Changed recommended version to 1.1

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = 1.1 beta
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech, yet it is also very competitive for use as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps]. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared with popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet it performs very competitively with them in terms of quality per bitrate, which makes it a comparably viable storage & playback format. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, which makes it preferable for short audio clips, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more detail on the history and potential applications of Opus is included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''

==Characteristics==
Opus supports bitrates from 6 kbps to 510 kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30 kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that over a typical music collection, something very close to the target bitrate will be achieved. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate falls where it may. Future versions can be expected to offer improved quality at the same setting. Independent implementations may adopt a different approach.
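
With the reference command-line encoder, this means the user chooses a target bitrate rather than a quality index. The Python sketch below only assembles an ''opusenc'' command line and does not run anything; the helper name and file names are hypothetical, while ''--bitrate'', ''--vbr'', ''--cvbr'', ''--hard-cbr'' and ''--framesize'' are options listed by ''opusenc --help'' in opus-tools. The 6-510 kbps check mirrors the stereo range given above rather than any limit enforced by opusenc itself.

```python
# Rate-control options offered by opus-tools' opusenc.
RATE_CONTROL_FLAGS = {"vbr": "--vbr", "cvbr": "--cvbr", "cbr": "--hard-cbr"}
# Frame sizes (ms) discussed in this article.
VALID_FRAME_SIZES_MS = (2.5, 5, 10, 20, 40, 60)


def build_opusenc_args(infile, outfile, bitrate_kbps, mode="vbr", framesize_ms=20):
    """Return an opusenc argument list (hypothetical helper; nothing is executed)."""
    if not 6 <= bitrate_kbps <= 510:
        raise ValueError("bitrate outside the 6-510 kbps stereo range")
    if mode not in RATE_CONTROL_FLAGS:
        raise ValueError(f"unknown rate-control mode: {mode!r}")
    if framesize_ms not in VALID_FRAME_SIZES_MS:
        raise ValueError(f"unsupported frame size: {framesize_ms} ms")
    return [
        "opusenc",
        "--bitrate", f"{bitrate_kbps:g}",  # target in kbit/s, not a quality index
        RATE_CONTROL_FLAGS[mode],
        "--framesize", f"{framesize_ms:g}",
        infile,
        outfile,
    ]


print(" ".join(build_opusenc_args("in.wav", "out.opus", 96)))
# opusenc --bitrate 96 --vbr --framesize 20 in.wav out.opus
```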

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page), which can be particularly useful for mixed-content audio or varying network conditions. This makes the unified Opus codec superior to a suite of different codecs that might otherwise cover the same range of bitrates and quality settings but would require out-of-band signalling to trigger codec switching. The switching includes the choice of mono, stereo and other channel mappings, the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both, and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially also useful in time-delayed audio streaming, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can then be re-encoded at a lower bitrate and carried redundantly in the following packet to preserve intelligibility.
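
With in-band FEC, a frame lost in transit can be recovered from the lower-bitrate copy carried by the next packet if that packet arrives; otherwise PLC has to fill the gap. The Python sketch below is a hypothetical illustration of that receiver-side decision. It assumes every packet carries FEC data for the frame before it, which in practice applies only when FEC is enabled and the SILK layer is active. In libopus the application makes the equivalent choice by calling ''opus_decode()'' on the next packet with its ''decode_fec'' flag set, or by decoding a missing packet to invoke PLC.

```python
def plan_decoding(received):
    """Decide how each frame would be reconstructed (illustrative only).

    received[i] is True if packet i arrived. A lost frame is rebuilt from
    the FEC copy in packet i + 1 if that packet arrived; otherwise the
    decoder falls back to packet loss concealment (PLC).
    """
    plan = []
    for i, arrived in enumerate(received):
        if arrived:
            plan.append("normal")
        elif i + 1 < len(received) and received[i + 1]:
            plan.append("fec")
        else:
            plan.append("plc")
    return plan


print(plan_decoding([True, False, True, False, False, True]))
# ['normal', 'fec', 'normal', 'plc', 'fec', 'normal']
```

Note that two consecutive losses leave the first of them to PLC, because its FEC copy was in the packet that was also lost.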

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development and, as a primary goal, preserves the total energy in each spectral band while requiring only a modest bitrate overhead to do so. This eliminates many bitrate-starvation artifacts, such as the 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. The same technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0 ms) is required for very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides further advanced tools to enable encoder tuning.
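
The energy-preservation principle can be illustrated with a simplified gain-shape quantizer. Each band's energy is kept as a separate value, only the normalized shape is coarsely quantized, and the decoded band is rescaled to the original energy. As a result, even a badly starved band keeps its level instead of collapsing into a spectral hole. This Python sketch rests on simplifying assumptions: real CELT also quantizes the band energies and codes the shape with pyramid vector quantization (PVQ), whereas here the shape is simply rounded to a grid, and the band layout and function names are hypothetical.

```python
import math


def quantize_band_preserving_energy(coeffs, step):
    """Coarsely quantize a band's shape but restore its exact energy."""
    energy = sum(c * c for c in coeffs)
    if energy == 0.0:
        return [0.0] * len(coeffs)
    gain = math.sqrt(energy)
    # Normalize to unit energy, then round the shape to a coarse grid.
    shape = [round(c / gain / step) * step for c in coeffs]
    norm = math.sqrt(sum(s * s for s in shape))
    if norm == 0.0:
        # Shape quantized away entirely: fill the band flatly instead of
        # leaving a hole, so its energy is still reproduced.
        shape = [1.0] * len(coeffs)
        norm = math.sqrt(len(coeffs))
    return [gain * s / norm for s in shape]


def band_energies(coeffs, band_edges):
    """Energy of each band, with bands given as consecutive index edges."""
    return [sum(c * c for c in coeffs[a:b]) for a, b in zip(band_edges, band_edges[1:])]


spectrum = [0.9, -0.4, 0.1, 0.05, 0.02, -0.03, 0.01, 0.0]
edges = [0, 2, 4, 8]  # hypothetical band layout
decoded = []
for a, b in zip(edges, edges[1:]):
    decoded += quantize_band_preserving_energy(spectrum[a:b], 0.5)
print(band_energies(spectrum, edges))
print(band_energies(decoded, edges))  # same values up to rounding error
```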

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
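
In Ogg Opus files, as specified in RFC 7845, both features are carried in the identification header. A ''pre-skip'' count of 48 kHz samples is discarded at the start of decoding, and an ''output gain'' in dB, stored as Q7.8 fixed point, is expected to be applied by players. The Python sketch below parses the fixed 19-byte part of that header and shows the two calculations a gapless, gain-aware player performs. The helper names are hypothetical, and the channel mapping table used by multichannel files is ignored.

```python
import struct


def parse_opus_head(packet: bytes):
    """Parse the fixed 19-byte part of an Ogg Opus identification header."""
    magic, version, channels, pre_skip, input_rate, gain_q78, mapping = struct.unpack_from(
        "<8sBBHIhB", packet
    )
    if magic != b"OpusHead":
        raise ValueError("not an Opus identification header")
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,  # samples at 48 kHz to discard at the start
        "input_rate": input_rate,  # informational only; decoding runs at 48 kHz
        "output_gain_db": gain_q78 / 256.0,  # Q7.8 fixed point, in dB
        "mapping_family": mapping,
    }


def gain_factor(output_gain_db: float) -> float:
    """Linear amplitude factor corresponding to a gain in dB."""
    return 10.0 ** (output_gain_db / 20.0)


def playable_samples(last_granule: int, pre_skip: int) -> int:
    """Samples (at 48 kHz) a gapless player should output for the stream."""
    return last_granule - pre_skip


header = struct.pack("<8sBBHIhB", b"OpusHead", 1, 2, 312, 44100, -768, 0)
info = parse_opus_head(header)
print(info["output_gain_db"], gain_factor(info["output_gain_db"]))
```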

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As the bitrate increases, the hybrid mode is adopted, extending the bandwidth first to 12 kHz (comparable with compact cassette) and then to the full 20 kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens after the bandwidth reaches 12 kHz but before it extends to 20 kHz.

==Indicative bitrate and quality==
The table below gives illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate trade-off. Changes are therefore likely, and this table may require small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate) but notes stereo compatibility from 40 kbps upwards. The default 20 ms frame size (22.5 ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}
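
Because each row applies from its bitrate target up to the next row, the table can be turned into a simple lookup. The Python sketch below does this with the standard ''bisect'' module. It only restates the article's typical bandwidth and layer guidance (the 40 kbps and 48 kbps+ rows collapse into one 20 kHz CELT entry). libopus makes its own mode decisions internally, so the output should not be read as what the encoder will actually do for a given input. The names are hypothetical.

```python
import bisect

# (bitrate target in kbps, audio bandwidth, typical layer) from the table above.
SPEECH_ROWS = [
    (6, "4 kHz", "SILK"),
    (8, "4 kHz narrowband", "SILK"),
    (12, "6 kHz medium-band", "SILK"),
    (16, "8 kHz wideband", "SILK"),
    (24, "12 kHz super-wideband", "hybrid"),
    (32, "20 kHz", "hybrid / possibly CELT"),
    (40, "20 kHz", "CELT"),
]
_TARGETS = [row[0] for row in SPEECH_ROWS]


def speech_row(bitrate_kbps: float):
    """Return the row with the highest bitrate target not above the input."""
    if bitrate_kbps < 6:
        raise ValueError("bitrates below 6 kbps are not supported by Opus")
    i = bisect.bisect_right(_TARGETS, bitrate_kbps) - 1
    return SPEECH_ROWS[i]


print(speech_row(20))  # (16, '8 kHz wideband', 'SILK')
```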

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate). Opus will automatically use mono at very low bitrates, though, depending on content, a certain amount of stereo information can still be encoded even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (c.f. mono cassette)
|Good for podcasts, audiobooks; CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (c.f. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis and MP3 in a [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 for default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}
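
Because libopus calibrates its target bitrate so that a typical music collection averages close to the target, storage needs can be estimated directly from the bitrates in the table. The Python sketch below (function names hypothetical) ignores Ogg container overhead and the per-file variation of VBR, so it is only an approximation for whole collections. The CD-audio reference of 1411.2 kbps is 44.1 kHz x 16 bits x 2 channels.

```python
CD_KBPS = 1411.2  # uncompressed 16-bit stereo at 44.1 kHz


def encoded_size_bytes(bitrate_kbps: float, duration_s: float) -> int:
    """Approximate size: kbit/s x 1000 x seconds / 8 bits per byte."""
    return round(bitrate_kbps * 1000 * duration_s / 8)


def ratio_vs_cd(bitrate_kbps: float) -> float:
    """How many times smaller than uncompressed CD audio."""
    return CD_KBPS / bitrate_kbps


for kbps in (64, 96, 128):
    mb_per_hour = encoded_size_bytes(kbps, 3600) / 1e6
    print(f"{kbps} kbps: ~{mb_per_hour:.1f} MB per hour, ~{ratio_vs_cd(kbps):.1f}:1 vs CD")
```

For example, an hour of music at 64 kbps comes to roughly 28.8 MB, and 128 kbps is about an 11:1 reduction compared with CD audio.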

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, total bandwidth used is subject to packet overhead: the more packet headers that are transmitted every second, the greater the overhead. For this reason Opus, while defaulting to 20.0ms frames, supports 40.0ms and 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency (which may still be acceptable for speech), and also supports 10.0ms SILK frames to reduce latency somewhat at the expense of greater packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These achieve lower latency by transmitting more packets per second, which directly increases the packet overhead. As described below, they also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
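
To show how much the headers can matter, the Python sketch below adds per-packet header overhead to a codec bitrate. It assumes one Opus frame per packet and minimal IPv4 + UDP + RTP headers (20 + 8 + 12 = 40 bytes, with no header compression and no link-layer framing), so real figures will differ. The function name is hypothetical.

```python
# Minimal per-packet header sizes in bytes (no options, no header compression).
IPV4_UDP_RTP = 20 + 8 + 12
IPV6_UDP_RTP = 40 + 8 + 12


def on_wire_kbps(codec_kbps, frame_ms, header_bytes=IPV4_UDP_RTP, frames_per_packet=1):
    """Codec bitrate plus the header overhead of the packets carrying it."""
    packets_per_second = 1000.0 / (frame_ms * frames_per_packet)
    overhead_kbps = header_bytes * 8 * packets_per_second / 1000.0
    return codec_kbps + overhead_kbps


for codec_kbps, frame_ms in ((12, 20), (12, 60), (112, 2.5)):
    print(f"{codec_kbps} kbps in {frame_ms} ms frames -> {on_wire_kbps(codec_kbps, frame_ms):.1f} kbps")
```

Under these assumptions, 12 kbps SILK speech in 20 ms frames needs about 28 kbps on the wire (16 kbps of it headers), or about 17.3 kbps with 60 ms frames, while 112 kbps CELT in 2.5 ms frames sends 400 packets per second and needs about 240 kbps.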

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can shorten its transform block lengths so that it can be used with shorter frames.
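
Frame durations translate directly into the sample counts an encoder works with, since Opus runs internally at 48 kHz. The Python sketch below lists the frame sizes that RFC 6716 allows for SILK-only operation (10, 20, 40 and 60 ms) and for CELT-only operation (2.5, 5, 10 and 20 ms) and converts them to samples per channel. At 48 kHz these are the values libopus expects as the ''frame_size'' argument of ''opus_encode()''. The constant and function names are hypothetical.

```python
from fractions import Fraction

OPUS_RATE_HZ = 48000  # Opus operates internally at 48 kHz
SILK_ONLY_FRAME_MS = (10, 20, 40, 60)
CELT_ONLY_FRAME_MS = (2.5, 5, 10, 20)


def samples_per_frame(frame_ms, sample_rate=OPUS_RATE_HZ) -> int:
    """Samples per channel in one frame; rejects non-integer results."""
    samples = Fraction(frame_ms) * sample_rate / 1000
    if samples.denominator != 1:
        raise ValueError(f"{frame_ms} ms is not a whole number of samples at {sample_rate} Hz")
    return int(samples)


for label, sizes in (("SILK-only", SILK_ONLY_FRAME_MS), ("CELT-only", CELT_ONLY_FRAME_MS)):
    print(label, [samples_per_frame(ms) for ms in sizes])
```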

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use smaller transform block sizes to achieve this. This reduces the frequency resolution of the MDCT compared with the default transform window, and therefore reduces encoding efficiency for tonal signals. To represent a sound with the same precision when it is divided into shorter transform windows, greater amplitude precision is necessary, which increases the bitrate needed for the same perceptual quality (or, conversely, lowers quality at the same bitrate).
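
The loss of frequency resolution can be quantified with a simple approximation. An MDCT that advances by N samples produces N coefficients spread evenly from 0 Hz to the Nyquist frequency, so each coefficient spans sample_rate / (2N) Hz. The Python sketch below (function name hypothetical) ignores window shape and CELT's short-block option and only shows how the spacing grows as the frame shrinks.

```python
def mdct_bin_spacing_hz(frame_ms: float, sample_rate: int = 48000) -> float:
    """Approximate spacing between MDCT coefficients for one frame-sized transform."""
    n = round(frame_ms * sample_rate / 1000)  # coefficients produced per frame
    return sample_rate / (2 * n)


for frame_ms in (20, 10, 5, 2.5):
    print(f"{frame_ms} ms frame -> {mdct_bin_spacing_hz(frame_ms):g} Hz per coefficient")
```

At 48 kHz this gives 25 Hz per coefficient for the default 20 ms frame but 200 Hz for 2.5 ms frames, an eightfold coarser view of tonal detail.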

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms delay. The CELT layer requires 2.5ms for MDCT window overlap.

Xiph.org matched PEAQ scores (an approximate perceptual quality assessment made in software) using the CELT 0.10 codec, which was the basis of the CELT layer in the Opus reference release. The results indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64kbps@22.5ms delay
!Fractional bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. Because it is based on ogginfo from the vorbis-tools, it is, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The commandline tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server], as well as in the foobar2000 free encoders pack, and some alternative compiles are available through the hydrogenaud.io Opus forum. No other implementations of Opus are currently known. The libopus commandline tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream & metadata analyzer.

The '''latest stable release''' is recommended for general use and as of mid 2014 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 ====
Released 11 Sep 2012, when RFC 6716 was standardized, although it was largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, implements constrained VBR mode by default (with a bitrate boost used mainly for transients), plus true CBR.

==== libopus v1.1 (recommended latest stable release) ====

The alpha source code was released on 21 Dec 2012 for testing & user feedback. Following a beta release and further testing, the stable 1.1 version, considered well tested enough for general release, was released on 5 December 2013.

CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduced to provide '''unconstrained VBR''' include a rate boost not only for transients but now also for highly tonal signals, plus a rate reduction when the stereo image is narrow. The '''transient detection''' and '''time-frequency analysis''' code has also been rewritten, as has the '''dynamic allocation''' code (HF/LF tilt and Band Boost), to allow more aggressive departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC-rejection''' below 3 Hz also aids quality when an inaudible DC offset is present, with no effect on deep bass notes.

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially near the bitrate target range (presumably around 24-40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music & speech, and above it CELT performs best for both. The detection, without look-ahead, takes a second or two typically and will some

'''Automatic bandwidth detection''' is also introduced to save wasted bits allocated to absent frequencies. Although this feature is easier to implement, developers would also be keen to know of any failure of it (potentially caused by aliasing, quantization and dithering/noise-shaping in source material).

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced their current codec with Opus. Their iOS app will be the first to be released with Opus; the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome support Opus audio as of version 33.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available. This means hardware support for a series of portable media players (including some Apple iPod models and some SanDisk Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also on Android devices.

=== Player software ===

* Windows/Mac/Linux (Cross-Platform)
*# [[VLC]] media player supports Opus as of version 2.0.4
*#[[Amarok]] 2.8 supports transcoding to Opus if ffmpeg is compiled with libopus support, and playback of Opus files if Amarok is compiled against TagLib newer than v1.8

* Windows Exclusive
*# AIMP supports Opus natively as of version 3.20 build 1125 beta 1
*# [[foobar2000]] supports Opus natively as of v1.1.14 beta 1
*# Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
*# [[Winamp]] supports Opus using a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in]

* iOS/Android (Cross-Platform)
*#Capriccio [https://itunes.apple.com/us/app/capriccio-free-ultimate-music/id434829018?mt=8 iOS]/[https://play.google.com/store/apps/details?id=me.ideariboso.capriccio Android]

* Android Exclusive
*# [http://gonemadmusicplayer.blogspot.com/ GoneMAD Music Player]
*# [http://neutronmp.com/ Neutron Music Player]
*# [http://www.videolan.org/vlc/download-android.html VLC Media Player for Android]
*# [https://play.google.com/store/apps/details?id=ru.recoilme.freeamp FreeMP]
*# [https://play.google.com/store/apps/details?id=net.mderezynski.youki3 Youki]
*# [https://play.google.com/store/apps/details?id=com.aimp.player AIMP for Android]

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2013-07-19T20:43:40Z

Dynamic: Changed preview release to 1.1 beta

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = 1.1 beta
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech, yet it is also very competitive for use as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps]. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typ 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage & playback format. Also unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more details of the history and potential applications for Opus are included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that over a typical music collection, something very close to the target bitrate will be achieved. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack) where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate will fall where it may. Improved future versions can be expected to offer improved quality at the same setting. Independent implementations may adopt a different approach.
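As a rough illustration of selecting these rate-control modes with the reference command-line encoder, the Python sketch below only assembles an ''opusenc'' argument list; it does not run the encoder. The flag names (<code>--bitrate</code>, <code>--vbr</code>, <code>--cvbr</code>, <code>--hard-cbr</code>, <code>--framesize</code>) are taken from opusenc's <code>--help</code> output and should be checked against the installed opus-tools version; the helper name <code>build_opusenc_command</code> and the 6 to 510 kbps sanity check (the stereo range quoted above) are assumptions for illustration.

```python
import shlex

# Rate-control switches as listed in opusenc --help (opus-tools); verify locally.
RATE_CONTROL_FLAGS = {
    "vbr": "--vbr",        # unconstrained VBR (the encoder default)
    "cvbr": "--cvbr",      # constrained VBR
    "cbr": "--hard-cbr",   # hard constant bitrate
}

# Frame sizes (ms) accepted by --framesize in opus-tools of this era.
FRAME_SIZES_MS = (2.5, 5, 10, 20, 40, 60)


def build_opusenc_command(infile, outfile, bitrate_kbps,
                          rate_control="vbr", framesize_ms=20):
    """Return an opusenc argument list (hypothetical helper, not part of opus-tools)."""
    # Assumption: reuse the article's 6-510 kbps stereo range as a sanity check.
    if not 6 <= bitrate_kbps <= 510:
        raise ValueError("bitrate outside the 6-510 kbps range")
    if rate_control not in RATE_CONTROL_FLAGS:
        raise ValueError(f"unknown rate control mode: {rate_control}")
    if framesize_ms not in FRAME_SIZES_MS:
        raise ValueError(f"unsupported frame size: {framesize_ms} ms")
    return [
        "opusenc",
        "--bitrate", f"{bitrate_kbps:g}",   # target in kbit/s
        RATE_CONTROL_FLAGS[rate_control],
        "--framesize", f"{framesize_ms:g}",
        infile,
        outfile,
    ]


if __name__ == "__main__":
    print(shlex.join(build_opusenc_command("input.wav", "output.opus", 96)))
```

With the defaults this produces <code>opusenc --bitrate 96 --vbr --framesize 20 input.wav output.opus</code>. Because libopus calibrates its VBR against the target, the average over a typical collection should land close to 96 kbps rather than wherever a quality scale happens to put it.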

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page), which can be particularly useful for mixed-content audio or varying network conditions, making the unified Opus codec superior to a suite of different codecs that might otherwise cover the same range of bitrate and quality settings and would require out-of-band signalling to instigate codec switching. The switching includes the choice of mono, stereo and other channel mappings, the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both, and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz) as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.
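No out-of-band signalling is needed because every Opus packet begins with a table-of-contents (TOC) byte that describes its own mode, audio bandwidth, frame duration and channel count. The Python sketch below decodes that byte following the configuration table in RFC 6716 (section 3.1); the function name and the returned dictionary layout are illustrative choices, not part of any library API.

```python
# Audio bandwidth of each Opus bandwidth class, in kHz (RFC 6716, Table 1).
AUDIO_BANDWIDTH_KHZ = {"NB": 4, "MB": 6, "WB": 8, "SWB": 12, "FB": 20}


def parse_toc(toc: int) -> dict:
    """Decode the first (TOC) byte of an Opus packet per RFC 6716 section 3.1."""
    if not 0 <= toc <= 255:
        raise ValueError("TOC must be a single byte")
    config = toc >> 3              # top 5 bits: configuration number 0..31
    stereo = bool(toc & 0x04)      # 's' bit: 0 = mono, 1 = stereo
    frame_count_code = toc & 0x03  # 'c' bits: 1 frame, 2 equal, 2 unequal, or N frames

    if config < 12:                # SILK-only: NB/MB/WB x 10/20/40/60 ms
        mode = "SILK"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:              # Hybrid: SWB/FB x 10/20 ms
        mode = "Hybrid"
        bandwidth = "SWB" if config < 14 else "FB"
        frame_ms = (10, 20)[config % 2]
    else:                          # CELT-only: NB/WB/SWB/FB x 2.5/5/10/20 ms
        mode = "CELT"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[(config - 16) % 4]

    return {
        "mode": mode,
        "bandwidth": bandwidth,
        "bandwidth_khz": AUDIO_BANDWIDTH_KHZ[bandwidth],
        "frame_ms": frame_ms,
        "stereo": stereo,
        "frame_count_code": frame_count_code,
    }


if __name__ == "__main__":
    print(parse_toc(0x08))  # config 1: SILK narrowband, 20 ms, mono
    print(parse_toc(0xFC))  # config 31: CELT fullband, 20 ms, stereo
```

Since each packet carries this byte, a decoder can follow a change from, say, SILK wideband mono to CELT fullband stereo on the very next packet.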

Of importance mainly to interactive uses, but potentially useful in time-delayed audio streaming also, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can then be sent again at low bitrate in the following packet to preserve intelligibility.
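A toy model shows how in-band FEC and PLC complement each other. Assume each packet carries its own frame plus a low-bitrate copy of the previous frame: a single lost packet can then be rebuilt from the next one, while a frame whose successor is also lost falls back to concealment. This Python sketch is a simplification under those assumptions; the packet model and the <code>fec:</code>/<code>plc</code> labels are hypothetical and do not reflect the libopus bitstream. (In libopus the application asks the decoder for the redundant data explicitly, via the <code>decode_fec</code> argument of <code>opus_decode</code>, once the next packet has arrived.)

```python
def receive(num_frames, lost):
    """Describe how each frame is reconstructed, given the indices of lost packets.

    Simplified model: packet i carries frame i at full quality plus a
    low-bitrate redundant copy of frame i-1.
    """
    result = []
    for i in range(num_frames):
        if i not in lost:
            result.append(f"frame{i}")        # primary data arrived
        elif i + 1 < num_frames and (i + 1) not in lost:
            result.append(f"fec:frame{i}")    # rebuilt from redundancy in packet i+1
        else:
            result.append("plc")              # nothing usable arrived: conceal
    return result


if __name__ == "__main__":
    print(receive(5, lost={1, 3, 4}))
    # ['frame0', 'fec:frame1', 'frame2', 'plc', 'plc']
```

The example shows why FEC helps most with isolated losses: frame 1 is recovered, but the burst covering packets 3 and 4 still needs PLC.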

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development and ensures as a primary goal that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this, thereby eliminating many of the bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively, if very low algorithmic delay (5.0ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under ''Constant PEAQ value, varying latency''). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.
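The energy-preservation idea can be illustrated with a simplified gain-shape sketch. Each band's energy is coded separately from its normalized shape, and after the shape is coarsely quantized it is rescaled so that the band's energy is unchanged. If quantization wipes out the shape entirely, the sketch spreads the energy evenly across the band instead of leaving a hole. This is a generic Python illustration with a plain rounding quantizer and hypothetical function names; CELT itself uses pyramid vector quantization and its own band layout and spectral folding rules.

```python
import math


def band_energy(coeffs):
    """L2 norm of a band's coefficients."""
    return math.sqrt(sum(c * c for c in coeffs))


def code_band(coeffs, step):
    """Coarsely quantize a band's shape while preserving its energy exactly."""
    energy = band_energy(coeffs)
    if energy == 0.0:
        return [0.0] * len(coeffs)
    shape = [c / energy for c in coeffs]                # unit-norm shape
    q_shape = [round(x / step) * step for x in shape]   # crude scalar quantizer
    q_norm = band_energy(q_shape)
    if q_norm == 0.0:
        # Bit-starved: the shape quantized to silence. Spread the band's
        # energy evenly rather than leaving a spectral hole.
        n = len(coeffs)
        return [energy / math.sqrt(n)] * n
    return [energy * x / q_norm for x in q_shape]       # restore the band energy


if __name__ == "__main__":
    band = [3.0, 4.0, 0.0]
    fine = code_band(band, step=0.5)
    coarse = code_band(band, step=2.0)
    print(round(band_energy(fine), 6), round(band_energy(coarse), 6))  # 5.0 5.0
```

However coarse the shape becomes, the decoded band keeps its original energy; only the detail within the band is lost.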

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
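In the Ogg encapsulation of Opus (specified separately from RFC 6716, in RFC 7845), both the gapless 'pre-skip' and the mandatory output gain are stored in the 'OpusHead' identification header. The Python sketch below reads them, assuming the header packet has already been extracted from the Ogg stream; the function name is hypothetical.

```python
import struct


def parse_opus_head(packet: bytes) -> dict:
    """Read the fixed 19-byte part of an 'OpusHead' identification header."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # Little-endian: magic, version, channels, pre-skip, input rate, gain, mapping family
    (_, version, channels, pre_skip, input_rate,
     gain_q8, mapping_family) = struct.unpack("<8sBBHIhB", packet[:19])
    gain_db = gain_q8 / 256.0                      # Q7.8 fixed point, in dB
    return {
        "version": version,
        "channels": channels,
        "pre_skip_samples": pre_skip,              # samples at 48 kHz to drop at the start
        "input_sample_rate": input_rate,           # original input rate, informational
        "output_gain_db": gain_db,
        "output_gain_factor": 10 ** (gain_db / 20),  # linear scale for the decoded output
        "mapping_family": mapping_family,
    }


if __name__ == "__main__":
    header = struct.pack("<8sBBHIhB", b"OpusHead", 1, 2, 312, 44100, -256, 0)
    print(parse_opus_head(header))
```

A player applies the output gain to every decoded sample (here -1 dB, a factor of about 0.891) and may then stack a [[ReplayGain]]-style adjustment on top of it.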

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. The hybrid mode is adopted as bitrate increases, extending bandwidth first to 12kHz (comparable with compact cassette) then to the full 20kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens around the transition from 12kHz to 20kHz bandwidth.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic detection of speech/music and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate tradeoff. Thus changes are likely, and this table is likely to require small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typ 48 kHz sampling rate) but mentions stereo compatibility for 40kbps+. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}
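For applications that pick a speech bitrate programmatically, the rows of the table above can be turned into a simple lookup. This Python sketch only restates the table's indicative values (it is not the mode-decision logic inside libopus), and the function name is hypothetical.

```python
import bisect

# (lowest bitrate in kbps, audio bandwidth, typical layer), from the speech table above.
SPEECH_ROWS = [
    (6, "4 kHz", "SILK"),
    (8, "4 kHz narrowband", "SILK"),
    (12, "6 kHz medium-band", "SILK"),
    (16, "8 kHz wideband", "SILK"),
    (24, "12 kHz super-wideband", "hybrid"),
    (32, "20 kHz", "hybrid / possibly CELT"),
    (40, "20 kHz", "CELT"),
    (48, "20 kHz", "CELT"),
]


def indicative_speech_mode(bitrate_kbps):
    """Return the table row whose bitrate is the highest one not above the target."""
    if bitrate_kbps < 6:
        raise ValueError("below the 6 kbps Opus minimum; see codec2 for lower rates")
    thresholds = [row[0] for row in SPEECH_ROWS]
    idx = bisect.bisect_right(thresholds, bitrate_kbps) - 1
    return SPEECH_ROWS[idx]


if __name__ == "__main__":
    print(indicative_speech_mode(20))  # falls back to the 16 kbps wideband row
```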

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typ 48 kHz sampling rate). Opus will automatically use mono at very low bitrates, though, depending on content, some stereo encoding may still be used even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly Poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (c.f. mono cassette)
|Good for podcasts, audiobooks; CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (c.f. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 for default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may be competitive with minimum latency mode also.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, total bandwidth used will be subject to packet overhead. The more packet headers that are transmitted every second, the greater will be the overhead that is required. For this reason, Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames at the expense of greater latency, which may still be acceptable for speech, and also supports 10.0ms SILK frames to reduce latency somewhat at the expense of packet overhead.
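To put numbers on this, the Python sketch below computes header overhead assuming 40 bytes per packet for IPv4 (20) + UDP (8) + RTP (12) headers, one Opus frame per packet, and no header compression or link-layer framing. These are assumptions for illustration; real overhead depends on the network stack.

```python
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP, no options or extensions (assumption)


def packet_overhead_kbps(frame_ms, header_bytes=HEADER_BYTES):
    """Header bitrate when one Opus frame is sent per packet."""
    packets_per_second = 1000.0 / frame_ms
    return header_bytes * 8 * packets_per_second / 1000.0


if __name__ == "__main__":
    for frame_ms in (60, 20, 10, 5, 2.5):
        print(f"{frame_ms:>4} ms frames: {packet_overhead_kbps(frame_ms):6.1f} kbps of headers")
```

Under these assumptions the default 20 ms frames cost 16 kbps of headers, 60 ms frames about 5.3 kbps (a large saving next to 8 to 16 kbps SILK speech), and 2.5 ms frames 128 kbps.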

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These lower the latency but directly increase the packet overhead, because more packets are transmitted every second. In addition, as described below, shorter frames also worsen the quality/bitrate tradeoff of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on 10.0ms or 20.0ms frames that are combined into Opus frames of 10, 20, 40 or 60ms, the CELT layer is able to shorten its encoding block lengths to enable its use with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms and 2.5ms frames instead of the default 20.0ms, it must use smaller transform block sizes to achieve this, thereby reducing frequency resolution in the MDCT compared to the default transform window, thus reducing encoding efficiency for tonal signals. To obtain the same frequency precision for a sound divided into shorter transform windows, improved amplitude precision is necessary, resulting in increased bitrate to obtain the same perceptual quality (or conversely lower quality at the same bitrate).
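The loss of frequency resolution can be quantified: an MDCT producing N coefficients per frame at sampling rate f<sub>s</sub> spaces them f<sub>s</sub>/(2N) Hz apart. The Python sketch below assumes CELT's 48 kHz internal rate and a single long transform per frame, so that N equals the frame length in samples; it ignores short-block splitting within a frame.

```python
FS = 48000  # Hz; Opus/CELT internal sampling rate


def mdct_bin_spacing_hz(frame_ms, fs=FS):
    """Coefficient spacing when one MDCT covers the whole frame."""
    n = fs * frame_ms / 1000     # coefficients per frame (= new samples per frame)
    return fs / (2 * n)          # n bins spread evenly over 0 .. fs/2


if __name__ == "__main__":
    for frame_ms in (20, 10, 5, 2.5):
        print(f"{frame_ms:>4} ms frame -> {mdct_bin_spacing_hz(frame_ms):5.0f} Hz per coefficient")
```

Going from 20 ms to 2.5 ms frames widens the spacing from 25 Hz to 200 Hz, so the energy of a single tonal partial is smeared over relatively more of the spectrum and costs more bits to describe.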

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms delay. The CELT layer requires 2.5ms for MDCT window overlap.

Xiph.org used matched PEAQ scores (approximate perceptual quality assessment made in software) for the CELT 0.10 codec, which was used as the basis of the CELT layer in the Opus reference release; these indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate matching 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
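The delay and percentage columns follow directly from the other two: delay is the frame size plus the 2.5 ms MDCT overlap, and the increase is measured relative to 64 kbps at the default frame size. A quick Python check of the table's arithmetic, using only the values above:

```python
OVERLAP_MS = 2.5          # CELT MDCT window overlap
REFERENCE_KBPS = 64.0     # bitrate at the default 20 ms frame size

# Frame size (ms) -> bitrate (kbps) giving a matched PEAQ score, from the table above.
MATCHED_KBPS = {20.0: 64.0, 10.0: 70.4, 5.0: 84.8, 2.5: 112.0}


def algorithmic_delay_ms(frame_ms):
    return frame_ms + OVERLAP_MS


def bitrate_increase_pct(kbps, reference=REFERENCE_KBPS):
    return round((kbps / reference - 1.0) * 100.0, 1)


if __name__ == "__main__":
    for frame_ms, kbps in MATCHED_KBPS.items():
        print(f"{frame_ms:4} ms frame: {algorithmic_delay_ms(frame_ms):4} ms delay, "
              f"+{bitrate_increase_pct(kbps)} %")
```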

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from the vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The commandline tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server]. No other implementations of Opus are currently known. The commandline tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream & metadata analyzer.

The '''latest stable release''' is recommended for general use and as of early 2013 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 (recommended latest stable release) ====
Released 11 Sep 2012, when RFC 6716 was standardized, though largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, implements Constrained VBR mode by default (bitrate boost used mainly for transients), plus true CBR.

==== libopus v1.1-alpha ====
Source code released 21 Dec 2012 for testing & user feedback ([https://ftp.mozilla.org/pub/mozilla.org/opus/win32/opus-tools-0.1.6-opus-1.1-alpha-win32.zip win32 binaries]), but not yet considered stable and well tested enough for general release.

CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduced to provide '''unconstrained VBR''' include a rate boost not just for transients but now for highly tonal signals too and rate reduction when stereo image is narrow. There's also a rewrite of its '''transient detection''' code and '''time-frequency analysis''' code, and rewritten '''dynamic allocation''' code (HF/LF tilt and Band Boost) to allow more aggressive changes from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC-rejection''' below 3 Hz also aids quality if inaudible DC offset is present with no effect on deep bass notes.
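DC rejection of this kind is commonly done with a first-order high-pass ('DC blocker') filter. The Python sketch below uses a cutoff of roughly 3 Hz at 48 kHz; the pole-placement formula is a common approximation, and this is a generic illustration rather than the filter code in libopus.

```python
import math


def dc_reject(samples, cutoff_hz=3.0, fs=48000.0):
    """First-order DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    r = 1.0 - 2.0 * math.pi * cutoff_hz / fs   # pole just inside the unit circle
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


if __name__ == "__main__":
    fs = 48000
    offset = dc_reject([0.25] * fs)   # one second of pure DC offset
    tone = dc_reject([math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)])
    print(abs(offset[-1]) < 1e-6, round(max(tone[-4800:]), 2))  # True 1.0
```

The constant offset decays to essentially zero within a second, while the 1 kHz tone passes at unity gain; with the cutoff this low, deep bass notes at 30 to 40 Hz are attenuated by well under 1 dB.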

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially near the bitrate target range (presumably around 24~40kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music & speech, and above it CELT performs best for speech & music. The detection, without look-ahead, takes a second or two typically and will sometimes make incorrect decisions. The developers would be keen to know of examples of its failure.

'''Automatic bandwidth detection''' is also introduced to avoid wasting bits on absent frequencies. Although this is easier to implement, the developers would also be keen to know of any failure of this feature (potentially caused by aliasing, quantization and dithering/noise-shaping in source material).
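A very simplified picture of bandwidth detection is to find the highest frequency whose level stands clearly above a noise floor, then choose the narrowest Opus bandwidth that covers it. The Python sketch below uses the bandwidth list given earlier in this article (4, 6, 8, 12 and 20 kHz). The threshold values, the input format (a list of frequency/level pairs) and the function name are assumptions, and this is not the libopus algorithm.

```python
# Opus audio bandwidths (kHz) as listed in this article, narrowest first.
OPUS_BANDWIDTHS_KHZ = (("NB", 4), ("MB", 6), ("WB", 8), ("SWB", 12), ("FB", 20))


def detect_bandwidth(spectrum, floor_db=-90.0, margin_db=10.0):
    """spectrum: iterable of (frequency_hz, level_db) pairs (hypothetical input format)."""
    threshold = floor_db + margin_db
    active = [freq for freq, level in spectrum if level > threshold]
    highest_hz = max(active, default=0.0)
    for name, khz in OPUS_BANDWIDTHS_KHZ:
        if highest_hz <= khz * 1000:
            return name
    return "FB"   # anything above 20 kHz is not coded anyway


if __name__ == "__main__":
    # Source material low-passed at 7.5 kHz: wideband is enough.
    lowpassed = [(f, -20.0 if f <= 7500 else -120.0) for f in range(0, 24000, 500)]
    print(detect_bandwidth(lowpassed))  # WB
```

Dither or noise shaping that lifts the high band above the threshold would push such a detector to full band, which is one way the failures mentioned above can arise.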

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced their current codec with Opus. Their iOS app will be the first to be released with Opus; the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses the shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC are on the development roadmap.
* Chromium and Google Chrome will have Opus audio support as of version 25.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available in the developer version. This means hardware support for a series of portable media players (including some products from the iPod series by Apple and Sansa, iriver and Archos devices) and with "Rockbox as an Application" (RaaA) also on Android devices.

=== Player software ===
* VLC media player supports Opus since version 2.0.4
* AIMP supports Opus natively as of version 3.20 build 1125 beta 1.
* [[foobar2000]] supports the format natively as of v1.1.14 beta 1.
* Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
* Android has a number of player apps supporting Opus, including PowerAmp and others.
* [[Winamp]] supports Opus via a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd party plug-in].

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2013-07-12T07:44:18Z

Dynamic: /* Streaming audio */ Icecast directory now filtered by Opus format

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = exp_analysis7
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech, yet it is also very competitive for use as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps]. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many software patents that cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typ 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage & playback format. Also unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more details of the history and potential applications for Opus are included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that over a typical music collection, something very close to the target bitrate will be achieved. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack) where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate will fall where it may. Improved future versions can be expected to offer improved quality at the same setting. Independent implementations may adopt a different approach.

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page), which can be particularly useful for mixed-content audio or varying network conditions, making the unified Opus codec superior to a suite of different codecs that might otherwise cover the same range of bitrate and quality settings and would require out-of-band signalling to instigate codec switching. The switching includes the choice of mono, stereo and other channel mappings, the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both, and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz) as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially useful in time-delayed audio streaming also, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can then be sent again at low bitrate in the following packet to preserve intelligibility.

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development and ensures as a primary goal that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this, thereby eliminating many of the bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively, if very low algorithmic delay (5.0ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under ''Constant PEAQ value, varying latency''). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. The hybrid mode is adopted as bitrate increases, extending bandwidth first to 12kHz (comparable with compact cassette) then to the full 20kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens around the transition from 12kHz to 20kHz bandwidth.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic detection of speech/music and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate tradeoff. Thus changes are likely, and this table is likely to require small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typ 48 kHz sampling rate) but mentions stereo compatibility for 40kbps+. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typ 48 kHz sampling rate). Opus will automatically use mono at very low bitrates, though, depending on content, some stereo encoding may still be used even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly Poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (c.f. mono cassette)
|Good for podcasts and audiobooks; CELT-only mode possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (c.f. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences from original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis and MP3 in a [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum latency high quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with Opus in minimum latency mode.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, total bandwidth used will be subject to packet overhead. The more packet headers that are transmitted every second, the greater will be the overhead that is required. For this reason, Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames at the expense of greater latency, which may still be acceptable for speech, and also supports 10.0ms SILK frames to reduce latency somewhat at the expense of packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These achieve lower latency by transmitting more packets per second, which directly increases the packet overhead. In addition, as shown below, shorter frames also reduce the coding efficiency (quality per bit) of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
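As a rough illustration of how frame size drives packet overhead, the following Python sketch estimates the header bitrate when each packet carries exactly one Opus frame. The header sizes are assumptions chosen for the example: a minimal IPv4 header (20 bytes), a UDP header (8 bytes) and a fixed RTP header (12 bytes), with no IP options, RTP extensions, SRTP authentication tags or header compression; link-layer framing is ignored too.

```python
# Assumed per-packet headers: IPv4 (20 B, no options) + UDP (8 B) + RTP (12 B, no extensions).
IPV4_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def header_overhead_kbps(frame_ms: float, header_bytes: int = IPV4_UDP_RTP_HEADER_BYTES) -> float:
    """Header bitrate in kbps when exactly one Opus frame is sent per packet."""
    packets_per_second = 1000.0 / frame_ms
    return header_bytes * 8 * packets_per_second / 1000.0


for frame_ms in (2.5, 5.0, 10.0, 20.0, 60.0):
    print(f"{frame_ms:5.1f} ms frames: {header_overhead_kbps(frame_ms):6.2f} kbps of headers")
```

Under these assumptions, 20 ms frames cost about 16 kbps of headers, 60 ms frames about 5.3 kbps and 2.5 ms frames about 128 kbps, which is why longer frames are attractive for low-bitrate SILK speech.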

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can change its encoding block lengths to support shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use shorter transform blocks, which reduces the frequency resolution of the MDCT compared with the default transform window and so reduces encoding efficiency for tonal signals. To obtain the same frequency precision for a sound divided into shorter transform windows, greater amplitude precision is needed, which increases the bitrate required for the same perceptual quality (or, conversely, lowers quality at the same bitrate).

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.
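The frequency-resolution argument can be made concrete with a back-of-the-envelope calculation. The Python sketch below makes a simplifying assumption: a frame of N samples is covered by a single MDCT yielding N coefficients spread evenly from 0 Hz to half the sample rate. It ignores CELT's band grouping, its low-overlap window and the splitting of frames into short blocks for transients, so it only indicates the trend.

```python
def mdct_bin_spacing_hz(frame_ms: float, sample_rate_hz: int = 48000) -> float:
    """Approximate spacing between MDCT coefficients for one long transform per frame.

    Simplification: N = frame length in samples, and the N coefficients cover
    0 Hz .. sample_rate_hz / 2 evenly.
    """
    n_coefficients = round(sample_rate_hz * frame_ms / 1000.0)
    return (sample_rate_hz / 2.0) / n_coefficients


for frame_ms in (20.0, 10.0, 5.0, 2.5):
    print(f"{frame_ms:4.1f} ms frame: about {mdct_bin_spacing_hz(frame_ms):.0f} Hz per coefficient")
```

Each halving of the frame size doubles the spacing (from about 25 Hz at 20 ms to about 200 Hz at 2.5 ms), so closely spaced tonal components become harder to resolve.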

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms delay. The CELT layer requires 2.5ms for MDCT window overlap.

Xiph.org used matched PEAQ scores (an approximate perceptual quality assessment computed in software) for the CELT 0.10 codec, which was the basis of the CELT layer in the Opus reference release. They indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64kbps@22.5ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
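The delay and percentage columns follow directly from the frame size and the matched bitrates. This Python sketch recomputes them from the PEAQ-matched figures in the table, using the "frame size plus 2.5 ms" rule stated above; it only re-derives the published numbers and does not model CELT itself.

```python
OVERLAP_MS = 2.5        # extra algorithmic delay beyond the frame size
REFERENCE_KBPS = 64.0   # bitrate at the default 20.0 ms frame size

# PEAQ-matched bitrates (kbps) copied from the table above, keyed by frame size (ms).
MATCHED_KBPS = {20.0: 64.0, 10.0: 70.4, 5.0: 84.8, 2.5: 112.0}


def algorithmic_delay_ms(frame_ms: float) -> float:
    return frame_ms + OVERLAP_MS


def percent_increase(kbps: float, reference_kbps: float = REFERENCE_KBPS) -> float:
    return (kbps / reference_kbps - 1.0) * 100.0


for frame_ms, kbps in MATCHED_KBPS.items():
    print(f"{frame_ms:4.1f} ms frame | {algorithmic_delay_ms(frame_ms):4.1f} ms delay | "
          f"{kbps:5.1f} kbps | +{percent_increase(kbps):.1f} %")
```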

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of a separate encoder and decoder, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The commandline tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server]. No other implementations of Opus are currently known. The libopus commandline tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream & metadata analyzer.

The '''latest stable release''' is recommended for general use and as of early 2013 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 (recommended latest stable release) ====
Released 11 Sep 2012, when RFC 6716 was standardized, although it was largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, implements constrained VBR by default (with the bitrate boost used mainly for transients) and also offers true CBR.

==== libopus v1.1-alpha ====
Source code released 21 Dec 2012 for testing & user feedback ([https://ftp.mozilla.org/pub/mozilla.org/opus/win32/opus-tools-0.1.6-opus-1.1-alpha-win32.zip win32 binaries]), but not yet considered stable and well tested enough for general release.

CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] that come with the new '''unconstrained VBR''' include a rate boost not only for transients but now also for highly tonal signals, and a rate reduction when the stereo image is narrow. The '''transient detection''' and '''time-frequency analysis''' code has also been rewritten, as has the '''dynamic allocation''' code (HF/LF tilt and Band Boost), to allow more aggressive departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC rejection''' below 3 Hz also improves quality when an inaudible DC offset is present, without affecting deep bass notes.
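DC rejection of this kind is usually a high-pass filter with a very low cutoff. The Python sketch below shows a generic one-pole/one-zero DC blocker with an approximate 3 Hz corner; it illustrates the technique under that assumption and is not the filter libopus actually uses.

```python
import math


def dc_reject(samples, cutoff_hz=3.0, sample_rate_hz=48000):
    """Generic DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1].

    The zero at DC removes any constant offset; the pole at r (just inside the unit
    circle) places the -3 dB corner at approximately cutoff_hz when r is close to 1.
    """
    r = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

Fed a constant offset, the output of this sketch decays towards zero within about a second, while a 100 Hz tone passes with its level essentially unchanged.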

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially in the bitrate range (presumably around 24-40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. The detection works without look-ahead, typically takes a second or two, and will sometimes make incorrect decisions. The developers would be keen to know of examples of its failure.

'''Automatic bandwidth detection''' is also introduced to avoid wasting bits on absent frequencies. Although this is easier to implement, the developers would also be keen to know of any failure of this feature (potentially caused by aliasing, quantization and dithering/noise-shaping in source material).

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced their current codec with Opus. Their iOS app will be the first to be released with Opus; the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses the shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome will have Opus audio support as of version 25.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/by_format/Opus Stream directory by format Opus], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available in the developer version. This means hardware support for a series of portable media players (including some products from the iPod series by Apple and Sansa, iriver and Archos devices) and with "Rockbox as an Application" (RaaA) also on Android devices.

=== Player software ===
* VLC media player supports Opus since version 2.0.4
* AIMP supports Opus natively as of version 3.20 build 1125 beta 1.
* [[foobar2000]] supports the format natively as of v1.1.14 beta 1.
* Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
* Android has a number of player apps supporting Opus, including PowerAmp and others.
* [[Winamp]] supports Opus via a [http://forums.winamp.com/showthread.php?p=2925154#post2925154 3rd-party plug-in].

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT
* [[MP3tag|MP3tag]]

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

LossyWAV

2013-05-12T15:57:52Z

Dynamic: /* Codec compatibility */ Added --merge-blocks switch to recommended Wavpack parameters

{{Software Infobox
| name = lossyWAV
| logo =
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.3.0
| preview_release = <none>
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 1.3.0 release thread]<br />[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 1.3.0 development thread]
}}
lossyWAV is a [[Wikipedia:Free software|free]], [[lossy]] pre-processor for [[PCM]] audio contained in the [[RIFF_WAVE|WAV]] file format. Proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson], it reduces the [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, significantly reduces the bitrate of the encoded file compared with compressing the unprocessed audio.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyWAV is based on the lossyFLAC idea proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson] at Hydrogenaudio: a method of carefully reducing the bit depth of (blocks of) samples so that the FLAC lossless encoder can make use of its wasted bits feature. The aim is to transparently reduce audio bit depth by setting some of the least significant bits ([[Wikipedia:Least_significant_bit|lsb]]'s) to zero, taking advantage of FLAC's detection of consistently zeroed least significant bits within each frame to increase coding efficiency significantly.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded using the same codec (which may be all important from a hardware compatibility perspective) at a reduced bitrate compared with the lossless version.
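The wasted-bits idea can be sketched in a few lines. This simplified Python illustration (not libFLAC's code) counts how many least significant bits are zero in every sample of one block of one channel; a codec with a wasted-bits feature can shift those bits out and code the remaining, narrower samples.

```python
def wasted_bits(block):
    """Count the low-order bits that are zero in every sample of the block.

    OR-ing all samples together leaves a 1 in any bit position used by at least one
    sample, so the trailing zeros of the result are shared by the whole block.
    Python integers behave as two's complement for bitwise operations, so negative
    samples are handled too.
    """
    combined = 0
    for sample in block:
        combined |= sample
    if combined == 0:
        return 0  # digital silence: nothing useful to shift out in this sketch
    count = 0
    while combined & 1 == 0:
        combined >>= 1
        count += 1
    return count


print(wasted_bits([1024, -512, 256, 0]))  # 8: every sample is a multiple of 2**8
```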

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick Currie] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (Many thanks [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer!!) with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] Assembly Language for speed.

Subsequently, lossyFLAC proved itself to work with other lossless codecs, so the application name was changed to lossyWAV.

Since then, Nick has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 Horst Albrecht] at Hydrogenaudio. Although the current lossyWAV implementation has built on David's original method, the method itself still very much belongs to its author.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a pure variable bit-depth pre-processor: the overall sample size remains the same after processing, but the number of significant bits used for the samples in a codec-block can change on a block-by-block basis. Bits-to-remove are calculated on a block-by-block basis (codec-block length = 512 samples, 11.6 ms @ 44.1kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier Transform]] (FFT) analyses of at least two lengths (default quality preset, --standard: 32, 64 & 1024 [[Wikipedia:Sampling %28signal processing%29|samples]]). After some manipulation, the results of each FFT analysis for a specific codec-block are grouped and the minimum value is used to determine bits-to-remove for the whole codec-block.

Bit removal adds noise to the output; however, the level of the added noise associated with the removal of a number of bits has been pre-calculated, and the number of bits to remove depends on the level of the noise floor of the codec-block in question. The added noise is adaptively shaped by default, but the user can select parameters to make the added noise fixed-shaped or simply [[Wikipedia:white noise|white noise]]. Each sample in the codec-block is then rounded such that its lowest <bits-to-remove> lsb's are zero. In this way the wasted bits feature of [[FLAC]] et al. is exploited.
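The final rounding step can be illustrated with a minimal Python sketch. It assumes the number of bits to remove has already been decided for the block, rounds each sample to the nearest multiple of 2^b (with halves rounded upwards) and simply clamps at full scale; lossyWAV's actual handling of clipping (see --maxclips) and its noise shaping are not modelled. Each extra bit removed doubles the rounding step, so the added noise rises by roughly 6 dB per bit.

```python
def remove_bits(block, bits_to_remove, bits_per_sample=16):
    """Round samples so their lowest `bits_to_remove` bits are zero (simplified sketch)."""
    if bits_to_remove <= 0:
        return list(block)
    step = 1 << bits_to_remove
    lowest = -(1 << (bits_per_sample - 1))          # e.g. -32768 for 16-bit audio
    highest = (1 << (bits_per_sample - 1)) - step   # largest multiple of step in range
    out = []
    for sample in block:
        rounded = ((sample + step // 2) // step) * step  # nearest multiple, ties round up
        out.append(min(max(rounded, lowest), highest))   # naive clamp at full scale
    return out


print(remove_bits([5, 6, -5, -6, 32767, -32768], 2))  # [4, 8, -4, -4, 32764, -32768]
```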

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set (16 bit / 44.1kHz)
!Codec
!lossless
!--insane
!--extreme
!--high
!--standard
!--economic
!--portable
!--extraportable
|-
!10 Album Test Set
| FLAC
| 854 kbit/s
| 627 kbit/s
| 548 kbit/s
| 477 kbit/s
| 442 kbit/s
| 407 kbit/s
| 353 kbit/s
| 311 kbit/s
|-
!Nick.C's Full Collection
| FLAC
| 882 kbit/s
| -
| -
| -
| -
| -
| -
| 307 kbit/s
|}
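To read the table as relative savings, the Python snippet below divides each lossyFLAC bitrate by the lossless FLAC figure for the 10 Album Test Set. It only re-expresses the published numbers; results on other material will differ.

```python
LOSSLESS_KBPS = 854  # FLAC, 10 Album Test Set

# lossyFLAC bitrates (kbit/s) for the same test set, copied from the table above.
LOSSY_KBPS = {
    "insane": 627, "extreme": 548, "high": 477, "standard": 442,
    "economic": 407, "portable": 353, "extraportable": 311,
}


def percent_of_lossless(kbps: float, lossless_kbps: float = LOSSLESS_KBPS) -> float:
    return 100.0 * kbps / lossless_kbps


for preset, kbps in LOSSY_KBPS.items():
    print(f"--{preset:<14}{kbps} kbit/s = {percent_of_lossless(kbps):4.1f} % of lossless FLAC")
```

For example, --standard comes out at roughly 52 % of the lossless bitrate and --extraportable at roughly 36 %.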

==File identification==
lossyWAV-processed WAV files are named with a double filename extension, .lossy.wav, to make them instantly identifiable. For example, ".lossy.flac" indicates an audio file which was processed using lossyWAV and subsequently encoded using FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter is used when processing to create a correction file, which is named with the .lwcdf.wav double filename extension. When this file is "added" to the corresponding .lossy.wav using the --merge parameter, the original file is reconstituted.
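Conceptually, the correction file holds the sample-by-sample difference between the original and the processed audio, so adding the two back together restores the original exactly. The Python sketch below shows that arithmetic on plain integer sample lists; it is an assumption-based illustration of the idea and does not reproduce the actual .lwcdf.wav file layout or how lossyWAV treats clipped samples.

```python
def make_correction(original, lossy):
    """Correction samples: what must be added to each lossy sample to restore the original."""
    if len(original) != len(lossy):
        raise ValueError("original and lossy data must have the same length")
    return [orig - proc for orig, proc in zip(original, lossy)]


def merge(lossy, correction):
    """Sample-wise sum, analogous to what --merge does for whole files."""
    if len(lossy) != len(correction):
        raise ValueError("lossy and correction data must have the same length")
    return [proc + corr for proc, corr in zip(lossy, correction)]


original = [5, -6, 1000, -32768]
lossy = [4, -4, 1000, -32768]      # e.g. the same samples rounded to multiples of 4
correction = make_correction(original, lossy)
print(correction)                  # [1, -2, 0, 0]
print(merge(lossy, correction) == original)  # True
```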

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Known supported codecs|known supported codecs]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. This takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>Where the version, date & time and user settings are copied. Additionally, if a lossyWAV 'fact' chunk is found in a file, the processing will be halted (exit code = 16) to prevent re-processing of an already processed file.

The --check parameter can be used to determine whether a file has previously been processed without actually processing it: the exit code is 16 if the file has already been processed and 0 if not.
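The check described above amounts to scanning the RIFF chunk list for lossyWAV's 'fact' chunk. The Python sketch below does this on an in-memory WAV image. The signature test (a 'fact' payload beginning with "lossyWAV") is an assumption based on the example chunk text above, and build_test_wav is a hypothetical helper included only to make the example self-contained; none of this is lossyWAV's own implementation.

```python
import struct

# Assumption: lossyWAV's 'fact' chunk payload starts with this text (see the example above).
LOSSYWAV_SIGNATURE = b"lossyWAV"


def _chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one RIFF chunk: 4-byte id, little-endian size, payload, pad byte if odd."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def build_test_wav(fact_text=None) -> bytes:
    """Hypothetical helper: a tiny 16-bit stereo WAV, optionally with a 'fact' chunk."""
    fmt = struct.pack("<HHIIHH", 1, 2, 44100, 44100 * 4, 4, 16)  # PCM, 2 ch, 44.1 kHz
    body = b"WAVE" + _chunk(b"fmt ", fmt)
    if fact_text is not None:
        body += _chunk(b"fact", fact_text.encode("ascii"))
    body += _chunk(b"data", b"\x00" * 16)
    return b"RIFF" + struct.pack("<I", len(body)) + body


def is_lossywav_processed(wav_bytes: bytes) -> bool:
    """Walk the RIFF chunks and report whether a 'fact' chunk carries lossyWAV's text."""
    if len(wav_bytes) < 12 or wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        (size,) = struct.unpack("<I", wav_bytes[pos + 4:pos + 8])
        payload = wav_bytes[pos + 8:pos + 8 + size]
        if chunk_id == b"fact" and payload.startswith(LOSSYWAV_SIGNATURE):
            return True
        pos += 8 + size + (size & 1)  # chunk data is padded to an even length
    return False


def check_exit_code(wav_bytes: bytes) -> int:
    """Mirror the documented --check convention: 16 if already processed, 0 if not."""
    return 16 if is_lossywav_processed(wav_bytes) else 0


processed = build_test_wav("lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5")
print(check_exit_code(processed), check_exit_code(build_test_wav()))  # 16 0
```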

==Quality presets==
*--quality insane: (-q I or -q 10) Highest quality preset, generally considered to be excessive;
*--quality extreme: (-q E or -q 7.5) Higher quality preset, disc space-saving alternative to lossless archiving for large audio collections, considered to be suitable for transcoding to other lossy codecs;
*--quality high: (-q H or -q 5.0) High quality preset, midway between extreme and standard;
*--quality standard: (-q S or -q 2.5) Default preset, generally accepted to be transparent;
*--quality economic: (-q C or -q 0.0) Intermediate preset midway between standard and portable;
*--quality portable: (-q P or -q -2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
*--quality extraportable: (-q X or -q -5.0) Lowest quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
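The presets are simply named points on one numeric scale. As an illustration, this hypothetical Python parser maps the letters and names listed above to their -q values and validates numeric input against the documented -5.0 to 10.0 range. It is not lossyWAV's own argument handling, and treating letters and names case-insensitively is an assumption.

```python
# Preset letters and their numeric -q equivalents, as listed above.
PRESET_VALUES = {"I": 10.0, "E": 7.5, "H": 5.0, "S": 2.5, "C": 0.0, "P": -2.5, "X": -5.0}
PRESET_NAMES = {
    "insane": "I", "extreme": "E", "high": "H", "standard": "S",
    "economic": "C", "portable": "P", "extraportable": "X",
}


def quality_value(arg: str) -> float:
    """Hypothetical parser: turn a --quality argument into its numeric value."""
    key = arg.strip()
    if key.lower() in PRESET_NAMES:
        return PRESET_VALUES[PRESET_NAMES[key.lower()]]
    if key.upper() in PRESET_VALUES:
        return PRESET_VALUES[key.upper()]
    value = float(key)  # raises ValueError for anything that is not a number
    if not -5.0 <= value <= 10.0:
        raise ValueError(f"quality {value} outside the range -5.0 to 10.0")
    return value
```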

All tuning for version 1.0.0 was performed on quality preset --standard, with higher presets being more conservative. For versions 1.1.0, 1.2.0 and 1.3.0, tuning effort has been focused on the lowest quality preset in an effort to achieve an effective compromise between resultant bitrate and perceived quality. Quality preset --standard is generally accepted to be (and from testing so far is) transparent. If you find a track for which --standard fails to achieve transparency, please post a sample (no more than 30 seconds) in the development thread.

The upper frequency limit used in the calculation of minimum signal power varies, dependent on quality preset, in the range 15.159kHz to 16.682kHz

==Supported input formats==
*[[WAV]]: 9-bit to 32-bit integer; 1 to 8 channels; sample rate ≥ 32kHz [[Pulse Code Modulation|PCM]]. Very high sample rates (>48kHz) have not been extensively tested. Tunings have been focussed on 16-bit, 44.1kHz samples (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512 --'''merge-blocks'''
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|}

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] (so-called "Bit Shifting") to suggest that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested because of the prohibitive price of encoders. At least one [http://www.hydrogenaudio.org/forums/index.php?showtopic=98609&hl= commercial DVD-A] uses constant bit-depth reduction, with a lower bit depth on the rear channels.

A [[Wikipedia:Comparison of portable media players#Audio Formats|comparison of portable media players]] on Wikipedia shows FLAC and WMA Lossless compatibility among the listed players.
Any player supported by [http://www.rockbox.org Rockbox] can use FLAC or WavPack files after installing Rockbox.
===Important note===
'''NB: when encoding using a lossless codec, please ensure that the block size of the lossless codec matches that of lossyWAV (default = 512 samples). If this is not done then the lossless encoding of the processed WAV file will (almost certainly) be larger than it would otherwise have been. This is achieved by adding the "Encoder Parameters" in the table above to the command line of the lossless codec in question.'''
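A simplified model shows why the block sizes should match. Assume, for illustration, that the lossless codec can only strip the low-order zero bits shared by every sample in one of its own blocks, so a codec block that spans several lossyWAV blocks only benefits from the smallest bits-to-remove value among them. The Python sketch below counts the bits per channel that could be saved under that assumption; the bits-to-remove values are made up, and real codecs gain or lose further bits in other ways.

```python
def saved_bits(bits_removed_per_block, lossy_block=512, codec_block=512):
    """Bits per channel a wasted-bits codec could drop, under the simplified model above.

    Each lossyWAV block's bits-to-remove value applies to all of its samples; each
    codec block can only exploit the minimum value found among its own samples.
    """
    per_sample = []
    for bits in bits_removed_per_block:
        per_sample.extend([bits] * lossy_block)
    total = 0
    for start in range(0, len(per_sample), codec_block):
        block = per_sample[start:start + codec_block]
        total += min(block) * len(block)
    return total


example = [4, 2, 5, 3]  # hypothetical bits-to-remove for four 512-sample blocks
for size in (512, 384, 1024, 4096):
    print(f"codec block {size:4d}: {saved_bits(example, codec_block=size)} bits saved")
```

With the matching 512-sample block all 7168 bits are available in this example, a misaligned 384-sample block keeps 6528, and a single large block (FLAC typically defaults to 4096 samples) keeps only 4096.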
===Bonus feature===
Another, possibly not obvious, feature of lossyWAV is that the processed output can be "transcoded" from one lossless codec to another lossless codec with absolutely no loss of quality whatsoever. This is solely due to the fact that lossyWAV output is designed to be losslessly encoded - something that lossless codecs do very well indeed.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.3.0, Copyright (C) 2007-2011 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Process Description:

lossyWAV is a near lossless audio processor which dynamically reduces the
bitdepth of the signal on a block-by-block basis. Bitdepth reduction adds noise
to the processed output. The amount of permissible added noise is based on
analysis of the signal levels in the default frequency range 20Hz to 16kHz.

If signals above the upper limiting frequency are at an even lower level, they
can be swamped by the added noise. This is usually inaudible, but the behaviour
can be changed by specifying a different --limit (in the range 10kHz to 20kHz).

For many audio signals there is little content at very high frequencies and
forcing lossyWAV to keep the added noise level lower than the content at these
frequencies can increase the bitrate dramatically for no perceptible benefit.

The noise added by the process is shaped using an adaptive method provided by
Sebastian Gesemann. This method, as implemented in lossyWAV, aims to use the
signal itself as the basis of the filter used for noise shaping. Adaptive noise
shaping is enabled by default.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <t>     where t is one of the following (default = standard):
  I, insane           highest quality output, suitable for transcoding;
  E, extreme          higher quality output, suitable for transcoding;
  H, high             high quality output, suitable for transcoding;
  S, standard         default quality output, considered to be transparent;
  C, economic         intermediate quality output, likely to be transparent;
  P, portable         good quality output for DAP use, may not be transparent;
  X, extraportable    lowest quality output, not fully transparent.

Standard Options:

-C, --correction      write correction file for processed WAV file; default=off.
-f, --force           forcibly over-write output file if it exists; default=off.
-h, --help            display help.
-L, --longhelp        display extended help.
-M, --merge           merge existing lossy.wav and lwcdf.wav files.
-o, --outdir <t>      destination directory for the output file(s).
-v, --version         display the lossyWAV version number.
-w, --writetolog      create (or add to) lossyWAV.log in the output directory.

Advanced Options:

-                     take WAV input from STDIN.
-c, --check           check if WAV file has already been processed; default=off.
                      errorlevel=16 if already processed, 0 if not.
-q, --quality <n>     quality preset (-5.0<=n<=10.0); (-5=lowest, 10=highest;
                      default=2.5; I=10; E=7.5; H=5; S=2.5; C=0; P=-2.5; X=-5).
--, --stdout          write WAV output to STDOUT.
--stdinname <t>       pseudo filename to use when input from STDIN.

Advanced Quality Options:

-A, --adaptive <n/t>  modify settings for Sebastian Gesemann's adaptive noise
                      shaping method. takes a parameter to set the order of the
                      FIR filter, (32<=n<=96; default=64; multiple of 8 only);
                      "OFF" to disable adaptive shaping; "NOWARP" to disable
                      default frequency warping;
-a, --analyses <n>    set number of FFT analysis lengths, (2<=n<=6; default=3,
                      i.e. 32, 64 & 1024 samples. n=2, remove 32 sample FFT;
                      n>3 add 512; n>4, add 256; n>6, add 128) nb. FFT lengths.
                      stated are for 44.1/48kHz audio, higher sample rates will
                      automatically increase all FFT lengths as required.
-l, --limit <n>       set upper frequency limit to be used in analyses to n Hz;
                      (10000<=n<=20000; default=16000).
--linkchannels        revert to original single bits-to-remove value for all
                      channels rather than channel dependent bits-to-remove.
--maxclips <n>        set max. number of acceptable clips per channel per block;
                      (0<=n<=16; default=3,3,3,3,3,2,2,2,2,2,1,1,1,0,0,0).
-m, --midside         analyse 2 channel audio for mid/side content.
--nodccorrect         disable DC correction of audio data prior to FFT analysis,
                      default=on; (DC offset calculated per FFT data set).
--scale <n>           factor to scale audio by; (0.0625<n<=8.0; default=1).
-s, --shaping [n]     enable fixed noise shaping, takes optional parameter [n]
                      to allow user defined shaping proportion (0.0<=n<=1.0),
                      otherwise default to quality setting dependent value.
                      Disables adaptive noise shaping.
--static <n>          set minimum-bits-to-keep-static to n bits (default=6;
                      7<=n<=28, limited to bits-per-sample - 4).
-U, --underlap <n>    enable underlap mode to increase number of FFT analyses
                      performed at each FFT length, (n = 2, 4 or 8, default=2).

Output Options:

--bitdist             show distrubution of bits to remove.
--blockdist           show distribution of lowest / highest significant bit of
                      input codec-blocks and bit-removed codec-blocks.
-d, --detail          enable per block per channel bits-to-remove data display.
-F, --freqdist        enable frequency analysis display of input data.
-H, --histogram       show sample value histogram (input, lossy and correction).
--longdist            show long frequency distribution data (input/lossy/lwcdf).
--perchannel          show selected distribution data per channel.
-p, --postanalyse     enable frequency analysis display of output and
                      correction data in addition to input data.
--sampledist          show distribution of lowest / highest significant bit of
                      input samples and bit-removed samples.
--spread [full]       show detailed [more detailed] results from the spreading/
                      averaging algorithm.
-W, --width <n>       select width of output options (79<=n<=255).

System Options:

-B, --below           set process priority to below normal.
--low                 set process priority to low.
-N, --nowarnings      suppress lossyWAV warnings.
-Q, --quiet           significantly reduce screen output.
-S, --silent          no screen output.

Special thanks go to:

David Robinson        for the publication of his lossyFLAC method, guidance, and
                      the motivation to implement his method as lossyWAV.

Horst Albrecht        for ABX testing, valuable support in tuning the internal
                      presets, constructive criticism and all the feedback.

Sebastian Gesemann    for the adaptive noise shaping method and the amount of
                      help received in implementing it and also for the basis of
                      the fixed noise shaping method.

Matteo Frigo and      for libfftw3-3.dll contained in the FFTW distribution
Steven G Johnson      (v3.2.1 or v3.2.2).

Mark G Beckett        for the Delphi unit that provides an interface to the
(Univ. of Edinburgh)  relevant fftw routines in libfftw3-3.dll.

Don Cross             for the Complex-FFT algorithm originally used.</pre>

===Example drag 'n' drop batch file===
Simply drag the FLAC files onto this batch file and it will process each one, re-encode it to FLAC and copy all of the tags from the input FLAC file, placing the output lossyFLAC file in the same directory as the input FLAC file. Requires flac.exe, lossyWAV.exe and [http://www.synthetic-soul.co.uk/tag/ tag.exe] to be somewhere on the path.
<pre>@echo off
rem Process each dropped FLAC file in turn: decode it, run lossyWAV on the
rem decoded audio, re-encode with a matching 512-sample block size, copy tags.
:repeat
if "%~1"=="" goto end
if exist "%~1" flac -d "%~1" --stdout --silent|lossywav - --stdout --quality standard --stdinname "%~1"|flac - -b 512 -o "%~dpn1.lossy.flac" --silent && tag --fromfile "%~1" "%~dpn1.lossy.flac"
shift
goto repeat
:end</pre>

===lossyWAV and FFTW===
Since version 1.2.0, lossyWAV has been able to use [[Wikipedia:FFTW|FFTW]], although it does not depend on it. To take advantage of the faster processing that FFTW's superior FFT implementations provide, place libfftw3-3.dll in a directory that is on the system path.

===lossyWAV and WINE===
The cause of lossyWAV's incompatibility with WINE was found and removed during the development of 1.2.0, and the fix was retrospectively applied to 1.1.0b in a maintenance release (1.1.0c).

===lossyWAV and [[foobar2000]]===
Example [[foobar2000]] converter settings:

lossyFLAC settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.flac
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\flac - -b 512 -5 -f -o%d --ignore-chunk-sizes
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyTAK settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.tak
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWV settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.wv
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWMALSL settings (requires WMAEncode, see below):<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.wma
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\wmaencode - %d --codec lsl --ignorelength
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

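Enclose each element of a path that contains spaces in double quotation marks ("), e.g. C:\"Program Files"\directory_where_executable_is\executable_name. This is a Windows limitation. The Parameters values above are wrapped after the pipe (|) for display only; enter each one as a single line.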

lossyWMALSL conversion uses WMAEncode.exe by lvqcl found [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=90519&view=findpost&p=767754 here].

===lossyWAV and EAC===
:''For example settings, see [[EAC and LossyWAV]].''

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor, not a codec. Its output remains compliant with the RIFF WAVE format, so any program that can play a WAV file can play a lossyWAV-processed file without any decoding step.

*'''Question:''' Why create a processor which means that I cannot be sure that a lossless file is truly lossless?
*'''Answer:''' Unless one creates the lossless file personally, one can '''never''' be completely sure that the file is indeed lossless. For example, a lossless file you receive could have been transcoded from [[MP3]] without your knowledge. To distinguish a lossyWAV file from lossless files, it is recommended to use the extension .lossy.EXT, where EXT is the usual extension of the lossless format, e.g. .lossy.flac.

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' At preset --standard, almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Will it low-pass filter my audio?
*'''Short answer:''' No. The frequency limit (--limit) applies to the analysis only; lossyWAV cannot low-pass filter your audio.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact]]s
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522 Original lossyFLAC thread] - Introduction of the concept by David Robinson (Replay Gain developer) and initial development
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=96635 lossyWAV 1.3.1 Delphi to C++ translation thread]
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 lossyWAV 1.3.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 lossyWAV 1.3.0 release thread] - Release of version 1.3.0 on 06 August 2011
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=65499 lossyWAV 1.2.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=77042 lossyWAV 1.2.0 release thread] - Release of version 1.2.0 on 16 December 2009
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 lossyWAV 1.1.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=64617 lossyWAV 1.1.0 release thread] - Release of version 1.1.0 on 12 July 2008
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 lossyWAV Development thread] - Conversion of the original MATLAB script to Delphi and evolution of the method
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0 release thread] - Release of version 1.0.0b on 12 May 2008

[[Category:Software]]

LossyWAV

2013-05-06T16:15:41Z

Dynamic: /* Frequently asked questions */ Added FAQ: "Q: Will it low-pass filter my audio? A: No."

{{Software Infobox
| name = lossyWAV
| logo =
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.3.0
| preview_release = <none>
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 1.3.0 release thread]<br />[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 1.3.0 development thread]
}}
lossyWAV is a [[Wikipedia:Free software|free]], [[lossy]] pre-processor for [[PCM]] audio contained in the [[RIFF_WAVE|WAV]] file format. Based on a method proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson], it reduces the [[Wikipedia:Audio bit depth|bit depth]] of the input signal; when the output is then encoded with certain lossless codecs, the bitrate of the encoded file is significantly lower than when the unprocessed audio is compressed.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyWAV is based on the lossyFLAC idea proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson] at Hydrogenaudio: a method of carefully reducing the bit depth of (blocks of) samples so that the FLAC lossless encoder can make use of its wasted bits feature. The aim is to reduce audio bit depth transparently by setting some of the lower significant bits ([[Wikipedia:Least_significant_bit|lsb]]s) to zero. FLAC detects low-order bits that are zero in every sample of a channel within a frame and need not store them, which significantly increases coding efficiency.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded with the same codec (which may be all-important from a hardware compatibility perspective) at a lower bitrate than the lossless version.
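The "wasted bits" idea that lossyFLAC relies on can be illustrated with a short Python sketch. It is not FLAC's own code: the hypothetical helper `wasted_bits` simply ORs together all samples of one channel in a block and counts the trailing zero bits of the result, which are exactly the low-order bits that are zero in every sample.

```python
def wasted_bits(samples):
    """Count the low-order bits that are zero in every sample of a block.

    Illustrative sketch (hypothetical helper, not FLAC source code).
    An all-zero block is reported as 0 here for simplicity.
    """
    combined = 0
    for s in samples:
        combined |= s  # a bit is set here if it is set in any sample
    if combined == 0:
        return 0
    # combined & -combined isolates the lowest set bit (also for negative ints)
    return (combined & -combined).bit_length() - 1


# The lowest 3 bits are zero in all of these samples, so 3 bits could be dropped.
print(wasted_bits([8, -16, 24, 0]))  # 3
print(wasted_bits([8, -16, 25, 0]))  # 0 (25 is odd)
```

A single odd sample is enough to make the whole block "non-wasted", which is why lossyWAV has to round every sample in a codec-block to the same number of zeroed bits.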

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick Currie] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (many thanks to [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer!) with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] assembly language for speed.

The method subsequently proved to work with other lossless codecs as well, so the application was renamed lossyWAV.

Since then, Nick has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 Horst Albrecht] at Hydrogenaudio. Although the current lossyWAV implementation has built on David's original method, the method itself still very much belongs to its author.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a pure variable bit-depth pre-processor: the overall sample size remains the same after processing, but the number of significant bits used for the samples in a codec-block can change from block to block. Bits-to-remove are calculated on a block-by-block basis (codec-block length = 512 samples, 11.6 ms at 44.1 kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier transform]] (FFT) analyses of at least two lengths (32, 64 and 1024 [[Wikipedia:Sampling %28signal processing%29|samples]] at the default quality preset, --quality standard, i.e. -q 2.5). After some manipulation (spreading and averaging), the results of each FFT analysis for a specific codec-block are grouped and the minimum value is used to determine bits-to-remove for the whole codec-block. Bit removal adds noise to the output; however, the level of noise added by removing a given number of bits has been pre-calculated, so the number of bits removed depends on the noise floor of the codec-block in question. By default the added noise is adaptively shaped, but the user can select parameters to make it fixed-shaped (--shaping) or simply [[Wikipedia:white noise|white noise]]. Each sample in the codec-block is then rounded so that its lowest <bits-to-remove> bits are zero. In this way the wasted bits feature of [[FLAC]] et al. is exploited.
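The final rounding step can be sketched in Python as follows. This is a deliberately simplified illustration under stated assumptions: plain round-to-nearest with no noise shaping, clamping as the only clip handling, and the textbook uniform-quantisation noise power (step squared over 12) standing in for lossyWAV's pre-calculated noise levels. The function names `remove_bits` and `added_noise_db` are hypothetical.

```python
import math


def remove_bits(samples, bits_to_remove, bits_per_sample=16):
    """Round each sample to the nearest multiple of 2**bits_to_remove.

    Simplified sketch: no noise shaping; values that would overflow are
    clamped to the largest representable multiple.
    """
    if bits_to_remove <= 0:
        return list(samples)
    step = 1 << bits_to_remove
    lo = -(1 << (bits_per_sample - 1))                      # e.g. -32768, already a multiple of step
    hi = ((1 << (bits_per_sample - 1)) - 1) // step * step  # e.g. 32764 for step 4
    out = []
    for s in samples:
        q = (s + step // 2) // step * step  # integer round to nearest (ties upwards)
        out.append(min(max(q, lo), hi))
    return out


def added_noise_db(bits_to_remove):
    """Rounding-noise power in dB relative to 1 LSB^2, assuming uniform error (step^2 / 12)."""
    step = 1 << bits_to_remove
    return 10 * math.log10(step * step / 12)


print(remove_bits([5, 6, -7, 32767], 2))                # [4, 8, -8, 32764]
print(round(added_noise_db(3) - added_noise_db(2), 2))  # 6.02
```

Each extra bit removed raises the added noise by about 6 dB under this model, which is why the number of bits that can be removed tracks the noise floor of the block.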

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set (16 bit / 44.1kHz)
!Codec
!lossless
!--insane
!--extreme
!--high
!--standard
!--economic
!--portable
!--extraportable
|-
!10 Album Test Set
| FLAC
| 854 kbit/s
| 627 kbit/s
| 548 kbit/s
| 477 kbit/s
| 442 kbit/s
| 407 kbit/s
| 353 kbit/s
| 311 kbit/s
|-
!Nick.C's Full Collection
| FLAC
| 882 kbit/s
| -
| -
| -
| -
| -
| -
| 307 kbit/s
|}
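Expressed relative to lossless, the figures in the table work out as follows; for example, --standard comes to roughly 52% of the lossless FLAC bitrate on the 10 album test set. This small Python snippet only re-uses the numbers from the table.

```python
# FLAC bitrates in kbit/s for the 10 Album Test Set, copied from the table above.
LOSSLESS_KBPS = 854
PRESET_KBPS = {
    "insane": 627, "extreme": 548, "high": 477, "standard": 442,
    "economic": 407, "portable": 353, "extraportable": 311,
}


def percent_of_lossless(kbps, lossless_kbps=LOSSLESS_KBPS):
    """Processed bitrate as a percentage of the lossless bitrate, to one decimal place."""
    return round(100 * kbps / lossless_kbps, 1)


for name, kbps in PRESET_KBPS.items():
    print(f"--{name:<14} {kbps} kbit/s = {percent_of_lossless(kbps)}% of lossless")

# Nick.C's full collection, --extraportable: 307 of 882 kbit/s
print(percent_of_lossless(307, 882))  # 34.8
```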

==File identification==
lossyWAV-processed WAV files are named with a double filename extension, .lossy.wav, to make them instantly identifiable. The convention carries over after encoding: for example, ".lossy.flac" indicates an audio file that was processed with lossyWAV and subsequently encoded with FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter makes lossyWAV also create, during processing, a correction file named with the .lwcdf.wav double filename extension. When this is "added" to the corresponding .lossy.wav file using the --merge parameter, the original file is reconstituted.
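Assuming the correction file simply stores the per-sample difference between the original and the processed audio (the text above only says the two are "added"; the actual .lwcdf.wav layout is not described here), the relationship can be sketched in Python. The helper names `make_correction` and `merge` are hypothetical.

```python
def make_correction(original, processed):
    """Per-sample difference original - processed (assumed content of a correction file)."""
    if len(original) != len(processed):
        raise ValueError("inputs must have the same number of samples")
    return [o - p for o, p in zip(original, processed)]


def merge(processed, correction):
    """'Add' the correction back to the processed samples, as --merge is described to do."""
    if len(processed) != len(correction):
        raise ValueError("inputs must have the same number of samples")
    return [p + c for p, c in zip(processed, correction)]


original = [1003, -250, 7, 32767]
processed = [1000, -248, 8, 32764]  # every value has its 2 lowest bits zeroed
correction = make_correction(original, processed)
print(correction)                               # [3, -2, -1, 3]
print(merge(processed, correction) == original)  # True
```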

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Codec compatibility|codec compatibility]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. This takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the lossyWAV version, the date and time of processing and the user settings are recorded. Additionally, if a lossyWAV 'fact' chunk is already present in an input file, processing is halted (exit code 16) so that an already processed file is not processed again.

The --check parameter can be used to determine whether a file has previously been processed without actually processing it: the exit code is 16 if the file has already been processed and 0 if not.

==Quality presets==
*--quality insane: (-q I or -q 10) Highest quality preset, generally considered to be excessive;
*--quality extreme: (-q E or -q 7.5) Higher quality preset, disc space-saving alternative to lossless archiving for large audio collections, considered to be suitable for transcoding to other lossy codecs;
*--quality high: (-q H or -q 5.0) High quality preset, midway between extreme and standard;
*--quality standard: (-q S or -q 2.5) Default preset, generally accepted to be transparent;
*--quality economic: (-q C or -q 0.0) Intermediate preset midway between standard and portable;
*--quality portable: (-q P or -q -2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
*--quality extraportable: (-q X or -q -5.0) Lowest quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
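The preset names, letters and numeric values listed above (and in the help text below) can be captured in a small lookup. The Python function `quality_value` is a hypothetical illustration, not lossyWAV's argument parser; in particular, case-insensitive matching is an assumption.

```python
# Preset letters and names with their numeric -q values, as documented
# ("I=10; E=7.5; H=5; S=2.5; C=0; P=-2.5; X=-5").
PRESET_VALUES = {"I": 10.0, "E": 7.5, "H": 5.0, "S": 2.5, "C": 0.0, "P": -2.5, "X": -5.0}
PRESET_LETTERS = {
    "insane": "I", "extreme": "E", "high": "H", "standard": "S",
    "economic": "C", "portable": "P", "extraportable": "X",
}


def quality_value(arg: str) -> float:
    """Map a --quality argument (name, letter or number) to its numeric value.

    Case-insensitive matching is an assumption of this sketch.
    """
    key = arg.strip()
    key = PRESET_LETTERS.get(key.lower(), key)  # name -> letter
    if key.upper() in PRESET_VALUES:
        return PRESET_VALUES[key.upper()]
    value = float(key)  # raises ValueError if not numeric
    if not -5.0 <= value <= 10.0:
        raise ValueError(f"quality {value} outside -5.0..10.0")
    return value


print(quality_value("standard"), quality_value("X"), quality_value("3.75"))  # 2.5 -5.0 3.75
```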

All tuning for version 1.0.0 was performed on quality preset --standard, with higher presets being more conservative. For versions 1.1.0, 1.2.0 and 1.3.0, tuning effort has focused on the lowest quality preset, aiming for an effective compromise between resulting bitrate and perceived quality. Quality preset --standard is generally accepted to be transparent, and testing so far supports this. If you find a track for which --standard fails to achieve transparency, please post a sample (no more than 30 seconds long) in the development thread.

The upper frequency limit used in the calculation of minimum signal power varies with the quality preset, in the range 15.159 kHz to 16.682 kHz.

==Supported input formats==
*[[WAV]]: integer [[Pulse Code Modulation|PCM]], 9-bit to 32-bit; 1 to 8 channels; sample rate ≥ 32 kHz. Very high sample rates (>48 kHz) have not been extensively tested. Tuning has focused on 16-bit, 44.1 kHz audio (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).
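The supported-input limits can be expressed as a simple pre-flight check. This Python sketch is hypothetical (the `WavFormat` type and `check_input` function are not part of lossyWAV); it rejects formats outside the documented ranges and flags rates above 48 kHz as not extensively tested.

```python
from dataclasses import dataclass


@dataclass
class WavFormat:
    """Minimal description of a WAV file's audio format (hypothetical type)."""
    bits_per_sample: int
    channels: int
    sample_rate: int
    integer_pcm: bool = True


def check_input(fmt: WavFormat):
    """Return (errors, notes) according to the documented supported-input limits."""
    errors, notes = [], []
    if not fmt.integer_pcm:
        errors.append("only integer PCM is supported")
    if not 9 <= fmt.bits_per_sample <= 32:
        errors.append("bit depth must be 9 to 32 bits")
    if not 1 <= fmt.channels <= 8:
        errors.append("1 to 8 channels are supported")
    if fmt.sample_rate < 32000:
        errors.append("sample rate must be at least 32 kHz")
    elif fmt.sample_rate > 48000:
        notes.append("sample rates above 48 kHz have not been extensively tested")
    return errors, notes


print(check_input(WavFormat(16, 2, 44100)))  # ([], [])
print(check_input(WavFormat(8, 2, 22050)))
```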

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|}


There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] (so-called "Bit Shifting") to suggest that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested because MLP encoders are prohibitively expensive. At least one [http://www.hydrogenaudio.org/forums/index.php?showtopic=98609&hl= commercial DVD-A] uses constant bit-depth reduction, with a lower bit depth on the rear channels.

A [[Wikipedia:Comparison of portable media players#Audio Formats|comparison of portable media players]] on Wikipedia shows which of the listed players support FLAC and WMA Lossless.
Any player supported by [http://www.rockbox.org Rockbox] can play FLAC and WavPack files once Rockbox is installed.
===Important note===
'''NB: when encoding with a lossless codec, please ensure that the block size of the lossless codec matches that of lossyWAV (default = 512 samples). A codec frame that spans several lossyWAV blocks can only discard the low-order bits that are zero throughout the whole frame, so if the block sizes do not match, the losslessly encoded file will (almost certainly) be larger than it would otherwise have been. The block size is set by adding the "Encoder parameters" listed in the table above to the command line of the lossless codec in question.'''
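The effect of a block-size mismatch can be shown with a toy calculation. In this Python sketch the per-block bits-to-remove values are invented for illustration, and the functions are hypothetical; it assumes codec frames start on lossyWAV block boundaries and that the frame size is a multiple of 512. A codec frame can only drop the bits that are zero in every block it covers, i.e. the minimum over those blocks.

```python
def frame_wasted_bits(block_bits, frame_size, block_size=512):
    """Bits a lossless codec can drop per frame when frames span several lossyWAV blocks.

    block_bits[i] is the number of zeroed low-order bits in lossyWAV block i.
    Sketch: assumes frames start on block boundaries and frame_size is a
    multiple of block_size.
    """
    if frame_size % block_size:
        raise ValueError("frame_size must be a multiple of block_size in this sketch")
    per_frame = frame_size // block_size
    return [min(block_bits[i:i + per_frame])
            for i in range(0, len(block_bits), per_frame)]


def bits_saved(block_bits, frame_size, block_size=512):
    """Total bits saved per channel (zeroed bits x samples) for a given frame size."""
    return sum(b * frame_size for b in frame_wasted_bits(block_bits, frame_size, block_size))


# Hypothetical bits-to-remove for eight consecutive 512-sample blocks (4096 samples).
bits = [6, 5, 7, 2, 6, 6, 5, 7]
for frame in (512, 1024, 4096):
    print(frame, frame_wasted_bits(bits, frame), bits_saved(bits, frame))
# 512 [6, 5, 7, 2, 6, 6, 5, 7] 22528
# 1024 [5, 2, 6, 5] 18432
# 4096 [2] 8192
```

In this example a single "difficult" block limits a 4096-sample frame to 2 wasted bits, so matching the 512-sample block size saves almost three times as many bits.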
===Bonus feature===
Another, possibly not obvious, feature of lossyWAV is that the processed output can be "transcoded" from one lossless codec to another with absolutely no further loss of quality. This is simply because lossyWAV output is ordinary PCM audio designed to be losslessly encoded, and lossless codecs reproduce it bit for bit.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.3.0, Copyright (C) 2007-2011 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Process Description:

lossyWAV is a near lossless audio processor which dynamically reduces the
bitdepth of the signal on a block-by-block basis. Bitdepth reduction adds noise
to the processed output. The amount of permissible added noise is based on
analysis of the signal levels in the default frequency range 20Hz to 16kHz.

If signals above the upper limiting frequency are at an even lower level, they
can be swamped by the added noise. This is usually inaudible, but the behaviour
can be changed by specifying a different --limit (in the range 10kHz to 20kHz).

For many audio signals there is little content at very high frequencies and
forcing lossyWAV to keep the added noise level lower than the content at these
frequencies can increase the bitrate dramatically for no perceptible benefit.

The noise added by the process is shaped using an adaptive method provided by
Sebastian Gesemann. This method, as implemented in lossyWAV, aims to use the
signal itself as the basis of the filter used for noise shaping. Adaptive noise
shaping is enabled by default.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <t>    where t is one of the following (default = standard):
    I, insane        highest quality output, suitable for transcoding;
    E, extreme       higher quality output, suitable for transcoding;
    H, high          high quality output, suitable for transcoding;
    S, standard      default quality output, considered to be transparent;
    C, economic      intermediate quality output, likely to be transparent;
    P, portable      good quality output for DAP use, may not be transparent;
    X, extraportable lowest quality output, not fully transparent.

Standard Options:

-C, --correction     write correction file for processed WAV file; default=off.
-f, --force          forcibly over-write output file if it exists; default=off.
-h, --help           display help.
-L, --longhelp       display extended help.
-M, --merge          merge existing lossy.wav and lwcdf.wav files.
-o, --outdir <t>     destination directory for the output file(s).
-v, --version        display the lossyWAV version number.
-w, --writetolog     create (or add to) lossyWAV.log in the output directory.

Advanced Options:

-                    take WAV input from STDIN.
-c, --check          check if WAV file has already been processed; default=off.
                     errorlevel=16 if already processed, 0 if not.
-q, --quality <n>    quality preset (-5.0<=n<=10.0); (-5=lowest, 10=highest;
                     default=2.5; I=10; E=7.5; H=5; S=2.5; C=0; P=-2.5; X=-5).
--, --stdout         write WAV output to STDOUT.
--stdinname <t>      pseudo filename to use when input from STDIN.

Advanced Quality Options:

-A, --adaptive <n/t> modify settings for Sebastian Gesemann's adaptive noise
                     shaping method. takes a parameter to set the order of the
                     FIR filter, (32<=n<=96; default=64; multiple of 8 only);
                     "OFF" to disable adaptive shaping; "NOWARP" to disable
                     default frequency warping;
-a, --analyses <n>   set number of FFT analysis lengths, (2<=n<=6; default=3,
                     i.e. 32, 64 & 1024 samples. n=2, remove 32 sample FFT;
                     n>3 add 512; n>4, add 256; n>6, add 128) nb. FFT lengths
                     stated are for 44.1/48kHz audio, higher sample rates will
                     automatically increase all FFT lengths as required.
-l, --limit <n>      set upper frequency limit to be used in analyses to n Hz;
                     (10000<=n<=20000; default=16000).
--linkchannels       revert to original single bits-to-remove value for all
                     channels rather than channel dependent bits-to-remove.
--maxclips <n>       set max. number of acceptable clips per channel per block;
                     (0<=n<=16; default=3,3,3,3,3,2,2,2,2,2,1,1,1,0,0,0).
-m, --midside        analyse 2 channel audio for mid/side content.
--nodccorrect        disable DC correction of audio data prior to FFT analysis,
                     default=on; (DC offset calculated per FFT data set).
--scale <n>          factor to scale audio by; (0.0625<n<=8.0; default=1).
-s, --shaping [n]    enable fixed noise shaping, takes optional parameter [n]
                     to allow user defined shaping proportion (0.0<=n<=1.0),
                     otherwise default to quality setting dependent value.
                     Disables adaptive noise shaping.
--static <n>         set minimum-bits-to-keep-static to n bits (default=6;
                     7<=n<=28, limited to bits-per-sample - 4).
-U, --underlap <n>   enable underlap mode to increase number of FFT analyses
                     performed at each FFT length, (n = 2, 4 or 8, default=2).

Output Options:

--bitdist            show distribution of bits to remove.
--blockdist          show distribution of lowest / highest significant bit of
                     input codec-blocks and bit-removed codec-blocks.
-d, --detail         enable per block per channel bits-to-remove data display.
-F, --freqdist       enable frequency analysis display of input data.
-H, --histogram      show sample value histogram (input, lossy and correction).
--longdist           show long frequency distribution data (input/lossy/lwcdf).
--perchannel         show selected distribution data per channel.
-p, --postanalyse    enable frequency analysis display of output and
                     correction data in addition to input data.
--sampledist         show distribution of lowest / highest significant bit of
                     input samples and bit-removed samples.
--spread [full]      show detailed [more detailed] results from the spreading/
                     averaging algorithm.
-W, --width <n>      select width of output options (79<=n<=255).

System Options:

-B, --below          set process priority to below normal.
--low                set process priority to low.
-N, --nowarnings     suppress lossyWAV warnings.
-Q, --quiet          significantly reduce screen output.
-S, --silent         no screen output.

Special thanks go to:

David Robinson       for the publication of his lossyFLAC method, guidance, and
                     the motivation to implement his method as lossyWAV.

Horst Albrecht       for ABX testing, valuable support in tuning the internal
                     presets, constructive criticism and all the feedback.

Sebastian Gesemann   for the adaptive noise shaping method and the amount of
                     help received in implementing it and also for the basis of
                     the fixed noise shaping method.

Matteo Frigo and     for libfftw3-3.dll contained in the FFTW distribution
Steven G Johnson     (v3.2.1 or v3.2.2).

Mark G Beckett       for the Delphi unit that provides an interface to the
(Univ. of Edinburgh) relevant fftw routines in libfftw3-3.dll.

Don Cross            for the Complex-FFT algorithm originally used.</pre>

===Example drag 'n' drop batch file===
Simply drag FLAC files onto this batch file: it will decode each one, process it with lossyWAV, re-encode it as FLAC and copy ALL of the tags from the input FLAC file, placing the output lossyFLAC file in the same directory as the input FLAC file. Requires flac.exe, lossyWAV.exe and [http://www.synthetic-soul.co.uk/tag/ tag.exe] to be in a directory on the path.
<pre>@echo off
:repeat
if "%~1"=="" goto end
if exist "%~1" flac -d --silent --stdout "%~1"|lossywav - --quality standard --stdout --stdinname "%~1"|flac -b 512 --silent -o "%~dpn1.lossy.flac" - && tag --fromfile "%~1" "%~dpn1.lossy.flac"
shift
goto repeat
:end</pre>

===lossyWAV and FFTW===
Since version 1.2.0, lossyWAV has been able to use [[Wikipedia:FFTW|FFTW]], although it does not depend on it. To take advantage of the faster processing that FFTW's superior FFT implementations provide, place libfftw3-3.dll in a directory that is on the system path.

===lossyWAV and WINE===
The cause of lossyWAV's incompatibility with WINE was found and removed during the development of 1.2.0, and the fix was retrospectively applied to 1.1.0b in a maintenance release (1.1.0c).

===lossyWAV and [[foobar2000]]===
Example [[foobar2000]] converter settings:

lossyFLAC settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.flac
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\flac - -b 512 -5 -f -o%d --ignore-chunk-sizes
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyTAK settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.tak
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWV settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.wv
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWMALSL settings (requires WMAEncode, see below):<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.wma
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\wmaencode - %d --codec lsl --ignorelength
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

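Enclose each element of a path that contains spaces in double quotation marks ("), e.g. C:\"Program Files"\directory_where_executable_is\executable_name. This is a Windows limitation. The Parameters values above are wrapped after the pipe (|) for display only; enter each one as a single line.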

lossyWMALSL conversion uses WMAEncode.exe by lvqcl found [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=90519&view=findpost&p=767754 here].

===lossyWAV and EAC===
:''For example settings, see [[EAC and LossyWAV]].''

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor, not a codec. Its output remains compliant with the RIFF WAVE format, so any program that can play a WAV file can play a lossyWAV-processed file without any decoding step.

*'''Question:''' Why create a processor which means that I cannot be sure that a lossless file is truly lossless?
*'''Answer:''' Unless one creates the lossless file personally, one can '''never''' be completely sure that the file is indeed lossless. For example, a lossless file you receive could have been transcoded from [[MP3]] without your knowledge. To distinguish a lossyWAV file from lossless files, it is recommended to use the extension .lossy.EXT, where EXT is the usual extension of the lossless format, e.g. .lossy.flac.

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' At preset --standard, almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Will it low-pass filter my audio?
*'''Short answer:''' No. The frequency limit (--limit) applies to the analysis only; lossyWAV cannot low-pass filter your audio.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact]]s
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522 Original lossyFLAC thread] - Introduction of the concept by David Robinson (Replay Gain developer) and initial development
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=96635 lossyWAV 1.3.1 Delphi to C++ translation thread]
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 lossyWAV 1.3.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 lossyWAV 1.3.0 release thread] - Release of version 1.3.0 on 06 August 2011
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=65499 lossyWAV 1.2.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=77042 lossyWAV 1.2.0 release thread] - Release of version 1.2.0 on 16 December 2009
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 lossyWAV 1.1.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=64617 lossyWAV 1.1.0 release thread] - Release of version 1.1.0 on 12 July 2008
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 lossyWAV Development thread] - Conversion of the original MATLAB script to Delphi and evolution of the method
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0 release thread] - Release of version 1.0.0b on 12 May 2008

[[Category:Software]]

Opus

2013-03-19T16:08:32Z

Dynamic: Clarification that codebook comparison is against Vorbis.

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = exp_analysis7
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) and designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech; it is also very competitive as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps]. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} it has a high-quality reference implementation, provided under the 3-clause BSD license,{{ref|homepage|a}} which compiles and runs on the vast majority of general-purpose and embedded (fixed-point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition between high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two in the short overlapping bitrate range, in which SILK encodes the 0-8 kHz spectrum and the CELT layer encodes only the frequencies above 8 kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet it performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage and playback format. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, which makes it preferable for short clips of audio, such as those often used by game developers, a field where patent-free Vorbis is commonly used.{{ref|RFC|c}}

Considerably more detail on the history and potential applications of Opus is given in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''.

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that, over a typical music collection, the average bitrate will be very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, Helix MP3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate falls where it may. Future versions can be expected to offer better quality at the same setting. Independent implementations may adopt a different approach.
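
Because libopus calibrates its VBR target, the target bitrate can serve as a rough guide to average file size across a collection, although individual tracks will deviate from it. The Python sketch below is a back-of-envelope calculation under that assumption only. The function name is hypothetical, Ogg container overhead is ignored, and only the stereo range quoted above is checked.

```python
def estimated_size_bytes(target_kbps: float, duration_s: float) -> int:
    """Rough Opus audio payload size, assuming the average bitrate equals the target.

    Ignores Ogg container overhead and per-track deviation from the target.
    """
    # Stereo target range quoted in the article (multichannel is not handled here)
    if not 6 <= target_kbps <= 510:
        raise ValueError("stereo target must be between 6 and 510 kbps")
    total_bits = target_kbps * 1000 * duration_s
    return round(total_bits / 8)


# A 3-minute track at 64 kbps
print(estimated_size_bytes(64, 180))  # 1440000
```

A real file will differ from this figure in both directions, since the calibration applies to the average over many tracks rather than to any single one.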

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page), which can be particularly useful for mixed-content audio or varying network conditions, making the unified Opus codec superior to a suite of different codecs that might otherwise cover the same range of bitrate and quality settings and would require out-of-band signalling to instigate codec switching. The switching includes the choice of mono, stereo and other channel mappings, the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both, and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz) as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.
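
The five audio bandwidths listed above correspond to the bandwidth names used in RFC 6716, each paired with an effective internal sample rate. The Python sketch below tabulates them. The helper that picks the narrowest bandwidth covering some content is a hypothetical illustration of the kind of choice the encoder makes; it is not libopus's actual decision logic.

```python
# Audio bandwidth in kHz -> (RFC 6716 name, effective sample rate in kHz)
OPUS_BANDWIDTHS = {
    4: ("narrowband", 8),
    6: ("medium-band", 12),
    8: ("wideband", 16),
    12: ("super-wideband", 24),
    20: ("fullband", 48),
}

# Each effective sample rate is at least twice the audio bandwidth (Nyquist)
assert all(rate >= 2 * bw for bw, (_, rate) in OPUS_BANDWIDTHS.items())


def narrowest_bandwidth_for(content_khz: float) -> str:
    """Hypothetical helper: narrowest Opus bandwidth that still covers content_khz."""
    for bw in sorted(OPUS_BANDWIDTHS):
        if content_khz <= bw:
            return OPUS_BANDWIDTHS[bw][0]
    return "fullband"  # content above 20 kHz is simply cut off at 20 kHz


print(narrowest_bandwidth_for(3.4))   # narrowband (telephone-style speech)
print(narrowest_bandwidth_for(15.0))  # fullband
```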

Of importance mainly to interactive uses, but potentially also useful in time-delayed audio streaming, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can be sent again at low bitrate, as redundant data in the following packet, to preserve intelligibility.
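
With in-band FEC, a lost frame can sometimes be rebuilt from the redundant data carried by the packet that follows it, at the cost of waiting for that packet; otherwise the decoder falls back to PLC. The Python sketch below models only that per-frame decision. The <code>recovery_plan</code> function and its input format are hypothetical. In the libopus C API, the two recovery paths correspond to calling <code>opus_decode()</code> on the next packet with its <code>decode_fec</code> argument set to 1, or passing a NULL packet for plain concealment.

```python
from typing import List, Optional


def recovery_plan(arrivals: List[Optional[bool]]) -> List[str]:
    """Decide how a receiver could reconstruct each frame.

    arrivals[i] is None if packet i was lost; otherwise True if packet i
    also carries a low-bitrate FEC copy of frame i-1, False if it does not.
    """
    plan = []
    for i, packet in enumerate(arrivals):
        if packet is not None:
            plan.append("decode")  # normal decoding
        elif i + 1 < len(arrivals) and arrivals[i + 1]:
            plan.append("fec")     # rebuild frame i from the next packet's redundancy
        else:
            plan.append("plc")     # nothing to recover from: conceal the gap
    return plan


print(recovery_plan([True, None, True, None, None, False]))
# ['decode', 'fec', 'decode', 'plc', 'plc', 'decode']
```

A burst of two consecutive losses cannot be fully repaired this way, because the redundancy covers only the immediately preceding frame.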

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development and ensures as a primary goal that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this, thereby eliminating many bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.
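
The energy-preservation idea can be illustrated with a toy "gain-shape" scheme: each band's energy is coded separately from its normalised shape, so even a very coarse shape is rescaled back to the band's original energy. The Python sketch below is a simplified illustration under that assumption. The real CELT layer quantises the shape with a pyramid vector quantiser and also quantises the energy itself; the rounding quantiser and the flat-shape fallback here are illustrative stand-ins.

```python
import math
from typing import List


def code_band(coeffs: List[float], levels: int = 1) -> List[float]:
    """Toy gain-shape coding of one spectral band.

    The band energy is kept exactly; only the unit-norm 'shape' is coarsely
    quantized, so the decoded band keeps its energy even when the shape is crude.
    """
    energy = math.sqrt(sum(c * c for c in coeffs))
    if energy == 0.0:
        return [0.0] * len(coeffs)
    shape = [c / energy for c in coeffs]               # unit-norm shape
    q = [round(s * levels) / levels for s in shape]    # crude shape quantizer
    norm = math.sqrt(sum(v * v for v in q))
    if norm == 0.0:
        # Shape quantized to nothing: use a flat shape so the band is not left empty
        q, norm = [1.0] * len(coeffs), math.sqrt(len(coeffs))
    return [energy * v / norm for v in q]              # rescale to the original energy


band = [0.9, 0.1, -0.05, 0.02]
decoded = code_band(band)
print(sum(c * c for c in band), sum(c * c for c in decoded))  # equal up to rounding
```

A codec that simply dropped the small coefficients without rescaling would lose energy in the band, which is one source of the 'holes' heard as birdies.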

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
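
One concrete form of this gain is the output gain field in the OpusHead identification header of an Ogg Opus file, specified in RFC 7845 as a signed 16-bit Q7.8 value in dB (8 fractional bits). The Python sketch below converts it to a linear scale factor. The function name and the example header values are hypothetical; only the field layout follows RFC 7845.

```python
import struct


def output_gain_factor(opus_head: bytes) -> float:
    """Linear playback scale factor from an OpusHead packet's Output Gain field.

    Layout (RFC 7845): 'OpusHead' magic (8 bytes), version (1), channel count (1),
    pre-skip (2, LE), input sample rate (4, LE), output gain (2, signed LE, Q7.8 dB).
    """
    if opus_head[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    (gain_q7_8,) = struct.unpack_from("<h", opus_head, 16)
    gain_db = gain_q7_8 / 256.0        # Q7.8: 8 fractional bits
    return 10.0 ** (gain_db / 20.0)    # dB -> linear amplitude factor


# Hypothetical header: stereo, 312-sample pre-skip, 44.1 kHz input, -6 dB output gain
head = (b"OpusHead" + bytes([1, 2]) + struct.pack("<H", 312)
        + struct.pack("<I", 44100) + struct.pack("<h", -6 * 256) + bytes([0]))
print(round(output_gain_factor(head), 4))  # 0.5012
```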

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps through medium-band, wideband and super-wideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As bitrate increases, the hybrid mode is adopted, extending bandwidth first to 12kHz (comparable with compact cassette) and then to the full 20kHz, before CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens while the bandwidth is still 12kHz, before it is extended to 20kHz.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic detection of speech/music and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate tradeoff. Thus changes are likely, and this table is likely to require small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate) but mentions stereo compatibility for 40kbps+. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}
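
For scripting or quick reference, the speech table can be encoded as a simple lookup. The Python sketch below is hypothetical: it reproduces the table's indicative bandwidth and layer, and for targets that fall between rows it assumes the nearest lower row applies, which the table itself does not state.

```python
from typing import List, Tuple

# (lowest target in kbps, audio bandwidth in kHz, typical layer), from the speech table
SPEECH_ROWS: List[Tuple[float, int, str]] = [
    (6, 4, "SILK"),
    (12, 6, "SILK"),
    (16, 8, "SILK"),
    (24, 12, "hybrid"),
    (32, 20, "hybrid/CELT"),
    (40, 20, "CELT"),
]


def indicative_speech_mode(kbps: float) -> Tuple[int, str]:
    """Indicative bandwidth and layer for mono speech at a given target bitrate.

    Between table rows this simply uses the nearest lower row (an assumption).
    """
    if kbps < 6:
        raise ValueError("targets below 6 kbps are not supported by Opus")
    bandwidth, layer = SPEECH_ROWS[0][1], SPEECH_ROWS[0][2]
    for min_kbps, bw, lay in SPEECH_ROWS:
        if kbps >= min_kbps:
            bandwidth, layer = bw, lay
    return bandwidth, layer


print(indicative_speech_mode(16))  # (8, 'SILK')
print(indicative_speech_mode(64))  # (20, 'CELT')
```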

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate). Opus will automatically use mono at very low bitrates, though, depending on content, some stereo information may still be encoded even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly Poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (cf. mono cassette)
|Good for podcasts, audiobooks, CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (cf. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 for default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, total bandwidth used will be subject to packet overhead. The more packet headers that are transmitted every second, the greater will be the overhead that is required. For this reason, Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames at the expense of greater latency, which may still be acceptable for speech, and also supports 10.0ms SILK frames to reduce latency somewhat at the expense of packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible; these achieve lower latency by transmitting more packets per second, which directly increases the packet overhead. In addition, as shown below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.
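
The effect of frame size on header overhead is simple arithmetic: packets per second equals 1000 divided by the frame duration in ms, and each packet carries a fixed header. The Python sketch below assumes one Opus frame per packet and a 40-byte IPv4 + UDP + RTP header (20 + 8 + 12 bytes). Link-layer framing, IPv6 (40-byte IP header) and RTP extensions are ignored, and the function names are hypothetical.

```python
# Assumed per-packet header cost: IPv4 (20) + UDP (8) + RTP (12) bytes.
HEADER_BYTES = 20 + 8 + 12


def overhead_kbps(frame_ms: float, header_bytes: int = HEADER_BYTES) -> float:
    """Header bitrate when one frame of frame_ms is sent per packet."""
    packets_per_s = 1000.0 / frame_ms
    return packets_per_s * header_bytes * 8 / 1000.0


def on_wire_kbps(payload_kbps: float, frame_ms: float, header_bytes: int = HEADER_BYTES) -> float:
    """Opus payload bitrate plus header overhead."""
    return payload_kbps + overhead_kbps(frame_ms, header_bytes)


for frame_ms in (60.0, 20.0, 10.0, 5.0, 2.5):
    print(f"{frame_ms:4.1f} ms frames: {overhead_kbps(frame_ms):6.1f} kbps of headers")

print(on_wire_kbps(16, 20))  # 32.0: headers double a 16 kbps speech stream
```

Under these assumptions, 20 ms packets add 16 kbps of headers and 2.5 ms packets add 128 kbps, which is why 60 ms frames suit low-bitrate SILK speech and why the shortest CELT frames are costly on the network.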

None of the bitrates mentioned in this article account for the packet overhead.

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2 or 6 of which can be combined into an Opus frame, the CELT layer is able to modify the encoding block lengths available to enable its use with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use smaller transform block sizes, which reduces the frequency resolution of the MDCT compared to the default transform window and thus reduces encoding efficiency for tonal signals. To obtain the same frequency precision for a sound divided into shorter transform windows, improved amplitude precision is necessary, resulting in increased bitrate to obtain the same perceptual quality (or conversely lower quality at the same bitrate).
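
The loss of frequency resolution can be estimated directly. An MDCT that produces N coefficients per hop of N samples spreads them evenly over 0 to fs/2, so adjacent coefficients are fs/(2N) apart. The Python sketch below applies this idealised formula, assuming a 48 kHz sampling rate and a single long transform per frame. Transient frames that use 2.5 ms short blocks have coarser resolution than these figures.

```python
SAMPLE_RATE_HZ = 48_000


def mdct_bin_spacing_hz(frame_ms: float, fs: int = SAMPLE_RATE_HZ) -> float:
    """Idealized MDCT frequency resolution when one transform covers one frame.

    An MDCT producing N coefficients per hop of N samples spreads them evenly
    over 0..fs/2, so adjacent coefficients are fs / (2N) apart.
    """
    n = round(fs * frame_ms / 1000)  # new samples per frame = coefficients per transform
    return (fs / 2) / n


for frame_ms in (20.0, 10.0, 5.0, 2.5):
    print(f"{frame_ms:4.1f} ms frame: {mdct_bin_spacing_hz(frame_ms):5.1f} Hz per coefficient")
```

Each halving of the frame length doubles the spacing, from 25 Hz at 20 ms to 200 Hz at 2.5 ms, so closely spaced tonal components become harder to represent compactly.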

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms delay. The CELT layer requires 2.5ms for MDCT window overlap.

Xiph.org used matched PEAQ scores (an approximate perceptual quality assessment made in software) for the CELT 0.10 codec, which was used as the basis of the CELT layer in the Opus reference release. These indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64kbps@22.5ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
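
The delay and percentage columns of this table follow from the frame size and the matched bitrate: algorithmic delay is the frame size plus 2.5 ms, and the increase is measured relative to 64 kbps at 20 ms. The Python sketch below recomputes those columns as a consistency check. The matched bitrates are copied from the table; the function name is hypothetical.

```python
# Frame size (ms) -> bitrate (kbps) giving PEAQ parity with 64 kbps at 20 ms (from the table)
MATCHED_KBPS = {20.0: 64.0, 10.0: 70.4, 5.0: 84.8, 2.5: 112.0}
EXTRA_DELAY_MS = 2.5  # added to the frame size to give the algorithmic delay


def table_rows():
    """Recompute the delay and bitrate-increase columns from frame size and matched bitrate."""
    base_kbps = MATCHED_KBPS[20.0]
    rows = []
    for frame_ms, kbps in sorted(MATCHED_KBPS.items(), reverse=True):
        delay_ms = frame_ms + EXTRA_DELAY_MS
        increase_pct = round((kbps / base_kbps - 1.0) * 100.0, 1)
        rows.append((frame_ms, delay_ms, kbps, increase_pct))
    return rows


for row in table_rows():
    print("%4.1f ms frame | %4.1f ms delay | %5.1f kbps | +%.1f %%" % row)
```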

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from the vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The commandline tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server]. No other implementations of Opus are currently known. The libopus commandline tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' Opus stream & metadata analyzer.

The '''latest stable release''' is recommended for general use and as of early 2013 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 (recommended latest stable release) ====
Released 11 Sep 2012 when RFC 6716 was standardized, but largely complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

CELT layer closely related to CELT 0.10 implements Constrained VBR mode by default (bitrate boost used mainly for transients), plus true CBR.

==== libopus v1.1-alpha ====
Source code released 21 Dec 2012 for testing & user feedback ([https://ftp.mozilla.org/pub/mozilla.org/opus/win32/opus-tools-0.1.6-opus-1.1-alpha-win32.zip win32 binaries]), but not yet considered stable and well tested enough for general release.

CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduced to provide '''unconstrained VBR''' include a rate boost not just for transients but now for highly tonal signals too and rate reduction when stereo image is narrow. There's also a rewrite of its '''transient detection''' code and '''time-frequency analysis''' code, and rewritten '''dynamic allocation''' code (HF/LF tilt and Band Boost) to allow more aggressive changes from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC-rejection''' below 3 Hz also aids quality if inaudible DC offset is present with no effect on deep bass notes.
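
A common generic way to remove a DC offset is a first-order high-pass "DC blocker" with a very low corner frequency. The Python sketch below uses a 3 Hz corner to match the figure above. It illustrates the technique only and is not the filter used inside libopus; the pole placement r = 1 - 2*pi*fc/fs is a standard small-angle approximation, and the test signals are made up.

```python
import math
from typing import List


def dc_block(x: List[float], fs: float = 48_000.0, cutoff_hz: float = 3.0) -> List[float]:
    """Generic first-order DC-blocking high-pass: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    r = 1.0 - 2.0 * math.pi * cutoff_hz / fs  # pole just inside the unit circle
    y: List[float] = []
    prev_x = prev_y = 0.0
    for sample in x:
        out = sample - prev_x + r * prev_y
        y.append(out)
        prev_x, prev_y = sample, out
    return y


def rms(v: List[float]) -> float:
    return math.sqrt(sum(s * s for s in v) / len(v))


fs = 48_000
offset_only = [0.25] * (5 * fs)  # 5 s of pure DC offset
bass = [0.25 + math.sin(2 * math.pi * 40 * n / fs) for n in range(2 * fs)]  # 40 Hz tone on an offset

print(abs(dc_block(offset_only)[-1]))   # essentially zero: the offset has decayed away
filtered = dc_block(bass)[fs:]          # skip the first second while the filter settles
print(rms(filtered) / (1 / math.sqrt(2)))  # close to 1: the 40 Hz level is barely changed
```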

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially near the bitrate target range (presumably around 24~40kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music & speech, and above it CELT performs best for speech & music. The detection, without look-ahead, takes a second or two typically and will sometimes make incorrect decisions. The developers would be keen to know of examples of its failure.

'''Automatic bandwidth detection''' is also introduced to avoid wasting bits on absent frequencies. Although it is easier to implement, the developers would also be keen to know of any failures of this feature (potentially caused by aliasing, quantization and dithering/noise-shaping in source material).

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced their current codec with Opus. Their iOS app will be the first to be released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome will support Opus audio as of version 25.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/ Stream directory], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available in the developer version. This brings Opus playback to a range of portable media players (including some Apple iPod models and devices from Sansa, iriver and Archos) and, with "Rockbox as an Application" (RaaA), also to Android devices.

=== Player software ===
* VLC media player supports Opus since version 2.0.4
* AIMP supports Opus natively as of version 3.20 build 1125 beta 1.
* [[foobar2000]] supports the format natively as of v1.1.14 beta 1.
* Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
* Android has a number of player apps supporting Opus, including PowerAmp and others.

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2013-02-21T18:46:36Z

Dynamic: Edit of introductory section to emphasise its value as a high quality storage format, not just interactive, and its dual speech/music compatibility.

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = exp_analysis7
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) and designed to be suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} including music as well as speech, yet it is also very competitive for use as a storage and playback format, being a [http://people.xiph.org/~greg/opus/ha2011/ class leader at around 64 kbps]. As Opus is an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} a high-quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many software patents that cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage & playback format. Also, unlike Vorbis, Opus does not require large codebooks to be defined in each individual file, which makes it preferable for short audio clips, such as those often used by game developers.{{ref|RFC|c}}

Considerably more detail on the history and potential applications of Opus is included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''.

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that, over a typical music collection, the average bitrate will be very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, Helix MP3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate falls where it may. Future versions can be expected to offer better quality at the same setting. Independent implementations may adopt a different approach.

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page), which can be particularly useful for mixed-content audio or varying network conditions, making the unified Opus codec superior to a suite of different codecs that might otherwise cover the same range of bitrate and quality settings and would require out-of-band signalling to instigate codec switching. The switching includes the choice of mono, stereo and other channel mappings, the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both, and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz) as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially also useful in time-delayed audio streaming, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can be sent again at low bitrate, as redundant data in the following packet, to preserve intelligibility.

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development and ensures as a primary goal that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this, thereby eliminating many bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps through medium-band, wideband and super-wideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As bitrate increases, the hybrid mode is adopted, extending bandwidth first to 12kHz (comparable with compact cassette) and then to the full 20kHz, before CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens while the bandwidth is still 12kHz, before it is extended to 20kHz.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic detection of speech/music and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate tradeoff. Thus changes are likely, and this table is likely to require small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate) but mentions stereo compatibility for 40kbps+. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate). Opus will automatically use mono at very low bitrates, though, depending on content, some stereo information may still be encoded even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly Poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (cf. mono cassette)
|Good for podcasts, audiobooks, CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (cf. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy with --blocksize=256 may also be competitive with Opus's minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above the Opus bitrate range for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}
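For scripting or quick reference, the rows of the table above can be encoded as a simple lookup. This is a minimal Python sketch: the names `MusicMode` and `typical_music_mode`, and the rule of picking the highest row whose bitrate does not exceed the target, are hypothetical choices for illustration. The real libopus encoder makes these decisions internally and in a content-dependent way.

```python
from typing import List, NamedTuple


class MusicMode(NamedTuple):
    min_kbps: float      # lower end of the table row's bitrate target
    channels: str        # typical stereo mode
    bandwidth_khz: int   # typical audio bandwidth
    layer: str           # typical SILK/CELT use


# Rows transcribed from the music encoding quality table. Every row from
# 48 kbps up to 510 kbps shares the same stereo / 20 kHz / CELT settings.
MUSIC_TABLE: List[MusicMode] = [
    MusicMode(6, "mono", 4, "SILK"),
    MusicMode(8, "mono", 4, "SILK"),
    MusicMode(14, "mono", 6, "SILK"),
    MusicMode(22, "mono", 8, "SILK"),
    MusicMode(32, "mono", 12, "hybrid"),
    MusicMode(39, "stereo", 12, "hybrid/CELT"),
    MusicMode(48, "stereo", 20, "CELT"),
]


def typical_music_mode(target_kbps: float) -> MusicMode:
    """Return the highest table row whose bitrate target is not above target_kbps."""
    if not 6 <= target_kbps <= 510:
        raise ValueError("Opus stereo bitrate targets range from 6 to 510 kbps")
    # MUSIC_TABLE is sorted by min_kbps, so the last qualifying row is the closest match.
    candidates = [row for row in MUSIC_TABLE if row.min_kbps <= target_kbps]
    return candidates[-1]


print(typical_music_mode(40))
```

Targets that fall between rows (for example 28 kbps) simply inherit the row below, which is only a rough convention for the table, not a statement about encoder behaviour.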

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used includes packet overhead: the more packets (and therefore packet headers) transmitted every second, the greater the overhead. For this reason Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0ms SILK frames, which reduce latency somewhat at the expense of higher packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These achieve lower latency by transmitting more packets per second, which directly increases the packet overhead. As described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
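To give a sense of scale, the header overhead can be estimated from the frame size. The sketch below assumes one Opus frame per packet and 40 bytes of headers per packet (IPv4 at 20 bytes, UDP at 8 bytes and RTP at 12 bytes, with no options or extensions); IPv6, tunnels or link-layer framing would add more. The function name is hypothetical.

```python
# Assumed per-packet header sizes in bytes: IPv4 without options, UDP, RTP without extensions.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def header_overhead_kbps(frame_ms: float,
                         header_bytes: int = IPV4_HEADER + UDP_HEADER + RTP_HEADER) -> float:
    """Header bitrate in kbps when every packet carries exactly one frame of frame_ms."""
    packets_per_second = 1000.0 / frame_ms
    return header_bytes * 8 * packets_per_second / 1000.0


for frame_ms in (60.0, 20.0, 10.0, 5.0, 2.5):
    print(f"{frame_ms:>4} ms frames: {header_overhead_kbps(frame_ms):6.1f} kbps of headers")
```

Under these assumptions, 20 ms frames add 16 kbps of headers on top of the codec bitrate, while 2.5 ms frames add 128 kbps, which is why very short frames are only sensible when latency matters more than bandwidth.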

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can shorten its transform blocks, allowing it to operate with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use correspondingly shorter MDCT transform blocks. This lowers the frequency resolution compared with the default transform window and so reduces coding efficiency for tonal signals. To reproduce a sound with the same accuracy using shorter transform windows, the coefficients must be coded with greater amplitude precision, which increases the bitrate needed for the same perceptual quality (or, conversely, lowers the quality at the same bitrate).
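The loss of frequency resolution can be quantified roughly. Assuming a 48 kHz sampling rate and a single long MDCT spanning the whole frame (an MDCT over N new samples yields N coefficients covering 0 Hz to the Nyquist frequency), the spacing between coefficients doubles each time the frame is halved. This sketch ignores CELT's short-block mode, which splits a frame into several smaller transforms; the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 48_000  # assumed full-band sampling rate


def mdct_bin_spacing_hz(frame_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Frequency spacing of MDCT coefficients for one long transform spanning the frame."""
    n_coeffs = round(sample_rate_hz * frame_ms / 1000)  # N new samples -> N coefficients
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz / n_coeffs


for frame_ms in (20.0, 10.0, 5.0, 2.5):
    print(f"{frame_ms:>4} ms frame: {mdct_bin_spacing_hz(frame_ms):5.1f} Hz per coefficient")
```

With these assumptions, a 20 ms frame resolves about 25 Hz per coefficient, while a 2.5 ms frame resolves only about 200 Hz, so closely spaced tonal partials can no longer be separated cleanly.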

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay is the frame size plus an additional 2.5ms, which the CELT layer requires for MDCT window overlap.

Xiph.org matched PEAQ scores (an objective, software-based approximation of perceived audio quality) for CELT 0.10, the codec used as the basis of the CELT layer in the Opus reference release. The results indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximately equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
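The delay column is simply the frame size plus the 2.5 ms overlap, and the last column is the matched bitrate divided by 64 kbps, minus one. The Python sketch below only reproduces that arithmetic; the matched bitrates themselves are Xiph.org's PEAQ-based figures from the table and are not derived here. The constant and function names are hypothetical.

```python
CELT_OVERLAP_MS = 2.5   # extra look-ahead for MDCT window overlap
REFERENCE_KBPS = 64.0   # bitrate at the default 20 ms frame size

# Frame size (ms) -> bitrate (kbps) with a PEAQ score matching 64 kbps at 20 ms, from the table.
MATCHED_KBPS = {20.0: 64.0, 10.0: 70.4, 5.0: 84.8, 2.5: 112.0}


def algorithmic_delay_ms(frame_ms: float) -> float:
    """Frame size plus the fixed CELT overlap."""
    return frame_ms + CELT_OVERLAP_MS


def bitrate_increase_percent(matched_kbps: float, reference_kbps: float = REFERENCE_KBPS) -> float:
    """Extra bitrate needed relative to the 20 ms reference, as a percentage."""
    return (matched_kbps / reference_kbps - 1.0) * 100.0


for frame_ms, kbps in MATCHED_KBPS.items():
    print(f"{frame_ms:>4} ms frame: {algorithmic_delay_ms(frame_ms):>4} ms delay, "
          f"{kbps:5.1f} kbps (+{bitrate_increase_percent(kbps):.1f} %)")
```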

N.B. This table is relevant only to interactive streaming. For music storage and delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (libopus) and the accompanying Opus Audio Tools (opus-tools), which provide a separate encoder and decoder, are published under the terms of a BSD-like license. The code is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including the standard compliance of the bitstream. Because it is based on ogginfo from vorbis-tools, it is, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The commandline tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server]. No other implementations of Opus are currently known. The commandline tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' stream and metadata analyzer.

The '''latest stable release''' is recommended for general use and as of early 2013 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 (recommended latest stable release) ====
Released 11 Sep 2012, when RFC 6716 was standardized, although mostly complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, uses constrained VBR by default (with the bitrate boost used mainly for transients) and also offers true CBR.
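In the opus-tools ''opusenc'' encoder, the rate-control modes are selected with the `--vbr`, `--cvbr` and `--hard-cbr` options, alongside `--bitrate` (in kbit/s) and `--framesize` (in milliseconds). The Python helper below is a hypothetical convenience wrapper (its name, mode keywords and defaults are illustrative, not part of opus-tools) that only builds the argument list; check `opusenc --help` for the options your version supports, and note that which mode is the default may differ between opusenc and the libopus API.

```python
from typing import List

# Rate-control modes mapped to opusenc options.
RATE_CONTROL_FLAGS = {"vbr": "--vbr", "cvbr": "--cvbr", "cbr": "--hard-cbr"}
VALID_FRAME_SIZES_MS = (2.5, 5, 10, 20, 40, 60)


def opusenc_args(src: str, dst: str, bitrate_kbps: float,
                 mode: str = "vbr", frame_ms: float = 20) -> List[str]:
    """Build (but do not run) an opusenc command line."""
    if mode not in RATE_CONTROL_FLAGS:
        raise ValueError(f"unknown rate-control mode: {mode!r}")
    if frame_ms not in VALID_FRAME_SIZES_MS:
        raise ValueError(f"unsupported frame size: {frame_ms} ms")
    return [
        "opusenc",
        "--bitrate", f"{bitrate_kbps:g}",
        RATE_CONTROL_FLAGS[mode],
        "--framesize", f"{frame_ms:g}",
        src, dst,
    ]


print(" ".join(opusenc_args("input.wav", "output.opus", 96)))
```

If opusenc is installed, the list can be passed directly to `subprocess.run(..., check=True)`; the file names above are placeholders.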

==== libopus v1.1-alpha ====
Source code released 21 Dec 2012 for testing and user feedback ([https://ftp.mozilla.org/pub/mozilla.org/opus/win32/opus-tools-0.1.6-opus-1.1-alpha-win32.zip win32 binaries]), but not yet considered stable or well-tested enough for general release.

The CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduced to provide '''unconstrained VBR''' include a rate boost not only for transients but now also for highly tonal signals, and a rate reduction when the stereo image is narrow. There is also a rewrite of the '''transient detection''' and '''time-frequency analysis''' code, and rewritten '''dynamic allocation''' code (HF/LF tilt and Band Boost) that allows larger departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC rejection''' below 3 Hz also improves quality when an inaudible DC offset is present, without affecting deep bass notes.
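This article does not describe how libopus implements its DC rejection. As a generic illustration only, a standard first-order DC-blocking high-pass filter with a cutoff near 3 Hz removes a constant offset while passing bass notes almost unchanged. This is a textbook filter, not libopus code; the function name is hypothetical and the pole formula is an approximation valid only for cutoffs far below the sample rate.

```python
import math
from typing import Iterable, List


def dc_block(samples: Iterable[float], cutoff_hz: float = 3.0,
             sample_rate_hz: int = 48_000) -> List[float]:
    """First-order DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    # A pole radius just below 1 puts the -3 dB point near cutoff_hz (for cutoff << sample rate).
    r = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out: List[float] = []
    x_prev = 0.0
    y_prev = 0.0
    for x in samples:
        y = x - x_prev + r * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out


# A constant offset decays away, while a 40 Hz bass tone keeps almost all of its amplitude.
offset = dc_block([0.5] * 96_000)
tone = dc_block([math.sin(2 * math.pi * 40 * n / 48_000) for n in range(96_000)])
print(abs(offset[-1]), max(abs(v) for v in tone[48_000:]))
```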

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially in the bitrate range (presumably around 24 to 40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. Because the detector has no look-ahead, it typically takes a second or two to react and will sometimes make incorrect decisions. The developers would be keen to hear of examples where it fails.

'''Automatic bandwidth detection''' is also introduced to avoid wasting bits on absent frequencies. Although it is easier to implement, the developers would also be keen to hear of any failures of this feature (potentially caused by aliasing, quantization and dithering/noise-shaping in the source material).

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus.
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video.
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its previous codec with Opus. Its iOS app will be the first released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10.

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, and in SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome will support Opus audio as of version 25.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/ Stream directory], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] with 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available in the developer version. This brings Opus playback to a range of portable media players (including some Apple iPod models and Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also to Android devices.

=== Player software ===
* VLC media player supports Opus since version 2.0.4.
* AIMP supports Opus natively as of version 3.20 build 1125 beta 1.
* [[foobar2000]] supports the format natively as of v1.1.14 beta 1.
* Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2.
* Android has a number of player apps supporting Opus, including PowerAmp and others.

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2013-02-08T02:04:01Z

Dynamic: Renamed Commandline Binaries section and summarised stable 1.0 release (recommended) and 1.1alpha version improvements

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = exp_analysis7
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) and made especially suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} though it is also very competitive for use as a storage and playback format. Opus is an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} and a high-quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general purpose and embedded (fixed point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage & playback format. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers.{{ref|RFC|c}}

Considerably more detail on the history and potential applications of Opus is included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against the internal constant quality targets so that, over a typical music collection, the average bitrate comes very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack), in which a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate falls where it may. Future versions can be expected to offer better quality at the same setting. Independent implementations may adopt a different approach.

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruptions (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This is particularly useful for mixed-content audio or varying network conditions, and makes the unified Opus codec superior to a suite of separate codecs that might otherwise cover the same range of bitrates and quality settings but would require out-of-band signalling to switch between them. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or a hybrid of both; and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially useful in time-delayed audio streaming also, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can then be re-encoded at a low bitrate and sent redundantly in the following packet to preserve intelligibility.
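Conceptually, in-band FEC places a low-bitrate copy of the previous frame inside the current packet, so a single lost packet can be rebuilt from the packet that follows it, while longer bursts of loss fall back to PLC. The Python sketch below models only that receiver-side decision; the `Packet` type, `recover_frames` function and string payloads are hypothetical stand-ins. Real decoders work on encoded bitstreams (libopus, for example, exposes this through the `decode_fec` argument of `opus_decode`).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    seq: int
    primary: str               # normal-quality encoding of frame `seq`
    redundant: Optional[str]   # low-bitrate re-encoding of frame `seq - 1`, if FEC is enabled


def recover_frames(received: Dict[int, Packet], count: int) -> List[str]:
    """For each frame: use the primary copy, else the FEC copy from the next packet, else PLC."""
    frames: List[str] = []
    for seq in range(count):
        if seq in received:
            frames.append(received[seq].primary)
        elif seq + 1 in received and received[seq + 1].redundant is not None:
            # Recovering from the next packet means waiting one extra frame at the receiver.
            frames.append(received[seq + 1].redundant)
        else:
            frames.append("PLC")  # packet loss concealment synthesizes a substitute
    return frames


sent = {n: Packet(n, f"hq{n}", f"lq{n - 1}" if n > 0 else None) for n in range(5)}
print(recover_frames({n: p for n, p in sent.items() if n != 1}, 5))
```

A single loss is fully covered at reduced quality, but two consecutive losses leave the first of them to concealment, which is why the expected loss rate matters when tuning the encoder.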

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development. A primary goal is to preserve the total energy in each spectral band, which requires only a modest bitrate overhead and eliminates many bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0ms) is required for very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a higher bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides further advanced tools for encoder tuning.
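The idea of preserving each band's energy can be illustrated with a toy gain-shape quantizer: the band's energy is coded separately and only its normalized shape is quantized, so even a very crude shape still reproduces the correct loudness in that band instead of collapsing to silence. CELT codes band shapes with pyramid vector quantization and also quantizes the energies; here the energy is kept exact and plain rounding stands in for the shape quantizer. The function `encode_band` and its `steps` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def encode_band(coeffs: Sequence[float], steps: int = 2) -> List[float]:
    """Toy gain-shape coding: keep the band energy, quantize only the unit-norm shape."""
    energy = math.sqrt(sum(c * c for c in coeffs))
    if energy == 0.0:
        return [0.0] * len(coeffs)
    shape = [c / energy for c in coeffs]          # unit-norm shape of the band
    q = [round(s * steps) for s in shape]         # deliberately coarse shape quantization
    if not any(q):
        # Never leave a hole: keep at least the strongest coefficient.
        k = max(range(len(shape)), key=lambda i: abs(shape[i]))
        q[k] = 1 if shape[k] > 0 else -1
    q_norm = math.sqrt(sum(v * v for v in q))
    return [energy * v / q_norm for v in q]       # rescale so the output energy matches the input


band = [0.9, 0.1, -0.05, 0.3]
print(encode_band(band))
```

With naive rounding of the coefficients themselves, a quiet, noise-like band (for example forty coefficients of 0.2 with the same step size) would round to all zeros; the gain-shape version still outputs the full band energy, just with a simplified spectral shape.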

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Players are also required to apply the playback (output) gain stored in the file header, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
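In the Ogg Opus encapsulation (specified in RFC 7845), this gain is the Output Gain field of the identification header (`OpusHead`): a signed 16-bit little-endian value at byte offset 16, in Q7.8 fixed-point decibels (256 units per dB). The minimal Python sketch below reads it from a raw header packet and converts it to a linear amplitude factor. The function name is hypothetical, and extracting the packet from the Ogg stream is not shown.

```python
import struct


def output_gain_factor(opus_head: bytes) -> float:
    """Linear amplitude factor from the Output Gain field of an OpusHead packet."""
    # Layout: 8-byte magic, then version, channel count, pre-skip and input
    # sample rate (8 bytes, skipped), then the signed 16-bit output gain.
    magic, gain_q8 = struct.unpack_from("<8s8xh", opus_head, 0)
    if magic != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    gain_db = gain_q8 / 256.0           # Q7.8 fixed point -> dB
    return 10.0 ** (gain_db / 20.0)     # dB -> linear amplitude


# Example header: version 1, 2 channels, 312 samples pre-skip, 44.1 kHz input, -6 dB gain, family 0.
head = struct.pack("<8sBBHIhB", b"OpusHead", 1, 2, 312, 44100, -6 * 256, 0)
print(output_gain_factor(head))
```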

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As bitrate increases, the hybrid mode is adopted, extending bandwidth first to 12kHz (comparable with compact cassette) and then to the full 20kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens between the 12kHz and 20kHz stages.

==Indicative bitrate and quality==
The table below gives illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of improving the quality/bitrate trade-off. This table is therefore likely to need small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate) but notes stereo compatibility at 40 kbps and above. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!Typical layer (SILK/CELT)
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically 48 kHz sampling rate). Opus automatically switches to mono at very low bitrates; even so, depending on the content, some stereo information may still be encoded in rows where the table lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!Typical layer (SILK/CELT)
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (cf. mono cassette)
|Good for podcasts and audiobooks; CELT-only mode possible for music. Competitor HE-AAC at 32 kbps is full-band stereo but has annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (cf. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis and MP3 in a [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy with --blocksize=256 may also be competitive with Opus's minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above the Opus bitrate range for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used includes packet overhead: the more packets (and therefore packet headers) transmitted every second, the greater the overhead. For this reason Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0ms SILK frames, which reduce latency somewhat at the expense of higher packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These achieve lower latency by transmitting more packets per second, which directly increases the packet overhead. As described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can shorten its transform blocks, allowing it to operate with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use correspondingly shorter MDCT transform blocks. This lowers the frequency resolution compared with the default transform window and so reduces coding efficiency for tonal signals. To reproduce a sound with the same accuracy using shorter transform windows, the coefficients must be coded with greater amplitude precision, which increases the bitrate needed for the same perceptual quality (or, conversely, lowers the quality at the same bitrate).

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay is the frame size plus an additional 2.5ms, which the CELT layer requires for MDCT window overlap.

Xiph.org matched PEAQ scores (an objective, software-based approximation of perceived audio quality) for CELT 0.10, the codec used as the basis of the CELT layer in the Opus reference release. The results indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximately equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}

N.B. This table is relevant only to interactive streaming. For music storage and delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (libopus) and the accompanying Opus Audio Tools (opus-tools), which provide a separate encoder and decoder, are published under the terms of a BSD-like license. The code is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including the standard compliance of the bitstream. Because it is based on ogginfo from vorbis-tools, it is, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries & libopus versions ===
The commandline tools of the reference version are available pre-compiled for the most popular operating systems at [http://opus-codec.org/downloads opus-codec.org] and [https://ftp.mozilla.org/pub/mozilla.org/opus/ Mozilla's ftp server]. No other implementations of Opus are currently known. The commandline tools include the encoder ''opusenc'', the decoder ''opusdec'' and, under a different license, the ''opusinfo'' stream and metadata analyzer.

The '''latest stable release''' is recommended for general use and as of early 2013 is considered competitive with or superior to the best alternative speech or general music encoders at most supported bitrates.

==== libopus v1.0 (recommended latest stable release) ====
Released 11 Sep 2012, when RFC 6716 was standardized, although mostly complete by late 2011.

'''Stable''', '''well-tuned''' ''opusenc'' reference encoder as included in RFC documentation.

The CELT layer, closely related to CELT 0.10, uses constrained VBR by default (with the bitrate boost used mainly for transients) and also offers true CBR.

==== libopus v1.1-alpha ====
Source code released 21 Dec 2012 for testing and user feedback ([https://ftp.mozilla.org/pub/mozilla.org/opus/win32/opus-tools-0.1.6-opus-1.1-alpha-win32.zip win32 binaries]), but not yet considered stable or well-tested enough for general release.

The CELT layer [http://jmspeex.livejournal.com/11737.html quality improvements] introduced to provide '''unconstrained VBR''' include a rate boost not only for transients but now also for highly tonal signals, and a rate reduction when the stereo image is narrow. There is also a rewrite of the '''transient detection''' and '''time-frequency analysis''' code, and rewritten '''dynamic allocation''' code (HF/LF tilt and Band Boost) that allows larger departures from the typical static allocation when warranted.

There are many minor improvements to '''speech quality''' in both SILK and CELT layers.

'''DC rejection''' below 3 Hz also improves quality when an inaudible DC offset is present, without affecting deep bass notes.

'''Automatic speech/music detection''' is introduced to optimize encoding mode choices, especially in the bitrate range (presumably around 24 to 40 kbps) where the encoder may perform best with SILK, hybrid or CELT depending on content type. Below that range SILK performs best for both music and speech, and above it CELT performs best for both. Because the detector has no look-ahead, it typically takes a second or two to react and will sometimes make incorrect decisions. The developers would be keen to hear of examples where it fails.

'''Automatic bandwidth detection''' is also introduced to avoid wasting bits on absent frequencies. Although it is easier to implement, the developers would also be keen to hear of any failures of this feature (potentially caused by aliasing, quantization and dithering/noise-shaping in the source material).

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus.
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video.
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its previous codec with Opus. Its iOS app will be the first released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10.

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, as well as SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome will support Opus audio as of version 25.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/ Stream directory], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] with 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available in the developer version. This brings Opus playback to a range of portable media players (including some Apple iPod models and Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also to Android devices.

=== Player software ===
* VLC media player has supported Opus since version 2.0.4.
* AIMP supports Opus natively as of version 3.20 build 1125 beta 1.
* [[foobar2000]] supports the format natively as of v1.1.14 beta 1.
* Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2.
* Android has a number of player apps supporting Opus, including PowerAmp and others.

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2013-02-06T19:27:56Z

Dynamic: /* Streaming audio */ added some Icecast stream examples

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = exp_analysis7
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) and made especially suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} though it is also very competitive for use as a storage and playback format. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} it has a high-quality reference implementation, provided under the 3-clause BSD license,{{ref|homepage|a}} which compiles and runs on the vast majority of general-purpose and embedded (fixed-point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which the SILK layer encodes the 0-8 kHz spectrum and the CELT layer encodes only the frequencies above 8 kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage and playback format. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, which makes it preferable for short clips of audio, such as those often used by game developers.{{ref|RFC|c}}

Much more detail on the history and potential applications of Opus is given in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''.

==Characteristics==
Opus supports bitrates from 6 kbps to 510 kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30 kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the reference release, libopus, the target bitrate is calibrated against the internal constant-quality targets so that, over a typical music collection, the average bitrate comes very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate falls where it may. Future versions can be expected to offer improved quality at the same setting. Independent implementations may adopt a different approach.

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This can be particularly useful for mixed-content audio or varying network conditions, and makes the unified Opus codec superior to a suite of separate codecs covering the same range of bitrates and quality settings, which would require out-of-band signalling to trigger codec switching. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both; and the use of different audio bandwidths (4 kHz, 6 kHz, 8 kHz, 12 kHz, 20 kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.
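The five audio bandwidths correspond to the named bandwidths and effective sample rates tabulated in RFC 6716 (narrowband through fullband). The small sketch below lists them and picks the narrowest Opus bandwidth that covers a given source cutoff frequency; the helper name `narrowest_bandwidth` is hypothetical and not part of any Opus API.

```python
# Opus audio bandwidths: name -> (audio bandwidth in Hz, effective sample rate in Hz),
# as tabulated in RFC 6716.
BANDWIDTHS = {
    "narrowband": (4_000, 8_000),
    "mediumband": (6_000, 12_000),
    "wideband": (8_000, 16_000),
    "super-wideband": (12_000, 24_000),
    "fullband": (20_000, 48_000),
}


def narrowest_bandwidth(cutoff_hz):
    """Return the narrowest bandwidth whose audio bandwidth covers cutoff_hz.

    Content extending above 20 kHz is simply capped at fullband.
    """
    for name, (audio_bw, _rate) in BANDWIDTHS.items():  # dicts keep insertion order
        if cutoff_hz <= audio_bw:
            return name
    return "fullband"


for cutoff in (3_400, 7_000, 11_000, 16_000, 22_050):
    print(cutoff, "Hz ->", narrowest_bandwidth(cutoff))
```

Note that fullband stops at 20 kHz even though a 48 kHz sample rate could in principle carry up to 24 kHz.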

Of importance mainly to interactive uses, but potentially also useful in time-delayed audio streaming, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can then be sent again at a low bitrate in the following packet to preserve intelligibility.
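The benefit of carrying redundancy in the following packet can be shown with a toy model. In the sketch below, each packet is assumed to hold one frame plus a low-bitrate copy of the previous frame; the receiver uses that copy when a single packet is lost and falls back to concealment when two consecutive packets are lost. This is a simplified, hypothetical packet layout, not the RFC 6716 bitstream. In libopus the redundant data is requested by decoding the next packet with the `decode_fec` argument of `opus_decode()` set to 1, so the receiver must wait for that packet before filling the gap.

```python
def reconstruct(num_frames, lost):
    """Label how each frame would be rebuilt under the toy FEC layout.

    Packet n carries frame n at full quality plus a low-bitrate redundant
    copy of frame n - 1. `lost` is the set of packet indices that never arrived.
    """
    labels = []
    for n in range(num_frames):
        if n not in lost:
            labels.append("normal")    # decoded from its own packet
        elif n + 1 < num_frames and n + 1 not in lost:
            labels.append("fec")       # recovered from the redundant copy in packet n + 1
        else:
            labels.append("plc")       # no data at all: packet loss concealment
    return labels


print(reconstruct(6, {2}))      # an isolated loss is recovered via FEC
print(reconstruct(6, {2, 3}))   # a burst of two: the first frame must be concealed
```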

For music and general audio, the CELT layer of Opus builds on knowledge gained during Xiph.org's Vorbis development and, as a primary goal, ensures that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this. This eliminates many bitrate-starvation artifacts, such as the 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. The same technique also increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0 ms) is required for very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides further advanced tools to enable encoder tuning.
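The energy-preserving idea can be illustrated with a toy gain-shape quantizer: the energy of a band is sent separately from a coarse description of its shape, so however few bits the shape receives, the decoded band keeps the right energy instead of collapsing to silence. The sketch below is only loosely modelled on this principle. The function names and the crude pulse allocation are assumptions, the gain is left unquantized for simplicity, and CELT's real energy quantization and Pyramid Vector Quantization search are considerably more elaborate.

```python
import math


def quantize_band(x, pulses):
    """Toy gain-shape quantizer for one spectral band (not CELT's real PVQ).

    Returns the band gain (square root of its energy) and an integer
    'pulse' vector describing only the band's shape.
    """
    gain = math.sqrt(sum(v * v for v in x))
    if gain == 0.0:
        return 0.0, [0] * len(x)
    l1 = sum(abs(v) for v in x)
    # Spread roughly `pulses` unit pulses in proportion to |x| (PVQ-like L1 scaling).
    q = [int(round(v * pulses / l1)) for v in x]
    if not any(q):
        # Too few pulses to describe the shape: keep at least one at the peak.
        peak = max(range(len(x)), key=lambda i: abs(x[i]))
        q[peak] = 1 if x[peak] > 0 else -1
    return gain, q


def dequantize_band(gain, q):
    """Rebuild the band as a unit-norm shape scaled by the transmitted gain."""
    norm = math.sqrt(sum(v * v for v in q))
    if norm == 0.0:
        return [0.0] * len(q)
    return [gain * v / norm for v in q]


band = [0.9, -0.1, 0.05, 0.3, -0.02, 0.01]
for k in (1, 4, 16):
    g, q = quantize_band(band, k)
    y = dequantize_band(g, q)
    print(k, q, round(sum(v * v for v in y), 6))
```

More pulses describe the shape more faithfully, but even with a single pulse the band's energy matches the input, which is the property the paragraph above relies on.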

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Support for playback gain is also required, so some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control is possible in any compliant player.
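Both features are carried in the Ogg Opus identification header ('OpusHead'). It holds a pre-skip count of samples (at 48 kHz) that the player discards at the start of decoding, which together with the end position recorded in the stream allows sample-exact gapless playback, and an output gain in dB stored in Q7.8 fixed point. The sketch below parses the fixed 19-byte part of that header following the field layout in the Ogg Opus specification (RFC 7845); the example header values are made up for illustration.

```python
import struct

# Fixed part of the Ogg Opus ID header, little-endian: magic, version,
# channel count, pre-skip, input sample rate, output gain (Q7.8 dB), mapping family
OPUS_HEAD = struct.Struct("<8sBBHIhB")


def parse_opus_head(data):
    magic, version, channels, pre_skip, input_rate, gain_q78, family = OPUS_HEAD.unpack_from(data)
    if magic != b"OpusHead":
        raise ValueError("not an Opus identification header")
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,          # samples at 48 kHz to drop at the start
        "input_rate": input_rate,      # original sample rate, informational only
        "output_gain_db": gain_q78 / 256.0,
        "mapping_family": family,
    }


def gain_factor(output_gain_db):
    """Linear amplitude factor a player applies for the header's output gain."""
    return 10.0 ** (output_gain_db / 20.0)


def playable_samples(final_granule, pre_skip):
    """Samples (at 48 kHz) left after trimming the pre-skip, for gapless playback."""
    return final_granule - pre_skip


# Made-up example: stereo, 312 samples of pre-skip, 44.1 kHz source, -3 dB output gain.
header = OPUS_HEAD.pack(b"OpusHead", 1, 2, 312, 44_100, -3 * 256, 0)
info = parse_opus_head(header)
print(info)
print(round(gain_factor(info["output_gain_db"]), 4))
print(playable_samples(480_312, info["pre_skip"]))
```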

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps through medium-band, wideband and super-wideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As bitrate increases, the hybrid mode is adopted, extending bandwidth first to 12 kHz (comparable with compact cassette) and then to the full 20 kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens after the bandwidth has reached 12 kHz but before it is extended to 20 kHz.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on typical modes used internally by Opus and on a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate trade-off. Changes are therefore likely, and these tables may require small updates as the encoder improves.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate), but notes stereo compatibility for 40 kbps and above. The default 20 ms frame size (22.5 ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate). Opus will automatically use mono at very low bitrates, although, depending on the content, some stereo information may still be encoded even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (cf. mono cassette)
|Good for podcasts, audiobooks; CELT-only possible for music. Competitor HE-AAC at 32 kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (cf. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis and MP3 in a [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 for default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above the Opus bitrate range for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used is subject to packet overhead: the more packets transmitted every second, the greater the header overhead. For this reason Opus, while defaulting to 20.0 ms frames, supports 60.0 ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency (which may still be acceptable for speech). It also supports 10.0 ms SILK frames to reduce latency somewhat, at the expense of greater packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0 ms frames are the default, but frames of 10.0 ms, 5.0 ms and 2.5 ms are also possible. These achieve lower latency by sending more packets per second, which directly increases the packet overhead. As described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
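To see how much the frame size matters, the sketch below adds header overhead to a codec bitrate. It assumes uncompressed IPv4 (20 bytes), UDP (8 bytes) and RTP (12 bytes) headers with no options or extensions, i.e. 40 bytes per packet carrying one Opus frame; real links may add link-layer framing or use header compression, so the figures are indicative only.

```python
# Assumed per-packet header sizes in bytes: IPv4 + UDP + RTP, no options,
# no RTP header extensions and no header compression.
HEADER_BYTES = 20 + 8 + 12


def overhead_kbps(frame_ms, header_bytes=HEADER_BYTES):
    """Header overhead in kbit/s when one Opus frame is sent per packet."""
    packets_per_second = 1000.0 / frame_ms
    return packets_per_second * header_bytes * 8 / 1000.0


def total_kbps(codec_kbps, frame_ms):
    """Codec bitrate plus the assumed header overhead."""
    return codec_kbps + overhead_kbps(frame_ms)


# A 12 kbps SILK stream and a 64 kbps CELT stream at various frame sizes
for codec_kbps, frame_sizes in ((12, (10.0, 20.0, 60.0)), (64, (2.5, 5.0, 10.0, 20.0))):
    for frame_ms in frame_sizes:
        print(f"{codec_kbps} kbps @ {frame_ms:>4} ms frames -> "
              f"{total_kbps(codec_kbps, frame_ms):6.1f} kbps on the wire")
```

Under these assumptions 20 ms frames cost 16 kbps of headers, more than a 12 kbps SILK stream itself; 60 ms frames cut this to about 5.3 kbps, while 2.5 ms CELT frames would cost 128 kbps of headers alone.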

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which supports only Opus frames of 10, 20, 40 or 60 ms, the CELT layer can modify its encoding block lengths to allow its use with shorter frames.

When the CELT layer uses 10.0 ms, 5.0 ms or 2.5 ms frames instead of the default 20.0 ms, it must use smaller transform blocks, which reduces the frequency resolution of the MDCT compared with the default transform window and so reduces coding efficiency for tonal signals. With shorter transform windows, greater amplitude precision is needed to achieve the same frequency precision, so a higher bitrate is required for the same perceptual quality (or, conversely, quality is lower at the same bitrate).
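The loss of frequency resolution can be quantified roughly. Assuming a 48 kHz sample rate and one MDCT per frame producing as many coefficients as the frame has new samples (a simplification that ignores CELT's band layout and its option of splitting a frame into several short blocks), the coefficients are spread evenly over 0 to 24 kHz, so halving the frame size doubles the spacing between them:

```python
SAMPLE_RATE = 48_000


def mdct_bin_spacing_hz(frame_ms, fs=SAMPLE_RATE):
    """Approximate spacing between MDCT coefficients for one transform per frame."""
    coefficients = round(fs * frame_ms / 1000)   # N new samples -> N coefficients
    return (fs / 2) / coefficients               # spread evenly over 0 .. fs/2


for frame_ms in (20.0, 10.0, 5.0, 2.5):
    print(f"{frame_ms:>4} ms frame: {round(SAMPLE_RATE * frame_ms / 1000)} coefficients, "
          f"{mdct_bin_spacing_hz(frame_ms):.0f} Hz apart")
```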

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay is the frame size plus an additional 2.5 ms, which the CELT layer requires for MDCT window overlap.

Xiph.org used matched PEAQ scores (an approximate perceptual quality assessment made in software) for CELT 0.10, the codec version used as the basis of the CELT layer in the Opus reference release. These indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.
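The delay and percentage columns of the table follow directly from its bitrate column and from the 2.5 ms overlap mentioned above; the short sketch below recomputes them as a consistency check. The bitrates are the PEAQ-matched figures quoted in the table, not new measurements.

```python
OVERLAP_MS = 2.5          # extra delay for the CELT MDCT window overlap
REFERENCE_KBPS = 64.0     # bitrate at the default 20 ms frame size

# (frame size in ms, bitrate in kbps giving matched PEAQ), copied from the table above
MATCHED = [(20.0, 64.0), (10.0, 70.4), (5.0, 84.8), (2.5, 112.0)]


def algorithmic_delay_ms(frame_ms):
    return frame_ms + OVERLAP_MS


def bitrate_increase_pct(kbps, reference_kbps=REFERENCE_KBPS):
    return (kbps / reference_kbps - 1.0) * 100.0


for frame_ms, kbps in MATCHED:
    print(f"{frame_ms:>4} ms frames: {algorithmic_delay_ms(frame_ms):>4} ms delay, "
          f"{kbps:5.1f} kbps (+{bitrate_increase_pct(kbps):.1f} %)")
```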

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (libopus) and the Opus Audio Tools (opus-tools), which provide separate encoder and decoder programs, are published under the terms of a BSD-like license. The code is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries ===
The commandline tools are available pre-compiled for the most popular operating systems at [http://opus-codec.org opus-codec.org]
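As an example of driving the encoder from a script, the sketch below only builds an `opusenc` command line; it does not run it. The `--bitrate`, `--vbr`, `--cvbr`, `--hard-cbr`, `--framesize` and `--expect-loss` options are those listed in opusenc's help output, but check `opusenc --help` for the version you have installed. The wrapper function and its range checks (taken from the figures in this article) are a hypothetical convenience, not part of opus-tools.

```python
import shlex

FRAME_SIZES_MS = (2.5, 5, 10, 20, 40, 60)     # values accepted by --framesize
RATE_MODES = {"vbr": "--vbr", "cvbr": "--cvbr", "cbr": "--hard-cbr"}


def opusenc_command(infile, outfile, kbps, mode="vbr", framesize_ms=20, expect_loss_pct=None):
    """Return an opusenc argument list for the given target bitrate and options."""
    if not 6 <= kbps <= 510:
        raise ValueError("Opus bitrates range from 6 to 510 kbps (stereo)")
    if framesize_ms not in FRAME_SIZES_MS:
        raise ValueError(f"unsupported frame size: {framesize_ms} ms")
    if mode not in RATE_MODES:
        raise ValueError(f"unknown rate mode: {mode}")
    cmd = ["opusenc", "--bitrate", str(kbps), RATE_MODES[mode],
           "--framesize", str(framesize_ms)]
    if expect_loss_pct is not None:
        # Tell the encoder what percentage of packet loss to expect
        cmd += ["--expect-loss", str(expect_loss_pct)]
    return cmd + [infile, outfile]


# Music storage, a loss-tolerant speech stream, and a low-latency jam session
print(shlex.join(opusenc_command("album.wav", "album.opus", 96)))
print(shlex.join(opusenc_command("talk.wav", "talk.opus", 24, mode="cvbr", expect_loss_pct=10)))
print(shlex.join(opusenc_command("jam.wav", "jam.opus", 112, mode="cbr", framesize_ms=5)))
```

Passing an argument list rather than a single string keeps file names with spaces intact if the list is later handed to `subprocess.run`.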

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus.
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video.
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its existing codec with Opus. Its iOS app will be the first to be released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10.

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, as well as SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome will support Opus audio as of version 25.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast (examples: [http://dir.xiph.org/ Stream directory], [http://smj.delfa.net/opus_64.m3u 64k]/[http://smj.delfa.net/opus_256.m3u 256k] [http://smj.delfa.net/ Smooth Jazz Opus Stream], [http://www.absoluteradio.co.uk/listen/labs.html Absolute Radio Opus Trial] with 7 stations at 24, 64 and 96 kbps, [http://icecast.ofdoom.com:8000/burst-opus.ogg Icecast Of Doom 96k])
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available in the developer version. This brings Opus playback to a range of portable media players (including some Apple iPod models and Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also to Android devices.

=== Player software ===
* VLC media player has supported Opus since version 2.0.4.
* AIMP supports Opus natively as of version 3.20 build 1125 beta 1.
* [[foobar2000]] supports the format natively as of v1.1.14 beta 1.
* Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2.
* Android has a number of player apps supporting Opus, including PowerAmp and others.

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2013-02-06T19:12:59Z

Dynamic: /* CELT layer latency versus quality/bitrate trade-off */ Clarified use cases for lowest latencies.

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = exp_analysis7
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) and made especially suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} though it is also very competitive for use as a storage and playback format. As an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} it has a high-quality reference implementation, provided under the 3-clause BSD license,{{ref|homepage|a}} which compiles and runs on the vast majority of general-purpose and embedded (fixed-point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which the SILK layer encodes the 0-8 kHz spectrum and the CELT layer encodes only the frequencies above 8 kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage and playback format. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, which makes it preferable for short clips of audio, such as those often used by game developers.{{ref|RFC|c}}

Much more detail on the history and potential applications of Opus is given in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''.

==Characteristics==
Opus supports bitrates from 6 kbps to 510 kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30 kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the reference release, libopus, the target bitrate is calibrated against the internal constant-quality targets so that, over a typical music collection, the average bitrate comes very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, helix mp3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate falls where it may. Future versions can be expected to offer improved quality at the same setting. Independent implementations may adopt a different approach.

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This can be particularly useful for mixed-content audio or varying network conditions, and makes the unified Opus codec superior to a suite of separate codecs covering the same range of bitrates and quality settings, which would require out-of-band signalling to trigger codec switching. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both; and the use of different audio bandwidths (4 kHz, 6 kHz, 8 kHz, 12 kHz, 20 kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially also useful in time-delayed audio streaming, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can then be sent again at a low bitrate in the following packet to preserve intelligibility.

For music and general audio, the CELT layer of Opus builds on knowledge gained during Xiph.org's Vorbis development and, as a primary goal, ensures that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this. This eliminates many bitrate-starvation artifacts, such as the 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. The same technique also increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0 ms) is required for very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides further advanced tools to enable encoder tuning.

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Support for playback gain is also required, so some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control is possible in any compliant player.

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps through medium-band, wideband and super-wideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As bitrate increases, the hybrid mode is adopted, extending bandwidth first to 12 kHz (comparable with compact cassette) and then to the full 20 kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens after the bandwidth has reached 12 kHz but before it is extended to 20 kHz.

==Indicative bitrate and quality==
The tables below give illustrative, indicative quality guidance based on typical modes used internally by Opus and on a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate trade-off. Changes are therefore likely, and these tables may require small updates as the encoder improves.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate), but notes stereo compatibility for 40 kbps and above. The default 20 ms frame size (22.5 ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate). Opus automatically uses mono at very low bitrates, although, depending on the content, some stereo information can still be encoded even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (c.f. mono cassette)
|Good for podcasts and audiobooks; CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (c.f. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see the latency table below)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps for the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum latency high quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum latency mode.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used also includes packet overhead: the more packets (and hence packet headers) transmitted every second, the greater the overhead. For this reason Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency (which may still be acceptable for speech). It also supports 10.0ms SILK frames to reduce latency somewhat at the expense of more packet overhead.
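As a rough illustration of how frame size drives packet overhead, the sketch below assumes one Opus frame per packet and uncompressed IPv4/UDP/RTP headers (20 + 8 + 12 = 40 bytes per packet). Neither assumption comes from this article; real links may add link-layer overhead or use header compression.

```python
# Assumed per-packet header size in bytes: IPv4 (20) + UDP (8) + RTP (12), no compression.
HEADER_BYTES = 20 + 8 + 12

def overhead_kbps(frame_ms, header_bytes=HEADER_BYTES):
    """Header bitrate in kbps when every packet carries exactly one frame of frame_ms."""
    packets_per_second = 1000.0 / frame_ms
    return packets_per_second * header_bytes * 8 / 1000.0

def total_kbps(codec_kbps, frame_ms):
    """Codec bitrate plus the assumed per-packet header overhead."""
    return codec_kbps + overhead_kbps(frame_ms)

if __name__ == "__main__":
    for frame_ms in (60.0, 20.0, 10.0):
        print(f"{frame_ms:4.0f} ms frames: +{overhead_kbps(frame_ms):.1f} kbps of headers")
```

Under these assumptions a 12 kbps SILK stream needs about 28 kbps on the wire with 20ms frames but only about 17.3 kbps with 60ms frames, which is why longer frames help most at low bitrates.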

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These achieve lower latency but directly increase the overhead, since more packets are transmitted per second. In addition, as we'll see below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can change the length of its encoding blocks, allowing it to be used with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use smaller transform blocks. This reduces the frequency resolution of the MDCT compared with the default transform window, and with it the encoding efficiency for tonal signals. To represent a sound with the same precision using shorter transform windows, greater amplitude precision is needed, so a higher bitrate is required for the same perceptual quality (or, conversely, quality is lower at the same bitrate).

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.
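The loss of frequency resolution with shorter frames can be put into numbers with a simplified model. Assume each frame is coded with a single MDCT whose coefficient count equals the frame length in samples at 48 kHz; real CELT frames may be split into several short MDCTs for transients, which this sketch ignores. The bin spacing is then half the sampling rate divided by the number of coefficients.

```python
SAMPLE_RATE = 48000  # Hz; Opus/CELT operates internally at 48 kHz

def mdct_bin_spacing_hz(frame_ms, sample_rate=SAMPLE_RATE):
    """Bin spacing of one MDCT covering a whole frame (simplified CELT model)."""
    coefficients = sample_rate * frame_ms / 1000.0  # e.g. 960 for a 20 ms frame
    return (sample_rate / 2.0) / coefficients

for frame_ms in (20.0, 10.0, 5.0, 2.5):
    print(f"{frame_ms:4.1f} ms frame -> {mdct_bin_spacing_hz(frame_ms):5.1f} Hz per bin")
```

In this model each halving of the frame size doubles the bin spacing, from 25 Hz at 20ms to 200 Hz at 2.5ms, which is consistent with tonal signals needing more bits at short frame sizes.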

In all modes, the algorithmic delay is the frame size plus an additional 2.5ms, which the CELT layer requires for MDCT window overlap.

Xiph.org used matched PEAQ scores (an approximate perceptual quality assessment made in software) for CELT 0.10, the codec used as the basis of the CELT layer in the Opus reference release. These indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}

N.B. This table is useful for interactive streaming only. For music storage & delayed playback or non-interactive streaming, latency reduction is not important and the default 20.0ms frame size is preferable.
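The delay and percentage columns of the latency table follow directly from the frame size and the matched bitrates. The short sketch below recomputes them from the 2.5ms overlap and the PEAQ-matched figures quoted above; it is only a consistency check of the table, not a model of the encoder.

```python
OVERLAP_MS = 2.5       # extra delay for the CELT MDCT window overlap
REFERENCE_KBPS = 64.0  # matched bitrate at the default 20.0 ms frame size

# (frame size in ms, PEAQ-matched bitrate in kbps), copied from the latency table
MATCHED = [(20.0, 64.0), (10.0, 70.4), (5.0, 84.8), (2.5, 112.0)]

def algorithmic_delay_ms(frame_ms):
    """Frame size plus the fixed MDCT overlap."""
    return frame_ms + OVERLAP_MS

def bitrate_increase_pct(kbps, reference_kbps=REFERENCE_KBPS):
    """Extra bitrate needed relative to the 20 ms reference, in percent."""
    return (kbps / reference_kbps - 1.0) * 100.0

for frame_ms, kbps in MATCHED:
    print(f"{frame_ms:4.1f} ms frame: {algorithmic_delay_ms(frame_ms):4.1f} ms delay, "
          f"{kbps:5.1f} kbps (+{bitrate_increase_pct(kbps):.1f} %)")
```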

== Hardware & Software Support ==

This section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries ===
The commandline tools are available pre-compiled for the most popular operating systems at [http://opus-codec.org opus-codec.org].

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its previous codec with Opus. Its iOS app will be the first to be released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, as well as SeaMonkey, which uses the shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome will have Opus audio support as of version 25.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast.
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available in the developer version. This means hardware support for a series of portable media players (including some products from the iPod series by Apple and Sansa, iriver and Archos devices) and with "Rockbox as an Application" (RaaA) also on Android devices.

=== Player software ===
* VLC media player has supported Opus since version 2.0.4.
* AIMP supports Opus natively as of version 3.20 build 1125 beta 1.
* [[foobar2000]] supports the format natively as of v1.1.14 beta 1.
* Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
* Android has a number of player apps supporting Opus, including PowerAmp and others.

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

MP3

2013-01-29T15:46:33Z

Dynamic: /* Model 2 technical details */ Replaced mentions of BM with Basilar Membrane - please edit if incorrect

'''MPEG-1 Audio Layer 3''', more commonly referred to as MP3, is a popular digital audio encoding and lossy compression format, designed to greatly reduce the amount of data required to represent audio, yet still sound like a faithful reproduction of the original uncompressed audio to most listeners. It was invented by a team of European engineers who worked in the framework of the EUREKA 147 DAB digital radio research program, and it became an ISO/IEC standard in 1991.

== History ==
Development of the MP3 algorithm started in 1987 as a joint cooperation between [http://www.iis.fraunhofer.de/ Fraunhofer IIS-A] and the University of Erlangen. It is standardized as ISO-MPEG Audio Layer-3 (IS 11172-3 and IS 13818-3).

It soon became the de facto standard for lossy audio encoding, due to its high [[compression rates]] (about 1/11 of the original size while still retaining considerable quality), the wide availability of decoders and the low CPU requirements for playback (a 486 DX2-100 is enough for real-time decoding).

It supports [[multichannel]] files (see the [http://www.mp3surround-format.com/ MP3 Surround page]) and [[sampling rate]]s from 16 kHz to 24 kHz (MPEG-2 Layer 3) and 32 kHz to 48 kHz (MPEG-1 Layer 3).

Formal and informal listening tests have shown that MP3 in the 160-224 kbps range provides encoded results indistinguishable from the original material in most cases.
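The "1/11 of the original size" figure matches a common case, which is assumed here rather than stated in the text: CD audio (44.1 kHz, 16-bit, stereo) encoded at 128 kbps. The arithmetic is:

```python
def pcm_kbps(sample_rate=44100, bits_per_sample=16, channels=2):
    """Bitrate of uncompressed PCM in kbps (CD audio by default)."""
    return sample_rate * bits_per_sample * channels / 1000.0

def compression_ratio(encoded_kbps):
    """How many times smaller the encoded stream is than CD-quality PCM."""
    return pcm_kbps() / encoded_kbps

print(f"CD audio: {pcm_kbps():.1f} kbps")
print(f"128 kbps MP3: about 1/{compression_ratio(128):.1f} of the original size")
```

At the 160-224 kbps range mentioned above, the same calculation gives ratios of roughly 1/9 down to 1/6.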

== Encoding and decoding ==
=== Encoding of MP3 audio ===
The MPEG-1 standard does not include a precise specification for an MP3 encoder. The decoding algorithm and file format, by contrast, are well defined. Implementers of the standard were expected to devise their own algorithms for removing parts of the information in the raw audio (or rather in its MDCT representation in the frequency domain). During encoding, 576 time-domain samples are taken and transformed into 576 frequency-domain samples. If there is a transient, 192 samples are taken instead of 576, to limit the temporal spread of the quantization noise accompanying the transient.

Deciding which information can be removed is the domain of psychoacoustics: the study of subjective human perception of sounds.

As a result, there are many different MP3 encoders available, each producing files of differing quality. Comparisons are widely available, so it is easy for a prospective user of an encoder to research the best choice. It must be kept in mind that an encoder that is proficient at encoding at higher bitrates (such as LAME, which is in widespread use for encoding at higher bitrates) is not necessarily as good at other, lower bitrates.

=== Decoding of MP3 audio ===
Decoding, on the other hand, is carefully defined in the standard. Most decoders are "bitstream compliant", meaning that the decompressed output they produce from a given MP3 file will be the same (within a specified rounding tolerance) as the output specified mathematically in the ISO/IEC standard document. An MP3 file consists of frames of 384, 576 or 1152 samples (depending on MPEG version and layer), and every frame has associated header information (32 bits) and side information (9, 17 or 32 bytes, depending on MPEG version and stereo/mono). The header and side information help the decoder to decode the associated Huffman-encoded data correctly.

Therefore, decoders are compared almost exclusively on their computational efficiency (i.e., how much memory or CPU time they use in the decoding process).

== MP3 file structure ==
[[Image:MP3 file structure.png|thumb|right|500px|Breakdown of an MP3 File's Structure]]
An MP3 file is made up of multiple MP3 frames, each consisting of an MP3 header and MP3 data. This sequence of frames is called an elementary stream. Frames are independent items: one can cut frames from a file and an MP3 player would still be able to play it. The MP3 data is the actual audio payload. As the diagram shows, the MP3 header begins with a sync word, which is used to identify the beginning of a valid frame. This is followed by a bit indicating that this is the MPEG standard and two bits indicating that layer 3 is used; hence MPEG-1 Audio Layer 3, or MP3. The remaining values differ depending on the MP3 file. The range of values for each section of the header, along with the specification of the header, is defined by ISO/IEC 11172-3.

Most MP3 files today contain ID3 metadata which precedes or follows the MP3 frames; this is also shown in the diagram.
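A minimal header reader can make this layout concrete. The sketch below handles MPEG-1 Layer III only, using the ISO/IEC 11172-3 bit positions (12-bit sync word, 1-bit ID, 2-bit layer, protection bit, bitrate and sampling-rate indices, padding bit, channel mode); MPEG-2/2.5 and free-format streams are deliberately rejected. The frame length formula, 144 x bitrate / sampling rate plus one optional padding byte, follows from 1152 samples per frame.

```python
# MPEG-1 Layer III tables. Bitrate index 0 is "free format" and 15 is forbidden;
# both are rejected below, as is the reserved sampling-rate index 3.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]

def parse_mpeg1_layer3_header(data):
    """Decode the 4-byte frame header at the start of `data` (a bytes object)."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    h = int.from_bytes(data[:4], "big")
    if (h >> 20) != 0xFFF:               # 12-bit sync word, all ones
        raise ValueError("no sync word")
    if ((h >> 19) & 1) != 1:             # ID bit: 1 = MPEG-1
        raise ValueError("not MPEG-1")
    if ((h >> 17) & 0b11) != 0b01:       # layer bits: 01 = Layer III
        raise ValueError("not Layer III")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved field")
    padding = (h >> 9) & 1
    return {
        "crc_protected": ((h >> 16) & 1) == 0,  # protection bit 0 means a CRC follows
        "bitrate_kbps": bitrate,
        "sample_rate": sample_rate,
        "padding": padding,
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0b11],
        # 1152 samples * (bitrate * 1000 bit/s) / 8 bit per byte / sample_rate
        "frame_bytes": 144000 * bitrate // sample_rate + padding,
    }
```

For example, the common header bytes `FF FB 90 64` decode as 128 kbps, 44.1 kHz, joint stereo, no CRC, giving a 417-byte frame.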

===VBRI, XING, and LAME headers===
MP3 files often begin with a single frame of silence which contains an extra header that, when supported by decoders, results in the entire frame being treated as informational instead of being played (although some are known to do both). The extra header is in the frame's data section, before the actual silent audio data, and was originally intended to help with the playback of VBR files.

Xing and Fraunhofer each developed their own formats for this header. The Xing-format header is just called the ''Xing header'' or ''XING header''. The Fraunhofer-format header is called the ''VBRI header'' or ''VBR Info header''.

====Seek table====
Both formats specify a table of seek points, which help players correlate playback position (e.g., in seconds, or as a percentage) with byte offsets in the file.
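As an example of how such a table is used, the sketch below assumes the layout commonly described for the Xing header's table of contents: 100 one-byte entries, where entry i approximates the byte offset at i percent of the duration, scaled so that 256 would be the end of the file. The function name and the linear interpolation between entries are illustrative, not a reference implementation.

```python
def xing_seek_offset(toc, file_bytes, percent):
    """Approximate byte offset for a playback position given in percent (0-100).

    toc: 100 integers in 0..255, each assumed to be
    256 * (byte offset at i % of the duration) / file_bytes.
    """
    if len(toc) != 100:
        raise ValueError("expected a 100-entry table")
    percent = min(max(percent, 0.0), 100.0)
    a = min(int(percent), 99)             # table entry at or below the target
    fa = toc[a]
    fb = toc[a + 1] if a < 99 else 256    # 256 stands for the end of the file
    fx = fa + (fb - fa) * (percent - a)   # interpolate between neighbouring entries
    return int(fx / 256.0 * file_bytes)
```

For a CBR-like table (`toc[i] = i * 256 // 100`), seeking to 50 % of a 1,000,000-byte file lands at byte 500,000.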

====Gapless playback info====
In addition to the seek-point table, the Fraunhofer format contains a combined encoder delay & padding value (measured in samples), which can assist [[gapless playback]]. The encoder delay value is the number of samples added to the beginning of the audio data, and the encoder padding value is the number of samples added to the end. There is also a decoder delay, usually 529 junk samples added to the beginning by the decoder. To determine the starting and ending samples of the non-delay, non-padding portion of the decoder output, MP3 players can perform the following calculation:

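<nowiki>
def gapless_range(total_samples, encoder_delay, encoder_padding, decoder_delay=529):
    # 529 is the usual decoder delay mentioned above
    # First decoder-output sample that belongs to the original audio
    gapless_range_start = encoder_delay + decoder_delay
    if encoder_padding < decoder_delay:
        # Too little padding to absorb the decoder delay: keep everything to the end
        gapless_range_end = total_samples
    else:
        gapless_range_end = total_samples - encoder_padding + decoder_delay
    return gapless_range_start, gapless_range_end</nowiki>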

Alternatively, when <code>encoder_padding</code> < <code>decoder_delay</code>, a player could feed an extra MP3 frame to the decoder (e.g. a silent frame, or the first frame of the next MP3 in a sequence), and then use the second <code>gapless_range_end</code> calculation. At least one player (Rockbox) does the latter to handle an uncommon type of MP3 encoded specially for gapless playback, where one long stream is split up and written into separate files.

====LAME tag====
The [[LAME]] encoder extends the Xing header format. This modified header is sometimes called a ''LAME header'' or ''LAME tag'', although the actual LAME tag is only the LAME-specific data embedded in unused space in the header.

When the header was first added in LAME 3.12, the LAME tag contained only a 20-byte LAME version string. In LAME 3.90, this region was expanded to hold additional data, such as:
* audio and info tag CRCs
* separate delay & padding values for gapless playback
* various encoder settings (expanded in LAME 3.94 to include presets)
The modified header is also included in CBR files (as of LAME 3.94), with "Info" instead of "XING" near the beginning.

====Specs====
The Fraunhofer VBRI header and the LAME tag have explicit specifications. The Xing format can only be inferred from the C code the company provided to read the headers. Here are links to the code and specs:
* [http://gabriel.mp3-tech.org/mp3infotag.html LAME MP3 Info Tag spec]
* [http://www.all4mp3.com/tools/tech-and-tools.php All4mp3 mp3 Tech & Tools downloads] - official distribution site for Fraunhofer's ''Source code to add VBRi header to mp3 file'' (contains header spec) and ''MP3 VBR-Header SDK'' (header-reading C code sample)
* [http://www.mp3-tech.org/programmer/sources/vbrheadersdk.zip Xing Variable Bitrate MP3 Playback SDK]
* [http://mp3decoders.mp3-tech.org/decoders_lame.html#delays Information about common encoder and decoder delays]

== Technical information ==
=== Codec block diagram ===
A basic functional block diagram of the MPEG1 layer 3 audio codec is as shown below.
[[Image:Layer3_block.png|frame|center|Block diagram of the MPEG1 layer 3 audio]]

=== The hybrid polyphase filterbank ===

The polyphase [[filterbank]] is the key component common to all layers of MPEG1 audio compression. The purpose of the polyphase filterbank is to divide the audio signal into 32 equal-width [[frequency]] [[subband]]s, by using a set of [[bandpass filters]] covering the entire audio frequency range (a set of 512-tap FIR filters).

====Polyphase Filterbank Formula====
[[Image:Poly_samples.png|frame|center|Polyphase filterbank]]

Audio is processed in frames of 1152 samples per audio channel. The polyphase filter outputs 3 groups of 12 samples (3x12 = 36 samples) per subband, as seen in the picture above (3x12x32 subbands = 1152 samples).

The polyphase filter bank and its inverse are not [[lossless]] transformations. Even without [[quantize|quantization]], the inverse transformation cannot perfectly recover the original signal. However, by design the error introduced by the filter bank is small and inaudible.

[[Image:Mdct.png|frame|center|MDCT]]

MDCT formula, where <math>x(k)</math> are the input samples and <math>f(k)</math> is the window function: <math>X(m) = \sum_{k=0}^{n-1} f(k)\,x(k)\cos\left[\frac{\pi}{2n}\left(2k+1+\frac{n}{2}\right)(2m+1)\right], \quad m = 0, \ldots, \frac{n}{2}-1</math>

Layer 3 compensates for some of the filter bank deficiencies by processing the filter bank output with a Modified Discrete Cosine Transform ([[MDCT]]). Together, the polyphase [[filterbank]] and the MDCT are called the hybrid filterbank. The hybrid filterbank adapts to the signal characteristics (e.g. by block switching depending on the signal).

The 32 [[subband]] signals are subdivided further in frequency content by applying an 18-point or 6-point MDCT. Layer 3 specifies two different MDCT block lengths: a long block (18 spectral points) or a short block (6 spectral points).

Long blocks have a higher frequency resolution. Each subband is transformed into 18 spectral coefficients by MDCT, yielding a maximum of 576 spectral coefficients (32x18=576 spectral lines) each representing a bandwidth of 41.67Hz at 48kHz sampling rate. At 48kHz sampling rate a long block has a time resolution of about x ms. There is a 50% overlap between successive transform windows, so the window size is 36 for long blocks.

Short blocks have a higher time resolution. Short block length is one third of a long block and used for transients to provide better time (temporal) resolution. Each subband is transformed into 6 spectral coefficients by MDCT, yielding a maximum of 192 spectral coefficients (32x6=192 spectral lines) each representing a bandwidth of 125Hz at 48kHz [[sampling rate]]. At 48kHz sampling rate a short block has an impulse response of 18.6ms. There is a 50% overlap between successive transform windows, so the window size is 12 for short blocks.
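The spectral-line widths quoted for long and short blocks are simply the Nyquist range (half the sampling rate) divided by the total number of lines, 32 subbands times the MDCT points per subband. A quick check:

```python
SUBBANDS = 32  # polyphase filterbank subbands

def subband_width_hz(sample_rate):
    """Bandwidth of one polyphase subband."""
    return sample_rate / 2.0 / SUBBANDS

def line_width_hz(sample_rate, lines_per_subband):
    """Bandwidth of one MDCT spectral line: Nyquist range / total number of lines."""
    return sample_rate / 2.0 / (SUBBANDS * lines_per_subband)

for name, lines in (("long", 18), ("short", 6)):
    print(f"{name} block at 48 kHz: {SUBBANDS * lines} lines, "
          f"{line_width_hz(48000, lines):.2f} Hz each")
```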

Time resolution of long blocks and time resolution of short blocks are not constants, but jitter depending on the position of the sample in the transformed block. See [http://hydrogenaudio.org/musepack/klemm/www.personal.uni-jena.de/~pfk/mpp/timeres.html here] for diagrams showing the average time resolutions of different codecs.

[[Image:Freqlines.png|center|frame|Psychoacoustic-MDCT]]

Block switching ([[MDCT]] window switching) is triggered by the [[Psychoacoustic|psychoacoustic]] model.

For a given frame of 1152 samples, the MDCTs can all have the same block length (long or short), or a mixed-block mode can be used (mixed-block mode for LAME is in development).

Unlike the polyphase [[filterbank]], the MDCT transformation is [[lossless]] in the absence of quantization.
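This property can be checked numerically. The sketch below implements the MDCT formula given earlier directly (O(n²), for clarity rather than speed), with an assumed sine window instead of the block-type-specific windows of the standard, and an inverse scaled to suit that window. Transforming 50%-overlapping blocks of 36 samples and overlap-adding the inverses reproduces the input to within floating-point rounding, because the aliasing in each block is cancelled by its neighbour (time-domain aliasing cancellation).

```python
import math

def sine_window(n):
    """Sine window; meets the Princen-Bradley condition w[k]^2 + w[k + n/2]^2 = 1."""
    return [math.sin(math.pi / n * (k + 0.5)) for k in range(n)]

def _kernel(n, k, m):
    # Cosine term of the MDCT formula: cos[pi/(2n) * (2k + 1 + n/2) * (2m + 1)]
    return math.cos(math.pi / (2 * n) * (2 * k + 1 + n / 2) * (2 * m + 1))

def mdct(block, window):
    """n input samples -> n/2 coefficients X(m), windowed with f(k) = window[k]."""
    n = len(block)
    return [sum(window[k] * block[k] * _kernel(n, k, m) for k in range(n))
            for m in range(n // 2)]

def imdct(coeffs, window):
    """n/2 coefficients -> n windowed output samples (scale 2/(n/2) suits this window)."""
    half = len(coeffs)
    n = 2 * half
    return [window[k] * (2.0 / half) * sum(coeffs[m] * _kernel(n, k, m) for m in range(half))
            for k in range(n)]

def mdct_round_trip(signal, n=36):
    """Transform 50%-overlapping blocks, invert them and overlap-add (no quantization)."""
    hop = n // 2
    window = sine_window(n)
    # Pad so that every real sample is covered by exactly two overlapping blocks
    padded = [0.0] * hop + list(signal) + [0.0] * (hop + (-len(signal)) % hop)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n + 1, hop):
        for k, y in enumerate(imdct(mdct(padded[start:start + n], window), window)):
            out[start + k] += y
    return out[hop:hop + len(signal)]
```

With `n=36` each block yields the 18 spectral points of a long block; `n=12` gives the 6 points of a short block.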

Once the MDCT converts the audio signal into the [[frequency domain]], the [[aliasing]] introduced by the subsampling in the filterbank can be partially cancelled. The decoder has to undo this so that the inverse MDCT can reconstruct the [[subband]] samples in their original aliased form for reconstruction by the synthesis filterbank.

=== The psychoacoustic model ===

This section is a work in progress. It is incomplete and data is still being gathered.

==== Concepts ====
;[[Critical band]]s
: Much of what is done in simultaneous [[masking]] is based on the existence of critical bands. Human hearing works much like a non-uniform filterbank, and the critical bands can be said to approximate the characteristics of those filters. Critical bands do not really have specific "on" and "off" frequencies, but rather a width that varies as a function of [[frequency]]: the critical [[bandwidth]]s.

;Tonality estimation

;Spreading function
: Masking does not only occur within the [[critical band]], but also spreads to neighboring bands. A spreading function SF(z,a) can be defined, where z is the frequency and a the amplitude of a masker. This function would give a masking threshold produced by a single masker for neighboring frequencies. The simplest function would be a triangular function with slopes of +25 and -10 dB / [[Bark]], but a more sophisticated one is highly nonlinear and depends on both frequency and amplitude of masker.
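The simple triangular spreading function can be written out directly. In the sketch below, the sign convention (negative Bark difference = maskee below the masker) is an assumption chosen so that the threshold falls off faster towards lower frequencies, i.e. higher frequencies are masked more easily. Real models also subtract a masker-dependent offset, which is omitted here.

```python
def triangular_spread_db(delta_bark):
    """Change (dB) in a masker's threshold at delta_bark = maskee - masker, in Bark.

    Slope of +25 dB/Bark below the masker and -10 dB/Bark above it.
    """
    return 25.0 * delta_bark if delta_bark < 0 else -10.0 * delta_bark

def spread_threshold_db(masker_db, masker_bark, maskee_bark):
    """Threshold contribution of one masker at another frequency (no offset applied)."""
    return masker_db + triangular_spread_db(maskee_bark - masker_bark)
```

A 60 dB masker at 10 Bark thus contributes 50 dB one Bark above it but only 35 dB one Bark below it.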

;Simultaneous masking
: Simultaneous [[masking]] is a frequency-domain phenomenon in which a low-level signal (the maskee), e.g. a narrow-band noise, can be made inaudible by a simultaneously occurring stronger signal (the masker), e.g. a pure tone, if masker and maskee are close enough to each other in frequency. A masking threshold can be measured below which any signal will not be audible. The masking threshold depends on the sound pressure level (SPL) and the frequency of the masker, and on the characteristics of the masker and maskee. The slope of the masking threshold is steeper towards lower frequencies, i.e. higher frequencies are more easily masked.

: Without a masker, a signal is inaudible if its SPL is below the threshold in quiet, which depends on frequency and covers a dynamic range of more than 60 dB. So far, masking by a single masker has been described. If the source signal consists of many simultaneous maskers, a global masking threshold can be computed that describes the threshold of just noticeable distortion as a function of frequency. The calculation of the global masking threshold is based on the high resolution short term [[frequency|amplitude]] spectrum of the audio or speech signal, which must be sufficient for critical-band-based analysis and is determined in audio coding via a 512 or 1024 point FFT. First, all individual masking thresholds are calculated, depending on signal level, type of masker (noise or tone) and frequency range. Next, the global masking threshold is determined by adding all individual thresholds and the threshold in quiet (adding the latter ensures that the computed global masking threshold is not below the threshold in quiet). The effects of masking that extends across [[critical band]] boundaries must be included in the calculation. Finally, the global signal-to-mask ratio (SMR) is determined as the ratio of the maximum signal power to the global masking threshold.
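The combination step is easy to get wrong because thresholds are usually quoted in dB. A minimal sketch, assuming the individual thresholds have already been computed: convert each level to power, add them together with the threshold in quiet, convert back to dB, then express the SMR as a dB difference (which is a power ratio).

```python
import math

def db_to_power(level_db):
    return 10.0 ** (level_db / 10.0)

def power_to_db(power):
    return 10.0 * math.log10(power)

def global_threshold_db(individual_thresholds_db, threshold_in_quiet_db):
    """Add the individual thresholds and the threshold in quiet as powers, not as dB."""
    total = sum(db_to_power(t) for t in individual_thresholds_db)
    return power_to_db(total + db_to_power(threshold_in_quiet_db))

def smr_db(max_signal_db, global_threshold):
    """Signal-to-mask ratio: a power ratio, i.e. a difference of dB levels."""
    return max_signal_db - global_threshold
```

Two equal 40 dB thresholds combine to about 43 dB, not 80 dB, and the result never falls below the threshold in quiet.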

;Temporal masking
: In addition to simultaneous [[masking]], two [[time domain]] phenomena also play an important role in human auditory perception: pre-masking and post-masking. These temporal masking effects occur before a masking signal is switched on and after it is switched off, respectively. Pre-masking lasts less than one tenth as long as post-masking (or, as newer results indicate, significantly less than one tenth); post-masking is in the order of 50 to 200 ms. Both pre- and post-masking are exploited in the ISO/MPEG audio coding algorithm.

: The perceptual model uses either a separate [[filterbank]] or combines the calculation of energy values (for the masking calculations) with the main filter bank. Its output consists of values for the masking threshold or the allowed noise for each coder partition. If the quantization noise can be kept below the masking threshold, the compressed result should be indistinguishable from the original signal.

;[[ATH]]

;[[Masking]] threshold
: Masking raises the threshold of hearing, and compressors take advantage of this effect by raising the noise floor, which allows the audio waveform to be expressed with fewer bits. The noise floor can only be raised at [[frequency|frequencies]] at which there is effective masking.

: The equal widths of the [[subband]]s do not accurately reflect the human auditory system's frequency dependent behavior. The width of a "[[critical band]]" as a function of frequency is a good indicator of this behavior. Many psychoacoustic effects are consistent with a critical band frequency scaling. For example, both the perceived loudness of a signal and its audibility in the presence of a masking signal differ for signals within one critical band compared with signals that extend over more than one critical band. Figure 2 compares the polyphase filter [[bandwidth]]s with the width of these critical bands. At lower frequencies a single subband covers several critical bands.

==== Simplified overview of the psychoacoustic model ====
* Perform a 1024-sample [[FFT]] on each half of a frame (1152 samples) of the input signal, selecting the lower of the two masking thresholds to use for each subband.
* Each frequency bin is mapped to its corresponding critical band.
* Calculate a tonality index, a measure of whether a signal is more tone-like or noise-like.
* Use a defined spreading function to calculate the masking effect of the signal on neighbouring [[critical band]]s.
* Calculate the final masking threshold for each subband, using the tonality index, the output of the spreading function, and the [[ATH]].
* Calculate the signal-to-mask ratio for each [[subband]] and pass the information on to the [[quantize|quantizer]].

==== More detailed overview of the psychoacoustic model ====
The MPEG/audio algorithm compresses the audio data in large part by removing the acoustically irrelevant parts of the audio signal. That is, it takes advantage of the human auditory system's inability to hear quantization noise under conditions of auditory masking. This masking is a perceptual property of the human auditory system that occurs whenever the presence of a strong audio signal makes a temporal or spectral neighborhood of weaker audio signals imperceptible. A variety of psychoacoustic experiments corroborate this masking phenomenon.

Empirical results also show that the human auditory system has a limited, [[frequency]] dependent, resolution. This frequency dependency can be expressed in terms of critical band widths which are less than 100Hz for the lowest audible frequencies and more than 4kHz at the highest. The human auditory system blurs the various signal components within a critical band although this system's frequency selectivity is much finer than a critical band.
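The text does not give a formula for critical bandwidth, but a widely used analytic approximation is that of Zwicker and Terhardt, sketched below together with their Hz-to-Bark mapping. These approximations are chosen for illustration and are not the tables used inside the MPEG psychoacoustic models. They give roughly 100 Hz at the lowest frequencies and a little over 4 kHz near 16 kHz.

```python
import math

def hz_to_bark(freq_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def critical_bandwidth_hz(freq_hz):
    """Approximate critical bandwidth in Hz around freq_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69

for f in (100, 1000, 4000, 16000):
    print(f"{f:6d} Hz: {hz_to_bark(f):5.2f} Bark, "
          f"critical bandwidth ~{critical_bandwidth_hz(f):6.0f} Hz")
```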

The psychoacoustic model analyzes the audio signal and computes the amount of noise [[masking]] available as a function of frequency. The masking ability of a given signal component depends on its frequency position and its loudness. The encoder uses this information to decide how best to represent the input audio signal with its limited number of code bits. The MPEG/audio standard provides two example implementations of the psychoacoustic model.

Below is a general outline of the basic steps involved in the psychoacoustic calculations for either model. Differences between the two models will be highlighted.

* Time align audio data. There is one psychoacoustic evaluation per frame. The audio data sent to the psychoacoustic model must be concurrent with the audio data to be coded. The psychoacoustic model must account for both the delay of the audio data through the [[filterbank]] and a data offset so that the relevant data is centered within the psychoacoustic analysis window.
* Convert audio to a [[frequency]] domain representation. The psychoacoustic model should use a separate, independent, time-to-frequency mapping instead of the polyphase filter bank because it needs finer frequency resolution for an accurate calculation of the masking thresholds.

Layers II and III use a 1,152 sample frame size, so a 1,024 sample window does not provide complete coverage. While ideally the analysis window should completely cover the samples to be coded, a 1,024 sample window is a reasonable compromise. Samples falling outside the analysis window generally will not have a major impact on the psychoacoustic evaluation.

For Layers II and III, the model computes two 1,024 point psychoacoustic calculations for each frame. The first calculation centers the first half of the 1,152 samples in the analysis window and the second calculation centers the second half. The model combines the results of the two calculations by using the higher of the two signal-to-mask ratios for each [[subband]]. This in effect selects the lower of the two noise masking thresholds for each subband.

* Process spectral values in groupings related to critical band widths. To simplify the psychoacoustic calculations, both models process the frequency values in perceptual quanta.

Psychoacoustic model 2 never actually separates tonal and non-tonal components. Instead, it computes a tonality index as a function of frequency. This index gives a measure of whether the component is more tone-like or noise-like. Model 2 uses this index to interpolate between pure tone-masking-noise and noise-masking-tone values. The tonality index is based on a measure of predictability. Model 2 uses data from the previous two analysis windows to predict, via linear extrapolation, the component values for the current window. Tonal components are more predictable and thus will have higher tonality indices. Because this process draws on more data, it is likely to discriminate better between tonal and non-tonal components than the model 1 method.

* Apply a spreading function. The [[masking]] ability of a given signal spreads across its surrounding [[critical band]]. The model determines the noise masking thresholds by first applying an empirically determined masking (model 1) or spreading function (model 2) to the signal components.

* Set a lower bound for the threshold values. Both models include an empirically determined absolute masking threshold, the threshold in quiet. This threshold is the lower bound on the audibility of sound.
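As an illustration of the lower bound, the sketch below assumes Terhardt's well-known approximation of the threshold in quiet (in dB SPL, frequency in kHz). The MPEG/audio standard itself specifies tabulated values, so treat this curve as a stand-in. Each masking threshold is simply raised to the threshold in quiet wherever it falls below it.

```python
import math

def threshold_in_quiet_db(f_hz: float) -> float:
    """Terhardt's approximation of the absolute hearing threshold in dB SPL (assumed model)."""
    f = f_hz / 1000.0  # the formula expects kHz; it is undefined at f = 0
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

def apply_lower_bound(freqs_hz, masking_db):
    """Use the threshold in quiet as a floor for each masking threshold (both in dB)."""
    return [max(m, threshold_in_quiet_db(f)) for f, m in zip(freqs_hz, masking_db)]

if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000):
        print(f"{f:>6} Hz: {threshold_in_quiet_db(f):6.1f} dB SPL")
```

The curve dips below 0 dB SPL around 3 to 4 kHz, where hearing is most sensitive, and rises steeply towards low and very high frequencies.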

* Find the masking threshold for each [[subband]]. Model 2 selects the minimum of the masking thresholds covered by the subband only where the band is wide relative to the critical band in that [[frequency]] region. It uses the average of the masking thresholds covered by the subband where the band is narrow relative to the critical band. Model 2 is not less accurate for the higher frequency subbands because it does not concentrate the non-tonal components.
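The selection rule can be written down directly. The 32 polyphase subbands each span one sixty-fourth of the sampling rate (750 Hz at 48 kHz), so at low frequencies, where critical bands are narrow, a subband is wide relative to the critical band, and at high frequencies it is narrow. In this minimal sketch the function name and the boolean flag are hypothetical, and deciding whether a subband counts as wide is left to the caller. Thresholds are assumed to be linear energies, so the average is taken in the energy domain.

```python
def subband_threshold(partition_thresholds, subband_is_wide: bool) -> float:
    """Single masking threshold (linear energy) for one subband.

    partition_thresholds: thresholds of the partitions covered by the subband.
    subband_is_wide: True if the subband is wide relative to the local critical band.
    """
    if not partition_thresholds:
        raise ValueError("subband covers no partitions")
    if subband_is_wide:
        return min(partition_thresholds)  # conservative: keep the lowest threshold
    return sum(partition_thresholds) / len(partition_thresholds)
```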

* Calculate the signal-to-mask ratio. The psychoacoustic model computes the signal-to-mask ratio as the ratio of the signal energy within the subband (or, for Layer III, a group of bands) to the minimum masking threshold for that subband. The model passes this value to the bit (or noise) allocation section of the encoder.
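A minimal sketch of this ratio, expressed in decibels as encoders commonly report it (the dB convention and the function name are assumptions here). The larger the SMR, the more precisely the subband must be quantized for the quantization noise to stay under the masking threshold.

```python
import math

def smr_db(signal_energy: float, min_masking_threshold: float) -> float:
    """Signal-to-mask ratio of one subband (or Layer III band group), in dB."""
    if signal_energy <= 0.0 or min_masking_threshold <= 0.0:
        raise ValueError("energies must be positive")
    return 10.0 * math.log10(signal_energy / min_masking_threshold)
```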

==== Model 2 technical details ====


The psychoacoustic model calculates just-noticeable distortion (JND) profiles for each band in the [[filterbank]]. This noise level is used to determine the actual quantizers and quantizer levels. There are two psychoacoustic models defined by the standard. They can be applied to any layer of the MPEG/Audio algorithm. In practice, however, Model 1 has been used for Layers I and II and Model 2 for Layer III. Both models compute a signal-to-mask ratio (SMR) for each band (Layers I and II) or group of bands (Layer III).

The more sophisticated of the two, Model 2, will be discussed. The steps leading to the computation of the JND profiles are outlined below.

;1. Time-align audio data

The psychoacoustic model must estimate the [[masking]] thresholds for the audio data that are to be [[quantize|quantized]]. So, it must account for both the delay through the filterbank and a data offset so that the relevant data is centered within the psychoacoustic analysis window. For the Layer III algorithm, time-aligning the psychoacoustic model with the filterbank demands that the data fed to the model be delayed by 768 samples.

;2. Spectral analysis and normalization.

A high-resolution spectral estimate of the time-aligned data is essential for an accurate estimation of the masking thresholds in the [[critical band]]s. The low frequency resolution of the filterbank leaves no option but to compute an independent time-to-frequency mapping via a fast Fourier transform ([[FFT]]). A Hann (Hanning) window is applied to the data to reduce the edge effects of the transform window.
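The windowing and transform step can be sketched as follows. For clarity this uses a naive O(N²) DFT rather than an FFT (in practice a library routine such as NumPy's <code>numpy.fft.rfft</code> would be used), and it assumes a periodic Hann window; the exact window definition in the standard may differ slightly.

```python
import cmath
import math

def hann(n: int):
    """Periodic Hann window of length n (assumed form)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]

def power_spectrum(block):
    """Power of DFT bins 0..N/2 of a Hann-windowed block (naive O(N^2) DFT)."""
    n = len(block)
    windowed = [s * w for s, w in zip(block, hann(n))]
    powers = []
    for k in range(n // 2 + 1):
        bin_value = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        powers.append(abs(bin_value) ** 2)
    return powers

if __name__ == "__main__":
    n = 64
    tone = [math.sin(2.0 * math.pi * 8 * t / n) for t in range(n)]
    spectrum = power_spectrum(tone)
    print(max(range(len(spectrum)), key=spectrum.__getitem__))  # 8
```

A tone placed exactly on bin 8 peaks at bin 8, with the Hann window spreading some energy into bins 7 and 9.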

Layer III operates on 1152-sample data frames. Model 2 uses a 1024-point window for spectral estimation. Ideally, the analysis window should completely cover the samples to be coded. The model computes two 1024-point psychoacoustic calculations. On the first pass, the first 576 samples are centered in the analysis window. The second pass centers the remaining samples. The model combines the results of the two calculations by using the more stringent of the two JND estimates for bit or noise allocation in each [[subband]].
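The placement of the two analysis windows can be worked out from the frame and window lengths. This sketch assumes each 1024-sample window is centred on one 576-sample half of the frame and measures offsets from the first sample of the frame, ignoring the 768-sample alignment delay; a negative offset means the window reaches into earlier samples. The two results are combined by keeping the larger SMR per subband, which corresponds to the lower, more stringent masking threshold. The function names are hypothetical.

```python
FRAME_SAMPLES = 1152   # Layer III frame
WINDOW_SAMPLES = 1024  # model 2 FFT length

def window_start_offsets(frame=FRAME_SAMPLES, window=WINDOW_SAMPLES):
    """Offsets of two analysis windows, each centred on one half of the frame."""
    half = frame // 2
    offsets = []
    for k in range(2):
        centre = k * half + half // 2  # middle of the k-th 576-sample half
        offsets.append(centre - window // 2)
    return offsets

def combine_passes(smr_first, smr_second):
    """Per subband, keep the larger SMR (i.e. the lower masking threshold)."""
    return [max(a, b) for a, b in zip(smr_first, smr_second)]

if __name__ == "__main__":
    print(window_start_offsets())                    # [-224, 352]
    print(combine_passes([3.0, 10.0], [5.0, 2.0]))  # [5.0, 10.0]
```

The windows span samples -224 to 799 and 352 to 1375, so together they cover the whole frame even though neither does alone.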

Since playback levels are unknown, the sound-pressure level (SPL) needs to be normalized. This implies clamping the lowest point in the absolute threshold of hearing curves to +/- 1-bit amplitude.
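One way to read this normalization is sketched below: the threshold-in-quiet curve is shifted so that its lowest point coincides with the level of a signal whose amplitude is ±1 least significant bit. The full-scale reference of 2^(bits-1) for signed PCM, the use of dBFS and the function names are assumptions made for illustration.

```python
import math

def one_lsb_level_dbfs(bits: int = 16) -> float:
    """Level of a +/-1 LSB amplitude relative to a full scale of 2**(bits-1)."""
    return 20.0 * math.log10(1.0 / 2 ** (bits - 1))

def normalize_threshold_in_quiet(curve_db, bits: int = 16):
    """Shift a threshold-in-quiet curve so its lowest point sits at the 1-LSB level."""
    offset = one_lsb_level_dbfs(bits) - min(curve_db)
    return [level + offset for level in curve_db]

if __name__ == "__main__":
    print(round(one_lsb_level_dbfs(16), 2))  # -90.31
```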

;3. Grouping of spectral values into threshold calculation partitions.

The uniform [[frequency]] decomposition and poor selectivity of the filterbank do not reflect the response of the ear's Basilar Membrane. To accurately model the masking phenomenon characteristic of the Basilar Membrane, the spectral values are grouped into a large number of partitions. The exact number of threshold partitions depends on the choice of sampling rate. This transformation provides a resolution of approximately either 1 FFT line or 1/3 critical band, whichever is wider. At low frequencies, a single line of the FFT will constitute a partition, while at high [[frequency|frequencies]] many lines are grouped into one.
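The grouping can be approximated in code. The sketch below is not the standard's partition table, which is tabulated per sampling rate. It assumes the Zwicker and Terhardt critical-bandwidth approximation and makes each partition about as wide as one FFT line or one third of a critical band, whichever is wider, so that low-frequency partitions contain a single line and high-frequency ones contain many.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Assumed Zwicker & Terhardt approximation of the critical bandwidth."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

def make_partitions(sample_rate: int = 48000, n_fft: int = 1024):
    """Group FFT lines 0..n_fft/2 into (first_line, last_line) partitions, inclusive."""
    line_hz = sample_rate / n_fft
    n_lines = n_fft // 2 + 1
    partitions = []
    start = 0
    while start < n_lines:
        target_hz = max(line_hz, critical_bandwidth_hz(start * line_hz) / 3.0)
        width = max(1, round(target_hz / line_hz))
        end = min(start + width, n_lines) - 1
        partitions.append((start, end))
        start = end + 1
    return partitions

if __name__ == "__main__":
    parts = make_partitions()
    print(len(parts), parts[:3], parts[-1])
```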

;4. Estimation of tonality indices.

It is necessary to identify tonal and non-tonal (noise-like) components because the masking abilities of the two types of signals differ. Model 2 does not explicitly separate tonal and non-tonal components. Instead, it computes a tonality index as a function of frequency. This is an indicator of the tone-like or noise-like nature of the spectral component. The tonality index is based on a measure of predictability. Linear extrapolation is used to predict the component values of the current window from the previous two analysis windows. Model 2 uses this index to interpolate between pure tone-masking-noise and noise-masking-tone values. Tonal components are more predictable and thus have a higher tonality index. As this process has memory, it is more likely to discriminate better between tonal and non-tonal components, unlike psychoacoustic Model 1.
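The predictability measure can be sketched for a single spectral line. Magnitude and phase are extrapolated linearly from the two previous windows, the distance to the actual value gives an unpredictability c between 0 and 1, and c is mapped to a tonality index. The mapping constants (-0.299 and -0.43) are those commonly quoted for ISO/IEC 11172-3 psychoacoustic model 2 and should be checked against the standard. The standard also averages c over each partition, weighted by energy, which this per-line sketch omits.

```python
import cmath
import math

def unpredictability(prev2: complex, prev1: complex, current: complex) -> float:
    """Unpredictability c (0 = perfectly predicted, 1 = unpredictable) of one FFT line.

    Magnitude and phase are linearly extrapolated from the same line in the
    two previous analysis windows.
    """
    r_hat = 2.0 * abs(prev1) - abs(prev2)
    phi_hat = 2.0 * cmath.phase(prev1) - cmath.phase(prev2)
    predicted = complex(r_hat * math.cos(phi_hat), r_hat * math.sin(phi_hat))
    denominator = abs(current) + abs(r_hat)
    if denominator == 0.0:
        return 0.0  # silent line: nothing to predict (arbitrary choice)
    return abs(current - predicted) / denominator

def tonality_index(c: float) -> float:
    """Map unpredictability to a tonality index clipped to [0, 1] (1 = tone-like)."""
    if c <= 0.0:
        return 1.0
    return min(1.0, max(0.0, -0.299 - 0.43 * math.log(c)))
```

A steadily rotating phasor (a stationary sinusoid) is predicted almost exactly and scores close to 1, while a line whose phase jumps unpredictably scores 0.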

;5. Simulation of the spread of masking on the Basilar Membrane.

A strong signal component affects the audibility of weaker components in the same critical band and the adjacent bands. Model 2 simulates this phenomenon by applying a spreading function to spread the energy of any critical band into its surrounding bands. On the [[Bark]] scale, the spreading function has a constant shape as a function of partition number, with slopes of +25 and –10 dB per Bark.
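The spreading step can be approximated with a triangular function on the Bark scale, using the slopes quoted above. This is a simplification for illustration: the spreading function in the standard has a smoother shape, and the function names here are hypothetical. Contributions from every band are summed as energies, not as dB values.

```python
import math

def spreading_db(dz_bark: float, lower_slope: float = 25.0, upper_slope: float = 10.0) -> float:
    """Simplified triangular spreading function in dB (0 dB at the masker).

    dz_bark is the maskee position minus the masker position, in Bark.
    Below the masker the curve rises by lower_slope dB per Bark towards it;
    above the masker it falls by upper_slope dB per Bark.
    """
    if dz_bark < 0:
        return lower_slope * dz_bark
    return -upper_slope * dz_bark

def spread_energy_db(band_levels_db, band_barks):
    """Spread every band's level into all bands, summing contributions as energies."""
    spread = []
    for z_maskee in band_barks:
        total = sum(10.0 ** ((level + spreading_db(z_maskee - z_masker)) / 10.0)
                    for level, z_masker in zip(band_levels_db, band_barks))
        spread.append(10.0 * math.log10(total))
    return spread
```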

;6. Set a lower bound for the threshold values.

An empirically determined absolute [[masking]] threshold, the threshold in quiet, is used as a lower bound on the audibility of sound.

;7. Determination of masking threshold per [[subband]].

At low [[frequency|frequencies]], the minimum of the masking thresholds within a subband is chosen as the threshold value. At higher frequencies, the average of the thresholds within the subband is selected as the masking threshold. Model 2 has the same accuracy for the higher subbands as for low frequency ones because it does not concentrate non-tonal components.

;8. [[Pre echo]] detection and window switching decision.

;9. Calculation of the signal-to-mask ratio (SMR).

SMR is calculated as a ratio of signal energy within the subband (for Layers I and II) or a group of subbands (Layer III) to the minimum threshold for that subband. This is the final output of the psychoacoustic model.

The masking threshold is computed from the spread energy and the tonality index.
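The combination can be sketched as follows: the tonality index interpolates the required offset between a tone-masking-noise value (TMN) and a noise-masking-tone value (NMT), and the threshold is the spread energy lowered by that offset. The default offsets of 18 dB and 6 dB are illustrative placeholders, not values quoted from the standard, and the per-partition minimum the standard also applies is omitted.

```python
def required_snr_db(tonality: float, tmn_db: float = 18.0, nmt_db: float = 6.0) -> float:
    """Offset (dB) between spread energy and threshold, interpolated by tonality."""
    return tonality * tmn_db + (1.0 - tonality) * nmt_db

def masking_threshold(spread_energy: float, tonality: float,
                      tmn_db: float = 18.0, nmt_db: float = 6.0) -> float:
    """Masking threshold (linear energy) of one partition."""
    offset_db = required_snr_db(tonality, tmn_db, nmt_db)
    return spread_energy * 10.0 ** (-offset_db / 10.0)
```

A tonal masker (index near 1) therefore yields a lower threshold than a noise-like masker of the same energy, reflecting that tones mask noise less effectively than noise masks tones.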

== Pros and cons ==
=== Pros ===
* Widespread acceptance, support in nearly all hardware audio players and devices
* An [[ISO]] standard, part of MPEG specs
* Fast decoding, lower complexity than [[Advanced Audio Coding|AAC]] or [[Vorbis]]
* Anyone can create their own implementation (Specs and demo sources available)
* Relaxed licensing schedule

=== Cons ===
* Lower performance/efficiency than modern codecs.
* Problem cases that trip up all transform codecs.
* The maximum bitrate (320kbps) is sometimes not enough.
* Unusable for high definition audio (sampling rates higher than 48kHz).

== See also ==
=== Techniques used in compression ===
* [[Huffman coding]]
* [[Quantization]]
* [[Joint stereo|M/S matrixing]]
* [[Intensity stereo]]
* [[Channel coupling]]
* Modified discrete cosine transform ([[MDCT]])
* Polyphase filter bank

There is a non-standardized form of MP3 called [[MP3Pro]], which takes advantage of [[SBR]] encoding to provide better quality at low bitrates.

=== Encoders/decoders (supported platforms) ===
* [[LAME]] (Win32/Posix)
* [[Audioactive]] (Win32)
* [[Blade]] (Win32/Posix)
* [[Xing]] (Win32)
* [[Gogo]] (Win32/Posix)

=== Metadata (tags) ===
* [[ID3v1]]
* [[ID3v1.1]]
* [[ID3v2]]

== Further reading and bibliography ==
* [[Best MP3 Decoder]]
* [[High-frequency content in MP3s]]

== External links ==
* <s>Roberto's listening test</s> featuring MP3 encoders
* [http://en.wikipedia.org/wiki/Mp3 MP3 at Wikipedia]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:MP3]]

Opus

2013-01-13T17:42:33Z

Dynamic: /* Music encoding quality */ info that mono/stereo column is not definitive

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = exp_analysis7
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) and made especially suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} though it is also very competitive for use as a storage and playback format. Opus is an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} and a high-quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general-purpose and embedded (fixed-point) processors. Many software patents that cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition between high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it viable as a storage & playback format too. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers.{{ref|RFC|c}}

Considerably more detail on the history and potential applications of Opus is included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''.

==Characteristics==
Opus supports bitrates from 6kbps to 510kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against internal constant-quality targets so that, over a typical music collection, the average bitrate comes very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, Helix MP3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is used and the bitrate falls where it may. Future versions can be expected to offer better quality at the same setting. Independent implementations may adopt a different approach.
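Because libopus calibrates its VBR mode so that the average bitrate over a typical collection lands close to the target, the target can be used to estimate storage needs. The sketch below is a back-of-the-envelope calculation that ignores container overhead (e.g. Ogg page headers and tags) and per-file deviation from the target; the function name is hypothetical.

```python
def estimated_size_mb(target_kbps: float, duration_s: float) -> float:
    """Approximate encoded size in megabytes (10**6 bytes) at an average bitrate."""
    bits = target_kbps * 1000.0 * duration_s
    return bits / 8.0 / 1_000_000.0

if __name__ == "__main__":
    # One hour of stereo music at a 96 kbps target
    print(f"{estimated_size_mb(96, 3600):.1f} MB")  # 43.2 MB
```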

Opus is able to seamlessly adapt its mode of operation without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page), which can be particularly useful for mixed-content audio or varying network conditions, making the unified Opus codec superior to a suite of different codecs that might otherwise cover the same range of bitrate and quality settings and would require out-of-band signalling to instigate codec switching. The switching includes the choice of mono, stereo and other channel mappings, the use of the speech-oriented SILK layer, the general-purpose CELT layer or the hybrid of both, and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz) as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.
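The five audio bandwidths correspond to the named bandwidths defined in RFC 6716, each with an effective internal sampling rate: narrowband (4 kHz audio, 8 kHz), medium-band (6 kHz, 12 kHz), wideband (8 kHz, 16 kHz), super-wideband (12 kHz, 24 kHz) and fullband (20 kHz, 48 kHz). The lookup below is a hypothetical helper, not part of the libopus API, that picks the narrowest Opus bandwidth covering a given audio bandwidth.

```python
# (name, audio bandwidth in kHz, effective sampling rate in kHz), per RFC 6716
OPUS_BANDWIDTHS = [
    ("narrowband", 4, 8),
    ("medium-band", 6, 12),
    ("wideband", 8, 16),
    ("super-wideband", 12, 24),
    ("fullband", 20, 48),
]

def narrowest_bandwidth(audio_khz: float):
    """Return (name, sampling rate in kHz) of the narrowest bandwidth covering audio_khz."""
    for name, bandwidth_khz, rate_khz in OPUS_BANDWIDTHS:
        if audio_khz <= bandwidth_khz:
            return name, rate_khz
    raise ValueError("Opus audio bandwidth is limited to 20 kHz")
```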

Of importance mainly to interactive uses, but potentially useful in time-delayed audio streaming as well, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes, where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and critical frames (e.g. consonant sounds) can be redundantly encoded at a low bitrate in the following packet to preserve intelligibility.
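The benefit of in-band FEC can be illustrated with a toy receiver model. It assumes, as a simplification, that each packet carries its own frame plus a low-bitrate redundant copy of the previous frame, so a lost frame can be recovered if the next packet arrives, and otherwise falls back to PLC. This is not the libopus API, and it ignores the fact that using FEC means waiting for the following packet.

```python
def reconstruction_modes(lost):
    """Classify each frame as 'normal', 'fec' or 'plc' under a toy in-band FEC model.

    lost[n] is True when packet n never arrives. Packet n is assumed to carry
    frame n plus a low-bitrate copy of frame n-1.
    """
    modes = []
    for n, is_lost in enumerate(lost):
        if not is_lost:
            modes.append("normal")
        elif n + 1 < len(lost) and not lost[n + 1]:
            modes.append("fec")  # redundant copy arrives in the next packet
        else:
            modes.append("plc")  # nothing received for this frame: conceal
    return modes

if __name__ == "__main__":
    print(reconstruction_modes([False, True, False, True, True, False]))
    # ['normal', 'fec', 'normal', 'plc', 'fec', 'normal']
```

Isolated losses are recovered from the redundant copy, while bursts of two or more losses still need concealment for all but the last lost frame.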

For music and general audio, the CELT layer of Opus builds on knowledge gained during xiph.org's Vorbis development and has as a primary goal the preservation of the total energy in each spectral band, while requiring only a modest bitrate overhead to achieve this, thereby eliminating many bitrate-starvation artifacts such as the 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0 ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a higher bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.
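The idea of preserving band energy can be illustrated with a toy coder: each band's energy is transmitted separately, the normalized spectral shape is quantized very coarsely, and the decoder rescales whatever shape it receives to the transmitted energy. This is only a sketch of the principle. CELT's actual shape quantizer (pyramid vector quantization), band layout and energy quantization are not modelled, and the flat fill used when the shape quantizes to zero merely stands in for the noise filling a real codec would use.

```python
import math

def encode_band(band, levels=2):
    """Return the band energy and a crudely quantized (integer) shape."""
    energy = sum(c * c for c in band)
    norm = math.sqrt(energy) or 1.0  # avoid dividing by zero for silent bands
    shape = [round(c / norm * levels) for c in band]
    return energy, shape

def decode_band(energy, shape):
    """Rescale the received shape so the band has exactly the transmitted energy."""
    if not any(shape):
        shape = [1] * len(shape)  # flat fill instead of silence (simplification)
    shape_norm = math.sqrt(sum(v * v for v in shape))
    gain = math.sqrt(energy) / shape_norm
    return [gain * v for v in shape]

if __name__ == "__main__":
    spectrum = [0.9, -0.1, 0.05, 0.3, 0.3, -0.2, 0.01, 0.02]
    for band in (spectrum[0:3], spectrum[3:8]):
        energy, shape = encode_band(band)
        decoded = decode_band(energy, shape)
        print(f"energy in {energy:.4f} out {sum(x * x for x in decoded):.4f} shape {shape}")
```

However coarse the shape, the decoded band carries the same energy as the original, so a starved band sounds rough rather than dropping out.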

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
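In the Ogg Opus encapsulation (later standardised as RFC 7845), this gain is carried in the identification header as a signed 16-bit value in Q7.8 fixed point, i.e. in units of 1/256 dB, which players apply to the decoded output. The sketch below converts a raw field value to a linear factor; the function names are hypothetical, and reading the header itself is not shown.

```python
def output_gain_db(raw_field: int) -> float:
    """Interpret a 16-bit output-gain field (two's complement, Q7.8) as dB."""
    if not 0 <= raw_field <= 0xFFFF:
        raise ValueError("the field is 16 bits wide")
    if raw_field >= 0x8000:  # negative values in two's complement
        raw_field -= 0x10000
    return raw_field / 256.0

def gain_factor(gain_db: float) -> float:
    """Linear amplitude factor for a gain in dB."""
    return 10.0 ** (gain_db / 20.0)
```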

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps to medium-band, wideband and superwideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As the bitrate increases, the hybrid mode is adopted, extending the bandwidth first to 12kHz (comparable with compact cassette) and then to the full 20kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens while the bandwidth is still 12kHz, before it widens to 20kHz.

==Indicative bitrate and quality==
The table below gives illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic detection of speech/music and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate tradeoff. Thus changes are likely, and this table is likely to require small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typ 48 kHz sampling rate) but mentions stereo compatibility for 40kbps+. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typ 48 kHz sampling rate). Opus will automatically use mono at very low bitrates, although, depending on the content, some stereo encoding can still be used even where the table below lists mono as the typical stereo mode.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics.
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (cf. mono cassette)
|Good for podcasts and audiobooks; CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (cf. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 for default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum-latency high-quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum-latency mode.
|-
!>510 kbps
| -
| -
| -
|Above Opus bitrate range allowed for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, total bandwidth used will be subject to packet overhead. The more packet headers that are transmitted every second, the greater will be the overhead that is required. For this reason, Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames at the expense of greater latency, which may still be acceptable for speech, and also supports 10.0ms SILK frames to reduce latency somewhat at the expense of packet overhead.
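The effect of frame size on overhead is simple arithmetic: one packet per frame, each carrying a fixed header. The sketch assumes 40 bytes of headers per packet (IPv4 20 + UDP 8 + RTP 12, without header compression or extensions); real figures depend on the transport, and IPv6 headers, for example, are larger.

```python
def header_overhead_kbps(frame_ms: float, header_bytes: int = 40) -> float:
    """Header overhead in kbit/s when every frame is sent in its own packet."""
    packets_per_second = 1000.0 / frame_ms
    return packets_per_second * header_bytes * 8 / 1000.0

if __name__ == "__main__":
    for frame_ms in (60.0, 20.0, 10.0, 5.0, 2.5):
        print(f"{frame_ms:4.1f} ms frames: {header_overhead_kbps(frame_ms):6.1f} kbps overhead")
```

Under these assumptions, 20 ms frames cost 16 kbps of headers and 2.5 ms frames cost 128 kbps, more than the payload bitrate of any row in the CELT latency table below.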

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible; these lower the latency but directly increase the overhead by transmitting more packets per second. In addition, as shown below, they also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer is able to modify its encoding block lengths to enable use with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use smaller transform block sizes, which reduces the frequency resolution of the MDCT compared to the default transform window and thus reduces encoding efficiency for tonal signals. To compensate for the coarser frequency precision of shorter transform windows, greater amplitude precision is needed, resulting in an increased bitrate for the same perceptual quality (or, conversely, lower quality at the same bitrate).
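The loss of frequency resolution can be quantified roughly. Assuming a 48 kHz sampling rate and a single long MDCT per frame, a frame of N new samples yields N coefficients spanning 0 to 24 kHz, so halving the frame doubles the spacing between coefficients. CELT's band layout and its option to split a frame into several short transforms are ignored here, and the function name is hypothetical.

```python
def coefficient_spacing_hz(frame_ms: float, sample_rate: int = 48000) -> float:
    """Spacing between MDCT coefficients for one long transform per frame."""
    n = round(sample_rate * frame_ms / 1000.0)  # new samples = number of coefficients
    return (sample_rate / 2.0) / n

if __name__ == "__main__":
    for frame_ms in (20.0, 10.0, 5.0, 2.5):
        print(f"{frame_ms:4.1f} ms: {coefficient_spacing_hz(frame_ms):5.1f} Hz per coefficient")
```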

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms, which the CELT layer requires for MDCT window overlap.

Xiph.org used matched PEAQ scores (an approximate perceptual quality assessment made in software) for the CELT 0.10 codec, which served as the basis of the CELT layer in the Opus reference release; these indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64kbps@22.5ms delay
!Fractional bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
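The table's arithmetic can be checked with a short sketch: the algorithmic delay is the frame size plus the 2.5 ms overlap, and each matched bitrate is the 64 kbps reference increased by the listed percentage. The names are hypothetical; the figures come from the table.

```python
OVERLAP_MS = 2.5       # CELT MDCT window overlap, added to every frame size
REFERENCE_KBPS = 64.0  # matched bitrate at the default 20 ms frame size

def algorithmic_delay_ms(frame_ms: float) -> float:
    return frame_ms + OVERLAP_MS

def matched_bitrate_kbps(increase_percent: float, reference_kbps: float = REFERENCE_KBPS) -> float:
    return reference_kbps * (1.0 + increase_percent / 100.0)

if __name__ == "__main__":
    for frame_ms, increase in [(20.0, 0.0), (10.0, 10.0), (5.0, 32.5), (2.5, 75.0)]:
        print(f"{frame_ms:4.1f} ms frame: {algorithmic_delay_ms(frame_ms):4.1f} ms delay, "
              f"{matched_bitrate_kbps(increase):5.1f} kbps")
```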

N.B. This table is useful for streaming only. For music storage & delayed playback, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries ===
The commandline tools are available pre-compiled for the most popular operating systems at [http://opus-codec.org opus-codec.org]

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced their current codec with Opus. Their iOS app will be the first to be released with Opus; the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses the shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome will support Opus audio as of version 25.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast.
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available in the developer version. This brings Opus playback to a range of portable media players (including some Apple iPod models and SanDisk Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also to Android devices.

=== Player software ===
* VLC media player supports Opus since version 2.0.4
* AIMP supports Opus natively as of version 3.20 build 1125 beta 1.
* [[foobar2000]] supports the format natively as of v1.1.14 beta 1.
* Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
* Android has a number of player apps supporting Opus, including PowerAmp and others.

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2013-01-13T16:06:34Z

Dynamic: /* Music encoding quality */

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = exp_analysis7
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) and made especially suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} though it is also very competitive for use as a storage and playback format. Opus is an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} and a high-quality reference implementation is provided under the 3-clause BSD license{{ref|homepage|a}} which compiles and runs on the vast majority of general-purpose and embedded (fixed-point) processors. Many software patents that cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition between high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared to popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it viable as a storage & playback format too. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers.{{ref|RFC|c}}

Considerably more details of the history and potential applications for Opus are included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''

==Characteristics==
Opus supports bitrates from 6 kbps to 510 kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30 kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against internal constant-quality targets so that, over a typical music collection, the average bitrate comes very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, Helix MP3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is chosen and the bitrate falls where it may. Future versions can be expected to offer better quality at the same setting. Independent implementations may adopt a different approach.
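
As a rough illustration of choosing a bitrate target and rate-control mode with the opus-tools encoder, the sketch below only ''builds'' an <code>opusenc</code> command line rather than running it. It assumes the <code>--bitrate</code>, <code>--vbr</code>, <code>--cvbr</code> and <code>--hard-cbr</code> options of opus-tools (check <code>opusenc --help</code> for the version you have); the helper name <code>opusenc_command</code> is hypothetical, and the 6 to 510 kbps check simply mirrors the stereo range quoted above.

```python
import shlex

# Rate-control modes mapped to opusenc switches (assumed from opus-tools'
# documented options; verify with `opusenc --help`).
RATE_MODE_FLAGS = {
    "vbr": "--vbr",        # unconstrained VBR (the default)
    "cvbr": "--cvbr",      # constrained VBR
    "cbr": "--hard-cbr",   # hard constant bitrate
}


def opusenc_command(infile, outfile, kbps, mode="vbr"):
    """Return an argv list for opusenc (hypothetical helper, not part of opus-tools)."""
    # Range quoted in this article for typical stereo sources.
    if not 6 <= kbps <= 510:
        raise ValueError("stereo bitrate target should be within 6-510 kbps")
    if mode not in RATE_MODE_FLAGS:
        raise ValueError(f"unknown rate mode: {mode!r}")
    return ["opusenc", "--bitrate", f"{kbps:g}", RATE_MODE_FLAGS[mode], infile, outfile]


if __name__ == "__main__":
    # Prints: opusenc --bitrate 96 --vbr in.wav out.opus
    print(shlex.join(opusenc_command("in.wav", "out.opus", 96)))
```

Because libopus calibrates the target bitrate rather than a quality scale, the single <code>--bitrate</code> value is the main knob; the rate mode only decides how strictly the encoder sticks to it from frame to frame.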

Opus is able to adapt its mode of operation seamlessly, without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This is particularly useful for mixed-content audio or varying network conditions, and gives the unified Opus codec an advantage over a suite of different codecs covering the same range of bitrates and quality settings, which would require out-of-band signalling to switch between codecs. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or a hybrid of both; and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.
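
The five audio bandwidths listed above correspond to the named bandwidths of RFC 6716 (narrowband, medium-band, wideband, super-wideband and fullband), each with its own effective sampling rate, and not every layer can code every bandwidth: in the RFC's packet configuration table, SILK-only mode covers NB/MB/WB, hybrid covers SWB/FB, and CELT-only covers NB/WB/SWB/FB. The sketch below encodes those tables; <code>narrowest_bandwidth</code> and <code>modes_for</code> are hypothetical helpers, and picking the narrowest covering bandwidth is an illustration, not how libopus makes its decisions.

```python
# Opus audio bandwidths as defined in RFC 6716:
# name -> (audio bandwidth in Hz, effective sample rate in Hz)
BANDWIDTHS = {
    "NB": (4_000, 8_000),     # narrowband
    "MB": (6_000, 12_000),    # medium-band
    "WB": (8_000, 16_000),    # wideband
    "SWB": (12_000, 24_000),  # super-wideband
    "FB": (20_000, 48_000),   # fullband
}

# Bandwidths each operating mode can signal in the packet TOC byte (RFC 6716).
MODE_BANDWIDTHS = {
    "SILK": {"NB", "MB", "WB"},
    "hybrid": {"SWB", "FB"},
    "CELT": {"NB", "WB", "SWB", "FB"},
}


def narrowest_bandwidth(max_freq_hz):
    """Smallest Opus bandwidth whose upper edge reaches max_freq_hz (illustrative only)."""
    for name, (edge_hz, _rate) in sorted(BANDWIDTHS.items(), key=lambda kv: kv[1][0]):
        if edge_hz >= max_freq_hz:
            return name
    return "FB"  # nothing wider exists, so clamp to fullband


def modes_for(bandwidth):
    """Operating modes (SILK, hybrid, CELT) able to code the given bandwidth."""
    return sorted(mode for mode, bws in MODE_BANDWIDTHS.items() if bandwidth in bws)


if __name__ == "__main__":
    print(narrowest_bandwidth(7_000), modes_for("WB"))  # WB ['CELT', 'SILK']
```

Note that medium-band (6 kHz) is reachable only through SILK, and the hybrid mode exists only for the two widest bandwidths, where SILK handles the lower 8 kHz.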

Of importance mainly to interactive uses, but potentially also useful in time-delayed audio streaming, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and a low-bitrate redundant copy of critical frames (e.g. consonant sounds) can then be carried in the following packet to preserve intelligibility if the original packet is lost.
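
The idea behind this in-band redundancy can be pictured with a toy model, which is neither the libopus API nor the real bitstream: each packet carries its own frame plus a reduced-quality copy of the previous frame, so a single lost packet can be rebuilt from the packet that follows it, while a loss with no usable successor falls back to PLC. Frame contents are just labels here, and <code>make_packets</code> and <code>receive</code> are hypothetical names.

```python
def make_packets(frames):
    """Packet i carries frame i plus a low-bitrate copy of frame i-1 (conceptual FEC)."""
    packets = []
    for i, frame in enumerate(frames):
        redundant = f"lbrr({frames[i - 1]})" if i > 0 else None
        packets.append((frame, redundant))
    return packets


def receive(packets, lost):
    """Reconstruct the frame sequence given the set of lost packet indices.

    A lost frame is recovered from the redundant copy in the next packet if
    that packet arrived; otherwise the decoder falls back to packet loss
    concealment (PLC). Waiting for the next packet costs one frame of delay.
    """
    output = []
    for i, (frame, _redundant) in enumerate(packets):
        if i not in lost:
            output.append(frame)
        elif i + 1 < len(packets) and (i + 1) not in lost:
            output.append(packets[i + 1][1])  # recovered via in-band redundancy
        else:
            output.append("PLC")              # concealment guess
    return output


if __name__ == "__main__":
    pkts = make_packets(["f0", "f1", "f2", "f3"])
    print(receive(pkts, lost={1, 3}))  # ['f0', 'lbrr(f1)', 'f2', 'PLC']
```

The model also shows why FEC helps only against isolated losses: two consecutive lost packets leave the first of them with no surviving copy.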

For music and general audio, the CELT layer of Opus builds on knowledge gained during Xiph.org's Vorbis development and ensures as a primary goal that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this, thereby avoiding many bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.
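
For Ogg Opus files, both the gapless-playback information and the playback gain sit in the 19-byte fixed part of the ''OpusHead'' identification header defined in RFC 7845 (not one of this article's references): ''pre-skip'' is the number of samples, counted at 48 kHz, that a player discards from the start of the decoded stream so that encoder padding is not heard, and ''output gain'' is a signed Q7.8 value in dB that players are expected to apply. The sketch below parses only that fixed part; <code>parse_opus_head</code> is a hypothetical helper and the example header bytes are made up.

```python
import struct

# Fixed 19-byte part of the Ogg Opus identification header (RFC 7845):
# magic, version, channel count, pre-skip, input sample rate,
# output gain (Q7.8 dB), channel mapping family. Little-endian, no padding.
OPUS_HEAD = struct.Struct("<8sBBHIhB")


def parse_opus_head(data):
    """Return the fields of an OpusHead packet as a dict (hypothetical helper)."""
    if len(data) < OPUS_HEAD.size:
        raise ValueError("OpusHead packet too short")
    magic, version, channels, pre_skip, in_rate, gain_q78, family = OPUS_HEAD.unpack_from(data)
    if magic != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    gain_db = gain_q78 / 256.0  # Q7.8 fixed point -> dB
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,                 # samples at 48 kHz to drop at the start
        "input_sample_rate": in_rate,         # informational; does not set playback rate
        "output_gain_db": gain_db,
        "linear_gain": 10 ** (gain_db / 20),  # factor to scale decoded samples by
        "mapping_family": family,
    }


if __name__ == "__main__":
    # Made-up example header: stereo, 312-sample pre-skip, -5 dB output gain.
    head = OPUS_HEAD.pack(b"OpusHead", 1, 2, 312, 44100, -1280, 0)
    info = parse_opus_head(head)
    print(info["pre_skip"], info["output_gain_db"], round(info["linear_gain"], 4))  # 312 -5.0 0.5623
```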

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps through medium-band, wideband and super-wideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As the bitrate increases, the hybrid mode is adopted, extending bandwidth first to 12kHz (comparable with compact cassette) and then to the full 20kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens during the transition from 12kHz to 20kHz.

==Indicative bitrate and quality==
The table below gives illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate trade-off. This table is therefore likely to need small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate) but notes stereo compatibility from 40 kbps upward. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate). Opus will automatically use mono at very low bitrates.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (cf. mono cassette)
|Good for podcasts, audiobooks; CELT-only possible for music. Competitor HE-AAC@32kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (cf. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum latency high quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum latency mode.
|-
!>510 kbps
| -
| -
| -
|Above the Opus bitrate range for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}
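
For scripts that choose encoder settings, the music table above can be turned into data. The sketch below returns the row whose bitrate is the highest one not exceeding the requested target; the values are transcribed from the table, while the "at or below" lookup rule and the name <code>indicative_music_mode</code> are assumptions of this example. libopus itself makes its mode decisions at run time from the signal.

```python
import bisect

# (lowest bitrate of the row in kbps, channels, bandwidth in kHz, typical layer),
# transcribed from the indicative music table; all rows from 48 kbps up are
# stereo, 20 kHz, CELT.
MUSIC_ROWS = [
    (6, "mono", 4, "SILK"),
    (8, "mono", 4, "SILK"),
    (14, "mono", 6, "SILK"),
    (22, "mono", 8, "SILK"),
    (32, "mono", 12, "hybrid"),
    (39, "stereo", 12, "hybrid/CELT"),
    (48, "stereo", 20, "CELT"),
]
_BOUNDS = [row[0] for row in MUSIC_ROWS]


def indicative_music_mode(kbps):
    """Table row at or below the target bitrate (lookup rule is an assumption)."""
    if not 6 <= kbps <= 510:
        raise ValueError("outside the 6-510 kbps stereo range")
    idx = bisect.bisect_right(_BOUNDS, kbps) - 1
    _, channels, bw_khz, layer = MUSIC_ROWS[idx]
    return channels, bw_khz, layer


if __name__ == "__main__":
    print(indicative_music_mode(24))  # ('mono', 8, 'SILK')
    print(indicative_music_mode(96))  # ('stereo', 20, 'CELT')
```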

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used is subject to packet overhead: the more packets transmitted every second, the more header data must be sent. For this reason Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0ms SILK frames to reduce latency somewhat, at the expense of more packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These achieve lower latency by transmitting more packets per second, which directly increases the packet overhead. In addition, as described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.
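
The overhead can be estimated from the packet rate: sending one frame per packet means 1000 / (frame size in ms) packets per second, each carrying its own headers. The sketch below assumes uncompressed IPv4 + UDP + RTP headers (20 + 8 + 12 = 40 bytes per packet) and one Opus frame per packet; IPv6, header compression, link-layer framing or SRTP would all change the figures. The function names are hypothetical.

```python
# Assumed per-packet header cost: IPv4 (20) + UDP (8) + RTP (12) bytes.
DEFAULT_HEADER_BYTES = 20 + 8 + 12


def overhead_kbps(frame_ms, header_bytes=DEFAULT_HEADER_BYTES):
    """Header bitrate in kbps when every frame is sent in its own packet."""
    packets_per_second = 1000 / frame_ms
    return header_bytes * 8 * packets_per_second / 1000


def total_kbps(payload_kbps, frame_ms, header_bytes=DEFAULT_HEADER_BYTES):
    """Opus payload bitrate plus packet header overhead."""
    return payload_kbps + overhead_kbps(frame_ms, header_bytes)


if __name__ == "__main__":
    for ms in (60, 20, 10, 5, 2.5):
        print(f"{ms:>4} ms frames: {overhead_kbps(ms):6.1f} kbps of headers")
```

Under these assumptions, 20 ms frames carry 16 kbps of headers, more than a 12 kbps SILK stream itself, while 60 ms frames cut this to about 5.3 kbps and 2.5 ms CELT frames raise it to 128 kbps.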

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can shorten its encoding block lengths so that it can be used with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use smaller transform block sizes, which reduces the frequency resolution of the MDCT compared with the default transform window and thus reduces encoding efficiency for tonal signals. To represent a sound divided into shorter transform windows with the same fidelity, greater amplitude precision is needed, resulting in a higher bitrate for the same perceptual quality (or, conversely, lower quality at the same bitrate).
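
A rough way to see the frequency-resolution cost: an MDCT that advances by N samples per block produces N coefficients spanning 0 to fs/2, so adjacent coefficients are about fs/(2N) apart. The sketch below assumes a 48 kHz sampling rate and a single transform per frame (N equal to the frame length in samples); CELT's real band layout, window overlap and block switching are ignored, and <code>mdct_bin_spacing_hz</code> is a hypothetical name.

```python
FS_HZ = 48_000  # assumed sampling rate


def mdct_bin_spacing_hz(frame_ms, fs_hz=FS_HZ):
    """Approximate spacing between MDCT coefficients for one transform per frame."""
    n_coeffs = round(fs_hz * frame_ms / 1000)  # hop size = number of coefficients
    return fs_hz / (2 * n_coeffs)


if __name__ == "__main__":
    for ms in (20.0, 10.0, 5.0, 2.5):
        print(f"{ms:>4} ms frame: {mdct_bin_spacing_hz(ms):5.0f} Hz per coefficient")
```

Each halving of the frame doubles the spacing (25 Hz at 20 ms up to 200 Hz at 2.5 ms), so closely spaced tonal components are no longer separated into distinct coefficients, which is in line with the higher matched bitrates Xiph.org reported for shorter frames.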

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms, which the CELT layer requires for MDCT window overlap.

Xiph.org matched PEAQ scores (an approximate perceptual quality assessment made in software) for CELT 0.10, the codec used as the basis of the CELT layer in the Opus reference release. These indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximate equivalent settings] for stereo music.

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate to match 64kbps@22.5ms delay
!Bitrate increase over 20.0 ms frames
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
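
The arithmetic behind the table above can be reproduced directly: the delay column is the frame size plus the 2.5 ms overlap, and the last column is each matched bitrate divided by the 64 kbps reference, minus one. The values are copied from the table and nothing here re-measures quality; the function names are hypothetical.

```python
# PEAQ-matched bitrates (kbps) for stereo music, copied from the CELT 0.10 table.
MATCHED_KBPS = {20.0: 64.0, 10.0: 70.4, 5.0: 84.8, 2.5: 112.0}
OVERLAP_MS = 2.5           # CELT MDCT window overlap added to every frame
REFERENCE_FRAME_MS = 20.0  # default frame size used as the baseline


def algorithmic_delay_ms(frame_ms):
    """Frame size plus the 2.5 ms overlap, as stated in this article."""
    return frame_ms + OVERLAP_MS


def bitrate_increase_pct(frame_ms):
    """Extra bitrate needed relative to the default 20 ms frame, in percent."""
    reference = MATCHED_KBPS[REFERENCE_FRAME_MS]
    return round((MATCHED_KBPS[frame_ms] / reference - 1) * 100, 1)


if __name__ == "__main__":
    for frame_ms, kbps in MATCHED_KBPS.items():
        print(f"{frame_ms:4.1f} ms frame | {algorithmic_delay_ms(frame_ms):4.1f} ms delay | "
              f"{kbps:5.1f} kbps | +{bitrate_increase_pct(frame_ms):.1f} %")
```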

N.B. This table is relevant only to live, latency-sensitive streaming. For music storage and delayed playback, latency reduction is not important and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

This section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept up to date and provides links to further information about the supporting platforms.

The format and algorithms are openly documented and the reference implementation is published as free software. The reference implementation (Opus Audio Tools, opus-tools), consisting of separate encoders and decoders, is published under the terms of a BSD-like license. It is written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The accompanying diagnostic tool opusinfo reports detailed technical information about Opus files, including information on the standard compliance of the bitstream format. It is based on ogginfo from vorbis-tools and is therefore, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries ===
The commandline tools are available pre-compiled for the most popular operating systems at [http://opus-codec.org opus-codec.org]

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its previous codec with Opus. Its iOS app will be the first to be released with Opus; the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, plus SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome will support Opus audio as of version 25.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast.
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available in the developer version. This brings Opus playback to a range of portable media players (including some Apple iPod models and SanDisk Sansa, iriver and Archos devices) and, with "Rockbox as an Application" (RaaA), also to Android devices.

=== Player software ===
* VLC media player has supported Opus since version 2.0.4
* AIMP supports Opus natively as of version 3.20 build 1125 beta 1.
* [[foobar2000]] supports the format natively as of v1.1.14 beta 1.
* Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
* Android has a number of player apps supporting Opus, including PowerAmp and others.

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2013-01-13T01:10:51Z

Dynamic: /* Characteristics */

{{Software Infobox
| name = Opus
| logo = [[Image:opus-logo.png|250px|Official Opus logo]]
| screenshot =
| caption = Opus Interactive Audio Codec
| maintainer = [http://xiph.org/ Xiph.Org Foundation]
| stable_release = 1.0.2
| preview_release = exp_analysis7
| operating_system = Windows, Mac OS/X, Linux/BSD
| use = Encoder/Decoder
| license = 3-clause BSD license
| website = [http://www.opus-codec.org/ opus-codec.org]
}}

'''Opus''' is a [[lossy]] audio compression format developed by the Internet Engineering Task Force (IETF) and made especially suitable for interactive real-time applications over the Internet,{{ref|homepage|a}} though it is also very competitive for use as a storage and playback format. Opus is an open format standardised through [http://tools.ietf.org/html/rfc6716 Request for Comments (RFC) 6716],{{ref|RFC|c}} and a high-quality reference implementation is provided under the 3-clause BSD license,{{ref|homepage|a}} which compiles and runs on the vast majority of general-purpose and embedded (fixed-point) processors. Many software patents which cover Opus are licensed under royalty-free terms.{{ref|FAQ|b}} Opus is also a Mandatory To Implement (MTI) codec for the upcoming WebRTC (Web Real Time Communication) specification of the World Wide Web Consortium (W3C).

Opus incorporates technology from two codecs, the speech-oriented SILK codec developed by Skype and the multi-purpose low-latency CELT codec developed by Xiph.org, with significant changes to each to ensure they can work together.{{ref|RFC|c}} Opus can seamlessly transition among high and low bitrates, using a linear prediction codec (the SILK layer) at lower bitrates and a lapped transform codec (the CELT layer) at higher bitrates, as well as a hybrid of the two for a short overlap in which SILK encodes the 0-8kHz spectrum and the CELT layer encodes only the frequencies above 8kHz.{{ref|RFC|c}} Opus has very low algorithmic delay (typically 22.5 ms) compared with popular music formats such as [[MP3]], [[Vorbis |Ogg Vorbis]], [[AAC | LC-AAC and HE-AAC]] (all over 100 ms), yet performs very competitively with them in terms of quality per bitrate, making it comparably viable as a storage and playback format. Also, unlike Vorbis, Opus does not require the definition of large codebooks for each individual file, making it also preferable for short clips of audio, such as those often used by game developers.{{ref|RFC|c}}

Considerably more details of the history and potential applications for Opus are included in the ''Wikipedia'' page for '''[http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Opus (audio format)]'''

==Characteristics==
Opus supports bitrates from 6 kbps to 510 kbps for typical stereo audio sources (and a maximum of around 255 kbps per channel for multichannel audio), with the 'sweet spot' for music and general audio around 30 kbps (mono) and 40-100 kbps (stereo). It is intrinsically [[VBR | variable bitrate]], though constrained VBR and [[CBR | constant bitrate]] modes are possible where required. In the case of the reference release, libopus, the target bitrate is calibrated against internal constant-quality targets so that, over a typical music collection, the average bitrate comes very close to the target. This bitrate-calibrated approach differs from most VBR encoders (e.g. LAME, Helix MP3, qaac, Nero aacenc, Ogg Vorbis, Musepack), where a setting on some 'constant quality' scale (which differs between encoders) is chosen and the bitrate falls where it may. Future versions can be expected to offer better quality at the same setting. Independent implementations may adopt a different approach.

Opus is able to adapt its mode of operation seamlessly, without glitches or sound interruption (an illustrative demonstration of [http://opus-codec.org/examples/#gauge bitrate scalability] is on the Opus Examples page). This is particularly useful for mixed-content audio or varying network conditions, and gives the unified Opus codec an advantage over a suite of different codecs covering the same range of bitrates and quality settings, which would require out-of-band signalling to switch between codecs. The switching covers the choice of mono, stereo and other channel mappings; the use of the speech-oriented SILK layer, the general-purpose CELT layer or a hybrid of both; and the use of different audio bandwidths (4kHz, 6kHz, 8kHz, 12kHz, 20kHz), as well as the quality adjustments within the same operating mode that are available in most VBR-capable codecs.

Of importance mainly to interactive uses, but potentially also useful in time-delayed audio streaming, Opus includes packet loss concealment (PLC) in all modes. In the speech-oriented modes where the SILK layer is active, it also supports Forward Error Correction (FEC): the expected rate of packet loss can be indicated to the encoder by the user or by application software, and a low-bitrate redundant copy of critical frames (e.g. consonant sounds) can then be carried in the following packet to preserve intelligibility if the original packet is lost.

For music and general audio, the CELT layer of Opus builds on knowledge gained during Xiph.org's Vorbis development and ensures as a primary goal that the total energy in each spectral band is preserved while requiring only a modest bitrate overhead to achieve this, thereby avoiding many bitrate-starvation artifacts such as 'birdies' that are common in low-bitrate MP3, especially during transients, applause and cymbal sounds. This technique likewise increases coding efficiency at bitrates targeting transparent music reproduction. Short blocks (2.5 ms) are also possible for efficient transient handling. Short blocks can also be used exclusively if very low algorithmic delay (5.0ms) is required to enable very low-latency interactive audio (e.g. live networked music performances such as remote jam sessions), though a greater bitrate is then required to maintain the same quality (illustrated in [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo Monty's CELT demo page] under Constant PEAQ value, varying latency). CELT uses a number of additional techniques and provides additional advanced tools to enable encoder tuning.

Opus natively supports [[gapless playback]] (though [[Gapless_playback#Poorly_designed_playback_systems | poor player design]] might itself induce interruptions during playback). Playback gain is also required, making some form of [[ReplayGain]] or [[ReplayGain_2.0_specification | similar]] volume control possible in any compliant player.

==Bitrate performance==
For mono speech, Opus ranges from intelligible narrowband speech reproduction starting at 6 kbps through medium-band, wideband and super-wideband speech, reaching full-band speech by around 32 kbps. Above about 32 kbps, the SILK layer is no longer used at all, as CELT alone gives superior quality.

For music, the SILK modes are quite tolerable and better than CELT at very low bitrates. As the bitrate increases, the hybrid mode is adopted, extending bandwidth first to 12kHz (comparable with compact cassette) and then to the full 20kHz, after which CELT takes over. Assuming the source is stereo, the switch from mono to stereo typically happens during the transition from 12kHz to 20kHz.

==Indicative bitrate and quality==
The table below gives illustrative, indicative quality guidance based on typical modes used internally by Opus and a range of listening tests.

In the experimental libopus version 1.1-alpha, automatic speech/music detection and bandwidth detection have been introduced to improve mode decisions, and VBR is less constrained, all with the aim of maximizing the quality/bitrate trade-off. This table is therefore likely to need small updates as the encoder is improved.

===Speech encoding quality===
This table assumes a '''monophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate) but notes stereo compatibility from 40 kbps upward. The default 20ms frame size (22.5ms latency) is assumed.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Bandwidth
!typ SILK/CELT use
!Speech quality notes
!Use cases/notes/competitive codecs
|-
!1 to 5 kbps
| -
| -
| <6kbps bitrate not supported
| Try [http://codec2.org/ codec2] for 1.2-2.4 kbps speech
|-
!6 kbps
|4 kHz
|SILK
|Fair, intelligible
|AMR-NB may be a little better, but higher latency & proprietary, Speex also competitive
|-
!8 kbps
|4 kHz narrowband
|SILK
|Close to telephone quality
|AMR-NB & AMR-WB similar quality, but higher latency & proprietary. Speex competitive.
|-
!12 kbps
|6 kHz medium-band
|SILK
|Medium bandwidth, better than telephone quality
|Similar quality to AMR-WB
|-
!16 kbps
|8 kHz wideband
|SILK
|Wideband speech quality
|Similar to/better than AMR-WB
|-
!24 kbps
|12 kHz super-wideband
|hybrid
|Near transparent speech
|Better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!32 kbps
|20 kHz
|hybrid / possibly CELT
|Essentially transparent speech plus moderately good mono music
|Much better than AMR-WB. Podcasts/audiobooks/talk-radio.
|-
!40 kbps
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, fairly good stereo music
|Stereo podcasts/audiobooks/talk radio with some music
|-
!48 kbps+
|20 kHz
|CELT
|Essentially transparent mono or stereo speech, reasonable music
|Flexible general purpose modes to suit mixed music and speech
|-
|}

===Music encoding quality===
This table assumes a '''stereophonic''' source sampled at CD quality or above (typically a 48 kHz sampling rate). Opus will automatically use mono at very low bitrates.

{| class="wikitable" style="text-align:center"
|-
!Bitrate target
!Stereo mode
!Bandwidth
!typ SILK/CELT use
!Music quality notes
!Use cases/notes/competitive codecs
|-
!6 kbps
|mono
|4 kHz
|SILK
|Poor, muffled sound but intelligible lyrics
| -
|-
!8 kbps
|mono
|4 kHz
|SILK
|Poor, muffled but OK for bitrate
| -
|-
!14 to 16 kbps
|mono
|6 kHz
|SILK
|Fairly poor but OK for bitrate
|Perhaps acceptable for incidental music
|-
!22 to 24 kbps
|mono
|8 kHz
|SILK
|Fair but OK for bitrate
|OK for incidental music
|-
!32 kbps
|mono
|12 kHz
|hybrid
|Moderately good mono, reasonably bright treble (cf. mono cassette)
|Good for podcasts, audiobooks; CELT-only possible for music. HE-AAC @32kbps is stereo full-band but with annoying artifacts.
|-
!39 to 40 kbps
|stereo
|12 kHz
|hybrid/CELT
|Moderately good stereo, reasonably bright treble (cf. stereo cassette)
|Stereo podcasts, audiobooks, very low bitrate music
|-
!48 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, some artifacts, rarely nasty
|Stereo podcasts, audiobooks, low bitrate music
|-
!64 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, nice sound, detectable differences to original (mostly 'not annoying')
|Music storage & streaming. Beat HE-AAC, Vorbis, MP3 in [http://people.xiph.org/~greg/opus/ha2011/ listening test]
|-
!96 kbps
|stereo
|20 kHz
|CELT
|Full bandwidth stereo music, good quality approaching transparency
|Music storage & high quality streaming.
|-
!112 kbps
|stereo
|20 kHz
|CELT
|Fairly close to transparency (needs more testing)
|Music storage & high quality streaming. Very low-latency stereo networked music performance/jam sessions at OK quality (see below table)
|-
!128 kbps
|stereo
|20 kHz
|CELT
|Very close to transparency (needs more testing). Most modern codecs competitive (AAC-LC, Vorbis, MP3)
|Music storage & streaming. Future download music sales.
|-
!256 kbps
|stereo
|20 kHz
|CELT
|Transparent with very low chance of artifacts (a few killer samples still detectable). Most old & new lossy codecs competitive.
|Music storage & streaming, dedicated limited-bandwidth audio links (e.g. wireless, [http://en.wikipedia.org/wiki/Bluetooth_profile#Advanced_Audio_Distribution_Profile_.28A2DP.29 A2DP-bluetooth] type links).
|-
!510 kbps
|stereo
|20 kHz
|CELT
|Maximum possible stereo bitrate target (actual rate often less than 510 kbps at the default frame size). Most old and new lossy codecs competitive, plus near-lossless [[lossyWAV]] and [[WavPack | WavPack lossy]]
|Music storage, dedicated limited-bitrate audio links (e.g. wireless), minimum latency high quality audio. LossyWAV and WavPack lossy are very competitive for storage, and WavPack lossy --blocksize=256 may also be competitive with the minimum latency mode.
|-
!>510 kbps
| -
| -
| -
|Above the Opus bitrate range for stereo sources
|Settle for 510kbps or use [[lossless]], [[lossyWAV]], [[WavPack | WavPack lossy]] or lossy transform/subband codecs like [[Vorbis]], [[Musepack]] at very high settings.
|-
|}

===Lower latency versus quality/bitrate trade-off===
====Packet overhead in interactive applications====
For interactive use on the Internet or other packet-based networks, the total bandwidth used is subject to packet overhead: the more packets transmitted every second, the more header data must be sent. For this reason Opus, while defaulting to 20.0ms frames, supports 60.0ms frames to reduce overhead when transporting low-bitrate SILK frames, at the expense of greater latency that may still be acceptable for speech. It also supports 10.0ms SILK frames to reduce latency somewhat, at the expense of more packet overhead.

In the CELT layer, which tends to operate at higher bitrates than SILK, 20.0ms frames are the default, but frames of 10.0ms, 5.0ms and 2.5ms are also possible. These achieve lower latency by transmitting more packets per second, which directly increases the packet overhead. In addition, as described below, shorter frames also worsen the quality/bitrate trade-off of the CELT layer itself.

None of the bitrates mentioned in this article account for the packet overhead.

====CELT layer latency versus quality/bitrate trade-off====
Unlike the SILK layer, which works on fixed 10.0ms blocks, 1, 2, 4 or 6 of which can be combined into an Opus frame, the CELT layer can shorten its encoding block lengths so that it can be used with shorter frames.

When the CELT layer uses 10.0ms, 5.0ms or 2.5ms frames instead of the default 20.0ms, it must use smaller transform block sizes, which reduces the frequency resolution of the MDCT compared with the default transform window and thus reduces encoding efficiency for tonal signals. To represent a sound divided into shorter transform windows with the same fidelity, greater amplitude precision is needed, resulting in a higher bitrate for the same perceptual quality (or, conversely, lower quality at the same bitrate).

These reduced-latency modes remain efficient for transient signals, which use short blocks anyway.

In all modes, the algorithmic delay consists of the frame size plus an additional 2.5ms, which the CELT layer requires for MDCT window overlap.

Xiph.org used matched PEAQ scores (Perceptual Evaluation of Audio Quality, an objective, software-based approximation of perceived quality) for CELT 0.10, the codec version that formed the basis of the CELT layer in the Opus reference release. These scores indicate the following [http://people.xiph.org/~xiphmont/demo/celt/demo.html#demo approximately equivalent settings] for stereo music:

{| class="wikitable" style="text-align:center"
|-
!Frame size
!Algorithmic delay
!Bitrate needed to match 64 kbps at 22.5 ms delay
!Bitrate increase
|-
!20.0 ms
|22.5 ms
|64.0 kbps
|0.0 %
|-
!10.0 ms
|12.5 ms
|70.4 kbps
|10.0 %
|-
!5.0 ms
|7.5 ms
|84.8 kbps
|32.5 %
|-
!2.5 ms
|5.0 ms
|112.0 kbps
|75.0 %
|-
|}
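
The two derived columns of the table follow from simple arithmetic: the delay is the frame size plus the 2.5ms MDCT overlap, and the increase is the matched bitrate relative to the 64 kbps reference. The sketch below recomputes them from the frame sizes and bitrates listed in the table.

```python
OVERLAP_MS = 2.5       # extra delay for MDCT window overlap
REFERENCE_KBPS = 64.0  # bitrate of the default 20.0 ms setting

# (frame size in ms, matched bitrate in kbit/s), as listed in the table
MATCHED = [(20.0, 64.0), (10.0, 70.4), (5.0, 84.8), (2.5, 112.0)]


def algorithmic_delay_ms(frame_ms: float) -> float:
    """Frame size plus the fixed MDCT overlap."""
    return frame_ms + OVERLAP_MS


def bitrate_increase_percent(kbps: float, reference_kbps: float = REFERENCE_KBPS) -> float:
    """Extra bitrate needed, as a percentage of the reference bitrate."""
    return (kbps / reference_kbps - 1.0) * 100.0


for frame_ms, kbps in MATCHED:
    print(f"{frame_ms:4.1f} ms: delay {algorithmic_delay_ms(frame_ms):4.1f} ms, "
          f"+{bitrate_increase_percent(kbps):.1f} % bitrate")
```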

N.B. This table is relevant only to streaming and other real-time use. For music storage and later playback, latency is unimportant and the default 20.0ms frame size is preferable.

== Hardware & Software Support ==

Much of this section is based heavily on the Jan 12th 2013 version of the '''Support''' section of the [http://en.wikipedia.org/wiki/Opus_%28audio_format%29 Wikipedia article], which is more likely to be kept updated and to provide links to further information about the supporting platforms.

The format and algorithms are openly documented, and the reference implementation is published as free software. The reference library (libopus) and the accompanying command-line tools (opus-tools), which include separate encoder and decoder programs, are published under the terms of BSD-style licenses. They are written in the C programming language and can be compiled for hardware architectures with or without a floating-point unit. The diagnostic tool opusinfo reports detailed technical information about Opus files, including whether the bitstream complies with the standard. Because it is based on ogginfo from vorbis-tools, it is, unlike the encoder and decoder, available under the terms of version 2 of the GPL.

=== Commandline binaries ===
The command-line tools are available pre-compiled for the most popular operating systems at [http://opus-codec.org opus-codec.org].

=== VoIP software ===
* The voice-chat software Mumble supports Opus as its main codec.
* SIP softphones Phoner and PhonerLite support Opus
* The SIP and IAX2 client SFLphone is being fitted with Opus support.
* Integration of Opus into the Skype client is finished, although no version with Opus support has yet been published.
* TrueConf video conferencing solutions support Opus.
* Opus support is planned for Jitsi 2.0, together with VP8 video
* Empathy may use any format supported in GStreamer, including Opus.
* Line2 has replaced its previous codec with Opus. Its iOS app will be the first to be released with Opus, and the Android app will follow later.
* CSipSimple supports Opus, Codec2, G.726 and G.722.1 with an additional plug-in.
* The voice-chat software TeamSpeak 3 supports Opus for voice and music in pre-release server 3.0.7-pre2 and beta client version 3.0.10.

=== Web frameworks and browsers ===
* Opus support is mandatory for WebRTC implementations.
* Mozilla supports Opus beginning with version 15 of Firefox and Thunderbird, as well as SeaMonkey, which uses a shared codebase.
* Depending on the backend in use, Opera supports inline playback of embedded Opus files. Official support for Opus and WebRTC is on the development roadmap.
* Chromium and Google Chrome will support Opus audio as of version 25.
* Maxthon Cloud Browser

=== Streaming audio ===
* Icecast.
* Krad Radio
* Liquidsoap

=== Operating systems and desktop multimedia frameworks ===
* In Debian GNU/Linux the Opus development tools and supporting libraries can be installed from the preconfigured repositories in the next stable version ("wheezy") that is expected to be released in early 2013.
* For Microsoft Windows, there are DirectShow filters supporting Opus, including DC-Bass Source Mod and the LAV Filters.
* In GStreamer the integration of Opus support is complete.
* FFmpeg supports decoding and encoding Opus via the external library libopus.

=== Hardware support ===
* Support in [[Rockbox]] is available in the developer version. This brings Opus playback to a range of portable media players (including some Apple iPod models and Sansa, iriver and Archos devices) and, via "Rockbox as an Application" (RaaA), to Android devices.

=== Player software ===
* VLC media player supports Opus since version 2.0.4
* AIMP supports Opus natively as of version 3.20 build 1125 beta 1.
* [[foobar2000]] supports the format natively as of v1.1.14 beta 1.
* Mpxplay supports Opus (using a decoder DLL) as of v1.60 alpha 2
* Android has a number of player apps supporting Opus, including PowerAmp and others.

=== Other software ===
* CDBurnerXP
* MediaCoder
* Report-IT

== References & Notes ==

*{{note|homepage|a}}[http://opus-codec.org/ opus-codec.org homepage]
*{{note|FAQ|b}}[http://wiki.xiph.org/OpusFAQ Opus FAQ]
*{{note|RFC|c}}[http://tools.ietf.org/html/rfc6716 IETF RFC 6716]

[[Category:Codecs]]
[[Category:Lossy]]
[[Category:Encoder/Decoder]]

Opus

2013-01-13T00:42:14Z

Dynamic: /* Hardware & Software Support */ Edited copy of Wikipedia Support section (public domain) with references and wikilinks stripped out and some info added

Opus

2013-01-12T18:37:33Z

Dynamic: /* Audio players */

Opus

2013-01-10T18:18:00Z

Dynamic: /* Music encoding quality */ comma

Opus

2013-01-10T18:15:56Z

Dynamic: /* Music encoding quality */

Opus

2013-01-10T18:15:16Z

Dynamic: /* Speech encoding quality */ column heading

Opus

2013-01-10T18:13:39Z

Dynamic: Removed A2DP-type links from 510kbps use cases, as max A2DP bitrate is about 372 kbps.

LossyWAV

2013-01-10T15:49:30Z

Dynamic: Undo revision 23902 by Dynamic (talk)

{{Software Infobox
| name = lossyWAV
| logo =
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.3.0
| preview_release = <none>
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 1.3.0 release thread]<br />[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 1.3.0 development thread]
}}
lossyWAV is a [[Wikipedia:Free software|free]], [[lossy]] pre-processor for [[PCM]] audio contained in the [[RIFF_WAVE|WAV]] file format. Proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson], it reduces the [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, significantly reduces the bitrate of the encoded file compared to compressing the unprocessed audio.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyWAV is based on the lossyFLAC idea proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson] at Hydrogenaudio: a method of carefully reducing the bit depth of blocks of samples so that the FLAC lossless encoder can make use of its "wasted bits" feature. The aim is to reduce audio bit depth transparently by setting some of the least significant bits ([[Wikipedia:Least_significant_bit|LSBs]]) to zero, so that FLAC detects lower bits that are consistently zero within each frame and coding efficiency increases significantly.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded with the same codec (which may be all-important for hardware compatibility) at a lower bitrate than the lossless version.
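
The "wasted bits" that lossyWAV targets are simply the low-order bits that are zero in every sample of a block. The sketch below is an illustrative re-implementation of that check, not FLAC's own code: OR-ing all samples together leaves a bit clear only if it is clear in every sample, so the trailing zeros of the result give the count. Treating an all-zero block as having every bit wasted is an assumption of this sketch.

```python
def wasted_bits(block: list[int], bit_depth: int = 16) -> int:
    """Count low-order bits that are zero in every sample of the block."""
    combined = 0
    for sample in block:
        combined |= sample  # a bit stays 0 only if it is 0 in every sample
    if combined == 0:
        return bit_depth  # digital silence: treat every bit as wasted
    # Isolate the lowest set bit (also valid for negative two's-complement values)
    return (combined & -combined).bit_length() - 1


block = [1024, -2048, 512, 3584]
k = wasted_bits(block)
shifted = [s >> k for s in block]  # what the encoder can store with k fewer bits
print(k, shifted)  # 9 [2, -4, 1, 7]
```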

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick Currie] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (many thanks to [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer!) with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] assembly language for speed.

Subsequently, lossyFLAC was found to work with other lossless codecs, so the application was renamed lossyWAV.

Since then, Nick has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 Horst Albrecht] at Hydrogenaudio. Although the current lossyWAV implementation has built on David's original method, the method itself still very much belongs to its author.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a pure variable bit-depth pre-processor: the overall sample size remains the same after processing, but the number of significant bits used for the samples in a codec-block can change from block to block. The bits to remove are calculated for each codec-block (512 samples, 11.6ms at 44.1kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier transform]] (FFT) analyses of at least two lengths (by default 32, 64 and 1024 [[Wikipedia:Sampling %28signal processing%29|samples]]). After some manipulation, the results of each FFT analysis for a given codec-block are grouped, and the minimum value is used to determine bits-to-remove for the whole codec-block. Bit removal adds noise to the output; however, the level of noise associated with removing a given number of bits has been pre-calculated, so the number of bits removed depends on the noise floor of the codec-block in question. The added noise is adaptively shaped by default, but the user can select parameters to make it fixed-shaped or simply [[Wikipedia:white noise|white noise]]. Each sample in the codec-block is then rounded so that its lowest <bits-to-remove> bits are zero. In this way the wasted bits feature of [[FLAC]] et al. is exploited.
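
The final rounding step can be sketched as follows. This minimal version takes bits-to-remove as an input (lossyWAV derives it from the FFT noise-floor analysis described above) and uses plain rounding, which adds roughly white noise; it does not attempt lossyWAV's adaptive or fixed noise shaping. The clipping limits are chosen as multiples of the step so that clipped samples keep their zeroed low bits.

```python
def remove_bits(block: list[int], bits_to_remove: int, bit_depth: int = 16) -> list[int]:
    """Round each sample to the nearest multiple of 2**bits_to_remove (no noise shaping)."""
    if bits_to_remove <= 0:
        return list(block)
    step = 1 << bits_to_remove
    lowest = -(1 << (bit_depth - 1))                       # e.g. -32768, a multiple of step
    highest = ((1 << (bit_depth - 1)) - 1) // step * step  # largest in-range multiple of step
    out = []
    for s in block:
        q = ((s + step // 2) // step) * step  # round half up to a multiple of step
        out.append(min(max(q, lowest), highest))
    return out


print(remove_bits([5, -5, 32767, -32768, 1000], 2))  # [4, -4, 32764, -32768, 1000]
```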

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set (16 bit / 44.1kHz)
!Codec
!lossless
!--insane
!--extreme
!--high
!--standard
!--economic
!--portable
!--extraportable
|-
!10 Album Test Set
| FLAC
| 854 kbit/s
| 627 kbit/s
| 548 kbit/s
| 477 kbit/s
| 442 kbit/s
| 407 kbit/s
| 353 kbit/s
| 311 kbit/s
|-
!Nick.C's Full Collection
| FLAC
| 882 kbit/s
| -
| -
| -
| -
| -
| -
| 307 kbit/s
|}
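
To put the table in perspective, the sketch below expresses each preset's FLAC bitrate on the 10 Album Test Set as a share of the lossless FLAC bitrate, using only the figures in the table.

```python
LOSSLESS_KBPS = 854  # lossless FLAC, 10 Album Test Set
PRESET_KBPS = {"insane": 627, "extreme": 548, "high": 477, "standard": 442,
               "economic": 407, "portable": 353, "extraportable": 311}


def percent_of_lossless(kbps: float, lossless_kbps: float = LOSSLESS_KBPS) -> float:
    """Bitrate as a percentage of the lossless bitrate."""
    return 100.0 * kbps / lossless_kbps


for preset, kbps in PRESET_KBPS.items():
    print(f"--{preset:<14} {kbps} kbit/s = {percent_of_lossless(kbps):4.1f} % of lossless")
```

On these figures, --standard needs roughly 52% of the lossless bitrate and --extraportable roughly 36%.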

==File identification==
lossyWAV-processed WAV files are named with the double filename extension .lossy.wav to make them instantly identifiable. Likewise, ".lossy.flac" indicates an audio file that was processed with lossyWAV and subsequently encoded with FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter creates, during processing, a correction file named with the .lwcdf.wav double filename extension. When it is "added" to the corresponding .lossy.wav file using the --merge parameter, the original file is reconstituted.
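
Conceptually, a correction file only has to hold whatever the lossy output lacks. The sketch below assumes the correction data is the simple sample-by-sample difference between original and processed audio; lossyWAV's actual .lwcdf.wav contents may be organised differently, so treat this as an illustration of the "adding" idea only.

```python
def make_correction(original: list[int], lossy: list[int]) -> list[int]:
    """Assumed correction data: original minus lossy, sample by sample."""
    if len(original) != len(lossy):
        raise ValueError("original and lossy must have the same length")
    return [o - l for o, l in zip(original, lossy)]


def merge(lossy: list[int], correction: list[int]) -> list[int]:
    """Reconstitute the original by adding the correction back."""
    if len(lossy) != len(correction):
        raise ValueError("lossy and correction must have the same length")
    return [l + c for l, c in zip(lossy, correction)]


original = [1000, -3, 32767]
lossy = [1000, -4, 32764]
correction = make_correction(original, lossy)
print(correction, merge(lossy, correction) == original)  # [0, 1, 3] True
```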

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Codec compatibility|codec compatibility]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. It takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, date and time, and user settings are recorded. If a lossyWAV 'fact' chunk is already present in an input file, processing is halted (exit code 16) to prevent re-processing of an already processed file.

The --check parameter determines whether a file has previously been processed, without attempting to process it: the exit code is 16 if the file has already been processed and 0 if not.
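
A check of this kind amounts to walking the RIFF chunks and looking for a 'fact' chunk whose text identifies lossyWAV. The sketch below assumes the 'fact' payload begins with the text "lossyWAV", following the form shown above; it is a hypothetical re-implementation for illustration, not lossyWAV's own code, and the timestamp in the example is made up.

```python
import struct

LOSSYWAV_TAG = b"lossyWAV"  # assumed start of lossyWAV's text 'fact' payload


def build_wav(chunks: list[tuple[bytes, bytes]]) -> bytes:
    """Assemble a RIFF/WAVE byte string from (chunk_id, payload) pairs."""
    body = b"WAVE"
    for chunk_id, payload in chunks:
        body += struct.pack("<4sI", chunk_id, len(payload)) + payload
        if len(payload) % 2:
            body += b"\x00"  # RIFF chunks are padded to an even length
    return b"RIFF" + struct.pack("<I", len(body)) + body


def iter_chunks(data: bytes):
    """Yield (chunk_id, payload) for each chunk in a RIFF/WAVE byte string."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8 : pos + 8 + size]
        pos += 8 + size + (size % 2)  # skip payload plus any pad byte


def check_exit_code(data: bytes) -> int:
    """Mimic the documented --check result: 16 if already processed, else 0."""
    for chunk_id, payload in iter_chunks(data):
        if chunk_id == b"fact" and payload.startswith(LOSSYWAV_TAG):
            return 16
    return 0


# 16-bit stereo 44.1 kHz PCM 'fmt ' payload and an empty 'data' chunk
fmt = struct.pack("<HHIIHH", 1, 2, 44100, 44100 * 4, 4, 16)
plain = build_wav([(b"fmt ", fmt), (b"data", b"")])
processed = build_wav([(b"fmt ", fmt),
                       (b"fact", b"lossyWAV 1.3.0 @ 01/01/2013 00:00:00, -q 2.5"),
                       (b"data", b"")])
print(check_exit_code(plain), check_exit_code(processed))  # 0 16
```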

==Quality presets==
*--quality insane: (-q I or -q 10) Highest quality preset, generally considered to be excessive;
*--quality extreme: (-q E or -q 7.5) Higher quality preset, disc space-saving alternative to lossless archiving for large audio collections, considered to be suitable for transcoding to other lossy codecs;
*--quality high: (-q H or -q 5.0) High quality preset, midway between extreme and standard;
*--quality standard: (-q S or -q 2.5) Default preset, generally accepted to be transparent;
*--quality economic: (-q C or -q 0.0) Intermediate preset midway between standard and portable;
*--quality portable: (-q P or -q -2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
*--quality extraportable: (-q X or -q -5.0) Lowest quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
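
The letter, name and numeric forms of each preset map onto the -5.0 to 10.0 quality scale as listed above. The sketch below is a hypothetical helper that resolves a -q argument using that mapping; whether lossyWAV itself accepts letters and names case-insensitively is an assumption.

```python
PRESETS = {
    "I": ("insane", 10.0),
    "E": ("extreme", 7.5),
    "H": ("high", 5.0),
    "S": ("standard", 2.5),
    "C": ("economic", 0.0),
    "P": ("portable", -2.5),
    "X": ("extraportable", -5.0),
}


def quality_value(arg: str) -> float:
    """Resolve a preset letter, preset name or number to a quality value."""
    key = arg.strip()
    for letter, (name, value) in PRESETS.items():
        if key.upper() == letter or key.lower() == name:
            return value
    value = float(key)  # raises ValueError for unrecognised text
    if not -5.0 <= value <= 10.0:
        raise ValueError(f"quality {value} is outside -5.0..10.0")
    return value


print(quality_value("S"), quality_value("extreme"), quality_value("-2.5"))
```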

All tuning for version 1.0.0 was performed on quality preset --standard, with higher presets being more conservative. For versions 1.1.0, 1.2.0 and 1.3.0, tuning effort has focused on the lowest quality preset, in an effort to achieve an effective compromise between resulting bitrate and perceived quality. Quality preset --standard is generally accepted to be (and, from testing so far, is) transparent. If you find a track for which --standard does not achieve transparency, please post a sample (no more than 30 seconds) in the development thread.

The upper frequency limit used in calculating the minimum signal power varies with the quality preset, in the range 15.159kHz to 16.682kHz.

==Supported input formats==
*[[WAV]]: 9-bit to 32-bit integer; 1 to 8 channels; sample rate ≥ 32kHz [[Pulse Code Modulation|PCM]]. Very high sample rates (>48kHz) have not been extensively tested. Tunings have been focussed on 16-bit, 44.1kHz samples (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|}


There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] (so-called "Bit Shifting") suggesting that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested because encoders are prohibitively expensive. At least one [http://www.hydrogenaudio.org/forums/index.php?showtopic=98609&hl= commercial DVD-A] uses constant bit-depth reduction, with a lower bit depth on the rear channels.

A comparison of portable media players is [[Wikipedia:Comparison of portable media players#Audio Formats|here]], which shows FLAC and WMA Lossless compatibility among listed players.
Any player supported by [http://www.rockbox.org Rockbox] can use FLAC or WavPack files after installing Rockbox.
===Important note===
'''NB: when encoding using a lossless codec, please ensure that the block size of the lossless codec matches that of lossyWAV (default = 512 samples). If this is not done then the lossless encoding of the processed WAV file will (almost certainly) be larger than it would otherwise have been. This is achieved by adding the "Encoder Parameters" in the table above to the command line of the lossless codec in question.'''
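
The reason for matching block sizes can be illustrated with the wasted-bits idea: a codec block can only drop the bits that are zero in every sample it contains, so a codec block spanning several lossyWAV blocks is limited by the block with the fewest bits removed. The hypothetical audio below has 6 bits removed in its first 512-sample block and only 2 in its second.

```python
def wasted_bits(block: list[int], bit_depth: int = 16) -> int:
    """Low-order bits that are zero in every sample of the block."""
    combined = 0
    for s in block:
        combined |= s
    if combined == 0:
        return bit_depth
    return (combined & -combined).bit_length() - 1


def bits_saved(samples: list[int], codec_block: int) -> int:
    """Total bits saved by wasted-bits coding with the given codec block size."""
    total = 0
    for start in range(0, len(samples), codec_block):
        block = samples[start:start + codec_block]
        total += wasted_bits(block) * len(block)
    return total


LOSSYWAV_BLOCK = 512
# Hypothetical processed audio: odd multiples of 64 (6 bits removed), then of 4 (2 bits removed)
first = [64 * (2 * (i % 100) + 1) for i in range(LOSSYWAV_BLOCK)]
second = [4 * (2 * (i % 100) + 1) for i in range(LOSSYWAV_BLOCK)]
samples = first + second

print(bits_saved(samples, 512))   # 4096: codec blocks line up with lossyWAV blocks
print(bits_saved(samples, 1024))  # 2048: one codec block spans both lossyWAV blocks
```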
===Bonus feature===
Another, possibly not obvious, feature of lossyWAV is that the processed output can be "transcoded" from one lossless codec to another lossless codec with absolutely no loss of quality whatsoever. This is solely due to the fact that lossyWAV output is designed to be losslessly encoded - something that lossless codecs do very well indeed.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.3.0, Copyright (C) 2007-2011 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Process Description:

lossyWAV is a near lossless audio processor which dynamically reduces the
bitdepth of the signal on a block-by-block basis. Bitdepth reduction adds noise
to the processed output. The amount of permissible added noise is based on
analysis of the signal levels in the default frequency range 20Hz to 16kHz.

If signals above the upper limiting frequency are at an even lower level, they
can be swamped by the added noise. This is usually inaudible, but the behaviour
can be changed by specifying a different --limit (in the range 10kHz to 20kHz).

For many audio signals there is little content at very high frequencies and
forcing lossyWAV to keep the added noise level lower than the content at these
frequencies can increase the bitrate dramatically for no perceptible benefit.

The noise added by the process is shaped using an adaptive method provided by
Sebastian Gesemann. This method, as implemented in lossyWAV, aims to use the
signal itself as the basis of the filter used for noise shaping. Adaptive noise
shaping is enabled by default.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <t> where t is one of the following (default = standard):
I, insane highest quality output, suitable for transcoding;
E, extreme higher quality output, suitable for transcoding;
H, high high quality output, suitable for transcoding;
S, standard default quality output, considered to be transparent;
C, economic intermediate quality output, likely to be transparent;
P, portable good quality output for DAP use, may not be transparent;
X, extraportable lowest quality output, not fully transparent.

Standard Options:

-C, --correction write correction file for processed WAV file; default=off.
-f, --force forcibly over-write output file if it exists; default=off.
-h, --help display help.
-L, --longhelp display extended help.
-M, --merge merge existing lossy.wav and lwcdf.wav files.
-o, --outdir <t> destination directory for the output file(s).
-v, --version display the lossyWAV version number.
-w, --writetolog create (or add to) lossyWAV.log in the output directory.

Advanced Options:

- take WAV input from STDIN.
-c, --check check if WAV file has already been processed; default=off.
errorlevel=16 if already processed, 0 if not.
-q, --quality <n> quality preset (-5.0<=n<=10.0); (-5=lowest, 10=highest;
default=2.5; I=10; E=7.5; H=5; S=2.5; C=0; P=-2.5; X=-5).
--, --stdout write WAV output to STDOUT.
--stdinname <t> pseudo filename to use when input from STDIN.

Advanced Quality Options:

-A, --adaptive <n/t> modify settings for Sebastian Gesemann's adaptive noise
shaping method. takes a parameter to set the order of the
FIR filter, (32<=n<=96; default=64; multiple of 8 only);
"OFF" to disable adaptive shaping; "NOWARP" to disable
default frequency warping;
-a, --analyses <n> set number of FFT analysis lengths, (2<=n<=6; default=3,
i.e. 32, 64 & 1024 samples. n=2, remove 32 sample FFT;
n>3 add 512; n>4, add 256; n>6, add 128) nb. FFT lengths.
stated are for 44.1/48kHz audio, higher sample rates will
automatically increase all FFT lengths as required.
-l, --limit <n> set upper frequency limit to be used in analyses to n Hz;
(10000<=n<=20000; default=16000).
--linkchannels revert to original single bits-to-remove value for all
channels rather than channel dependent bits-to-remove.
--maxclips <n> set max. number of acceptable clips per channel per block;
(0<=n<=16; default=3,3,3,3,3,2,2,2,2,2,1,1,1,0,0,0).
-m, --midside analyse 2 channel audio for mid/side content.
--nodccorrect disable DC correction of audio data prior to FFT analysis,
default=on; (DC offset calculated per FFT data set).
--scale <n> factor to scale audio by; (0.0625<n<=8.0; default=1).
-s, --shaping [n] enable fixed noise shaping, takes optional parameter [n]
to allow user defined shaping proportion (0.0<=n<=1.0),
otherwise default to quality setting dependent value.
Disables adaptive noise shaping.
--static <n> set minimum-bits-to-keep-static to n bits (default=6;
7<=n<=28, limited to bits-per-sample - 4).
-U, --underlap <n> enable underlap mode to increase number of FFT analyses
performed at each FFT length, (n = 2, 4 or 8, default=2).

Output Options:

--bitdist show distrubution of bits to remove.
--blockdist show distribution of lowest / highest significant bit of
input codec-blocks and bit-removed codec-blocks.
-d, --detail enable per block per channel bits-to-remove data display.
-F, --freqdist enable frequency analysis display of input data.
-H, --histogram show sample value histogram (input, lossy and correction).
--longdist show long frequency distribution data (input/lossy/lwcdf).
--perchannel show selected distribution data per channel.
-p, --postanalyse enable frequency analysis display of output and
correction data in addition to input data.
--sampledist show distribution of lowest / highest significant bit of
input samples and bit-removed samples.
--spread [full] show detailed [more detailed] results from the spreading/
averaging algorithm.
-W, --width <n> select width of output options (79<=n<=255).

System Options:

-B, --below set process priority to below normal.
--low set process priority to low.
-N, --nowarnings suppress lossyWAV warnings.
-Q, --quiet significantly reduce screen output.
-S, --silent no screen output.

Special thanks go to:

David Robinson for the publication of his lossyFLAC method, guidance, and
the motivation to implement his method as lossyWAV.

Horst Albrecht for ABX testing, valuable support in tuning the internal
presets, constructive criticism and all the feedback.

Sebastian Gesemann for the adaptive noise shaping method and the amount of
help received in implementing it and also for the basis of
the fixed noise shaping method.

Matteo Frigo and for libfftw3-3.dll contained in the FFTW distribution
Steven G Johnson (v3.2.1 or v3.2.2).

Mark G Beckett for the Delphi unit that provides an interface to the
(Univ. of Edinburgh) relevant fftw routines in libfftw3-3.dll.

Don Cross for the Complex-FFT algorithm originally used.</pre>

===Example drag 'n' drop batch file===
Simply drag the FLAC files onto this batch file and it will process, recode in FLAC and copy ALL of the tags from the input FLAC file, placing the output lossyFLAC file in the same directory as the input FLAC file. Requires flac.exe and [http://www.synthetic-soul.co.uk/tag/ tag.exe] to be somewhere on the path.
<pre>@echo off
rem Process each file dragged onto this batch file, one argument at a time
:repeat
if "%~1"=="" goto end
if exist "%~1" (
  flac -d --silent --stdout "%~1" | lossywav - --quality standard --stdout --stdinname "%~1" | flac --silent -b 512 -o "%~dpn1.lossy.flac" - && tag --fromfile "%~1" "%~dpn1.lossy.flac"
)
shift
goto repeat
:end</pre>

===lossyWAV and FFTW===
Since version 1.2.0, lossyWAV has been compatible with [[Wikipedia:FFTW|FFTW]], although it does not depend on it. To take advantage of the faster processing that FFTW's superior FFT implementations provide, place libfftw3-3.dll in a directory on the host computer that is included in the PATH.

===lossyWAV and WINE===
The cause of lossyWAV's WINE incompatibility was found and removed during the development of 1.2.0 and retrospectively amended for 1.1.0b in a maintenance release (1.1.0c).

===lossyWAV and [[foobar2000]]===
Example [[foobar2000]] converter settings:

lossyFLAC settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.flac
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\flac - -b 512 -5 -f -o%d --ignore-chunk-sizes
Format is : lossless or hybrid
Highest BPS mode supported: 24 </pre>

lossyTAK settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.tak
Parameters : /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWV settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.wv
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is : lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWMALSL* settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.wma
Parameters : /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|
c:\"program files"\bin\wmaencode - %d --codec lsl --ignorelength
Format is : lossless or hybrid
Highest BPS mode supported: 24</pre>

Enclose the element of the path containing spaces within double quotation marks ("), e.g. C:\"Program Files"\directory_where_executable_is\executable_name. This is a Windows limitation.

lossyWMALSL conversion uses WMAEncode.exe by lvqcl found [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=90519&view=findpost&p=767754 here].

===lossyWAV and EAC===
:''For example settings, see [[EAC and LossyWAV]].''

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor and not a codec. No decoding is required for any program to play a WAV file which has been processed with lossyWAV as it remains compliant with the RIFF WAVE format.

*'''Question:''' Why create a processor which means that I cannot be sure that a lossless file is truly lossless?
*'''Answer:''' Unless one creates the lossless file personally, one can '''never''' be completely sure that the file is indeed lossless. E.g. a lossless file you receive could be transcoded from [[MP3]] without your knowledge. To distinguish a lossyWAV file from lossless files it is recommended to use the extension .lossy.EXT where EXT is the original extension e.g. .lossy.flac

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' At preset --standard, almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact]]s
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522 Original lossyFLAC thread] - Introduction of the concept by David Robinson (Replay Gain developer) and initial development
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=96635 lossyWAV 1.3.1 Delphi to C++ translation thread]
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 lossyWAV 1.3.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 lossyWAV 1.3.0 release thread] - Release of version 1.3.0 on 06 August 2011
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=65499 lossyWAV 1.2.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=77042 lossyWAV 1.2.0 release thread] - Release of version 1.2.0 on 16 December 2009
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 lossyWAV 1.1.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=64617 lossyWAV 1.1.0 release thread] - Release of version 1.1.0 on 12 July 2008
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 lossyWAV Development thread] - Conversion of the original MATLAB script to Delphi and evolution of the method
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0 release thread] - Release of version 1.0.0b on 12 May 2008

[[Category:Software]]

LossyWAV

2013-01-10T15:48:39Z

Dynamic: /* External links */ added Category:Lossy

{{Software Infobox
| name = lossyWAV
| logo =
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.3.0
| preview_release = <none>
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 1.3.0 release thread]<br />[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 1.3.0 development thread]
}}
lossyWAV is a [[Wikipedia:Free software|free]], [[lossy]] pre-processor for [[PCM]] audio contained in the [[RIFF_WAVE|WAV]] file format. Proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson], it reduces the [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, significantly reduces the bitrate of the encoded file compared to compressing the unprocessed audio.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyWAV is based on the lossyFLAC idea proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson] at Hydrogenaudio: a method of carefully reducing the bit depth of blocks of samples so that the FLAC lossless encoder can make use of its "wasted bits" feature. The aim is to reduce audio bit depth transparently by setting some of the least significant bits ([[Wikipedia:Least_significant_bit|LSBs]]) to zero, so that FLAC detects lower bits that are consistently zero within each frame and coding efficiency increases significantly.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded with the same codec (which may be all-important for hardware compatibility) at a lower bitrate than the lossless version.

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick Currie] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (many thanks to [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer!) with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] assembly language for speed.

Subsequently, lossyFLAC was found to work with other lossless codecs, so the application was renamed lossyWAV.

Since then, Nick has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 Horst Albrecht] at Hydrogenaudio. Although the current lossyWAV implementation has built on David's original method, the method itself still very much belongs to its author.

==Indicative bitrate reduction==
lossyWAV is purely a variable bit-depth pre-processor: the sample size (e.g. 16 bits) stays the same after processing, but the number of significant bits used by the samples in a codec-block can change from block to block.

The number of bits to remove is calculated for each codec-block (512 samples, i.e. 11.6 ms at 44.1 kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier transform]] (FFT) analyses of at least two lengths (by default 32, 64 and 1024 [[Wikipedia:Sampling %28signal processing%29|samples]]). After some further processing, the results of all FFT analyses covering a codec-block are grouped, and the minimum value determines the bits-to-remove for the whole codec-block.

Removing bits adds noise to the output. The noise level associated with removing a given number of bits has been pre-calculated, so the number of bits removed depends on the noise floor of the codec-block in question. By default the added noise is adaptively shaped; the user can instead select fixed noise shaping or plain [[Wikipedia:white noise|white noise]]. Finally, each sample in the codec-block is rounded so that its lowest ''bits-to-remove'' bits are zero. This is how the wasted-bits feature of [[FLAC]] and other lossless codecs is exploited.
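The Python sketch below illustrates the last two steps under simplifying assumptions: it treats rounding to a step of 2<sup>n</sup> as adding uniform noise of power (2<sup>n</sup>)<sup>2</sup>/12, picks the largest ''n'' whose noise stays at or below a given noise-floor power, rounds the block accordingly, and then counts the zeroed low-order bits the way a wasted-bits check would. The function names, the noise-floor input and the <code>min_bits_to_keep</code> parameter are hypothetical; lossyWAV's real thresholds come from its own pre-calculated tables and tuning.

```python
def bits_to_remove(noise_floor_power, bits_per_sample=16, min_bits_to_keep=6):
    """Largest n such that rounding to multiples of 2**n adds noise whose
    power, estimated as (2**n)**2 / 12, does not exceed noise_floor_power.

    Simplified textbook estimate; not lossyWAV's pre-calculated tables.
    """
    n = 0
    while n < bits_per_sample - min_bits_to_keep:
        step = 2 ** (n + 1)
        if step * step / 12.0 > noise_floor_power:
            break
        n += 1
    return n


def remove_bits(block, n, bits_per_sample=16):
    """Round each sample to the nearest multiple of 2**n (ties round up),
    clamping to the largest multiple of 2**n that fits the sample range."""
    if n == 0:
        return list(block)
    step = 1 << n
    lowest = -(1 << (bits_per_sample - 1))
    highest = (1 << (bits_per_sample - 1)) - 1
    highest -= highest % step  # keep the clamped value a multiple of step
    return [max(lowest, min(highest, ((s + step // 2) // step) * step))
            for s in block]


def wasted_bits(block, bits_per_sample=16):
    """Count the low-order bits that are zero in every sample of the block."""
    combined = 0
    for s in block:
        combined |= s
    if combined == 0:
        return bits_per_sample
    return (combined & -combined).bit_length() - 1


block = [1000, -2003, 517, 32767, -32768]
n = bits_to_remove(noise_floor_power=400.0)
processed = remove_bits(block, n)
print(n, processed, wasted_bits(processed))
# prints: 6 [1024, -1984, 512, 32704, -32768] 6
```

The sketch simply clamps the full-scale sample 32767; lossyWAV's own handling of such cases (see --maxclips in the help text) is not modelled here.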

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set (16 bit / 44.1kHz)
!Codec
!lossless
!--insane
!--extreme
!--high
!--standard
!--economic
!--portable
!--extraportable
|-
!10 Album Test Set
| FLAC
| 854 kbit/s
| 627 kbit/s
| 548 kbit/s
| 477 kbit/s
| 442 kbit/s
| 407 kbit/s
| 353 kbit/s
| 311 kbit/s
|-
!Nick.C's Full Collection
| FLAC
| 882 kbit/s
| -
| -
| -
| -
| -
| -
| 307 kbit/s
|}
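To put the 10 Album Test Set figures in proportion, the short Python snippet below divides each preset's bitrate by the lossless FLAC bitrate. It only re-expresses the numbers already in the table.

```python
LOSSLESS_KBPS = 854  # 10 Album Test Set, lossless FLAC
PRESET_KBPS = {
    "insane": 627, "extreme": 548, "high": 477, "standard": 442,
    "economic": 407, "portable": 353, "extraportable": 311,
}


def percent_of_lossless(kbps, lossless_kbps=LOSSLESS_KBPS):
    """Processed bitrate as a percentage of the lossless bitrate."""
    return 100.0 * kbps / lossless_kbps


for preset, kbps in PRESET_KBPS.items():
    print(f"--{preset:<14}{kbps:4d} kbit/s  {percent_of_lossless(kbps):5.1f}%")
```

For example, --standard comes out at about 51.8% of the lossless bitrate, and Nick.C's full collection at --extraportable at about 34.8% (307 / 882).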

==File identification==
lossyWAV-processed WAV files are named with the double filename extension .lossy.wav so that they are instantly identifiable. Likewise, ".lossy.flac" indicates an audio file that was processed with lossyWAV and then encoded with FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter is used when processing to create a correction file which is named with the .lwcdf.wav double filename extension. When "added" to the corresponding .lossy.wav, using the --merge parameter, the original file will be reconstituted.
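Conceptually, the correction data is the difference between the original and the processed samples, so adding it back restores the original exactly. The Python sketch below shows only that arithmetic on lists of integer samples; it assumes a plain sample-by-sample difference and does not reproduce the actual .lwcdf.wav layout or lossyWAV's --merge implementation.

```python
def make_correction(original, lossy):
    """Per-sample difference original - lossy."""
    if len(original) != len(lossy):
        raise ValueError("sample counts differ")
    return [o - l for o, l in zip(original, lossy)]


def merge(lossy, correction):
    """Add the correction back to the processed samples."""
    if len(lossy) != len(correction):
        raise ValueError("sample counts differ")
    return [l + c for l, c in zip(lossy, correction)]


original = [1000, -2003, 517, 32767]
lossy = [1024, -1984, 512, 32704]  # e.g. after removing 6 bits
correction = make_correction(original, lossy)
assert merge(lossy, correction) == original
print(correction)  # prints: [-24, -19, 5, 63]
```

The correction values are small compared with the samples themselves.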

Combinations of lossyWAV with a specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. The combination names are listed in the "[[LossyWAV#Codec compatibility|Codec compatibility]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. It takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, the date and time of processing, and the user settings are filled in. If a lossyWAV 'fact' chunk is already present in an input file, processing is halted (exit code 16) to prevent an already processed file from being processed again.

The --check parameter determines whether a file has already been processed without attempting to process it: the exit code is 16 if it has been processed and 0 if not.
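The check described above can be sketched in Python by walking the RIFF chunks of a WAV file and looking for a 'fact' chunk whose payload starts with the text "lossyWAV". That payload prefix is an assumption based on the chunk layout shown above; the helper names and the in-memory test file are hypothetical, and only the exit codes (16 and 0) come from this article.

```python
import struct

LOSSYWAV_TAG = b"lossyWAV"  # assumed start of lossyWAV's 'fact' payload


def iter_chunks(data):
    """Yield (chunk_id, payload) for each chunk of a RIFF/WAVE byte string."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length


def check_exit_code(data):
    """16 if a lossyWAV 'fact' chunk is present, otherwise 0."""
    for chunk_id, payload in iter_chunks(data):
        if chunk_id == b"fact" and payload.startswith(LOSSYWAV_TAG):
            return 16
    return 0


def build_wav(extra_chunks=()):
    """Hypothetical helper: a minimal 16-bit stereo WAV with no audio data."""
    fmt = struct.pack("<HHIIHH", 1, 2, 44100, 44100 * 4, 4, 16)
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    for chunk_id, payload in extra_chunks:
        body += chunk_id + struct.pack("<I", len(payload)) + payload
        body += b"\0" * (len(payload) & 1)
    body += b"data" + struct.pack("<I", 0)
    return b"RIFF" + struct.pack("<I", len(body)) + body


plain = build_wav()
processed = build_wav([(b"fact", b"lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5")])
print(check_exit_code(plain), check_exit_code(processed))  # prints: 0 16
```

A standard 'fact' chunk (a 4-byte sample count, as used by compressed WAV formats) does not start with the tag, so it is not mistaken for a processed file.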

==Quality presets==
*--quality insane: (-q I or -q 10) Highest quality preset, generally considered to be excessive;
*--quality extreme: (-q E or -q 7.5) Higher quality preset, disc space-saving alternative to lossless archiving for large audio collections, considered to be suitable for transcoding to other lossy codecs;
*--quality high: (-q H or -q 5.0) High quality preset, midway between extreme and standard;
*--quality standard: (-q S or -q 2.5) Default preset, generally accepted to be transparent;
*--quality economic: (-q C or -q 0.0) Intermediate preset midway between standard and portable;
*--quality portable: (-q P or -q -2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
*--quality extraportable: (-q X or -q -5.0) Lowest quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
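The presets map onto the numeric -q scale as listed above (and in the help text below). A hypothetical Python helper that resolves a --quality argument to its numeric value might look like this; accepting letters and names case-insensitively is a choice of this sketch, not documented lossyWAV behaviour.

```python
PRESET_VALUES = {"I": 10.0, "E": 7.5, "H": 5.0, "S": 2.5,
                 "C": 0.0, "P": -2.5, "X": -5.0}
PRESET_NAMES = {"insane": "I", "extreme": "E", "high": "H", "standard": "S",
                "economic": "C", "portable": "P", "extraportable": "X"}


def parse_quality(arg):
    """Resolve a preset letter, preset name or number in [-5, 10]."""
    key = arg.strip()
    if key.lower() in PRESET_NAMES:
        return PRESET_VALUES[PRESET_NAMES[key.lower()]]
    if key.upper() in PRESET_VALUES:
        return PRESET_VALUES[key.upper()]
    value = float(key)  # raises ValueError for anything else
    if not -5.0 <= value <= 10.0:
        raise ValueError(f"quality {value} outside -5.0..10.0")
    return value


print(parse_quality("S"), parse_quality("extreme"), parse_quality("-1.25"))
# prints: 2.5 7.5 -1.25
```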

All tuning for version 1.0.0 was performed on the --standard preset, with the higher presets being more conservative. For versions 1.1.0, 1.2.0 and 1.3.0, tuning focused on the lowest quality preset, aiming for an effective compromise between resulting bitrate and perceived quality. The --standard preset is generally accepted to be transparent, and testing so far supports this. If you find a track for which --standard does not achieve transparency, please post a sample (no more than 30 seconds) in the development thread.

Depending on the quality preset, the upper frequency limit used when calculating the minimum signal power varies between 15.159 kHz and 16.682 kHz.
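To see what such frequency limits mean for the FFT analyses, the Python sketch below converts a frequency range into FFT bin indices (bin ''k'' of an ''N''-point FFT sits at ''k''·''fs''/''N'' Hz). The 20 Hz lower limit and 16 kHz default upper limit are taken from the help text below; how lossyWAV itself rounds the bin boundaries is an assumption.

```python
def analysed_bins(fft_length, sample_rate, low_hz=20, high_hz=16000):
    """First and last FFT bin whose centre frequency k*fs/N lies in
    [low_hz, high_hz], using integer arithmetic."""
    first = -(-low_hz * fft_length // sample_rate)  # ceiling division
    last = high_hz * fft_length // sample_rate      # floor division
    return first, last


for n in (32, 64, 1024):
    first, last = analysed_bins(n, 44100)
    print(f"{n:5d}-point FFT: bins {first}..{last}, "
          f"bin spacing {44100 / n:.1f} Hz")
```

The 32-sample analysis covers the 20 Hz to 16 kHz range with only 11 bins, while the 1024-sample analysis uses 371: short analyses trade frequency resolution for time resolution and long ones do the reverse, which is one reason to combine several lengths.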

==Supported input formats==
*[[WAV]]: [[Pulse Code Modulation|PCM]], 9-bit to 32-bit integer; 1 to 8 channels; sample rate ≥ 32 kHz. Very high sample rates (above 48 kHz) have not been extensively tested. Tuning has focused on 16-bit, 44.1 kHz audio (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).
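A hypothetical pre-flight check against the limits listed above could look like the following Python function. It only encodes the documented ranges and does not reflect how lossyWAV itself reports unsupported input.

```python
def input_format_notes(bits_per_sample, channels, sample_rate):
    """Return (supported, notes) for integer PCM WAV input, per the
    documented limits: 9-32 bit, 1-8 channels, sample rate >= 32 kHz."""
    notes = []
    if not 9 <= bits_per_sample <= 32:
        notes.append("bit depth must be 9 to 32 bits")
    if not 1 <= channels <= 8:
        notes.append("channel count must be 1 to 8")
    if sample_rate < 32000:
        notes.append("sample rate must be at least 32 kHz")
    supported = not notes
    if supported and sample_rate > 48000:
        notes.append("sample rates above 48 kHz are not extensively tested")
    return supported, notes


print(input_format_notes(16, 2, 44100))  # CD audio
print(input_format_notes(24, 2, 96000))  # supported, but less tested
print(input_format_notes(8, 2, 22050))   # unsupported
```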

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|}

* Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name.

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] (so-called "Bit Shifting") suggesting that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested because MLP encoders are prohibitively expensive. At least one [http://www.hydrogenaudio.org/forums/index.php?showtopic=98609&hl= commercial DVD-A] uses constant bit-depth reduction, with a lower bit depth on the rear channels.

A [[Wikipedia:Comparison of portable media players#Audio Formats|comparison of portable media players]] shows which of the listed players support FLAC and WMA Lossless.
Any player supported by [http://www.rockbox.org Rockbox] can play FLAC and WavPack files once Rockbox is installed.
===Important note===
'''NB: when encoding with a lossless codec, make sure that the block size of the lossless codec matches that of lossyWAV (default = 512 samples). Otherwise the losslessly encoded file will almost certainly be larger than necessary. To set the block size, add the "Encoder parameters" from the table above to the lossless codec's command line.'''
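The reason is that a wasted-bits check applies to a whole codec block, so one lossless block spanning several lossyWAV blocks can only drop as many bits as the least-reduced of them. The Python sketch below counts zeroed low-order bits as a rough proxy for the saving; real FLAC, TAK or WavPack savings also depend on prediction and entropy coding, and the bits-to-remove values are made up for illustration.

```python
def wasted_bits(block, bits_per_sample=16):
    """Low-order bits that are zero in every sample of the block."""
    combined = 0
    for s in block:
        combined |= s
    if combined == 0:
        return bits_per_sample
    return (combined & -combined).bit_length() - 1


def total_wasted(samples, block_size):
    """Sum of (wasted bits x samples) over consecutive codec blocks."""
    total = 0
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        total += wasted_bits(block) * len(block)
    return total


# Eight 512-sample lossyWAV blocks with illustrative bits-to-remove values.
bits_per_block = [6, 5, 7, 4, 6, 6, 5, 3]
samples = []
for n in bits_per_block:
    samples += [3 << n, -(5 << n)] * 256  # every sample is a multiple of 2**n

print(total_wasted(samples, 512))   # 21504: codec block size matches lossyWAV
print(total_wasted(samples, 4096))  # 12288: one codec block spans all eight
```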
===Bonus feature===
Another, perhaps less obvious, feature of lossyWAV is that its processed output can be transcoded from one lossless codec to another with no loss of quality whatsoever. This is because lossyWAV output is an ordinary WAV file designed to be losslessly encoded, and every lossless codec reproduces it bit for bit.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.3.0, Copyright (C) 2007-2011 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Process Description:

lossyWAV is a near lossless audio processor which dynamically reduces the
bitdepth of the signal on a block-by-block basis. Bitdepth reduction adds noise
to the processed output. The amount of permissible added noise is based on
analysis of the signal levels in the default frequency range 20Hz to 16kHz.

If signals above the upper limiting frequency are at an even lower level, they
can be swamped by the added noise. This is usually inaudible, but the behaviour
can be changed by specifying a different --limit (in the range 10kHz to 20kHz).

For many audio signals there is little content at very high frequencies and
forcing lossyWAV to keep the added noise level lower than the content at these
frequencies can increase the bitrate dramatically for no perceptible benefit.

The noise added by the process is shaped using an adaptive method provided by
Sebastian Gesemann. This method, as implemented in lossyWAV, aims to use the
signal itself as the basis of the filter used for noise shaping. Adaptive noise
shaping is enabled by default.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <t>    where t is one of the following (default = standard):
  I, insane          highest quality output, suitable for transcoding;
  E, extreme         higher quality output, suitable for transcoding;
  H, high            high quality output, suitable for transcoding;
  S, standard        default quality output, considered to be transparent;
  C, economic        intermediate quality output, likely to be transparent;
  P, portable        good quality output for DAP use, may not be transparent;
  X, extraportable   lowest quality output, not fully transparent.

Standard Options:

-C, --correction     write correction file for processed WAV file; default=off.
-f, --force          forcibly over-write output file if it exists; default=off.
-h, --help           display help.
-L, --longhelp       display extended help.
-M, --merge          merge existing lossy.wav and lwcdf.wav files.
-o, --outdir <t>     destination directory for the output file(s).
-v, --version        display the lossyWAV version number.
-w, --writetolog     create (or add to) lossyWAV.log in the output directory.

Advanced Options:

-                    take WAV input from STDIN.
-c, --check          check if WAV file has already been processed; default=off.
                     errorlevel=16 if already processed, 0 if not.
-q, --quality <n>    quality preset (-5.0<=n<=10.0); (-5=lowest, 10=highest;
                     default=2.5; I=10; E=7.5; H=5; S=2.5; C=0; P=-2.5; X=-5).
--, --stdout         write WAV output to STDOUT.
--stdinname <t>      pseudo filename to use when input from STDIN.

Advanced Quality Options:

-A, --adaptive <n/t> modify settings for Sebastian Gesemann's adaptive noise
                     shaping method. takes a parameter to set the order of the
                     FIR filter, (32<=n<=96; default=64; multiple of 8 only);
                     "OFF" to disable adaptive shaping; "NOWARP" to disable
                     default frequency warping;
-a, --analyses <n>   set number of FFT analysis lengths, (2<=n<=6; default=3,
                     i.e. 32, 64 & 1024 samples. n=2, remove 32 sample FFT;
                     n>3 add 512; n>4, add 256; n>6, add 128) nb. FFT lengths.
                     stated are for 44.1/48kHz audio, higher sample rates will
                     automatically increase all FFT lengths as required.
-l, --limit <n>      set upper frequency limit to be used in analyses to n Hz;
                     (10000<=n<=20000; default=16000).
--linkchannels       revert to original single bits-to-remove value for all
                     channels rather than channel dependent bits-to-remove.
--maxclips <n>       set max. number of acceptable clips per channel per block;
                     (0<=n<=16; default=3,3,3,3,3,2,2,2,2,2,1,1,1,0,0,0).
-m, --midside        analyse 2 channel audio for mid/side content.
--nodccorrect        disable DC correction of audio data prior to FFT analysis,
                     default=on; (DC offset calculated per FFT data set).
--scale <n>          factor to scale audio by; (0.0625<n<=8.0; default=1).
-s, --shaping [n]    enable fixed noise shaping, takes optional parameter [n]
                     to allow user defined shaping proportion (0.0<=n<=1.0),
                     otherwise default to quality setting dependent value.
                     Disables adaptive noise shaping.
--static <n>         set minimum-bits-to-keep-static to n bits (default=6;
                     7<=n<=28, limited to bits-per-sample - 4).
-U, --underlap <n>   enable underlap mode to increase number of FFT analyses
                     performed at each FFT length, (n = 2, 4 or 8, default=2).

Output Options:

--bitdist            show distrubution of bits to remove.
--blockdist          show distribution of lowest / highest significant bit of
                     input codec-blocks and bit-removed codec-blocks.
-d, --detail         enable per block per channel bits-to-remove data display.
-F, --freqdist       enable frequency analysis display of input data.
-H, --histogram      show sample value histogram (input, lossy and correction).
--longdist           show long frequency distribution data (input/lossy/lwcdf).
--perchannel         show selected distribution data per channel.
-p, --postanalyse    enable frequency analysis display of output and
                     correction data in addition to input data.
--sampledist         show distribution of lowest / highest significant bit of
                     input samples and bit-removed samples.
--spread [full]      show detailed [more detailed] results from the spreading/
                     averaging algorithm.
-W, --width <n>      select width of output options (79<=n<=255).

System Options:

-B, --below          set process priority to below normal.
--low                set process priority to low.
-N, --nowarnings     suppress lossyWAV warnings.
-Q, --quiet          significantly reduce screen output.
-S, --silent         no screen output.

Special thanks go to:

David Robinson       for the publication of his lossyFLAC method, guidance, and
                     the motivation to implement his method as lossyWAV.

Horst Albrecht       for ABX testing, valuable support in tuning the internal
                     presets, constructive criticism and all the feedback.

Sebastian Gesemann   for the adaptive noise shaping method and the amount of
                     help received in implementing it and also for the basis of
                     the fixed noise shaping method.

Matteo Frigo and     for libfftw3-3.dll contained in the FFTW distribution
Steven G Johnson     (v3.2.1 or v3.2.2).

Mark G Beckett       for the Delphi unit that provides an interface to the
(Univ. of Edinburgh) relevant fftw routines in libfftw3-3.dll.

Don Cross for the Complex-FFT algorithm originally used.</pre>
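The process description above mentions noise shaping. As a generic illustration only (a textbook first-order error-feedback quantizer, not Sebastian Gesemann's adaptive method and not lossyWAV's fixed shaping filter), the Python sketch below rounds samples to multiples of 2<sup>n</sup> while feeding each rounding error back into the next sample, which pushes the added noise towards high frequencies. The <code>proportion</code> argument is a hypothetical knob (0 gives plain white rounding noise, 1 gives full first-order shaping), and clipping is ignored.

```python
import random


def shaped_remove_bits(block, n, proportion=1.0):
    """Round to multiples of 2**n with first-order error feedback.

    The total added noise at sample k is e[k] - proportion * e[k-1], where
    e[k] is the rounding error at sample k, i.e. the noise is filtered by
    (1 - proportion * z^-1), a high-pass shape when proportion > 0.
    """
    step = 1 << n
    out = []
    prev_err = 0.0
    for s in block:
        target = s - proportion * prev_err
        q = step * round(target / step)
        out.append(q)
        prev_err = q - target
    return out


rng = random.Random(1)
block = [rng.randint(-20000, 20000) for _ in range(512)]
plain = shaped_remove_bits(block, 4, proportion=0.0)
shaped = shaped_remove_bits(block, 4, proportion=1.0)

# With full shaping the summed (DC) error telescopes to the last rounding
# error, so it stays within half a step; plain rounding has no such bound.
print(sum(q - s for q, s in zip(plain, block)),
      sum(q - s for q, s in zip(shaped, block)))
```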

===Example drag 'n' drop batch file===
Drag FLAC files onto this batch file: each file is decoded, processed with lossyWAV, re-encoded with FLAC, and has ALL of its tags copied from the input file. The output .lossy.flac file is placed in the same directory as the input FLAC file. Requires flac.exe, lossyWAV and [http://www.synthetic-soul.co.uk/tag/ tag.exe] to be somewhere on the path.
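<pre>@echo off
rem Process each file dropped onto this batch file in turn.
:repeat
if "%~1"=="" goto end
rem Decode FLAC, process with lossyWAV, re-encode with 512-sample blocks, then copy tags.
if exist "%~1" flac -d --stdout --silent "%~1"|lossywav - --stdout --quality standard --stdinname "%~1"|flac - -b 512 -o "%~dpn1.lossy.flac" --silent && tag --fromfile "%~1" "%~dpn1.lossy.flac"
shift
goto repeat
:end</pre>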

===lossyWAV and FFTW===
Since version 1.2.0, lossyWAV can use [[Wikipedia:FFTW|FFTW]] but does not depend on it. To take advantage of the faster FFT routines in FFTW, place libfftw3-3.dll in a directory that is on the system path.

===lossyWAV and WINE===
Earlier versions of lossyWAV did not run under WINE. The cause was found and removed during the development of 1.2.0, and the fix was backported to 1.1.0b in a maintenance release (1.1.0c).

===lossyWAV and [[foobar2000]]===
Example [[foobar2000]] converter settings:

lossyFLAC settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.flac
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\flac - -b 512 -5 -f -o%d --ignore-chunk-sizes
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyTAK settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.tak
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWV settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.wv
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWMALSL* settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension: lossy.wma
Parameters: /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|
c:\"program files"\bin\wmaencode - %d --codec lsl --ignorelength
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

Enclose each path element that contains spaces in double quotation marks ("), e.g. C:\"Program Files"\directory_where_executable_is\executable_name. This is a limitation of the Windows command line.

*lossyWMALSL conversion uses WMAEncode.exe by lvqcl, available from [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=90519&view=findpost&p=767754 this forum post].

===lossyWAV and EAC===
:''For example settings, see [[EAC and LossyWAV]].''

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor, not a codec. No decoding is required: a WAV file processed with lossyWAV remains compliant with the RIFF WAVE format and plays in any program that supports WAV.

*'''Question:''' Doesn't lossyWAV mean that I can no longer be sure a lossless file is truly lossless?
*'''Answer:''' Unless you create a lossless file yourself, you can '''never''' be completely sure that it is truly lossless; for example, a "lossless" file you receive could have been transcoded from [[MP3]] without your knowledge. To distinguish lossyWAV-processed files from lossless ones, it is recommended to use the extension .lossy.EXT, where EXT is the original extension, e.g. .lossy.flac

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' At preset --standard, almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact]]s
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522 Original lossyFLAC thread] - Introduction of the concept by David Robinson (ReplayGain developer) and initial development
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=96635 lossyWAV 1.3.1 Delphi to C++ translation thread]
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 lossyWAV 1.3.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 lossyWAV 1.3.0 release thread] - Release of version 1.3.0 on 06 August 2011
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=65499 lossyWAV 1.2.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=77042 lossyWAV 1.2.0 release thread] - Release of version 1.2.0 on 16 December 2009
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 lossyWAV 1.1.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=64617 lossyWAV 1.1.0 release thread] - Release of version 1.1.0 on 12 July 2008
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 lossyWAV Development thread] - Conversion of the original MATLAB script to Delphi and evolution of the method
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0 release thread] - Release of version 1.0.0b on 12 May 2008

[[Category:Software]]
[[Category:Lossy]]

Opus

2013-01-10T15:37:59Z

Dynamic: /* References & Notes */ Added Category:Encoder/Decoder as article refers heavily to the reference implementation

Opus

2013-01-10T15:35:38Z

Dynamic: /* References & Notes */ Added Category:Lossy

Opus

2013-01-10T15:34:14Z

Dynamic: Added Category:Codecs to page

Opus

2013-01-10T13:58:03Z

Dynamic: /* CELT layer latency versus quality/bitrate trade-off */ clarification

Opus

2013-01-10T13:55:03Z

Dynamic: Removed {{stub}} and expanded content greatly - included bitrate/usage guidance tables (improvements welcome) - cleaned up references

LossyWAV

2013-01-08T13:33:05Z

Dynamic: /* Codec compatibility */ Added link to forum thread on bit-reduced Lord Of The Rings DVD-A ~~~~

{{Software Infobox
| name = lossyWAV
| logo =
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.3.0
| preview_release = <none>
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 1.3.0 release thread]<br />[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 1.3.0 development thread]
}}
lossyWAV is a [[Wikipedia:Free software|free]], [[lossy]] pre-processor for [[PCM]] audio contained in the [[RIFF_WAVE|WAV]] file format. Proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson], it reduces the [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, significantly reduces the bitrate of the encoded file compared to compressing the unprocessed audio.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyWAV is based on the lossyFLAC idea proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson] at Hydrogenaudio: carefully reduce the bit depth of blocks of samples so that the FLAC lossless encoder can make use of its "wasted bits" feature. The aim is to reduce the audio bit depth transparently by setting some of the least significant bits ([[Wikipedia:Least_significant_bit|LSBs]]) to zero. FLAC detects low-order bits that are zero in every sample of a channel within a frame and shifts them out before encoding, which significantly increases coding efficiency.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can keep audio in the same codec (which may be all-important for hardware compatibility) at a lower bitrate than the lossless version.

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick Currie] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (many thanks to [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer!), with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] assembly language for speed.

The method subsequently proved to work with other lossless codecs as well, so the application was renamed from lossyFLAC to lossyWAV.

Since then, Nick has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 Horst Albrecht] at Hydrogenaudio. Although the current lossyWAV implementation has built on David's original method, the method itself still very much belongs to its author.

==Indicative bitrate reduction==
lossyWAV is purely a variable bit-depth pre-processor: the sample size (e.g. 16 bits) stays the same after processing, but the number of significant bits used by the samples in a codec-block can change from block to block.

The number of bits to remove is calculated for each codec-block (512 samples, i.e. 11.6 ms at 44.1 kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier transform]] (FFT) analyses of at least two lengths (by default 32, 64 and 1024 [[Wikipedia:Sampling %28signal processing%29|samples]]). After some further processing, the results of all FFT analyses covering a codec-block are grouped, and the minimum value determines the bits-to-remove for the whole codec-block.

Removing bits adds noise to the output. The noise level associated with removing a given number of bits has been pre-calculated, so the number of bits removed depends on the noise floor of the codec-block in question. By default the added noise is adaptively shaped; the user can instead select fixed noise shaping or plain [[Wikipedia:white noise|white noise]]. Finally, each sample in the codec-block is rounded so that its lowest ''bits-to-remove'' bits are zero. This is how the wasted-bits feature of [[FLAC]] and other lossless codecs is exploited.

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set (16 bit / 44.1kHz)
!Codec
!lossless
!--insane
!--extreme
!--high
!--standard
!--economic
!--portable
!--extraportable
|-
!10 Album Test Set
| FLAC
| 854 kbit/s
| 627 kbit/s
| 548 kbit/s
| 477 kbit/s
| 442 kbit/s
| 407 kbit/s
| 353 kbit/s
| 311 kbit/s
|-
!Nick.C's Full Collection
| FLAC
| 882 kbit/s
| -
| -
| -
| -
| -
| -
| 307 kbit/s
|}

==File identification==
lossyWAV-processed WAV files are named with the double filename extension .lossy.wav so that they are instantly identifiable. Likewise, ".lossy.flac" indicates an audio file that was processed with lossyWAV and then encoded with FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter is used when processing to create a correction file which is named with the .lwcdf.wav double filename extension. When "added" to the corresponding .lossy.wav, using the --merge parameter, the original file will be reconstituted.

Combinations of lossyWAV with a specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. The combination names are listed in the "[[LossyWAV#Codec compatibility|Codec compatibility]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. It takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, the date and time of processing, and the user settings are filled in. If a lossyWAV 'fact' chunk is already present in an input file, processing is halted (exit code 16) to prevent an already processed file from being processed again.

The --check parameter determines whether a file has already been processed without attempting to process it: the exit code is 16 if it has been processed and 0 if not.

==Quality presets==
*--quality insane: (-q I or -q 10) Highest quality preset, generally considered to be excessive;
*--quality extreme: (-q E or -q 7.5) Higher quality preset, disc space-saving alternative to lossless archiving for large audio collections, considered to be suitable for transcoding to other lossy codecs;
*--quality high: (-q H or -q 5.0) High quality preset, midway between extreme and standard;
*--quality standard: (-q S or -q 2.5) Default preset, generally accepted to be transparent;
*--quality economic: (-q C or -q 0.0) Intermediate preset midway between standard and portable;
*--quality portable: (-q P or -q -2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]
*--quality extraportable: (-q X or -q -5.0) Lowest quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]].[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]

All tuning for version 1.0.0 was performed on the --standard preset, with the higher presets being more conservative. For versions 1.1.0, 1.2.0 and 1.3.0, tuning focused on the lowest quality preset, aiming for an effective compromise between resulting bitrate and perceived quality. The --standard preset is generally accepted to be transparent, and testing so far supports this. If you find a track for which --standard does not achieve transparency, please post a sample (no more than 30 seconds) in the development thread.

Depending on the quality preset, the upper frequency limit used when calculating the minimum signal power varies between 15.159 kHz and 16.682 kHz.

==Supported input formats==
*[[WAV]]: [[Pulse Code Modulation|PCM]], 9-bit to 32-bit integer; 1 to 8 channels; sample rate ≥ 32 kHz. Very high sample rates (above 48 kHz) have not been extensively tested. Tuning has focused on 16-bit, 44.1 kHz audio (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|}

* Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name.

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] (so-called "Bit Shifting") suggesting that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested because MLP encoders are prohibitively expensive. At least one [http://www.hydrogenaudio.org/forums/index.php?showtopic=98609&hl= commercial DVD-A] uses constant bit-depth reduction, with a lower bit depth on the rear channels.

A [[Wikipedia:Comparison of portable media players#Audio Formats|comparison of portable media players]] shows which of the listed players support FLAC and WMA Lossless.
Any player supported by [http://www.rockbox.org Rockbox] can play FLAC and WavPack files once Rockbox is installed.
===Important note===
'''NB: when encoding with a lossless codec, make sure that the block size of the lossless codec matches that of lossyWAV (default = 512 samples). Otherwise the losslessly encoded file will almost certainly be larger than necessary. To set the block size, add the "Encoder parameters" from the table above to the lossless codec's command line.'''
===Bonus feature===
Another, perhaps less obvious, feature of lossyWAV is that its processed output can be transcoded from one lossless codec to another with no loss of quality whatsoever. This is because lossyWAV output is an ordinary WAV file designed to be losslessly encoded, and every lossless codec reproduces it bit for bit.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.3.0, Copyright (C) 2007-2011 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Process Description:

lossyWAV is a near lossless audio processor which dynamically reduces the
bitdepth of the signal on a block-by-block basis. Bitdepth reduction adds noise
to the processed output. The amount of permissible added noise is based on
analysis of the signal levels in the default frequency range 20Hz to 16kHz.

If signals above the upper limiting frequency are at an even lower level, they
can be swamped by the added noise. This is usually inaudible, but the behaviour
can be changed by specifying a different --limit (in the range 10kHz to 20kHz).

For many audio signals there is little content at very high frequencies and
forcing lossyWAV to keep the added noise level lower than the content at these
frequencies can increase the bitrate dramatically for no perceptible benefit.

The noise added by the process is shaped using an adaptive method provided by
Sebastian Gesemann. This method, as implemented in lossyWAV, aims to use the
signal itself as the basis of the filter used for noise shaping. Adaptive noise
shaping is enabled by default.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <t> where t is one of the following (default = standard):
I, insane highest quality output, suitable for transcoding;
E, extreme higher quality output, suitable for transcoding;
H, high high quality output, suitable for transcoding;
S, standard default quality output, considered to be transparent;
C, economic intermediate quality output, likely to be transparent;
P, portable good quality output for DAP use, may not be transparent;
X, extraportable lowest quality output, not fully transparent.

Standard Options:

-C, --correction write correction file for processed WAV file; default=off.
-f, --force forcibly over-write output file if it exists; default=off.
-h, --help display help.
-L, --longhelp display extended help.
-M, --merge merge existing lossy.wav and lwcdf.wav files.
-o, --outdir <t> destination directory for the output file(s).
-v, --version display the lossyWAV version number.
-w, --writetolog create (or add to) lossyWAV.log in the output directory.

Advanced Options:

- take WAV input from STDIN.
-c, --check check if WAV file has already been processed; default=off.
errorlevel=16 if already processed, 0 if not.
-q, --quality <n> quality preset (-5.0<=n<=10.0); (-5=lowest, 10=highest;
default=2.5; I=10; E=7.5; H=5; S=2.5; C=0; P=-2.5; X=-5).
--, --stdout write WAV output to STDOUT.
--stdinname <t> pseudo filename to use when input from STDIN.

Advanced Quality Options:

-A, --adaptive <n/t> modify settings for Sebastian Gesemann's adaptive noise
shaping method. takes a parameter to set the order of the
FIR filter, (32<=n<=96; default=64; multiple of 8 only);
"OFF" to disable adaptive shaping; "NOWARP" to disable
default frequency warping;
-a, --analyses <n> set number of FFT analysis lengths, (2<=n<=6; default=3,
i.e. 32, 64 & 1024 samples. n=2, remove 32 sample FFT;
n>3 add 512; n>4, add 256; n>6, add 128) nb. FFT lengths.
stated are for 44.1/48kHz audio, higher sample rates will
automatically increase all FFT lengths as required.
-l, --limit <n> set upper frequency limit to be used in analyses to n Hz;
(10000<=n<=20000; default=16000).
--linkchannels revert to original single bits-to-remove value for all
channels rather than channel dependent bits-to-remove.
--maxclips <n> set max. number of acceptable clips per channel per block;
(0<=n<=16; default=3,3,3,3,3,2,2,2,2,2,1,1,1,0,0,0).
-m, --midside analyse 2 channel audio for mid/side content.
--nodccorrect disable DC correction of audio data prior to FFT analysis,
default=on; (DC offset calculated per FFT data set).
--scale <n> factor to scale audio by; (0.0625<n<=8.0; default=1).
-s, --shaping [n] enable fixed noise shaping, takes optional parameter [n]
to allow user defined shaping proportion (0.0<=n<=1.0),
otherwise default to quality setting dependent value.
Disables adaptive noise shaping.
--static <n> set minimum-bits-to-keep-static to n bits (default=6;
7<=n<=28, limited to bits-per-sample - 4).
-U, --underlap <n> enable underlap mode to increase number of FFT analyses
performed at each FFT length, (n = 2, 4 or 8, default=2).

Output Options:

--bitdist show distrubution of bits to remove.
--blockdist show distribution of lowest / highest significant bit of
input codec-blocks and bit-removed codec-blocks.
-d, --detail enable per block per channel bits-to-remove data display.
-F, --freqdist enable frequency analysis display of input data.
-H, --histogram show sample value histogram (input, lossy and correction).
--longdist show long frequency distribution data (input/lossy/lwcdf).
--perchannel show selected distribution data per channel.
-p, --postanalyse enable frequency analysis display of output and
correction data in addition to input data.
--sampledist show distribution of lowest / highest significant bit of
input samples and bit-removed samples.
--spread [full] show detailed [more detailed] results from the spreading/
averaging algorithm.
-W, --width <n> select width of output options (79<=n<=255).

System Options:

-B, --below set process priority to below normal.
--low set process priority to low.
-N, --nowarnings suppress lossyWAV warnings.
-Q, --quiet significantly reduce screen output.
-S, --silent no screen output.

Special thanks go to:

David Robinson for the publication of his lossyFLAC method, guidance, and
the motivation to implement his method as lossyWAV.

Horst Albrecht for ABX testing, valuable support in tuning the internal
presets, constructive criticism and all the feedback.

Sebastian Gesemann for the adaptive noise shaping method and the amount of
help received in implementing it and also for the basis of
the fixed noise shaping method.

Matteo Frigo and for libfftw3-3.dll contained in the FFTW distribution
Steven G Johnson (v3.2.1 or v3.2.2).

Mark G Beckett for the Delphi unit that provides an interface to the
(Univ. of Edinburgh) relevant fftw routines in libfftw3-3.dll.

Don Cross for the Complex-FFT algorithm originally used.</pre>

===Example drag 'n' drop batch file===
Simply drag FLAC files onto this batch file: it will decode each file, process it with lossyWAV, re-encode it to FLAC with a 512-sample block size and copy ALL of the tags from the input FLAC file, placing the output lossyFLAC file in the same directory as the input FLAC file. Requires flac.exe, lossywav.exe and [http://www.synthetic-soul.co.uk/tag/ tag.exe] to be somewhere on the path.
<pre>@echo off
rem Process each dropped file in turn; "%~1" strips any quotes Windows adds to paths.
:repeat
if "%~1"=="" goto end
if exist "%~1" (
  flac -d "%~1" --stdout --silent | lossywav - --stdout --quality standard --stdinname "%~1" | flac - -b 512 -o "%~dpn1.lossy.flac" --silent && tag --fromfile "%~1" "%~dpn1.lossy.flac"
)
shift
goto repeat
:end</pre>

===lossyWAV and FFTW===
Since version 1.2.0, lossyWAV can use [[Wikipedia:FFTW|FFTW]] but does not depend on it. To benefit from the faster FFT routines in FFTW, place libfftw3-3.dll in a directory that is on the system path.

===lossyWAV and WINE===
The cause of lossyWAV's WINE incompatibility was found and removed during the development of 1.2.0 and retrospectively amended for 1.1.0b in a maintenance release (1.1.0c).

===lossyWAV and [[foobar2000]]===
Example [[foobar2000]] converter settings:

lossyFLAC settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.flac
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\flac - -b 512 -5 -f -o%d --ignore-chunk-sizes
Format is : lossless or hybrid
Highest BPS mode supported: 24 </pre>

lossyTAK settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.tak
Parameters : /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWV settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.wv
Parameters: /d /c C:\"Program Files"\bin\lossywav - --quality standard --silent --stdout|
C:\"Program Files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is : lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWMALSL* settings:<pre>Encoder: C:\Windows\System32\cmd.exe
Extension : lossy.wma
Parameters : /d /c c:\"program files"\bin\lossywav - --quality standard --silent --stdout|
c:\"program files"\bin\wmaencode - %d --codec lsl --ignorelength
Format is : lossless or hybrid
Highest BPS mode supported: 24</pre>

Enclose the element of the path containing spaces within double quotation marks ("), e.g. C:\"Program Files"\directory_where_executable_is\executable_name. This is a Windows limitation.

lossyWMALSL conversion uses WMAEncode.exe by lvqcl found [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=90519&view=findpost&p=767754 here].

===lossyWAV and EAC===
:''For example settings, see [[EAC and LossyWAV]].''

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor and not a codec. No decoding is required for any program to play a WAV file which has been processed with lossyWAV as it remains compliant with the RIFF WAVE format.

*'''Question:''' Why create a processor which means that I cannot be sure that a lossless file is truly lossless?
*'''Answer:''' Unless one creates the lossless file personally, one can '''never''' be completely sure that the file is indeed lossless. E.g. a lossless file you receive could be transcoded from [[MP3]] without your knowledge. To distinguish a lossyWAV file from lossless files it is recommended to use the extension .lossy.EXT where EXT is the original extension e.g. .lossy.flac

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' At the default preset (--quality standard), almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact]]s
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522 Original lossyFLAC thread] - Introduction of the concept by David Robinson (Replay Gain developer) and initial development
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=96635 lossyWAV 1.3.1 Delphi to C++ translation thread]
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=81002 lossyWAV 1.3.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=90104 lossyWAV 1.3.0 release thread] - Release of version 1.3.0 on 06 August 2011
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=65499 lossyWAV 1.2.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=77042 lossyWAV 1.2.0 release thread] - Release of version 1.2.0 on 16 December 2009
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 lossyWAV 1.1.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=64617 lossyWAV 1.1.0 release thread] - Release of version 1.1.0 on 12 July 2008
----
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 lossyWAV Development thread] - Conversion of the original MATLAB script to Delphi and evolution of the method
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0 release thread] - Release of version 1.0.0b on 12 May 2008

[[Category:Software]]

AoTuV

2012-11-02T19:17:32Z

Dynamic: Updated out-of-date information and provided a little summary of version history up to beta 6.

{{title|aoTuV}}
'''aoTuV''' is an abbreviation for '''Aoyumi's Tuned Vorbis'''; it is a third-party development and tuning of the [[Vorbis]] encoder.

aoTuV versions improve significantly on Vorbis quality: most people agree that '''aoTuV beta 4 (and newer)''' achieves [[transparency]] at -q 5.

Released in December 2005, '''aoTuV beta 4.51''' further improved low-bitrate quality and, after peer review, was rebranded '''aoTuV Release 1''', with some reports that -q 1 (approximately 80 kbps) is good enough for streaming.

In June 2007, the '''aoTuV beta 5''' versions, including 5.7, underwent peer review and superseded Release 1 as the HA-recommended Vorbis encoders, improving low-bitrate quality with respect to [[Noise normalization]] without sacrificing compression ratio.

'''aoTuV beta 6''' versions, released in 2011, made further improvements to pre-echo and post-echo handling, stereo mode decisions and noise normalization at low bitrates, but have not been extensively peer-reviewed by the Hydrogenaudio community.

See [[Recommended Ogg Vorbis]] page for more details.

== Links ==
* [http://www.geocities.jp/aoyoume/aotuv/ aoTuV's home page].
* [[Lancer]]: [[BlackSword]]'s accelerated version of aoTuV binaries, courtesy of the Ogg Vorbis Acceleration Project
* [http://www.hydrogenaudio.org/forums/index.php?showtopic=44681&hl= How to pronounce "aoTuV"]
* How to compile aoTuV under Linux: [[Compiling aoTuV]]

[[Category:Software]]
[[Category:Encoder/Decoder]]

Bit reservoir

2012-07-10T18:36:48Z

Dynamic: Rewritten example in bits per frame and kbps equivalent to avoid mixing bits per 'moment' and kilobits per second as if they were the same.

This article describes the '''bit reservoir''' as defined in the [[MP3]] specification; other formats such as [[AAC]] use a similar mechanism.

[[CBR]] (and to some degree [[ABR]]) uses a constant, predefined [[bitrate]]. Because every frame is allotted the same number of bits, some moments will be too complex to encode properly within that limit; they need a higher [[bitrate]] than the defined one. The MP3 specification therefore defines a bit reservoir, intended especially to allow transient sounds to be encoded better.

'''Example:'''

An MP3 file is to be encoded at 160 kbps CBR from CD source material, sampled at 44100 stereo samples per second.

Because each MP3 frame holds 1152 samples, a bitrate of 160000 bits per second gives each frame about 4179 bits of data (= 160000 x 1152 / 44100, rounded down).

Imagine that a certain frame needs only 3056 bits to be properly encoded (as calculated by the [[Psychoacoustic#Psychoacoustic Model|psymodel]] and the settings of the encoder). This is equivalent to about 117 kbps momentary bitrate (=3056 x 44100 / 1152). 1123 bits are not used (4179 - 3056 = 1123). Those bits together with any unused bits in the next few frames can be saved to a reservoir for use in following frames that may require more than 4179 bits each.
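The frame arithmetic above can be reproduced with a short Python sketch. It assumes MPEG-1 Layer III frames of 1152 samples at 44.1 kHz and, like the example, ignores the frame header, side information and padding slots; the function names are illustrative only.

```python
# Sketch: per-frame bit budget for MPEG-1 Layer III CBR (header, side info and padding ignored).
SAMPLES_PER_FRAME = 1152
SAMPLE_RATE = 44100  # CD audio


def frame_bits(bitrate_bps, sample_rate=SAMPLE_RATE):
    """Whole bits available to one frame at a constant bitrate (rounded down)."""
    return bitrate_bps * SAMPLES_PER_FRAME // sample_rate


def momentary_kbps(bits, sample_rate=SAMPLE_RATE):
    """Bitrate, in kbps, that a single frame of `bits` bits corresponds to."""
    return bits * sample_rate / SAMPLES_PER_FRAME / 1000


budget = frame_bits(160_000)   # bits per frame at 160 kbps
needed = 3056                  # bits the psymodel asks for in the example
print(budget, round(momentary_kbps(needed)), budget - needed)  # 4179 117 1123
```

The same `frame_bits` call gives 3343 bits at 128 kbps and 2925 bits at 112 kbps, the frame sizes used in the VBR example.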

The maximum reservoir size is 4088 bits (511 bytes), because the field that points back into earlier frames (main_data_begin) is only 9 bits long in MPEG-1 Layer III. If the full 4088-bit reservoir is available in addition to the 4179 bits allocated by the 160 kbps CBR bitrate of our example, 8267 bits are available, equivalent to a momentary bitrate of 316 kbps for one frame. If these are used completely, however, no reservoir remains for the very next frame, which is then restricted to 4179 bits (160 kbps) even if the psymodel deems more desirable. The reservoir therefore cannot sustain long passages of complex sound, but it often copes adequately with brief transients such as percussive sounds, without the constant bitrate having to be set to, say, 320 kbps throughout the file.
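A simplified model of the reservoir at 160 kbps CBR is sketched below. It is a hypothetical illustration, not the logic of any particular encoder: each frame takes the bits it needs from its own 4179-bit allocation plus whatever is in the reservoir, and leftover bits are carried forward up to the 4088-bit maximum.

```python
# Hypothetical, simplified bit-reservoir model for MP3 CBR (not any real encoder's logic).
MAX_RESERVOIR_BITS = 511 * 8  # 4088 bits


def simulate_cbr(demands, per_frame_bits):
    """For each frame's bit demand, return the bits granted and the reservoir left afterwards."""
    reservoir = 0
    granted, levels = [], []
    for need in demands:
        available = per_frame_bits + reservoir   # own allocation plus saved bits
        used = min(need, available)              # cannot use more than is available
        # Leftover bits are saved, but the reservoir is capped at its maximum size.
        reservoir = min(available - used, MAX_RESERVOIR_BITS)
        granted.append(used)
        levels.append(reservoir)
    return granted, levels


granted, levels = simulate_cbr([3056, 3056, 3056, 3056, 8267, 8000], per_frame_bits=4179)
print(granted)  # [3056, 3056, 3056, 3056, 8267, 4179]
print(levels)   # [1123, 2246, 3369, 4088, 0, 0]
```

Here the reservoir reaches its 4088-bit cap after the fourth frame, the fifth frame receives the full 8267 bits, and the sixth is held to 4179 bits even though it asks for more.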

With [[VBR]], the encoder can choose the needed frame size for each moment, again as defined by the psymodel and the quality settings. VBR (e.g. in [[LAME]]) therefore uses the bit reservoir much less, but may still use it to collect bits that would otherwise be wasted when the needed size falls between two available frame sizes. For example, our example frame requiring 3056 bits (117 kbps) might be stored in a 3343-bit frame (128 kbps), donating the extra 287 bits to the reservoir, or in a 2925-bit frame (112 kbps), using 131 bits from the reservoir built up by preceding frames to make up the shortfall.

[[Category:Technical]]

Original ReplayGain specification

2011-07-29T08:18:06Z

Dynamic: /* Replay gain */ minor typo 'This single can be used...' to 'This single gain can be used'

Although music is encoded to a digital format with a clearly defined maximum peak amplitude, and although most recordings are normalized to utilize this peak amplitude, not all recordings sound equally loud. This is because once this peak amplitude is reached, perceived loudness can be further increased through signal-processing techniques such as dynamic range compression and equalization.<ref>Source: Wikipedia - [http://en.wikipedia.org/wiki/Loudness_war Loudness war]</ref> Therefore, the loudness of a given album has more to do with the year of issue or the whim of the producer than the intended emotional effect. Because of this, a random play through a music collection can have one leaping for the volume control every other track.

There is a solution to this annoyance: within each audio file, information can be stored about what volume change would be required to play each track or album at a standard loudness, and players can use this "replay gain" information to automatically nudge the volume up or down as required.

The ReplayGain specification is a standard which defines an appropriate reference level, explains a way of calculating and representing the ideal replay gain for a given track or album, and provides guidance for players to make the required volume adjustment during playback. The standard also specifies a means to prevent clipping when the calculated replay gain exceeds the limits of digital audio, and it describes how the replay gain information is stored within audio files.

==Loudness measurement==
Loudness is a subjective measure of the intensity of sound. The correlation of perceived loudness to sound pressure level is determined by the peculiarities of the auditory system. ReplayGain attempts to model those peculiarities with the following measurement procedure.

===Loudness filter===
[[File:RG_Equal_loudness_all.gif‎|frame|Figure 1: Loudness filter target response (blue), high-pass response (green) and composite response (red)]]

The human ear does not perceive sounds of all frequencies as having equal loudness. For example, a full-scale sine wave at 1 kHz sounds much louder than a full-scale sine wave at 100 Hz, even though the two have identical energy. To account for this, the signal is filtered by an inverted approximation of the equal loudness curves (sometimes referred to as Fletcher–Munson curves) which describe the sensitivity of the ear as a function of frequency. The desired filter response derived from the equal loudness curves is shown in figure 1 (blue).

At higher frequencies a 10th order IIR filter designed by MATLAB's "yulewalk" function is an excellent approximation to the target. This is cascaded with a 2nd order Butterworth high pass filter, with a high pass frequency of 150 Hz (Figure 1 [green]). The resulting combined response (Figure 1 [red]) is close to the target response, and is used by ReplayGain.

[[File:RG_IIR-filter.png|frame|Figure 2: IIR filter topology used by "yulewalk" and Butterworth filter components]]

The filter topology used for the components of the loudness filter is shown in figure 2. The filter coefficients for 48 and 44.1 kHz sample rates are given for the Butterworth and "yulewalk" components in tables 1 and 2 respectively. When using other sample rates, coefficients must be transformed to maintain the same filter response.

{| class="wikitable" style="text-align:center"
|+Table 1a: Butterworth filter coefficients (F<sub>s</sub>=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98621192462708
|-
| ''a(1)'' || 1.97223372919527 || ''b(1)'' || -1.97242384925416
|-
| ''a(2)'' || -0.97261396931306 || ''b(2)'' || 0.98621192462708
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 1b: Butterworth filter coefficients (F<sub>s</sub>=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.98500175787242
|-
| ''a(1)'' || 1.96977855582618 || ''b(1)'' || -1.97000351574484
|-
| ''a(2)'' || -0.97022847566350 || ''b(2)'' || 0.98500175787242
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2a: "Yulewalk" filter coefficients (F<sub>s</sub>=48 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.03857599435200
|-
| ''a(1)'' || 3.84664617118067 || ''b(1)'' || -0.02160367184185
|-
| ''a(2)'' || -7.81501653005538 || ''b(2)'' || -0.00123395316851
|-
| ''a(3)'' || 11.34170355132042 || ''b(3)'' || -0.00009291677959
|-
| ''a(4)'' || -13.05504219327545 || ''b(4)'' || -0.01655260341619
|-
| ''a(5)'' || 12.28759895145294 || ''b(5)'' || 0.02161526843274
|-
| ''a(6)'' || -9.48293806319790 || ''b(6)'' || -0.02074045215285
|-
| ''a(7)'' || 5.87257861775999 || ''b(7)'' || 0.00594298065125
|-
| ''a(8)'' || -2.75465861874613 || ''b(8)'' || 0.00306428023191
|-
| ''a(9)'' || 0.86984376593551 || ''b(9)'' || 0.00012025322027
|-
| ''a(10)'' || -0.13919314567432 || ''b(10)'' || 0.00288463683916
|-
|}

{| class="wikitable" style="text-align:center"
|+Table 2b: "Yulewalk" filter coefficients (F<sub>s</sub>=44.1 kHz)
|-
| colspan="2" |
| ''b(0)''
| 0.05418656406430
|-
| ''a(1)'' || 3.47845948550071 || ''b(1)'' || -0.02911007808948
|-
| ''a(2)'' || -6.36317777566148 || ''b(2)'' || -0.00848709379851
|-
| ''a(3)'' || 8.54751527471874 || ''b(3)'' || -0.00851165645469
|-
| ''a(4)'' || -9.47693607801280 || ''b(4)'' || -0.00834990904936
|-
| ''a(5)'' || 8.81498681370155 || ''b(5)'' || 0.02245293253339
|-
| ''a(6)'' || -6.85401540936998 || ''b(6)'' || -0.02596338512915
|-
| ''a(7)'' || 4.39470996079559 || ''b(7)'' || 0.01624864962975
|-
| ''a(8)'' || -2.19611684890774 || ''b(8)'' || -0.00240879051584
|-
| ''a(9)'' || 0.75104302451432 || ''b(9)'' || 0.00674613682247
|-
| ''a(10)'' || -0.13149317958808 || ''b(10)'' || -0.00187763777362
|-
|}

Input samples from the audio file to be analysed must be run in cascade manner through both of these filter components before being analysed further.
<br style="clear:both" />
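The cascade can be sketched in Python as two direct-form IIR sections using the 48 kHz coefficients from tables 1a and 2a. Two assumptions are made: the a(k) terms are added as feedback (the sign convention under which these coefficients give a stable filter), and the "yulewalk" section runs before the Butterworth section; for linear filters the order does not change the result in exact arithmetic. The function names are illustrative.

```python
# Sketch: ReplayGain loudness filter at Fs = 48 kHz, coefficients copied from tables 1a and 2a.
import math

YULE_B = [0.03857599435200, -0.02160367184185, -0.00123395316851, -0.00009291677959,
          -0.01655260341619, 0.02161526843274, -0.02074045215285, 0.00594298065125,
          0.00306428023191, 0.00012025322027, 0.00288463683916]        # b(0)..b(10)
YULE_A = [3.84664617118067, -7.81501653005538, 11.34170355132042, -13.05504219327545,
          12.28759895145294, -9.48293806319790, 5.87257861775999, -2.75465861874613,
          0.86984376593551, -0.13919314567432]                          # a(1)..a(10)
BUTTER_B = [0.98621192462708, -1.97242384925416, 0.98621192462708]    # b(0)..b(2)
BUTTER_A = [1.97223372919527, -0.97261396931306]                       # a(1)..a(2)


def iir(b, a, x):
    """Direct-form I: y[n] = sum b(k) x[n-k] + sum a(k) y[n-k] (a(k) assumed to be added)."""
    order = max(len(b) - 1, len(a))
    xs = [0.0] * order + list(x)  # zero initial state
    ys = [0.0] * order
    for n in range(order, len(xs)):
        acc = sum(bk * xs[n - k] for k, bk in enumerate(b))
        acc += sum(ak * ys[n - k] for k, ak in enumerate(a, start=1))
        ys.append(acc)
    return ys[order:]


def loudness_filter(x):
    """Run samples through the "yulewalk" section, then the Butterworth high-pass."""
    return iir(BUTTER_B, BUTTER_A, iir(YULE_B, YULE_A, x))


def sine(freq_hz, n, fs=48000):
    """Full-scale test tone of n samples."""
    return [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n)]
```

Feeding equal-amplitude 1 kHz and 100 Hz tones from `sine` through `loudness_filter` should leave the 100 Hz tone noticeably weaker, mirroring the ear's lower sensitivity at low frequencies, while a constant (DC) input decays to zero because of the high-pass section.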

===RMS level calculation===
Next, the energy during each moment of the signal is determined by calculating the Root Mean Square (RMS) of the filtered signal every 50ms.<ref>The block length of 50ms was chosen after studying the effect of values between 25ms and 1s. 25ms was too short to accurately reflect the perceived loudness of some sounds. Beyond 50ms there was little change (after statistical processing). For this reason, 50ms was chosen.</ref>

The signal is chopped into 50ms long blocks. Then, for each block:<ref>If these steps are read backward, it should be clear why the process is called Root Mean Square averaging.</ref>
# Every sample value is squared (multiplied by itself).
# The mean average is taken.
# The square root of the average is calculated.

For stereo signals, step 2 takes the mean average of all squared samples from both channels over the 50ms measurement interval.<ref>One could sum channels of a stereo signal to mono before calculating the RMS level, but then any out-of-phase components (having the opposite signal on each channel) would cancel out to zero (i.e. silence). That's not how humans perceive them, so it's not a good solution.</ref>

The result of this calculation is then converted to a decibel representation as follows:

:<math>L=20 \log_{10} \frac{2{L_{RMS}}}{L_{p-p}}</math>

Where:

:<math>L_{RMS}</math> is the RMS value calculated above
:<math>L_{p-p}</math> is the maximum peak-to-peak range of the samples in the audio file
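A minimal Python sketch of the per-block measurement follows. It assumes floating-point samples in the range -1.0 to 1.0 (so that L<sub>p-p</sub> = 2), non-overlapping 50ms blocks, and that an incomplete block at the end is discarded; these are illustrative choices rather than requirements stated above.

```python
# Sketch: per-block RMS loudness, assuming float samples in [-1.0, 1.0] (so L_p-p = 2).
import math


def block_levels_db(left, right, sample_rate, peak_to_peak=2.0):
    """Return L = 20*log10(2*L_RMS / L_p-p) for each complete 50 ms block of a stereo signal."""
    block = int(round(sample_rate * 0.050))  # samples per channel in one block
    levels = []
    for start in range(0, min(len(left), len(right)) - block + 1, block):
        # Steps 1 and 2: square every sample of both channels and take the mean.
        squares = [s * s for s in left[start:start + block]]
        squares += [s * s for s in right[start:start + block]]
        mean_square = sum(squares) / len(squares)
        # Step 3: square root, then convert to decibels (digital silence maps to -inf).
        rms = math.sqrt(mean_square)
        levels.append(20 * math.log10(2 * rms / peak_to_peak) if rms > 0 else -math.inf)
    return levels
```

With this formula, a full-scale 1 kHz sine on both channels gives about -3.01 dB per block, and a constant signal at half of full scale gives about -6.02 dB.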

===Statistical processing===
Where the average energy level of a signal varies with time, the louder moments contribute most to perception of overall loudness. For example, in human speech, over half the time is silence, but the perceived loudness of speech is primarily determined by the levels between silences.

A good method to determine the overall perceived loudness is to sort the RMS values into numerical order and then pick a value near the top of the list. For highly compressed pop music (e.g. Figure 5(c), where there are many values near the top), the exact choice makes little difference. For speech and classical music (Figures 5(a) and 5(b) respectively), it makes a huge difference. The value 95% of the way up the sorted list most accurately matches perceived loudness,<ref>Based on experiments performed by David Robinson, "I tried values from 70% to 95%. For highly compressed pop music, the choice makes little difference. For speech and classical music, the choice makes a huge difference. The value which most accurately matches human perception of perceived loudness is around 95%, so this value is used by Replay Level."</ref> so this value is used by ReplayGain.

<gallery caption="Figure 5: Loudness histograms">
File:RG_Statistical_speech.gif‎‎|(a) Speech
File:RG_Statistical_classic.gif‎‎|(b) Classical music
File:RG_Statistical_pop.gif‎‎|(c) Pop music
</gallery>
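The selection step can be sketched as follows. Which list position counts as "95%" is an assumption here (the ceiling of 95% of the block count, computed with integer arithmetic); other rounding conventions will pick a neighbouring value.

```python
def loudness_percentile(levels_db, percent=95):
    """Sort block levels and return the value `percent`% of the way up the list.

    `percent` must be an integer; index = ceil(percent * n / 100) - 1 (assumed convention).
    """
    if not levels_db:
        raise ValueError("no blocks to analyse")
    ordered = sorted(levels_db)
    index = max((percent * len(ordered) + 99) // 100 - 1, 0)
    return ordered[index]


speech = [-60.0] * 20 + [-20.0] * 20  # half silence, half talking (illustrative data)
print(loudness_percentile(speech))    # -20.0
```

The silent blocks are effectively ignored: the sketch reports -20 dB, whereas a plain mean of the dB values would give -40 dB.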

==Reference level==
The audio industry does not have a standard for playback system calibration, but in the movie industry a calibration standard has been defined by the Society of Motion Picture and Television Engineers (SMPTE).<ref>SMPTE RP 200:2002 – Relative and Absolute Sound Pressure Levels for Motion-Picture Multichannel Sound Systems – Applicable for Analog Photographic Film Audio, Digital Photographic Film Audio and D-Cinema</ref> The standard states that a single channel pink noise signal with an RMS level of -20 dB relative to a full-scale sinusoid<ref>"dB relative to a full-scale sinusoid" is preferred over "dBFS" as a unit of measure in this specification because there is some ambiguity whether the reference for dBFS is a full-scale square wave (peak reference) or a sine wave (RMS reference).</ref> should be reproduced at 83 dB SPL.<ref>Measured using a C-weighted, slow averaging SPL meter.</ref>

ReplayGain adapts the SMPTE calibration concept for music playback. Under ReplayGain, audio is played so that its loudness, as measured using the procedures described in [[#Loudness measurement|Loudness measurement]] above, matches the loudness of a pink noise signal with an RMS level of -14 dB relative to a full-scale sinusoid,<ref>The initial ReplayGain proposal used the same -20 dB reference used by SMPTE. The reference was raised to -14 dB early on in ReplayGain development. This reference is used in all current ReplayGain implementations.</ref> also measured using the procedures described above.

In ReplayGain implementations, the reference level is described in terms of the SMPTE SPL playback level. By the SMPTE definition, the 83 dB SPL reference corresponds to a reference signal at -20 dB relative to full scale, i.e. 20 dB of system headroom. The 14 dB headroom used by ReplayGain therefore corresponds to an 89 dB SPL playback level on a SMPTE-calibrated system, which is why ReplayGain is said to operate with an 89 dB reference level.
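The 89 dB figure is the SMPTE calibration shifted by the difference between the two reference signal levels:

:<math>83\ \mathrm{dB\ SPL} + \big((-14) - (-20)\big)\ \mathrm{dB} = 89\ \mathrm{dB\ SPL}</math>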

SMPTE cinema calibration calls for a single channel of pink noise reproduced through a single loudspeaker. In music applications, the ideal level of the music is actually the loudness when both speakers are in use. So, ReplayGain is calibrated to two channels of pink noise.<ref>In reality, a monophonic pink noise wave file is used, and ReplayGain automatically assumes the file is being played through both speakers, as would any monophonic file.</ref>

==Gain calculation==
ReplayGain achieves loudness-compensated playback by applying gain (or attenuation) according to the measured loudness of the audio file relative to the established reference level. The gain is calculated as follows:
:<math>RG=L_{n14}-L</math>
Where all quantities are expressed in decibels:
:<math>RG</math> is the replay gain adjustment,
:<math>L_{n14}</math> is the measured loudness of the -14 dB pink noise reference and
:<math>L</math> is the measured loudness of the audio file.

Replay gain is positive if the loudness of the audio file is lower than the pink noise reference. The gain is negative (representing an attenuation) if the loudness of the audio file is higher than that of the reference. The gain is stored as metadata with the audio file as described below and is used by players to adjust output volume of tracks as they are played as described in [[#Player requirements|Player requirements]] below.
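The gain formula, together with the usual conversion from decibels to a linear amplitude factor (10<sup>RG/20</sup>), can be written as below. The loudness values in the usage line are hypothetical, and the function names are illustrative.

```python
def replay_gain_db(reference_loudness_db, loudness_db):
    """RG = L_n14 - L: positive for quiet files, negative (attenuation) for loud ones."""
    return reference_loudness_db - loudness_db


def gain_to_scale(gain_db):
    """Linear factor a player could multiply samples by to apply a gain given in dB."""
    return 10 ** (gain_db / 20)


# Hypothetical measurements: the reference measures -18.0 dB, a loud track -12.0 dB.
rg = replay_gain_db(-18.0, -12.0)
print(rg, round(gain_to_scale(rg), 3))  # -6.0 0.501
```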

==Metadata==
For ReplayGain to do its work during playback, four values must be stored as metadata<ref>Metadata is "data about data." For example, the ID3 ''de facto'' standard provides a way to store artist, title, album title, track number, and other metadata in data blocks called "tags" immediately before or after the audio data in an MP3 file. Other metadata storage/tagging standards and conventions exist for other audio file formats.</ref> with or within the audio file:
# Peak track amplitude
# Peak album amplitude
# Track replay gain
# Album replay gain

If calculated for an individual track, the loudness measurement (as specified above) yields track replay gain. If calculated on an album basis, with all tracks concatenated to make one long audio file, the loudness measurement yields album replay gain.
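Album loudness can be approximated by pooling the 50ms block levels of all tracks before picking the 95% value, as sketched below. Pooling per-track blocks is an assumption that ignores the few blocks that would straddle track boundaries in a true concatenation, and the percentile rounding is likewise an assumed convention.

```python
def percentile_95(levels_db):
    """Value 95% of the way up the sorted list (index = ceil(0.95 n) - 1, assumed)."""
    ordered = sorted(levels_db)
    return ordered[max((95 * len(ordered) + 99) // 100 - 1, 0)]


def track_and_album_loudness(tracks):
    """`tracks` is a list of per-track lists of 50 ms block levels in dB."""
    per_track = [percentile_95(levels) for levels in tracks]
    pooled = [level for levels in tracks for level in levels]  # approximates concatenation
    return per_track, percentile_95(pooled)


ballad, rock = [-30.0] * 10, [-10.0] * 10  # illustrative block levels
print(track_and_album_loudness([ballad, rock]))  # ([-30.0, -10.0], -10.0)
```

Applying the single album value to both tracks keeps the ballad 20 dB below the rock track, whereas the two track values would bring both to the same loudness.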

===Replay gain===
Under some listening conditions, it's useful to have every track sound equally loud. The problem with a track-by-track approach is that tracks which should be quiet in the context of the album on which they reside will be brought up to the level of all the rest. For casual listening, or in a noisy background, this can be a good thing. For serious listening, it does not respect the intent of the artist or mastering engineer; a tender ballad track will be blasting at the same loudness as a hard rock track on the same album. It's generally ideal to leave the intentional loudness differences between tracks in place, yet still correct for unmusical and annoying loudness differences between albums. To accomplish this, ReplayGain suggests that two different gain adjustments should be stored as metadata with each sound file.

''Album replay gain'' represents the ideal listening gain for an entire album. ReplayGain reads the collection of tracks that make up an album and calculates a single replay gain for the whole set. This single gain can be used for playback of all tracks of the album. Intentionally quiet tracks then stay appropriately quieter than the rest. It still solves the basic problem (annoying, unwanted level differences between discs) because quiet or loud discs are still adjusted overall, so the pop CD that's 20 dB louder than the classical CD will be brought into line.

===Peak amplitude===
Scanning a track or album for the peak amplitude can be a time-consuming process. Therefore, it's helpful if this single value is stored as metadata. This is used to predict whether the required replay gain adjustment will cause clipping during playback.

The maximum peak amplitude value is stored as a floating point number, where 1.0 represents digital full scale. As with replay gain values, separate peak amplitude values are stored per track and per album.

For uncompressed files, scanners simply store the maximum absolute sample value held in the file on any channel, whether from a positive or a negative excursion. This single sample value should be converted to a floating-point representation such that digital full scale is equivalent to a value of 1.0.

Psychoacoustically coded audio, such as MP3, does not exist as a sequence of samples until it is decoded. Psychoacoustic coding of a heavily limited file can lead to sample values larger than digital full scale upon decoding. The coded files must therefore be decoded using a fully compliant decoder that allows peak overflows (i.e. has headroom), and the resulting peak amplitude values may be greater than 1.0.

==Metadata format==
From the standpoint of metadata storage, each audio file format presents a unique situation. There are three favored schemes defined for storage of ReplayGain metadata: '''ID3v2''', '''Vorbis comments''' and '''APEv2'''. A survey of file formats is listed below with metadata schemes in order of preference for each:
* .aac (Advanced Audio Coding raw format) – No metadata support (use .mp4 instead)
* .aiff, .aif, .aifc (Audio Interchange File Format) – '''ID3v2''' (in "ID3" IFF chunk)
* .ape, .apl (Monkey's Audio) – '''APEv2'''
* .bwf (Broadcast Wave Format) – '''ID3v2''' (in RIFF chunk)
* .flac (Free Lossless Audio Codec) – '''Vorbis comments'''
* .mp3 (MPEG audio layer 3) – '''ID3v2''', LAME VBR proposed tag specification
* .mp4 also .m4a, .m4b, .m4p, .m4r (MPEG-4 Part 14) – '''ID3v2''' (in "ID32" box)
* .mpc (Musepack) – '''APEv2'''
* .ogg (Ogg Vorbis) – '''Vorbis comments'''
* .tta (True Audio) – '''ID3v2''', '''APEv2'''
* .wma (Windows Media Audio) – Advanced Systems Format (not supported by ReplayGain)
* .wav (Windows PCM) – No metadata support (use .bwf instead)
* .wv (WavPack) – '''APEv2'''

===ID3v2===
The ID3v2 standard<ref>The ID3v2 format is explained at [http://www.id3.org/ www.id3.org]. The most useful document is the [http://www.id3.org/id3v2.3.0.html ID3v2 v2.3.0 standard]. Although this document has been superseded by v2.4.0, the earlier document is complete (rather than an update), and in indexed HTML form. As such, it represents a better technical introduction to ID3v2.</ref> defines a ''tag'' which is situated before the data in an MP3 file.<ref>The original ID3 (v1) tags resided at the end of the file, and contained a few fields of information. The ID3v1 tag is not extensible and therefore cannot support ReplayGain metadata.</ref> ID3 is used primarily with MP3 audio files but means of adapting the system to other file types have been developed.

The ID3v2 tag is divided into ''frames''. The preferred means of storing ReplayGain metadata is use of ''TXXX'' key/value pair frames. Two other legacy schemes for storing ReplayGain metadata exist: [[ReplayGain_legacy_metadata_formats#ID3v2_RGAD|RGAD]] and [[ReplayGain_legacy_metadata_formats#ID3v2_RVA2|RVA2]]. These formats are documented in the [[ReplayGain legacy metadata formats|appendix]]. Players may choose to look for these formats if metadata in the ''TXXX'' format is not found in the ID3v2 tag. New scanners may write these older formats in addition to the newer (TXXX) ones if they wish to remain backwards compatible with older players.

ReplayGain uses four TXXX frames. The header of a TXXX frame is coded as follows:

<pre>
Frame ID   $54 58 58 58   ("TXXX")
Size       $xx xx xx xx   (size of frame excluding this header)
Flags      $40 $00        (discard frame if audio data is altered)
</pre>

Frame data is coded as follows:

<pre>
Text encoding   $00                 (ISO-8859-1 encoding)
Description     <key string> $00
Value           <value string>
</pre>
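The layout above can be sketched in Python as follows. This is a minimal illustration that assumes ID3v2.3, where the frame size is a plain 32-bit big-endian integer (ID3v2.4 uses syncsafe integers for frame sizes instead); it builds a single frame only, not the enclosing ID3v2 tag header. The function name is hypothetical.

```python
import struct

def txxx_frame(key: str, value: str) -> bytes:
    """Build one ID3v2.3-style TXXX frame for a ReplayGain key/value pair."""
    # Frame data: encoding byte $00 (ISO-8859-1), description (the key),
    # a $00 terminator, then the value string.
    data = b"\x00" + key.encode("latin-1") + b"\x00" + value.encode("latin-1")
    # Frame header: ID "TXXX", 4-byte big-endian size excluding the header,
    # then flag bytes $40 $00 (discard frame if audio data is altered).
    header = struct.pack(">4sI2s", b"TXXX", len(data), b"\x40\x00")
    return header + data

frame = txxx_frame("REPLAYGAIN_TRACK_GAIN", "-6.50 dB")
print(frame[:10].hex(" "))
```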

The four frames associated with ReplayGain metadata use the following key/value pairs:

{| class="wikitable"
|+Table 3: Metadata keys and value formatting
|-
!Metadata
!Key
!Value format
|-
|Track replay gain
|REPLAYGAIN_TRACK_GAIN
|[-]a.bb dB
|-
|Peak track amplitude
|REPLAYGAIN_TRACK_PEAK
|c.dddddd
|-
|Album replay gain
|REPLAYGAIN_ALBUM_GAIN
|[-]a.bb dB
|-
|Peak album amplitude
|REPLAYGAIN_ALBUM_PEAK
|c.dddddd
|}

Gains are specified textually in decibels. Negative gains (attenuation) are prefixed with a '-'. Positive gains have no prefix. The integral portion of the gain (a) may be one or two numeric (0-9) digits; if there is no integral portion, the field is '0'. The decimal portion of the gain (bb) is two numeric digits. Gains are suffixed with a space followed by 'dB'.

Peak levels are specified textually as a positive decimal. Peak level is a dimensionless quantity with 1.000000 representing full scale. No suffix is included on peak values. The integer field (c) is typically 1 or 0. Six numeric digits in the decimal field (dddddd) are adequate to accurately represent peak values for 16-bit audio data.
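A scanner might produce these strings as in the following sketch (hypothetical function names). It rounds first so that a tiny attenuation is written as "0.00 dB" rather than "-0.00 dB"; it does not enforce the two-digit limit on the integral part of the gain.

```python
def format_gain(gain_db: float) -> str:
    """Format a gain as '[-]a.bb dB': two decimals, '-' only for attenuation."""
    rounded = round(gain_db, 2)
    if rounded == 0:
        rounded = 0.0  # avoid writing "-0.00 dB"
    return f"{rounded:.2f} dB"

def format_peak(peak: float) -> str:
    """Format a peak as 'c.dddddd': six decimals and no suffix."""
    return f"{peak:.6f}"

print(format_gain(-6.5), format_gain(3.0), format_peak(0.9876543))
```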

A robust player should be prepared to parse the following variations in either replay gain or peak level metadata:
*Positive gains with leading '+'
*More or fewer significant digits than specified in any field
*Leading zeros or spaces in integer fields
*Missing or malformed 'dB' suffix (e.g. no space between numeric digits and suffix, alternate capitalization)
*Alternate capitalization of keys

Other formatting errors indicate more severe problems and should result in the player ignoring the data as if the frame did not exist.
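One possible tolerant parser is sketched below. The regular expressions and the function name are assumptions, not part of the specification; it accepts the variations listed above (leading '+', any number of digits, leading zeros or spaces, a missing or differently capitalized 'dB' suffix, and alternate key capitalization) and returns None for anything else, so the caller can treat the item as absent.

```python
import re

_NUMBER = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)"
# Gain: number, optional spaces, optional 'dB' in any capitalization.
_GAIN_RE = re.compile(rf"\s*({_NUMBER})\s*(?:db)?\s*", re.IGNORECASE)
# Peak: a bare number, optionally surrounded by spaces.
_PEAK_RE = re.compile(rf"\s*({_NUMBER})\s*")

KEYS = {"REPLAYGAIN_TRACK_GAIN", "REPLAYGAIN_TRACK_PEAK",
        "REPLAYGAIN_ALBUM_GAIN", "REPLAYGAIN_ALBUM_PEAK"}

def parse_replaygain(key, value):
    """Return (normalized_key, float), or None if the item should be ignored."""
    key = key.strip().upper()  # tolerate alternate capitalization of keys
    if key not in KEYS:
        return None
    pattern = _GAIN_RE if key.endswith("_GAIN") else _PEAK_RE
    match = pattern.fullmatch(value)
    if match is None:
        return None  # severe formatting error: behave as if the item were absent
    number = float(match.group(1))
    if key.endswith("_PEAK") and number < 0:
        return None  # peaks are dimensionless, non-negative values
    return key, number
```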

===Vorbis comments===
A Vorbis comment<ref>[http://www.xiph.org/vorbis/doc/v-comment.html Vorbis comment metadata format]. ReplayGain metadata is documented on the [http://wiki.xiph.org/VorbisComment#Replay_Gain Xiph Wiki].</ref> uses an ASCII <tt>key=value</tt> format. When Vorbis comments are used, the four ReplayGain metadata items are stored as separate comments. The ''keys'' and the formatting of the ''values'' are the same as specified for ID3v2. The Vorbis comment specification requires keys and values to be separated by '=' (the equals character).
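For example, a player could pick the ReplayGain items out of a file's comments as in this sketch. It assumes the comments have already been read from the file and decoded into strings; the function name is hypothetical. Each comment is split at the first '=' and the key is upper-cased, since Vorbis comment field names are case-insensitive.

```python
def replaygain_from_vorbis_comments(comments):
    """Collect ReplayGain items from a list of 'KEY=value' comment strings."""
    wanted = {"REPLAYGAIN_TRACK_GAIN", "REPLAYGAIN_TRACK_PEAK",
              "REPLAYGAIN_ALBUM_GAIN", "REPLAYGAIN_ALBUM_PEAK"}
    found = {}
    for comment in comments:
        key, sep, value = comment.partition("=")  # split at the first '='
        if not sep:
            continue  # no '=': not a valid key=value comment
        key = key.upper()  # field names are case-insensitive
        if key in wanted:
            found[key] = value
    return found

tags = ["ARTIST=Someone", "replaygain_track_gain=-6.50 dB", "REPLAYGAIN_TRACK_PEAK=0.987654"]
print(replaygain_from_vorbis_comments(tags))
```

The returned values are still text in the ID3v2 format and would then be parsed as described for ID3v2.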

===APEv2===
The APEv2 metadata format<ref>[http://wiki.hydrogenaudio.org/index.php?title=APEv2_specification APEv2 Specification at Hydrogen Audio Wiki]</ref> also organizes data into key/value pairs. Keys are in ASCII format. A flags field allows support for several value formats, including UTF-8 and binary. Under APEv2, ReplayGain metadata is stored using the same keys, with ASCII text values in the same format as specified for ID3v2.
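A single ReplayGain item could be encoded as in the sketch below. It follows the item layout given in the APEv2 specification as we understand it (32-bit little-endian value size, 32-bit little-endian item flags with content type 0 meaning UTF-8 text, a $00-terminated key, then the value); the tag header, footer and item count are omitted, and the function name is hypothetical.

```python
import struct

def apev2_text_item(key: str, value: str) -> bytes:
    """Encode one APEv2 item whose value is UTF-8 text (item flags = 0)."""
    value_bytes = value.encode("utf-8")
    # Value size and item flags, both 32-bit little-endian,
    # followed by the ASCII key, a $00 terminator, and the unterminated value.
    return (struct.pack("<II", len(value_bytes), 0)
            + key.encode("ascii") + b"\x00" + value_bytes)

item = apev2_text_item("REPLAYGAIN_ALBUM_GAIN", "-4.20 dB")
print(item.hex(" "))
```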

==Player requirements==
[[File:RG_Player_control.gif|frame|Figure 8: Example ReplayGain control panel]]

Loudness normalization, pre-amplification and clipping prevention are the operations performed by a ReplayGain player.

===Loudness normalization===
To properly normalize loudness, the player needs to determine if the user desires Track style level normalization (all tracks same loudness), or Album style level normalization (all albums same loudness, tracks of an album played at the same relative level as on the original release). This option should be selectable in the ReplayGain control panel (Figure 8). The player reads the corresponding gain metadata value from the file and scales the audio data as appropriate. Scaling the audio data simply means multiplying each sample value by a constant value. This constant is given by:

:<math>10^\frac{gain}{20}</math>

Or, in words, ten raised to the power of the replay gain divided by 20.<ref>After any such operation, it's a good idea to dither the result. If this calculation and the pre-amp are implemented separately, then dither should only be added to the final result, just before the result is truncated back to 16 bits, or 24, or 8, as limited by the soundcard, not the file (i.e. after ReplayGain adjustment, an 8-bit file should be sent to a 16-bit soundcard at 16 bits).</ref>
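In code, the scaling amounts to one multiplication per sample. This sketch assumes floating-point samples and leaves out the dithering mentioned in the note; the function names are hypothetical.

```python
def gain_to_scale(gain_db: float) -> float:
    """Linear scale factor for a gain in dB: 10 ** (gain / 20)."""
    return 10 ** (gain_db / 20)

def apply_gain(samples, gain_db):
    """Multiply every floating-point sample by the same constant."""
    scale = gain_to_scale(gain_db)
    return [s * scale for s in samples]

print(round(gain_to_scale(-6.0), 4))
print(apply_gain([0.5, -0.25], -20.0))
```

A gain of -6 dB therefore roughly halves the amplitude, and -20 dB divides it by ten.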

If the file only contains one of the replay gain adjustments (e.g. Album) but the user has requested the other (Track), then the player should use the one that is available (in this case, Album). If neither (Track or Album) gain metadata is available, then the player needs to choose a suitable default gain. Potential choices include unity gain (0 dB) or an average of gains from other tracks in the album or playlist.
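The fallback rule can be written as a small selection function. In this sketch the names are hypothetical, a missing gain is represented by None, and unity gain (0 dB) is used as the default when neither value is present.

```python
def select_gain(mode, track_gain=None, album_gain=None, default_gain=0.0):
    """Pick the gain to apply; mode is 'track' or 'album'.

    Falls back to the other gain if the requested one is missing,
    then to default_gain (0 dB = unity gain) if both are missing.
    """
    if mode == "album":
        preferred, fallback = album_gain, track_gain
    else:
        preferred, fallback = track_gain, album_gain
    if preferred is not None:
        return preferred
    if fallback is not None:
        return fallback
    return default_gain
```

A player that prefers an album or playlist average as the default could compute it first and pass it as `default_gain`.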

===Pre-amplification===
Although the calibration level used by ReplayGain suggests that the average level of an audio track should be 14 dB below full scale, some pop music is dynamically compressed to peak at 0 dB and average around 3 dB below full scale. This means that, when the replay gain is applied, the level of such tracks will be reduced by 11 dB! If users are listening to a mixture of highly compressed and more dynamic tracks, ReplayGain will make the listening experience more pleasurable by bringing the level of the compressed tracks down into line with that of the others. However, if users are only listening to highly compressed music, then they may complain that all their files are now too quiet.<ref>This problem can be especially noticeable on portable players with limited output or gain.</ref>

To address this problem, a pre-amp feature should be incorporated into the player. A user-supplied pre-amp setting is an adjustment to the calculated replay gain. It should default to perform no adjustment. This means that casual users will experience a moderate reduction in the loudness of their compressed pop music. Less-compressed material can generally be played at the same loudness without clipping. Normalization of more dynamic material may cause clipping or invoke the [[#Clipping prevention|clipping prevention]] mechanism (see below). Power users and audiophiles can reduce the pre-amp gain to enjoy the full dynamic range of all of their music.

If enabled, the player should read the user-selected pre-amp gain and scale the audio signal by the appropriate amount. For example, a +6 dB gain requires a scale of 10<sup>6/20</sup>, which is approximately 2. The replay gain and pre-amp scale factors can be combined<ref>Scale factors in decibel units are added to produce the same effect as multiplying scale factors in linear units.</ref> for simplicity and ease of processing.

===Clipping prevention===
ReplayGain's suggestion of a -14 dB average playback level leaves sufficient headroom for the bulk of modern recordings. Nevertheless, there exists the possibility that after application of replay gain and pre-amp adjustment, a track may exceed full scale during its dynamic peaks. Without intervention, this will result in clipping, a severe form of distortion. Factors introducing the possibility of clipping include:

# Recordings from certain genres and certain periods in the history of commercial recordings require additional headroom. Although these recordings can be accommodated through a downwards adjustment of the pre-amp setting, it may be difficult to determine a safe adjustment, and it may be undesirable to lower the average level to accommodate the rare track which requires it.
# ReplayGain will make loud dynamically compressed tracks quieter, and quiet dynamically uncompressed tracks louder. The average levels will then be similar, but the quiet tracks will actually have louder peaks. If the user pushes the pre-amp gain upwards, the peaks of the (originally) quieter tracks will be pushed well over full scale.
# In coded audio (e.g. MP3 files), a file that was hard-limited to digital full scale before encoding will often be pushed over the limit by the psychoacoustic compression. A decoder with headroom can recover the over-full-scale signal by reducing the gain.

ReplayGain suggests two possible solutions which prevent clipping in these situations. A player should support one or both of these.

====Audio limiting====
In situation 2 above, the user clearly wants all the music to sound very loud. To grant this wish, any signal which would peak above digital full scale should be hard limited at just below digital full scale. This is also useful at lower pre-amp gains, where it allows the average level of classical music to be raised to that of pop music without distorting. The exact type and nature of limiting or compression is an implementation choice for the player.<ref>Something like the Hard Limiter found in Cool Edit Pro (Syntrillium) would be appropriate for pop music at least.</ref>
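Because the choice of limiter is left to the player, the following is only one simple possibility, not a recommended design: an instant-attack, exponential-release peak limiter working on floating-point samples. The function name and the `ceiling` and `release` values are hypothetical. Without look-ahead, the gain drops abruptly at the first over-threshold sample, so a practical limiter would likely add look-ahead and smoother gain changes.

```python
def hard_limit(samples, ceiling=0.99, release=0.001):
    """Instant-attack, exponential-release peak limiter for float samples.

    ceiling: bound on output magnitude, just below full scale (1.0).
    release: fraction of the remaining gain reduction recovered per sample.
    """
    gain = 1.0
    out = []
    for x in samples:
        # Let earlier gain reduction recover gradually toward unity.
        gain += (1.0 - gain) * release
        magnitude = abs(x)
        if magnitude * gain > ceiling:
            # Attack instantly: reduce gain just enough to meet the ceiling.
            gain = ceiling / magnitude
        out.append(x * gain)
    return out

y = hard_limit([0.5, 1.8, -2.5, 0.2, 0.9])
print([round(v, 3) for v in y])
```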

====Reduced gain====
The audiophile user will not want any compression or limiting on the signal. In this case the only option is to automatically and temporarily reduce the pre-amp gain below the user-selected setting for tracks where clipping would otherwise occur. Clipping can be predicted by examining the peak level of the track or album being played.

The player must read the peak amplitude metadata. If peak level metadata is unavailable, the player should assume a peak level of 1.0. If the peak level for both track and album is stored as metadata in the file, it is possible to calculate if, following the replay gain adjustment and pre-amp gain, the signal will clip at some point. If it won't, then no further action is necessary.

An overall scale factor for loudness normalization taking into account replay gain, pre-amp setting and clipping prevention through gain reduction is given below.

:<math>\min\left(10^{\frac{RG + G_\text{pre-amp}}{20}}, \frac{1}{\text{peak amplitude}}\right)</math>
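The formula translates directly into code. In this sketch (hypothetical function name), `peak` defaults to 1.0, as recommended above when peak metadata is unavailable. With that default, the combined gain can never exceed unity.

```python
def replaygain_scale(replay_gain_db, preamp_db=0.0, peak=1.0):
    """Overall scale factor: min(10 ** ((RG + G_preamp) / 20), 1 / peak)."""
    scale = 10 ** ((replay_gain_db + preamp_db) / 20)
    if peak > 0:
        # Reduce the gain if the stored peak would otherwise exceed full scale.
        scale = min(scale, 1.0 / peak)
    return scale

print(replaygain_scale(-6.0, 0.0, 0.5))  # attenuation: peak limit not reached
print(replaygain_scale(6.0, 6.0, 0.9))   # boosted: capped at 1 / 0.9
```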

===Hardware implementation===
The above three steps are appropriate for software players, which operate on the digital signal in order to scale it. However, it is also possible to send the digital signal to the DAC without level correction and to place an attenuator in the analog signal path. The attenuator can then be driven by the replay gain value. The clipping problem can be addressed by providing adequate headroom in the analog circuitry. Bit transparency and maximum signal-to-noise ratio are maintained in the digital signal and DAC process.<ref>A system using today's 24-bit converters is unlikely to appreciate any overall gain in system performance with such an arrangement. A digitally-controlled analog gain element typically introduces significant noise and distortion.</ref>

==Acknowledgements==
The [http://replaygain.hydrogenaudio.org/proposal original ReplayGain proposal] (an [http://replay.waybackmachine.org/20090306202649/http://www.replaygain.org/ archive] is also available) was developed by David Robinson and was published 10 July 2001. Additional updates were published by David Robinson through 10 October 2001.

The following acknowledgement was included with the original proposal, "The algorithm to calculate an ideal replay gain has grown from my research into human hearing, with many additional ideas drawn from the work of E. Zwicker, and Brian Moore. I am currently completing my PhD at the University of Essex, and have been funded by the EPSRC." Additionally David Robinson credited Glen Sawyer (Snelg) and Jim Casaburi (Walrus) for software contributions and Bob Katz and Matt Ashland for ideas.

This updated ReplayGain specification reflecting current and recommended practice was prepared by Kevin Gross in 2011.

==Contact==
For ReplayGain-related questions or contributions, please post in the [http://www.hydrogenaudio.org/forums/index.php?showforum=1 General Audio] section of the Hydrogen Audio forums.

==Appendix==
# [[ReplayGain legacy metadata formats]]

==Notes==
<references />

LossyWAV

2008-12-18T20:02:37Z

Dynamic: Modified "website" link in Software Infoxbox to version 1.1.0 release thread instead of 1.0.0 (which is out of date and doesn't support the --stdout features needed for EAC anf FB2K integration)

{{Software Infobox
| name = lossyWAV
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.1.0b
| preview_release = 1.1.1e
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=64666&view=findpost&p=577042 Hydrogenaudio]
}}
lossyWAV is a new free lossy pre-processor for [[PCM]] audio contained in the [[WAV]] file format. Proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson], it reduces [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, reduces the bitrate of the encoded file significantly compared to unpreprocessed compression.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyWAV is based on the lossyFLAC idea proposed by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 David Robinson] at Hydrogenaudio: a method of carefully reducing the bit depth of samples, thereby utilising the wasted bits feature of the FLAC lossless codec. The aim is to transparently reduce audio bit depth (by setting some least significant bits (LSBs) to zero), consequently taking advantage of FLAC's detection of consistently-zeroed least significant bits within each single frame and significantly increasing coding efficiency.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded using the same codec (which may be all-important from a hardware compatibility perspective) at a reduced bitrate compared to the lossless version.

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick Currie] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (Many thanks [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer!!) with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] Assembly Language for speed.

Subsequently, lossyFLAC proved itself to work with other lossless codecs, so the application name was changed to lossyWAV.

Since then, Nick has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 Horst Albrecht] at Hydrogenaudio. Although the current lossyWAV implementation has built on David's original method, the method itself still very much belongs to its author.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a pure variable bit-depth pre-processor, in that the overall sample size remains the same after processing but the number of significant bits used for the samples in a codec-block can change on a block-by-block basis. Bits-to-remove from the audio data are calculated on a block-by-block basis (codec-block length = 512 samples, 11.6 ms @ 44.1 kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier transform]] (FFT) analyses of at least two lengths (default quality preset (-q 5) = 32, 64 & 1024 [[Wikipedia:Sampling %28signal processing%29|samples]]). After some manipulation, the results of each FFT analysis for a specific codec-block are grouped and the minimum value is used to determine bits-to-remove for the whole codec-block. Bit removal adds [[Wikipedia:white noise|white noise]] to the output; however, the level of the added noise associated with the removal of a given number of bits has been pre-calculated, and the number of bits to remove depends on the level of the noise floor of the codec-block in question. Each sample in the codec-block is then rounded such that its lowest <bits-to-remove> bits are zero. In this way the wasted bits feature of [[FLAC]] and other lossless codecs is exploited.
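The final rounding step, and the "wasted bits" a lossless codec can then detect, are sketched below. This is a simplified illustration, not lossyWAV's implementation. It takes bits-to-remove as an input instead of deriving it from the FFT analysis, and it uses a plain clamp to keep rounded values within the sample range, which is an assumption about how overflow might be handled. The function names are hypothetical.

```python
def zero_lsbs(block, bits_to_remove, bits_per_sample=16):
    """Round each integer sample to the nearest multiple of 2**bits_to_remove."""
    if bits_to_remove <= 0:
        return list(block)
    step = 1 << bits_to_remove
    lowest = -(1 << (bits_per_sample - 1))         # e.g. -32768
    highest = (1 << (bits_per_sample - 1)) - step  # largest multiple of step that fits
    out = []
    for s in block:
        # Round half up, then clear the low bits (>> floors negative values too).
        r = ((s + step // 2) >> bits_to_remove) << bits_to_remove
        out.append(max(lowest, min(highest, r)))  # assumption: simple clamp
    return out

def wasted_bits(block):
    """Number of low-order zero bits shared by every sample in the block."""
    combined = 0
    for s in block:
        combined |= s  # a bit is set if it is set in any sample
    if combined == 0:
        return 0  # all-zero block; a codec would code it as constant instead
    return (combined & -combined).bit_length() - 1

block = zero_lsbs([1000, -1234, 517, 32767], 3)
print(block, wasted_bits(block))
```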

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set (16 bit / 44.1kHz)
!Codec
!lossless
!--insane
!--extreme
!--standard
!--portable
!-q 0
|-
!10 Album Test Set
| TAK
| 820 kbit/s
| 615 kbit/s
| 532 kbit/s
| 447 kbit/s
| 359 kbit/s
| 266 kbit/s
|-
!10 Album Test Set
| FLAC
| 854 kbit/s
| 632 kbit/s
| 548 kbit/s
| 463 kbit/s
| 376 kbit/s
| 285 kbit/s
|-
!10 Album Test Set
| Wavpack
| 852 kbit/s
| 641 kbit/s
| 563 kbit/s
| 481 kbit/s
| 390 kbit/s
| 296 kbit/s
|}

==File identification==
lossyWAV-processed WAV files are named with a double filename extension, .lossy.wav, to make them instantly identifiable. For example, ".lossy.flac" indicates an audio file which was processed using lossyWAV and subsequently encoded using FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter is used when processing to create a correction file which is named with the .lwcdf.wav double filename extension. When "added" to the corresponding .lossy.wav, using the --merge parameter, the original file will be reconstituted.

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Known supported codecs|known supported codecs]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. This takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, date and time, and user settings are recorded. Additionally, if a lossyWAV 'fact' chunk is found in an input file, processing is halted (exit code = 16) to prevent re-processing of an already processed file.
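A third-party tool could look for this marker by walking the RIFF chunks, as in the sketch below. It assumes, based only on the form shown above, that the chunk payload begins with the text "lossyWAV"; ordinary 'fact' chunks in compressed WAV files hold a sample count instead, which is why the prefix is checked. The function name is hypothetical, and lossyWAV's own --check option remains the authoritative test.

```python
import struct

def lossywav_fact(wav_bytes):
    """Return the text of a lossyWAV 'fact' chunk, or None if there is none."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    pos = 12  # first chunk follows the 12-byte RIFF/WAVE header
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        (size,) = struct.unpack("<I", wav_bytes[pos + 4:pos + 8])
        body = wav_bytes[pos + 8:pos + 8 + size]
        if chunk_id == b"fact" and body.startswith(b"lossyWAV"):
            return body.decode("ascii", errors="replace").rstrip("\x00")
        pos += 8 + size + (size & 1)  # RIFF chunks are padded to an even length
    return None
```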

The --check parameter can be used to determine whether a file has previously been processed without trying to process it: the exit code is 16 if the file has already been processed and 0 if not.

==Quality presets==
*--insane: (-q 10) Highest quality preset, generally considered to be excessive;
*--extreme: (-q 7.5) High quality preset, disc space-saving alternative to lossless archiving for large audio collections, considered to be suitable for transcoding to other lossy codecs;
*--standard: (-q 5) Default preset, generally accepted to be transparent;
*--portable: (-q 2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]]. [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]

All tuning has been performed on quality preset --standard, with higher presets being more conservative. Quality preset --standard is generally accepted to be (and from testing so far is) transparent. If you find a track for which --standard fails to achieve transparency after processing, please post a sample (no more than 30 seconds) in the development thread.

==Supported input formats==
*[[WAV]]: 9-bit to 32-bit integer; 1 to 8 channels; sample rate ≥ 32kHz [[Pulse Code Modulation|PCM]]. Very high sample rates (>48kHz) have not been extensively tested. Tunings have been focussed on 16-bit, 44.1kHz samples (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|}

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] — so-called "Bit Shifting" — to suggest that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested due to prohibitive prices of encoders.

A comparison of portable media players is [[Wikipedia:Comparison of portable media players#Audio Formats|here]], which shows FLAC and WMA Lossless compatibility among listed players.
Any player supported by [http://www.rockbox.org Rockbox] can use FLAC or WavPack files after installing Rockbox.
===Important note===
'''NB: when encoding using a lossless codec, please ensure that the block size of the lossless codec matches that of lossyWAV (default = 512 samples). If this is not done then the lossless encoding of the processed WAV file will (almost certainly) be larger than it would otherwise have been. This is achieved by adding the "Encoder Parameters" in the table above to the command line of the lossless codec in question.'''
===Bonus feature===
Another, possibly not obvious, feature of lossyWAV is that the processed output can be "transcoded" from one lossless codec to another lossless codec with absolutely no loss of quality whatsoever. This is solely due to the fact that lossyWAV output is designed to be losslessly encoded - something that lossless codecs do very well indeed.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.1.0, Copyright (C) 2007,2008 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-I, --insane        highest quality output, suitable for transcoding;
-E, --extreme       high quality output, also suitable for transcoding;
-S, --standard      default quality output, considered to be transparent;
-P, --portable      good quality output for DAP use. Not considered to be fully
                    transparent, but considered fit for its intended purpose.

Standard Options:

-c, --check         check if WAV file has already been processed; default=off.
                    errorlevel=16 if already processed, 0 if not.
-C, --correction    write correction file for processed WAV file; default=off.
-f, --force         forcibly over-write output file if it exists; default=off.
-h, --help          display help.
-L, --longhelp      display extended help.
-M, --merge         merge existing lossy.wav and lwcdf.wav files.
-o, --outdir <t>    destination directory for the output file(s).
-v, --version       display the lossyWAV version number.

Special thanks:

David Robinson for the publication of his lossyFLAC method, guidance, and
the motivation to implement the method as lossyWAV.
Horst Albrecht for ABX testing, valuable support in tuning the internal
presets, constructive criticism and all the feedback.
Sebastian Gesemann for the noise shaping coefficients and help in using them
in the lossyWAV noise shaping implementation.
Don Cross for the Complex-FFT algorithm used.</pre>
===Example drag'n'drop batch file===
Simply drag FLAC files onto this batch file: it will decode each one, process it with lossyWAV, re-encode it as FLAC and copy ALL of the tags from the input FLAC file, placing the output lossyFLAC file in the same directory as the input FLAC file. Requires flac.exe and [http://www.synthetic-soul.co.uk/tag/ tag.exe] to be somewhere on the path.
<pre>@echo off
:repeat
if %1.==. goto end
if exist %1 flac -d %1 --stdout --silent|lossywav - --stdout --standard --stdinname %1|flac - -b 512 -o "%~dpn1.lossy.flac" --silent && tag --fromfile %1 "%~dpn1.lossy.flac"
shift
goto repeat
:end</pre>

===Example Foobar2000 converter settings===
lossyFLAC settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension : lossy.flac
Parameters: /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|
c:\"program files"\bin\flac - -b 512 -5 -f -o%d
Format is : lossless or hybrid
Highest BPS mode supported: 24 </pre>

lossyTAK settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension : lossy.tak
Parameters : /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|
c:\"program files"\bin\takc -e -p2m -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24</pre>

lossyWV settings:<pre>Encoder: c:\windows\system32\cmd.exe
Extension : lossy.wv
Parameters: /d /c c:\"program files"\bin\lossywav - --standard --silent --stdout|
c:\"program files"\bin\wavpack -hm --blocksize=512 --merge-blocks -i - %d
Format is : lossless or hybrid
Highest BPS mode supported: 24</pre>

There is a known problem within foobar2000 (although more likely to do with cmd.exe itself) when running an executable within the cmd.exe command line from a path which includes spaces. The suggested fix for this is to enclose the element of the path which contains spaces within double quotation marks ("), e.g. c:\"program files"\directory_where_executable_is\executable_name

===Example EAC settings===
Please see [[EAC and LossyWAV]].

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor and not a codec. No decoding is required for any program to play a WAV file which has been processed with lossyWAV as it remains compliant with the RIFF WAVE format.

*'''Question:''' Why create a processor which means that I cannot be sure that a lossless file is truly lossless?
*'''Answer:''' Unless one creates the lossless file personally, one can '''never''' be completely sure that the file is indeed lossless. For example, if a WAV file is encoded to MP3 and then transcoded to a lossless codec, how can this pre-processing be easily determined?

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' At preset --standard, almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact|artifacts]]
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522 Original lossyFLAC thread] - Introduction of the concept by David Robinson (Replay Gain developer) and initial development;

*[http://www.hydrogenaudio.org/forums/index.php?showtopic=65499 lossyWAV 1.2.0 Development Thread] - Latest release candidate and beta version in the first post;

*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 lossyWAV 1.1.0 development thread]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=64617 lossyWAV 1.1.0 release thread] - Release of version 1.1.0 on 12 July 2008;

*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 lossyWAV Development thread] - Conversion of the original Matlab script to Delphi and evolution of the method;
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0 release thread] - Release of version 1.0.0b on 12 May 2008;

LossyWAV

2008-06-15T19:20:06Z

Dynamic: /* Frequently asked questions */

{{Software Infobox
| name = lossyWAV
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.0.0b
| preview_release = 1.0.1t
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 Hydrogenaudio]
}}
lossyWAV is a new free lossy pre-processor for [[PCM]] audio contained in the [[WAV]] file format. It reduces [[Wikipedia:Audio bit depth|bit depth]] of the input signal, which, when used in conjunction with certain lossless codecs, reduces the bitrate of the encoded file significantly compared to unpreprocessed compression.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyFLAC is an idea started by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 2Bdecided] at Hydrogenaudio. It uses the wasted-bits feature of the FLAC lossless codec with the aim of transparently reducing audio bit depth (setting some least significant bits (LSBs) to zero), taking advantage of FLAC's detection of LSBs that are zero throughout a frame to significantly increase coding efficiency.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded with the same codec (which may be all-important for hardware compatibility) at a lower bitrate than the lossless version.

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (with thanks to [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer), with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] assembly language for speed.

When lossyFLAC subsequently proved to work with other lossless codecs, the application was renamed lossyWAV.

Since then, Nick.C has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 halb27] at Hydrogenaudio.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a purely [[Wikipedia:variable bitrate|variable bitrate]] pre-processor. The number of bits to remove from the audio data is calculated block by block (codec-block length = 512 samples, 11.6 ms at 44.1 kHz) using overlapping [[Wikipedia:fast Fourier transform|fast Fourier transform]] (FFT) analyses of at least two lengths (default quality preset (-q 5): 32, 64 and 1024 [[Wikipedia:Sampling %28signal processing%29|samples]]). After some manipulation, the results of each FFT analysis for a given codec-block are grouped, and the minimum value determines the bits-to-remove for the whole codec-block. Bit removal adds [[Wikipedia:white noise|white noise]] to the output; however, the noise level associated with removing a given number of bits has been pre-calculated, and the number of bits removed depends on the noise floor of the codec-block in question. Each sample in the codec-block is then rounded so that its lowest <bits-to-remove> bits are zero. In this way the wasted-bits feature of [[FLAC]] et al. is exploited.
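
The final rounding step can be illustrated with a short Python sketch. It assumes the number of bits to remove has already been chosen for the block (the FFT analysis is not shown), rounds each sample half-up to the nearest multiple of 2<sup>n</sup>, and simply clips at full scale. The helper name <code>remove_bits</code> and this clipping behaviour are illustrative assumptions, not lossyWAV's actual code, which has its own clipping handling (see the --noclips option).

```python
def remove_bits(samples, bits_to_remove, bit_depth=16):
    """Round integer PCM samples so their lowest `bits_to_remove` bits are zero.

    Hypothetical helper: rounds half up to the nearest multiple of
    2**bits_to_remove, then clips to the signed range of `bit_depth`
    (an assumption; lossyWAV handles clipping its own way).
    """
    if bits_to_remove <= 0:
        return list(samples)
    step = 1 << bits_to_remove
    lowest = -(1 << (bit_depth - 1))                       # e.g. -32768, already a multiple of step
    highest = ((1 << (bit_depth - 1)) - 1) // step * step  # largest multiple of step that fits
    rounded = []
    for s in samples:
        q = (s + step // 2) // step * step  # floor division keeps this correct for negatives
        rounded.append(min(max(q, lowest), highest))
    return rounded


print(remove_bits([1000, -1001, 32767], 3))  # [1000, -1000, 32760]
```

Every output value is a multiple of 8, i.e. its three lowest bits are zero, which is the property a wasted-bits-aware codec such as FLAC can exploit.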

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set
!Version
!FLAC -8
!-q 10
!-q 9
!-q 8
!-q 7
!-q 6
!-q 5
!-q 4
!-q 3
!-q 2
!-q 1
!-q 0
|-
!53 sample "problem" set
| 1.0.0b
| 784 kbit/s
| 654 kbit/s
| 626 kbit/s
| 596 kbit/s
| 565 kbit/s
| 534 kbit/s
| 501 kbit/s
| 470 kbit/s
| 447 kbit/s
| 408 kbit/s
| 366 kbit/s
| 329 kbit/s
|}

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set
!Version
!FLAC -8
!--insane
!--extreme
!--standard
!--portable
!-q 0
|-
!55 sample "problem" set
| 1.0.1t
| 780 kbit/s
| 656 kbit/s
| 583 kbit/s
| 508 kbit/s
| 425 kbit/s
| 321 kbit/s
|-
!10 album test set
| 1.0.1t
| 854 kbit/s
| 632 kbit/s
| 548 kbit/s
| 462 kbit/s
| 376 kbit/s
| 285 kbit/s
|}
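
The relative savings can be worked out directly from the "10 album test set" row above. The short Python sketch below does only that arithmetic; the percentages are derived from the tabulated bitrates and are not separately published figures.

```python
# Bitrates in kbit/s, copied from the "10 album test set" row above (lossyWAV 1.0.1t).
FLAC_8_KBPS = 854
PRESET_KBPS = {
    "--insane": 632,
    "--extreme": 548,
    "--standard": 462,
    "--portable": 376,
    "-q 0": 285,
}


def reduction_percent(processed_kbps, reference_kbps=FLAC_8_KBPS):
    """Bitrate saving of a processed file relative to plain FLAC -8, in percent."""
    return round(100 * (1 - processed_kbps / reference_kbps), 1)


for preset, kbps in PRESET_KBPS.items():
    print(f"{preset}: {kbps} kbit/s, {reduction_percent(kbps)}% below FLAC -8")
```

On this test set, for example, --standard works out at roughly 46% below FLAC -8.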
==File identification==
lossyWAV-processed WAV files are named with a double filename extension, .lossy.wav, to make them instantly identifiable. For example, ".lossy.flac" indicates an audio file that was processed with lossyWAV and subsequently encoded with FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter creates, during processing, a correction file named with the .lwcdf.wav double filename extension. When this file is "added" to the corresponding .lossy.wav file using the --merge parameter, the original file is reconstituted.
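
The idea behind the correction file can be sketched in a few lines of Python. This assumes, purely for illustration, that the correction data is the per-sample difference between the original and processed audio; the actual .lwcdf.wav layout is not described here, and the function names are hypothetical.

```python
def make_correction(original, lossy):
    """Per-sample difference original - lossy (assumed correction-file content)."""
    if len(original) != len(lossy):
        raise ValueError("original and lossy audio must have the same length")
    return [o - p for o, p in zip(original, lossy)]


def merge(lossy, correction):
    """Add the correction back onto the processed samples to recover the original."""
    if len(lossy) != len(correction):
        raise ValueError("lossy audio and correction must have the same length")
    return [p + c for p, c in zip(lossy, correction)]


original = [1000, -1001, 1234, 5]
lossy = [1000, -1000, 1232, 8]  # the same samples with 3 bits rounded away
correction = make_correction(original, lossy)
print(correction)                             # [0, -1, 2, -3]
print(merge(lossy, correction) == original)   # True
```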

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Codec compatibility|codec compatibility]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. It takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, the date and time, and the user settings are filled in. If a lossyWAV 'fact' chunk is already present in a file, processing is halted (exit code = 16) to prevent an already processed file from being processed again.

The --check parameter can be used to determine whether a file has previously been processed without processing it: the exit code is 16 if it has already been processed and 0 if not.
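
A check of this kind can be sketched in Python by walking the RIFF chunks of a WAV file held in memory. This is not the lossyWAV implementation: it assumes the 'fact' chunk payload begins with the ASCII text "lossyWAV", as the form shown above suggests, and the function names are hypothetical.

```python
import struct

ALREADY_PROCESSED = 16  # exit code documented for --check
NOT_PROCESSED = 0


def iter_chunks(data):
    """Yield (chunk_id, payload) pairs from a RIFF WAVE file held in memory."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, data[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunk bodies are padded to an even length


def check_processed(data):
    """Mimic the documented exit codes: 16 if a lossyWAV 'fact' chunk exists, else 0.

    Assumption: the lossyWAV 'fact' payload starts with b"lossyWAV".
    """
    for chunk_id, payload in iter_chunks(data):
        if chunk_id == b"fact" and payload.startswith(b"lossyWAV"):
            return ALREADY_PROCESSED
    return NOT_PROCESSED
```

In practice the bytes would come from reading the whole file in binary mode; streaming the chunks instead would avoid loading large files into memory.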

==Quality presets==
*-q 10 to -q 8: Highest-quality presets; a disk-space-saving alternative to lossless archiving for large audio collections, considered suitable for transcoding to other lossy codecs;
*-q 7 to -q 6: High-quality presets; a disk-space-saving alternative to lossless archiving for large audio collections;
*-q 5: Default preset, generally accepted to be transparent;
*-q 4 to -q 0: Presets for use on a compatible [[Wikipedia:Digital audio player|DAP]]; bitrate and quality both decrease as the preset number decreases. [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]

New quality preset synonyms were introduced in 1.0.1j/k, and the -q <n> quality parameter moved to the advanced settings:
*--insane: (-q 10) Highest quality preset, generally considered to be excessive;
*--extreme: (-q 7.5) High-quality preset; a disk-space-saving alternative to lossless archiving for large audio collections, considered suitable for transcoding to other lossy codecs;
*--standard: (-q 5) Default preset, generally accepted to be transparent;
*--portable: (-q 2.5) DAP quality preset for use on a compatible [[Wikipedia:Digital audio player|DAP]]. [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]

All tuning has been performed at quality preset -q 5, with higher presets being more conservative. Quality preset -q 5 is generally accepted to be (and, from testing so far, is) transparent. If you find a track for which -q 5 does not achieve transparency, please post a sample (no more than 30 seconds) in the development thread.

==Supported input formats==
*[[WAV]]: 9-bit to 32-bit integer [[Pulse Code Modulation|PCM]]; 1 to 8 channels; sample rate ≥ 32 kHz. Very high sample rates (>48 kHz) have not been extensively tested. Tuning has focused on 16-bit, 44.1 kHz audio (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Free Lossless Audio Codec|FLAC]]
| '''Yes'''
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| '''Yes'''
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| '''Yes'''
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[TAK]]
| '''Yes'''
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|-
! [[WavPack]]
| '''Yes'''
| --'''blocksize'''=512
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| '''Yes'''
| —
| lossy'''WMALSL'''
|}

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] (so-called "bit shifting") suggesting that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested because encoders are prohibitively expensive.

The Wikipedia [[Wikipedia:Comparison of portable media players#Audio Formats|comparison of portable media players]] shows which of the listed players support FLAC and WMA Lossless.
Any player supported by [http://www.rockbox.org Rockbox] can use FLAC or WavPack files after installing Rockbox.
===Important note===
'''NB: when encoding with a lossless codec, make sure that the lossless codec's block size matches lossyWAV's (default = 512 samples). Otherwise the losslessly encoded file will almost certainly be larger than necessary. Set the block size by adding the "Encoder parameters" from the table above to the lossless codec's command line.'''
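
Why the block sizes must match can be shown with a simplified single-channel Python model. A wasted-bits-aware encoder can only shift out the low bits that are zero in every sample of its own block (in FLAC this is decided per subframe), so an encoder block spanning two lossyWAV blocks only benefits from the smaller of their two bit reductions. The signal below is hypothetical and real encoders weigh other costs too.

```python
def wasted_bits(block):
    """Count the low-order bits that are zero in every sample of `block`."""
    combined = 0
    for s in block:
        combined |= s  # a bit is set here if it is set in any sample
    if combined == 0:
        return 0  # digital silence: treated as 'nothing to exploit' in this sketch
    return (combined & -combined).bit_length() - 1  # index of the lowest set bit


def wasted_bits_per_block(samples, block_size):
    """Wasted bits for each consecutive encoder block of `block_size` samples."""
    return [wasted_bits(samples[i:i + block_size])
            for i in range(0, len(samples), block_size)]


# Hypothetical processed audio: 4 bits were removed from the first
# 512-sample block and 2 bits from the second.
signal = ([16 * (i % 50 - 25) for i in range(512)] +
          [4 * (i % 50 - 25) for i in range(512)])
print(wasted_bits_per_block(signal, 512))   # [4, 2]  (blocks aligned)
print(wasted_bits_per_block(signal, 1024))  # [2]     (one block spans both)
```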
===Bonus feature===
Another, perhaps less obvious, feature of lossyWAV is that its processed output can be "transcoded" from one lossless codec to another with no loss of quality whatsoever. This works because lossyWAV output is designed to be losslessly encoded, and lossless codecs reproduce their input exactly.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.0.0b, Copyright (C) 2007,2008 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <n> quality preset (10=highest quality, 0=lowest bitrate;
-q 5 is generally accepted to be transparent)
default=-q 5.

Standard Options:

-c, --check check if WAV file has already been processed; default=off.
errorlevel=16 if already processed, 0 if not.
-C, --correction write correction file for processed WAV file; default=off.
-f, --force forcibly over-write output file if it exists; default=off.
-h, --help display help.
-L, --longhelp display extended help.
-M, --merge merge existing lossy.wav and lwcdf.wav files.
-N, --noclips set allowable number of clips / channel / codec block to 0;
default=3,3,3,3,2,1,0,0,0,0,0 (-q 0 to -q 10)
-o, --outdir <dir> destination directory for the output file(s).
-v, --version display the lossyWAV version number.

Special thanks:

David Robinson for the method itself and motivation to implement it.
Don Cross for the Complex-FFT algorithm used.
Horst Albrecht for valuable tuning input and feedback.</pre>
===Example Foobar2000 converter settings===
[[Image:lossyWAV_fb2k_CLI_Settings.PNG]]

===Example flossy.bat file called from Foobar2000===
<pre>
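@echo off
rem Assumed arguments: 1 = source WAV, 2 = destination file, 3 to 9 = extra lossyWAV options
z:\bin\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 --below --nowarnings --quiet
rem Encode with a 512-sample block size to match lossyWAV's codec-block length
z:\bin\flac -5 -f -b 512 "%~N1.lossy.wav" -o"%~N2.flac"
rem Delete the intermediate .lossy.wav file
del "%~N1.lossy.wav"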
</pre>

==Frequently asked questions==
*'''Question:''' Why is the ".wav" file extension used?
*'''Answer:''' The ".wav" file extension is used because lossyWAV is a digital signal processor, not a codec. A WAV file processed with lossyWAV remains a compliant RIFF WAVE file, so any program that plays WAV files can play it without decoding.

*'''Question:''' Doesn't a processor like this mean that I can no longer be sure a lossless file is truly lossless?
*'''Answer:''' Unless you create the lossless file yourself, you can '''never''' be completely sure that it is truly lossless. For example, if a WAV file is encoded to MP3 and then transcoded to a lossless codec, how could that earlier processing easily be detected?

*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Do I need to re-process to change lossless codecs?
*'''Short answer:''' No.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' At preset --standard, almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact|artifacts]]
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522&st=0 Original lossyFLAC thread] - in which David Robinson (ReplayGain developer) introduces the method and a MATLAB implementation.
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 Original development thread up to 1.0.0b release]
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0b release thread]

*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 Current development thread] - the latest release candidate and latest beta version are in post #1 of this thread.

[[Category:Software]]
[[Category:Encoder/Decoder]]
[[Category:Lossy]]

LossyWAV

2008-05-18T14:37:32Z

Dynamic: /* External links */

{{Software Infobox
| name = lossyWAV
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.0.0b
| preview_release =
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 Hydrogenaudio]
}}
lossyWAV is a new, free, lossy pre-processor for [[PCM]] audio contained in the [[WAV]] file format. It reduces the [[Wikipedia:Audio bit depth|bit depth]] of the input signal which, when the output is compressed with certain lossless codecs, significantly reduces the bitrate of the encoded file compared with compressing the unprocessed audio.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyFLAC is an idea started by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 2Bdecided] at Hydrogenaudio. It uses the wasted-bits feature of the FLAC lossless codec with the aim of transparently reducing audio bit depth (setting some least significant bits (LSBs) to zero), taking advantage of FLAC's detection of LSBs that are zero throughout a frame to significantly increase coding efficiency.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded with the same codec (which may be all-important for hardware compatibility) at a lower bitrate than the lossless version.

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (with thanks to [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer), with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] assembly language for speed.

When lossyFLAC subsequently proved to work with other lossless codecs, the application was renamed lossyWAV.

Since then, Nick.C has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 halb27] at Hydrogenaudio.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a purely [[variable bit rate]] pre-processor. The number of bits to remove from the audio data is calculated block by block (codec-block length = 512 samples, 11.6 ms at 44.1 kHz) using overlapping [[fast Fourier transform]] (FFT) analyses of at least two lengths (default quality preset (-q 5): 64 and 1024 [[Sampling %28signal processing%29|samples]]). After some manipulation, the results of each FFT analysis for a given codec-block are grouped, and the minimum value determines the bits-to-remove for the whole codec-block. Bit removal adds noise to the output; however, this noise has been pre-calculated, and its level will be at or below the noise floor of the codec-block in question. Each sample in the codec-block is then rounded so that its lowest <bits-to-remove> bits are zero. In this way the wasted-bits feature of [[FLAC]] et al. is exploited.
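
How the added noise grows with each removed bit can be estimated with the textbook model of rounding noise, sketched below in Python. This is an approximation and not lossyWAV's own pre-calculated values: it assumes the rounding error is uniformly distributed over one step of 2<sup>n</sup>, giving an RMS of 2<sup>n</sup>/√12, measured against a full scale of 2<sup>bit_depth-1</sup>.

```python
import math


def rounding_noise_dbfs(bits_removed, bit_depth=16):
    """Approximate RMS level (dBFS) of the noise added by rounding away LSBs.

    Textbook model (an assumption, not lossyWAV's table): the rounding error
    is uniform over one step of 2**bits_removed, so its RMS is step / sqrt(12).
    """
    if bits_removed <= 0:
        return float("-inf")  # nothing removed, no added noise
    step = 2 ** bits_removed
    rms = step / math.sqrt(12)
    return 20 * math.log10(rms / 2 ** (bit_depth - 1))


for n in range(1, 7):
    print(n, round(rounding_noise_dbfs(n), 1))
```

Under this model each additional bit removed raises the added noise by about 6.02 dB (20·log<sub>10</sub>2), which is why the number of bits removed must track the block's noise floor.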

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set
!Version
!FLAC -8
!-q 10
!-q 9
!-q 8
!-q 7
!-q 6
!-q 5
!-q 4
!-q 3
!-q 2
!-q 1
!-q 0
|-
!53 sample "problem" set
| 1.0.0
| 784kbps
| 654kbps
| 626kbps
| 596kbps
| 565kbps
| 534kbps
| 501kbps
| 470kbps
| 447kbps
| 408kbps
| 366kbps
| 329kbps
|}

==File identification==
lossyWAV-processed WAV files are named with a double filename extension, .lossy.wav, to make them instantly identifiable. For example, ".lossy.flac" indicates an audio file that was processed with lossyWAV and subsequently encoded with FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter creates, during processing, a correction file named with the .lwcdf.wav double filename extension. When this file is "added" to the corresponding .lossy.wav file using the --merge parameter, the original file is reconstituted.

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Codec compatibility|codec compatibility]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. It takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, the date and time, and the user settings are filled in. If a lossyWAV 'fact' chunk is already present in a file, processing is halted (exit code = 16) to prevent an already processed file from being processed again.

The --check parameter can be used to determine whether a file has previously been processed without processing it: the exit code is 16 if it has already been processed and 0 if not.

==Quality presets==
*-q 10 to -q 8: Highest-quality presets; a disk-space-saving alternative to lossless archiving for large audio collections, considered suitable for transcoding to other lossy codecs;
*-q 7 to -q 6: High-quality presets; a disk-space-saving alternative to lossless archiving for large audio collections;
*-q 5: Default preset, generally accepted to be transparent;
*-q 4 to -q 0: Presets for use on a compatible [[Wikipedia:Digital audio player|DAP]]; bitrate and quality both decrease as the preset number decreases. [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]

All tuning has been performed at quality preset -q 5, with higher presets being more conservative. Quality preset -q 5 is generally accepted to be (and, from testing so far, is) transparent. If you find a track for which -q 5 does not achieve transparency, please post a sample (no more than 30 seconds) in the development thread.

==Supported input formats==
*[[WAV]]: 9-bit to 32-bit integer [[Pulse Code Modulation|PCM]]; 1 to 8 channels; sample rate ≥ 32 kHz. Very high sample rates (>48 kHz) have not been extensively tested. Tuning has focused on 16-bit, 44.1 kHz audio (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).
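
A rough pre-check of a file's header against these limits can be sketched with Python's standard <code>wave</code> module. This is an illustration only, not part of lossyWAV: <code>wave</code> reads integer PCM only and reports the container width in whole bytes (so 9- to 16-bit audio both appear as 2 bytes), and depending on the Python version it may refuse WAVE_FORMAT_EXTENSIBLE headers, which are common for multichannel or high-bit-depth files. The function name is hypothetical.

```python
import io
import wave


def lossywav_accepts(wav_bytes):
    """Check a WAV header against the input limits listed above.

    Limits: integer PCM of 9 to 32 bits, 1 to 8 channels, sample rate of at
    least 32 kHz. Because `wave` reports whole bytes, anything stored in a
    1-byte (8-bit) container is rejected and 2- to 4-byte containers pass.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        channels = w.getnchannels()
        rate = w.getframerate()
        container_bits = 8 * w.getsampwidth()
    return 1 <= channels <= 8 and rate >= 32000 and 16 <= container_bits <= 32
```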

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Free Lossless Audio Codec|FLAC]]
| Yes
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| Yes
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| Yes
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[TAK]]
| Yes
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|-
! [[WavPack]]
| Yes
| --'''blocksize'''=512
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| Yes
| —
| lossy'''WMALSL'''
|}

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] (so-called "bit shifting") suggesting that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested because encoders are prohibitively expensive.

The Wikipedia [[Wikipedia:Comparison of portable media players#Audio Formats|comparison of portable media players]] shows which of the listed players support FLAC and WMA Lossless.
Any player supported by [http://www.rockbox.org Rockbox] can use FLAC or WavPack files after installing Rockbox.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.0.0b, Copyright (C) 2007,2008 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <n> quality preset (10=highest quality, 0=lowest bitrate;
-q 5 is generally accepted to be transparent)
default=-q 5.

Standard Options:

-c, --check check if WAV file has already been processed; default=off.
errorlevel=16 if already processed, 0 if not.
-C, --correction write correction file for processed WAV file; default=off.
-f, --force forcibly over-write output file if it exists; default=off.
-h, --help display help.
-L, --longhelp display extended help.
-M, --merge merge existing lossy.wav and lwcdf.wav files.
-N, --noclips set allowable number of clips / channel / codec block to 0;
default=3,3,3,3,2,1,0,0,0,0,0 (-q 0 to -q 10)
-o, --outdir <dir> destination directory for the output file(s).
-v, --version display the lossyWAV version number.

Special thanks:

David Robinson for the method itself and motivation to implement it.
Don Cross for the Complex-FFT algorithm used.
Horst Albrecht for valuable tuning input and feedback.</pre>
===Example Foobar2000 converter settings===
[[Image:lossyWAV_fb2k_CLI_Settings.PNG]]

===Example flossy.bat file called from Foobar2000===
<pre>
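@echo off
rem Assumed arguments: 1 = source WAV, 2 = destination file, 3 to 9 = extra lossyWAV options
z:\bin\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 --below --nowarnings --quiet
rem Encode with a 512-sample block size to match lossyWAV's codec-block length
z:\bin\flac.exe -5 -f -b 512 "%~N1.lossy.wav" -o"%~N2.flac"
rem Delete the intermediate .lossy.wav file
del "%~N1.lossy.wav"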
</pre>

==Frequently asked questions==
*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' Almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact|artefacts]]
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522&st=0 Original lossyFLAC thread] - in which David Robinson (ReplayGain developer) introduces the method and a MATLAB implementation.

*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0b release thread]

*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 Current development thread] - the latest release candidate and latest beta version are in post #1 of this thread.

*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 Old development thread for 1.0 release] - the release candidates and beta versions are in post #1 of this thread.

[[Category:Software]]
[[Category:Encoder/Decoder]]
[[Category:Lossy]]

LossyWAV

2008-05-18T14:35:44Z

Dynamic: /* External links */

{{Software Infobox
| name = lossyWAV
| screenshot =
| caption =
| maintainer = [http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C]
| stable_release = 1.0.0b
| preview_release =
| operating_system = [[Wikipedia:Microsoft Windows|Windows]]
| use = [[Wikipedia:Digital signal processing|Digital signal processing]]
| license = [[Wikipedia:GNU General Public License|GNU GPL]]
| website = [http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 Hydrogenaudio]
}}
lossyWAV is a new, free, lossy pre-processor for [[PCM]] audio contained in the [[WAV]] file format. It reduces the [[Wikipedia:Audio bit depth|bit depth]] of the input signal which, when the output is compressed with certain lossless codecs, significantly reduces the bitrate of the encoded file compared with compressing the unprocessed audio.
lossyWAV's primary goal is to maintain [[transparency]] with a high degree of confidence when processing any audio data.

==History==
lossyFLAC is an idea started by [http://www.hydrogenaudio.org/forums/index.php?showuser=409 2Bdecided] at Hydrogenaudio. It uses the wasted-bits feature of the FLAC lossless codec with the aim of transparently reducing audio bit depth (setting some least significant bits (LSBs) to zero), taking advantage of FLAC's detection of LSBs that are zero throughout a frame to significantly increase coding efficiency.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498179] In this way the user can enjoy audio encoded with the same codec (which may be all-important for hardware compatibility) at a lower bitrate than the lossless version.

[http://www.hydrogenaudio.org/forums/index.php?showuser=42400 Nick.C] ported the original [[Wikipedia:MATLAB|MATLAB]] implementation to [[Wikipedia:Borland Delphi|Delphi]] (with thanks to [[Wikipedia:CodeGear|CodeGear]] for Turbo Explorer), with a liberal sprinkling of [[Wikipedia:IA-32|IA-32]] and [[Wikipedia:x87|x87]] assembly language for speed.

When lossyFLAC subsequently proved to work with other lossless codecs, the application was renamed lossyWAV.

Since then, Nick.C has heavily developed and built upon lossyWAV, with valuable tuning performed by [http://www.hydrogenaudio.org/forums/index.php?showuser=25015 halb27] at Hydrogenaudio.

==Indicative bitrate reduction==
It must be stressed that lossyWAV is a purely [[variable bit rate]] pre-processor. The number of bits to remove from the audio data is calculated block by block (codec-block length = 512 samples, 11.6 ms at 44.1 kHz) using overlapping [[fast Fourier transform]] (FFT) analyses of at least two lengths (default quality preset (-q 5): 64 and 1024 [[Sampling %28signal processing%29|samples]]). After some manipulation, the results of each FFT analysis for a given codec-block are grouped, and the minimum value determines the bits-to-remove for the whole codec-block. Bit removal adds noise to the output; however, this noise has been pre-calculated, and its level will be at or below the noise floor of the codec-block in question. Each sample in the codec-block is then rounded so that its lowest <bits-to-remove> bits are zero. In this way the wasted-bits feature of [[FLAC]] et al. is exploited.

{| class="wikitable" style="text-align:center"
|-
!lossyWAV Test Set
!Version
!FLAC -8
!-q 10
!-q 9
!-q 8
!-q 7
!-q 6
!-q 5
!-q 4
!-q 3
!-q 2
!-q 1
!-q 0
|-
!53 sample "problem" set
| 1.0.0
| 784kbps
| 654kbps
| 626kbps
| 596kbps
| 565kbps
| 534kbps
| 501kbps
| 470kbps
| 447kbps
| 408kbps
| 366kbps
| 329kbps
|}

==File identification==
lossyWAV-processed WAV files are named with a double filename extension, .lossy.wav, to make them instantly identifiable. For example, ".lossy.flac" indicates an audio file that was processed with lossyWAV and subsequently encoded with FLAC.[http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=498559]

The --correction parameter creates, during processing, a correction file named with the .lwcdf.wav double filename extension. When this file is "added" to the corresponding .lossy.wav file using the --merge parameter, the original file is reconstituted.

Combinations of lossyWAV with each specific encoder are referred to as lossy'''X''', where '''X''' is an abbreviation of the lossless codec name. Combination names are listed in the "[[LossyWAV#Codec compatibility|codec compatibility]]" section below.

lossyWAV inserts a variable-length 'fact' chunk into the WAV file immediately after the 'fmt ' chunk. It takes the form:<pre>fact/<size>/lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, -q 5</pre>where the version, the date and time, and the user settings are filled in. If a lossyWAV 'fact' chunk is already present in a file, processing is halted (exit code = 16) to prevent an already processed file from being processed again.
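
Building a chunk of this shape can be sketched in Python. The payload layout is an assumption read from the form shown above ("lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, settings"), the function name is hypothetical, and the sketch only produces the chunk bytes; splicing them in after the 'fmt ' chunk and updating the RIFF size field is not shown.

```python
import struct
from datetime import datetime


def build_fact_chunk(version, settings, when):
    """Build a RIFF 'fact' chunk carrying a lossyWAV-style text signature.

    Assumed payload layout: "lossyWAV x.y.z @ dd/mm/yyyy hh:mm:ss, <settings>".
    """
    text = f"lossyWAV {version} @ {when:%d/%m/%Y %H:%M:%S}, {settings}".encode("ascii")
    chunk = b"fact" + struct.pack("<I", len(text)) + text  # id, little-endian size, payload
    if len(text) % 2:
        chunk += b"\x00"  # RIFF chunk bodies are padded to an even number of bytes
    return chunk


print(build_fact_chunk("1.0.0b", "-q 5", datetime(2008, 5, 18, 14, 35, 44)))
```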

The --check parameter can be used to determine whether a file has previously been processed without processing it: the exit code is 16 if it has already been processed and 0 if not.

==Quality presets==
*-q 10 to -q 8: Highest-quality presets; a disk-space-saving alternative to lossless archiving for large audio collections, considered suitable for transcoding to other lossy codecs;
*-q 7 to -q 6: High-quality presets; a disk-space-saving alternative to lossless archiving for large audio collections;
*-q 5: Default preset, generally accepted to be transparent;
*-q 4 to -q 0: Presets for use on a compatible [[Wikipedia:Digital audio player|DAP]]; bitrate and quality both decrease as the preset number decreases. [http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=56129&view=findpost&p=531316]

All tuning has been performed at quality preset -q 5, with higher presets being more conservative. Quality preset -q 5 is generally accepted to be (and, from testing so far, is) transparent. If you find a track for which -q 5 does not achieve transparency, please post a sample (no more than 30 seconds) in the development thread.

==Supported input formats==
*[[WAV]]: 9-bit to 32-bit integer [[Pulse Code Modulation|PCM]]; 1 to 8 channels; sample rate ≥ 32 kHz. Very high sample rates (>48 kHz) have not been extensively tested. Tuning has focused on 16-bit, 44.1 kHz audio (i.e. [[Wikipedia:Red Book (audio CD standard)|CD]] PCM).

==Codec compatibility==
{| class="wikitable" style="text-align:center"
|-
!Codec
!Supported
!Encoder parameters
!Combination name
|-
! [[Apple Lossless]]
| No
| —
| —
|-
! [[Free Lossless Audio Codec|FLAC]]
| Yes
| -'''5''' -'''b''' 512 --'''keep-foreign-metadata'''
| lossy'''FLAC'''
|-
! [[Lossless Audio|LA]]
| No
| —
| —
|-
! [[Lossless Predictive Audio Compression|LPAC]]
| Yes
| -'''b'''512
| lossy'''LPAC'''
|-
! [[Monkey's Audio]]
| No
| —
| —
|-
! [[Wikipedia:Audio Lossless Coding|MPEG-4 ALS]]
| Yes
| -'''l''' -'''n'''512
| lossy'''ALS'''
|-
! [[OptimFROG]]
| No
| —
| —
|-
! [[TAK]]
| Yes
| -'''fsl'''512
| lossy'''TAK'''
|-
! [[Wikipedia:TTA (codec)|TTA]]
| No
| —
| —
|-
! [[WavPack]]
| Yes
| --'''blocksize'''=512
| lossy'''WV'''
|-
! [[Windows Media Audio#Windows Media Audio Lossless|WMA Lossless]]
| Yes
| —
| lossy'''WMALSL'''
|}

There is also [http://www.hometheaterhifi.com/volume_8_4/dvd-benchmark-part-6-dvd-audio-11-2001.html#Meridian%20Lossless%20Packing%20(MLP)%20in%20a%20Nutshell evidence] (so-called "bit shifting") suggesting that lossyWAV may work with [[Wikipedia:Meridian Lossless Packing|MLP]], but this remains untested because encoders are prohibitively expensive.

Wikipedia's [[Wikipedia:Comparison of portable media players#Audio Formats|comparison of portable media players]] shows which of the listed players support FLAC and WMA Lossless.
Any player supported by [http://www.rockbox.org Rockbox] can use FLAC or WavPack files after installing Rockbox.

==Using lossyWAV==
===Application settings===
<pre>
lossyWAV 1.0.0b, Copyright (C) 2007,2008 Nick Currie. Copyleft.

This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful,but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.

Usage : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <n>     quality preset (10=highest quality, 0=lowest bitrate;
                      -q 5 is generally accepted to be transparent)
                      default=-q 5.

Standard Options:

-c, --check           check if WAV file has already been processed; default=off.
                      errorlevel=16 if already processed, 0 if not.
-C, --correction      write correction file for processed WAV file; default=off.
-f, --force           forcibly over-write output file if it exists; default=off.
-h, --help            display help.
-L, --longhelp        display extended help.
-M, --merge           merge existing lossy.wav and lwcdf.wav files.
-N, --noclips         set allowable number of clips / channel / codec block to 0;
                      default=3,3,3,3,2,1,0,0,0,0,0 (-q 0 to -q 10)
-o, --outdir <dir>    destination directory for the output file(s).
-v, --version         display the lossyWAV version number.

Special thanks:

David Robinson for the method itself and motivation to implement it.
Don Cross for the Complex-FFT algorithm used.
Horst Albrecht for valuable tuning input and feedback.</pre>
===Example Foobar2000 converter settings===
[[Image:lossyWAV_fb2k_CLI_Settings.PNG]]

===Example flossy.bat file called from Foobar2000===
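<pre>
@echo off
rem %1 = source WAV from foobar2000, %2 = destination file, %3-%9 = extra lossyWAV options
z:\bin\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 --below --nowarnings --quiet
rem Encode the processed WAV with a 512-sample FLAC block size
z:\bin\flac.exe -5 -f -b 512 "%~N1.lossy.wav" -o"%~N2.flac"
rem Remove the intermediate lossy WAV
del "%~N1.lossy.wav"
</pre>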
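On systems without a Windows command prompt, the same two-step pipeline can be driven from Python. This is a minimal sketch that mirrors the batch file: the tool paths are copied from it and will need adjusting, and the function names <code>build_commands</code> and <code>flossy</code> are hypothetical. Like <code>%~N1</code> and <code>%~N2</code>, it uses bare file names, so the intermediate and output files land in the current working directory. Unlike the batch file, it stops if either tool reports an error, but still deletes the intermediate file.

```python
import subprocess
from pathlib import Path

# Tool locations copied from flossy.bat; adjust for your system.
LOSSYWAV = r"z:\bin\lossyWAV"
FLAC = r"z:\bin\flac.exe"


def build_commands(source, destination, extra_options=()):
    """Return the lossyWAV and flac argument lists plus the intermediate file name."""
    # Equivalent of "%~N1.lossy.wav" and "%~N2.flac": names without directories.
    lossy_wav = Path(source).stem + ".lossy.wav"
    flac_out = Path(destination).stem + ".flac"
    lossywav_cmd = [LOSSYWAV, str(source), *extra_options,
                    "--below", "--nowarnings", "--quiet"]
    flac_cmd = [FLAC, "-5", "-f", "-b", "512", lossy_wav, "-o", flac_out]
    return lossywav_cmd, flac_cmd, lossy_wav


def flossy(source, destination, extra_options=()):
    """Run lossyWAV, encode its output with FLAC, then delete the temporary file."""
    lossywav_cmd, flac_cmd, lossy_wav = build_commands(source, destination, extra_options)
    try:
        subprocess.run(lossywav_cmd, check=True)
        subprocess.run(flac_cmd, check=True)
    finally:
        Path(lossy_wav).unlink(missing_ok=True)
```

For example, <code>flossy("track.wav", "track.flac", ["-q", "5"])</code> would run lossyWAV at preset -q 5 and produce <code>track.flac</code>.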

==Frequently asked questions==
*'''Question:''' Is it [[Variable Bitrate|VBR]]?
*'''Short answer:''' Yes.

*'''Question:''' Is it [[transparency|transparent]]?
*'''Short answer:''' Almost certainly.

*'''Question:''' Is it [[lossless]]?
*'''Short answer:''' No.

*'''Question:''' Will it ever have a [[Constant Bitrate|CBR]] mode?
*'''Short answer:''' No.

*'''Question:''' Why should I use this?
*'''Answer:'''
:*high quality
:*extremely low chance of audible [[artifact|artefacts]]
:*reasonable [[bitrate]]s
:*usable with unmodified, established lossless formats.

==External links==
*[http://www.hydrogenaudio.org/forums/index.php?showtopic=55522&st=0 Original lossyFLAC thread]: David Robinson (the ReplayGain developer) introduces the method and a MATLAB implementation.

*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63225 lossyWAV 1.0.0b release thread]

*[http://www.hydrogenaudio.org/forums/index.php?showtopic=63254 Current development thread]: the latest release candidate and latest beta version are in post #1 of this thread.

*[http://www.hydrogenaudio.org/forums/index.php?showtopic=56129 Old development thread for 1.0 release]: release candidates and beta versions were added to post #1 of this thread.

[[Category:Software]]
[[Category:Encoder/Decoder]]
[[Category:Lossy]]

ABC/HR

2007-05-15T21:20:03Z

Dynamic: /* Difference with ABX */ changed "subjective bias" to "personal bias" as with previous minor edit in previous section

'''ABC/HR''' is an abbreviation of ABC/Hidden Reference.

It is a method to 'score' the quality of audio encodings by comparing them against a reference sample. The 'hidden' refers to the fact that the reference is also presented anonymously as one of the two samples being rated, and the listener does not know which of them (the left or the right one) it is.

The reference is usually of higher quality than the tested samples, e.g. uncompressed or [[lossless|losslessly]] compressed audio, or a [[lossy|lossily]] compressed track at a higher [[bitrate]].

==Purpose==
Blind comparison and blind quality rating to remove the effects of personal bias and the [[placebo effect]].

==Difference with [[ABX]]==
[[ABX]] is used to detect audible differences blindly, thereby removing personal bias and the placebo effect, and, over multiple trials, to estimate the probability that the tester was guessing.
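The "probability that the tester was guessing" is usually computed as a one-sided binomial tail: the chance of getting at least the observed number of trials right when every answer is a coin flip. A minimal sketch (the function name is hypothetical):

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers out of `trials` by pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Each trial is a 50/50 guess, so every outcome sequence has probability 1 / 2**trials.
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


print(abx_p_value(12, 16))  # 2517 / 65536, about 0.038
```

A small value such as 0.038 means guessing alone would rarely produce that score; what threshold counts as convincing is a choice made by whoever runs the test.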

Like ABX, an ABC/HR tool tests for differences blindly, but it also lets the listener give a quality rating on a standardized scale. The results of multiple participants can then be evaluated statistically to estimate error bars and to determine whether encoders or encoder settings differ significantly in quality or are statistically tied.
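As a rough illustration of error bars and ties, the sketch below compares two encoders from per-listener ratings. It assumes every listener rated both encoders, so the per-listener differences can be analysed as paired data, and it uses a normal approximation (z = 1.96 for about 95%), which is optimistic for small listener groups. Published group tests typically use more careful analyses (for example ANOVA or the Friedman test); the function names here are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_error_bar(values, z=1.96):
    """Mean and half-width of an approximate confidence interval."""
    if len(values) < 2:
        raise ValueError("need at least two ratings")
    return mean(values), z * stdev(values) / sqrt(len(values))


def compare_paired(scores_a, scores_b, z=1.96):
    """Return 'A better', 'B better' or 'tied' from paired per-listener ratings."""
    if len(scores_a) != len(scores_b):
        raise ValueError("each listener must rate both encoders")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    m, half = mean_with_error_bar(diffs, z)
    if m - half > 0:
        return "A better"
    if m + half < 0:
        return "B better"
    return "tied"  # the error bar on the difference includes zero


print(compare_paired([4.5, 4.0, 4.8, 4.2, 4.6], [3.0, 3.2, 3.5, 2.8, 3.1]))  # A better
print(compare_paired([4.0, 3.0, 5.0], [3.0, 4.0, 5.0]))  # tied
```

Pairing by listener removes the effect of some listeners simply rating everything more harshly than others.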

ABC/HR is particularly suited to low-to-medium bitrate listening tests (below the quality expected to achieve [[transparency]]). As part of a well-designed [[listening test]], useful quality comparisons can be made between a selection of encoders, a high-quality anchor (high anchor) and a low-quality anchor (low anchor), without the tester knowing which encoder is being evaluated at any time.

===Standardized quality or impairment scale===

The most common choice is the 1.0 to 5.0 scale defined by [http://www.hydrogenaudio.org/forums/index.php?showtopic=53107 ITU-R BS.1116]. Any value between 1.0 and 5.0, including fractions, is valid; the whole numbers correspond to the following degrees of impairment:
* 5.0 : Imperceptible
* 4.0 : Perceptible, but not annoying
* 3.0 : Slightly annoying
* 2.0 : Annoying
* 1.0 : Very annoying
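When collecting ratings, it can help to validate each score against the scale and show the nearest grade's descriptor. A minimal sketch; the rounding rule (halves round up to the better grade) is an illustrative choice, not part of BS.1116, and the names are hypothetical.

```python
import math

IMPAIRMENT_GRADES = {
    5: "Imperceptible",
    4: "Perceptible, but not annoying",
    3: "Slightly annoying",
    2: "Annoying",
    1: "Very annoying",
}


def describe_score(score):
    """Return the descriptor of the nearest whole grade (halves round up)."""
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"score {score} is outside the 1.0 to 5.0 scale")
    return IMPAIRMENT_GRADES[math.floor(score + 0.5)]


print(describe_score(4.3))  # Perceptible, but not annoying
```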

==References==

==Links==

{{stub}}

ABC/HR

2007-05-15T21:18:59Z

Dynamic: /* Purpose */

'''ABC/HR''' is an abbreviation of ABC/Hidden Reference.

It is a method to 'score' the quality of audio encodings by comparing them against a reference sample. The 'hidden' refers to the fact that the reference is also presented anonymously as one of the two samples being rated, and the listener does not know which of them (the left or the right one) it is.

The reference is usually of higher quality than the tested samples, e.g. uncompressed or [[lossless|losslessly]] compressed audio, or a [[lossy|lossily]] compressed track at a higher [[bitrate]].

==Purpose==
Blind comparison and blind quality rating to remove the effects of personal bias and the [[placebo effect]].

==Difference with [[ABX]]==
[[ABX]] is used to detect audible differences blindly, thereby removing subjective bias and the placebo effect, and, over multiple trials, to estimate the probability that the tester was guessing.

Like ABX, an ABC/HR tool tests for differences blindly, but it also lets the listener give a quality rating on a standardized scale. The results of multiple participants can then be evaluated statistically to estimate error bars and to determine whether encoders or encoder settings differ significantly in quality or are statistically tied.

ABC/HR is particularly suited to low-to-medium bitrate listening tests (below the quality expected to achieve [[transparency]]). As part of a well-designed [[listening test]], useful quality comparisons can be made between a selection of encoders, a high-quality anchor (high anchor) and a low-quality anchor (low anchor), without the tester knowing which encoder is being evaluated at any time.

===Standardized quality or impairment scale===

The most common choice is the 1.0 to 5.0 scale defined by [http://www.hydrogenaudio.org/forums/index.php?showtopic=53107 ITU-R BS.1116]. Any value between 1.0 and 5.0, including fractions, is valid; the whole numbers correspond to the following degrees of impairment:
* 5.0 : Imperceptible
* 4.0 : Perceptible, but not annoying
* 3.0 : Slightly annoying
* 2.0 : Annoying
* 1.0 : Very annoying

==References==

==Links==

{{stub}}

ABC/HR

2007-05-01T23:15:29Z

Dynamic: /* Standardized quality scale */

'''ABC/HR''' is an abbreviation of ABC/Hidden Reference.

It is a method to 'score' the quality of audio encodings by comparing them against a reference sample. The 'hidden' refers to the fact that the reference is also presented anonymously as one of the two samples being rated, and the listener does not know which of them (the left or the right one) it is.

The reference is usually of higher quality than the tested samples, e.g. uncompressed or [[lossless|losslessly]] compressed audio, or a [[lossy|lossily]] compressed track at a higher [[bitrate]].

==Purpose==
Blind comparison and blind quality rating to remove the effects of subjective bias and the [[placebo effect]].

==Difference with [[ABX]]==
[[ABX]] is used to detect audible differences blindly, thereby removing subjective bias and the placebo effect, and, over multiple trials, to estimate the probability that the tester was guessing.

Like ABX, an ABC/HR tool tests for differences blindly, but it also lets the listener give a quality rating on a standardized scale. The results of multiple participants can then be evaluated statistically to estimate error bars and to determine whether encoders or encoder settings differ significantly in quality or are statistically tied.

ABC/HR is particularly suited to low-to-medium bitrate listening tests (below the quality expected to achieve [[transparency]]). As part of a well-designed [[listening test]], useful quality comparisons can be made between a selection of encoders, a high-quality anchor (high anchor) and a low-quality anchor (low anchor), without the tester knowing which encoder is being evaluated at any time.

===Standardized quality or impairment scale===

The most common choice is the 1.0 to 5.0 scale defined by [http://www.hydrogenaudio.org/forums/index.php?showtopic=53107 ITU-R BS.1116]. Any value between 1.0 and 5.0, including fractions, is valid; the whole numbers correspond to the following degrees of impairment:
* 5.0 : Imperceptible
* 4.0 : Perceptible, but not annoying
* 3.0 : Slightly annoying
* 2.0 : Annoying
* 1.0 : Very annoying

==References==

==Links==

{{stub}}