Difference between revisions of "Create a long-term archive"

From Hydrogenaudio Knowledgebase
m (Encoding: Rewording)
(Encoding: Removed misleading/incorrect section on single-file vs. individual track.)
 
(4 intermediate revisions by 3 users not shown)

Latest revision as of 13:35, 15 April 2008

This is still being discussed in this HA thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=47612&hl=). Feel free to barge in. The results of the discussion will be generalized and added to this article.

Why do we need a long-term archive?

Before the invention of the Compact Disc, audio was stored on analog media such as vinyl records, magnetic reels, and cassette tapes. These media are very prone to environmental damage; e.g. vinyl may be scratched, and magnetic media may be crinkled or demagnetized. Even the CD itself is not perfect. A scratch will damage the disc and can introduce audible errors into its audio tracks. The error protection of Red Book CDs is not perfect either; sometimes errors cannot be corrected, and it is up to the player to 'repair' them by interpolation.

On the other hand, audio encoding technology (and data storage technology) has progressed to the point where we can (somewhat) easily make perfect (or near-perfect) copies of any audio track, with proper error detection and correction.

So, for posterity (and personal enjoyment well into your old age), it is very plausible – and feasible – to create a long-term archive of your audio collection.

Considerations for creating a long-term archive

Making a long-term archive is not something to be lightly undertaken. You must plan it. Here we attempt to provide you with a general guide to making your long-term audio archive.

Replication

First of all, you must decide how to replicate (i.e. copy) your audio tracks onto your computer.

If your source is an Audio CD, it's rather easy. Use a freely available secure ripper such as EAC or CDex to rip your Audio CD into WAV (or lossless) files. Confidence in the rip can be improved further by checking it against the AccurateRip database (unfortunately, the AccurateRip database is not complete; your disc may not be in it).

If your source is analog, then things get more complicated:

  1. First of all, you must ensure that your source is not damaged in any way.
  2. Then, you must find a tunable player to ensure faithful reproduction of the audio track.
  3. Next, you must have an audio card capable of high-quality recording.
  4. After that, you must connect the player to your audio card through a quality connection; "connection" here also includes active elements such as filters, EQs, and amps.
  5. Finally, you must use a good wave-recording program to record the incoming audio into a (large) WAV file.

Encoding

The second consideration is the encoding used. Although you can use WAV files (i.e. PCM) to store your replicated audio tracks, this is not recommended for several reasons:

  • No tagging capability – although information on the tracks may be stored in text files, it is much more practical to store it in tags within the encoded file itself.
  • No error detection – error detection is vital for long-term archives; if your audio track develops an error, you need to be able to notice it so that you can discard the file and (hopefully) restore it from a backup-of-backup.
  • Big size – since WAV files are uncompressed, storing audio tracks as WAV will require a much greater amount of media.

Most modern-day lossless encoders should suffice. The most popular formats for long-term archiving seem to be FLAC, WavPack, and Monkey's Audio. However, other formats may provide better compression.

We strongly advise you not to use lossy encoding. With lossy encoding, the decoded audio will not be bit-for-bit identical to the source.

Media

The third consideration is what media to use. There are a lot of usable media out there, so let's go over them one by one.

(One requirement is that the medium must be big enough to store at least one CD's worth of audio.)

Media: CD-R
  Pros:
  • Cheapest per disc
  Cons:
  • Vulnerable to environmental damage
  • Vulnerable to optical damage

Media: DVD-R
  Pros:
  • Great capacity
  Cons:
  • Rather expensive
  • Media damages easily

Media: Memory cards
  Pros:
  • Very practical size
  • May be playable in portable digital audio players
  • Fast and easy seeking
  Cons:
  • Longevity relatively unknown
  • Expensive per megabyte

Media: Hard disks (NOT RAID)
  Pros:
  • Honkin' big capacity
  • Very fast for reading/writing
  Cons:
  • Big & heavy
  • Impractical – rather complex to mount/unmount
  • Only usable in PCs
  • Electromechanical components
  • Sensitive to magnetic fields

Media: Backup Tapes
  Pros:
  • Huge capacity
  • Relatively cheap per megabyte
  Cons:
  • Very slow for reading and writing
  • Sensitive to magnetic fields

Special note about RAID: RAID technology is designed for availability, not for data archiving. In other words, a RAID system with redundancy keeps your data available if a drive fails. However, RAID by itself is not an archiving method. A local-area disaster (e.g. flooding, fire, tornado) will wipe out your data, possibly irretrievably.
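To make the capacity requirement in this section concrete, here is a small arithmetic sketch. The Audio CD parameters (44.1 kHz, 16-bit, stereo) are standard; the 74-minute disc length, the 4.7 GB medium, and the 0.6 compression ratio used in the usage example are assumptions chosen only for illustration.

```python
SAMPLE_RATE_HZ = 44_100   # Audio CD sample rate
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo


def pcm_bytes(minutes: float) -> int:
    """Size in bytes of uncompressed CD-quality PCM audio of the given length."""
    return int(minutes * 60 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def discs_per_medium(capacity_bytes: float,
                     disc_minutes: float = 74.0,
                     compression_ratio: float = 1.0) -> int:
    """How many whole discs fit on a medium.

    compression_ratio is compressed size divided by original size
    (1.0 means uncompressed WAV).
    """
    per_disc = pcm_bytes(disc_minutes) * compression_ratio
    return int(capacity_bytes // per_disc)


if __name__ == "__main__":
    print(pcm_bytes(74))                   # bytes of PCM on a 74-minute disc
    print(discs_per_medium(4.7e9))         # uncompressed, assumed 4.7 GB medium
    print(discs_per_medium(4.7e9, compression_ratio=0.6))  # assumed lossless ratio
```

Actual lossless compression ratios vary with the music and the encoder, so the result for compressed audio should be read as a rough planning figure.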

Storage & Redundancy

If you just want to save your original audio media from degradation due to use, storage might not matter much. After all, if the copy gets damaged, you can re-rip or re-record from the original media.

However, if you are serious about long-term archival, you will also store your copy as carefully as you store your original media, and make a copy of the copy (i.e. a redundant copy). This strategy is called redundancy, and it guards against the scenario where your archive somehow develops an error. The second medium need not be of the same type as the first. In fact, the redundant copy is usually stored off-site to prevent total loss in case of a local-area disaster.

Ideally, both the first copy and the redundant copy are stored in a secure, climate-controlled safe. However, this is impractical for daily use. Thus, keep the first copy at your house, and store only the redundant copy in an off-site safe. Store your originals in yet another off-site safe.

For daily usage, just pop your first copy into the player, and enjoy.

Error-Correction Codes

The above method should give acceptable protection against data loss. However, if you are paranoid enough about data loss, you can further protect your archived media with an error-correction system. The two most popular systems are QuickPar (which uses the PAR2 format) and dvdisaster. With both systems, the principle is to create error-detection-and-correction data which can help you detect errors and correct them (subject to some limitations).

dvdisaster is perhaps the easier to use, and it runs in two modes: The RS01 mode creates an .ecc file for error detection and correction. The RS02 mode augments an .iso image with interspersed error-correction data.

The drawback of dvdisaster is that it seems to require error-free .ecc data, which means that the .ecc file needs to be stored somewhere safe; alternatively, several .ecc files can be written onto a medium that is itself protected by an .ecc file.

QuickPar seems to be the method preferred by HydrogenAudio enthusiasts, as the .par2 files it creates have some error resiliency of their own. It is recommended to create the PAR2 recovery files against the .iso image of a backup medium (CD or DVD).
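The principle behind such recovery data can be illustrated with the simplest possible scheme, a single XOR parity block. This is a deliberately simplified stand-in: PAR2 and dvdisaster use Reed-Solomon codes, which can repair more than one damaged block, and the function names below are made up for this sketch.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Create one parity block that protects all data blocks."""
    return xor_blocks(data_blocks)


def recover_block(blocks: list[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing block (marked as None) from the rest."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [b for b in blocks if b is not None]
    # XOR of all surviving blocks and the parity equals the missing block.
    return xor_blocks(survivors + [parity])
```

The sketch also shows the limitation mentioned above: the recovery data only helps if it is itself intact, and one parity block can replace only one lost block.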

Cataloging

Now that you have built yourself a library of your audio tracks, it is important to make a catalog out of your tracks. This will help you, for instance, when you want to hear a certain song but you don't know exactly where it is.

It is possible to print out the list of songs on each medium and glue the list to its case (e.g. as a CD label). However, this means that you still have to handle the copy media. More handling means a shorter lifetime.

To reduce handling, catalog your media. Ideally you would use a cataloger that not only stores the list of files on each medium, but also recognizes the tags inside your archive files. This way you can search not only by file name, but also by other data.

For cataloging to be successful, you also must have a consistent strategy for tagging, filenaming, and pathnaming.
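As an illustration of why a consistent naming strategy matters, the sketch below builds searchable catalog entries purely from paths. The layout "Artist/Album/NN - Title.ext", the medium labels, and all names in the code are hypothetical; a real cataloger would also read the tags inside the files.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Assumed file name convention: two-digit track number, " - ", then the title.
TRACK_PATTERN = re.compile(r"^(?P<number>\d{2}) - (?P<title>.+)$")


class CatalogEntry(NamedTuple):
    medium: str        # label of the disc or drive holding the file
    artist: str
    album: str
    track_number: int
    title: str


def parse_entry(medium: str, relative_path: str) -> Optional[CatalogEntry]:
    """Turn 'Artist/Album/NN - Title.ext' into an entry, or None if it does not fit."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 3:
        return None
    artist, album, filename = parts
    match = TRACK_PATTERN.match(PurePosixPath(filename).stem)
    if match is None:
        return None
    return CatalogEntry(medium, artist, album, int(match["number"]), match["title"])


def search(catalog: list[CatalogEntry], text: str) -> list[CatalogEntry]:
    """Case-insensitive search over artist, album, and title."""
    needle = text.lower()
    return [
        e for e in catalog
        if needle in e.artist.lower()
        or needle in e.album.lower()
        or needle in e.title.lower()
    ]
```

Files that do not follow the convention are returned as None rather than guessed at, which makes inconsistencies in the naming scheme easy to spot before they end up in the catalog.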