Backup vs. Archival and Thoughts on Archival

Archival often gets confused with backup. The two activities are (technically) very similar, which invites the confusion. Both move bits from "the place where everybody looks" (the mailbox, the current database, the file share, the intranet etc.) to some other place (a backup tape, cheaper storage, a CD-ROM, /dev/null etc.).

Backup serves the sole purpose of keeping data available in case the main storage area is no longer available (due to accidental deletion, software or hardware problems).
Archival is the removal of data from the "main area" to an "archive area" for later retrieval for historic or compliance reasons. A secondary motive for archival is to remove obsolete or less relevant data from the active work area to improve performance, shorten search time or save on storage in the system hosting the active work area. To confuse matters further: quite often technologies designed for backup are successfully used for archival (e.g. copying data to removable storage such as tape or optical disk).

In other words: you don't expect to ever restore a backup unless something went wrong, while accessing an archive can be part of a regular business process. There are a few perceptions about archival that need to be put into perspective:

Archival does not save any storage space!
At least not when you look at all storage across the enterprise. However, it can help save storage on your active work area (which is most likely the most expensive one) and so reduce storage cost. IMHO the biggest advantage of archival is the reduction of data a user has to look through, since the current work area would contain only relevant data. This is also the greatest peril of archival: when data gets archived too early, the archival location turns into yet-another-work-area-to-check. (OK, your archive might use better compression than your live system - but are you sure that it isn't just a backup?)
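
To make the point concrete, here is a back-of-the-envelope sketch. All numbers are made up for illustration (the per-GB prices and the 1000 GB volume are assumptions, not measurements): the total amount of data stays the same, only the bill changes.

```python
# Hypothetical prices in cents per GB; pick your own numbers.
ACTIVE_COST_PER_GB = 100   # assumed cost of the (expensive) active work area
ARCHIVE_COST_PER_GB = 20   # assumed cost of the (cheaper) archive area


def storage_summary(total_gb, archived_gb):
    """Return (total GB stored, total cost in cents) after archiving archived_gb."""
    active_gb = total_gb - archived_gb
    cost = active_gb * ACTIVE_COST_PER_GB + archived_gb * ARCHIVE_COST_PER_GB
    # Archival moves data, it does not delete it: the total is unchanged.
    return active_gb + archived_gb, cost


before_gb, before_cost = storage_summary(1000, 0)    # nothing archived
after_gb, after_cost = storage_summary(1000, 600)    # 600 GB moved to the archive
print(before_gb, after_gb)      # same number of GB before and after
print(before_cost, after_cost)  # lower cost after archiving
```

The sketch ignores compression and any overhead of running the archive system itself.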

Archival needs information life cycle management
Every piece of information has a certain life cycle. Like food items, information has a "best before" date (that varies depending on the purpose). It follows roughly the following pattern:
  • New: freshly created, might not be relevant yet (e.g. upcoming policy change)
  • Current: data supports one or more business processes and is actively used
  • Reference: data is no longer actively used, but is regularly required for reports or comparison
  • Compliance: data is obsolete but needs to be kept for compliance (e.g. business records in Singapore: 7 years)
  • Historic: the data doesn't need to be kept, it doesn't serve any active business process, but might be of historic interest. This state of information is a field of tension between (corporate) lawyers and historians: historians like to keep everything, while lawyers see a potential discovery risk (cost and content) in every piece of data kept. When analyzing the archival policies of any organization one can find out who won this conflict.
  • Obsolete: In 2050 really nobody cares how many rolls of toilet paper you bought at what price (while the price and volume of toilet paper might still be of historic interest, as a curiosity about how mankind could be so wasteful with resources before they had the self-cleaning buttock nano coating)
Data might skip some of the phases. As one might notice, I'm speaking about "data" in general. The life cycle applies not only to documents but to all sorts of information. Now, to have a successful archival strategy, the status of information in that life cycle should be explicitly known for each piece. Unfortunately this is still the exception rather than the rule. Short of an explicit expiry date we make implicit assumptions like "Unless stated otherwise a document in this place expires xx days after last update" or "Unless stated otherwise a document in this place expires xx days after last use".
Usage is much harder to track than updates. If someone opens a piece of information only to figure out that it wasn't what she was looking for, an automated system would count that as usage - bad. Or I use the search engine and the search result already shows the information, so I never open the location and the document expires as unused - bad. So the most prevalent measure is "last update". Some verification cycle asking the owner to extend the validity is needed. But better have a clever one: if that turns into a one-by-one update exercise nobody will bother. A very good rule engine can help there. Most of the technical troubles (short of broken equipment) you might experience with archival are rooted in strategic (mis-)decisions.
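
The "expires xx days after last update unless stated otherwise" rule can be sketched in a few lines. This is only an illustration under assumptions: the Record class, its field names and the functions below are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Record:
    """Hypothetical piece of information with an optional explicit expiry date."""
    name: str
    last_update: date
    explicit_expiry: Optional[date] = None  # "stated otherwise" by the owner


def expiry_date(record, default_days):
    """Explicit expiry wins; otherwise fall back to last update + default_days."""
    if record.explicit_expiry is not None:
        return record.explicit_expiry
    return record.last_update + timedelta(days=default_days)


def is_expired(record, today, default_days):
    """True once today is past the record's expiry date."""
    return today > expiry_date(record, default_days)


def extend_validity(record, today, extra_days):
    """Owner confirms the record is still valid: set a new explicit expiry."""
    record.explicit_expiry = today + timedelta(days=extra_days)


policy = Record("travel policy", last_update=date(2010, 1, 1))
today = date(2010, 5, 27)
print(is_expired(policy, today, default_days=90))  # candidate for archival?
extend_validity(policy, today, extra_days=30)      # owner says: keep it for now
print(is_expired(policy, today, default_days=90))
```

A real rule engine would apply the extension to whole groups of records at once, which is exactly what keeps the verification cycle from turning into a one-by-one exercise.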

What's your Retention/Archival policy?

Posted by on 27 May 2010 | Comments (1) | categories: Business Software


  1. posted by Nigel C on Thursday 27 May 2010 AD:
    Where I work, there are many parties involved. We have a records management specialist on the business side. On the IT side we have the ILM team (from the infrastructure side). Then there are the data architects and the team lead for the technology that does the document management. Then there are all the business users (VPs, directors, etc.).

    Poor records management specialist.