Jan 22, 2013

An issue which many small to medium companies struggle with is Enterprise Vault data on disk growing over time.  There are two aspects to the issue.  First there is the overall size: over time even small companies generate sizeable quantities of data, and with limited budgets that becomes hard to manage.  But really it's not the volume of data which is the underlying problem, it is the quantity of files.  Enterprise Vault creates a great many small files, spread across many different folders, on its data partitions, and that high quantity of small files makes backing them up slow.

Backups taking a long time is the real issue.  A year or two after implementing Enterprise Vault, environments begin to struggle because the backups take longer and longer, and daily backups become almost impossible (unless budget is available for faster, bigger, better backup devices!)

In this article I’ll show you a small, worked-through example of what is happening, and offer a potential solution.  The solution comes with some caveats, which I’ll go into towards the end of the article.

Environment Configuration
In my sample environment I’m running Enterprise Vault 10.0.2 with a number of small archives, including some mailbox archives and some FSA archives.  The net result is that I have:

114,834 files
747 folders
2.48 GB of data

These are all stored locally on the EV server, on a single drive (spread across a number of different Enterprise Vault partitions).
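Figures like the file, folder and data counts above are easy to gather yourself. Here is a short, illustrative Python sketch that walks a partition root and tallies them (the path used in the usage comment is hypothetical, not from the article):

```python
import os

def partition_stats(root):
    """Walk a directory tree and tally files, folders and total bytes."""
    files = folders = size = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        files += len(filenames)
        for name in filenames:
            try:
                size += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # file removed mid-walk; skip it

    return files, folders, size

# Hypothetical usage against an EV partition root:
# files, folders, size = partition_stats(r"E:\EVStorage")
```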

I am going to use Windows Server Backup from Windows Server 2008 R2 to back up the folders to another drive.  I know that some third-party (i.e. non-Microsoft) products can perform better in certain environments, but I'm going with the 'free' option and using what is built in to the operating system.  The source server is running in VMware Workstation 8, and the two drives that Windows sees are on the same physical solid state disk.

Granted, the setup is not as large as I would like, nor is the underlying configuration as tuned as I would like, but you'll see that it demonstrates the fundamental principle.

Windows Server Backup Testing
I ran Windows Server Backup three different times, when there was little to no activity on the Enterprise Vault environment.

Run 1 = 35 minutes
Run 2 = 42 minutes
Run 3 = 41 minutes

Average across 3 runs = 39.3 minutes

Enabling Collections
The ‘solution’ I offer is Enterprise Vault collections.  Collections are enabled on each partition individually, and have a number of configuration options such as:

Start and End Times – for the collector process (part of StorageFileWatch) to run
Maximum Size of Collection Files – the default is 10 MB, and I have not changed that
Age at which files become eligible for collection – the default is 10 days, and I have not changed that
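To make the mechanics concrete, here is a toy model of what a collection pass does with those two defaults: sweep a partition, skip anything younger than the eligibility age, and bundle the rest into archives capped at the maximum collection size. This is only a sketch of the idea using ZIP files as a stand-in for EV's CAB files; the function names and logic are mine, not Enterprise Vault's implementation:

```python
import os
import time
import zipfile

MAX_COLLECTION_BYTES = 10 * 1024 * 1024   # default collection file size: 10 MB
MIN_AGE_SECONDS = 10 * 24 * 3600          # default eligibility age: 10 days

def collect(partition_root, now=None, max_bytes=MAX_COLLECTION_BYTES,
            min_age=MIN_AGE_SECONDS):
    """Bundle eligible files into numbered archives and remove the originals.

    Returns the number of collection files written.
    """
    now = time.time() if now is None else now
    batch, batch_size, collection_no = [], 0, 0
    for dirpath, _, filenames in os.walk(partition_root):
        for name in filenames:
            if name.endswith(".zip"):
                continue  # already a collection file
            path = os.path.join(dirpath, name)
            if now - os.path.getmtime(path) < min_age:
                continue  # too young to be eligible
            batch.append(path)
            batch_size += os.path.getsize(path)
            if batch_size >= max_bytes:
                _write_collection(partition_root, collection_no, batch)
                collection_no += 1
                batch, batch_size = [], 0
    if batch:
        _write_collection(partition_root, collection_no, batch)
        collection_no += 1
    return collection_no

def _write_collection(root, number, paths):
    """Write one collection archive, deleting each source file as it is added."""
    archive = os.path.join(root, f"collection{number:05d}.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        for p in paths:
            zf.write(p, arcname=os.path.relpath(p, root))
            os.remove(p)
```

The point of the model is simply that many small files become a handful of larger ones, which is exactly the effect on the backup shown below.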

Once collections were configured on each of my five partitions, I then issued a ‘Run Now’ for each of them.  ‘Run Now’ is another option on the properties of each partition.  There is no harm in running collections on multiple partitions, but as with many aspects of Enterprise Vault, if you deploy this in a real environment you may want to stagger the collection runs to balance the load.

Further Windows Server Backup Testing
Once collections were enabled, and the collection run finished, I then observed:

37,015 files
747 folders
2.48 GB of data

You’ll notice that it’s the same amount of data, and the same number of folders, but a vastly reduced quantity of files.

On a subsequent set of three backup runs I saw the following:

Run 1 = 23 minutes
Run 2 = 23 minutes
Run 3 = 22 minutes

Average across 3 runs = 22.7 minutes
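For anyone checking the arithmetic, the two averages and the overall saving work out as follows (a trivial calculation, included only to make the comparison explicit):

```python
before = [35, 42, 41]   # minutes per run, pre-collections
after = [23, 23, 22]    # minutes per run, post-collections

avg_before = sum(before) / len(before)   # 118 / 3 = 39.3 minutes
avg_after = sum(after) / len(after)      # 68 / 3 = 22.7 minutes
saving = (avg_before - avg_after) / avg_before * 100

print(f"before: {avg_before:.1f} min, after: {avg_after:.1f} min, "
      f"saving: {saving:.0f}%")
```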

Solved?  Not quite – the Caveats

This appears to have solved the problem!  My backups are now ‘super’ fast; around 42% faster for the SAME amount of data.  Unfortunately this all comes at a price.

First of all you have to fit the running of the collection processes into your busy server schedule.  You then have to take into account what Enterprise Vault needs to do in order to retrieve an item: it must locate the CAB file and extract the item from it before the item can be delivered to a client (or client process).  This extra hop takes a little bit of time, on every single retrieval.  It can become much more pronounced if you need to rebuild an index for an archive.
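The extra retrieval hop can be sketched as a two-step read, again using a ZIP archive as a stand-in for a CAB file (this is an illustration of the concept, not EV's actual retrieval code): instead of reading the item's file directly, the retrieval first has to open the collection and then extract the member from inside it.

```python
import zipfile

def retrieve(collection_path, item_name):
    """Retrieve one archived item from a collection.

    Two steps instead of one: open the collection file, then extract the
    member.  A per-retrieval overhead that an uncollected partition avoids.
    """
    with zipfile.ZipFile(collection_path) as zf:
        return zf.read(item_name)
```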

Deletions also have to be taken into consideration, if they’re allowed, as does Storage Expiry, if that is enabled.  Deletions lead to items being dereferenced from CAB files, which may in turn lead to what Enterprise Vault calls ‘Sparse Collections’.  These are collections which are taking up space (in terms of the CAB file) but no longer contain much live data.

Note: when items are deleted ‘from CAB files’, Enterprise Vault simply reduces a reference count in the database; the CAB file itself is not touched.

These ‘sparse CAB files’ can be restructured, and StorageFileWatch will do so as part of its scheduled running – but it’s another thing that adds a little more time and processing overhead.
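The reference-count behaviour described in the note above can be modelled in a few lines. This is a toy model of the bookkeeping (the data structure, function names and 20% threshold are my own illustrative choices, not EV internals): a delete only decrements a counter, and a collection becomes ‘sparse’ once too few of its items are still referenced.

```python
def delete_item(ref_counts, collection, item):
    """Deleting an item just decrements its reference count; the
    collection file on disk is not touched."""
    if ref_counts[collection][item] > 0:
        ref_counts[collection][item] -= 1

def sparse_collections(ref_counts, threshold=0.2):
    """Given {collection: {item: refcount}}, return the collections where
    the fraction of still-referenced items has fallen below the threshold.
    These are the candidates for a restructuring pass."""
    sparse = []
    for collection, items in ref_counts.items():
        live = sum(1 for count in items.values() if count > 0)
        if items and live / len(items) < threshold:
            sparse.append(collection)
    return sparse
```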

In the end, as with many aspects of Enterprise Vault, it is necessary to carefully weigh this option’s benefit (faster backups) against its downsides.  Each environment is unique, so it’s not something that can be recommended, or ruled out, across the board.  It is definitely worth considering, though… with one final thought: once enabled, there is no going back.  You can create new ‘non-collected’ partitions, but uncollecting already-collected partitions is not to be taken lightly, and will likely need involvement from Symantec Support.



  2 Responses to “Enterprise Vault Backups, Collections and you”

  1. EV Collections… DON’T DO THEM. The amount of disk I/O and the recurring nature of them can eventually bite you.

    In your post you mentioned the Collections have to be scheduled. Each vault store partition has to be scheduled to run collections at a different time, and while collections are running some otherwise routine processes are held automatically, such as archiving. In our case we have 50 vault store partitions on our larger servers, and with their collections strung out every 15 minutes, that’s basically half a day where you can’t do archiving.

    Without collections, the main I/O on your vault store partitions is the original archiving of a message (creating files), then reading those files to index them for a user’s archive, and thereafter only when retrieved by a user or by another feature such as discovery extracts or rebuilding or upgrading indexes. With collections, you have to add: 1) a pass to collect the files into CAB files, 2) every time an item is retrieved by a user, or discovery, or indexing it has to be extracted temporarily as *.ARCHDVS* files, 3) daily passes to check for new files to collect (even on closed partitions) and to delete the temporary ARCHDVS files because nothing else deletes them, 4) while without collections you could fill a vault store partition to 95% full and know that the daily change rate would be minimal, WITH collections you have to leave enough room for all the temporary files that will be extracted for retrievals by users, discovery, or indexing processes (20% free and hope that’s enough for every day). The extra I/O cycles are not worth collections. Fix the backup.

    My system (75 TB in size) ran into the same problems with backing up large numbers of small files; then, with collections enabled, we eventually ran into backup speed problems even with smaller numbers of larger files. The solution to the backup speed problems was to use a storage-level backup; in our case we switched to NetApp storage with SnapManager backups. Backups that were previously taking 36–72 hours for a good full backup are now taking less than two hours for the entire system to incrementally go through each volume on each server and take current snapshots.

    • Joe,

      Thanks for taking the time to add such an insightful and thoughtful comment, I really appreciate it. You have raised many good points, and they’re definitely the sort of things that people should take into account.

      One thing that I would say is that ARCHDVS files *should* get removed one day after their last access. If you’ve not seen that, then it might be worth (or have been worth) talking to Symantec Support about it. That being said, I was testing something in this area just the other day, and I noticed that they don’t appear to get removed! So you’re right: you’ll end up with close to twice the original amount of storage if nothing comes along to clean things up.

      Thanks (again)
