Common Files Technology
Introduction
Most accounts back up System State and/or full volumes. This wastes storage, because files such as OS files or commonly used applications are kept multiple times, once in the private storage of every such account. The common files technique is intended to eliminate, or at least dramatically reduce, storage waste of that kind.
Main principles
The main goal of the common files technique is achieved by organizing a centralized common files storage maintained by the service provider. Backup clients are aware of this storage and avoid duplicating common content during backup.
The structure of the common files storage is similar to that of private backup storage. It consists of the common files register and the common files content storage.
Common files storage
Common files storage is extended manually with selected files (installation packages) using a specially developed tool, the common files storage builder. Common files are identified by a hash built from the file content. Hashes are stored in the common files register.
Each common file is cut into relatively large (64 MB) blocks for processing convenience. The block description is stored in the common files register, and the block content is stored in the common files content storage. Block content is compressed and commonly encrypted.
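The identification and slicing described above can be illustrated with a minimal sketch. It is not the builder's actual implementation: the hash algorithm (SHA-256), the compression library (zlib) and the function names `file_hash` and `slice_blocks` are assumptions made for illustration, and encryption is left out.

```python
import hashlib
import zlib
from typing import Iterator, Tuple

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB blocks, as stated in the text


def file_hash(path: str, chunk: int = 1024 * 1024) -> str:
    """Content hash identifying a common file (SHA-256 is an assumption)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(chunk), b""):
            h.update(piece)
    return h.hexdigest()


def slice_blocks(path: str, block_size: int = BLOCK_SIZE) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (offset, raw_length, compressed_content) for each fixed-size block."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            raw = f.read(block_size)
            if not raw:
                break
            # zlib stands in for the real compressor; encryption is omitted here.
            yield offset, len(raw), zlib.compress(raw)
            offset += len(raw)
```

In this sketch the `(offset, raw_length)` pairs correspond to the block descriptions kept in the register, while the compressed bytes correspond to what would go into the content storage.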
Common files storage is platform-dependent: Win, Linux, FreeBSD or Darwin.
Common storage building
To extend the common files storage with new common files, use the cs.build command of ServerTool (available for download for Windows x64 and Windows x86). The algorithm for adding new common files is as follows:
- Input:
  - source-path: path to the files that will be backed up as common.
  - protocol: protocol of the server where the common files storage is located.
  - server: address of the server where the common files storage is located.
  - user: name of a user with write rights to the common_user folder, e.g. new_user. Create this user and grant it write rights to the common_user folder before starting to work with the common storage.
  - password: user password.
  - platform: target OS: Win, Linux, FreeBSD or Darwin.
  Example: ServerTool.exe cs.build -source-path c:\temp -protocol FTPS -server 127.0.0.1 -user new_user -password QWERTY123456QWERT -platform Win
- The local copy of the common files register is updated from the server.
- The folder is traversed recursively and every file is processed separately:
  - Files smaller than 64 KB are skipped to keep the common files register small.
  - For every file, both the full hash and the partial hash are calculated and checked against the file table. Duplicates are skipped.
  - The file is cut into blocks without file-type-aware slicing. The content of the resulting blocks is compressed, encrypted and stored in the common content storage; block information is registered in the block table. No deduplication is performed at block level.
  - The common file is registered in the file table.
- When all files have been processed, the common files register is atomically committed to the server.
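The per-file part of the algorithm above can be sketched as follows. This is an illustration under stated assumptions, not the ServerTool code: SHA-256 is an assumed hash, the partial hash is assumed to cover the first 64 KB of the file, the file table is modelled as an in-memory dictionary keyed by full hash, and the names `hashes` and `build` are hypothetical. Block upload and the final atomic commit are only marked by a comment.

```python
import hashlib
import os
from typing import Dict, List, Tuple

MIN_SIZE = 64 * 1024       # files below this size are skipped (from the text)
PARTIAL_BYTES = 64 * 1024  # assumed length of the prefix used for the partial hash


def hashes(path: str) -> Tuple[str, str]:
    """Return (partial_hash, full_hash); algorithm and prefix rule are assumptions."""
    full = hashlib.sha256()
    partial = hashlib.sha256()
    first = True
    with open(path, "rb") as f:
        for piece in iter(lambda: f.read(1024 * 1024), b""):
            if first:
                partial.update(piece[:PARTIAL_BYTES])
                first = False
            full.update(piece)
    return partial.hexdigest(), full.hexdigest()


def build(source_path: str, file_table: Dict[str, dict]) -> List[str]:
    """Register new common files in file_table and return the paths that were added."""
    added = []
    for root, _dirs, names in os.walk(source_path):
        for name in sorted(names):
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size < MIN_SIZE:
                continue  # too small to be registered
            partial, full = hashes(path)
            if full in file_table:
                continue  # duplicate content, already registered
            # Slicing, compression, encryption and upload of blocks would happen here.
            file_table[full] = {"partial": partial, "size": size}
            added.append(path)
    return added
```

Running `build` twice over the same folder with the same table adds nothing the second time, which mirrors the duplicate check in the list above.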
Backup client
The backup client keeps a local copy of the common files register and maintains it in an up-to-date state. The common files register is used during backup and restore. The common files technique is applicable only to the File System, System State and Network Shares plugins.
Backup
Before every backup session, the local copy of the common files register is updated from the server and its indices are rebuilt. This should not take long, because the common files register is expected to be relatively small.
Restore
Before a restore session, the local copy of the common files register is updated from the server, but only if there is no local copy or the local copy is corrupted. Only nodes that have the common flag set are restored using the common files storage.
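The two restore rules can be expressed as a small sketch. The `Node` type, its fields and both function names are hypothetical and only model the behaviour described above; the actual client data structures are not documented here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    path: str
    content_hash: str
    common: bool  # the "common" flag recorded at backup time


def register_needs_update(local_register: Optional[Dict[str, dict]], corrupted: bool) -> bool:
    """Before restore, refresh the register only if it is missing or corrupted."""
    return local_register is None or corrupted


def restore_source(node: Node) -> str:
    """Nodes flagged as common come from common storage, all others from private storage."""
    return "common" if node.common else "private"
```

Unlike backup, a healthy local register is reused as is in this model, so restore does not contact the server for the register at all.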
Retrospective common files detection
The goal of retrospective common files detection is to find common files that were previously backed up as non-common and back them up again as common files, so that server storage is used more efficiently. Two cases have to be considered:
- Potentially common files were backed up before common files technique was introduced.
- Potentially common files were backed up before common files storage was upgraded with such files.
The first case can be resolved by a single-shot Recheck command issued right after the common files technique is activated. To resolve the second case, a new background process is introduced. It traverses the last node versions and searches for common file candidates.
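A minimal sketch of the candidate search performed by such a background process is shown below. The `NodeVersion` record, its fields and the `find_candidates` function are hypothetical, and the register is reduced to a set of known common file hashes; the real process and its storage format may differ.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class NodeVersion(NamedTuple):
    path: str
    version: int
    content_hash: str
    common: bool  # True if this version was already backed up as common


def find_candidates(versions: Iterable[NodeVersion], register: Set[str]) -> List[NodeVersion]:
    """Return last node versions that are not common yet but match the register."""
    latest: Dict[str, NodeVersion] = {}
    for v in versions:
        cur = latest.get(v.path)
        if cur is None or v.version > cur.version:
            latest[v.path] = v  # keep only the newest version of each node
    return [v for v in latest.values() if not v.common and v.content_hash in register]
```

Only the newest version of each node is inspected here, matching the statement that the process traverses last node versions; older versions are ignored even if their content matches the register.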