Overview
Critical data must be protected and made recoverable in the event it is no longer accessible. Each school and department has a fundamental need to protect its information and intellectual property. Considerable resources are required to do so, and the backup demand should be aggregated and satisfied by a central service to achieve economies of scale and consistent policies.
Stanford’s diverse community and its data require a body of services to protect them. The backup service portfolio seeks to offer options that best fit the information that needs to be protected. The options range from file servers that can simply hold copies of files, to a versioning system that performs backups on an automatic schedule and stores copies in a remote location. Network file systems are discussed in a separate strategy document, Networked Storage.
IT Services has historically offered a single backup service, but has added options over the past few years to meet new demands. IT Services will continue to expand backup and archive services by adding backup and archive tiers, improving utilization, reducing dependency on low-density tape by leveraging disk, and reducing and consolidating the existing environment to lower maintenance and operations costs.
Current State
- IBM's Tivoli Storage Manager (TSM) is the primary system used by IT Services to back up its supported servers (it is often used as a data recovery tool of last resort because of the time required to restore from tape). TSM transfers data by having the operating system client periodically transmit new data. TSM's relevance to the University has waned in the past few years as more transparent, feature-rich, and lower-cost alternatives have been developed.
- A more popular method that is beginning to replace TSM is disk-to-disk replication. The production copy needs to be stored on IT Services Tier 2, LCCS (Low Cost Central Storage), or ULCCS (Ultra Low Cost Central Storage) using NetApp storage systems, and data is replicated from one primary system to two secondary systems one to six times a day. Geodiversity of storage is provided through the auxiliary data center in Livermore.
- Data protection for desktops and some servers is delivered by Iron Mountain Connected, MozyPro and CrashPlan. For small data sets (or ones with low daily change), external service providers can offer inexpensive alternatives to an internal system when the bandwidth and latency between the client and the service are not an issue.
- Individual schools, departments, and even IT Services' own services have moved away from TSM for cost and transparency reasons. Lowering the rate and improving reporting should help address this issue.
- There is an ongoing effort to reduce the use of low-density tapes for off-site storage and eliminate on-site tapes in favor of disk-based storage.
- Data transported offsite (typically tape sent to an Iron Mountain tape storage facility) can optionally be encrypted to protect against physical loss or theft of the media.
- Red Hat (Linux) is being tested as a replacement for IBM's AIX.
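As a rough illustration of the disk-to-disk replication schedule described above (one to six replications a day), the sketch below estimates the longest period during which changed data exists only on the primary system. It assumes replications are evenly spaced across the day, which the schedule above does not state, and the function name is hypothetical.

```python
from datetime import timedelta


def worst_case_exposure(replications_per_day):
    """Longest gap between replications, assuming they are evenly spaced.

    Data changed just after one replication is unprotected until the next
    one runs, so this gap is the worst-case window of potential data loss.
    """
    if replications_per_day < 1:
        raise ValueError("at least one replication per day is required")
    return timedelta(days=1) / replications_per_day


if __name__ == "__main__":
    for count in (1, 2, 6):
        print(count, "per day ->", worst_case_exposure(count))
```

Under that assumption, the schedule implies an exposure window of between 4 and 24 hours, depending on how often a given data set is replicated.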
Vision
Because the university's information is diverse and the systems that facilitate its use are equally varied, a data archive and backup strategy must support this diversity while maintaining several core characteristics.
- Availability/Redundancy: Restored data must be usable regardless of location or failure. To guard against individual component or system failures, backup data should be stored in at least two locations: on-site and off-site. Backup data should be stored independently from primary data.
- Performance: Not all information is equal, and data requiring the shortest recovery time must be rapidly restorable.
- Security: Authenticated access to backup resources must be provided. A backup service must have flexible authentication mechanisms to meet the needs of private data and shared data confidential to a particular group. Data storage media, whether tape or disk, must be encrypted.
- Scalability: A central backup service must handle hundreds of simultaneous client connections and reliably manage backups for thousands of clients.
- Simplicity: The backup service should be simple and well defined, and should provide a transparent interface to the end user. It should integrate with all operating systems supported by IT Services.
A data backup and archive service is only as relevant as the policies that govern it. Partnership with data owners in these areas is necessary:
- Definition: Desired data sets to be backed up should be clearly defined and agreed upon. Unnecessary data should be excluded, as should data that will be replaced using other means.
- Validation: Formally tested and documented policies should exist to validate integrity, consistency and usability of data that has been backed up.
- Retention policy: The service should allow multiple versions of the data to be retained as required by the data owner. In addition, it should be able to retain a particular version for a desired interval of time, which may be dictated by law.
- Recoverability: A well-documented disaster recovery plan should exist for rebuilding the backup service in case of a catastrophic disaster.
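The retention policy described above combines two rules: keep multiple versions, and keep a version for a required interval. A minimal sketch of how such a rule could be evaluated follows. The record shape, the function name, and the parameters `keep_latest` and `hold_for` are illustrative assumptions, not settings of any particular backup product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Version:
    """One backed-up version of a file (hypothetical record shape)."""
    version_id: str
    created: datetime


def versions_to_keep(versions, keep_latest, hold_for, now):
    """Return the versions a retention policy would retain, newest first.

    keep_latest: number of most recent versions kept regardless of age.
    hold_for:    timedelta; any version no older than this is also kept.
    """
    newest_first = sorted(versions, key=lambda v: v.created, reverse=True)
    recent_ids = {v.version_id for v in newest_first[:keep_latest]}
    held_ids = {v.version_id for v in newest_first
                if now - v.created <= hold_for}
    keep_ids = recent_ids | held_ids
    return [v for v in newest_first if v.version_id in keep_ids]


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    history = [
        Version("a", datetime(2024, 1, 30)),
        Version("b", datetime(2024, 1, 20)),
        Version("c", datetime(2023, 12, 1)),
    ]
    kept = versions_to_keep(history, keep_latest=1,
                            hold_for=timedelta(days=30), now=now)
    print([v.version_id for v in kept])
```

Any version outside both rules becomes a candidate for expiry. A legally mandated hold would be expressed by setting `hold_for` to the required interval.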
Technology trends that IT Services is tracking in developing its strategy in this space:
- Utilizing and leveraging disk rather than tape as a media target through virtual tape libraries, disk-to-disk replication, and various combinations that leverage faster and more flexible media.
- Data de-duplication is an active area of industry development that increases storage efficiency and counters the proliferation of duplicate data generated by poor data management practices and virtual server sprawl.
- Continuous data protection is an important enhancement which enables the protection of data in real-time as it changes. A time lapse between data modification and backup can result in data loss.
- Self-service file restoration using automated snapshots significantly enhances the user experience and reduces the workload on storage and support staff.
- Ability to store data on low-cost platforms (tape or nearline systems) using native interfaces (Windows Explorer or Mac OS X Finder).
- Desktop backup to larger external services operated by third-party providers.
- Cloud storage can be integrated into on-site backup services or used as a backup application in the cloud.
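To illustrate the de-duplication trend listed above, the following sketch stores each unique block of data once and records files as lists of block digests. It is a simplified model under stated assumptions: fixed-size chunks and SHA-256 digests are choices made here for clarity, the class name `DedupStore` is hypothetical, and commercial products may chunk and index data differently.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data):
        """Split data into fixed-size chunks and store only new chunks."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its recorded chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        """Bytes actually held after de-duplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.put("report_v1", b"AAAABBBB")
    store.put("report_v2", b"AAAACCCC")  # shares its first chunk with v1
    print(store.stored_bytes(), "bytes stored for 16 bytes backed up")
```

Two copies of a largely identical file or virtual machine image consume little more space than one, which is the efficiency gain the trend refers to.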
Goals
- Pursue a centrally funded backup offering combined with file storage.
- Investigate an Apple Time Machine backup target to harness the existing capability of this popular platform.
- Integrate storage systems and backups so that entire storage systems are backed up.
- Increase usage of storage efficiency technologies, such that a higher percentage of each asset directly supports data recovery.
- Replicate data into redundant data centers (ECHs, Livermore, and potentially North Campus) and de-duplicate data in each data center but not between them.
- Adjust the backup service so that researchers find it relevant.
- Define and build an archive service that is attractive to researchers.
- Adjust the backup service so that data is protected soon after it is changed.
- Evaluate backup service rates and formulate a pricing strategy: bundled services, one-time fees, transfer fees, etc.
- Deploy a reporting interface for users to view usage, billing, and settings.
- Choose and implement solutions that are centrally managed and user-friendly at an acceptable level of support for a reasonable cost.
Roadmap
- Continue work in progress to reduce hardware infrastructure and increase use of storage efficiency technologies.
- Partner with Business Services to identify client pain points and develop new rate models.
- Evaluate other backup and replication tools (alternatives to TSM) and choose a continuous data protection tool.
- Publish recommended backup configurations for client operating systems and use cases for each data backup and archive tool.
- Deploy a reporting tool based on native TSM operational reporting and/or custom scripts.
Measures of success
- Client feedback on rate changes and reporting tool.
- Results of the BCDR (Business Continuity and Disaster Recovery) test.
- Trend data: number of client systems backed up and terabytes managed.

