Overview
Document management refers to systems that allow users to collaborate on documents, keep track of revisions, and access them as needed. Document management systems are often referred to as content management systems.
IT Services currently operates multiple services used for various levels of document management. Some of these systems are becoming increasingly difficult to maintain, are limited in capability, and/or are expensive to license. They do not easily allow content to be moved from one system to another or to be shared across systems. Collaboration among users is generally limited within each system and is further constrained by the lack of integration between systems. The lack of features and capabilities has led clients to purchase and install local solutions. A single common system could reduce system and support expense and improve options for broader collaboration. As Stanford looks to provide additional tools for the Work Anywhere initiative and more secure methods of storing Prohibited, Restricted, and Confidential data, document management systems could help by allowing departments, organizations, and individuals to automate processes for using and sharing their documents while further protecting the data they contain. Document management capabilities might also help address some audit/risk management concerns.
Current State
IT Services has no clear definition of what is meant by document management. As a result, in addition to "official" document management products, clients use data storage features, such as AFS (Andrew File System) and CIFS (Common Internet File System), as well as collaboration tools such as WordPress, MovableType, MediaWiki, and Livespace, to store and manage documents at a basic level. There is a lack of clarity about when it is best to use storage and when to use document management, and there is little integration between the tools.
Currently, IT Services has two document management offerings in place: DocuShare and SharePoint.
- DocuShare version 5.0.02 runs on Linux with an Oracle database and an Apache Tomcat web server. As of April 2010, DocuShare has over 2,100 user accounts, spanning several organizations and departments, including IT Services, Administrative Systems, and academic and research departments. IT Services provisions the DocuShare user accounts and sets up DocuShare groups and top-level collections, but designated DocuShare owners and groups maintain the hierarchy of collections beneath these top-level structures, as well as collection and document access permissions. DocuShare provides versioning and workflow features such as approval routing and notifications. DocuShare uses Kerberos and WebAuth for authentication. DocuShare currently does not integrate with Workgroup Manager.
- SharePoint version 2007 runs on a Windows 2003 front end with a SQL 2005 back end. It is secured with SSL (Secure Sockets Layer) encryption and Kerberos authentication. SharePoint is completely integrated with Active Directory, and anyone with a SUNet ID may access it. Once the initial site is provisioned, management is decentralized: local site administrators have complete control over which users and groups have access to their data and can customize the look and feel of their sites. SharePoint has a built-in search feature that indexes a variety of documents, including Word, PDF, and any other text data stored within the application. SharePoint has customizable workflows for document creation, approval, and notification of changes.
Vision
A document management service offered by IT Services should expand on enterprise document storage options to provide document management capabilities at a level most commonly needed at Stanford. The service should be easy to use and maintain, should be web-accessible, and should be platform- and browser-agnostic. Document management is not intended to meet the more specific needs of other data types, such as pictures, graphics, and video.
Minimally, the service should facilitate:
- Information creation: Content can be created easily within the system.
- Storage: In addition to content created within the system, the system allows for import, storage, and maintenance of commonly used formats (e.g., Microsoft Office documents).
- Access: Authorized users can easily set individual or group level access permissions for content; users can "check out" documents and lock them from modification by other users while they are in use.
- Search and retrieval: The system automatically indexes all content, regardless of origin, based on content and metadata tags; users can search by string, but see only those documents to which they have access.
- Versioning: The application tracks the revision history of documents, so that users can retrieve and work from previous versions of a document.
- Workflow: Users should be able to create a rules-based workflow that can route documents to specified groups or users for approval or review.
- Security: The system may need to accommodate documents with Prohibited and Restricted data.
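To make the access, search, and versioning items above more concrete, the following is a minimal in-memory sketch in Python. It is illustrative only and does not describe DocuShare, SharePoint, or any other product. The names `Document`, `DocumentStore`, `check_out`, `check_in`, `get_version`, and `search` are hypothetical, and the sketch assumes a single access list per document that governs both reading and editing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    """A document with a revision history, an access list, and a check-out lock."""
    name: str
    readers: set                                   # users allowed to access the document
    versions: list = field(default_factory=list)   # revision history, oldest first
    tags: set = field(default_factory=set)         # metadata tags used by search
    locked_by: Optional[str] = None                # user currently holding the lock


class DocumentStore:
    """Hypothetical store; a real system would persist data and index content."""

    def __init__(self):
        self.docs = {}

    def create(self, name, author, text, readers=(), tags=()):
        doc = Document(name=name, readers={author, *readers}, tags=set(tags))
        doc.versions.append(text)
        self.docs[name] = doc
        return doc

    def check_out(self, name, user):
        """Lock the document for one user and return its latest version."""
        doc = self._authorized(name, user)
        if doc.locked_by not in (None, user):
            raise PermissionError(f"{name} is checked out by {doc.locked_by}")
        doc.locked_by = user
        return doc.versions[-1]

    def check_in(self, name, user, text):
        """Store a new version, release the lock, and return the version number."""
        doc = self._authorized(name, user)
        if doc.locked_by != user:
            raise PermissionError(f"{user} has not checked out {name}")
        doc.versions.append(text)
        doc.locked_by = None
        return len(doc.versions)

    def get_version(self, name, user, number):
        """Return a previous version; version numbers start at 1."""
        doc = self._authorized(name, user)
        if not 1 <= number <= len(doc.versions):
            raise IndexError(f"{name} has no version {number}")
        return doc.versions[number - 1]

    def search(self, user, term):
        """Match content or tags, returning only documents the user may access."""
        term = term.lower()
        return sorted(
            doc.name
            for doc in self.docs.values()
            if user in doc.readers
            and (term in doc.versions[-1].lower()
                 or term in {tag.lower() for tag in doc.tags})
        )

    def _authorized(self, name, user):
        doc = self.docs[name]
        if user not in doc.readers:
            raise PermissionError(f"{user} may not access {name}")
        return doc
```

In this sketch the permission check runs inside `search` itself, so a user never sees the names of documents outside their access list, and a second user who calls `check_out` on a locked document receives an error until the first user checks the document back in.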
A document management system is expected to work with static documents such as digital images, which cannot be edited or made machine-readable without further processing such as optical character recognition, as well as editable documents such as word processing documents and spreadsheets. While modifiable, these editable documents: 1) are tied explicitly to one application (e.g., Word); 2) contain little or no information about themselves; and 3) are typically flat files, or information blobs, which prevents access to specific information elements within them.
The trend is away from the management of single, static documents. Complex, compound documents are becoming the rule. They are not tied to one application or platform, they are dynamic and constantly changing, and they are "intelligent," carrying information about their content and structure. In this way, documents reflect the trend toward object-oriented architectures, where information is contained in objects (units of information of a finer granularity than traditional documents), which also contain information about themselves and the applications they were created with. Documents in this object-oriented conceptualization are considered to be containers of a wide variety of information rather than single flat files or blobs: they are a collection of pointers to external elements that are dynamically assembled as they are retrieved. They contain pointers to information objects, the content, which can be assembled temporarily for a specific purpose and later reused in other documents. They also contain information about document behaviors, such as who is allowed to see the document and who must approve it and in what sequence; document metadata, such as the authors, revision history, and status; and links to other external elements such as datasets, graphics, images, and fonts.
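The compound-document model described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the design of any particular product: `ContentObject`, `CompoundDocument`, and `assemble` are hypothetical names, the repository is a plain dictionary keyed by object id, and assembly simply joins text bodies in order.

```python
from dataclasses import dataclass, field


@dataclass
class ContentObject:
    """A reusable unit of content, finer-grained than a whole document."""
    object_id: str
    kind: str   # for example "text", "dataset", or "image"
    body: str


@dataclass
class CompoundDocument:
    """Holds pointers, metadata, and behaviors rather than the content itself."""
    title: str
    parts: list                                     # ordered object ids (the pointers)
    authors: list = field(default_factory=list)     # metadata
    status: str = "draft"                           # metadata
    viewers: set = field(default_factory=set)       # behavior: who may see it
    approvers: list = field(default_factory=list)   # behavior: approval sequence


def assemble(doc, repository, user):
    """Resolve the document's pointers at retrieval time for one reader."""
    if user not in doc.viewers:
        raise PermissionError(f"{user} may not view {doc.title}")
    return "\n".join(repository[object_id].body for object_id in doc.parts)


# Two documents point at the same "intro" object, so one edit reaches both.
repository = {
    "intro": ContentObject("intro", "text", "Welcome."),
    "table": ContentObject("table", "dataset", "a,b\n1,2"),
}
report = CompoundDocument("Report", ["intro", "table"], viewers={"alice"})
summary = CompoundDocument("Summary", ["intro"], viewers={"alice", "bob"})
```

Because each document stores only object ids, changing the body of the shared `intro` object changes what both `report` and `summary` return the next time they are assembled, which is the reuse property the paragraph above describes.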
Roadmap
Determine whether IT Services should provide a central document management service:
- Assess whether clients want/need more than they have today; do they want a central offering?
- Assess the requirements of different clients who will be using a central document management application. Different departments will have different requirements and needs, and within those departments, there will be different roles for faculty, staff, and students.
- Evaluate whether or not it makes sense to provide multiple offerings to meet various needs at Stanford.
- Evaluate costs and benefits for pursuing a single, central solution: Would the university save money and promote efficiency? Would IT Services save money by reducing options to a single platform?
Based on the above information, investigate which document management solutions, if any, would be embraced by the Stanford community and would realize the IT Services document management service strategy:
- Work with clients, Administrative Services, and peer institutions to discover possible alternative solutions and compare them to existing IT Services solutions.
- Track which of the existing offerings and services members of the Stanford community are currently using or are aware of. Research what offerings are available at other academic institutions, and track their users' expectations and experiences to help inform Stanford's evaluation of potential central document management tools and services.
- Pilot one or two solutions to evaluate cost and the extent to which their features meet the guiding principles of document management:
  - Scalable: The application must scale well to a growing number of users and to an increasing volume of content. It must also be possible to export content in a format that makes it easy (or at least possible) to import that content into a different document management system.
  - Integrated: The application should integrate with Stanford infrastructure (SUNet ID authentication and workgroups via Workgroup Manager). Integration could include open standards such as WebDAV and LDAP (Lightweight Directory Access Protocol), and the application should have a rich and well-documented API (Application Programming Interface) that allows programmers to write scripts to integrate it with other applications.
  - Community-driven: While the document management system is supported by IT Services, the content and permissions must be maintained by the Stanford user community in order to meet individual and group needs and to ease the overhead burden on IT.
  - Secure: The application must meet the security guidelines outlined by the Information Security Office (ISO). This can be accomplished through the use of firewalls, WebAuth, Kerberos authentication, and access permissions set at the application level.
- In evaluating potential offerings, identify and support "champions" who can participate in an evaluation of multiple solutions with a focus on overall Stanford needs. Some of the top contenders for an evaluation include Documentum, SharePoint, Xythos, and DocuShare. The products should integrate with other tools (e.g., collaboration tools, email) and replace at least one existing service.

