Overview
To ensure that servers and applications are running properly, system administrators make use of monitoring tools. However, because of the diverse computing environment at Stanford, there is a wide range of such tools. The goal is to develop a centralized tool that ties together the information coming from disparate servers and applications, at both the external and internal level, and standardizes the information for ease of comprehension.
Current State
Computing Services today manages Microsoft Windows and Linux servers. Beyond the complexity of managing more than one operating system, the types of counters used to get an overall view of a server differ between them. The event logs and performance counters can also differ greatly between applications such as MySQL, Oracle, and Microsoft SQL Server.
External monitoring of servers can be done with a single application, because basic tests like ping and SNMP do not need to know anything specific about the operating system or application. A more in-depth look at how a server or application is running requires monitoring from an internal point of view, which in turn requires user authentication and operating system-level knowledge of services and of where logs are kept.
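As an illustration of an external check that needs no operating system-specific knowledge, the following is a minimal sketch in Python. It assumes a TCP connection test stands in for ping (ICMP usually requires elevated privileges), and the function name `is_reachable` is hypothetical rather than part of any existing tool.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is an external test only: it says whether something is listening,
    not whether the service behind the port is healthy.
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

The same call works against a Windows or a Linux server, which is why external tests of this kind can be consolidated into a single application. Anything deeper, such as reading event logs or performance counters, falls under internal monitoring.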
Monitoring tools vary from group to group, and include applications such as Nagios, Howis, Mom, Whats up, Smarts (used by IT Operations Center and Networking), Cisco, and various homegrown tools. In some cases, particularly on older systems, monitoring is incomplete or outdated.
Vision
The goal is to have a unified, central monitoring tool that provides:
- Overall visibility into system, network and application availability and health.
- Robust discovery and monitoring capabilities, coupled with linkages to a broad array of external monitors, including Nagios, Patrol, Howis, SiteScope, and more.
- End-to-end testing for complex systems to determine whether a problem lies at a particular tier, such as a database failure in a three-tier application.
- Reporting tools that can push data, or have data pulled, into an enterprise resource planning (ERP) system for central reporting of availability and performance for clients and IT Services staff.
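The end-to-end testing item above can be sketched as follows. This is a minimal illustration, not a description of any particular product: the function name `first_failing_tier` and the tier names are hypothetical, and each probe is assumed to be a callable that returns True when its tier is healthy.

```python
from typing import Callable, Optional, Sequence, Tuple

# A check pairs a tier name with a probe that returns True when healthy.
Check = Tuple[str, Callable[[], bool]]


def first_failing_tier(checks: Sequence[Check]) -> Optional[str]:
    """Return the name of the first unhealthy tier, or None if all pass.

    Checks are assumed to be ordered from the deepest dependency outward
    (for example database, then application, then web), so the first
    failure found is the most likely root cause.
    """
    for name, probe in checks:
        try:
            healthy = probe()
        except Exception:
            # A probe that raises is treated as a failed tier.
            healthy = False
        if not healthy:
            return name
    return None


# Hypothetical three-tier application in which the database is down.
checks = [
    ("database", lambda: False),
    ("application", lambda: False),
    ("web", lambda: True),
]
failing = first_failing_tier(checks)  # "database"
```

Ordering the probes from the deepest tier outward is a design choice: a database failure typically makes the tiers above it fail as well, so reporting the deepest failure avoids blaming a tier that is only a victim.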
Roadmap
- Upgrade Operations Manager to System Center Operations Manager 2007 R2.
Measures of success
- Able to present or push data to a central location for reporting and review.
- Offer end-to-end application testing of web and database applications.
- Offer role-based management of, and access to, the monitoring console, with service mapping obtained from the CMDB (Configuration Management Database).
- Integrate client needs.
- Expand current capabilities.

