Linux System Administration Service Description

Overview

University IT Technology Consulting provides technical support for servers running Linux operating systems.

Standard OS, Basic OS, and Non-Standard support are offered. Standard support includes 24x7 monitoring with paging and response to issues, and can include changes confined to published maintenance windows. For Standard support, non-urgent issues or requests will be handled during business hours. Basic support includes 8x5 paging support and response to system issues. Non-Standard support covers hardware or operating system configurations that do not fall within the Unix group support list. Non-Standard support includes all services available as part of the Standard OS support service.

Included as part of the standard service offering is a set of fundamental services, detailed below. Clients may request that additional custom service modules be applied. Those custom service modules may incur additional charges.

In addition to fundamental services and modular (custom) services, University IT systems administrators may participate in project-related services. Project-based service activities are classified and managed separately from fundamental and modular (i.e., "on-going") service activities. To request project-based services, please contact your University IT Business Partner.

University IT staff will not provide application programming support, although certain staff may be able to answer questions pertaining to languages and tools.

The following OS types are supported:

Red Hat Enterprise Linux 5 and 6 (most current minor release, 32 and 64 bit)
Debian Squeeze/Wheezy (32 and 64 bit)
Ubuntu Long Term Support edition (12.04) (32 and 64 bit)

Other operating systems, including any custom Linux kernels, may be subject to the non-standard OS rate.

University IT supports Dell and HP hardware, as well as some other manufacturers. Please confer with Technology Consulting and Business Partners prior to making purchases. Additional storage or other hardware devices may qualify for a non-Standard Hardware rate.

Description of Services

Planning

When new hardware is to be deployed, University IT should be engaged as early as possible in the project planning cycle to provide input to hardware specification and capacity requirements. Recommendations provided by the systems administration team will include hardware components, operating system levels, and recommendations specific to conformance to the Stanford Computing environment (e.g., authentication, file sharing, etc.). Integration and support of third party products that have not met the product acceptance criteria are not part of the core systems administrative services, but may be considered for additional charges.

Security planning, architectural analysis, application dependency analysis, changes to data classification, and project management that requires facilitation between multiple groups external to University IT, are not part of core systems administrative services. If additional planning services are required, the client should work with their designated Business Partner to obtain a proposal with estimated costs and timelines, as these would be considered project-based services.

In addition to responding to alerts from system monitoring tools, the systems administration staff will perform a review every four months of all supported servers, and make recommendations as needed. These recommendations may include configuration or software updates, or capacity recommendations. It is expected that the client will take action on any recommendations to avoid any potential system, application or user issues.

Technology Consulting will work with clients to plan a periodic (generally 3-year) hardware replacement cycle. Clients are responsible for purchasing hardware and for paying any facilities fees that arise during the transition between pieces of hardware. Technology Consulting will collaborate with the client on hardware purchasing decisions and, after the hardware has been racked and cabled, will build the new systems and bring them to a state where client applications can be installed. In cases where the client requires the systems to be run in a parallel production or near-production mode for longer than one month, both the old and the new system may be charged for UNIX systems support.

Installation and Deployment

University IT is responsible for:

Verifying all rack, power, and cabling requests have been completed in a satisfactory manner
Installing and configuring operating system software and associated patches or updates
Installing and configuring components required to work with Stanford’s central computing infrastructure (Kerberos, AFS, and associated PAM modules, WebAuth, remctl, wallet client, etc.)
Installing and configuring application software per special arrangement with client
Performing standard installation verification activities
Executing application installation commands that require special privileges
Requesting and ensuring proper backup and system firewall templates have been applied
Coordinating system-related activities between University IT groups as appropriate to ensure the successful installation of a device
Installing systems and configuration management tools
Setting up basic monitoring probes (disk, load, swap, ping), as well as any application-specific monitoring probes per agreement with the client.
Configuring backups as required by the client

Ongoing Support

University IT will:

Respond to monitoring alerts and client-reported problems. During non-business hours, support will be provided when either the hardware, operating system, or infrastructure software is unavailable or the ability to use these resources is severely degraded. Please note, should after-hours support be required due to a change that was performed by the application owner without proper planning or notification, emergency support fees may apply.
Troubleshoot and resolve system-related problems
Monitor vendor resources for any required operating system patches or upgrades
Monitor vendor resources for any required hardware upgrades
Monitor for file system intrusion (intrusion detection)
Manage hardware warranties
Coordinate hardware upgrades with hardware support vendors as needed
Monitor security advisories for operating system and infrastructure software, and take appropriate actions to safeguard resources
Implement security patches as needed
Manage system accounts
Review root e-mail
Restore files that have been backed up, per client request
Document and submit change management requests for proper approval as required. Change Management is required for any change that may impact end-users.
Install security patches, upgrade software packages, and update system configuration to meet UNIX group best practices (patching every four months, or more frequently as needed).
Upgrade firmware as required
Maintain operating system and supported software documentation
Perform system-level housekeeping activities to ensure systems are operating at optimal levels
Manage backups
Request firewall configuration

Support requests that fall outside of normal operations will be reviewed on a case-by-case basis and may incur additional costs. Only vendor-supported versions of operating system or application software will be maintained. Clients should work in conjunction with the systems administration team to plan for upgrades when vendors announce that supported versions of software are being removed from support services.

It is expected that all application and system changes that may impact services will adhere to standard Change Management policies as specified in the Service Agreement. Clients should submit any non-impacting or non-urgent system requests via HelpSU. Exceptions to address emergency situations should be handled via notification of the IT Operations Center. HelpSU and notification procedures are specified in the SLA.

Non-urgent and low-severity HelpSU tickets will be assigned within 8 business hours. The client can expect an acknowledgement via email that states one of the following:

the request has been assigned and additional information is required in order to complete the request
the request has been assigned and a target timeframe as to when the request might be completed
the request has been assigned and completed

Completed requests will be indicated via a ticket status of “resolved”. Routine support requests are typically resolved within 24 hours of assignment. Clients will receive email notification when a request has been resolved.

All systems managed by Technology Consulting are built and patched from automated build systems maintained by Technology Consulting. Existing systems that enter Unix systems coverage will be rebuilt from the Operating System level to ensure consistency and meet Technology Consulting best practices. Technology Consulting recommends that systems be rebuilt to migrate between major versions of any supported Operating System. Upgrades in-place are possible, but require an extended outage window (over 24 hours) to deal with issues that may arise as part of the operating system upgrade.

Linux systems managed by Technology Consulting are controlled by a centralized server automation infrastructure called Puppet. Systems administrators create a complete model of the system in Puppet that includes package lists, configuration files, and user management. The state of a running system is compared to this model every half hour by the Puppet daemon. Any configuration that does not match the model is reverted automatically. This means that changes to any file managed by Puppet must be coordinated through Technology Consulting.
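The compare-and-revert cycle described above can be illustrated with a minimal Python sketch. This is not how Puppet itself is implemented; it only models the idea under simplifying assumptions. The dictionaries, the file paths, and the function names find_drift and converge are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# A "model" maps managed file paths to the content they should have.
# The "running" state maps every file on the host to its current content.
FileState = Dict[str, str]


def find_drift(model: FileState, running: FileState) -> List[Tuple[str, str, Optional[str]]]:
    """Return (path, expected, actual) for each managed file that differs from the model.

    A managed file that is missing from the running state is reported with actual=None.
    """
    drift = []
    for path, expected in sorted(model.items()):
        actual = running.get(path)
        if actual != expected:
            drift.append((path, expected, actual))
    return drift


def converge(model: FileState, running: FileState) -> FileState:
    """Return a new state in which every managed file matches the model.

    Files that are not part of the model are left untouched.
    """
    result = dict(running)
    result.update(model)
    return result


# Hypothetical example: a locally edited managed file is reverted,
# while an unmanaged application file keeps its content.
model = {"/etc/ssh/sshd_config": "PermitRootLogin no\n"}
running = {
    "/etc/ssh/sshd_config": "PermitRootLogin yes\n",
    "/srv/app/app.conf": "debug=1\n",
}
for path, expected, actual in find_drift(model, running):
    print(f"drift in {path}: expected {expected!r}, found {actual!r}")
running = converge(model, running)
```

The practical consequence for clients is the same as in the sketch: a local edit to a managed file survives only until the next run, so lasting changes have to be made in the model.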

ssh and remctl will be set up on all systems. All remote connections will use Kerberos for authentication. Root access via ssh is disabled for security reasons. Where superuser access by a client is required, sudo with password is the preferred method, with all commands issued as root (or via sudo) being logged on the local system. Where group accounts are required, the only supported mechanism for authentication is using a Kerberos .k5login file with user accounts listed individually for revocation and audit purposes. Any use of ssh host keys or other public key authentication must be included as a variance to the Service Description. Host-based access or shared passwords are not recommended and must also be listed as a variance.

Technology Consulting will install any package that is part of the Operating System distribution. Alternate versions, if they can be easily incorporated into the package management infrastructure, may also be installed, but a variance is required. Installation of any package that isn’t part of the Operating System distribution is a variance, and additional charges may be required for maintenance of that package. Linux systems are set up in accordance with the Filesystem Hierarchy Standard (FHS).

Technology Consulting will install and configure application software per special arrangement with client.

AFS is installed on all systems managed by Technology Consulting. Home directories for all users are stored in AFS by default.

The University IT systems group manages various types of logs via log rotation. System-level log files are collected and filtered to notify Technology Consulting. Application-level log files are rotated locally for applications installed by Technology Consulting, and client logs may be rotated or filtered by arrangement. Time and materials fees may apply for any custom development.

Technology Consulting will manage all accounts present on the system. However, any changes to accounts from the client’s organization must be coordinated by the client with the UNIX group.

Technology Consulting uses the University IT Change Management Process for all changes that may affect the service provided to the client or the University community. Tools and configuration used internally by UNIX Systems staff for administrative or maintenance actions on a system may be installed and updated outside of the change management process provided that those tools are only used interactively by UNIX Systems staff and not part of any automated or continuous management or monitoring of the system, or are unrelated to any services running on the system and not intended for use by clients, only by UNIX Systems staff.

Examples include:

remctl backend for forcing a Puppet run
Configuration for remctl ACL files to install for that backend
root bash shell configuration

but not:

remctl backend for Nagios monitoring
Script used to generate new iptables rules after rule updates
Apache management scripts run by the client

Periodically, Technology Consulting will update software on the system, usually as part of a new minor release of the operating system. These changes will be coordinated with clients via the change management process.

Periodically, Technology Consulting will release new systems configuration via Puppet. In some cases, these changes need to be applied to all systems managed by the UNIX systems group simultaneously.

For Standard and Non-Standard clients, work on the system that is controlled under the change management process can be confined to maintenance windows by arrangement with the client. Technology Consulting maintenance windows are Tuesday 5-8 PM, Thursday mornings 5-7 AM, and Weekend mornings 5-8 AM. Any work performed during this time will be coordinated via the change management process.
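As a planning aid, the windows listed above can be encoded in a small Python sketch that checks whether a proposed time falls inside one of them. The sketch assumes local time, treats the start of each window as inclusive and the end as exclusive, and takes "Weekend mornings" to mean both Saturday and Sunday; none of those details are stated in this document. The name in_maintenance_window is hypothetical.

```python
from datetime import datetime, time

# Keys follow datetime.weekday(): Monday is 0 and Sunday is 6.
MAINTENANCE_WINDOWS = {
    1: (time(17, 0), time(20, 0)),  # Tuesday 5-8 PM
    3: (time(5, 0), time(7, 0)),    # Thursday 5-7 AM
    5: (time(5, 0), time(8, 0)),    # Saturday 5-8 AM (assumed part of "Weekend")
    6: (time(5, 0), time(8, 0)),    # Sunday 5-8 AM (assumed part of "Weekend")
}


def in_maintenance_window(moment: datetime) -> bool:
    """Return True if the given local time falls inside a maintenance window."""
    window = MAINTENANCE_WINDOWS.get(moment.weekday())
    if window is None:
        return False
    start, end = window
    # Start is inclusive and end is exclusive (an assumption of this sketch).
    return start <= moment.time() < end
```

For example, in_maintenance_window(datetime(2024, 1, 2, 18, 0)) evaluates a Tuesday at 6 PM, which lies inside the Tuesday window. Falling inside a window does not replace the change management process, which still governs any work performed.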

Security

Security policy is put forth by the University. The systems administration team will adhere to all security policies documented in the Stanford Administrative Guide. Please refer to the following link for any specific security policies related to systems administration:

https://security.stanford.edu

Technology Consulting installs file system Intrusion Detection software on all hosts, and is responsible for the ongoing maintenance of that software. This file system Intrusion Detection software is intended to monitor the integrity of system software and packages installed by Technology Consulting, not client-installed application software. Directories that show frequent client-initiated changes will be excluded from the intrusion detection report.
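The general idea behind file system integrity monitoring with excluded directories can be sketched in Python as follows. This document does not name the intrusion detection software in use, so the sketch is not a description of that tool; it only assumes a baseline of SHA-256 digests compared against the current state. The function names and example paths are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def is_excluded(path: str, excluded_dirs: Sequence[str]) -> bool:
    """Return True if path lies underneath one of the excluded directories."""
    return any(path.startswith(d.rstrip("/") + "/") for d in excluded_dirs)


def integrity_report(
    baseline: Dict[str, str],
    current: Dict[str, str],
    excluded_dirs: Sequence[str] = (),
) -> List[str]:
    """Compare two path-to-digest maps and list added, removed, and changed files.

    Paths under excluded_dirs are left out of the report entirely.
    """
    excluded = tuple(excluded_dirs)
    report = []
    for path in sorted(set(baseline) | set(current)):
        if is_excluded(path, excluded):
            continue
        if path not in current:
            report.append(f"removed: {path}")
        elif path not in baseline:
            report.append(f"added: {path}")
        elif baseline[path] != current[path]:
            report.append(f"changed: {path}")
    return report
```

In this sketch, a directory that a client application rewrites frequently would be passed in excluded_dirs so that its expected churn does not crowd out changes to system software in the report.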

University IT also performs regular network security system scans. Any high-risk security vulnerability discovered by this (or other) process will be addressed as soon as possible and managed via Urgent or Emergency Change Management Request. Security warnings will also be addressed via Urgent Change Management Requests.

Monitoring and Alerting

The health of a server is checked via the system monitoring tool, Nagios. Checks for Web, Tomcat, and other ports or services are not included in the fundamental service, but are available by arrangement with the client. Frequencies and thresholds of monitoring checks are set according to industry best practices. Changes to the frequency or threshold of a monitoring check will be considered a special request and may incur additional costs.
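To make the notion of a threshold concrete, here is a minimal Python sketch of a disk check that follows the Nagios plugin convention of returning 0 for OK, 1 for WARNING, and 2 for CRITICAL. The 80 and 90 percent thresholds are placeholders, not the values University IT uses, and the function names are hypothetical.

```python
import shutil
from typing import Tuple

# Status codes in the Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(value: float, warn: float, crit: float) -> int:
    """Map a measured value to a status code; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def disk_used_percent(path: str = "/") -> float:
    """Return the percentage of the file system holding path that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> Tuple[int, str]:
    """Return (status code, one-line message) for a disk usage check.

    The default thresholds are illustrative placeholders only.
    """
    used = disk_used_percent(path)
    status = classify(used, warn, crit)
    return status, f"DISK {LABELS[status]} - {path} is {used:.1f}% full"
```

A request to change a threshold amounts to changing the warn or crit value for one check, which is why it is handled as a special request rather than as part of the fundamental service.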

Based on the standard configuration setup, clients will not receive system-level alerts. System-level alerts are routed directly to systems administrators so that appropriate action can be taken. Based on the type and severity of the alert, time of day, and the potential impact to end-users, the client-designated technical contact may be called. In all cases, the systems administrator will send email notification to the client regarding any alert received and any action taken. It is the responsibility of the client to ensure contact information is kept current for notification purposes. Requests for special processes or procedures in response to alerts will be considered as a modular service.

Requests to implement any special monitoring against applications or components that have been configured by the client or are run by the client will be considered as a modular service. Such alerts will be sent directly to the client-designated technical contact so that appropriate action can be taken. Systems administration will not respond directly to these types of alerts. In all cases, it is imperative that clients keep required contact information current so that alerts can be responded to in a timely fashion.

Clients can view the state of their servers at any time via the link:

https://monitoring.stanford.edu/

Monitoring may be temporarily suspended during maintenance activities or to eliminate false positive alerts. Clients must provide notification to University IT of any planned maintenance activity so that monitoring can be temporarily disabled if needed. Failure to provide proper notification of planned system changes may result in emergency support fees should University IT staff be paged during non-business hours.

In addition, University IT uses munin, which collects system performance metrics and operational data.

Documentation

There are numerous sources of systems documentation that clients can refer to for information. Detailed configuration information about each host is available via request. Nagios provides a real-time view into the current state of a given server. The following list provides links to various sources of information:

munin server metrics (available by arrangement between the client and Technology Consulting)
Monitoring: https://monitoring.stanford.edu/
Security Information: https://security.stanford.edu

Responsibility Matrix

The Responsibility Matrix indicates whether University IT or the client is ultimately responsible for performing the listed task. In instances where there are check marks (✓) in both columns, both the client and University IT must coordinate their efforts to ensure the successful completion of the task. It is not the intent of any Responsibility Matrix to absolutely define every process, function or task performed as a contracted function.

Technology Consulting	CLIENT	University IT
Keep client contact information current	✓
Keep systems administrator contact information current		✓
Approve all hardware and software	✓	✓
Resolve or coordinate with vendors to resolve hardware problems		✓
Propose hardware and software configuration and installation standards		✓
Review hardware and software configuration and installation standards	✓
Configure and install hardware and OS software		✓
Understand and document the hardware and software business application requirements	✓
Perform routine housekeeping and system maintenance activities as required and approved		✓
Start and stop processes that require privileged access as requested by the client and as required by documented procedures		✓
Use Change Management Systems (CMS) to schedule server maintenance		✓
Use CMS to schedule application maintenance/upgrades	✓
Use CMS to provide notification of emergency changes	✓	✓
Approve or deny in a timely fashion the installation of new OS releases, upgrades and patches unless it is related to an exploitable security fix	✓
Install new OS releases, upgrades and patches in the production environment		✓
Support OS software		✓
Support application software	✓
Maintain OS documentation		✓
Maintain application software documentation	✓
Provide and implement monitoring processes and/or tools		✓
Use automated system software tools and/or procedures to proactively monitor, manage and report on server performance.		✓
Perform proactive fault detection and diagnostic procedures		✓
Determine server vs. application issues		✓
Monitor application availability	✓	✓
Monitor/recommend tuning of performance of all servers		✓
Provide outage notification of failed hardware and software		✓
Respond to audit requirements	✓	✓
Set user and file permissions		✓
Provide account management		✓
Track unused accounts	✓
Track server license compliance	✓	✓
Track application software license compliance	✓
Monitor security using Stanford standard security		✓
Review security reports		✓
Install security patches		✓
Troubleshoot and solve problems with hardware, OS, and supported software		✓
Provide 24x7 on-call support for production servers		✓
Coordinate acquisition of vendor software	✓	✓
Acquire SSL certificates as appropriate		✓
Forecast resource requirements.	✓	✓
Identify hardware/software (CPU, memory) needs		✓
Configure hardware/software installation/upgrades		✓
Provide and implement application monitoring processes and tools	✓
Identify all authorized users	✓
Administer user accounts (adds, changes, deletes)	✓
Provide access to all authorized users	✓	✓
Install application(s) on server	✓
Install and test applications patches in development environment	✓
Monitor application performance	✓
Support remote access users	✓
DATA MANAGEMENT
Identify storage requirements	✓
Manage storage assets		✓
Identify data that will be backed up	✓	✓
Provide data retention requirements	✓
Approve backup and recovery strategy	✓
Install client software on servers to facilitate data backup		✓
Restore/recover data at server level, if necessary		✓
Assist client in restoring individual files	✓	✓
Assist client with ad-hoc backups as requested	✓	✓
INCIDENT MANAGEMENT
Provide and maintain a single point of contact for the reporting and tracking of hardware or system software problems.		✓
Resolve HelpSU tickets from client		✓
Adhere to problem management escalation procedures.	✓	✓
Maintain current status on open problems.		✓
Provide status and updates on problems per SLA, at client's request, or according to severity guidelines.		✓
Report on problems within established timeframes.		✓
Perform problem analysis as requested.		✓
Participate in problem analysis if needed.	✓
Implement problem analysis recommendations as requested/assigned for respective areas of service responsibility.	✓	✓
Provide problem trend analysis.		✓
