IT Services is responsible for purchasing and maintaining server hardware, primarily located in the Forsythe data center. The data center's rack space, power, and cooling capacity are being consumed quickly, and Stanford's strategic environmental goals, aimed at reducing the university's carbon footprint, require more efficient use of computing resources. Additionally, to sustain and improve its server-management ratios (servers per system administrator), IT Services needs to support a variety of hardware platforms efficiently.
To address these efficiency concerns, IT Services is pursuing several vectors, including server virtualization and power management efforts, which are addressed in separate strategy documents (see Server Virtualization and Data Center Facilities). Two different hardware configurations — traditional rackmount servers and blade servers — are addressed here.
The server hardware IT Services primarily supports in the data centers falls into the “small” to “medium” configurations offered by industry hardware vendors. By design, these hardware platforms emphasize flexibility, scalability, and fault tolerance. Notably, even these “small” configurations are generally over-specified for their intended purposes; for instance, a “sandbox” web server generally doesn't need to be a dual-processor, quad-core, 32GB machine.
For the past several years, IT Services has standardized on Dell rackmount servers. The majority of these are Dell 1RU servers, specifically the PowerEdge 1750/1850/1950 and R610 lines. While this standardization has resulted in significant efficiencies for both purchasing and maintenance, it has reduced IT Services' ability to work with similar equipment from other vendors, including the ability to compare hardware between vendors and to support a variety of remote management infrastructures. This situation is especially problematic given IT Services' recent server purchasing contract with Hewlett-Packard. While IT Services currently has several systems on loan from HP, it cannot readily take advantage of these systems due to console support issues. If IT Services cannot take advantage of this contract, it is effectively locked into purchasing from Dell for the foreseeable future, risking price increases and reduced support offerings.
To offer improved data center density, most vendors now offer blade servers. A blade system generally consists of 10 to 16 servers housed in a central chassis that provides redundant power and cooling, a network fabric (in some configurations), a high-level management interface, and a centralized location for cabling. Each vendor takes a different approach to blade systems, and these differences affect server maintenance and ongoing support. Blade servers have these advantages over traditional rackmount servers:
- Greater density of physical servers in a data center.
- Reduced power consumption per equivalent number of stand-alone servers.
- Easier and quicker physical server deployment.
- Lower overall provisioning and associated monthly rates for clients.
- No additional hardware for console management.
- Simplified and reduced power, console, and network cabling.
Blade systems are not appropriate for all situations, however. For instance, servers that need specific PCI cards are generally better suited for a regular rackmount server. Over the last year, IT Services has deployed blade servers from three separate vendors: Dell, Hewlett-Packard, and Sun/Oracle. It has not yet pursued a plan to compare and contrast these blade systems.
For rackmount servers, IT Services needs to aggressively pursue a plan to evaluate server offerings from multiple vendors and communicate its evaluation criteria to those vendors. In addition to power, performance, and scalability, these criteria would include:
- How is firmware managed? IT Services should be able to easily query firmware versions from both Linux and Windows systems, and update the firmware without an (immediate) reboot. This ability should be easily scriptable.
- Computing Services staff should have complete control over the network interface cards (NICs) on each system.
- Servers must provide consistent serial console access, both to the BIOS and to the running operating system.
- Servers must fit into existing rack equipment.
- Cabling issues: How many cables are necessary to take full advantage of the server?
- IT Services should be able to apply these same questions to client-chosen equipment as well. Where possible, IT Services should be able to adjust its support rates based on non-compliance with these standards.
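As a concrete illustration of the "easily scriptable" firmware requirement above, the sketch below parses BIOS version information from `dmidecode`-style output on Linux. The sample text and field layout follow the common `dmidecode -t bios` format, but the values shown are illustrative, not taken from a real server; a production script would also need a Windows-side equivalent (e.g., querying WMI) and per-vendor update tooling.

```python
import re
from typing import Optional

def parse_bios_version(dmidecode_output: str) -> Optional[str]:
    """Extract the BIOS version from `dmidecode -t bios` style output.

    Returns None if no Version field is present.
    """
    match = re.search(r"^\s*Version:\s*(\S+)", dmidecode_output, re.MULTILINE)
    return match.group(1) if match else None

# Sample output in the usual dmidecode layout (illustrative values only):
sample = """BIOS Information
\tVendor: Dell Inc.
\tVersion: 2.7.0
\tRelease Date: 04/26/2010
"""
print(parse_bios_version(sample))  # prints "2.7.0"
```

A fleet-wide inventory job could run this parser over the output collected from each host, making firmware audits a simple scripted query rather than a manual console exercise.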
With respect to blade servers, IT Services needs to determine how best to evaluate vendor blade offerings against one another, applying the criteria above.
Taking the broader view, IT Services should purchase servers on an ongoing basis at a no-greater-than 85/15 ratio — that is, no more than 85 percent of the systems should come from a single vendor. If this policy is followed aggressively, it will force IT Services to ask the above questions on a consistent basis and to formalize them for future vendors. This approach will also provide a basis for vendor interaction that can be easily discussed with clients, differentiating IT Services from its competition.
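The 85/15 policy above reduces to a simple check on the fleet's purchasing records. The sketch below is a minimal illustration of that arithmetic, assuming a flat list of per-server vendor names as input; the vendor names and fleet sizes are hypothetical.

```python
from collections import Counter

MAX_SINGLE_VENDOR_SHARE = 0.85  # the 85/15 purchasing policy

def vendor_share_ok(vendors):
    """Return True if no single vendor exceeds 85% of the fleet."""
    if not vendors:
        return True
    top_share = max(Counter(vendors).values()) / len(vendors)
    return top_share <= MAX_SINGLE_VENDOR_SHARE

# 17 Dell servers out of 20 is exactly 85% -- at the limit, still compliant:
fleet = ["Dell"] * 17 + ["HP"] * 3
print(vendor_share_ok(fleet))            # prints True
print(vendor_share_ok(fleet + ["Dell"]))  # 18/21 is about 85.7%, prints False
```

Running such a check before each purchase order would keep the policy enforced continuously rather than audited after the fact.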
As servers reach the end of their hardware support contracts, IT Services needs to move toward replacing stand-alone servers with blade servers. IT Services needs to have blade chassis configured and ready to accept new blade servers. Even with the increased efficiency of blade servers, IT Services will need to increase rack power capacity to run four blade chassis on a single 42U APC rack, and may need to develop a separate cost structure for blade server deployment in order to recoup the cost of supporting the chassis itself. The main question here is whether IT Services wants to provide financial incentives to move to blade servers, similar to how server virtualization is being incentivized. Another question is how a single blade chassis will be shared across multiple groups.
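The rack-power question above is ultimately a budget calculation. The sketch below shows the shape of that calculation with placeholder figures; every number is a hypothetical assumption, and real chassis draw and circuit capacity must come from vendor specifications and the facilities team.

```python
# All figures are illustrative assumptions, not measured values.
CHASSIS_DRAW_KW = 5.0    # assumed worst-case draw per fully loaded blade chassis
CHASSIS_PER_RACK = 4     # target density from the strategy above
RACK_CAPACITY_KW = 17.3  # hypothetical existing per-rack power feed

demand_kw = CHASSIS_DRAW_KW * CHASSIS_PER_RACK
print(f"Projected per-rack demand: {demand_kw:.1f} kW")
if demand_kw > RACK_CAPACITY_KW:
    print("Additional rack power capacity required")
else:
    print("Within existing rack power capacity")
```

With these placeholder numbers, four chassis exceed the assumed feed, which is the scenario motivating the capacity-increase recommendation above.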
Measures of Success
Success in this area is a sub-component of success in other areas: reductions in power consumption and rack space per server, and an improved servers-per-sysadmin ratio.