
Information & Instructions for Web Site Creators

Overview

If you plan to use a Stanford Google Custom Search box on your Stanford web site, please subscribe to search-partners@lists.stanford.edu for notifications of service changes, updates, etc.

How to...

Put a search box on your site

You can add a search box to your web site to help visitors find content on your site. You can restrict the search to a specified directory, or search the entire Stanford web site.

When a visitor enters a search term and clicks the Search button, the browser leaves your site and displays the search results on a formatted results page.

  1. Insert one of the following HTML snippets into your web page where you want the search box to appear:

    Stanford Web Search Box

    Site Specific Search Box (include or exclude a web directory)

  2. Customize the following required parameters:
    • name="q" size="31"
      Sets the width (in number of characters) of the search box. You can change the size to suit your site's layout.
  3. Customize the following optional parameters:

    If you want to restrict your search feature to one specific directory (and its subdirectories), include the following two parameters (as_dt and as_sitesearch).  Restriction to multiple site URLs is not supported.

    If you want the search feature on your site to search the entire Stanford collection, remove these two parameters from your HTML.

    • name="as_dt" value="i"
      This setting determines whether your search should include or exclude the directory specified in "as_sitesearch". Values can be:

      • "i" (include only results in the web directory specified by as_sitesearch)
      • "e" (exclude all results in the web directory specified by as_sitesearch)
    • name="as_sitesearch" value="<yoururl>"
      Pages in the specified directory will be included in or excluded from your search (according to the value of "as_dt").
      e.g.: name="as_sitesearch" value="www.stanford.edu/dept/drama"

      • You must specify the complete canonical name of the host server followed by the path of the directory.
        e.g.:
        • www.stanford.edu/services not www/services
      • If the web directory path ends with a slash ("/"), only files directly within that directory are searched; files in sub-directories are not considered.
        e.g.:
        • www.stanford.edu/services to include sub-directories
        • www.stanford.edu/services/ to exclude sub-directories
      • as_sitesearch allows you to specify one directory (and all its sub-directories) as the domain to be searched; you cannot specify multiple disparate directories using this option.
      • If you want the search feature on your site to search the entire Stanford web site, delete this parameter.
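
As an illustration of how the parameters above fit together, here is a minimal sketch of a site-specific search form. It is not the official Stanford snippet: the form's action URL is a placeholder, and passing as_dt and as_sitesearch as hidden fields is an assumption about how the parameters would typically be submitted. Only the parameter names and example values (q, size, as_dt, as_sitesearch, www.stanford.edu/dept/drama) come from this page.

```html
<!-- Hypothetical sketch. The action URL below is a placeholder, NOT the real
     search endpoint; use the URL from the official snippet instead. -->
<form method="get" action="https://search.example/results">
  <!-- Required: the search term field; size sets its width in characters. -->
  <input type="text" name="q" size="31">

  <!-- Optional pair (assumed here to be hidden fields): restrict results to
       one directory. "i" includes only that directory, "e" excludes it.
       No trailing slash, so sub-directories are searched too. -->
  <input type="hidden" name="as_dt" value="i">
  <input type="hidden" name="as_sitesearch" value="www.stanford.edu/dept/drama">

  <input type="submit" value="Search">
</form>
```

To search the entire Stanford web site instead, the two hidden fields would simply be removed, leaving only the q field and the submit button.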

Get pages into the index

Google Custom Search uses the Google index. All you need to do to get your web pages into the Stanford/Google index is:

The Google crawler will pick up changed, new, and removed pages automatically when it visits Stanford web sites. How often content is crawled depends on how important Google's algorithm judges it to be. For instance, pages that Google considers important and quickly changing are crawled frequently, while others are crawled less frequently (up to two weeks may pass before they are revisited).

If a page is not in the index, search for all pages that link to your page. The syntax for this search is "link:yourdomain.com". For example, to see which pages link to your personal page at Stanford, enter "link:http://www.stanford.edu/~mypage" into the search box at http://www.stanford.edu. The results will list all pages that link to your page.

If you would like your page(s) to be listed in the Stanford index, visit http://www.stanford.edu/atoz/ and click on the "suggestions" link.

Note that if no other pages in the Stanford search collection link to your web pages, they won't be picked up by the Google crawler.

Keep pages out of the index

If you don't want a page to be indexed, insert this <meta> tag within your page's <head> tag:

<head>
<meta name="robots" content="noindex, nofollow">
</head>

This will prevent crawlers (robots) from indexing the page, and from following any links from the page. If the page has already been indexed, it will be removed from the index the next time Google crawls the page.

You can prevent the pages in a directory from being indexed by restricting access to the directory with webauth.

Stanford's configuration of Google custom search

Search domains

Stanford's search collection includes all the web pages in these domains:

...that are not specifically excluded by:

  • the search administrator
  • a noindex <meta> tag in the page's HTML
  • password (including webauth) protection
  • restricted-access files and/or directories

Web pages excluded by the search administrator

Web pages in the following directories (and their subdirectories) are excluded from the Stanford search collection:

  • URLs being phased out of use
    e.g.: http://www-leland.stanford.edu
  • webauth-protected (or otherwise restricted-access) pages and directories
  • specific pages kept out of the index at the request of their owners

These pages have been excluded for a variety of system performance, copyright, license, and University policy reasons.

Additional directories or pages not listed here may have been excluded by the search administrator. If you think your page may have been excluded and don't want it to be, enter a HelpSU request.

Crawling schedule

Google crawls Stanford web sites at different rates, depending on how its algorithm weighs factors such as relevance, quality, type, frequency of update, and which other pages link to the content. Please read the Web Search FAQ for more information about getting your page or web site indexed.

Last modified May 11, 2012