Google Search Appliance (GSA) Overview

The University of Chicago uses a Google Search Appliance (GSA) as the main search tool for public web-based content. The GSA continually indexes new documents as they are posted to University of Chicago websites, and guides users to relevant content using customized search results. The GSA uses the same technology as google.com; it is a local instance of Google Search focused exclusively on the University of Chicago.

The GSA is managed by the Web Services and Web Administration groups within IT Services. If you have a question or issue with implementing a GSA-powered search form, please contact us at search@lists.uchicago.edu.

How GSA Works

GSA-powered searches differ from a public Google search in several important ways. The GSA focuses exclusively on University of Chicago content, and we control the focus and timing of the content crawl (indexing). We can customize search results through keymatches, which return selected matches for a word or phrase at the top of the results page, and we can define collections of websites for better search focus.

  • Keymatches: if you have a suggestion for a word or phrase that should return a certain site at the top of the results page, contact us to request a keymatch.
  • Collections: if you would like to search several specific sites from a single search form, contact us to request a collection.
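
As a rough illustration of what a keymatch does, the sketch below promotes a configured URL to the top of a result list when the query matches a trigger phrase. It is not the GSA's implementation: the table contents, the example.edu URLs, and the function name apply_keymatches are all hypothetical, and the exact-phrase matching shown here is an assumption.

```python
# Hypothetical keymatch table: trigger phrase -> URL to promote.
KEYMATCHES = {
    "academic calendar": "https://www.example.edu/calendar",
}


def apply_keymatches(query, organic_results, keymatches=KEYMATCHES):
    """Return the results with any matching keymatch URL placed first."""
    # Normalize case and whitespace so "  Academic   Calendar " still matches.
    normalized = " ".join(query.lower().split())
    promoted = keymatches.get(normalized)
    if promoted is None:
        return list(organic_results)
    # Put the promoted URL first and drop its duplicate from the organic list.
    return [promoted] + [url for url in organic_results if url != promoted]
```

A query with no configured keymatch returns the organic results unchanged.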

Indexing and Crawling

The UChicago GSA crawls UChicago web content continuously. Once it completes one round of indexing, it immediately starts another. The GSA crawls and indexes content on the following domains:

  • uchicago.edu
  • chicagobooth.edu
  • uchospitals.edu
  • uchicagokidshospital.org

The crawl begins with www.uchicago.edu.
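
If you want to check whether one of your URLs falls within the domains listed above, a minimal sketch follows. The function name in_crawl_scope is hypothetical, and treating every subdomain of a listed domain as in scope is an assumption; the exclusion rules described later in this document may still apply to an in-scope URL.

```python
from urllib.parse import urlsplit

# Domains taken from the list above.
CRAWL_DOMAINS = (
    "uchicago.edu",
    "chicagobooth.edu",
    "uchospitals.edu",
    "uchicagokidshospital.org",
)


def in_crawl_scope(url):
    """Return True if the URL's host is a listed domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    # Match on a dot boundary so "notuchicago.edu" is not mistaken for "uchicago.edu".
    return any(host == d or host.endswith("." + d) for d in CRAWL_DOMAINS)
```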

The GSA crawls by following links. It indexes content only if that content is linked from another indexed page. It follows HTML links in PDF files, Word documents, and Flash content. The crawler does not follow HTML links embedded in JavaScript code, and it cannot submit HTML forms.
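
The link-following behavior can be sketched with a small breadth-first crawl over an in-memory set of pages. This is an illustration under stated assumptions, not the GSA's crawler: the example.edu pages, the LinkExtractor class, and the crawl function are hypothetical, and only plain HTML anchor tags are considered.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags; script contents are not parsed as links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, seed):
    """Breadth-first crawl over `pages` (URL -> HTML source), starting at `seed`."""
    indexed = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # the link points at a page that does not exist
        indexed.append(url)
        parser = LinkExtractor()
        parser.feed(pages[url])
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed


# Hypothetical site: /hidden is referenced only from JavaScript, /orphan not at all.
PAGES = {
    "https://www.example.edu/": '<a href="/about">About</a>'
    '<script>window.location = "/hidden";</script>',
    "https://www.example.edu/about": '<a href="/">Home</a>',
    "https://www.example.edu/hidden": "<p>Reachable only through JavaScript</p>",
    "https://www.example.edu/orphan": "<p>No inbound links</p>",
}
```

Crawling PAGES from the home page indexes only the home page and /about. If a page on your site is missing from search results, checking that an ordinary HTML link points to it from an indexed page is a reasonable first step.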

We have defined a list of exclusion rules that prevent the GSA from crawling certain sites and types of content, both to prevent high server traffic and to stay within the document-indexing limit defined in our license agreement. The following types of content are not included in the UChicago search collection:

  • Images and media
  • Database files
  • Archive files
  • Binaries and executables
  • Apache directory listings
  • Sites requiring any type of authentication
  • Dynamic calendars that can generate a large number of documents (unique URLs)
  • Directory database listings
  • Resource reservation systems
  • Other dynamic sites that may generate a large number of documents

If we find that your site is contributing to a high document count, we will work with you to resolve the issue.
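
To give a sense of how exclusion rules of this kind might be expressed, the sketch below tests a URL against a few of the categories listed above. The extension list, the calendar heuristic, and the function name is_excluded are hypothetical examples chosen for illustration; they are not the rules configured on the UChicago GSA.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical file extensions for some of the excluded content types.
EXCLUDED_EXTENSIONS = {
    ".png", ".jpg", ".gif", ".mp3", ".mp4",  # images and media
    ".sql", ".mdb",                          # database files
    ".zip", ".tar", ".gz",                   # archive files
    ".exe", ".dll", ".bin",                  # binaries and executables
}


def is_excluded(url):
    """Return True if the URL matches one of the illustrative exclusion rules."""
    parts = urlsplit(url)
    path = parts.path.lower()
    if posixpath.splitext(path)[1] in EXCLUDED_EXTENSIONS:
        return True
    # Assumed heuristic: calendar pages with a query string can produce a
    # practically unlimited number of unique URLs.
    if "/calendar/" in path and parts.query:
        return True
    return False
```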

Google Search Resources

Google Search Appliance Email List

If you are using the UChicago GSA for the search form on your site, please subscribe to the Google Search Appliance Email List. This allows us to contact you with updates and announcements related to the appliance.




Keywords: web, site, website, index, crawling, keymatch, collections
Doc ID: 19389
Owner: Alan T.
Group: University of Chicago
Created: 2011-08-02 19:00 CDT
Updated: 2017-03-16 15:48 CDT
Sites: University of Chicago, University of Chicago - Sandbox