Google Search Appliance - How does the GSA work?
This article explains how the Google Search Appliance (GSA) works.
The University of Chicago uses a Google Search Appliance (GSA) as the main search tool for public web-based content. The GSA continually indexes new documents as they are posted to University of Chicago websites, and guides users to relevant content using customized search results. The GSA uses the same technology as google.com: it is a locally run instance of Google focused exclusively on the University of Chicago.
The GSA is managed by the Web Services and Web Administration groups within IT Services. If you have a question about, or a problem with, implementing a GSA-powered search form, please contact us at firstname.lastname@example.org.
How does the GSA work?
GSA-powered searches differ from a public Google search in several important ways. The GSA is exclusively focused on University of Chicago content, and we are able to control the focus and timing of the content crawl (also known as indexing). We can customize search results through “keymatches” (which place a chosen match for a word or phrase at the top of the results page) and define “collections” of websites to better focus searches.
Keymatches: If you have a suggestion for a word or phrase that should return a certain site at the top of the results page, contact us to request a keymatch.
Collections: If you would like to search multiple, specific sites from a single search form, contact us to request a collection.
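As a rough illustration of how keymatches and collections shape results, here is a minimal Python sketch. It is a toy model only, not the GSA's actual implementation or configuration format: the `KEYMATCHES` and `COLLECTIONS` tables, the `search` function, the example.edu URLs, and the substring matching are all assumptions made for this example.

```python
# Hypothetical keymatch table: search term -> URL promoted to the top of results.
KEYMATCHES = {
    "library": "https://www.example.edu/library/",
}

# Hypothetical collection: a named group of site prefixes searched together.
COLLECTIONS = {
    "admissions": (
        "https://admissions.example.edu/",
        "https://aid.example.edu/",
    ),
}


def search(query, indexed_urls, collection=None):
    """Return URLs matching the query, with any keymatch listed first.

    Matching is a plain substring test on the URL, standing in for
    real relevance ranking.
    """
    term = query.lower()
    results = [url for url in indexed_urls if term in url.lower()]

    # A collection narrows the results to a defined set of sites.
    if collection is not None:
        prefixes = COLLECTIONS[collection]
        results = [url for url in results if url.startswith(prefixes)]

    # A keymatch places one chosen URL at the top of the results.
    promoted = KEYMATCHES.get(term)
    if promoted:
        results = [promoted] + [url for url in results if url != promoted]

    return results
```

In this model, searching for "library" always lists the promoted URL first, and passing `collection="admissions"` limits the remaining results to the two sites in that collection.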
Indexing and crawling
The UChicago GSA is set to crawl UChicago web content continuously: once it completes one round of indexing, it immediately starts another. The GSA crawls and indexes content on the following domains:
The GSA uses the following as a starting point:
We have defined a list of exclusion rules that prevent the GSA from crawling certain sites and types of content, both to prevent high server traffic and to stay within the document-indexing limit defined in our license agreement. The following types of content are not included in the UChicago search collection:
- images and media
- database files
- archive files
- binaries and executables
- Apache directory listings
- sites requiring any type of authentication
- dynamic calendars that can generate a large number of documents (unique URLs)
- directory database listings
- resource reservation systems
- other dynamic sites that may generate a large number of documents
If we find that your site is contributing to a high document count, we will work with you to resolve the issue.
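To give a sense of how exclusion rules of this kind behave, here is a minimal Python sketch that checks a URL against a few of the categories above. The actual rules configured on the appliance are not listed in this article, so the file extensions, path patterns, and the `is_excluded` function are hypothetical examples only.

```python
import re
from urllib.parse import urlparse

# Hypothetical extensions standing in for the excluded content types.
EXCLUDED_EXTENSIONS = (
    ".jpg", ".png", ".gif", ".mp3", ".mp4",  # images and media
    ".mdb", ".sqlite",                       # database files
    ".zip", ".tar", ".gz",                   # archive files
    ".exe", ".bin", ".dmg",                  # binaries and executables
)

# Hypothetical path patterns for dynamic content that produces many unique URLs.
EXCLUDED_PATH_PATTERNS = (
    re.compile(r"/calendar/"),  # dynamic calendars
    re.compile(r"/reserve/"),   # resource reservation systems
)


def is_excluded(url):
    """Return True if the URL matches one of the example exclusion rules."""
    path = urlparse(url).path.lower()
    if path.endswith(EXCLUDED_EXTENSIONS):
        return True
    return any(pattern.search(path) for pattern in EXCLUDED_PATH_PATTERNS)
```

For example, `is_excluded("https://www.example.edu/calendar/2015/03/01")` is true under these sample rules, while an ordinary page such as `https://www.example.edu/about/index.html` is not excluded. If your site generates many URLs of the first kind, it is likely to contribute to a high document count.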
Google Search Resources
Google Search Appliance email list
If you are using the UChicago GSA for the search form on your site, please subscribe to the Google Search Appliance email list. This will allow us to contact you with updates and announcements related to the appliance.