Flametoad

Words of wisdom from a combustible amphibian.

HiEdWebDev: Google Search Appliance

Posted Tuesday, October 16th, 2007 at 10:45 am

11 am Tuesday session

I have handouts for this presentation, and the slides will be on the conference website in the coming weeks. This session was presented by someone from Yale, who recently implemented the search appliance.

Google won’t budge much on price. "Hey, we’re Yale. Cut us a deal." "Hey, you’re Yale, you can afford it."

The appliance lets you create collections, which are defined by groups of URLs. For instance, you could gather the URLs of HR-related sites, then let users narrow their search to only that collection. This helps weed out irrelevant links.
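
As a rough illustration of how a collection narrows a search, here is a minimal Python sketch. The collection names, URL prefixes, and both helper functions are hypothetical; on the appliance itself, collections are set up in the admin interface rather than in code like this.

```python
# Hypothetical collections: each name maps to the URL prefixes that belong to it.
COLLECTIONS = {
    "hr": ["http://www.example.edu/hr/", "http://benefits.example.edu/"],
    "library": ["http://library.example.edu/"],
}


def collections_for(url):
    """Return the sorted names of every collection whose prefixes match the URL."""
    return sorted(
        name
        for name, prefixes in COLLECTIONS.items()
        if any(url.startswith(prefix) for prefix in prefixes)
    )


def search_within(urls, collection):
    """Keep only the result URLs that fall inside the chosen collection."""
    return [url for url in urls if collection in collections_for(url)]
```

Calling `search_within(results, "hr")` on a mixed list of result URLs would leave only the HR pages, which is the "narrow the search" behavior described above.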

Search results can be customized with XSLT to create a branded front end. This goes well beyond just putting your logo on the page. For example, you can use XSLT to include an icon indicating file type: PDF, Word, etc.

KeyMatches let you promote specific web pages on your site. This works much like the ad links on Google’s site. To create a KeyMatch, you specify the keyword, phrase, or exact-match criteria that will trigger a specific result.
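
To make the three kinds of match criteria concrete, here is a small Python sketch of how they might differ. This is my own approximation of the behavior, not the appliance's actual logic, and the rule list and `keymatch_hits` function are hypothetical.

```python
def keymatch_hits(query, rules):
    """Return the promoted URLs whose rule fires for the query.

    rules is a list of (term, match_type, url) tuples, where match_type
    is "keyword", "phrase", or "exact".
    """
    words = query.lower().split()
    hits = []
    for term, match_type, url in rules:
        term_words = term.lower().split()
        if match_type == "keyword":
            # The single word appears anywhere in the query.
            fired = term.lower() in words
        elif match_type == "phrase":
            # The words appear together, in order, somewhere in the query.
            n = len(term_words)
            fired = any(words[i:i + n] == term_words
                        for i in range(len(words) - n + 1))
        elif match_type == "exact":
            # The whole query is the term and nothing else.
            fired = words == term_words
        else:
            raise ValueError("unknown match type: " + match_type)
        if fired:
            hits.append(url)
    return hits
```

With a keyword rule on "parking", the query "student parking permits" triggers the promoted link. An exact rule on "jobs" fires only when the user types "jobs" and nothing else.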

OneBox services deliver relevant, real-time results from certain third-party systems. This works like typing "weather" into Google and getting a weather report. One of the third parties is Blackboard, which means you should be able to use the campus search to find a course catalog.

Server status reports include information on the status of web crawls. You can see where the crawler got 404s, etc., which helps you identify broken links or areas it was denied because of authentication. You also have access to search data, so you can watch for search patterns to further refine search results and websites.

Tips and tricks

Google operators are supported (limiting results to a site, showing all pages that link to a URL, etc.).
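
For anyone scripting searches, the operators can be folded into the query string. The sketch below builds such a URL in Python. The host name and the `build_search_url` helper are hypothetical, and the `site`, `client`, and `output` parameters are written from my recollection of the appliance's search URLs, so check them against the GSA documentation before relying on them.

```python
from urllib.parse import urlencode


def build_search_url(host, terms, site=None, link=None,
                     collection="default_collection"):
    """Build a search URL, appending site: and link: operators to the query."""
    parts = [terms] if terms else []
    if site:
        parts.append("site:" + site)   # limit results to one site
    if link:
        parts.append("link:" + link)   # pages that link to this URL
    params = {
        "q": " ".join(parts),
        # The three parameters below are assumptions about the GSA search URL.
        "site": collection,
        "client": "default_frontend",
        "output": "xml_no_dtd",
    }
    return "http://" + host + "/search?" + urlencode(params)
```

For example, `build_search_url("search.example.edu", "benefits", site="hr.example.edu")` produces a query for "benefits" restricted to the HR site.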

Use metadata in your pages and documents. Adding a "date" meta tag lets the GSA sort documents by that value rather than by the file's last-modified date.
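
The idea of preferring a "date" meta tag over the file timestamp can be sketched in Python as follows. The ISO `YYYY-MM-DD` format and the helper names are assumptions for illustration; the session did not cover which date formats the appliance actually accepts.

```python
from datetime import date
from html.parser import HTMLParser


class MetaDateParser(HTMLParser):
    """Pick up the value of <meta name="date" content="YYYY-MM-DD">."""

    def __init__(self):
        super().__init__()
        self.meta_date = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "date":
            return
        try:
            self.meta_date = date.fromisoformat(attributes.get("content") or "")
        except ValueError:
            pass  # unparseable date: leave it unset and fall back later


def document_date(html, last_modified):
    """Prefer the meta tag date; fall back to the file's last-modified date."""
    parser = MetaDateParser()
    parser.feed(html)
    return parser.meta_date or last_modified
```

Sorting a list of pages by `document_date` then orders them by the date the author declared, even if the files were all touched during a later site migration.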

Integration into apps.

  • Offload the overhead of indexing in custom applications onto the GSA
  • Send feeds to the GSA to index content
  • Query databases and index the results
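
A feed is essentially an XML document listing the records you want indexed. The Python sketch below builds one from rows you might have pulled out of a database. The element and attribute names follow my recollection of the GSA feed format and should be checked against Google's documentation; the data source name and `build_feed` helper are hypothetical, and the sketch does not cover actually sending the feed to the appliance.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource, records, feedtype="incremental"):
    """Build a feed document from (url, html_body) pairs.

    Element names here are assumed from memory of the GSA feed format.
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    group = ET.SubElement(root, "group")
    for url, body in records:
        record = ET.SubElement(group, "record", url=url, mimetype="text/html")
        # ElementTree escapes the markup in body when serializing.
        ET.SubElement(record, "content").text = body
    return ET.tostring(root, encoding="unicode")
```

Each row from a course database, for instance, could become one record with a URL pointing back at the page that displays it.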

Search protected content while maintaining the protection.

Monitor activity through web panel

  • basic operations
  • Certain dynamic content generates black holes. For instance, the GSA can get stuck in an endless crawl loop on a site
  • Example: with a calendar application that has no end date, the GSA could crawl to the year 5000!
  • This drives up license cost, since licensing is tied to the number of documents indexed
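
The calendar problem comes down to refusing URLs that wander too far from the present. Here is a minimal Python sketch of that check, assuming a hypothetical calendar that puts the year in a `year` query parameter. On the appliance you would presumably express the same idea with crawl exclusion patterns rather than code.

```python
from urllib.parse import parse_qs, urlparse


def should_crawl(url, current_year, horizon_years=2):
    """Return False for calendar URLs too far from the current year."""
    query = parse_qs(urlparse(url).query)
    years = query.get("year")  # hypothetical parameter name
    if not years:
        return True  # not a calendar URL, so no restriction applies
    try:
        year = int(years[0])
    except ValueError:
        return False  # malformed year: safer to skip than to loop on it
    return abs(year - current_year) <= horizon_years
```

With a two-year horizon, a crawl running in 2007 would follow the "next month" links into 2009 and then stop, instead of marching on toward the year 5000.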

Managers can be assigned to particular parts to distribute maintenance.

Yale has been running it for just over a year.

Bugs

  • Collections can get corrupted and stop displaying results. The fix is to reboot the GSA.
  • A reboot can take up to 4 hours
  • Creating a report covering a large number of collections over a long period will fail
  • Database queries are not run upon crawl; Google’s solution: run a Python script that logs in and manually runs the query

Fail over

Yale did not initially purchase hot standby or 24/7 support. After discovering the reboot problem, they upgraded their license to get the hot standby. These options can be purchased at any point.

The new 5.0 version was just released on Oct. 11. It includes secure file system crawling, date biasing, and "Google Enterprise Labs" features such as search-as-you-type.

I have contact info for the presenter in a file.

About Flametoad

Flametoad is the personal website for Preston DuBose, a full-time e-commerce and credit card security professional for the higher-education market, a part-time RPG publisher, and a full-time husband and father.

I ignore conventional blogging wisdom and refuse to focus on a single topic. This website covers gaming, family life, marketing, security, literature, music, and just about anything else shiny that catches my eye.

Do you think I might be your long lost nephew, to whom you'd like to bequeath your vast financial empire? Find my e-mail address and read more of my bio on the About Flametoad page.

I get a small thrill every time someone bothers to respond to one of my posts. I get a big thrill when you post naked pictures of yourself. Well, not YOU.

Copyright © Flametoad. All rights reserved. Theme developed with the help of the WordPress Theme Generator.