Blog News Channel


Google Suggests Guts Disassembled – Part 2

Posted in Uncategorized by Nathan Weinberg on December 24, 2004

Slashdot introduces us to this analysis of Google Suggest, which goes even deeper than previous dissections of the Google Suggest engine. Some of the interesting discoveries:

    A simple program which finds all possible suggestions for any given starting term.

    Google Suggest ignores quotes.

    Order is more important than the actual terms.

    The number of suggestions is actually smaller than expected, making it possible for someone to implement Google Suggest on their own server, even holding the whole thing in RAM (it’s that small).
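To get a feel for how little an in-RAM version would need, here is a minimal sketch in Python. It is not Google's implementation, and it is not the program from the analysis either: the `SuggestIndex` class, its `suggest` method, the popularity counts and the sample queries are all made up for illustration. It assumes you already have a list of queries with some kind of count, keeps them in a sorted list, and uses binary search to find everything that starts with a given prefix.

```python
import bisect


class SuggestIndex:
    """Hypothetical in-RAM prefix index; not Google's actual data structure."""

    def __init__(self, query_counts):
        # query_counts maps each query to a made-up popularity count.
        self._counts = {q.lower(): n for q, n in query_counts.items()}
        # In a sorted list, all strings sharing a prefix sit next to each other.
        self._queries = sorted(self._counts)

    def suggest(self, prefix, limit=10):
        prefix = prefix.lower()
        # Binary search for the first query that is >= the prefix.
        i = bisect.bisect_left(self._queries, prefix)
        matches = []
        while i < len(self._queries) and self._queries[i].startswith(prefix):
            matches.append(self._queries[i])
            i += 1
        # Most popular first, ties broken alphabetically.
        matches.sort(key=lambda q: (-self._counts[q], q))
        return matches[:limit]


# Sample data is invented for the example.
index = SuggestIndex({
    "google suggest": 50,
    "google maps": 90,
    "goooogle": 5,
    "gmail": 70,
})
print(index.suggest("goo"))
```

A sorted list plus `bisect` is about the simplest structure that works here; a trie would do the same job and might be the better choice if the list changes often.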

The most startling thing is that Google Suggest is actually based more on searches than results. To explain: Google Suggest returns suggestions that are not in Google’s index, or for terms that Google’s crawler can never get to, because it draws on the searches people have made as well as the pages Google has found. What does this mean? If you have typed a UPS tracking number into Google (something typical, because Google has searches for tracking numbers built in), it can find its way into Google Suggest. Just go there and type in “1ze” and watch the numbers pop up (all from packages delivered in the last six weeks). Does this mean credit card numbers could be in there as well? Less likely, but possible. Ironically, if you’ve ever searched for your credit card number to make sure it wasn’t publicly available, you may have inadvertently added it to Google Suggest. Oy.

Related posts:

Google Suggest – 12/10

Google Suggests Goooooooooooooooogle – 12/10

Google Suggest Tools – 12/11

The Google Suggest Complete My Sentence Game – 12/15

Google Suggests Guts Disassembled – 12/18

Google Suggest Poetry Generator – 12/20

2 Responses to 'Google Suggests Guts Disassembled – Part 2'


  1. Noam said,

    Google Suggest aside, I remember that a while ago, probably in /. as well, someone pointed out that if you want to search for credit card numbers, you can try searches like 1000000000000000..9999999999999999. This will get you all the pages that contain the numbers in between – that is, 16-digit numbers.
    Try entering such ranges in Google Suggest, and you’ll see that other people have tried them as well.

    That said, I hope Google will censor Suggest a bit, to avoid what you just mentioned. Until they manage to make the censoring algorithm smart enough, they can just remove searches with number tokens in them.

  2. Nathan Weinberg said,

    Well, since Google already censors the results for dirty words, it couldn’t be too hard to cut out all 16-digit numbers and queries starting with 1ze.
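The kind of crude filter discussed in these comments could look something like the sketch below. This is only a guess at the idea, not anything Google is known to do. The `is_safe_to_suggest` function and both patterns are hypothetical. It assumes a "card-like" number is exactly 16 digits, optionally broken up by single spaces or dashes, and it treats any query starting with "1ze" as a possible tracking number.

```python
import re

# Assumption: a card-like number is 16 digits, optionally separated by
# single spaces or dashes, and not part of a longer run of digits.
CARD_LIKE = re.compile(r"(?<!\d)\d(?:[ -]?\d){15}(?!\d)")

# Assumption: anything starting with "1ze" is treated as a tracking number.
TRACKING_PREFIX = "1ze"


def is_safe_to_suggest(query):
    """Return False for queries this crude filter would keep out of suggestions."""
    q = query.strip().lower()
    if q.startswith(TRACKING_PREFIX):
        return False
    if CARD_LIKE.search(q):
        return False
    return True
```

Note that this also throws out harmless queries that merely happen to start with "1ze", and it blocks the number-range searches mentioned above, since each end of the range is itself a 16-digit number.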

