What are Statistically Improbable Phrases?

Article Details
  • Written By: Tricia Ellis-Christensen
  • Edited By: O. Wallace
  • Last Modified Date: 13 July 2014
  • Copyright Protected:
    2003-2014
    Conjecture Corporation

Statistically Improbable Phrases, or SIPs, are part of a search technology developed by Amazon.com that scans the content of books for distinctive phrases: phrases that occur often in one book but rarely in others. SIPs are part of Amazon’s patented Search Inside!® program. Essentially, Search Inside!® gives Amazon access to the partial or full text of a book, so that a search containing one of that book’s statistically improbable phrases can identify it.
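To make the idea concrete, here is a minimal sketch of how distinctive phrases might be scored. Amazon has not published its method, so everything here is an assumption for illustration: the function names, the use of two-word phrases, and the simple frequency ratio with add-one smoothing.

```python
from collections import Counter
import re


def phrases(text, n=2):
    """Split text into lowercase words and return every n-word phrase."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(book, corpus, n=2, top=3):
    """Rank phrases in `book` by how much more frequent they are there
    than in `corpus` (a list of other book texts).

    The score is a plain frequency ratio with add-one smoothing.
    This is an illustrative assumption, not Amazon's actual formula.
    """
    book_counts = Counter(phrases(book, n))
    corpus_counts = Counter()
    for other in corpus:
        corpus_counts.update(phrases(other, n))

    book_total = sum(book_counts.values()) or 1
    corpus_total = sum(corpus_counts.values()) or 1

    scores = {}
    for phrase, count in book_counts.items():
        book_rate = count / book_total
        # Add-one smoothing keeps unseen phrases from dividing by zero.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scores[phrase] = book_rate / corpus_rate

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top]
```

A phrase that repeats within one book and never appears in the others gets the highest score, which matches the sense of "improbable" used here: unlikely in books in general, but frequent in this one.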

The name for this technology is a bit confusing. When you perform a search, you want the results to match your query closely. A phrase is statistically improbable when it is unlikely to appear in most books, so if you search with a phrase unique to one book, it is improbable that the results will list something you don’t want. If you’re looking for a specific book and can’t remember the title but can remember a quote from it, you could use the quote to search for the book.
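Finding a book from a remembered quote can be sketched as a whole-phrase match against each book's text. This is only an illustration, not how Search Inside!® works internally: the `normalize` and `find_books_by_quote` names and the tiny in-memory library are assumptions.

```python
import re


def normalize(text):
    """Lowercase the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def find_books_by_quote(quote, library):
    """Return the titles in `library` (title -> full text) that contain `quote`.

    Padding both sides with spaces makes the match respect word boundaries,
    so "call me" does not match inside "recall me".
    """
    needle = f" {normalize(quote)} "
    return [
        title
        for title, text in library.items()
        if needle in f" {normalize(text)} "
    ]


# Hypothetical two-book library holding one opening line from each book.
library = {
    "Nineteen Eighty-Four": "It was a bright cold day in April, "
                            "and the clocks were striking thirteen.",
    "Moby-Dick": "Call me Ishmael.",
}
matches = find_books_by_quote("the clocks were striking thirteen", library)
```

The more unusual the remembered phrase, the fewer titles come back, which is why an improbable phrase makes a good search term.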

Alternatively, you might want to search for a specific topic within a larger subject. For instance, if you wanted a book of career advice, but what you really wanted to read about was how to network for jobs, you might search for “networking” instead of “career advice.” Some of the most relevant results appear immediately on the Amazon search results page, including books like Dig Your Well Before You’re Thirsty: The Only Networking Book You’ll Ever Need.

If you have searched with these types of phrases, you may have noticed that some results aren’t a good match. For instance, the first result for “networking” is not about career networking but about computer networks. You can get better results by being more specific, for instance by searching for “career networking” or “job networking.”

Statistically improbable phrases are actually probable phrases in one sense, since a Search Inside!® book that contains a phrase unique to it is likely to head the list of results when you search for that phrase. You could, for instance, enter a line from a Shakespeare sonnet to bring up books on Shakespeare. This doesn’t always work well, since some very well-known quotes are used as the titles of many other books. You won’t find Hamlet if you search for “To be or not to be.” Nor will you find Macbeth with a phrase like “Out! Damn spot.” In fact, under this latter term, the first book you’ll find is one on stain removal.

Statistically improbable phrases can also be used to search web content, and web search engines may use similar technology so that people can search effectively for specific, unique lines. It isn’t a perfect technology, since a web crawler doesn’t necessarily assess the quality of the content. It may simply count keyword repetitions, so that the pages repeating a keyword most often are the easiest to find. Not all books on Amazon are part of Search Inside!®, but this appears to be the trend. Ultimately, even if the system is slightly imperfect, it could cut down on search time.
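The weakness of counting keyword repetitions can be seen in a short sketch. The function name and the two sample pages below are hypothetical, and real search engines weigh many more signals than this.

```python
import re


def rank_by_keyword_repetition(keyword, pages):
    """Order page names by how often `keyword` appears, most frequent first.

    `pages` maps a page name to its text. Pages that never mention the
    keyword are left out. Only repetitions are counted; nothing here
    judges what the page is actually about.
    """
    pattern = re.compile(r"\b" + re.escape(keyword.lower()) + r"\b")
    counts = {name: len(pattern.findall(text.lower())) for name, text in pages.items()}
    hits = [name for name in counts if counts[name] > 0]
    return sorted(hits, key=lambda name: (-counts[name], name))


# Hypothetical pages: one about computer hardware, one about job hunting.
pages = {
    "router-guide": "Networking hardware: networking cables, networking switches.",
    "career-tips": "Networking helps you find jobs.",
}
ranking = rank_by_keyword_repetition("networking", pages)
```

Here the hardware page ranks above the career page simply because it repeats the word more often, which mirrors the mismatched “networking” results described earlier.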
