What is Canonicalization?

Article Details
  • Written By: Mary Elizabeth
  • Edited By: Bronwyn Harris
  • Last Modified Date: 31 August 2016
  • Copyright Protected:
    2003-2016
    Conjecture Corporation

The word canonical describes something that conforms to an accepted standard. Canonicalization (canonicalisation in British English) is the process of bringing something into conformity with that standard. In computing, the term covers standardization in several different areas. It is often mistaken for the problem, when it is actually the solution to a variety of problems. Because it is such a long word, canonicalization is often abbreviated using its first and last letters and the number of letters in between: c14n.
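The abbreviation rule is simple enough to express in a few lines. The following is only an illustrative sketch; the function name `numeronym` is an assumption made for this example and not part of any standard.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    # Words shorter than three letters have no inner letters to count.
    if len(word) < 3:
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("canonicalization"))  # c14n
```

The word has 16 letters, so 14 of them sit between the initial "c" and the final "n".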

Canonicalization is used in IT (Information Technology) in several settings. It applies to email sender addresses, to filename construction, to string encoding in Unicode, to the use of XML (Extensible Markup Language), and to URL (Uniform Resource Locator) construction. In every case, the problem is that multiple formats can represent the same item, and canonicalization is the way to consistency and standardization.
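Unicode strings give a compact illustration of the "multiple formats, same item" problem. The sketch below uses Python's standard `unicodedata` module. The choice of the NFC normalization form and the helper name `canonical` are assumptions made for this example, since other normalization forms exist and may suit other uses better.

```python
import unicodedata

composed = "\u00e9"     # "é" stored as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent


def canonical(text: str) -> str:
    """Return the text in one agreed form (NFC is assumed here)."""
    return unicodedata.normalize("NFC", text)


# The two strings look identical on screen but compare as different...
print(composed == decomposed)                        # False
# ...until both are brought into the same canonical form.
print(canonical(composed) == canonical(decomposed))  # True
```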

Take XML as an example. XML allows syntactic variation: the same content can be written with attributes in a different order, with different whitespace inside tags, or with an empty element written in either its short or its long form. This means that two documents that are not identical character for character could have the same canonical form, and thus be functionally equivalent. The Canonical XML specification was designed to address this by defining a method for determining whether separate documents are equivalent. The method for generating the canonical form for any given XML document is called the XML canonicalization method.
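As a rough illustration, Python's standard library (version 3.8 and later) provides `xml.etree.ElementTree.canonicalize`, which can show two differently written documents reducing to the same canonical form. The two sample documents here are made up for this example.

```python
from xml.etree.ElementTree import canonicalize

# Two hypothetical documents: different attribute order, quoting,
# spacing, and empty-element style, but the same content.
doc_a = '<item b="2" a="1"/>'
doc_b = "<item a='1'   b='2'></item>"

canonical_a = canonicalize(doc_a)
canonical_b = canonicalize(doc_b)

print(doc_a == doc_b)              # False: the raw text differs
print(canonical_a == canonical_b)  # True: the canonical forms match
```

Comparing canonical forms rather than raw text is what makes it possible to treat the two documents as equivalent, for example when checking a digital signature.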

For URL canonicalization, the idea is to refer to a specific webpage consistently by one URL. The simplest example is two versions of a homepage, one of which has the three w’s and the other doesn’t:

http://www.wisegeek.com

versus

http://wisegeek.com

This is a problem for SEO (Search Engine Optimization) because traffic reports are split across the two URLs, even though all of the traffic is actually going to the same place. The result is that a site with multiple URLs for the same pages seems to be performing more poorly than it actually is.

There are other issues besides the w’s. These include trailing slashes and URL versions that differ only in their use of upper and lower case letters. Matt Cutts of Google® recommends addressing this with a permanent (301) redirect from all alternative URLs to the URL you want, which signals to search engines which URL is the canonical one.
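A site that issues such redirects first needs a rule for computing the preferred URL from any variant. The sketch below is one possible rule, not a recommendation from Google. It assumes the "www" form is preferred, that paths are case-insensitive on this site, and that trailing slashes are dropped. The function name `canonical_url`, the constant `PREFERRED_HOST`, and the sample article path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.wisegeek.com"  # assumption: the "www" form is canonical


def canonical_url(url: str) -> str:
    """Map URL variants of the same page onto one preferred URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lower-cased
    if host == "wisegeek.com":
        host = PREFERRED_HOST
    # Assumption: this site treats paths as case-insensitive and
    # prefers no trailing slash (the bare root stays "/").
    path = parts.path.lower().rstrip("/") or "/"
    # Ports and fragments are dropped in this simplified sketch.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


print(canonical_url("http://wisegeek.com"))       # http://www.wisegeek.com/
print(canonical_url("HTTP://WWW.WiseGeek.com/"))  # http://www.wisegeek.com/
```

A web server would compare the requested URL with the result and, if they differ, answer with a 301 redirect to the canonical version.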
