

What is XML or Extensible Markup Language?

R. Kayne

Last Modified Date: February 18, 2024

XML (eXtensible Markup Language) is a simplified subset of SGML (Standard Generalized Markup Language), a larger and more complex markup language. In the simplest terms, XML uses tags to label the different types of data in a file.

XML makes it easy for programs to extract data because the tags follow a known data model. A simple client record, for example, might use a data model with 7 elements:


Client (parent element): contains 6 nested elements (name, street, city, state, zip and phone).
Name: Any program that understands this model knows this field holds the name of a person or company.
Street: This field will hold a street address.
City: Here it will find the city.
State: The abbreviated 2-letter state code.
Zip: The 5-digit zip code.
Phone: The phone number.
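The article's original example document is not reproduced here. As a minimal sketch, the record below follows the 7-element model just described (all values are made up for illustration), and Python's standard xml.etree.ElementTree module reads each field by its tag name rather than by its position on a page. The function name read_client is hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical client record following the 7-element model:
# one <client> parent element with six nested fields.
CLIENT_XML = """<client>
  <name>Example Hardware Co.</name>
  <street>123 Main Street</street>
  <city>Springfield</city>
  <state>IL</state>
  <zip>62701</zip>
  <phone>555-0100</phone>
</client>"""

def read_client(xml_text: str) -> dict:
    """Return the client's fields as a dict keyed by tag name."""
    root = ET.fromstring(xml_text)
    # Each child's tag says what data it holds, so no guessing is needed.
    return {child.tag: child.text for child in root}

client = read_client(CLIENT_XML)
print(client["city"])  # Springfield
print(client["zip"])   # 62701
```

Because the program asks for "city" by name, it keeps working even if a page that displays this record is redesigned.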

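To define the data model, or content, of each element, a DTD (Document Type Definition) can be used. A DTD is one way to define the structure, or tree, of an XML document. XML Schema (XSD), a W3C recommendation, is another, and earlier proposals included DCD (Document Content Description) and DDML (Document Definition Markup Language). SAX (Simple API for XML), which is sometimes listed alongside these, is not a way to define document structure but an event-based programming interface for reading XML documents.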
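A DTD for the client model might contain the declarations shown in the string below. Python's standard-library ElementTree parser does not validate documents against a DTD (third-party libraries such as lxml can), so this minimal sketch keeps the DTD as reference text and checks by hand the same content model it declares: a client element containing exactly the six fields, in order, each holding only text. The function name matches_client_model and the sample documents are hypothetical, and the check covers only element structure, not everything a DTD can express.

```python
import xml.etree.ElementTree as ET

# A DTD declaring the client model. Shown for reference only:
# ElementTree reads XML but does not validate against DTDs.
CLIENT_DTD = """
<!ELEMENT client (name, street, city, state, zip, phone)>
<!ELEMENT name   (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city   (#PCDATA)>
<!ELEMENT state  (#PCDATA)>
<!ELEMENT zip    (#PCDATA)>
<!ELEMENT phone  (#PCDATA)>
"""

# The sequence (name, street, city, state, zip, phone) from the DTD.
EXPECTED_CHILDREN = ["name", "street", "city", "state", "zip", "phone"]

def matches_client_model(xml_text: str) -> bool:
    """Hypothetical hand-rolled check of the DTD's element structure."""
    root = ET.fromstring(xml_text)
    if root.tag != "client":
        return False
    children = list(root)
    # Each field must appear exactly once, in the declared order.
    if [child.tag for child in children] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA): each field holds text only, with no nested elements.
    return all(len(child) == 0 for child in children)

complete = ("<client><name>A</name><street>B</street><city>C</city>"
            "<state>IL</state><zip>62701</zip><phone>555-0100</phone></client>")
missing_phone = ("<client><name>A</name><street>B</street><city>C</city>"
                 "<state>IL</state><zip>62701</zip></client>")

print(matches_client_model(complete))       # True
print(matches_client_model(missing_phone))  # False
```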

Although the simplified example above only hints at the basics of this language, it already shows how XML differs from HyperText Markup Language (HTML). HTML tags dictate how material should be presented on a webpage, without indicating what the material is. As a result, manipulating or reusing the data inside an HTML file for other purposes is difficult. Data contained in an XML document, by contrast, can be manipulated, extracted by various database systems and reused.
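To make the point about reuse concrete, this minimal sketch reads client records out of XML with ElementTree and loads them into an in-memory SQLite database using Python's standard sqlite3 module, after which ordinary SQL can query them. The records, the trimmed-down model (name, city and state only), the table name and the function name load_clients are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML holding several simplified client records.
CLIENTS_XML = """<clients>
  <client><name>Example Hardware Co.</name><city>Springfield</city><state>IL</state></client>
  <client><name>Sample Bakery</name><city>Portland</city><state>OR</state></client>
  <client><name>Demo Books</name><city>Salem</city><state>OR</state></client>
</clients>"""

def load_clients(xml_text: str) -> sqlite3.Connection:
    """Parse client records and copy them into an in-memory SQLite table."""
    root = ET.fromstring(xml_text)
    rows = [
        (c.findtext("name"), c.findtext("city"), c.findtext("state"))
        for c in root.findall("client")
    ]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE client (name TEXT, city TEXT, state TEXT)")
    conn.executemany("INSERT INTO client VALUES (?, ?, ?)", rows)
    return conn

conn = load_clients(CLIENTS_XML)
# Once loaded, the same data can be reused with ordinary SQL.
for (name,) in conn.execute(
        "SELECT name FROM client WHERE state = 'OR' ORDER BY name"):
    print(name)
# Demo Books
# Sample Bakery
```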

When HTML is used to create webpages, style sheets are often used as well. Known formally as Cascading Style Sheets (CSS), they add presentation rules to an HTML page.

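Webpages can also be written in XML. In this case the counterpart to CSS is XSL (eXtensible Stylesheet Language), which is linked to an XML document through an xml-stylesheet processing instruction, much as a CSS file is linked from an HTML page.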

XSL actually serves two functions. One part, XSL Formatting Objects (XSL-FO), describes how the content should be formatted for presentation. The other, XSL Transformations (XSLT), contains instructions for transforming the data into other formats. XSLT can generate a file whose structure differs from the original. This is especially useful in areas like e-commerce, where customer input such as a name, credit card number and dollar amount is passed through a series of programs to process a payment. XSLT's transforming function is independent of rendering: it is concerned solely with reshaping data so that it can move between networks and programs for processing. For this kind of information exchange, XML is a clear choice over HTML.
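Python's standard library has no XSLT processor (the third-party lxml package provides one through lxml.etree.XSLT), so this minimal sketch performs by hand the kind of restructuring an XSLT stylesheet would: it reads a hypothetical order document and emits a differently structured payment document containing only what a payment step might need. The element names, the order format and the function name order_to_payment are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical customer order as captured by a shopping site.
ORDER_XML = """<order>
  <customer><name>Jane Doe</name><email>jane@example.com</email></customer>
  <item sku="A-100" qty="2" price="9.50"/>
  <item sku="B-200" qty="1" price="20.00"/>
</order>"""

def order_to_payment(xml_text: str) -> str:
    """Reshape an <order> into a flatter <payment> document.

    This stands in for an XSLT stylesheet: the output has a different
    structure from the input, and no presentation is involved.
    """
    order = ET.fromstring(xml_text)
    # Decimal avoids floating-point rounding errors in money amounts.
    total = sum(
        int(item.get("qty")) * Decimal(item.get("price"))
        for item in order.iter("item")
    )
    payment = ET.Element("payment", currency="USD")
    ET.SubElement(payment, "payer").text = order.findtext("customer/name")
    ET.SubElement(payment, "amount").text = f"{total:.2f}"
    return ET.tostring(payment, encoding="unicode")

print(order_to_payment(ORDER_XML))
# <payment currency="USD"><payer>Jane Doe</payer><amount>39.00</amount></payment>
```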

XML, created by the World Wide Web Consortium (W3C), belongs to a large family of markup languages and is defined as a metalanguage: a language that describes other languages. One of the W3C's goals was to make XML "optionless," keeping optional features to a minimum so that every XML processor would handle a document the same way. HTML, by contrast, has many differing conventions and as a result is rendered differently by various browsers, making it difficult to present data uniformly.
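One practical effect of this strictness is that an XML parser must stop with an error when a document is not well-formed, for example when an end tag is missing, whereas HTML parsers are built to recover. This minimal sketch uses two standard-library Python modules to show the difference: xml.etree.ElementTree rejects markup with unclosed p elements, while html.parser.HTMLParser reads the same markup without complaint. The sample markup and the names is_well_formed_xml and TagCollector are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Two paragraphs opened but never closed: acceptable HTML, invalid XML.
MARKUP = "<body><p>First paragraph<p>Second paragraph</body>"

def is_well_formed_xml(text: str) -> bool:
    """Return True if the text parses as XML, False on a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

class TagCollector(HTMLParser):
    """Records start tags; HTMLParser tolerates the missing </p> end tags."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(MARKUP)
collector.close()

print(is_well_formed_xml(MARKUP))  # False: the <p> elements are never closed
print(collector.tags)              # ['body', 'p', 'p']
```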

Tests released by the W3C in March 2005 revealed that Microsoft Internet Explorer 6.0 SP2 had limited XML ability, reportedly using its own flavor of the language that did not always comply with the standard. Netscape had good compatibility, with a few problems in the 8.0 beta version, while Firefox and Mozilla had the best results among free browsers, with fully implemented, 100% compatible XML rendering in all of their browser versions available at the time.

Because XML pages provide so much more flexibility than HTML pages, many expected XML to replace HTML as the language of choice. For more information, you can visit the official W3C site; online tutorials and many books are also available. Learning the language may take some ramp-up time, but experts consider the investment well worth it.


Discussion Comments

nony

May 10, 2011

@NathanG - I'd like to add that the biggest advantage of XML data is that it can go through firewalls. In the old days (yes, the Internet is really old) the format chosen to send meaningful data across networks was either text or binary. Text would traverse networks just fine, but binary data would get clobbered by the firewalls because this kind of data can be “executed” on the end user’s machine and thus represented a security threat.

XML data is not binary data—an XML file cannot “do” anything on your computer and thus is not a security threat. However, unlike mere text files an XML file can be viewed with an XML viewer or queried using XML query languages like XPath, making it more like a database as you said.
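Python's standard xml.etree.ElementTree module supports a limited subset of XPath, which is enough to sketch the kind of query this comment describes. The document, element names, attribute values and client names below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with several client records.
CLIENTS_XML = """<clients>
  <client id="1"><name>Example Hardware Co.</name><state>IL</state></client>
  <client id="2"><name>Sample Bakery</name><state>OR</state></client>
  <client id="3"><name>Demo Books</name><state>OR</state></client>
</clients>"""

root = ET.fromstring(CLIENTS_XML)

# Child-text predicate: names of all clients whose <state> is "OR".
oregon = [n.text for n in root.findall("./client[state='OR']/name")]

# Attribute predicate: the client element whose id attribute is "1".
first = root.find("./client[@id='1']")

print(oregon)                  # ['Sample Bakery', 'Demo Books']
print(first.findtext("name"))  # Example Hardware Co.
```

For full XPath 1.0 support, including functions and arbitrary axes, a third-party library such as lxml is typically used instead.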

NathanG

May 8, 2011

@hamje32 - I beg to differ with your claim (and that of this article) that XML will replace HTML. HTML is a markup language designed to provide a convention or style for how data is to be presented on the web. XML is not concerned about presentation. It is concerned about the nature of the data itself—what it’s called and what the names are for the individual fields.

Think of XML as a database “lite.” In fact, there are some open source databases out there that use XML as their data structure because of its hierarchical format.

hamje32

May 7, 2011

XML may indeed replace HTML one day as the de facto standard for data representation, but XML is far stricter than HTML about rules and conventions. For example, in XML you need both a start tag and an end tag (or a self-closing empty tag) to complete an element, otherwise the browser will complain that your XML is not well-formed.

HTML makes no such distinction. In HTML, if you put a paragraph break mark, for example, you can get away with just using the beginning mark and skipping the ending mark altogether. The HTML will still display properly. XML is definitely not a lazy standard, but it certainly is the future of the web.