Schema.org - An Introduction to Structured Data

Schema.org is a standardized way of marking up the HTML on your website so that search engines can more easily understand what your pages are about. In this post I'll go over what schema.org is all about, the problems it solves, and why it should be an important weapon in your web-designer arsenal.

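Schema.org is a shared vocabulary for structured data. You can think of it as a kind of sub-language within HTML: a set of agreed-upon type and property names that you attach to your existing markup, most commonly through the HTML microdata attributes itemscope, itemtype and itemprop (RDFa and JSON-LD work too). Because the names are used in a very specific way, search engines and other robots can instantly recognize them.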

That last bit about robots and search engines is very important. Schema.org is meant to be read by machines, not humans, i.e. it's machine-readable. As you might know, machines can be very unforgiving, so a mistake in your schema.org markup can cause the machine in question to disregard that bit of markup entirely.

So What's Wrong With My Existing HTML?

Nothing really. You're probably an intelligent and handsome guy or gal who uses the latest HTML9 responsive goopstrap5.2 web-standards in all your markup, and that's all well and good.

However, let's imagine you're creating a website for a local Indian restaurant, and let's say the name of that restaurant is "India Express".

How is Google supposed to know that your website is about India Express the restaurant, and not India Express the newspaper, or India Express the movie?

You can certainly use keywords like 'restaurant' on your page, but with schema.org we can do a lot more.

Schema.org markup lets you tell Google that your page is about a thing: a person, movie, product, review, restaurant, recipe, business, event, song and so on. There are hundreds of schema.org item types available at this point, and the list is growing.
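To make that concrete, here is a rough sketch in Python of how a machine might read such markup. The page fragment, its phone number, and the MicrodataReader helper are all made up for illustration, and real search-engine crawlers are far more thorough. The itemscope, itemtype and itemprop attributes are standard HTML microdata, while Restaurant, name, servesCuisine and telephone come from the schema.org vocabulary.

```python
from html.parser import HTMLParser

# Hypothetical page fragment for the "India Express" restaurant.
# The phone number is invented for this example.
PAGE = """
<div itemscope itemtype="https://schema.org/Restaurant">
  <h1 itemprop="name">India Express</h1>
  <p>We serve <span itemprop="servesCuisine">Indian</span> food.</p>
  <p>Call us: <span itemprop="telephone">555-0100</span></p>
</div>
"""


class MicrodataReader(HTMLParser):
    """Toy reader: records the item type and the text of each itemprop."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemtype" in attrs:
            self.item_type = attrs["itemtype"]
        if "itemprop" in attrs:
            # Remember which property the upcoming text belongs to.
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        if self._current_prop and data.strip():
            self.properties[self._current_prop] = data.strip()
            self._current_prop = None

    def handle_endtag(self, tag):
        self._current_prop = None


def read_microdata(html):
    """Return (item type URL, {property name: text}) for a simple fragment."""
    reader = MicrodataReader()
    reader.feed(html)
    reader.close()
    return reader.item_type, reader.properties
```

Calling read_microdata(PAGE) gives back the Restaurant type URL plus a dictionary holding the name, servesCuisine and telephone values, so there is no guessing about which "India Express" the page means.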

Data, from Structured to Unstructured and Back

When we use a CMS like WordPress or Drupal, we often store our data in a database. It's the job of the database to structure data in a way that makes sense to a machine.

However, when we suck that data out of the database and stick it into our HTML templates, the data loses its structure. We take it from a machine-readable format and turn it into a human-readable format.

From the perspective of a human being, this is a good thing. It's much more comfortable for us humans to read "1:16pm on April 29th, 2014" than "2014-04-29T13:16:30+00:00", for example. However, just as humans are not very good at reading machine-readable data, machines are not very good at reading human-readable data.
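The date example above can be sketched in a few lines of Python. The ordinal and humanize helpers are hypothetical names invented for this illustration; datetime.fromisoformat is part of Python's standard library (3.7 and later). Going from the machine format to the human one is easy, while the reverse direction trips the parser up.

```python
from datetime import datetime


def ordinal(day):
    """Return the day with its English suffix, e.g. 1st, 22nd, 29th."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def humanize(timestamp):
    """Turn an ISO 8601 timestamp into a reader-friendly string."""
    moment = datetime.fromisoformat(timestamp)
    hour = moment.hour % 12 or 12          # 12-hour clock
    meridiem = "am" if moment.hour < 12 else "pm"
    month = moment.strftime("%B")          # full month name
    return f"{hour}:{moment.minute:02d}{meridiem} on {month} {ordinal(moment.day)}, {moment.year}"


def machine_can_read(text):
    """True if the strict ISO parser accepts the text as-is."""
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False
```

Here humanize("2014-04-29T13:16:30+00:00") produces "1:16pm on April 29th, 2014", while machine_can_read("1:16pm on April 29th, 2014") comes back False: the friendly version is exactly the kind of text a strict parser gives up on.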

The funny thing about search engines, though, is that all they have to work with is the human-readable data. Our webpages are made for humans after all, not machines.

If Only Humans and Machines Could Just Get Along

The beauty of schema.org is that it provides machine-readable data from inside your human-readable content.

Schema.org markup is visible to machines and invisible to humans. That means it doesn't affect what people see on your site. It makes your webpage friendly to both humans and machines.
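As a small illustration of those two views, the sketch below takes one line of HTML and pulls out both what a visitor would see and what a machine would read. The snippet and the TwoViews helper are made up for this example; the time element with its datetime attribute is standard HTML, and datePublished is a schema.org property.

```python
from html.parser import HTMLParser

# Hypothetical line from a blog template.
SNIPPET = (
    '<p>Posted at <time itemprop="datePublished" '
    'datetime="2014-04-29T13:16:30+00:00">1:16pm on April 29th, 2014</time></p>'
)


class TwoViews(HTMLParser):
    """Toy parser that separates visible text from machine-readable values."""

    def __init__(self):
        super().__init__()
        self.visible_text = []
        self.machine_values = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # For a <time> element, the machine-readable value is its datetime attribute.
        if tag == "time" and "itemprop" in attrs and "datetime" in attrs:
            self.machine_values[attrs["itemprop"]] = attrs["datetime"]

    def handle_data(self, data):
        # Text between tags is what ends up on screen.
        self.visible_text.append(data)


def two_views(html):
    """Return (text a visitor sees, {property: machine-readable value})."""
    parser = TwoViews()
    parser.feed(html)
    parser.close()
    return "".join(parser.visible_text), parser.machine_values
```

Running two_views(SNIPPET) returns the sentence "Posted at 1:16pm on April 29th, 2014" for the human and a datePublished value of "2014-04-29T13:16:30+00:00" for the machine, both from the same piece of markup.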

And for this, the machines will reward you. Maybe they'll even throw in something extra, like Google does with its Rich Snippet search results, which you can earn by using schema.org markup.

If this sounds like something you want to learn, then check out the next article in this series, Schema.org 101 - An Introduction.