Tuesday, December 7, 2010

How to upload to Freebase part I: domains and types and properties, oh my!

I've been admiring the Wikipedia of databases, Freebase, from afar for a long time. I've kept my distance because if you think learning wiki syntax is a challenge, try figuring out where to even put factoids about your favorite books or musicians on Freebase. If you want your data to be useful to more people than just yourself, though, it is critical that you get the organization right. In honor of Open Data Day last weekend, I've decided to finally figure out how to upload data to Freebase (instead of adding three things by hand and giving up when the knowledge that I could add 1000 with a little PHP gets too unbearable).

The first step when you'd like to add data to Freebase is pretty easy: determine if your data would be useful to other people, and if you have a right to upload it. If the answer to both is yes, proceed!

But then we're on to step 2 -- where does your data belong? You need to determine your data's structure. My first through tenth passes at adding to Freebase probably went through the Basic Concepts wiki, which suffers from the downfall of many a wiki page -- really bad organization. (It looks like someone played a hand of Yahtzee with all the abstract concepts you need to understand to put your data in the right place on Freebase, then wrote it up in wiki form. I'd re-organize, but every attempt I've made on Wikipedia to cut down redundant information and leave things in reasonable order has left me reverted, scolded, and frustrated.)

So let me try to clarify here (feel free, Freebase wiki editors, to grab any content that's useful, but please don't just sprinkle it hither and thither within the page).

My first hope that I might make sense of Freebase yet came from the Freebase Schema Explorer app.

Just the name gave me hope. The schema is the structure of the data, so a schema viewer is just what I was in the market for.

Right away, I saw that Freebase data is organized in domains, like Books (accessed at http://www.freebase.com/view/book).



Domains (like Books) have types, like Poem, Short Story, and Book (accessed at http://www.freebase.com/view/book/book). This could be confusing, because other users, probably baffled by the wiki like me, have added things like ISBN and Book Character as types of books. I know we're in data hippie land and everyone is a special flower, but that's frankly wrong. "Book Character" is not a type of book.



Types (like Poem, Short Story, or Book) can have both instances and properties, accessed at http://www.freebase.com/view/book/book. For the type Book in the domain Books, an instance would be something like The Catcher in the Rye. (It seems that Freebase calls instances "topics", but the Schema Explorer nails it better with "instance," I think.) Examples of book properties include Characters and Genre.
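To keep the hierarchy straight in my own head, here is a rough Python sketch of how domains, types, properties, and instances nest. The class names (Domain, FreebaseType) are my own invention for illustration and are not anything Freebase itself provides; the example values are just the ones mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class FreebaseType:
    """A type within a domain, e.g. Book within Books (hypothetical model)."""
    name: str
    properties: list[str] = field(default_factory=list)
    # Freebase calls these "topics"; the schema app calls them instances.
    instances: list[str] = field(default_factory=list)


@dataclass
class Domain:
    """A top-level grouping of types, e.g. Books (hypothetical model)."""
    name: str
    types: dict[str, FreebaseType] = field(default_factory=dict)

    def add_type(self, t: FreebaseType) -> None:
        self.types[t.name] = t


# Domain -> types -> properties and instances, using examples from this post.
books = Domain("Books")
books.add_type(FreebaseType(
    name="Book",
    properties=["Characters", "Genre"],
    instances=["The Catcher in the Rye"],
))
books.add_type(FreebaseType(name="Poem"))
books.add_type(FreebaseType(name="Short Story"))

book = books.types["Book"]
print(book.instances[0], "has properties:", ", ".join(book.properties))
```

The point is only the shape: a domain holds types, and each type holds both its properties (the columns, if you like) and its instances (the rows).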



For the data I'll probably add to Freebase first -- podcasts -- the domain is Broadcast (/broadcast), with the type Podcast Feed (http://www.freebase.com/view/broadcast/podcast_feed).
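The URLs above follow a visible pattern: a view prefix, then the domain, then the type. As a small sketch, assuming only the simple two-part form seen in this post (other kinds of IDs may well be longer), the helpers below split an ID like /broadcast/podcast_feed and rebuild the view URL. The function names are mine, not part of any Freebase library.

```python
VIEW_BASE = "http://www.freebase.com/view"  # prefix used by the URLs in this post


def split_type_id(type_id: str) -> tuple[str, str]:
    """Split '/domain/type' into (domain, type).

    Assumes the two-part form only; anything else raises ValueError.
    """
    parts = type_id.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected '/domain/type', got {type_id!r}")
    return parts[0], parts[1]


def view_url(type_id: str) -> str:
    """Build the view URL for a two-part type ID."""
    domain, type_key = split_type_id(type_id)
    return f"{VIEW_BASE}/{domain}/{type_key}"


print(split_type_id("/broadcast/podcast_feed"))
print(view_url("/broadcast/podcast_feed"))
print(view_url("/book/book"))
```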

As of this writing, there are 2,584 Podcast Feed instances/topics. Their properties include Name (example: Wired's Alt Text), Image, and Average Media Length. Already I see a flaw, which I'll need to correct if I'm going to use this data for my Podcast Finder -- there's no podcast creator (or Podcaster) property listed. And the Freebase wiki noted that one can't edit a schema created by another Freebase user -- I'd have to duplicate all 2,584 instances/topics and set up my own schema.
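The check I just did by eyeball is really a set difference: the properties my project needs, minus the properties the type already has. A minimal sketch follows; the "existing" set holds only the three properties named above (the real type may have more), and the "needed" set is my own guess at what a Podcast Finder would want.

```python
def missing_properties(existing: set[str], needed: set[str]) -> list[str]:
    """Return the needed properties the schema lacks, sorted for stable output."""
    return sorted(needed - existing)


# Only the properties mentioned in this post; not the full Podcast Feed schema.
podcast_feed_properties = {"Name", "Image", "Average Media Length"}

# Hypothetical wish list for the Podcast Finder.
podcast_finder_needs = {"Name", "Podcaster"}

print(missing_properties(podcast_feed_properties, podcast_finder_needs))
```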

Will I figure out how to add the Podcaster property? Will I duplicate the Podcast Feed type? Will I lose my Internet connection because I stayed up too late blogging, slept through a WebEx, and lost my job?

Find out in the next installment of this exciting series, wherein I shall explore yet another query language, MQL, and hopefully start adding all information ever to an easily-queried free online database.

(Or I'll take another two years off blogging and come back in 2012 blogging about how excited I am about our new lady President. Or arsenic-based space aliens. Stay tuned!)
