From Topic Maps to MediaWiki – Quick and Dirty

July 1, 2009 by Thomas Hansen

Recently, I needed to make some fairly large bodies of XML available for editing by a group of people. In this case the data was stored in the Topic Maps format (XTM), and as long as I was the only one editing the files, this had been working just fine.

But with more people about to join in, it was clear that editing the files in a simple text editor wasn’t such a good idea. So, to avoid the risk of ending up with different versions (and people endlessly complaining about editing XML), I decided to turn the whole thing into a wiki.

Now, MediaWiki has the Special:Export tool for migrating wikis (‘transwikiing’). It exports pages in a simple XML format, so that you can import them into another wiki. This means you can create a wiki simply by emulating the MediaWiki XML export format.

How to

If you want to try it, the MediaWiki output has to look a little something like this:

<?xml version="1.0" encoding="utf-8"?>
<mediawiki xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.mediawiki.org/xml/export-0.3/"
  xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/
  http://www.mediawiki.org/xml/export-0.3.xsd"
  version="0.3" xml:lang="da">
<page>
 <title>Google</title>
 <id>1</id>
 <revision>
  <id>1</id>
  <timestamp/>
  <contributor>
   <username>yourUserName</username>
   <id>1</id>
  </contributor>
  <text xml:space="preserve">
<!-- Wikitext goes here; no leading spaces, since MediaWiki renders indented lines as preformatted text -->
==Links==
[http://www.google.com]
</text>
  </revision>
</page>
<page>
 <title>Microsoft</title>
 <id>2</id>
 ...
</page>
</mediawiki>

If your data is XTM, your starting point might be something like this made-up Topic Map with names and links of three companies:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topicMap SYSTEM "xtm1.dtd">
<topicMap id="companies-tm.xtm"
  xmlns="http://www.topicmaps.org/xtm/1.0/"
  xmlns:xlink="http://www.w3.org/1999/xlink">
 <topic id="001">
  <baseName>
   <baseNameString>Google</baseNameString>
  </baseName>
  <occurrence>
   <resourceRef xlink:href="http://www.google.com"/>
  </occurrence>
 </topic>
 <topic id="002">
  <baseName>
   <baseNameString>Microsoft</baseNameString>
  </baseName>
  <occurrence>
   <resourceRef xlink:href="http://www.microsoft.com"/>
  </occurrence>
 </topic>
 <topic id="003">
  <baseName>
   <baseNameString>Oracle</baseNameString>
  </baseName>
  <occurrence>
   <resourceRef xlink:href="http://www.oracle.com"/>
  </occurrence>
 </topic>
</topicMap>

In this case the following XSLT stylesheet will do the job:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tm="http://www.topicmaps.org/xtm/1.0/"
   xmlns:tmlink="http://www.w3.org/1999/xlink"
   xmlns="http://www.mediawiki.org/xml/export-0.3/"
   exclude-result-prefixes="tm tmlink" version="2.0">
 <xsl:output method="xml" encoding="utf-8" indent="yes"/>
 <xsl:template match="/">
  <xsl:apply-templates select="tm:topicMap"/>
 </xsl:template>
 <xsl:template match="tm:topicMap">
  <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/
  http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="da">
   <xsl:apply-templates select="tm:topic"/>
 </mediawiki>
</xsl:template>
<xsl:template match="tm:topic">
<page>
 <title>
  <xsl:apply-templates select="tm:baseName/tm:baseNameString"/>
 </title>
 <id><!--To give each page a unique number, use the xsl:number instruction--><xsl:number/></id>
 <revision>
  <id>1</id>
  <timestamp/>
  <contributor>
   <username>yourUserName</username>
   <id>2</id>
  </contributor>
  <!--Since whitespace is crucial to the layout of your wikipage,
you should add the xml:space attribute and set the value to 'preserve'-->
  <text xml:space="preserve">
  <!--Now start building your wikipage -->

==Links==
[<xsl:value-of select="tm:occurrence/tm:resourceRef/@tmlink:href"/>]

</text>
</revision>
</page>
</xsl:template>
</xsl:stylesheet>
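
Before building the full export file, it can be handy to preview the wikitext each topic will produce. The following is only a sketch, assuming the XTM 1.0 structure of the sample map above (baseName/baseNameString, and occurrence/resourceRef with an xlink:href attribute). It writes plain text instead of MediaWiki XML: one heading per topic and one bulleted external link per occurrence, so topics with several occurrences are covered too.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tm="http://www.topicmaps.org/xtm/1.0/"
   xmlns:tmlink="http://www.w3.org/1999/xlink"
   version="2.0">
 <!--Plain text output: this is a preview, not an import file-->
 <xsl:output method="text" encoding="utf-8"/>
 <xsl:template match="/">
  <xsl:for-each select="tm:topicMap/tm:topic">
   <!--The first base name becomes a wikitext heading-->
   <xsl:text>==</xsl:text>
   <xsl:value-of select="normalize-space(tm:baseName[1]/tm:baseNameString[1])"/>
   <xsl:text>==&#10;</xsl:text>
   <!--One bulleted external link per occurrence-->
   <xsl:for-each select="tm:occurrence/tm:resourceRef">
    <xsl:text>* [</xsl:text>
    <xsl:value-of select="@tmlink:href"/>
    <xsl:text>]&#10;</xsl:text>
   </xsl:for-each>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

Saved under a name of your choice (say previewWikitext.xsl, a made-up file name), it can be run with the same saxon command as the main stylesheet. If the headings and links look right here, the same XPath expressions should be safe to use inside the text element of the export.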

To sum up:

  • Make sure that your wiki is installed, AND that you have admin rights
  • Create a stylesheet, somewhat like the one provided above
  • Run the stylesheet on your XML file, for instance from your command line with saxon:
    $ saxon topics.xtm topicMaps2Mediawiki.xsl > mediawikiTopics.xml
  • Go to the Special:Import page on your wiki
  • Browse for the file, and
  • Upload! Do remember, however, that the maximum file size defaults to around 1.4 MB. To change it, you need to edit php.ini and raise the value of upload_max_filesize (and, if necessary, post_max_size).

After uploading the file, you’ll receive a list of links to the pages you just made.

The Case for Content Strategy

June 25, 2009 by Thomas Hansen

Over the last couple of years I’ve come to appreciate the term content strategy. It began in 2007 with Rachel Lovinger’s article Content Strategy: The Philosophy of Data. Here she urged readers to take a closer look at content itself, and then find out exactly who’s responsible for making it relevant, comprehensive, and efficient to produce.

I liked that, because it touches upon the very basics of communication, something which, I think, is somewhat neglected in favour of design issues (keeping sentences short, using chunked text, putting action in verbs, etc.). Way too often, content is taken for granted. It’s what the customer brings to the agency, or something to be filled in later in place of the “lorem ipsum” gibberish designers use.

Basically, content strategy addresses the issues of anyone trying to communicate anything, i.e. how to make your website function as:

  • a truthful representation of the sender’s intentions
  • a message relevant to the user
  • a correct use of language and imagery
  • an open channel between reader and author

And, of course, if you’re any good at writing, your text might even have an aesthetic value on its own.

Producing useful and usable web content on a daily basis isn’t a matter of being touched by the hand of god, or being handed the perfect content by your client; it’s a matter of planning, and you need to be a part of it. Since internet communication involves quite a few disciplines, there’s a lot to plan for. A few things to consider:

  • Editorial strategy defines the guidelines by which all online content is governed: values, voice, tone, legal and regulatory concerns, user-generated content, and so on. This practice also defines an organization’s online editorial calendar, including content life cycles.
  • Web writing is the practice of writing useful, usable content specifically intended for online publication. This is a whole lot more than smart copywriting. An effective web writer must understand the basics of user experience design, be able to translate information architecture documentation, write effective metadata, and manage an ever-changing content inventory.
  • Metadata strategy identifies the type and structure of metadata, also known as “data about data” (or content). Smart, well-structured metadata helps publishers to identify, organize, use, and reuse content in ways that are meaningful to key audiences.
  • Search engine optimization is the process of editing and organizing the content on a page or across a website (including metadata) to increase its potential relevance to specific search engine keywords.
  • Content management strategy defines the technologies needed to capture, store, deliver, and preserve an organization’s content. Publishing infrastructures, content life cycles and workflows are key considerations of this strategy.
  • Content channel distribution strategy defines how and where content will be made available to users. (Side note: please consider e-mail marketing in the context of this practice; it’s a way to distribute content and drive people to find information on your website, not a standalone marketing tactic.)

I didn’t make that list (it comes from Kristina Halvorson, and it’s part of the article The Discipline of Content Strategy), but I agree. All of these branches are tools that help us create meaningful user experiences.

While there are obvious overlaps between content strategy and information architecture, I think that the first two disciplines on the list add something genuinely new. It’s not enough to structure the things on your website and make them findable; you also need to make sure that the very content you’re providing is right for the occasion.

So, ultimately, it’s all about efficiency, and planning supports efficiency. Since creating content is both difficult and expensive (and always seems to be somebody else’s job), you want to make sure that every aspect of it performs at its best, and therefore there’s good reason to take the concept of content strategy (CS) seriously.

See also Jeffrey MacIntyre’s eloquent Content-tious Strategy.

Introducing EPUB

June 15, 2009 by Thomas Hansen

With digital books finding their way to more and more people, reading happens everywhere and on a variety of different devices. A lot of these have small displays, and this is a problem if the text you’re reading is in PDF, which has a fixed page layout.

EPUB is an XML publishing format for reflowable digital books and publications standardized by the International Digital Publishing Forum (IDPF), a trade and standards association for the digital publishing industry. For the record, this organization was formerly known as Open eBook Forum. “Reflowable” means that it scales to fit different screen sizes.

Since its official adoption by IDPF in 2007, EPUB has become popular among major publishers such as Hachette, O’Reilly and Penguin. The format allows publishers to produce and send a single digital publication file through distribution, and it can be read using a variety of open source and commercial software. You can use O’Reilly’s Bookworm online for free, or you can download Adobe’s Digital Editions (ADE). EPUB works on all major operating systems, on e-book devices (like the Sony PRS), and on other small devices such as the Apple iPhone.

Collectively referred to as EPUB, the format is made up of three open standards:

  • Open eBook Publication Structure Container Format (OCF): Describes the directory tree structure and file format (zip) of an EPUB archive
  • Open Publication Structure (OPS): Specifies the common vocabularies for the eBook, especially the formats allowed to be used for book content (for example XHTML and CSS)
  • Open Packaging Format (OPF): Defines the required and optional metadata, reading order, and table of contents in an EPUB
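
To give an idea of what the OPF part looks like, here is a minimal sketch of a package file, assuming OPF 2.0. The file names (chapter1.xhtml, toc.ncx, style.css), the title and the identifier are placeholders made up for illustration; the specification remains the authority on which elements are required.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
  unique-identifier="BookId">
 <!-- Metadata: title, language and a unique identifier (placeholder values) -->
 <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example Book</dc:title>
  <dc:language>en</dc:language>
  <dc:identifier id="BookId">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
 </metadata>
 <!-- Manifest: every file that belongs to the publication -->
 <manifest>
  <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  <item id="style" href="style.css" media-type="text/css"/>
 </manifest>
 <!-- Spine: the reading order; the toc attribute points at the table of contents -->
 <spine toc="ncx">
  <itemref idref="chapter1"/>
 </spine>
</package>
```

The XHTML and CSS files listed in the manifest are the OPS part, and the zip archive that wraps everything up is what OCF describes.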

To learn more, Liza Daly of Threepress has done a nice tutorial called Build a digital book with EPUB, available at IBM developerWorks. To really get to know EPUB, you’ll need to read the specifications: OCF, OPS, and OPF.

Civilisation works great on TV

March 26, 2009 by Thomas Hansen

On smashing telly! I saw a Channel 4 programme, The 50 Greatest Documentaries, and among the ones featured was the BBC’s 1969 (colour!) venture Civilisation with Kenneth Clark.

One of the things that makes Civilisation great TV is that it’s such a personal account. This isn’t anonymous lecturing under the guise of scientific objectivity, but a passionate plea for culture in a society threatened by a cold war suddenly turning hot.

Here’s a little sample. Take it away, Kenneth!

The entire series is for sale here

Sensation: Knowledge isn’t power anymore

March 23, 2009 by Thomas Hansen

Jon Stewart’s crusade against CNBC sure stirred up a lot of dust. Just like the interview with Jim Cramer, one of CNBC’s financial experts, it made headlines all over the world.

I wasn’t that impressed. Sure, righteous Stewart exposing wicked Cramer was good television, but it wasn’t serious journalism.

I mean, of course CNBC’s (Baconian) statement that ‘knowledge is power’ is both bold and ridiculous. And claiming to be the ‘ONLY network with the knowledge YOU NEED’ doesn’t help either. It’s bold because CNBC is the tabloid version of financial television news, and it’s ridiculous because in the market, information is only valuable when it’s a secret. A piece of financial information which everyone knows is worthless, since the market has already accounted for it.

It’s a case of the old law of supply and demand. When knowledge is all around, it stops being valuable. And since the market is constantly changing, information considered correct and useful may turn out the next minute to be incorrect and useless.

Seriously, that’s no sensation, but I must admit: schadenfreude works on TV.

Where journalism is going

March 13, 2009 by Thomas Hansen

These days, with Depression 2.0 and all, it can be rewarding to do a quick recap of what actually sparked the whole thing. This little gem brings you

The Crisis of Credit Visualized from Jonathan Jarvis on Vimeo.

Clay Shirky on Categories and Time

February 26, 2009 by Thomas Hansen

Some time ago I commented on the value of folksonomies, and I basically (still) think that tagging can be as valuable as ‘scientific’ categorization schemes such as the Dewey Decimal Classification System.

In the meantime, however, I’ve come across this truly great Clay Shirky clip from 2005. It’s a conference talk on What Time Does to Categories, and one of his conclusions is that user-generated indexes like folksonomies may actually prove more durable than their ‘scientific’ predecessors.

Here’s why:

Out with the new, in with the old

February 18, 2009 by Thomas Hansen

For the moment it certainly seems as though the public uproar against Facebook’s recent changes to the Terms of Service has had an effect. On his blog Mark Zuckerberg says that:

A couple of weeks ago, we revised our terms of use hoping to clarify some parts for our users. Over the past couple of days, we received a lot of questions and comments about the changes and what they mean for people and their information. Based on this feedback, we have decided to return to our previous terms of use while we resolve the issues that people have raised.

Very well, at least for now. It’ll be interesting to see exactly how (and if) these issues will be resolved.

I remain sceptical, because Zuckerberg appears to be talking about the change of ToS as an attempt to get rid of what he regards as “overly formal and protective … [language]” in the old ToS.

But this is simply downplaying a genuine disagreement between Facebook and its users. The new ToS are suspended, not abolished, and so the question remains: Exactly who owns the content you create on Facebook? You do, for now; but for how long?

Facebook Owns You

February 17, 2009 by Thomas Hansen

If you’re just a teeny weeny bit like me, you don’t like the idea of Facebook owning your content AFTER you’ve closed your account. That’s why I’ve joined this group on… er… Facebook, and you should too.

What’s the fuss? Well, we’ve grown accustomed to owning our content ourselves; that was the deal according to the old Terms Of Service (TOS). But with the new TOS, it’s a different story:

“You hereby grant Facebook an irrevocable, perpetual, non-exclusive, transferable, fully paid, worldwide license (with the right to sublicense) to (a) use, copy, publish, stream, store, retain, publicly perform or display, transmit, scan, reformat, modify, edit, frame, translate, excerpt, adapt, create derivative works and distribute (through multiple tiers), any User Content you (i) Post on or in connection with the Facebook Service or the promotion thereof subject only to your privacy settings or (ii) enable a user to Post, including by offering a Share Link on your website and (b) to use your name, likeness and image for any purpose, including commercial or advertising, each of (a) and (b) on or in connection with the Facebook Service or the promotion thereof.”

Learn more from Erick Schonfeld on TechCrunch.

Update (February 18th, 2009): Over at Slashdot Ian Lamont has news about Facebook’s measures to contain the ToS fallout.

Morrissey’s Years Of Refusal Out Now!

February 16, 2009 by Thomas Hansen

Today is Moz day! The new album Years of Refusal is on sale now, and very soon I’ll be heading out into a beautifully snow-clad Copenhagen to get my copy.

Not gonna say very much about it just now, but from the few tracks I’ve heard it seems that there’s more edge to this one than to its 2006 predecessor Ringleader Of The Tormentors.

I’m not talking about the strength of the lyrics or the songs; it’s a production thing. When producing a Morrissey album, you should be able to appreciate the contradictory instead of trying to abolish it.

Perhaps Jerry Finn, who also produced You Are The Quarry back in 2004, thought so too. At any rate it seems that he has given much more prominence to the band (and most certainly the drummer) than Tony Visconti did on Ringleader Of The Tormentors. It adds friction, and I like that.

So sad that Years Of Refusal was to be Jerry Finn’s last Morrissey album. Finn recently died tragically at age 39.

For more info on downloads and the forthcoming tour check out itsmorrisseysworld.com or simply just have fun on myspace.