Archive Page 2

EPUB now available on Google Books

I’m happy to learn that Google Books has made its public domain books available for download in the EPUB format. This is a nice supplement to the existing image-based PDF versions, because you’re no longer tied to large displays – which, obviously, is where PDF works best.

In a previous post I outlined the advantages of EPUB, but they’re well worth restating: EPUB is a free, open standard designed to make text adapt (“reflow”) to even the smallest displays, and it’s supported by a growing ecosystem of digital reading devices.

All you need to get started on classics like Treasure Island is a reader. For instance, O’Reilly’s Bookworm is free to use online and available in a growing number of languages. If you’re an iPhone user, you can install Stanza. Perhaps I should add that both readers have been reviewed in Wired.

However, Google Books is not the only place you can download EPUBs: ManyBooks, Feedbooks, and Project Gutenberg offer them as well.

This is not transparency

A key factor in establishing authority on the internet is, as David Weinberger convincingly argued, transparency:

What we used to believe because we thought the author was objective we now believe because we can see through the author’s writings to the sources and values that brought her to that position. Transparency gives the reader information by which she can undo some of the unintended effects of the ever-present biases. Transparency brings us to reliability the way objectivity used to.

I agree with most of it, and perhaps the point can be further illustrated by a quick example. If you take a look at the Wikipedia article on the epistemological sense of, well, Transparency, the contrast between then and now becomes clear:

[Screenshot: the Wikipedia article on Transparency]
As you can see, there’s an explanation and a reference to an article by Professor Paul Boghossian. The reference is the interesting part, because in academia this is perfectly sufficient for convincing readers that the material can be trusted. At least, it leaves you with an idea of what to do when you get to the library.

But the internet isn’t like the research library at all. Here, anybody could have made the claim that a certain Paul Boghossian said such and such about transparency, but since links to supporting resources (Wikipedia’s article on Paul Boghossian, for one) are extremely few, the article isn’t transparent: it doesn’t meet Wikipedia’s own requirements for verifiability, let alone follow the conventions of the internet medium.

Transparency is not the new objectivity, but comprehensiveness just might be

In a terrific post, Transparency is the new objectivity, David Weinberger argues that the hyperlinked nature of the internet is reshaping our notions of authority. With everybody suddenly a potential author, the old claim to objectivity seems more and more trite and outworn:

Objectivity used to be presented as a stopping point for belief: If the source is objective and well-informed, you have sufficient reason to believe. The objectivity of the reporter is a stopping point for reader’s inquiry. That was part of high-end newspapers’ claimed value: You can’t believe what you read in a slanted tabloid, but our news is objective, so your inquiry can come to rest here. Credentialing systems had the same basic rhythm: You can stop your quest once you come to a credentialed authority who says, “I got this. You can believe it.” End of story.

Instead we demand transparency: to be able to “see through the author’s writings to the sources and values that brought her to that position.”

Transparency gives the reader information by which she can undo some of the unintended effects of the ever-present biases. Transparency brings us to reliability the way objectivity used to.

I think that this kind of “hyper-transparency” – where citing a book isn’t enough, and a link has to point to the actual resource – may be an essential feature of the internet medium. But while it is certainly a necessary condition for establishing reliability, it’s hardly a sufficient one. After all, what leads to reliability is not the number of hyperlinks to the author’s sources, but trust that the relevant aspects of the matter have been adequately dealt with.

So, instead of objectivity, I’d suggest ‘comprehensiveness’ as a condition for reliability. And it’s a sufficient one too, because on the internet comprehensiveness seems more than ever to subsume transparency.

From Topic Maps to MediaWiki – Quick and Dirty

Recently, I needed to make some fairly large bodies of XML available for editing by a group of people. In this case the data was stored in the Topic Maps format (XTM), and – as long as I was the only one editing the files – this had been working just fine.

But with more people about to join in, it was clear that editing the files in a simple text editor wasn’t such a good idea. So, to avoid the risk of ending up with different versions (and people endlessly complaining about editing XML), I decided to turn the whole thing into a wiki.

Now, MediaWiki has the Special:Export tool for migrating wikis (‘transwikiing’). It exports pages in a simple XML format, so that you can import them into another wiki. This means you can populate a wiki simply by emulating the MediaWiki XML export format and feeding the result to Special:Import.

How to

If you want to try it, the file you feed to MediaWiki has to look a little something like this:

<?xml version="1.0" encoding="utf-8"?>
<mediawiki xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.mediawiki.org/xml/export-0.3/"
  xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/
  http://www.mediawiki.org/xml/export-0.3.xsd"
  version="0.3" xml:lang="da">
<page>
 <title>Google</title>
 <id>1</id>
 <revision>
  <id>1</id>
  <timestamp/>
  <contributor>
   <username>yourUserName</username>
   <id>1</id>
  </contributor>
  <text xml:space="preserve">
<!-- Wikitext goes here; keep it flush left, since leading whitespace inside the text element ends up on the page -->
==Links==
[http://www.google.com]

  </text>
  </revision>
</page>
<page>
 <title>Microsoft</title>
 <id>2</id>
 ...
</page>
</mediawiki>

If your data is XTM, your starting point might be something like this made-up Topic Map with names and links of three companies:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topicMap SYSTEM "xtm1.dtd">
<topicMap id="companies-tm.xtm"
  xmlns="http://www.topicmaps.org/xtm/1.0/"
  xmlns:xlink="http://www.w3.org/1999/xlink">
 <topic id="001">
  <baseName>
   <baseNameString>Google</baseNameString>
  </baseName>
  <occurrence>
   <resourceRef xlink:href="http://www.google.com"/>
  </occurrence>
 </topic>
 <topic id="002">
  <baseName>
   <baseNameString>Microsoft</baseNameString>
  </baseName>
  <occurrence>
   <resourceRef xlink:href="http://www.microsoft.com"/>
  </occurrence>
 </topic>
 <topic id="003">
  <baseName>
   <baseNameString>Oracle</baseNameString>
  </baseName>
  <occurrence>
   <resourceRef xlink:href="http://www.oracle.com"/>
  </occurrence>
</topic>
</topicMap>

In this case the following XSLT stylesheet will do the job:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:tm="http://www.topicmaps.org/xtm/1.0/"
   xmlns:tmlink="http://www.w3.org/1999/xlink"
   exclude-result-prefixes="tm tmlink" version="2.0">
 <xsl:output method="xml" encoding="utf-8" indent="yes"/>
 <xsl:template match="/">
  <xsl:apply-templates select="tm:topicMap"/>
 </xsl:template>
 <xsl:template match="tm:topicMap">
  <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/
  http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="da">
   <xsl:apply-templates select="tm:topic"/>
 </mediawiki>
</xsl:template>
<xsl:template match="tm:topic">
<page>
 <title>
  <xsl:apply-templates select="tm:baseName/tm:baseNameString"/>
 </title>
 <id><!--To give each page a unique number, use the xsl:number instruction--><xsl:number/></id>
 <revision>
  <id>1</id>
  <timestamp/>
  <contributor>
   <username>yourUserName</username>
   <id>2</id>
  </contributor>
  <!--Since whitespace is crucial to the layout of your wikipage,
you should add the xml:space attribute and set the value to 'preserve'-->
  <text xml:space="preserve">
  <!--Now start building your wikipage -->

==Links==
[<xsl:value-of select="tm:occurrence/tm:resourceRef/@tmlink:href"/>]

</text>
</revision>
</page>
</xsl:template>
</xsl:stylesheet>
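
Before importing, it can be worth checking that the transformed file is well-formed and, if you want to be thorough, valid against MediaWiki’s export schema. Here is a quick, hedged check using xmllint – assuming the tool is installed, that it can reach the XSD referenced above, and that the output file is named as in the saxon command further down:

$ xmllint --noout mediawikiTopics.xml
$ xmllint --noout --schema http://www.mediawiki.org/xml/export-0.3.xsd mediawikiTopics.xml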

To sum up:

  • Make sure that your wiki is installed, AND that you have admin rights
  • Create a stylesheet, somewhat like the one provided above
  • Run the stylesheet on your XML file, for instance from your command line with saxon:
    $ saxon topics.xtm topicMaps2Mediawiki.xsl > mediawikiTopics.xml
  • Go to the Special:Import page on your wiki
  • Browse for the file, and
  • Upload! Do remember, however, that the maximum size of the import file is capped by PHP’s upload settings (only a couple of megabytes by default). To raise it, edit the upload_max_filesize (and, if necessary, post_max_size) directives in php.ini – see the sketch below.

After uploading the file, you’ll receive a list of links to the pages you just made.
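
If Special:Import rejects the file for being too big, the ceiling is usually set by PHP rather than by MediaWiki itself. A minimal php.ini sketch – the 20M value is purely illustrative, so pick whatever suits your files:

; php.ini – raise the limits that cap file uploads to Special:Import
upload_max_filesize = 20M
post_max_size = 20M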

The Case for Content Strategy

Over the last couple of years I’ve come to appreciate the term content strategy. It began in 2007 with Rachel Lovinger‘s article Content Strategy: The Philosophy of Data. Here she urged readers to take a closer look at content itself, and then find out exactly who’s responsible for making it relevant, comprehensive, and efficient to produce.

I liked that, because it touches upon the very basics of communication, something which, I think, is somewhat neglected in favour of design issues (keeping sentences short, using chunked text, putting action in verbs, etc.). Way too often, content is taken for granted. It’s what the customer brings to the agency, or something to be filled in later in place of the “lorem ipsum” gibberish designers use.

Basically, content strategy addresses the issues of anyone trying to communicate anything, i.e. how to make your website function as:

  • a truthful representation of the sender’s intentions
  • a message relevant to the user
  • a correct use of language and imagery
  • an open channel between reader and author

And, of course, if you’re any good at writing, your text might even have aesthetic value of its own.

Producing useful and usable web content on a daily basis isn’t a matter of being touched by the hand of God, or of being endowed with the perfect content by your client; it’s a matter of planning, and you need to be a part of it. Since internet communication involves quite a few disciplines, there’s a lot to plan for. A few things to consider:

  • Editorial strategy defines the guidelines by which all online content is governed: values, voice, tone, legal and regulatory concerns, user-generated content, and so on. This practice also defines an organization’s online editorial calendar, including content life cycles.
  • Web writing is the practice of writing useful, usable content specifically intended for online publication. This is a whole lot more than smart copywriting. An effective web writer must understand the basics of user experience design, be able to translate information architecture documentation, write effective metadata, and manage an ever-changing content inventory.
  • Metadata strategy identifies the type and structure of metadata, also known as “data about data” (or content). Smart, well-structured metadata helps publishers to identify, organize, use, and reuse content in ways that are meaningful to key audiences.
  • Search engine optimization is the process of editing and organizing the content on a page or across a website (including metadata) to increase its potential relevance to specific search engine keywords.
  • Content management strategy defines the technologies needed to capture, store, deliver, and preserve an organization’s content. Publishing infrastructures, content life cycles and workflows are key considerations of this strategy.
  • Content channel distribution strategy defines how and where content will be made available to users. (Side note: please consider e-mail marketing in the context of this practice; it’s a way to distribute content and drive people to find information on your website, not a standalone marketing tactic.)

I didn’t make that list (it comes from Kristina Halvorson, and it’s part of the article The Discipline of Content Strategy), but I agree. All of these branches are tools that help us create meaningful user experiences.

While there are obvious overlaps between content strategy and information architecture, I think that the first two disciplines on the list add something genuinely new. It’s not enough to structure the things on your website and make them findable; you also need to make sure that the very content you’re providing is right for the occasion.

So, ultimately, it’s all about efficiency, and planning supports efficiency. Since creating content is both difficult and expensive (and always seems to be somebody else’s job), you want to make sure that every aspect of it performs at its best, and therefore there’s good reason to take the concept of content strategy (CS) seriously.

See also Jeffrey MacIntyre‘s eloquent Content-tious Strategy.

Update (2010-04-27): In this video Rachel Lovinger, Jeffrey MacIntyre, and Karen McGrane share their views on CS at the Content Strategy, Manhattan Style event in London, 13 April 2010.

Introducing EPUB

With digital books finding their way to more and more readers, people read everywhere and on a variety of different devices. A lot of these have small displays, and that’s a problem if the text you’re reading is in PDF.

EPUB is an XML publishing format for reflowable digital books and publications standardized by the International Digital Publishing Forum (IDPF), a trade and standards association for the digital publishing industry. For the record, this organization was formerly known as Open eBook Forum. “Reflowable” means that it scales to fit different screen sizes.

Since its official adoption by the IDPF in 2007, EPUB has become popular among major publishers such as Hachette, O’Reilly, and Penguin. The format allows a publisher to produce and send a single digital publication file through distribution, and it can be read using a variety of open source and commercial software. You can use O’Reilly’s Bookworm online for free, or you can get Adobe’s Digital Editions (ADE). The format works on all major operating systems, on e-book devices such as the Sony PRS (and, after conversion, the Kindle), and on other small devices such as the Apple iPhone.

Collectively referred to as EPUB, the format is made up of three open standards (sketched below):

  • Open eBook Publication Structure Container Format (OCF): Describes the directory tree structure and file format (zip) of an EPUB archive
  • Open Publication Structure (OPS): Specifies the common vocabularies for the eBook, especially the formats allowed to be used for book content (for example XHTML and CSS)
  • Open Packaging Format (OPF): Defines the required and optional metadata, reading order, and table of contents in an EPUB
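
To make this a bit more concrete, here is a minimal sketch of what an unzipped EPUB archive might look like. The file names under OEBPS/ are illustrative; the mimetype file and META-INF/container.xml, on the other hand, are required by the specifications:

mimetype              (the plain string "application/epub+zip", stored uncompressed)
META-INF/
 container.xml        (OCF: points to the package file below)
OEBPS/
 content.opf          (OPF: metadata, manifest of all files, spine/reading order)
 toc.ncx              (table of contents)
 chapter-01.xhtml     (OPS content: XHTML, CSS, images, ...)
 styles.css

The container.xml itself is tiny; it simply tells the reading system where to find the package file:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
 <rootfiles>
  <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
 </rootfiles>
</container>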

If you want to learn more, Liza Daly of Threepress has written a nice tutorial called Build a digital book with EPUB, available at IBM developerWorks. To really get to know EPUB, you’ll need to read the specifications: OCF, OPS, and OPF.

Civilisation works great on TV

On Smashing Telly! I saw a Channel 4 programme, The 50 Greatest Documentaries, and among the ones featured was the BBC’s 1969 (colour!) venture Civilisation with Kenneth Clark.

One of the things that makes Civilisation great TV is that it’s such a personal account. This isn’t anonymous lecturing under the guise of scientific objectivity, but a passionate plea for culture in a society threatened by a cold war suddenly turning hot.

Here’s a little sample. Take it away, Kenneth!

The entire series is for sale here.