Archive for the 'Publishing' Category

OpenDocument and OOXML – all is not well

Sometimes I get the impression that much of the current talk about sustainability of data resources is just a broad way of refocusing on a complex of problems that were somewhat overlooked, probably because many of us failed to grasp the full extent of the commitment to portable data. Along with flimsier promises such as interoperable web services, portability in terms of platform-independent data was, and is, actually an attainable goal – provided, of course, “we can use the cleanly documented, well-understood, easy-to-parse, text-based formats that XML provides.” And to continue along the same lines: “XML lets documents and data be moved from one system to another with reasonable hope that the receiving system will be able to make sense out of it.” (from Elliotte Rusty Harold and W. Scott Means, XML in a Nutshell)

“Reasonable hope” – yes indeed. It’s very much implied that there’s more to portability than data being re-usable across different software and hardware platforms. If they are to be re-usable across different communities and different purposes as well, there are some further questions that cannot be left unanswered. This is all well argued by Steven Bird and Gary Simons in their seminal Seven Dimensions of Portability for Language Documentation and Description (2003).

I’m bringing it up because, with word processors like OpenOffice.org and MS Word now using XML as a storage format, people could get the impression that such reasonably well-documented formats ship with a sustainability guarantee. XML formats are a step in the right direction, but like HTML they are “only” presentational. They are also arguably much harder to understand than HTML, and consequently difficult to manipulate and repurpose. Consider an excerpt of the present document in ODT:

<office:body>
 ...
 <text:p text:style-name="P1">
 <text:bookmark-start text:name="h.pcbwh9-j8rixd"/>
 <text:span text:style-name="Default_20_Paragraph_20_Font">
 <text:span text:style-name="T1">OpenDocument</text:span>
 </text:span>
 <text:bookmark-end text:name="h.pcbwh9-j8rixd"/>
 <text:span text:style-name="Default_20_Paragraph_20_Font">
 <text:span text:style-name="T1"> and OOXML </text:span>
 </text:span>
 </text:p>
 <text:p text:style-name="P2">
 <text:bookmark-start text:name="h.slg1ig-i51sxm"/>
 <text:span text:style-name="Default_20_Paragraph_20_Font">
 <text:span text:style-name="T2">all</text:span>
 </text:span>
 <text:bookmark-end text:name="h.slg1ig-i51sxm"/>
 <text:span text:style-name="Default_20_Paragraph_20_Font">
 <text:span text:style-name="T2"> is not well</text:span>
 </text:span>
 </text:p>
 <text:p text:style-name="P3">
 <text:span text:style-name="Default_20_Paragraph_20_Font">
 <text:span text:style-name="T3">Sometimes I get the impression that much of the
 current talk about sustainability of data resources is just a broad way of
 refocusing ...
 </text:span>
 </text:span>
 </text:p> ...

Basically, it consists of paragraph (text:p) and span (text:span) child elements. Mind you, these are consistently used, but in terms of format the markup doesn’t really provide any information except how an application should render it. Notice how a heading is just another paragraph with different typography.
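
To make the point concrete, here is a minimal Python sketch using only the standard library. It is my own illustration, not part of any ODF tooling: the names `ODT_SNIPPET` and `paragraphs` are hypothetical, and the snippet is a trimmed-down version of the excerpt above with the two ODF namespaces declared so it parses on its own. All the code can recover is a style name and some text; nothing tells it which paragraph is the heading.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OpenDocument for the office: and text: prefixes.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Hypothetical, shortened version of the excerpt above.
ODT_SNIPPET = """
<office:body xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
             xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:p text:style-name="P1">
    <text:span text:style-name="T1">OpenDocument</text:span>
    <text:span text:style-name="T1"> and OOXML </text:span>
  </text:p>
  <text:p text:style-name="P3">
    <text:span text:style-name="T3">Sometimes I get the impression ...</text:span>
  </text:p>
</office:body>
"""


def paragraphs(xml_text):
    """Return (style name, text) for every text:p element."""
    root = ET.fromstring(xml_text)
    style_attr = "{%s}style-name" % NS["text"]
    result = []
    for p in root.iterfind(".//text:p", NS):
        # Flatten nested spans and normalize whitespace.
        text = " ".join("".join(p.itertext()).split())
        result.append((p.get(style_attr), text))
    return result


for style, text in paragraphs(ODT_SNIPPET):
    print(style, "|", text)
```

Both the heading and the body text come back as plain `text:p` elements. Only the opaque style names (`P1`, `P3`) differ, and those mean something only to the application that generated them.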

In TEI we are able to distinguish between headings, captured by the head (heading) element, and p (paragraph) elements, which should only be used to reflect a real prose paragraph. Further, headings and paragraphs are contained by a div (division) element.

<text>
...
<body>
<div>
 <head>OpenDocument and OOXML</head>
 <head>all is not well</head>
 <p>Sometimes I get the impression that much of the current talk about
 sustainability of data resources is just a broad way of refocusing ...</p>
 ...
</div>
...
</body>
</text>

In terms of content TEI markup adds another dimension. By applying the TEI terminology, people can use the Guidelines to check if we use the terminology correctly and consistently. Also, by enriching the markup with more elements we could get a broader coverage of the different aspects of the content (quotations, emphasized passages, etc.) thereby making the content relevant to more people.
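
As a rough illustration of what that buys you, here is a small Python sketch (standard library only) that pulls headings and prose paragraphs apart by element name alone. The names `TEI_SNIPPET` and `outline` are my own, and I’m assuming a simplified excerpt without a namespace, like the one above; a real TEI P5 file would declare the TEI namespace, and the lookups would have to be qualified accordingly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI excerpt (no namespace, no header).
TEI_SNIPPET = """
<text>
  <body>
    <div>
      <head>OpenDocument and OOXML</head>
      <head>all is not well</head>
      <p>Sometimes I get the impression ...</p>
    </div>
  </body>
</text>
"""


def outline(xml_text):
    """Return headings and paragraphs for each div, based on element names."""
    root = ET.fromstring(xml_text)
    sections = []
    for div in root.iter("div"):
        sections.append({
            "headings": [h.text for h in div.findall("head")],
            "paragraphs": [p.text for p in div.findall("p")],
        })
    return sections


for section in outline(TEI_SNIPPET):
    print("Headings:  ", section["headings"])
    print("Paragraphs:", section["paragraphs"])
```

No style tables and no guessing from typography: the element name itself says what the text is.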

So, for long-term preservation purposes, OpenDocument and OOXML don’t quite cut it. Besides the lock-in with notoriously short-lived word processor applications, they aren’t rich enough to capture relevant aspects of your content.

Hyperlocal news

In connection with familiar words like ‘blog’, ‘news’, and ‘content’, the term hyperlocal has been a buzzword, at least since the launch of the hyperlocal content network Outside.in in 2006. We’ll get back to Outside.in and why I think it’s so important, but I’ll have to set some terminology straight first:

Hyperlocal means ‘over-local’; it refers to information not only about a specific location (that would just be plain old ‘local’ information) but implies a closer affiliation with the place, typically in terms of residence or some degree of familiarity. The rationale behind it is this: When people blog about the place they live, it attracts people who see themselves as connected to the same place. Very often, good old community feeling lies at the heart of it all.

Buzz rarely originates directly from community feeling; it’s more of a down-to-earth business kinda thing, and in order to turn a volatile notion such as community feeling into something tangible, the idea has to translate into a business model of sorts. Around 2005, with the rise of blogging in general and neighborhood blogs like Gothamist in particular, the aggregate amount of high-quality local content had become so extensive that it was in fact starting to look like an alternative to the news coverage of mainstream local media.

In this situation, what you need to make it a real alternative is an aggregator that lets you gather the content you want and source it to users who will be able to search and browse it. While millions of readers are certainly more than your average blogger could hope for, they are what newspapers like New York Post crucially need, and for that they’re more than willing to pay.

In briefly sketching the hyperlocal business model, I’ll throw in a few more buzzwords (hint: do watch out for the italics!):

Premise 1: Let there be given a lot of hyperlocal content on the web

Premise 2: Let there be given a news network that will let you

  • find and collect stuff you want to use (that’s called aggregation),
  • select what you see fit to publish (that’s curation, but if you’re bluffing, please avoid confusing curation with ‘editorial work’) and
  • publish it to your own site

Consequence: Receive lots of traffic and ad-revenue.

While I’ll refrain from adding a Quod erat demonstrandum to the argument, there’s evidence that the model is working: New York Post (here’s a page for the Flatiron District) and CNN have teamed up with Outside.in, AOL acquired Patch, and MSNBC bought EveryBlock.

Outside.in is important because it represents a genuine intersection of blogosphere and traditional media; it’s not just another newspaper letting a few reporters do some trendy blogging. What comes to mind is that this is in fact the most extensive local news coverage I have seen: Not only is there more content, the news is also much more granular.

If you’re interested in the really big picture, you’ll be sure to get it in Outside.in co-founder Steven Berlin Johnson’s excellent talk at SXSW 2009. As an aside, I’ll be posting a little companion piece with a (hopefully growing) list of Danish hyperlocal blogs.

EPUB now available on Google Books

I’m happy to learn that Google Books has made its public domain books available for download in the EPUB format. This is a nice supplement to the existing image-based PDF version, because you’re no longer tied to large displays, which, obviously, is where PDF works best.


In a previous post I outlined the advantages of EPUB, but they’re well worth restating: EPUB is a free open standard designed to make text adapt (“reflow”) even to the smallest displays, and it’s supported by a growing ecosystem of digital reading devices.

All you need to get started on classics like Treasure Island is a reader. For instance, O’Reilly’s Bookworm is free online, and available in a growing number of languages. If you’re an iPhone user, you can install Stanza. Perhaps I should add that these two readers have been reviewed in Wired.

However, Google Books is not the only place where you can download EPUBs; they are also available from ManyBooks, Feedbooks and Project Gutenberg.

The Case for Content Strategy

Over the last couple of years I’ve come to appreciate the term content strategy. It began in 2007 with Rachel Lovinger’s article Content Strategy: The Philosophy of Data. Here she urged readers to take a closer look at content itself, and then find out exactly who’s responsible for making it relevant, comprehensive, and efficient to produce.

I liked that, because it touches upon the very basics of communication, something which, I think, is somewhat neglected in favor of design issues (keeping sentences short, using chunked text, putting action in verbs, etc.). Way too often, content is taken for granted. It’s what the customer brings to the agency, or something to be filled in later instead of the “lorem ipsum” gibberish designers use.

Basically, content strategy addresses the issues of anyone trying to communicate anything, i.e. how to make your website function as:

  • a truthful representation of the sender’s intentions
  • a message relevant to the user
  • a correct use of language and imagery
  • an open channel between reader and author

And, of course, if you’re any good at writing, your text might even have an aesthetic value on its own.

Producing useful and usable web content on a daily basis isn’t a matter of being touched by the hand of god, or endowed with the perfect content from your client; it’s a matter of planning, and you need to be a part of it. Since internet communication involves quite a few disciplines, there’s a lot to plan for. A few things to consider:

  • Editorial strategy defines the guidelines by which all online content is governed: values, voice, tone, legal and regulatory concerns, user-generated content, and so on. This practice also defines an organization’s online editorial calendar, including content life cycles.
  • Web writing is the practice of writing useful, usable content specifically intended for online publication. This is a whole lot more than smart copywriting. An effective web writer must understand the basics of user experience design, be able to translate information architecture documentation, write effective metadata, and manage an ever-changing content inventory.
  • Metadata strategy identifies the type and structure of metadata, also known as “data about data” (or content). Smart, well-structured metadata helps publishers to identify, organize, use, and reuse content in ways that are meaningful to key audiences.
  • Search engine optimization is the process of editing and organizing the content on a page or across a website (including metadata) to increase its potential relevance to specific search engine keywords.
  • Content management strategy defines the technologies needed to capture, store, deliver, and preserve an organization’s content. Publishing infrastructures, content life cycles and workflows are key considerations of this strategy.
  • Content channel distribution strategy defines how and where content will be made available to users. (Side note: please consider e-mail marketing in the context of this practice; it’s a way to distribute content and drive people to find information on your website, not a standalone marketing tactic.)

I didn’t make that list (it comes from Kristina Halvorson, and it’s part of the article The Discipline of Content Strategy), but I agree. All of these branches are tools that help us create meaningful user experiences.

While there are obvious overlaps between content strategy and information architecture, I think that the first two disciplines on the list add something genuinely new. It’s not enough to structure the things on your website and make them findable; you also need to make sure that the very content you’re providing is right for the occasion.

So, ultimately, it’s all about efficiency, and planning supports efficiency. Since creating content is both difficult and expensive (and always seems to be somebody else’s job), you want to make sure that every aspect of it performs at its best, and therefore there’s good reason to take the concept of content strategy (CS) seriously.

See also Jeffrey MacIntyre’s eloquent Content-tious Strategy.

Update (2010-04-27): In this video Rachel Lovinger, Jeffrey MacIntyre, and Karen McGrane share their views on CS at the Content Strategy, Manhattan Style event in London, 13 April 2010.

Introducing EPUB

With digital books finding their way to more and more people, reading happens everywhere and on a variety of different devices. A lot of these have small displays, and this is a problem if the text you’re reading is in PDF.

EPUB is an XML publishing format for reflowable digital books and publications standardized by the International Digital Publishing Forum (IDPF), a trade and standards association for the digital publishing industry. For the record, this organization was formerly known as Open eBook Forum. “Reflowable” means that it scales to fit different screen sizes.

Since its official adoption by IDPF in 2007, EPUB has become popular among major publishers such as Hachette, O’Reilly and Penguin. The format allows publishers to produce and send a single digital publication file through distribution, and it can be read using a variety of open source and commercial software. You can use O’Reilly’s Bookworm online for free, or you can use Adobe’s Digital Editions (ADE). EPUB works on all major operating systems, on e-book devices (like Kindle and Sony PRS), and other small devices such as the Apple iPhone.

Collectively referred to as EPUB, the format is made up of three open standards:

  • Open eBook Publication Structure Container Format (OCF): Describes the directory tree structure and file format (zip) of an EPUB archive
  • Open Publication Structure (OPS): Specifies the common vocabularies for the eBook, especially the formats allowed to be used for book content (for example XHTML and CSS)
  • Open Packaging Format (OPF): Defines the required and optional metadata, reading order, and table of contents in an EPUB
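
To see how the three parts fit together, here is a minimal Python sketch (standard library only) that builds a tiny EPUB-like archive in memory and reads it back. Everything in it is an assumption made for illustration: the file names, the sample title and identifier, and the helpers `build_epub` and `describe_epub` are all mine. It also leaves out the NCX table of contents that a complete EPUB 2 package would need, so I wouldn’t expect it to pass a validator.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# OCF: META-INF/container.xml points to the package (OPF) file.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# OPF: metadata, manifest (list of files) and spine (reading order).
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:hypothetical-sample-id</dc:identifier>
  </metadata>
  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

# OPS: the book content itself, here a single XHTML file.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>Hello, reflowable world.</p></body>
</html>
"""


def build_epub():
    """Return the bytes of a minimal EPUB-like zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry goes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, body in [
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", CONTENT_OPF),
            ("OEBPS/chapter1.xhtml", CHAPTER_XHTML),
        ]:
            zf.writestr(name, body, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def describe_epub(data):
    """Follow container.xml to the OPF file and read title and reading order."""
    ns = {
        "c": "urn:oasis:names:tc:opendocument:xmlns:container",
        "opf": "http://www.idpf.org/2007/opf",
        "dc": "http://purl.org/dc/elements/1.1/",
    }
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", ns).get("full-path")
        package = ET.fromstring(zf.read(opf_path))
        title = package.find("opf:metadata/dc:title", ns).text
        hrefs = {item.get("id"): item.get("href")
                 for item in package.findall("opf:manifest/opf:item", ns)}
        order = [hrefs[ref.get("idref")]
                 for ref in package.findall("opf:spine/opf:itemref", ns)]
        return {"first_entry": zf.namelist()[0], "opf_path": opf_path,
                "title": title, "reading_order": order}


print(describe_epub(build_epub()))
```

The reader side mirrors the division of labour in the list above: the container tells you where the package file lives, the package file tells you what the book is called and in which order to read its files, and the content files hold the text.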

If you want to learn more, Liza Daly of Threepress has written a nice tutorial called Build a digital book with EPUB, available at IBM developerWorks. To really get to know EPUB, you’ll need to read the specifications: OCF, OPS, and OPF.