OpenDocument and OOXML – all is not well

Sometimes I get the impression that much of the current talk about sustainability of data resources is just a broad way of refocusing on a complex of problems that were somewhat overlooked, probably because many of us failed to grasp the full extent of the commitment to portable data. Along with flimsier promises such as interoperable web services, portability in the sense of platform-independent data was, and still is, an attainable goal – provided, of course, “we can use the cleanly documented, well-understood, easy-to-parse, text-based formats that XML provides.” And to continue along the same lines: “XML lets documents and data be moved from one system to another with reasonable hope that the receiving system will be able to make sense out of it.” (from Elliotte Rusty Harold and W. Scott Means, XML in a Nutshell)

“Reasonable hope” – yes indeed. It’s very much implied that there’s more to portability than data being re-usable across different software and hardware platforms. If data are to be re-usable across different communities and for different purposes as well, there are some further questions that cannot be left unanswered. This is all well argued by Steven Bird and Gary Simons in their seminal Seven Dimensions of Portability for Language Documentation and Description (2003).

I’m bringing it up because, with word processors like MS Word now using XML as a storage format, people could get the impression that such reasonably well-documented formats ship with a sustainability guarantee. XML formats are a step in the right direction, but like HTML they are “only” presentational, and arguably much harder to understand than HTML, which makes them difficult to manipulate and repurpose. Consider an excerpt of the present document in ODT:

 <text:p text:style-name="P1">
   <text:bookmark-start text:name="h.pcbwh9-j8rixd"/>
   <text:span text:style-name="Default_20_Paragraph_20_Font">
     <text:span text:style-name="T1">OpenDocument</text:span>
   </text:span>
   <text:bookmark-end text:name="h.pcbwh9-j8rixd"/>
   <text:span text:style-name="Default_20_Paragraph_20_Font">
     <text:span text:style-name="T1"> and OOXML </text:span>
   </text:span>
 </text:p>
 <text:p text:style-name="P2">
   <text:bookmark-start text:name="h.slg1ig-i51sxm"/>
   <text:span text:style-name="Default_20_Paragraph_20_Font">
     <text:span text:style-name="T2">all</text:span>
   </text:span>
   <text:bookmark-end text:name="h.slg1ig-i51sxm"/>
   <text:span text:style-name="Default_20_Paragraph_20_Font">
     <text:span text:style-name="T2"> is not well</text:span>
   </text:span>
 </text:p>
 <text:p text:style-name="P3">
   <text:span text:style-name="Default_20_Paragraph_20_Font">
     <text:span text:style-name="T3">Sometimes I get the impression that much of the
     current talk about sustainability of data resources is just a broad way of
     refocusing ...</text:span>
   </text:span>
 </text:p> ...

Basically, it consists of paragraph (text:p) and span (text:span) child elements. Mind you, these are consistently used, but in terms of format the markup doesn’t really provide any information except how an application should render it. Notice how a heading is just another paragraph with different typography.
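To make the point concrete, here is a minimal Python sketch that walks a simplified fragment modelled on the excerpt above and reports the element name and style name of each paragraph. The fragment, the `root` wrapper element and the function name `describe_paragraphs` are assumptions made for illustration; a real ODT file keeps this markup in `content.xml` inside a zip archive and carries many more attributes.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by OpenDocument text elements.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Simplified, well-formed fragment modelled on the excerpt
# (bookmarks and the outer font spans are left out; <root> is hypothetical).
ODT_FRAGMENT = f"""
<root xmlns:text="{TEXT_NS}">
  <text:p text:style-name="P1"><text:span text:style-name="T1">OpenDocument and OOXML</text:span></text:p>
  <text:p text:style-name="P2"><text:span text:style-name="T2">all is not well</text:span></text:p>
  <text:p text:style-name="P3"><text:span text:style-name="T3">Sometimes I get the impression ...</text:span></text:p>
</root>
"""


def describe_paragraphs(xml_text):
    """Return (local element name, style name, text) for every text:p."""
    root = ET.fromstring(xml_text)
    rows = []
    for p in root.iter(f"{{{TEXT_NS}}}p"):
        local_name = p.tag.split("}", 1)[1]           # strip the namespace part
        style = p.get(f"{{{TEXT_NS}}}style-name")     # e.g. "P1"
        text = "".join(p.itertext()).strip()          # text of nested spans
        rows.append((local_name, style, text))
    return rows


rows = describe_paragraphs(ODT_FRAGMENT)
for row in rows:
    print(row)
```

Every row comes back with the element name `p`; the only thing separating the two headings from the body text is an opaque style name such as `P1` or `P3`, which means nothing outside the document's own style sheet.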

In TEI we are able to distinguish between headings, captured by the head (heading) element, and p (paragraph) elements, which should only be used to reflect a real prose paragraph. Further, headings and paragraphs are contained by a div (division) element.

 <div>
  <head>OpenDocument and OOXML</head>
  <head>all is not well</head>
  <p>Sometimes I get the impression that much of the current talk about
  sustainability of data resources is just a broad way of refocusing ...</p>
 </div>
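As a rough illustration of what this buys us, the following Python sketch separates headings from prose paragraphs purely by element name. It assumes a TEI-like fragment wrapped in a div, as described in the text, and written without the TEI namespace for brevity (a real TEI document declares `http://www.tei-c.org/ns/1.0`); the function name `split_division` is hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI-like fragment; the namespace declaration is omitted for brevity.
TEI_FRAGMENT = """
<div>
  <head>OpenDocument and OOXML</head>
  <head>all is not well</head>
  <p>Sometimes I get the impression ...</p>
</div>
"""


def split_division(xml_text):
    """Return the headings and the prose paragraphs of one division."""
    div = ET.fromstring(xml_text)
    headings = [h.text for h in div.findall("head")]
    paragraphs = [p.text for p in div.findall("p")]
    return headings, paragraphs


headings, paragraphs = split_division(TEI_FRAGMENT)
print("headings:", headings)
print("paragraphs:", paragraphs)
```

No style sheet has to be consulted here: the element name alone says whether a piece of text is a heading or a paragraph, which is what makes the content easier to repurpose.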

In terms of content TEI markup adds another dimension. By applying the TEI terminology, people can use the Guidelines to check if we use the terminology correctly and consistently. Also, by enriching the markup with more elements we could get a broader coverage of the different aspects of the content (quotations, emphasized passages, etc.) thereby making the content relevant to more people.

So, for long-term preservation purposes, OpenDocument and OOXML don’t quite cut it. Besides the lock-in with notoriously short-lived word processor applications, they aren’t rich enough to capture relevant aspects of your content.

Hyperlocal, anyone?

This is a minor appendix to my previous post. It’s intended as a bulletin board for Danish hyperlocal blogs (please note that I decide what counts as hyperlocal and that I’ve already disqualified AOK).

I’ll update it when I find something to post. So far, I’ve found these:

Hyperlocal news

In connection with familiar words like ‘blog’, ‘news’, and ‘content’, the term hyperlocal has been a buzzword at least since the launch of the hyperlocal content network in 2006. We’ll get back to that network and why I think it’s so important, but I’ll have to set some terminology straight first:

Hyperlocal means ‘over-local’; it refers to information not only about a specific location (that would just be plain old ‘local’ information) but implies a closer affiliation with the place, typically in terms of residence or some degree of familiarity. The rationale behind it is this: When people blog about the place they live, it attracts people who see themselves as connected to the same place. Very often, good old community feeling lies at the heart of it all.

Buzz rarely originates directly from community feeling; it’s more of a down-to-earth business kinda thing, and in order to turn such volatile notions as community feeling into something tangible, the idea has to translate into a business model of sorts. Around 2005, with the rise of blogging in general and neighborhood blogs like Gothamist in particular, the aggregate amount of high-quality local content had become so extensive that it was in fact starting to look like an alternative to the news coverage of mainstream local media.

In this situation, what you need to make it a real alternative is an aggregator that lets you gather the content you want and serve it to users who will be able to search and browse it. While millions of readers is certainly more than your average blogger could hope for, it’s what newspapers like the New York Post crucially need, and for that they’re more than willing to pay.

In briefly sketching the hyperlocal business model, I’ll throw in a few more buzzwords (hint: do watch out for the italics!):

Premise 1: Let there be given a lot of hyperlocal content on the web

Premise 2: Let there be given a news network that will let you

  • find and collect stuff you want to use (that’s called aggregation),
  • select what you see fit to publish (that’s curation, but if you’re bluffing, please avoid confusing curation with ‘editorial work’) and
  • publish it to your own site

Consequence: Receive lots of traffic and ad-revenue.

While I’ll refrain from adding a Quod erat demonstrandum to the argument, there’s evidence that the model is working: the New York Post (here’s a page for the Flatiron District) and CNN have teamed up with the network, AOL acquired Patch, and MSNBC bought EveryBlock. The network is important because it represents a genuine intersection of blogosphere and traditional media; it’s not just another newspaper letting a few reporters do some trendy blogging. What comes to mind is that this is in fact the most extensive local news coverage I have seen: not only is there more content, the news is also much more granular.

If you’re interested in the really big picture, you’ll be sure to get it in co-founder Steven Berlin Johnson’s excellent talk at SXSW 2009. As a little aside, I’ll be posting a little companion piece with a (hopefully growing) list of Danish hyperlocal blogs.

This is not transparency

A key factor in establishing authority on the internet is, as David Weinberger convincingly argued, transparency:

What we used to believe because we thought the author was objective we now believe because we can see through the author’s writings to the sources and values that brought her to that position. Transparency gives the reader information by which she can undo some of the unintended effects of the ever-present biases. Transparency brings us to reliability the way objectivity used to.

I agree with most of it, and perhaps the point can be further illustrated by a quick example. If you take a look at the Wikipedia article on the epistemological sense of, well, Transparency, the contrast between then and now will be clear:

As you can see, there’s an explanation and a reference to an article by professor Paul Boghossian. The reference is the interesting part, because in academia this is perfectly sufficient for convincing readers that the material can be trusted. At least, it leaves you with an idea of what to do when you get to the library.

But the internet isn’t like the research library at all. Here, anybody could have made the claim that a certain Paul Boghossian said so and so about transparency, but since links to resources supporting it (e.g. Wikipedia’s article on Paul Boghossian, for one) are extremely few, the article isn’t transparent and doesn’t meet Wikipedia’s requirements for verifiability, let alone follow the conventions of the internet medium.

Transparency is not the new objectivity, but comprehensiveness just might be

In a terrific post, Transparency is the new objectivity, David Weinberger argues that the hyperlink nature of the internet is reshaping our notions of authority. With everybody suddenly a potential author, the old claim to objectivity seems more and more trite and outworn:

Objectivity used to be presented as a stopping point for belief: If the source is objective and well-informed, you have sufficient reason to believe. The objectivity of the reporter is a stopping point for reader’s inquiry. That was part of high-end newspapers’ claimed value: You can’t believe what you read in a slanted tabloid, but our news is objective, so your inquiry can come to rest here. Credentialing systems had the same basic rhythm: You can stop your quest once you come to a credentialed authority who says, “I got this. You can believe it.” End of story.

Instead we demand transparency; to be able to “see through the author’s writings to the sources and values that brought her to that position.”

Transparency gives the reader information by which she can undo some of the unintended effects of the ever-present biases. Transparency brings us to reliability the way objectivity used to.

I think that this kind of “hyper-transparency” (where citing a book isn’t enough, and a link has to point to the actual resource) may be an essential feature of the internet medium; but whereas it certainly is a necessary condition for establishing reliability, it’s hardly sufficient. After all, what leads to reliability is not the number of hyperlinks to the author’s sources, but trust in the fact that the relevant aspects of the matter have been adequately dealt with.

So, instead of objectivity, I’d suggest ‘comprehensiveness’ as a condition for reliability. And it’s a sufficient one too, because on the internet comprehensiveness seems more than ever to subsume transparency.

The Case for Content Strategy

Over the last couple of years I’ve come to appreciate the term content strategy. It began in 2007 with Rachel Lovinger’s article Content Strategy: The Philosophy of Data. Here she urged readers to take a closer look at content itself, and then find out exactly who’s responsible for making it relevant, comprehensive, and efficient to produce.

I liked that, because it touches upon the very basics of communication, something which, I think, is somewhat neglected in favour of design issues (keeping sentences short, using chunked text, putting action in verbs, etc.). Way too often, content is taken for granted. It’s what the customer brings to the agency, or something to be filled in later in place of the “lorem ipsum” gibberish designers use.

Basically, content strategy addresses the issues of anyone trying to communicate anything, i.e. how to make your website function as:

  • a truthful representation of the sender’s intentions
  • a message relevant to the user
  • a correct use of language and imagery
  • an open channel between reader and author

And, of course, if you’re any good at writing, your text might even have an aesthetic value on its own.

Producing useful and usable web content on a daily basis isn’t a matter of being touched by the hand of god, or endowed with the perfect content from your client; it’s a matter of planning, and you need to be a part of it. Since internet communication involves quite a few disciplines, there’s a lot to plan for. A few things to consider:

  • Editorial strategy defines the guidelines by which all online content is governed: values, voice, tone, legal and regulatory concerns, user-generated content, and so on. This practice also defines an organization’s online editorial calendar, including content life cycles.
  • Web writing is the practice of writing useful, usable content specifically intended for online publication. This is a whole lot more than smart copywriting. An effective web writer must understand the basics of user experience design, be able to translate information architecture documentation, write effective metadata, and manage an ever-changing content inventory.
  • Metadata strategy identifies the type and structure of metadata, also known as “data about data” (or content). Smart, well-structured metadata helps publishers to identify, organize, use, and reuse content in ways that are meaningful to key audiences.
  • Search engine optimization is the process of editing and organizing the content on a page or across a website (including metadata) to increase its potential relevance to specific search engine keywords.
  • Content management strategy defines the technologies needed to capture, store, deliver, and preserve an organization’s content. Publishing infrastructures, content life cycles and workflows are key considerations of this strategy.
  • Content channel distribution strategy defines how and where content will be made available to users. (Side note: please consider e-mail marketing in the context of this practice; it’s a way to distribute content and drive people to find information on your website, not a standalone marketing tactic.)

I didn’t make that list (it comes from Kristina Halvorson, and it’s part of the article The Discipline of Content Strategy), but I agree. All of these branches are tools that help us create meaningful user experiences.

While there are obvious overlaps between content strategy and information architecture, I think that the first two disciplines on the list add something genuinely new. It’s not enough to structure the things on your website and make them findable; you also need to make sure that the very content you’re providing is right for the occasion.

So, ultimately, it’s all about efficiency, and planning supports efficiency. Since creating content is both difficult and expensive (and always seems to be somebody else’s job), you want to make sure that every aspect of it performs at its best, and therefore there’s good reason to take the concept of content strategy (CS) seriously.

See also Jeffrey MacIntyre’s eloquent Content-tious Strategy.

Update (2010-04-27): In this video Rachel Lovinger, Jeffrey MacIntyre, and Karen McGrane share their views on CS at the Content Strategy, Manhattan Style event in London, 13 April 2010.

Civilisation works great on TV

On smashing telly! I saw a Channel 4 program, The 50 Greatest Documentaries, and among the ones featured was the BBC’s 1969 (colour!) venture Civilisation with Kenneth Clark.

One of the things that makes Civilisation great TV is that it’s such a personal account. This isn’t anonymous lecturing under the guise of scientific objectivity, but a passionate plea for culture in a society threatened by a cold war suddenly turning hot.

Here’s a little sample. Take it away, Kenneth!

The entire series is for sale here