
Thoughts on Semantic Desktop and filesystems

A few days ago, I wrote a blog entry about my experience with MacOS, where I promised a new post on "semantic desktop" applications.

First of all, a disclaimer: I'm not involved in any semantic desktop project. Some projects, such as Nepomuk, are investing a lot of money to realize the vision of the semantic desktop. Although I've read some published papers and attended some talks on this topic, I'm not really aware of the current state of the art, and therefore I'm not really qualified to talk about it. Please read these paragraphs as a few thoughts from an outsider to the semantic desktop community.

I don't know anything about the status of the semantic desktop on the Windows platform, because I don't use it. However, in the open source world and in the Mac world, it is easy to find some progress. One of the main changes that the semantic desktop should bring us is the invisibility of filesystems. Some years ago, when I was studying Operating Systems Design at university, I read an article arguing that filesystems were doomed. Unfortunately, I don't remember the citation, but since then I have become more and more convinced that this visionary claim was right. Actually, traditional filesystems and partitions are ridiculous. Their main abstractions (files and directories) were OK when the size of magnetic disks was measured in KBytes or MBytes. Today, we have home devices with GBytes or even TBytes, and we use them to store and share huge amounts of data. It is often difficult to organize information in a hierarchical structure, and it is even more difficult to find it once it is stored that way. But is it possible to store and find documents without a hierarchical filesystem? Well, I think the web and Google have already proved that point. On the web, we don't browse hierarchical directories; we just discover and follow links. Big information repositories with millions of documents, such as Flickr or YouTube, are not organized in folders.

So what does an OS without filesystems look like? First, directory browsers (Explorer, Nautilus, Finder) will probably lose their central position in the user interface and the desktop metaphor. Their place will be taken by desktop search tools such as Beagle, Google Desktop and Spotlight. I've tried all three, and I now use the second and the third every day.

I don't know which one pioneered the idea, but nowadays some applications do a decent job of abstracting the user from the filesystem. I imagine that iTunes is the most popular example, although the free software world also has a number of them. These applications use metadata to provide faceted browsing and searching, which are very convenient features for multimedia repositories. This paradigm shift is not welcomed by everyone. Some advanced users still prefer to keep control over how their files are stored on their hard disks. Novice users, on the other hand, may be confused by having to learn new things (metadata, ID3 tags, etc.). Anyway, I think this is a step in the right direction.
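To make faceted browsing a bit more concrete, here is a minimal Python sketch. It assumes that each item is described only by a dictionary of metadata (the field names `artist`, `genre` and `year`, and the library contents, are hypothetical, loosely inspired by ID3 tags) and it ignores where the files actually live on disk.

```python
from collections import Counter

# Hypothetical music library: each item carries metadata, not a directory path.
LIBRARY = [
    {"title": "Song A", "artist": "Artist 1", "genre": "Rock", "year": 1994},
    {"title": "Song B", "artist": "Artist 2", "genre": "Jazz", "year": 1994},
    {"title": "Song C", "artist": "Artist 1", "genre": "Rock", "year": 2007},
    {"title": "Song D", "artist": "Artist 3", "genre": "Rock", "year": 2007},
]


def filter_items(items, **selected):
    """Keep only the items whose metadata matches every selected facet value."""
    return [it for it in items if all(it.get(k) == v for k, v in selected.items())]


def facet_counts(items, facet):
    """Count how many of the remaining items fall under each value of a facet."""
    return Counter(it[facet] for it in items if facet in it)


# Narrow the library by one facet, then inspect the counts for another facet.
rock = filter_items(LIBRARY, genre="Rock")
print(facet_counts(rock, "year"))
print([it["title"] for it in filter_items(LIBRARY, genre="Rock", year=2007)])
```

Each selected facet narrows the result set, and the counts tell the user which values are still worth clicking. The same song can be reached through any combination of facets, which is something a single directory path cannot offer.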

Strict hierarchical structures are also a burden for storing our "bookmarks". In my experience, only a few people use the bookmarking feature of their browser. Most people use their preferred search engine to look for the page they want to visit, and it is often quicker to search Google than to browse through your bookmarks. On the other hand, advanced users have discovered the joy of collaborative bookmarking services such as Del.icio.us (by the way, the Del.icio.us extension is my favourite Firefox extension).

What about using a hierarchical menu to look for the application you want to launch? Even in a well-organized menu, such as GNOME's, it takes some time to find something. As for Windows... well, I feel sorry for its users. In a semantic desktop, most applications should be transparent utilities that the system opens when they're needed (can anyone help me understand why Windows users need an "Acrobat Reader" icon on their desktops?). But sometimes you really need to launch an application. Some months ago I discovered a very nice application called QuickSilver, which is very convenient for launching applications (and other things, see below). Recently, a GNOME clone has been created. These applications are difficult to describe, but once you have learned to use them, they're addictive.

A remarkable thing about QuickSilver is that it does much more than just launch applications. From a semantic point of view, it is interesting for two reasons. First, it allows me to launch applications, play music, open documents, check my agenda... and I don't have to think about where they are. I just type a few characters, and they show up. Second, the QuickSilver metaphor is based on building sentences with a subject and a verb, and some actions also need an object. When the three are displayed next to each other, they resemble an RDF statement.
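The analogy can be made explicit with a small Python sketch. It is only an illustration: the `Sentence` class, the `to_triple` helper and the `http://example.org/` identifiers are hypothetical and say nothing about how QuickSilver works internally. It treats a complete QuickSilver-style sentence (subject, verb, object) as an RDF-like triple of subject, predicate and object.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical base URI, used only to mint identifiers for this example.
BASE = "http://example.org/"


@dataclass
class Sentence:
    """A QuickSilver-style command: a subject, a verb and an optional object."""
    subject: str
    verb: str
    obj: Optional[str] = None


def to_triple(sentence: Sentence) -> Optional[Tuple[str, str, str]]:
    """Map a complete three-part sentence to a (subject, predicate, object) triple.

    Sentences without an object (e.g. "report.pdf / open") have no third part,
    so they do not form a full RDF-like statement and None is returned.
    """
    if sentence.obj is None:
        return None
    return (BASE + sentence.subject, BASE + sentence.verb, BASE + sentence.obj)


# "report.pdf / move to / Archive" reads like a statement about report.pdf.
print(to_triple(Sentence("report.pdf", "moveTo", "Archive")))
print(to_triple(Sentence("report.pdf", "open")))
```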

In order to abstract away from filesystems, we need metadata-rich objects on our desktops. I'm using MailTags to tag my email. MacOS supports metadata in the filesystem, so I can also tag my files, but this is often a burden. TagBot makes tagging files as easy as dragging and dropping them.

At this point, maybe you are convinced that we can live without filesystems. But wait a moment... sometimes I need to know where my files are. For instance, the file I need may live on my laptop's hard disk, but right now I'm using my desktop PC and have no access to it. I think this kind of problem also has a solution, and a simple one: the files must be available everywhere! Historically, I've used CVS and SVN to synchronize and distribute my files (not going as far as Joey Hess does, but close). Recently I discovered EverNote, an application that smoothly stores and synchronizes your files across multiple machines. I can even access my files from a borrowed computer through the web interface. These new Web 2.0 applications still have a long way to go, but they are showing us the future. All our data (our huge GMail account, our photos on Flickr, our bookmarks on Del.icio.us...) will be transparently available from everywhere.

Even an operation such as making backups, which a priori depends completely on the filesystem, can be abstracted from it. Time Machine is so easy and transparent to use that it makes all the other backup applications look like relics of the past.

I think we are getting closer to the semantic desktop, but we aren't there yet. I expect to see more progress in integrating all these applications. Smooth synchronization of PIM data (agenda, contacts, tasks) among all my devices will also be a cornerstone, and it is almost done. From my point of view, the main challenge is blurring the boundary between local and remote data.

MacOS X user

I've been (and I still am) a free/libre software advocate for many, many years. I've also been an active member of the community for 10 years, since I contributed my first translations of technical documentation to the Lucas/Insflug project. With this background, my friends are a bit confused because I've recently become a MacOS user (I'm a bit surprised too). When I bought my MacBook, I decided to give MacOS Leopard a chance. Even though MacOS has a UNIX core, I was afraid of feeling out of place, so I also bought VMware and installed my beloved Debian in a virtual machine. My main fears were: a) becoming less productive (I'm a Linux power user, so I'm quite productive with UNIX tools), b) missing some critical apps, and c) losing control over my own computer.

Regarding the first one, I've discovered that I can be very productive with the Mac. There are a number of amazing applications for the Mac designed to boost your productivity. To my amazement, there are even some "semantic desktop" applications that really work. I'll try to cover some of them in future posts. Anyway, for complex tasks, I can still open a UNIX console with bash.

As for the applications I use every day on Debian (all of them open source, of course), fortunately many of them also exist for the Mac: Firefox, Eclipse, Emacs, Apache, Python... Others, particularly OpenOffice, are a bit behind, although this is expected to change in the coming months. I think the application I miss the most is Evolution (the mail client of the GNOME desktop). Evolution is, by far, the most complete desktop mail client, built by people who really use e-mail for their everyday work. Note that I'm not claiming that Evolution is the perfect and definitive e-mail client; it still has its flaws (particularly stability), but I have no doubt about its vast superiority over the alternatives (side note: I still can't believe that there are happy Outlook users). I've tried three alternatives: Apple Mail, Thunderbird and web clients (GMail). They're OK, but they cannot make me forget Evolution.

Finally, regarding my last fear, the situation is a bit worse. I have indeed lost complete control of my computer: it is executing code that I cannot see or compile myself. However, from a pragmatic point of view, the situation is not really different from that of other devices (my mobile phone, my Palm, my camera, my video game console, my iPod, my TV set, my car, and of course the firmware in many devices inside and outside my PC, etc. all run proprietary code). Unless you are rms, you cannot claim that you are not running a single line of closed code. Please do not misunderstand my point, but you have to draw the line somewhere. I recognize the value of running free software as much as possible, as demonstrated by the fact that I use much more free software than the vast majority of computer users. By switching to a proprietary OS, I've moved my personal line back a bit, seduced by the charm of the Mac. But I don't feel this is a betrayal of my principles, which I still hold. I've always defined myself as a free software advocate, not as a free software partisan.

I'm not trying to convince anyone or to defend my point. Apart from my laptop, I'm still a very happy Debian user on my personal desktop PC at home and on the PC at my office. Every day, I spend more hours using Debian than the Mac. Debian is installed on every computer I own. In my eyes, GNU/Linux is the best platform for almost every task, and in particular for software development and the Internet, which are my main occupations.

XML and RDF

As you probably already know, my PhD topic is related to these two W3C technologies and how they interact with each other. Unfortunately, their relationship is often misunderstood. While it is true that they overlap to a certain extent, it is also true that they have distinct roles. In my humble opinion, several factors contribute to the confusion. The first one is that the XML specification is all about syntax. As Erik Wilde and Robert J. Glushko point out in a funny and very interesting article titled 'XML Fever', the XML Information Set (InfoSet) is not widely known. It is difficult to understand XML technologies (such as XPath, XQuery and XSLT) if you don't go beyond the XML syntax. It would have been great if W3C had published (and promoted) the InfoSet together with the XML syntax specification. From my point of view, recognizing this duality, and separating more clearly the data model (a tree) from the serialization (a character sequence with its characteristic angle brackets), would have been very important.
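A quick way to see the difference between the data model and the serialization is to parse two textually different documents and compare the resulting trees. This sketch uses Python's standard `xml.etree.ElementTree`, whose tree is only a rough approximation of the InfoSet (for instance, it drops comments by default); the `canonical` helper and the sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET

# Two different character sequences: quote style, attribute order,
# whitespace inside tags and empty-element syntax all differ.
DOC_1 = '<book lang="en" id="b1"><title>XML Fever</title><note/></book>'
DOC_2 = "<book  id='b1'\n lang='en'><title>XML Fever</title><note></note></book >"


def canonical(elem):
    """Reduce an element to a comparable structure: name, sorted attributes,
    whitespace-trimmed text and children. Tail text is ignored for brevity."""
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        (elem.text or "").strip(),
        [canonical(child) for child in elem],
    )


tree_1 = ET.fromstring(DOC_1)
tree_2 = ET.fromstring(DOC_2)
print(DOC_1 == DOC_2)                          # different serializations
print(canonical(tree_1) == canonical(tree_2))  # same tree
```

XPath expressions and XSLT templates operate on the tree, so they behave identically on both documents; a regular expression over the raw characters would not.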

Another common pitfall is to assume that RDF is an XML application. With RDF, W3C clearly specified both the data model and a normative serialization from the beginning. The only problem is that this serialization (RDF/XML) is based on XML, so too many people assume they can access the RDF data model with XML tools. I'm convinced that an XML-based serialization of RDF is a cornerstone of the Semantic Web... however, I'm sad that it has led to so much confusion. One of the authors of the article I cited above recently asked for an RDF/XML parser written in XSLT, but I'm not really sure whether such a thing would be useful or even feasible.
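The following Python sketch illustrates why plain XML tools are a poor fit for RDF/XML. Both documents encode the same single triple (a Dublin Core title for a hypothetical resource `http://example.org/doc`): once with a property element, and once with the property-attribute abbreviation that RDF/XML allows. An ElementTree path query, standing in for XPath here, finds the title in one document but not in the other.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Property element form.
LONG_FORM = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Hello</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# Property attribute form: the same RDF graph, but a different XML tree.
SHORT_FORM = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc" dc:title="Hello"/>
</rdf:RDF>"""

# An XPath-like query written with the first serialization in mind.
QUERY = "rdf:Description/dc:title"


def titles(rdf_xml):
    """Return the text of every element matched by QUERY."""
    root = ET.fromstring(rdf_xml)
    return [e.text for e in root.findall(QUERY, NS)]


print(titles(LONG_FORM))   # the title is found as an element
print(titles(SHORT_FORM))  # empty: the triple is there, but as an attribute
```

An RDF/XML parser would return the same triple for both documents. A general XML query would have to replicate every abbreviation the syntax allows to do the same.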

This week there was some discussion on a W3C mailing list about the relation between the RDF and XML data models. I wish they were more similar to each other. For instance, I think that the RDF policy for identifiers (i.e., the use of URIs) is better than the QNames of XML+Namespaces. Is it possible to reformulate XML with a generalization of QNames to URIs? Are CURIEs an intermediate step?
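The mismatch can be shown in a few lines of Python. XML+Namespaces identifies a name by a pair (namespace name, local part), which ElementTree writes as `{namespace}local`; RDF instead concatenates the namespace and the local part into a single URI, and a CURIE is expanded in the same concatenating way. The `expand_curie` helper and the `ex` prefix are assumptions for this example, not an implementation of the CURIE specification.

```python
import xml.etree.ElementTree as ET

# Prefix table for this example; "ex" is a hypothetical namespace.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/ns#",
}


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' by concatenating the prefix URI and the reference."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}")
    return prefixes[prefix] + reference


# XML+Namespaces view: the element name is a (namespace name, local part) pair,
# which ElementTree writes in "{namespace}local" notation.
elem = ET.fromstring('<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">x</dc:title>')
print(elem.tag)

# RDF view: the same property is a single URI obtained by concatenation.
print(expand_curie("dc:title", PREFIXES))

# A CURIE reference may start with a digit, which an XML local name (NCName)
# may not, so "ex:2008" has no QName equivalent.
print(expand_curie("ex:2008", PREFIXES))
```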

And what about the trendy JSON? Well, I've already discussed JSON in a previous entry.

Unlicensed games (it was: learning to program computers)

I think it was in 1988 when I got my first computer, a Sinclair ZX Spectrum +2A. I had some games, but I quickly discovered that it was much more fun to create my own games than to play games created by others. And that's how I began to program computers.

I was particularly obsessed with a game about the movie Tron. I had a cassette tape (this link is for those who were born in the age of the iPod and Blu-ray) with a game inspired by that movie. I think I got the tape from my uncle. Unfortunately, the game was for a different computer model, so I never had a chance to play it.

Some years later, probably in 1991, a PC arrived at home. It had a 286 CPU and a monochrome display. I still have that venerable machine at home, and it still works. The operating system included a BASIC interpreter (QBASIC), so it was very natural for me to move from the Spectrum's BASIC to MS-DOS's QBASIC.

In November 1993, I completed the first version of a game called "Tron". I was trying to develop from scratch a clone of a game I had never seen! Apart from the movie, my only source of inspiration was the label of the cassette case. "Tron" (the game) was multi-player (2 players shared a single keyboard), and it included some hand-made sprites for the explosions and even a couple of sound effects recorded with my own voice and played through my first sound card, a SoundBlaster Pro. I still have the source code of that game: 1,000 lines of BASIC. Fifteen years have passed since 1993. Back then I was an untrained, very young programmer; now I'm older and wiser. But I must say that I'm a bit astonished at myself: the code is neatly formatted, elegantly structured in subroutines and profusely commented.

Around that time, I was learning a much more powerful programming language, C. A Spanish magazine distributed PCC (links welcome!), a small shareware C compiler. However, I decided to use BASIC again for the second version of my game. The first version had been very successful, providing a lot of fun with my friends, although I think I never distributed the software. Therefore, it was quite natural to attempt to create a better game. By then, I had a 486 with a colorful SuperVGA display. The new version was completed by mid-1994 and contained 2,000 lines of code. It included an installer, and I vaguely remember giving some copies of the game to my friends. Among other improvements, the game supported 4 simultaneous players (using a single keyboard! can you imagine the hands of 4 different people on the same keyboard?), AI-controlled players, and nicer graphics and sounds. It even included a title image that I created using a primitive release of POV. This image was inspired by the label of the tape I mentioned above. I gave the new game the silly name "Tronator" (probably inspired by the movie Terminator II). As before, the code is of surprising quality for an untrained developer. Interestingly enough, the GoF book on Design Patterns was published at approximately the same time.

The second version of the game was also incredibly successful, and we spent many, many afternoons and weekends playing it.

ESWC2008 in Tenerife

I've just returned from Tenerife, where I (partially) attended ESWC2008. I arrived a few days earlier to attend to other business, and I left ESWC2008 yesterday, just when the main track was starting. Anyway, I had the opportunity to attend the Scripting for the Semantic Web (SFSW) workshop, where I presented a paper. I was also a co-author of a second paper, presented by Wikier. Unfortunately, I missed part of the SFSW workshop because I moved to another room to present a third paper, by some colleagues, at the SIEDL workshop.

As usual, I tried to do my best with the presentations, so I spent a considerable amount of time carefully polishing my slides and speech. I introduced a new element: instead of OpenOffice, I used Apple Keynote. I felt comfortable presenting with the new tool, although the way Keynote stores its files is extremely inconvenient. Firstly, it uses a directory instead of a single file, which makes it (almost) impossible to keep the slides in an SVN repository. Secondly, it doesn't support the OpenDocument standard.

The trip to Tenerife was uneventful (fortunately, I must say), with some minor inconveniences in the return trip. These included a very eccentric taxi driver and a small delay in one of the flights, without consequences for the next connection. The place and the weather were superb. This was my first time in the Canary Islands. Actually, this trip has enlarged my personal "geographical bounding box" in two dimensions (can you guess how?).

Karting

Yesterday I went to Cabañas Raras (not too far from Ponferrada, in León) to spend some time driving a kart and racing with some of my colleagues and friends. It is a well-known fact that I'm not the most talented driver, so my final position (11th out of 15) should be seen as a positive result. Moreover, I was able to overtake two competitors! (As you can see, I started the race in 13th place; hint: my kart is number 4.) Anyway, the result doesn't really matter. What matters is that I enjoyed the experience, we had lots of fun together, and in the end, no one was seriously hurt.

Alive and kicking

A friend of mine told me recently that my blogging activity has decreased in the last few months. A glance at my blog reveals only four entries in the last four months. This reflects my lack of (spare) time. So what have I been doing during the last few months? Well, I traveled (Vienna, Paris, Berlin...); I edited a new draft of the Recipes (hopefully, a newer one will be published soon); I co-authored and successfully submitted some papers to international workshops; I taught a lesson in a course on the semantic web at the University; etc.

Apparently I forgot to post something here about my trip to Berlin in February. It was my second time there, but unfortunately, this time I was there for just a few hours. I attended a day-long meeting, and it was a very "executive" trip: I was back home less than 36 hours after leaving. In those hours, I took four flights, waited at four different airports and took a number of taxis, but I didn't get much sleep. I was sick (a heavy cold), so the experience wasn't very pleasant. Of course, I didn't have a single minute for sightseeing.

Besides my work at CTIC, I've been working on my PhD thesis. I published my first paper related to my thesis, and I released a simple software component called XSLT+SPARQL that demonstrates a possible approach to the problem I'm tackling. The good news is that I'm moving forward and I have a lot of ideas to explore. The bad news is that there is still a lot of work ahead.

Vienna

This week I went to Vienna on a business trip. It was my first time there, and as usual, I didn't have time for sightseeing. My only opportunity to visit the city was at night, after dinner. It was dark and cold, and I only had a couple of hours to walk through Vienna's streets, but I really enjoyed what I saw. Hopefully, someday I'll return to Vienna for a proper visit!

Still alive (and reviewing)

Some have noted that the activity on my blog has decreased recently. Well, fortunately I'm still alive and well, although my workload is (even) higher than usual. That's why I have posted only once on this blog in the last two months. Some might say that's good, because I'm helping to reduce the noise on the net :)

One of the things I did in the last few weeks was to review the draft specification of RDFa. I've already posted some bits about RDFa (see this and this); I think it has the potential to become a widely used technology. So, when the W3C Semantic Web Deployment WG asked for reviewers for the draft specification, I volunteered. I did my best to spot any potential issues in the draft, and I actually sent a number of comments. Therefore, even though all the credit goes to the editors of the specification, I feel that I have made a humble contribution to this new technology.

Some time ago I applied RDFa to my web page, mainly as an exercise to learn the technology. In the last few days, I fixed some errors and decided to drop my FOAF file and replace it with an Apache redirect rule:

RewriteRule ^foaf\.rdf$ http://www.w3.org/2007/08/pyRdfa/extract?uri=http://berrueta.net/ [R=303,L]

Additionally, I included another redirect rule to allow semantic web agents that are not RDFa-aware to get RDF descriptions of my pages. This is the second rule:

RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^(.*) http://www.w3.org/2007/08/pyRdfa/extract?uri=http://berrueta.net/$1 [R=303,L]
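For readers who do not use Apache, the logic of these two rules can be sketched as a small WSGI application in Python. This is a rough equivalent under stated assumptions, not a translation of mod_rewrite: it treats `foaf.rdf` as an exact path, it mimics the `RewriteCond` with a plain substring test on the `Accept` header (ignoring q-values), and `serve_html` is a hypothetical placeholder for serving the normal page.

```python
# Common prefix of both redirect targets, copied from the rules above.
EXTRACTOR = "http://www.w3.org/2007/08/pyRdfa/extract?uri=http://berrueta.net/"


def serve_html(environ, start_response):
    """Hypothetical placeholder for serving the ordinary XHTML+RDFa page."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>...</body></html>"]


def app(environ, start_response):
    """WSGI approximation of the two redirect rules."""
    path = environ.get("PATH_INFO", "/").lstrip("/")
    accept = environ.get("HTTP_ACCEPT", "")
    if path == "foaf.rdf":
        # First rule: the old FOAF URL points at the RDF extracted from the home page.
        target = EXTRACTOR
    elif "application/rdf+xml" in accept:
        # Second rule: RDF-aware agents get the RDF extracted from the requested page.
        target = EXTRACTOR + path
    else:
        return serve_html(environ, start_response)
    start_response("303 See Other", [("Location", target)])
    return [b""]


def call(path, accept=""):
    """Invoke the app without a server; return (status, Location header or None)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    app({"PATH_INFO": path, "HTTP_ACCEPT": accept}, start_response)
    return captured["status"], captured["headers"].get("Location")


print(call("/foaf.rdf"))
print(call("/blog/", "application/rdf+xml"))
print(call("/blog/", "text/html"))
```

As with the `[L]` flag in Apache, the first rule wins for `foaf.rdf` regardless of the `Accept` header.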

Once I finished implementing these ideas, I found that Ivan Herman had already done the same thing and posted the recipe on his blog. How could I have missed his post? Well, as I said at the beginning of this entry, I've been so busy these weeks that I haven't read my feed aggregator. Ouch!

By the way, I still have to check why Vapour says that there is a problem with content negotiation. When I check the same thing using curl or wget, everything seems to be OK. Maybe there is a bug in Vapour?
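One way to narrow down such a discrepancy is to reproduce by hand what a semantic web agent does: send a request with `Accept: application/rdf+xml`, do not follow the redirect, and inspect the status code and the `Location` header. The Python sketch below does this with the standard `http.client` module. To stay self-contained it runs against a throwaway local server built with `http.server` that mimics the redirect rule; the handler, host and port are assumptions, and neither the real site nor Vapour's own checks are reproduced here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

EXTRACTOR = "http://www.w3.org/2007/08/pyRdfa/extract?uri=http://berrueta.net"


class NegotiatingHandler(BaseHTTPRequestHandler):
    """Stand-in server: 303 for RDF-aware clients, 200 with HTML otherwise."""

    def do_GET(self):
        if "application/rdf+xml" in self.headers.get("Accept", ""):
            self.send_response(303)
            self.send_header("Location", EXTRACTOR + self.path)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"<html><body>hello</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def check(host, port, path, accept):
    """Send one GET without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path, headers={"Accept": accept})
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def run_demo():
    """Start the local server on a free port, run two checks, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), NegotiatingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return (check("127.0.0.1", port, "/", "application/rdf+xml"),
                check("127.0.0.1", port, "/", "text/html"))
    finally:
        server.shutdown()
        server.server_close()


print(run_demo())
```

If a hand-made check like this returns the expected `303` and `Location` while a validator still complains, comparing the exact request headers each client sends may be a reasonable next step.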

New draft of the Recipes

The W3C has just published a new draft of "Best Practice Recipes for Publishing RDF Vocabularies", a document which describes how to configure a web server to publish RDF and HTML representations of the resources.

This is the first document I've co-edited for the W3C. My contribution isn't really impressive, but I'm very happy and proud of my involvement with W3C.
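The core decision that this kind of server configuration encodes is a choice between an HTML and an RDF representation based on the `Accept` header. The Python sketch below shows that decision with q-values taken into account. It is a simplification written for this post (no wildcard subtypes such as `text/*`, no media-type parameters other than `q`, and a fallback to HTML instead of a `406` response); the function names and defaults are hypothetical, and this is not the configuration the Recipes prescribe.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable even with reverse=True, so ties keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(header, available=("text/html", "application/rdf+xml")):
    """Pick the available media type the client prefers; default to the first one."""
    if not header:
        return available[0]
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
    return available[0]


print(choose_representation("application/rdf+xml"))
print(choose_representation("application/rdf+xml;q=0.3, text/html;q=0.9"))
```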