“All models are wrong, but some are useful”

And then there are models which are not useful at all (emphasis mine):

“consider an all-OSS world in which each company offers consumers exactly the same shared code as every other company. By definition no company can then compete by writing more OSS code than its rivals. This lack of competition suppresses code production for the same reason that cartels suppress output.”

Or to put it in other words, because companies compete within a common code base, they contribute less and less code into the project because they run the risk of losing a future contract to a competitor using code they have submitted.

The authors of this study are advised to read the history of the X Window System, whose development closely follows their model. X is universal in the Unix world (commercial and open source systems that try to converge by being POSIX compliant, which is another hint here). It never faced a lack of contributors, contributions or even stewardship, and whenever it stagnated, new branches forked and pushed it forward. And while the authors seem to think that Open Source has been with us for the last 20 years, X was born in 1984. In fact we’ve had Open Source software since the very beginning of software.

* The quote used in the title of this post is attributed to statistician George Box.

Update: After this post and a discussion on Twitter, Gregory Farmakis performed a thought experiment.

Quick note on Lattices, Relations and stuff

Thanks to @mosabou I got to read about Formal Concept Analysis. What a cool concept! One of the first things I read was “A First Course in Formal Concept Analysis” (an introduction to the subject without the mathematics). While going through the examples, I kept thinking: where have I seen that before? Relational Databases, that’s where! And it seems that my intuition was correct: see the excellent “Gentle introduction to Relational Lattice” by Vadim Tropashko and the links from there.

[ I vaguely remember Yannis introducing the lattice concept in a database context in a course lecture a few years back, but I have to admit that I did not look much into it back then. ]
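For readers who have not met Formal Concept Analysis, here is a minimal sketch of its central idea, under my own assumptions: the toy context (three organisms and four attributes) and all function names are made up for illustration and do not come from the texts mentioned above. A formal concept is a pair (extent, intent) where the extent is exactly the set of objects sharing the intent, and the intent is exactly the set of attributes shared by the extent. The brute-force enumeration below is only meant for tiny contexts.

```python
from itertools import combinations

# Hypothetical toy context: each object maps to the attributes it has.
CONTEXT = {
    "frog": {"needs_water", "lives_on_land", "can_move"},
    "fish": {"needs_water", "lives_in_water", "can_move"},
    "reed": {"needs_water", "lives_on_land", "lives_in_water"},
}

ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    result = set(ALL_ATTRIBUTES)
    for obj in objects:
        result &= CONTEXT[obj]
    return result


def objects_having(attributes):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the smallest extent to the largest.
    for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents gives the concept lattice, which is where the resemblance to relations starts to show: the context itself is just a binary relation between objects and attributes.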

This is mostly a note for people who insist on thinking that theory is disconnected from practice. Especially the ones who write SQL code and insist on not realizing that they deal with sets (and set theory). Stop holding an umbrella and invest some time in your math.
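To illustrate the point about sets, here is a small sketch that models relations as Python sets of tuples. The table contents and names are hypothetical, and the model is simplified: real SQL tables are bags (duplicates are allowed unless you ask for DISTINCT), so this mirrors the set-based operators UNION, INTERSECT and EXCEPT rather than UNION ALL.

```python
# Relations modelled as sets of (name, dept) tuples; the data is made up.
employees = {("alice", "db"), ("bob", "sys"), ("carol", "db")}
contractors = {("bob", "sys"), ("dave", "net")}

# SQL UNION / INTERSECT / EXCEPT correspond to plain set operations.
union = employees | contractors
intersect = employees & contractors
except_ = employees - contractors

# A second relation of (dept, city) tuples, also made up.
departments = {("db", "Athens"), ("sys", "Patras")}


def natural_join(left, right):
    """Join (name, dept) tuples with (dept, city) tuples on the shared dept."""
    return {
        (name, dept, city)
        for (name, dept) in left
        for (other_dept, city) in right
        if dept == other_dept
    }


# Projection onto dept: duplicates vanish in a set, as with SELECT DISTINCT.
projection = {dept for (_, dept) in employees}

if __name__ == "__main__":
    print(sorted(union))
    print(sorted(intersect))
    print(sorted(except_))
    print(sorted(natural_join(employees, departments)))
    print(sorted(projection))
```

Nothing here is specific to any database product. It only shows that the everyday SQL operators are the set operators from a first course in discrete mathematics, with different spelling.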

Are “systems people” really necessary?

A good friend forwarded me a (handwritten) manuscript by E.W. Dijkstra entitled Are “systems people” really necessary? Giorgos pointed out that it might already be archived in the E.W. Dijkstra Archive. As a matter of fact it is: EWD1095 [handwritten version here in pdf].

It is a classic EWD document, straight to the point, properly impolite and asking the right questions. Great advice for career and personal growth.

The (n-th) return of the Database Machine

Once there were Database Machines. Talk to your favorite database person and they will tell you that this is an outdated idea. It is so old that it can be served again as new and innovative. @mperedim remembers that I predicted this right after Oracle bought Sun. I am neither a market analyst nor do I have predictive powers. It is just that Oracle has tried this before: with Unbreakable Linux just a few years ago, and with Sun hardware and Solaris in the 90s. (It also happens that they had tried a lot of things with Sun before, like moving all their development desktops to Solaris x86, or working with Sun on the NC, which is no different from today’s netbook paradigm, the X terminal of the early 90s, or even the dumb terminal.) So with a 20-year amnesia cycle in CS, why not reintroduce the idea? Enter the Sun Oracle Database machines.

Oracle wants to sell such machines. It eliminates support (contract) complexity. Oracle needs a base Operating System whose development it can control and a hardware platform that can be optimized for what Oracle does best. Now clients can buy turnkey solutions from Oracle just like they do when they buy IBM. Picture this: two Linux machines with Oracle 10g installed, exchanging every kind of traffic except sqlplus. Whose fault is this? Oracle’s? The Linux vendor’s? It turned out to be a weird combination of the hardware. And this was discovered only because the DBA and the System Administrator, both under the same employer, decided to solve the problem (I was the System Administrator involved). Imagine two different vendors and the client trying to solve the problem: I would expect a lot of finger pointing instead of actually finding the solution and/or a workaround.

Oracle now has the opportunity to market the product as a cost saver (“You only need an army of DBAs, not an army of DBAs and an army of systems administrators for different operating systems. Oh, and by the way our patching process just got simpler, you need to call only us”). While in fact a solution’s complexity is unaffected, support contract and communication complexity for the client is simplified. This looks better than buying IBM (or Microsoft) to the person that signs the checks.

Now if someone can make WebKit work with Emacs, we will have Lisp Machines resurrected…

PS: You do not believe in CS amnesia? In “Getting started as a PhD student” Matt Welsh writes: “you should never read anything from the 1960’s or 70’s or you will realize that it all has been done before”.

30 years of hits

Kat was a Greek computer that dual-booted MS-DOS and Apple II

While searching for the history of Gigatronics and the Kat, I stumbled upon this PDF. Useful reading for anyone interested in the development of computing in Greece, back when we were kids.

Greenspun’s Tenth Rule and variations

For those who have not heard Greenspun’s Tenth Rule, it states that:

Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

By the way, Greenspun’s rules 1 to 9 do not exist.

Seven months ago, during a discussion about Prolog, I asked Ozan S. Yigit to reformulate Greenspun’s tenth rule for Prolog. Oz replied:

Any sufficiently complicated modern program contains a buggy, informal implementation of prolog that casual observers confuse with lisp.

Just hours earlier I was basically a listener in a discussion that involved NoSQL. While clearly I am not a NoSQL advocate, I am no hater either, but what I heard led me to the following reformulation of Greenspun’s rule, this time involving the relational model:

Those who blindly adopt #NoSQL will discover a variation of Greenspun’s tenth rule

I am sure that many other variations exist. In fact the Wikipedia page on Greenspun’s Tenth Rule contains a Prolog variation similar to Ozan’s and an Erlang version. So if you know of (or can make up) any other, please post it here (or somewhere).

Polymath projects for other disciplines too?

While I was revisiting Gowers’ “Mathematics: A Very Short Introduction” my mind wandered to the first Polymath project (essentially a massively collaborative effort to solve certain mathematics problems where participation seemed to follow the 90-9-1 principle). Anyone who wants to learn more about Polymath can start from “A gentle introduction to the Polymath project”.

Anyway, as I was reading the paragraph I was looking for, it struck me: Do other disciplines have similar efforts? Wouldn’t it be nice if they did? If not, why? One minute later a second strike came:

– Wait a minute! We were there before Polymath! We have Hackathons!

Although more free spirited (in a Hackathon anyone can tackle what they want) the outcome is to the benefit of the society concerned with the event.

However, hackathons seem disconnected from academic environments, and that is a pity. Big conferences occur yearly and people have fun discussing their work in the hallway tracks, exchanging ideas and strategies. It seems a bit of a waste that so many bright minds gathered together do not sit around a blackboard, or even collaborate over the Net, and discuss how to attack a problem, any problem, that has endured the test of time. Bright Math people did it, why not the rest?

With HDMS approaching, maybe this is something to consider for the last session. So if anyone from those going to Cyprus is reading, keep this at the back of your head. I could be wrong and such an effort may not be feasible in another discipline, but I would like to know why.

a bit of history on the relational model

Thanks to Software Memories we learn about David Childs and his work on Extended Set Theory. I quote from the blog post:

“Way back in 1968, Childs wrote a paper outlining how set theory, relations, and tuples could be applied to data management.

And that’s where I did a double-take, because 1968 < 1970. Sure enough, Footnote #1 in Codd’s seminal paper is to Childs’ 1968 work. Indeed, Childs’ paper is the only predecessor Codd acknowledges as having significant portions of his idea.”

It seems that there was life before God Codd after all.

Stateful protocols

Mark Crispin writes:

“In particular, doing things with mailboxes in the hundreds of MB in that format takes a while. The authors of Outlook and Thunderbird are victims of a computer science course mindset which, starting in the 1980s, taught their pupils that all protocols are (or should be) stateless. Thus, they believe that IMAP is like HTTP; that when a server fails to respond immediately, that means that the correct remedial action is to disconnect and try again, or just disconnect and assume that everything happened anyway.”

[via imap-uw]
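To make the stateless versus stateful distinction concrete, here is a toy sketch of my own. It is not a real IMAP client and not anything from Crispin’s message; the class and method names are hypothetical. It only borrows the idea that an IMAP connection moves through states (not authenticated, authenticated, selected), so that what a command means depends on what came before it on the same connection.

```python
class ToySession:
    """Toy model of an IMAP-like stateful session; not a real client."""

    def __init__(self):
        # Every new connection starts from scratch.
        self.state = "not_authenticated"
        self.mailbox = None

    def login(self):
        if self.state != "not_authenticated":
            raise RuntimeError("LOGIN is only valid before authentication")
        self.state = "authenticated"

    def select(self, mailbox):
        if self.state not in ("authenticated", "selected"):
            raise RuntimeError("SELECT requires an authenticated session")
        self.state = "selected"
        self.mailbox = mailbox

    def fetch(self, message_number):
        if self.state != "selected":
            raise RuntimeError("FETCH requires a selected mailbox")
        return f"message {message_number} from {self.mailbox}"


if __name__ == "__main__":
    session = ToySession()
    session.login()
    session.select("INBOX")
    print(session.fetch(1))

    # "Disconnect and try again" yields a fresh session with none of that state.
    retry = ToySession()
    try:
        retry.fetch(1)
    except RuntimeError as error:
        print("retry failed:", error)
```

A client that treats a slow response as a reason to drop the connection throws away the login and the selected mailbox, and has to pay for both again before it can repeat the request. With a mailbox in the hundreds of MB, that setup is the expensive part.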