What I like about the NoSQL crowd

Although I am not a big fan of the NoSQL movement (mostly because many of its advocates use arguments I do not agree with), there are a few things that I like about the NoSQL crowd and I want to write them down^*. Most of what follows stems from discussions through the years with our DBA and some friends who are members of the “Greek Database Mafia”.

For more than two decades the dominance of the relational model (even though no commercial system fully implemented it^†) was undisputed. Nobody ever got fired for choosing a commercial RDBMS for an application, whereas anyone who dared to propose something different would look suspicious. This situation is no different from what Rob Pike described in “Systems Software Research is Irrelevant” for Operating Systems:

For example it took 10+ years for the R-Tree to enter the commercial systems, although it was solving a real problem. In the meantime if you were lucky and your system offered extensibility you could write it on your own.
No matter how novel the system, for it not to be marginalized it had to have an “SQL layer”. No SQL queries, no sales. Provide an SQL layer and all innovation of the product stays unused.
One’s proposal for an RDBMS purchase had to be among three or four commercial products. Anything else would likely be considered “a hacker’s choice” because “We make money! We cannot go that way!”

You could say that databases outside academic research had come to a halt. You don’t believe me? Just ask Yannis Ioannidis, who is fond of saying that “Databases are dead”^‡ in the most emphatic way when he wants to stir things up in a conversation.

And this is what I like about the NoSQL crowd (== implementers, advocates and integrators). They do not care about established standards. They are not afraid to experiment in a “real environment”. Some of them may focus on a single problem and solve it well. Others may aim at a wider range of problems. But no system is stopped from being developed and deployed because it is not “SQL compliant” or not relational. And even though some of these solutions resemble CODASYL^o, once again there is action in the field.

But please, people, stop marketing them as “one solution fits all”. Otherwise we will end up in an era of stagnation again, just like when everyone was storing stuff in an RDBMS for lack, and fear, of better suited solutions. These systems do not invalidate relational systems. They fill the gaps that relational systems leave.

[*] – As I had promised.

[†] – By the way, did you know that GROUP BY works outside of the “relational box”?

[‡] – I have heard him say this while giving a speech.

[o] – “one usually gets a low-level record-at-a-time DBMS interface”, says Mike Stonebraker.

re: The Humbling Power of P v NP

academia_vs_business — Some engineer out there has solved P=NP and it's locked up in an electric eggbeater calibration routine. For every 0x5f375a86 we learn about, there are thousands we never see.

In “The Humbling Power of P v NP”, Lance Fortnow urges theorists to try to solve P v NP “not because you will succeed but because you will fail”. It seems this is the Kobayashi Maru character test for theorists.

So what about non-theorists?

My answer is: So what if a problem is NP-complete? Does this mean that we are going to use that fact as an excuse to not solve it, or present a lousy hack as a solution? Or do people think that such problems do not come along the way of “a real professional”? They do, but theorists are trained to recognize them when they see them.

Just like theorists then, “practical computer people” must try to solve (using whatever tool they see fit) an NP-complete problem (like the TSP for example). Not because they will solve it optimally, but because there will always be a better solution. And by seeking it and understanding that “computation is a nasty beast” they will become better ~~programmers~~ professionals.
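To make “there will always be a better solution” concrete, here is a minimal sketch for the TSP: build a tour greedily, then let 2-opt improve it. The function names are my own and not from any library, and neither heuristic promises an optimal tour. That is exactly the point.

```python
import itertools
import math


def tour_length(points, tour):
    """Length of the closed tour that visits `points` in the order given by `tour`."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def nearest_neighbour(points):
    """Greedy tour: start at point 0 and always move to the closest unvisited point."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def two_opt(points, tour):
    """Keep reversing segments of the tour for as long as that makes it shorter."""
    best = tour[:]
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(1, len(best)), 2):
            # Reverse the segment best[i..j] and keep the result if it is shorter.
            candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
            if tour_length(points, candidate) < tour_length(points, best) - 1e-12:
                best, improved = candidate, True
    return best
```

Given a list of (x, y) pairs in `points`, `two_opt(points, nearest_neighbour(points))` returns a tour that is never longer than the greedy one, and usually shorter. The result is still only a local optimum, so there is always something left to seek.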

Update: You should also read:

“IT in the public sector is like the bottomless jar of the Danaids”

The End.-


Chaitin’s books online

A few weeks ago I heard that Chaitin is trying to put everything he has ever published online, most of it via arXiv. Over the weekend I gave it some time and found some of his books:

I own “Meta Math!”. I bought it by chance. While in the bookstore, I opened it and stumbled upon LISP code. This is a really nice book. It is a personal log of Chaitin’s career, an introduction to Kolmogorov complexity⁰ (which was independently invented by Chaitin too), algorithmic information theory, thinking about randomness (in a way first presented to me years ago by a long-time friend), LISP¹, useful comments on NKS², Polya’s “How to Solve It!” and an introduction to the thought of Leibniz³.

Of course the book has irritating parts: having invented a field independently of Kolmogorov somehow puts you in the shadow of the man, and Chaitin constantly tries to get out of that shadow and stand not on the shoulders of Turing and Goedel but next to them (when in fact no such effort is needed).

I loved the book⁴. It is the kind of book that reminds you why you loved Mathematics. Even Chaitin’s personal effort to stress the importance of his work (that annoys many) contributes to this. For he unquestionably loves Mathematics.

(Must make time and read “The Limits of Mathematics”; Time there is not…)

[0] – More about it at “An introduction to Kolmogorov complexity”.

[1] – Link to the Lisp interpreter that Chaitin has developed for his needs.

[2] – See also this presentation on NKS by S. Wolfram.

[3] – An essay on Leibniz by Chaitin.

[4] – I also loved “Algorithmic information theory: Some recollections”.

This is a weekend’s work

More often than I would expect, I run into conversations about certain IT and web related projects where someone will say, with a bit of authority:

– This is not a big deal! It can be done over the weekend.

Once I was told that a certain feature that needed to be implemented was “only two hours’ work”! And this came from a person who had no knowledge of the software stack in use.

Another time upon protesting (and basically saying “Put your money where your mouth is!”) I got the best exit:

– I did not say that I can do it over the weekend. I said that it is possible to be done in a weekend by you!

Like I do not have better things to do in my weekends…

So if you have ever claimed that something is possible over a weekend (and can be delivered as a working beta), the weekend is not far off. Shut up and prove your point. You can even use Friday afternoon as extra time.

But on Monday please call the guy who said it would take him a week, a month or more and apologize.

A glimpse at Christos H. Papadimitriou

Via Machinations we learn that the current issue (Volume 3, Issue 2, May 2009) of Computer Science Review is devoted to celebrating the research contributions of Christos H. Papadimitriou. The first article, “A glimpse at Christos H. Papadimitriou” (by Marios Mavronicolas and Paul G. Spirakis), has a lot of information not only on Papadimitriou’s path, but also on how CS evolved in Greece and how it was influenced by Papadimitriou^*. This is a must read, especially if you are an NTUA student, which means that there are at least two people that you can go to and ask for more (or, better yet, work with). UoA students can ask Elias Koutsoupias.

As the authors say, there is stuff missing from the paper, since a fruitful career of 30+ years cannot be covered in a few pages. But since the paper mentions both the TSP as a recurring theme in his research and the late Kanellakis, I think their work on the ATSP should have been mentioned too. What can I say, I’ve grown into a TSP junkie over the last few years.

Also, for those interested, in the same issue Costis Daskalakis surveys their recent joint work on Nash Equilibria and complexity.

P.S. Elsevier told me that a hardcopy of this issue costs €66.50 for Greece, so it is best to go to your University’s library and access it online.

[*] – For example, when two guys asked him right after graduation (~1982) what to do, his advice was to work with databases (and I know this first hand).

Eval

I was looking for a scientific calculator for my Axim X3 and I stumbled upon Eval (I think I was reading a slashdot article for “Programmers at work” at the time). Eval is written by Jonathan Sachs. Not only is it a fine freeware solution that transforms your Pocket PC into a scientific calculator, but there is also a Windows version available. (To be able to use the help file you have to install the WinHlp32.exe update. I do not know about Windows 7 and later.)

Performance tuning

From a dialog I had with a friend just minutes ago:

Friend: Do you know of any companies that sell (database) performance tuning tools?
Me: It is called a DBA.

The Gosling Tarpit

I think Panagiotis is going to love this:

“because the Java programming language and Java Virtual Machine are (surprise!) so tightlyCoupled, new language designers are compelled to make their languages such that they use only those features they can implement efficiently on the JVM. For example, implementations of Scheme for the JVM either lack call/cc or have a very slow and slightly buggy implementation of it. We call this the Gosling Tarpit.”

From Phosphorous, The Popular Lisp.

[via]

on AMKA

I usually avoid judging other people's work from a distance, but the case of AMKA (the Greek social security number) is an exception. Questions come to mind, and I hope someone will read them and enlighten us. First there was Stazybo Horn's observation that the first digits of the AMKA consist of the insured person's date of birth. It struck me as odd, but I did not pay much attention. Until today, when my wife, while doing some paper mining, found her own card with her AMKA, along with the letter that accompanied the card. On the back of the letter one reads:

Check the date of birth and the sex contained in the A.M.K.A., following the example below.

Suppose the A.M.K.A. is 270163 0012 5

The first six (6) digits are the date of birth (27 January 1963).

Of the next four digits (0012), the last one (i.e. the 2) denotes the sex, and in this particular case it is a woman, because the even digits 0, 2, 4, 6, 8 are given to women while the odd digits 1, 3, 5, 7, 9 are given to men.

The last digit (5) concerns data processing and is therefore not checked by you.
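The layout described in the letter can be sketched in a few lines of code. This is only an illustration of what the letter says: the names `AmkaFields`, `parse_amka` and `candidate_birth_years` are my own, the trailing digit is left uninterpreted because the letter does not explain it, and the two centuries tried are an assumption. Note that the year has only two digits, so the century has to be guessed.

```python
from dataclasses import dataclass


@dataclass
class AmkaFields:
    day: int
    month: int
    year_two_digits: int  # the century is not encoded in the number
    serial: str           # the four digits after the date, e.g. "0012"
    sex: str              # "female" if the last serial digit is even, else "male"
    trailing_digit: int   # "concerns data processing"; not interpreted here


def parse_amka(amka: str) -> AmkaFields:
    """Split an 11-digit AMKA into the fields the letter describes."""
    digits = amka.replace(" ", "")
    if len(digits) != 11 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected 11 decimal digits")
    serial = digits[6:10]
    return AmkaFields(
        day=int(digits[0:2]),
        month=int(digits[2:4]),
        year_two_digits=int(digits[4:6]),
        serial=serial,
        sex="female" if int(serial[-1]) % 2 == 0 else "male",
        trailing_digit=int(digits[10]),
    )


def candidate_birth_years(fields: AmkaFields, centuries=(1900, 2000)):
    """Every birth year consistent with the two-digit year (centuries are an assumption)."""
    return [century + fields.year_two_digits for century in centuries]
```

For the letter's example, `parse_amka("270163 0012 5")` gives day 27, month 1, two-digit year 63 and sex "female", while `candidate_birth_years` returns both 1963 and 2063 because the number alone cannot tell them apart.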

Questions:

The insured woman was born on 27/01/63. Y2K anyone? My own card arrived ~5 years ago. Was it forgotten that quickly? What will happen if some newborn ends up with the AMKA of a centenarian?
80 bits (yes 80, not 88, we will see why later) to describe the insured person uniquely? When 33 bits are enough to uniquely identify everyone alive on the planet right now?
I do not know how the AMKA database is organized, but I understand that the AMKA is a primary key (or is used as one). Can it really be that one performs operations on the primary key in order to draw conclusions about the insured person? And if the computation is not done on the key, is the same information stored again in a separate field? And not only that, but this information is a 6-byte substring inside another string, when it could be 4 bytes (an integer)? Update: Years later I suspect that the reason the AMKA has this form is so that the insured person can remember it easily. Both their own and their children's.
The same goes for the sex of the insured person. Why does it have to be part of the AMKA?
What purpose do the 3 bytes for which no explanation is given serve?
The 11th byte (the one that concerns data processing) is probably some check digit.
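The bit counts in the questions above are easy to check. A small sketch, assuming each AMKA digit is stored as one 8-bit character and taking 8 billion as a round figure for the world population (my assumption, not a number from the letter); `bits_needed` is a hypothetical helper of my own:

```python
def bits_needed(distinct_values: int) -> int:
    """Smallest number of bits that can give each of `distinct_values` items its own label."""
    if distinct_values < 1:
        raise ValueError("need at least one value")
    return max(1, (distinct_values - 1).bit_length())


WORLD_POPULATION = 8_000_000_000  # assumed round figure
ID_DIGITS = 10                    # the AMKA without its trailing digit
ALL_DIGITS = 11                   # the AMKA including its trailing digit

print(ID_DIGITS * 8)                  # bits used when each digit takes one byte
print(ALL_DIGITS * 8)                 # the same, counting the trailing digit too
print(bits_needed(WORLD_POPULATION))  # bits for a plain counter over everyone alive
print(bits_needed(10 ** ID_DIGITS))   # bits to hold any 10-digit decimal number
```

Under these assumptions the four lines print 80, 88, 33 and 34. Even a number that must hold any ten decimal digits fits in 34 bits, less than half of what the character representation uses.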

This smells a lot like COBOL.

– Where do AMKAs go when they die, daddy?