Wednesday, May 14, 2008
Why Arc is bad for exploratory programming
The book is an excellent example of exploratory programming, showing how to incrementally build up these applications and experiment with different algorithms from the Python interactive prompt. For instance, topic clustering is illustrated by first implementing code to fetch blog pages from RSS feeds, breaking the pages into words, applying first a hierarchical clustering algorithm and then a K-means clustering algorithm to the contents, and then graphically displaying a dendrogram showing related blogs. At each step, the book shows how to try out the code and perform different experiments from the interactive prompt.
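As a rough illustration of the hierarchical clustering step described above, here is a minimal sketch rather than the book's actual code. The blog names and word counts are invented for the example, and the helper names `pearson_distance` and `hcluster` are assumptions chosen for this sketch.

```python
from math import sqrt

def pearson_distance(v1, v2):
    """Return 1 - Pearson correlation, so 0 means perfectly correlated."""
    n = len(v1)
    sum1, sum2 = sum(v1), sum(v2)
    # Clamp at zero to guard against tiny negative values from rounding.
    var1 = max(sum(x * x for x in v1) - sum1 ** 2 / n, 0.0)
    var2 = max(sum(x * x for x in v2) - sum2 ** 2 / n, 0.0)
    cov = sum(a * b for a, b in zip(v1, v2)) - sum1 * sum2 / n
    den = sqrt(var1 * var2)
    if den == 0:
        return 1.0  # treat constant vectors as uncorrelated
    return 1.0 - cov / den

def hcluster(rows, labels):
    """Repeatedly merge the two closest clusters; return a nested tuple tree."""
    clusters = [(list(row), label) for row, label in zip(rows, labels)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = pearson_distance(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (v1, t1), (v2, t2) = clusters[i], clusters[j]
        # The merged cluster's vector is the average of its two members.
        merged = ([(a + b) / 2 for a, b in zip(v1, v2)], (t1, t2))
        del clusters[j]  # j > i, so delete the later index first
        del clusters[i]
        clusters.append(merged)
    return clusters[0][1]

# Invented word counts for the words: lisp, python, kitten, puppy
labels = ["lisp-blog", "python-blog", "cat-blog", "dog-blog"]
rows = [[9, 1, 0, 0], [8, 3, 0, 1], [0, 0, 7, 2], [0, 1, 3, 8]]
tree = hcluster(rows, labels)
print(tree)
```

The nested tuple plays the role of the dendrogram: blogs with similar vocabulary end up in the same subtree.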
By using Python libraries, each step of implementation is pretty easy; the book can focus on the core algorithms, and leave the routine stuff to libraries: urllib2 to fetch web pages, Universal Feed Parser to access RSS feeds, Beautiful Soup to parse HTML, Python Imaging Library to generate images, pydelicious to access links on del.icio.us, and so forth.
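To give a sense of how little code the "routine stuff" takes when a library does the work, here is a small sketch of the "break the page into words" step. It is not the book's code: it swaps in the standard library's `html.parser` where the book uses Beautiful Soup, works on a hard-coded sample string instead of fetching a real page, and the names `TextExtractor` and `word_counts` are made up for this example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text between tags, discarding the tags themselves."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Note: this naive version also keeps <script> and <style> contents.
        self.chunks.append(data)

def word_counts(html):
    """Return a Counter of lowercase alphabetic words found in the HTML."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return Counter(re.findall(r"[a-z]+", text.lower()))

# Invented sample page standing in for a fetched blog entry.
sample = ("<html><body><h1>Arc and Lisp</h1>"
          "<p>Lisp macros; Lisp libraries.</p></body></html>")
counts = word_counts(sample)
print(counts.most_common(3))
```

Word counts like these are the per-blog vectors that the clustering algorithms then operate on.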
If you want more details than the book provides (it is surprisingly lacking in references), I recommend Andrew Moore's online Statistical Data Mining Tutorials, which cover many of the same topics.
What does this have to do with Arc?
While reading this book, I was struck by a contradiction: the book is a perfect example of exploratory programming, Arc is "tuned for exploratory programming", and yet working through the Collective Intelligence algorithms in Arc is an exercise in frustration.
The problem, of course, is that Arc lacks libraries. Arc lacks basic functionality such as fetching a web page, parsing an XML document, or accessing a database. Arc lacks utility libraries to parse HTML pages or perform numerical analysis. Arc lacks specialized API libraries to access sites such as del.icio.us or Akismet. Arc lacks specialized numerical libraries such as a support-vector machine implementation. (In fact, Arc doesn't even have all the functionality of TRS-80 BASIC, which is a pretty low bar. Arc is inexplicably lacking trig, exp, and log, not to mention arrays and decent error reporting.)
To be sure, one could implement these libraries in Arc. The point is that implementing libraries detours you from the exploratory programming you're trying to do.
Paul Graham has commented that libraries are becoming an increasingly important component of programming languages, and that huge libraries are now an expected part of a new programming language. Given this understanding of the importance of libraries, it's surprising that Arc is so lacking in them. (It's also surprising that it lacks a module system or some other way to package libraries.) It's a commonplace complaint about Lisp that it lacks libraries compared to other languages, and Arc makes this even worse.
I think there are two different kinds of exploratory programming. The first I'll call the "Lisp model", where you are building a system from scratch, without external dependencies. The second, which I believe is much more common, is the "Perl/Python model", where you are interacting with existing systems and building on previous work. In the first case, libraries don't really matter, but in the second case, libraries are critical. The recently-popular article Programming in a Vacuum makes this point well: picking the "best" language is fine in a vacuum, but in the real world the available libraries are usually the key.
Besides the lack of libraries, Arc's slow performance rules it out for many of the algorithms from Programming Collective Intelligence. Many of the algorithms run uncomfortably slowly in Python, and running them in Arc is that much worse. It's just not true that speed is unimportant in exploratory programming.
On the positive side for Arc, chapter 11 of Programming Collective Intelligence implements genetic programming algorithms by representing programs as trees, which are then evolved and executed. To support this, the book provides Python classes to represent code as a parse tree, execute the code tree, and prettyprint the tree. As the book points out, Lisp and its variants let you represent programs as trees directly. Thus, using Arc gives you the ability to represent code as a tree and dynamically modify the code tree for free. (However, it only takes 50 lines of Python to implement the tree interpreter, so the cost of Greenspunning is not particularly severe.)
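To make the "code as a tree" idea concrete, here is a minimal sketch of such a tree interpreter in Python. It is written for this post's argument and is not the book's implementation; the class names (`Func`, `Node`, `Param`, `Const`) and the sample program are assumptions for illustration.

```python
import operator

class Func:
    """A named function that can sit at an interior node of the tree."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

class Node:
    """Interior node: apply a Func to the values of the child subtrees."""
    def __init__(self, func, children):
        self.func, self.children = func, children

    def evaluate(self, params):
        # All children are evaluated eagerly, including both "if" branches.
        return self.func.fn(*[c.evaluate(params) for c in self.children])

    def show(self, indent=0):
        lines = [" " * indent + self.func.name]
        lines += [c.show(indent + 2) for c in self.children]
        return "\n".join(lines)

class Param:
    """Leaf that returns one of the program's input parameters."""
    def __init__(self, index):
        self.index = index

    def evaluate(self, params):
        return params[self.index]

    def show(self, indent=0):
        return " " * indent + "p%d" % self.index

class Const:
    """Leaf that returns a fixed value."""
    def __init__(self, value):
        self.value = value

    def evaluate(self, params):
        return self.value

    def show(self, indent=0):
        return " " * indent + str(self.value)

ADD = Func("add", operator.add)
SUB = Func("subtract", operator.sub)
GT = Func("isgreater", lambda a, b: 1 if a > b else 0)
IF = Func("if", lambda cond, then, other: then if cond > 0 else other)

# Program: if p0 > 3 then p1 + 5 else p1 - 2
tree = Node(IF, [Node(GT, [Param(0), Const(3)]),
                 Node(ADD, [Param(1), Const(5)]),
                 Node(SUB, [Param(1), Const(2)])])

print(tree.show())
print(tree.evaluate([2, 3]))  # p0 is not > 3, so p1 - 2 = 1
print(tree.evaluate([5, 3]))  # p0 > 3, so p1 + 5 = 8
```

In a Lisp, the same program is simply the list `(if (> p0 3) (+ p1 5) (- p1 2))`, which can be rewritten and evaluated without any of these classes; mutation and crossover in genetic programming then amount to swapping subtrees.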
To summarize, a language for exploratory programming should be concise, interactive, reasonably fast, and have sufficient libraries. Arc currently fails on the last two factors. Time will tell if these issues get resolved or not.
So look for "Scheme libraries" or "Lisp libraries", not "Arc" libraries. Needless to say, you'll find lots.
Shouldn't this be obvious, or what am I missing?
Don't get me wrong, it is good to see a language like Arc emerging in the field, but it is a difficult field these days. People expect a language to do automagical things in an easy way.
Like in Ruby.
(Or Python.)
;-D
I just asdf-installed all the libraries I needed.
I'm sure CL lacks some libraries, but for Web 2.0 / collective intelligence tasks it seems OK. :-)
I think there's a new meme that Lisp has no libraries, but it's demonstrably false. I don't know any Lisp programmers (except maybe one trying to build a new dialect) who would write a program without using existing libraries.
Use newlisp - http://www.newlisp.org
It's tiny, it's monolithic - one 250k executable, no need for "system installation", it has all modern networking and APIs built-in, easy and direct access to C libraries, high-level operators, and the speed of Perl/Python.
OR: - use picoLisp, which is more cryptic, but very, very usable and practical - as shown by its author's consulting experience.
Do not play with Graham's talkers - use one of the implementations that already do and surpass what "arc" only purports to become
even further and say go to the source and use Common Lisp which is industrial-strength lisp with plenty of libraries (more being written every day, http://www.cl-user.net and http://www.cliki.net) and forget about poor pretenders to the throne such as newlisp or arc or picolisp or whatever crappy-obsolete lisp implementation will come out next.
Python, along with most non-lisps, doesn't have the powerful lisp macros to do this re-programming. Just as Arc lacks the libraries for the kind of exploratory programming that Python can do, Python is poorly suited for the exploration of programming languages themselves that Arc provides.