Programming languages - Arvind Narayanan's journal

Programming languages [Mar. 30th, 2006|11:57 am]
Arvind Narayanan
[Current Mood | thoughtful]

This is a long and hopefully insightful post about the power of programming languages, particularly Python and Lisp. Most of this is inspired by the Theory of programming languages class I'm taking this semester, which is one of the toughest classes I've taken at UT. The reason I'm finding it tough is that while a lot of the stuff we learn in grad school here is part of the undergrad curriculum at IIT, the theory of programming languages is one significant omission.

Conceptually, the lambda calculus is very powerful stuff. The entire theory is way too complex to describe here, but let me present one nugget. Consider booleans. Anything you can do with a boolean can be done with a branch (a JNE instruction, an if-then-else, however you want to think of it). For instance, "a and b" is the same as "if a then b else false". So in the lambda calculus, a boolean is an abstraction which will branch to its first or its second argument. That's the only thing you can do with a boolean, and there are no separate branch statements. Booleans and branches are the same thing. (For those who know what a lambda expression is, "true = lambda a. lambda b. a" and "false = lambda a. lambda b. b".)

This takes some getting used to. Unfortunately, when you're learning the lambda calculus you have to get used to having to make big shifts in your whole world view every five minutes. Like I said, conceptually powerful stuff.

But does the lambda calculus, in its avatar as functional programming languages, actually give you more expressive power than procedural or object oriented programming? The pedantic answer would be that every functional program can be converted into a procedural program (both paradigms are Turing complete, if you understand computer science theory), but that's clearly not what I'm talking about. We don't write machine code even though we can achieve the same things by doing so.

Most academic types strongly believe that the answer to the above question is affirmative. Apparently much of the love affair that academia has with functional programming goes back to a 1984 paper by John Hughes, "Why Functional Programming Matters." It is rather weak, at least from the modern reader's point of view, for such a seminal paper: it claims that functional programming rules all because it has higher order functions and lazy evaluation. Higher order functions are functions that take functions as arguments. They're somewhat like function pointers, only a lot less hairy. Lazy evaluation is roughly like saying that all functions behave like Unix pipes, but somewhat more powerful.

Even though this paper was written 20 years ago, I'm still surprised that these features were thought to be a big deal, because they are readily available in object oriented languages. What you can do with higher order functions you can do with Interfaces, and Iterator objects are virtually the same as lazy evaluation. I'm not sure what the exact state of the OOP world was back then, but I know that OO languages existed way back in 1967. In any case, we can say with certainty that as of today we have John Hughes covered. I am in fact (co-)writing a class paper right now which aims to show that everything in the Hughes paper can be done just as easily in Java.
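To illustrate the correspondence, here's the Interface encoding of a higher order function, sketched in Python in the style of Java (the class and method names are my own toy illustration, not any real API):

```python
# A one-method object stands in for a function value, the way a Java
# interface with a single apply() method would.
class Function:
    """Plays the role of the Java interface."""
    def apply(self, x):
        raise NotImplementedError

class Add3(Function):
    """What would be an anonymous inner class in Java."""
    def apply(self, x):
        return x + 3

def twice(f, x):
    # A "higher order" call, made through the interface
    return f.apply(f.apply(x))

assert twice(Add3(), 10) == 16
```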

But there are other, non-academic proponents of functional programming who appear to have other reasons. The foremost of these is probably Paul Graham. In particular, his essay Beating the Averages on how his startup that used Lisp for the backend totally kicked everyone else's ass is highly recommended reading for any programmer. This is my favorite part:
Languages fall along a continuum of abstractness, from the most powerful all the way down to machine languages, which themselves vary in power... to explain this point I'm going to use a hypothetical language called Blub. Blub falls right in the middle of the abstractness continuum. It is not the most powerful language, but it is more powerful than Cobol or machine language.

As long as our hypothetical Blub programmer is looking down the power continuum, he knows he's looking down. Languages less powerful than Blub are obviously less powerful, because they're missing some feature he's used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn't realize he's looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.

When we switch to the point of view of a programmer using any of the languages higher up the power continuum, however, we find that he in turn looks down upon Blub. How can you get anything done in Blub? It doesn't even have [feature] y.

By induction, the only programmers in a position to see all the differences in power between the various languages are those who understand the most powerful one. (This is probably what Eric Raymond meant about Lisp making you a better programmer.) You can't trust the opinions of the others, because of the Blub paradox: they're satisfied with whatever language they happen to use, because it dictates the way they think about programs.
Extremely well written, and I completely agree with it. I believe I am at the (n-1)th level of this hierarchy: being fluent in Python, I tend to wonder how people get anything done with less powerful languages, but not being very fluent in Lisp, I also tend to wonder what's so great about it.

The killer feature of Lisp, according to Paul Graham (and many others) is macros (in Lisp, a macro is a function that processes code). Keeping data and code separate is something that is drilled into every beginning programmer's head, so you might wonder why macros are a good thing. This is because Lisp programs have an extremely simple tree structure (the same as all Lisp data), which makes processing code not only feasible, but very easy and very beneficial. While I am convinced of the need to experience this first hand by writing a large Lisp program, I am not convinced that Python doesn't have macro-equivalent features. For example, someone who moved from Lisp to Python says you don't miss macros that much because Python has eval, and operator overloading, and regular expression parsing.
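Here's a quick sketch of two of those partial substitutes (the `Sym` class is my own toy illustration):

```python
# 1. eval: code built as data and executed at runtime.
expr = "2 + 2"
assert eval(expr) == 4

# 2. Operator overloading: a class can reinterpret existing syntax, here
#    recording an expression tree instead of computing a sum -- a small
#    slice of the "code as data" idea behind macros.
class Sym:
    def __init__(self, name):
        self.name = name
    def __add__(self, other):
        return ("+", self, other)

x, y = Sym("x"), Sym("y")
op, left, right = x + y
assert (op, left.name, right.name) == ("+", "x", "y")
```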


The need for the tree structure appears to be why Lisp programs need all those parentheses. While it takes a little more getting used to than whitespace indentation in Python (which I became comfortable with in about 15 minutes, and others have said the same thing), it's still not as bad as I thought it would be. In particular, with a smart editor and some experience, you can make it work to your advantage.

A lot of people appear to think that removing lambda means that Python won't be functional anymore. That's dumb: lambda is just syntactic sugar; the real functional programming support in Python is the ability to pass functions around.
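Here's the point in two lines:

```python
# lambda is sugar for an unnamed function; the real feature is that
# functions are values you can pass around.
def add_one(x):
    return x + 1

nums = [1, 2, 3]
assert list(map(add_one, nums)) == [2, 3, 4]          # named function
assert list(map(lambda x: x + 1, nums)) == [2, 3, 4]  # identical with lambda
```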

Finally, I was surprised to find that Lisp is way faster than Python (at least as of when the Great Computer Programming Language Shootout was conducted). This is probably because Lisp is more mature and is less dynamically typed. Fortunately, Python has made great progress in this area in the last few years.

[User Picture]From: haran
2006-03-30 08:01 pm (UTC)
You mentioned how, theoretically, all Turing-complete languages are equal but that doesn't make all languages equal (or at least that's how I understood the statement in paragraph 4). But then you go on to mention how anything you can do with higher-order functions and continuations can be implemented through iterators and interfaces.
I think an important point is that while you can get the required functionality in interfaces and iterators, having a language that treats functions as first-class citizens and supports continuations is *conceptually* superior. It becomes easier to program in these languages and easier (usually) to understand these programs. So I'm not sure I agree with your position that Java has got John Hughes covered.

Even Iterators (which I understand are like Python's generators, right?) can theoretically be duplicated using classes to store a function's state information between calls. However, the code becomes a lot hairier which is why iterators are a good thing. Languages like Lisp and Python go a step further and make things even easier.

And about Python being at the n-1th level... I'm sure there are lots of Ruby fans who will disagree with that. :)
I've tried Ruby and it certainly seems closer to Lisp (though I'm no Lisp expert) with its support for continuations, ability to pass around blocks, and lack of distinction between statements and expressions (there's no such thing as a statement - everything's an expression).

While I like Ruby, it just seems to me to trade out Python's warts and introduce a few of its own, so I'm back on the Python bandwagon.
Have you tried it? What are your thoughts on it?
(Reply) (Thread)
From: nearfar
2006-03-30 09:55 pm (UTC)
In practice, emotions matter. In theory, no.
(Reply) (Parent) (Thread)
[User Picture]From: arvindn
2006-03-31 05:41 am (UTC)
That's an excellent point. Thanks for bringing it up. I will need to address it in the paper; I'm not yet quite sure what the answer is.

Clearly, some language features are beneficial and some are trivial. Where does one draw the line? Let us consider what it would take to emulate lisp macros in C. I think the only way to do it is to embed the compiler into the compiled program. Clearly not feasible in practice.

So we could maybe draw a boundary based on the amount of effort needed to emulate a feature. To emulate higher order functions and lazy evaluation in C, I could do it pretty easily if I short-circuit the type system entirely (by making all my data of type void *). Conversely, I think you can also show that it is not possible otherwise. So the effort needed to get the Hughes features in C, with type safety, is to embed a type checker into the compiled program. Again, too much effort.

On the other hand, it is way less effort to do it in Java, because subtype polymorphism gives you the run time type checking you need. But is it sufficiently less effort? Is the code clearer? I don't know how to answer that question.

This suggests an alternative approach -- throw in the towel and declare that it is meaningless to state that feature X can or cannot be easily emulated; instead, the proof of the pudding is in the eating: language A is as good as language B if you can solve almost all problems with code that is as modular and as clear in language A as you can in language B. This is probably the approach we're going to take in the paper, because we already have a body of programs in the Hughes paper which are supposed to showcase why Lisp programs are more modular. If they turn out to be as modular and clear in Java, then we can simply appeal to the reader's common sense.

We are of course open to the possibility that our Java programs will end up being hideously complex and ugly. I'll probably make another post on this topic when we're done writing the paper.
(Reply) (Parent) (Thread)
[User Picture]From: haran
2006-03-31 07:37 am (UTC)
Let us consider what it would take to emulate lisp macros in C. I think the only way to do it is to embed the compiler into the compiled program. Clearly not feasible in practice.

I may have an incorrect understanding of how Lisp macros work, but from what I understand, they allow you to manipulate the text or the symbols in an expression rather than the expression's values. Is that right? It's kind of like C's macro preprocessor on steroids. Perhaps a more powerful preprocessor in C would do the trick? (Of course, then the feature is not in the language itself, but in a preprocessing tool that runs before the actual compiler is invoked.)

And on a tangential note, have you seen this bit of Python awesomeness:
(Reply) (Parent) (Thread)
[User Picture]From: arvindn
2006-03-31 08:40 am (UTC)
Aww crap. I'm sorry, I meant lisp (and Python) eval, which is executed at runtime. Don't know why I said macros -- I guess I was thinking about whether Python can emulate lisp macros while I was typing.

But anyway, even if you look at lisp macros, it's nearly impossible to get them in C -- for a different reason. The best way to explain this is that Python already has macros; it's just so hard to use that it's as good as not being there. For example, from one of the pages linked to in my post, this is an example of a Python parse tree.

>>> parse("2 + 2")
['eval_input', ['testlist', ['test', ['and_test', ['not_test', ['comparison',
['expr', ['xor_expr', ['and_expr', ['shift_expr', ['arith_expr', ['term',
['factor', ['power', ['atom', [2, '2']]]]], [14, '+'], ['term', ['factor',
['power', ['atom', [2, '2']]]]]]]]]]]]]]], [4, ''], [0, '']]

It could be made simpler, but there's an inherent limit, because the syntax is too complicated. With Python there is some hope because it has good regexes and stuff, but in C it is hopeless.
(Reply) (Parent) (Thread)
[User Picture]From: arvindn
2006-03-31 06:07 am (UTC)
Sorry I forgot to address the rest of your questions.

An Iterator is an interface to a list. Instead of exposing the members of the list, you expose only a function which returns the next element of the list. This allows the list to be infinite.

Now, Iterators have sources, filters and sinks (think Unix pipes). Sources are usually either an input stream or a collection of items in memory. Filters are essentially 'map' equivalents, and sinks are usually either print statements or 'reduce' equivalents. Programming languages usually have builtins or library functions to make Iterators out of input streams or memory items (things like 'file.readline()' and 'for i in list' in Python). Sometimes, however, you need an iterator for a list that fits in neither of these categories, say when you want to iterate over the prime numbers. That's when you need a Generator: it's when you want to create your own source.
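For instance, here's that prime-number source as a Python generator (naive trial division, just a sketch):

```python
import itertools

# A generator as a custom iterator source: the (infinite) list of primes.
def primes():
    n = 2
    while True:
        # n is prime if no d in [2, sqrt(n)] divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

assert list(itertools.islice(primes(), 5)) == [2, 3, 5, 7, 11]
```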

I don't think Java has generators -- if you want to do that you have to manually save and restore state.
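To see what "manually save and restore state" means, here's the same iterator written twice in Python, once as a class that stores its state explicitly between calls (the Java way, roughly) and once as a generator that stores it implicitly:

```python
class Counter:
    """Class-based iterator: state lives explicitly in self.n."""
    def __init__(self, limit):
        self.n, self.limit = 0, limit
    def __iter__(self):
        return self
    def __next__(self):
        if self.n >= self.limit:
            raise StopIteration
        self.n += 1
        return self.n

def counter(limit):
    """Same thing as a generator: state is saved implicitly at each yield."""
    for n in range(1, limit + 1):
        yield n

assert list(Counter(3)) == [1, 2, 3]
assert list(counter(3)) == [1, 2, 3]
```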

As for the hierarchy, I meant that Graham puts the five languages he considers on a hierarchy, and Python is just below Lisp on it (Ruby is not in the list). I haven't tried Ruby.
(Reply) (Parent) (Thread)
From: (Anonymous)
2006-04-14 01:49 pm (UTC)

The Computer Language Shootout

"surprised to find that Lisp is way faster than Python"
The Computer Language Shootout was revived in 2004, and is updated most weeks.



And to reach all the data, start from the homepage
(Reply) (Thread)
[User Picture]From: arvindn
2006-04-15 03:15 pm (UTC)

Re: The Computer Language Shootout

I didn't know that, thanks.
(Reply) (Parent) (Thread)
From: (Anonymous)
2006-05-08 06:00 am (UTC)

Lisp vs the others

I disagree with Graham: I don't think programming languages fall on a line, but on a plane, or an n-dimensional space or something. Some languages are better than others - at some things. Lisp is very good at some things that, although conceptually impressive, don't mean all that much in practice.

One thing that has annoyed me to no end with functional languages I've tried (Scheme, ML) is that it seemed everyone eagerly used the powerful macro systems to do powerful stuff - in completely different ways. I found it hard to read, because you could bet that two functional programmers would never do the same thing in the same way. In that way, it's like Perl, or worse.

It's annoying when C programmers do simple things in different ways, but with Lisp macros, or functors, or what-have-you, that problem is taken to a whole new level. And it's shocking to see how little FPL designers have done to avoid the things we know cause a lot of errors in established languages, things which could often be avoided by merely a less terse syntax.

A lot of the common sense of OO languages seems completely lost on the academics promoting FP. I think I can sum it up in this: that we agree on a way of doing things is immensely valuable. Often, doing things in a different, more "concise" way is simply not worth the cost.
(Reply) (Thread)