Arvind Narayanan's journal

Pypes work! (more or less)
Jul. 31st, 2007, 01:24 am

If you hack Python at all, it might be worth your while to read this.

My bitching about function call syntax motivated someone to actually try to solve the problem in Python using operator overloading. It was a great first step, and I've tried to take it far enough that I can actually write code in pipe syntax. Here is the original code (I love the name):
class Pype(object):
  def __init__(self, op, func):
    self.op = op
    self.func = func
  
  def __ror__(self, lhs):
    return self.op(self.func, lhs)
With this you can do:
>>> double = Pype(map, lambda x : x * 2)
>>> sum = Pype(reduce, lambda x,y: x+y)
>>> range(7) | double | sum
42
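The trick rests on Python's reflected operators: `range(7)` has no `__or__` that understands a Pype, so for `range(7) | double` Python falls back to `double.__ror__(range(7))`. The code in this post is Python 2. Porting it to Python 3 takes two changes: `reduce` now lives in `functools`, and the builtin `map` already returns a lazy iterator. The sketch below makes those changes and renames `sum` to `total` so the builtin isn't shadowed:

```python
from functools import reduce  # Python 3: reduce is no longer a builtin


class Pype(object):
    """Wraps a higher-order function `op` together with a function `func`."""

    def __init__(self, op, func):
        self.op = op
        self.func = func

    def __ror__(self, lhs):
        # Called for `lhs | self` when lhs has no __or__ that accepts a Pype.
        return self.op(self.func, lhs)


double = Pype(map, lambda x: x * 2)
total = Pype(reduce, lambda x, y: x + y)  # renamed to avoid shadowing sum()

print(range(7) | double | total)  # 42
```

Each `|` hands the previous result to the next Pype's `__ror__`, so the expression executes left to right in the order it reads.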
Cute. But there are a few more features we need to make it more broadly useful. First, we need to be able to pass extra arguments to func at a later time. Second, once arguments enter the picture, map and reduce behave quite differently (map calls func on one element at a time, reduce on an accumulator plus an element), so we make them separate subclasses. Finally, most of the time a chain doesn't end with a reduce but with some other function that consumes the entire list in some way, so we need a third subclass, Sink, for this purpose.
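import itertools

class Pype(object):
    """Base class: holds a function and extra arguments to pass to it later."""
    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

class Map(Pype):
    # Lazily apply func (plus the stored arguments) to each element of lhs.
    def __ror__(self, lhs):
        return itertools.imap(lambda x: self.func(x, *self.args, **self.kwargs), lhs)

class Reduce(Pype):
    # Fold lhs with a two-argument func; extra arguments are ignored.
    def __ror__(self, lhs):
        return reduce(self.func, lhs)

class Sink(Pype):
    # Pass the whole sequence to func in a single call.
    def __ror__(self, lhs):
        return self.func(lhs, *self.args, **self.kwargs)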
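For Python 3 readers, here is a sketch of the same four classes. `itertools.imap` is gone in Python 3, so `Map` uses a generator expression, which is just as lazy; as in the original, `Reduce` ignores any extra arguments. The usage lines at the bottom (with `pow` and `sorted`) are only illustrations:

```python
from functools import reduce


class Pype(object):
    """Base class: stores a function plus arguments to pass to it later."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs


class Map(Pype):
    def __ror__(self, lhs):
        # Lazy: func runs on each element only when the result is iterated.
        return (self.func(x, *self.args, **self.kwargs) for x in lhs)


class Reduce(Pype):
    def __ror__(self, lhs):
        # As in the original, extra args/kwargs are ignored here.
        return reduce(self.func, lhs)


class Sink(Pype):
    def __ror__(self, lhs):
        # Hand the whole (possibly lazy) sequence to func in one call.
        return self.func(lhs, *self.args, **self.kwargs)


squares = range(5) | Map(pow, 2)                            # nothing computed yet
print(squares | Sink(list))                                 # [0, 1, 4, 9, 16]
print(range(5) | Map(pow, 2) | Reduce(lambda a, b: a + b))  # 30
print(["a", "b"] | Sink(sorted, reverse=True))              # ['b', 'a']
```

The stored positional arguments go after each element (`pow(x, 2)`), and keyword arguments pass straight through (`sorted(lhs, reverse=True)`).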
If you don't speak Python that well: *args collects any extra positional arguments into a tuple and **kwargs collects any extra keyword arguments into a dict. They're like C's variable argument lists, but far more powerful and convenient. The reason for using imap instead of map will soon become clear. You might argue that the verbosity of the code above negates the point of the exercise, but it's a one-time investment, and the benefits are substantial. Even at this point the Pype classes were already useful to me. I rewrote a small function I use often in pipe syntax:
import string

pDict = Sink(dict)
pSplit = Map(string.split)

def dictFromFile(filename, reverse=False):
    """Parses a sequence of key-value pairs from a file and returns them as a dict"""
    swapPair = Map(lambda x: (x[1], x[0]) if reverse else x)
    # Iterating over a file object yields its lines.
    return open(filename) | pSplit | swapPair | pDict
(pDict and pSplit are reusable, so they live outside the function.) Once I saw how much better this looked than my version full of nested parentheses, I knew I was on to something.
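`open(filename)` works as the head of the pipe simply because iterating over a file yields its lines, so any iterable of lines will do. The Python 3 sketch below makes that explicit with a hypothetical `dictFromLines` helper. It uses `str.split`, since Python 3 no longer has a `string.split` function, and it closes the file with `with`:

```python
import io


class Pype(object):
    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs


class Map(Pype):
    def __ror__(self, lhs):
        return (self.func(x, *self.args, **self.kwargs) for x in lhs)


class Sink(Pype):
    def __ror__(self, lhs):
        return self.func(lhs, *self.args, **self.kwargs)


pDict = Sink(dict)
pSplit = Map(str.split)  # Python 3: str.split replaces string.split


def dictFromLines(lines, reverse=False):
    """Hypothetical helper: the same pipeline over any iterable of lines."""
    swapPair = Map(lambda x: (x[1], x[0]) if reverse else x)
    return lines | pSplit | swapPair | pDict


def dictFromFile(filename, reverse=False):
    # The dict is fully built before `with` closes the file.
    with open(filename) as f:
        return dictFromLines(f, reverse)


print(dictFromLines(io.StringIO("apple 1\nbanana 2\n")))  # {'apple': '1', 'banana': '2'}
```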

But to supply arguments to string.split, pSplit needs to be defined slightly differently:
pDict = Sink(dict)
pSplit = lambda *args, **kwargs: Map(string.split, *args, **kwargs)

def dictFromFile(filename, delim=None, reverse=False):
    """Parses a sequence of key-value pairs from a file and returns them as a dict"""
    swapPair = Map(lambda x: (x[1], x[0]) if reverse else x)
    return open(filename) | pSplit(delim) | swapPair | pDict
i.e., instead of a Map, pSplit is now a lambda that returns a Map, so pSplit(delim) builds a new Map with the delimiter bound. Here's some more awesome stuff we can do:
import sys

def writeFile(values, filename=None, file=None, delim="\n"):
    """Writes each value followed by delim, to a named file or to an open file object"""
    out = open(filename, "w") if filename else file
    for v in values:
        out.write("%s%s" % (v, delim))
    if filename: out.close()  # only close files we opened ourselves

pWrite = lambda *args, **kwargs: Sink(writeFile, *args, **kwargs)
pPrint = Sink(writeFile, file=sys.stdout)
pError = Sink(writeFile, file=sys.stderr)

def dictTofile(dic, filename, separator=" "):
    """writes a dictionary to a file"""
    # A Map (the bare Pype base class has no __ror__) turns each key into a line.
    keyToStr = Map(lambda k: "%s%s%s" % (k, separator, dic[k]))
    dic | keyToStr | pWrite(filename)
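Here is a Python 3 sketch of the same sinks. It differs from the code above in two assumed details: `writeFile` closes the file in a `finally` block, and the demo writes to an `io.StringIO` so nothing touches the disk. Piping a dict still works in Python 3.9+, where `dict.__or__` returns `NotImplemented` for non-dict operands and Python falls back to the Pype's `__ror__`:

```python
import io
import sys


class Pype(object):
    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs


class Map(Pype):
    def __ror__(self, lhs):
        return (self.func(x, *self.args, **self.kwargs) for x in lhs)


class Sink(Pype):
    def __ror__(self, lhs):
        return self.func(lhs, *self.args, **self.kwargs)


def writeFile(values, filename=None, file=None, delim="\n"):
    """Write each value plus delim to a named file or an already-open stream."""
    out = open(filename, "w") if filename else file
    try:
        for v in values:
            out.write("%s%s" % (v, delim))
    finally:
        if filename:
            out.close()  # only close files we opened ourselves


pWrite = lambda *args, **kwargs: Sink(writeFile, *args, **kwargs)
pPrint = Sink(writeFile, file=sys.stdout)
pError = Sink(writeFile, file=sys.stderr)


def dictTofile(dic, filename, separator=" "):
    """Writes a dictionary to a file, one 'key<separator>value' per line."""
    keyToStr = Map(lambda k: "%s%s%s" % (k, separator, dic[k]))
    dic | keyToStr | pWrite(filename)


# Usage against an in-memory stream instead of a real file:
buf = io.StringIO()
{"a": 1, "b": 2} | Map(lambda k: k.upper()) | pWrite(file=buf)
print(repr(buf.getvalue()))  # 'A\nB\n'
range(3) | pPrint            # prints 0, 1 and 2, one per line
```

Dicts keep insertion order in Python 3.7+, so the keys come out in the order they were written.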
Several points to note here:
  • The interface to writeFile is somewhat ugly, but as long as we only call it through the Sinks above, we should be fine.
  • Iterating over a dictionary yields its keys, which is very useful.
  • The whole thing is evaluated lazily: we don't start pulling keys out of dic until writeFile starts executing! One good consequence is that we use no extra storage. Using dic.keys() instead of dic would void this benefit, because in Python 2 dic.keys() builds a full list of the keys up front.
  • To output the result of a pipe you usually need pPrint instead of the builtin print, because the result is a lazy iterator (print would show the iterator object, not its values), as in
    >>> xrange(5) | pDouble | pPrint
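The laziness claim can be checked directly. In this Python 3 sketch, a hypothetical `noisyKeys` generator records every key as it is pulled. Nothing is recorded when the pipe is built, only when a sink consumes it. Printing the unconsumed pipe shows a generator object rather than values, which is why a sink such as pPrint is needed:

```python
class Pype(object):
    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs


class Map(Pype):
    def __ror__(self, lhs):
        return (self.func(x, *self.args, **self.kwargs) for x in lhs)


class Sink(Pype):
    def __ror__(self, lhs):
        return self.func(lhs, *self.args, **self.kwargs)


pulled = []  # keys in the order they were pulled


def noisyKeys(dic):
    """Hypothetical helper: yields the keys of dic, recording each one."""
    for k in dic:
        pulled.append(k)
        yield k


pDouble = Map(lambda x: 2 * x)

pending = noisyKeys({1: "a", 2: "b"}) | pDouble
print(pulled)                # [] (building the pipe pulls nothing)
print(pending)               # a generator object, not the doubled keys
print(pending | Sink(list))  # [2, 4]
print(pulled)                # [1, 2]
```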
BTW, here are pDouble and some more definitions:
import itertools
import operator

pDouble = Map(lambda x: 2*x)
pSum = Reduce(operator.add)
pFilter = lambda f: Sink(lambda x: itertools.ifilter(f, x))  # lazy, so it can be piped further
pReverse = Sink(lambda x: list(x)[::-1])
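The same definitions in Python 3, as a sketch: `itertools.ifilter` becomes the builtin `filter`, which is already lazy, and `reduce` is imported from `functools`. pFilter's Sink returns an iterator instead of consuming it, so a Sink doesn't have to end a chain:

```python
import operator
from functools import reduce


class Pype(object):
    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs


class Map(Pype):
    def __ror__(self, lhs):
        return (self.func(x, *self.args, **self.kwargs) for x in lhs)


class Reduce(Pype):
    def __ror__(self, lhs):
        return reduce(self.func, lhs)


class Sink(Pype):
    def __ror__(self, lhs):
        return self.func(lhs, *self.args, **self.kwargs)


pDouble = Map(lambda x: 2 * x)
pSum = Reduce(operator.add)
# filter() is lazy in Python 3, like itertools.ifilter in Python 2, so the
# result of this Sink can itself be piped onward.
pFilter = lambda f: Sink(lambda x: filter(f, x))
pReverse = Sink(lambda x: list(x)[::-1])

print(range(5) | pDouble | pSum)                             # 20
print(range(10) | pFilter(lambda n: n % 3 == 0) | pReverse)  # [9, 6, 3, 0]
```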
Now for the part that caused me the most headaches: composition. If you try to define
pPrintPairs = pSplit() | pPrint
it fails, because Python evaluates the pipe immediately: Map has no __or__, so pPrint.__ror__ is called with the Map object itself as its input and tries to iterate over it. This is the part where you start cursing the absence of macros. To fix this, we need one more subclass and one more operator in the base class Pype:
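class Pype(object):
    # ... __init__ as before ...

    def __or__(self, rhs):
        # Pype | Pype: evaluate nothing yet, just record the composition.
        return Chain(self, rhs)

class Chain(Pype):
    def __init__(self, lpype, rpype):
        self.lpype = lpype
        self.rpype = rpype

    def __ror__(self, lhs):
        # Data has arrived on the left: push it through both halves in order.
        return lhs | self.lpype | self.rpype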
This makes the syntax work just as we'd expect: Pype | Pype now builds a Chain instead of evaluating anything, so pSplit() | pPrint works, and the components of a chain can be grouped and combined in any way! Here's pype.py; it has a few more things than I had space to talk about here. I'm going to make a genuine attempt to write my code using Pypes whenever it makes sense. Let's see how it goes.
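Below is a self-contained Python 3 sketch of the whole scheme, Chain included. It is not the pype.py file mentioned above, which may differ, and `pList` and `pPairs` are hypothetical names for the demo. The last two prints check that grouping doesn't matter: composing first and applying later gives the same result as applying step by step:

```python
import operator
from functools import reduce


class Pype(object):
    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __or__(self, rhs):
        # Pype | Pype: evaluate nothing yet, just record the composition.
        return Chain(self, rhs)


class Chain(Pype):
    def __init__(self, lpype, rpype):
        self.lpype = lpype
        self.rpype = rpype

    def __ror__(self, lhs):
        # Data has arrived on the left: push it through both halves in order.
        return lhs | self.lpype | self.rpype


class Map(Pype):
    def __ror__(self, lhs):
        return (self.func(x, *self.args, **self.kwargs) for x in lhs)


class Reduce(Pype):
    def __ror__(self, lhs):
        return reduce(self.func, lhs)


class Sink(Pype):
    def __ror__(self, lhs):
        return self.func(lhs, *self.args, **self.kwargs)


pDouble = Map(lambda x: 2 * x)
pSum = Reduce(operator.add)
pList = Sink(list)

# Composed before any data exists; without __or__ this would fail.
pPairs = Map(str.split) | Sink(dict)
print(["a 1", "b 2"] | pPairs)    # {'a': '1', 'b': '2'}

# Grouping doesn't matter.
print(range(5) | pDouble | pSum)    # 20
print(range(5) | (pDouble | pSum))  # 20
```

Because Chain inherits `__or__`, chains of chains compose the same way, so a pipeline can be built up from named pieces and applied later.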

I've already started using IPython's shell mode instead of bash half the time, and after this it should become even easier to make Python my shell. (Side note: I can no longer imagine life before IPython.)

And once again, a big thanks to t3rmin4t0r who got me started on this!