Perl with a Lisp

Browsing around on hacker news one day, I came across a link to a paper entitled "A micro-manual for Lisp - Not the whole truth" by John McCarthy, the self-styled discoverer of Lisp. One commentor stated that they have been using this paper for awhile as a code kata, implementing it several times, each in a different language, in order to better learn that language. The other day I was pretty bored and decided that maybe doing that too would be a good way to learn something and aleviate said boredom. My first implementation is in perl, mostly because I don't want to have to learn a new language and lisp at the same time. The basic start is after the jump.

Building a lisp seems to center around two key decisions. First, how do you represent your core data structure? A two-element array? A struct? Something a little more complex? Second, what are your scoping rules. Lexical? Dynamic? Global? After that, everything else is gold plating. Substrate-langauge interop, how you represent scopes, how to get closures right, macros, etc, all can be determined later.

I've chosen to write this first implementation in perl. I know perl pretty well but more importantly I don't know lisp very well at all. I've done a little elisp hacking, but not much. I certainly don't know how all of the pieces fit together quite yet. This first post is really more about getting the fundamental data structure and list-manipulation routines and the reader down. Later posts will elaborate on eval and friends, as well as closures, scoping, and perl interop.

Data Structure

Lisp represents most things fundamentally in terms of what's known as a cons cell. This is some sort of object that has two slots for other objects, be they primitives or other cons cells. Being a good little modern perl programmer, I've chosen to implement this as a small Moose-based class:

package Cell;

use Moose;

use overload
    'bool' => sub { return !shift->is_nil() },
    'fallback' => 1
;

has 'car'    => (is => 'rw');
has 'cdr'    => (is => 'rw');
has 'is_nil' => (is => 'ro', default => 0);

1;

Using Moose, we define an object with two read-write slots named car and cdr. This is due entirely to historical precident: car is the first element in the pair, cdr is the second. is_nil is there to allow us to define a fixed nil value later on. The overload allows us to use a Cell in a boolean context. Anything that doesn't have is_nil set is true;

Fundamental Functions

Now that we've got the data structure done, let's define a few fundamental functions to work with it.

our $NIL = Cell->new(is_nil => 1);
sub nil
{
    return $NIL;
}

our $T = "t";
sub t
{
    return $T;
}

sub equal
{
    my ($a, $b) = @_;
    return t if $a eq $b;
    return nil;
}

Notice how $NIL is just hanging out there. It's the only Cell that will ever have _is_nil set. We return the reference to the singleton from the nil function. t is the opposite. We just return the atom t. equal exploits perl's built-in comparison operator eq to compare two things.

Now, the good stuff. List manipulation:

sub cons
{
    my ($thing, $list) = @_;
    return Cell->new(car => $thing, cdr => $list);
}

sub list
{
    reduce { cons($b, $a) } (nil, reverse @_);
}

sub car
{
    my $thing = shift;
    confess "Argument to car must be a list"
        unless ref($thing) && ref($thing) eq 'Cell';
    return defined($thing->car()) ? $thing->car() : nil;
}

sub cdr
{
    my $thing = shift;
    confess "Argument to cdr must be a list"
        unless ref($thing) && ref($thing) eq 'Cell';
    return defined($thing->cdr()) ? $thing->cdr() : nil;
}

cons creates new Cells, setting their car and cdr as appropriate. The list function is a pure convenience thing to make setting up singly-linked lists easy. car and cdr do a small amount of error checking and call out to the given Cell's car() and cdr() methods.

Functions also defines some functions that will be used later, as well as some things that can walk lists and trees made from cons cells and do something with them. It implements a list_string function which will be imported as the (print) function, once we have symbol tables and function importing defined.

There are a bunch of tests for these functions in 01_list_manipulation.t.

Reader

Lisp's parser is referred to as the reader. Generally you interact with it using the (read) function, which pulls off of the input stream and returns the next parsed form as an AST. This reader consists of a hand-rolled recursive descent parser in Read.pm that implements these constraints:

  • Numbers consist only of numeric charcters and decimal points.
  • Symbols start with [a-zA-Z] and can contain anything within that range, as well as numbers, the ':' character, underscores, and dashes.
  • String literals start and end with the '"' character. Escaping is not implemented yet.
  • Lists start with '(', end with ')', and contain one or more whitespace-delimited things.
  • Whitespace is skipped.

This most basic of readers is only 113 lines of perl, but it can parse a string of characters that look like lisp and turn it into a tree of cons cells, ready to be evaluated. Tests and examples can be found in 02_read.t.

Well, that's all for now. It's a good start, but doesn't really deal with any of the interesting bits yet. Next up: (eval).

Posted in: Software  

Tagged: Perl Programming