Skip to content

joinr/clclojure

Repository files navigation

CLClojure: An Experiment Port of Clojure to Common Lisp

Background

Porting Clojure seems to be the thing to do these days. After the clojurescript compiler came out, people had a really good blueprint for hosting clojure in other languages. The fact that the clojurescript compiler is defined, largely, in terms of protocols, makes it a bit closer to the “clojure in clojure” goal. As a result, you should be able to bootstrap the language (as they did with JavaScript) with a minimum of primitives. Most of the language is defined in Clojure, so voila.

We currently have Clojure ports targetting Scheme, C, C-via-Scheme, Python, etc. I’ve found Common Lisp to be a particularly slick environment with some cool tools and a fairly active community. I think it would be interesting to port the ideas from Clojure to Common Lisp, ultimately implementing a compliant Clojure in Common Lisp.

Layout

The current WIP is broken into several Common Lisp files (only tested on SBCL). You can look at the dependencies in clclojure.asd. The general layout and organization follows (subject to change):

common-utils

Contains a plethora of utility functions and convenient clojure analogues (or direct ports) useful for porting clojure source. Primary features include multi-arity lambdas via lambda* and multi-arity functions via defun*. Also provides a with-recur form that allows for a similar idiom for Clojure’s loop/recur. This is typically pulled in by all other packages for general utility. Some limited sequence functionality overlaps with sequences.lisp, and may be replaced in the future.

sequences

Provides an implementation of clojure’s lazy sequence abstraction. Implemented in CLOS, and porting the seq protocols. Provides a subset of the clojure core sequence library for CL used in bootstrapping and metaprogramming. The intent is not to have a full port, but a minimally useful subset. Integrates the seq abstraction with CLOS sequence types.

pvector

A port of the java implementation of clojure’s persistent vector based on Hash Array Mapped Tries (HAMTs). Serves as a stand-alone datastructure with local functions. Also defines subvec which is the O(1) slice of a persistent vector.

cowmap

A simple persistent map implemented by copy-on-write operations over a standard hash table. This is a minimal implementation to support bootstrapping, and not intended for performance or production use.

pmap (wip)

A port of the java implementation of clojure’s persistent hashmap. Incomplete.

protocols

Defines forms for defprotocol and deftype in common lisp. Protocols are implemented as bundles of generic functions and args, stored in a CLOS struct. Protocols are registered, and define satisfies? predicates for the protocol to ensure valid implementations of reify deftype extend-type and extend-protocol. Protocol definitions expect vector args (rewrite pending to extend these to sequences to allow simple cl-based protocol definitions).

reader

Provides a consolidated place for reader macro support. Primary purpose is to enable vectors, maps, and sets for bootstrapping. Original experiments also supported characters. Intentionally hijacks the read-table. Needs to be ported to named-readtables, or excluded entirely once bootstrapping of clojure.tools.reader is done.

eval

Due to the custom evaluation rules for clojure’s data literals, simple reader macros are not enough to get literal support. This package defines a generic function, custom-eval, and an API for installing/uninstalling custom evaluation support in SBCL. The implementation swaps the function definition for SB-IMPL::simple-eval-in-lexenv and adds a final condition to the non-list eval rules: if custom-eval is supported, then the generic function is invoked. This allows user-defined extension of the evaluation semantics in a minimal way, and retains low-level support (e.g. from the compiler).

literals

Consolidated place to extend custom evaluation rules for vectors, maps, and other literals.

keywordfunc

Initial hack at implementing keywords-as-functions support in CL. This currently works only in lexical forms, and is not a global modification. Exposes the macro with-key-fn which will scan the body for keywords in function position, and build a table of keyword functions and fbind. The more commonly used feature is the keyaccess type, which is used in the lexical package.

lexical

Implements a lexical environment ala let* called unified-let* that will unify lisp-2 bindings into a lisp-1 lexical environment for the body. This is the foundational method for implementing clojure-style let.

bootstrap

Implements a core set of clojure special forms and brings along the protocol definitions from cljs.core. Eventually, the entire cljs.core library will be pulled in here and evaluated. This should hopefully provide a sufficiently robust environment to then bootstrap the rest of clojure via tools.reader and tools.analyzer, leveraging extant CL implementations from the earlier bootstrapping process where it makes sense.

symbols (wip)

Nascent thought piece on implement clojure-specific vars and namespaces. Clojure’s qualified symbols and keywords are a tad different and require some thought towards integration with CL.

Goals

Bridge the gap between the cool stuff in clojure and Common Lisp.

Implement really useful bits of clojure in portable common lisp, and provide them as stand-alone libraries.

This includes lazy sequences, the generic sequence abstraction, and the fundamental persistent hash-array-mapped-trie data structures in clojure:

  • persistent vectors
  • persistent maps
  • persistent sets.

Extend the generic sequence abstraction and other idioms to Common Lisp’s built-in mutable structures.

Common Lisp already has a sequence library, but I think Clojure’s is more general and can be trivially extended to new types. Common Lisp’s irrational locking down of SEQUENCE is a hurdle here. The HYPERSPEC will never be updated in my lifetime :) So generic functions are the current way to bridge this stuff.

  • Protocols are really nice, as are Clojure’s arbitrary dispatch multimethods.
  • Data literals are also highly useful. I think Clojure nailed the choice of literals, so providing reader macros for these guys would be very nice.

Possibly wrap a Common Lisp STM implementation, or cheat and use something like lparallel or just delegate to clojure.core.async (like clojurescript).

Bootstrap a fully functional Clojure onto Common Lisp.

Learn.

So far, this project continues to be a great way to learn about both CL and Clojure, to include implementation hurdles, support for PLT stuff, long-standing warts from decisions made ~25 years or more ago, and work-arounds. Nothing is insurmountable so far, which is pretty cool.

Status

Began work porting Clojure’s persistent data structures from Java about years ago, while

simultaneously learning Common Lisp (and by necessity Java :/ ).

  • Got a working persistent vector implementation, with compatible clojure literals about a year ago.
  • Started working on persistent hash maps around November 2012.
  • Built a temporary implementation of clojure-style protocols in Common Lisp ~ Dec 2012.
  • Pulled the bits into an actual ASDF system and pushed everything to the Github August 2013.
  • Implemented a baby clojure evaluator that __should__ bridge the lisp1/lisp2 gap between clojure and the Common Lisp host. Unsure if this is going to work out in the long term, but eh.
  • It’s real trivial at the moment.
  • Working on library code in my spare time, still need to port the persistent map.

Revisited 2017

  • implemented some rudimentary stuff
  • vector reader macros not fleshed out; work fine at the REPL, but failed to return a vector (instead returning a form to create the vector). Trivial but important oversight.
  • Still hijacking the global read-table. Pulled in named-readtables to help, but haven’t integrated.
  • Working on variadic functions, explored funcallable classes, refined protocols, deftype.
  • cleaned up the build process (more work to be done here)

Revisiting 2018

  • got reader macros matured (vector literals produced properly now),
  • got protocol definitions and implementations respecting vectors, albeit in a hacky way. We still allow lists for args….
  • working on deftype, then getting the bootstrap from CLJS (core protocols and functions) to compile.
  • Working on leveraging named-readtables for better delineation of clojure source, to include file-level (somewhat racket like).
  • Also intend to leverage named-readtable to get case-senstive reader (via :modern), and to enable CL interop trivially with a macro allows CL reader inside body (vice versa, for inline clojure if it makes sense).
  • refining the approach after reading a lot, looking at some other sources of inspiration (including early proto-clojure->java compiler from Hickey in CL)
  • Basic def, fn works. Protocols work. Metadata works mostly. Deftype
  • got let, albeit without destructuring. Still useful for bootstrapping!
  • Initial implementation of reify working, wrapped deftype in a version compatible with cl:deftype

Revisiting 2019

Reconciled some problems with data literals, clojure’s evaluation model, and CL’s model.

The issue is tricky and - even after using Clojure for 8+ years - something I completely missed.

So folks laud CL having awesome reader macros, and talking about how they allow you to implement any language etc. Banking on these comments, I went about doing that (Clojure). As it turns out….IF you want to define a language with new data literals , and those literals have their OWN evaluation semantics, reader macros cannot help you (they help to a point).

  • I think racket probably better supports this based on research, but I have no proof.

When we read a vector in clojure (not a syntax-quoted one), we actually get a literal vector data structure (not a a list of `(vector arg1 arg2 arg3), but an actual vector of [(eval arg1) (eval arg2) …]).

I chased my tail on this twice now, and finally realized that in clojure, (eval [x y]) -> [(eval ‘x) (eval ‘y)].

So when you read a vector, and eval it, you actually have to apply those evaluation semantics to build up the resulting vector. In CL, eval just passes the vector through.

This poses a problem in CL, since yes, you can trivially extend the reader to handle vectors, but you CANNOT get the evaluation semantics to line up, since CL provides no mechanism to do so (unlike say reader macros). You can of course define your whole infrastructure, including your own eval, etc., but I’m interested in minimally bootstrapping a functional enough clojure to get the language tooling ported, so hacking on running lisp implementation of `eval` is attractive.

I tried having the reader macro return `(vector ,@args) instead of building the literal vector directly ala (eval `(apply #'vector ,@args)) , which worked (mostly!), except that for things like macros, instead of dealing with a vector for say bindings, you’d now be dealing with the syntax-quoted list of (vector arg1 arg2 ....).

I though about hacking all the macros to recognize this, and coerce the vector sexpr into an actual vector, but it’s like squeezing air in a balloon.

Rather, simply extending eval sb-impl:simple-eval-in-lexenv to understand that there are other data literals (as expressed by say a generic function) that have their own evaluation semantics as forms, allows one to get up to par with using clojure literals e.g. in macros.

It should be minor surgery (and actually portable), but since the goal is to bootstrap something that can read the source for clojure.tool.reader and clojure.tools.analyzer, etc. and emit a pure common-lisp implementation, I “think” it’d be a one-time deal.

  • Currently implemented in eval.lisp, which provides clclojure.eval and an API for allowing custom evaluation semantics defined by the generic function custom-eval.
  • Also implemented a companion package, clclojure.literals, which enables custom evaluation, and extends the semantics for the boostrapped persistent structures.

Quasiquoting Data Literals

So we have evaluation semantics consistent with Clojure for vectors, maps (and trivially sets when they arrive).

The backtick implementation in sbcl’s backq.lisp is currently befuddling due to lack of familiarity. Based on the existing implementation in readers.lisp, I get unquote working fine with user-defined literals like vectors and maps, but unquote-splicing is throwing. Trying to hack around it from the reader level, e.g. coerce to a normal quasiquoted sexpr then eval that, rather than hacking backq.lisp as I did clclojure.eval. Lack of familiarity is stunting progress a bit.

I had to dig into the innards of their eval and backquote implementations to jank some stuff together, but it works. It’s life Jim, but not as we know it…. I literally got done hacking the quasiquote stuff in reader.lisp, clclojure.reader:quasify to make things work, and had no idea how I got it to work since my previous intuitions had failed. SBCL uses a “comma” struct to indicate splicing operands inside a quasiquote, and there’s some weird interplay with macroexpand. So I kludged stuff together through trial and error, to build an arglist to shove at my vector and map constructor.

The other hack is wiring in a generic method into sbcl’s default eval implementation,

This admits defining custom evaluation rules for new data literals, like vectors and maps.

Surprisingly, SBCL has some well documented/written source. I learned a bit plumbing through it, although their compiler stuff (and IR-1 intermediate language) is still magic to me. Thankfully, it appears to hacking the simple-eval-in-lexenv function is enough to wire into the compiler (I’m guessing the compiler form leverages it downstream, but I’m not sophisticated enough yet to see how it all weaves together).

Once the CL native reader and analyzer are in place, then the normal eval semantics would take hold (unless there’s a desire to maintain the bootstrapping methodology and allow clojure literals like persistent vectors, maps, and sets to have eval semantics for interop or something, I dunno).

Refactoring

There are a lot of “learning” debris throughout the code base that are not on the critical path for bootstrapping (or no longer are). These need to be pruned and moved elsewhere so that a consistent, clear path of required code is attentuated.

Sequences and Hacks

So I decided (for ease of metaprogramming my bootstrap), to clean up the lame sequence implementation I had from years ago and write a proper one. This went very well.

I even got up to the point where I’d implemented idiomatic lazy sequences, integrated with CLOS, and started getting reduce working. Then I ran into a snag:

I wanted to use CL’s built-in reduce for its sequence types, since it was already efficient. Clojure has the notion of a simple protocol, InternalReduce, which basically says “hey, reduce yourself!” or else it falls back to (likely slower) seq-based first/rest reduction. So this is desirable (plus nice interop points).

I got everything working, but one thing was wrong: Clojure implements a short-circuiting variant of reduce, via invoking (reduced the-result). This creates a little wrapped type that can be quickly checked during the reduction loop to see if it’s a reduced item, which signals to stop reducing. The reduced item is then unpacked (via deref) and returned as the result.

Problem: the CL folks didn’t think of this decades ago. So how to bootstrap this functionality on extant concrete?

I almost gave up, then remembered vaunted warnings from Java land about idiots using Exceptions for control flow. This built on the idea that I could just throw an error if I detected a reduced value, and handle the reduction gracefully outside of the built-in cl:reduce.

My ugly hack (although using the elegant tools of signals and handler-case) emerged:

We just leverage the excellent low-level construct block to define an escapable context we can return-from. From inside this block, we wrap the call to reduce with a lambda that - upon detecting a reduced item - invokes return-from with the dereferenced value of the reduced item.

As another fun time, I also ran into problems with CLOS and multiple dispatch that were a little unsavory. Had to learn about call-next-method. Realized that CLOS doesn’t have a method preference hierarchy by default (clojure does), and that these kinds of problems create interesting lisp OOP solutions…Also managed to crash SBCL due to it several times (I was trying to leverage SBCL’s extensible sequences, where unlike other lisps, you can inherit from the sequence base class, which I wanted my lazyseq classes to do). This ended up failing since going the generic method route, I wrote a specializer on ‘sequence, which overode the other specializer on ‘lazyseq, since ‘lazyseq inherited from ‘sequence. Despite getting some of it worked out, I still managed to crash SBCL several times (stochastic…), and reverted to NOT having ‘lazyseq extend ‘sequence for now.

Loop / Recur (currently with-recur)

Basis of loop/recur. Combined with lambda* and defun*, this will allow clojure style recur forms in function bodies (with a trivial extension to loop / recur). I’m not checking for tail position yet….probably need to do that to ensure correctness. Might have to code walk.

This took many tries (dissecting dolist, some simple loop forms), as well as trying and failing to implement a (more elegant) version with macrolet and another that used a simple labels form for recur (labels blew the stack…. think tco only happens there if compiled).

CL has some nice lower level flow control primitives like block and tagbody.

  • “BTW, does your code work on non-tail `recur`s? (Is there even such a thing? idk.) Example:”
(with-recur (x 2)
   (if (< x 4)
       (progn
         (recur (1+ x))
         (format t "back from recur~%"))
       x)) 

I think, based on the semantics of “go”, it will, since it’s effectively a goto that jumps to the tag in the tagbody.

However, I’m currently working on tailcall detection to enforce clojure’s semantics.

The new version is now like clojure’s

(loop [x 2] 
  (if (< x 10)
     (recur (inc x)) 
     x))

with a sequence of bindings denoting [arg1 init, arg2, init] ….

(with-recur (x 2) 
  (if (< x 10) 
      (recur (1+ x))
      x))

Started off optimizing to see if I could avoid emitting with-recur for functions that don’t invoke it, so code-walk and look for recur forms.. Then realized I needed to detect tail calls…so started going down that rabbit hole..

Curious to see if there’s a simple way to walk the code instead of denoting the grammar of tail calls. Like something empirical to mechanically demonstrate the evaluation is independent of further calls. Say, build a DAG representing the call graph, demonstrating that recur only ever occurs at leaves.

My cro-mag approach it to just define special cases for each form and check that way. Tricky stuff…

Tail Call Detection / Enforcement

Defined some rules (currently for CL, but clclojure should work on top of these with little to no changes) that follow the tail call semantics for special forms.

In common-utils, these functions inform with-recur about the viability of the code at compiletime and throw errors if invalid tail calls are detected:

  • detect-recur Simple naive code walker that walks an expression looking for (recur ...)
  • tail-children Defines the (currently static) semantics of different special forms detailing whether they have children and what position (:tail or :non-tail). Returns a list of pairs of (:tail|:non-tail child-expr)
  • categorize-tails Walks an expression, accumulating a list of call-site structs, which encode what kind of call was made (:recur, :illegal-recur) and what the expr was for use in reporting errors.
  • summary-tails Reduces the output of categorize-tails into a pair of (t|nil illegals), where illegals is a list of illegal call-sits, and t|nil reports whether “any” instances of recur calls were detected.

with-recur then leverages summary-tails to determine if there are errors in the input at compile time, and signals an illegal tail call if so. If there aren’t errors, but there are not instances of recur calls in the expression, with-recur just optimizes down to a semantically-equivalent let*, so it can be invoked regularly with no added codegen (e.g. in lambda* or other recur targets) if recur never shows up.

Otherwise, a form with a local function recur is emitted and things perform as expected, with proper tail calls enforced.

Symbol Interning and Mucking With MetaProgramming

Discovered, through use of case to match on symbols for dissecting lists during code walking, that my functions wouldn’t work outside of the package they were originally defined it. Subsequently, macros depending on said functions, (with-recur being the foremost example) failed to behave as expected. It turns out, this has to do with how CL interns apparently unqualified symbols in a package.

Say I implement a function to examine the first element of a list and dispatch depending on the symbol detected there:

(defun detect-foo-bar (expr) 
  (case (first expr) 
    (foo :foo)
    (bar :bar)
    (t nil)))

This function will work great so long as I’m inside of the package where I defined it. The case test for each clause uses common-lisp:eql, so the notion of symbol equality applies to the args. The side-effect introduced here is that the symbols ‘foo, ‘bar, are actually….’current-package:foo and ‘current-package:bar rather than being free-floating, unqualified symbols. If we’re at the repl, they even print as such, without any indication that they’re qualified. This is standard behavior.

The tricky part is that if we then export detect-foo-bar and try to use it from other-package,

CL-USER> (original-package:detect-foo-bar '(foo 2 3 4)) 
NIL

we get no joy, despite the input expression “obviously” having the symbol ‘foo in it. To compound matters, the original function we dutifully tested at the REPL works fine in original-package.

The reason is that ‘foo in original-package where we defined the function, and where we macroexpanded case to implement it, is actually ’original-package::foo , and the symbol we’re trying to match against coming from the list read and evaluated in CL-USER is actually ’CL-USER::FOO:

CL-USER> (symbol-package 'foo)
#<PACKAGE "COMMON-LISP-USER">

Under these conditions, we realize the symbols are not in the same package, thus not eql:

CL-USER> (eql 'foo 'original-package:foo)
NIL

Clojure doesn’t have this problem due to its semantics for unqualified symbols across namespaces:

Clojure 1.10.1
user=> (def foo 'foo)
#'user/foo
user=> (ns blah)
nil
blah=> (def foo 'foo)
#'blah/foo
blah=> (= foo user/foo)
true

So CL’s behavior was a bit of a surprise, particularly in how it seemingly janks up defining utility functions that work on symbols and dissect lists, e.g. for metaprogramming.

Implementing (and then testing from another package) with-recur and its auxillary functions was the first notable time this popped up. The fact that it can fail silently (in the case of case ) engenders additional caution (absent a better solution that I’m unaware of).

So, my current solution is to define a new equality test based on unqualified symbols, and their symbol-name equality. In common-utils:

  • symbol? determines if the input is strictly a symbol, and not keyword
  • symbol= determines if both inputs are symbol?, whether the symbol-name s are string=
  • seql wraps the typical common-lisp:eql with a custom path for inputs that are symbol?
  • custom-case Allows defining common-lisp:case macros with a custom user-defined test.
  • scase Implements case using seql vis custom-case and is a drop-in replacement for

existing uses of case that relied unintentionally on interned symbols for case clauses.

With these in place, and extant code for with-recur and friends ported to use scase, we now have a working implementation of clojure-style loop/recur.

Variadic Protocols

We now appear to have functioning variadic protocols, as well as marker protocols (protocols with no methods).

These work via a simple dispatch mechanism, via common-utils:lambda*.

The generic function is somewhat opaque in the args, only taking (obj &rest args) but the dispatch mechansim conforms to the protocol definition, allowing multiple dispatches per clojure’s definition.

Behind the scenes, we have a simple lambda* that’s created for the implementation during defmethod time, which actually carries out the implementation.

It is invoked inside the body of the defmethod specializer, and simplied applid to the obj and args, allowing the lambda* dispatch mechanism to take over.

This is good for bootstrapping, but the lack of integration with slime and e.g. arg detection (ala CIDER) is not desirable long-term.

For dev purposes, it’s meaningless though since we’re bootstrapping pretty simply. I don’t plan to hack a custom slime mode to enable documenting multiple- arity functions just for the boostrap.

This opens most of the remaining doors for our basic boostrap. We can now leverage most of cljs.core to get the basic libraries lifted in with little effort (given the current state of reader, seqs, etc.).

It should also be possible to work backwards from the protocol implementations to get us into an even simpler bootstrapping environment. For the time being, we’ll continue wrapping existing CL functionality though.

Generic Symbol Equality (#’seql)

One nagging feature that probably needs to be addressed is the necessity of using common-utils:seql for symbol equality testing due to CL’s somewhat baroque equality and the impediment it causes toward simple metaprogramming.

This is prolific: even hash tables aren’t trivially extended with a custom equality test out of the box, although that’s not a vast hurdle.

Lexical Literals

So, everything worked fine when evaluating literals from the null lexical environment based on the custom-eval semantics in clclojure.eval:custom-eval and related wiring into sbcl’s eval.

This did not carry over into expressions with lexically bound forms that were also data literals. Results kept returning wierd quoted forms for values rather than the actual evaluated forms.

It ended up being a long and tricky process to tease out the underlying cause, and to diagnose a quickish fix.

As it turns out, CL is pretty canny about how it handles lexical scope. To satisfy the compiler, we’d like to pass off friendly, common forms that SBCL expects. Unfortunately, if literals show up in various places where lexical scope is defined (such as the right-hand side bindings for let*, which is emitted by clclojure.base:let), we end up with problems.

We also end up with problems if the literal is the result (or any piece of) the body of expressions that extend lexical scope.

We could theoretically fix all this at a somewhat high level, with a code-walker that recognizes our clj forms and does the requisite transforms at macroexpansion time. This is cool, but not necessarily what we want for helpful user feedback, e.g. macroexpanding a form expecting to see clojure literals.

Thankfully, we have access to the evaluator, and can continue to slightly tweak it to suit our needs. Namely, where lexical forms are defined, e.g. common-lisp:let* and common-lisp:let, we can add in a mechanical transform to the expression (not unlike manual macro) that walks the expression and expands our literals into literal-constructor forms that are readily recognized by sbcl’s evaluation machinery (and the compiler).

So, the technique is to define another generic function that user’s can express lexical expressions for, clclojure.eval:let-expr. The semantics would be:

(def y 3)

(let [x [2 y]] [x]) => [[2 3]]

In CL

(let* ((x [2 y])) 
   [x])

is compiled to

(let* ((x (persistent-vector 2 y)))
  (persistent-vector x))

Importantly, we don’t want to muck with the literals at macroexpansion time (e.g. for debugging an visualizing the code), but we do want to intercept them prior to going to evaluation. This sounds very much like a possible compiler macro (or other eval-when context), and may be trivially refactored into one.

Absent that, we define a function clclojure.eval:custom-eval-bindings, which acts as an intercessor, and takes an expression to walk, and a lexical environment to pass along in case we ever get to use it (this conforms to the function signature for sbcl’s sb-impl::%simple-eval.

We detect various types of expressions that could define literals in the lexical scope, and walk the expression based on the form’s semantics, recursively, replacing any literals with the appropriate constructor- based canonically evaluable bindings - iff the value supports clclojure.eval:let-expr?.

This gets us 90% of the way to happy, but there still exists a problem with (fn ) forms, since they actually build up lambdas.

Funny enough, CL doesn’t like to just emit (lambda (x) …) from a defmacro expression.

Neigh, if we quasiquote it, we get #’(lambda (x) …) which is effectively (function (lambda (x) …))

Since we’re emitting these forms, they end up being opaque when we look at custom-eval-bindings. So, we have to process them at macroexpansion time , e.g. in the (fn …) macro. We do so by macroexpanding the body of the lambda, and similarly walking the form with custom-eval-bindings. Then, we emit the ultimately sharp-quoted lambda with the new body. This works well, with the small tradeoff that fully macroexpanding a (fn ..) macro will expand into the more verbose form without any literals.

Multimethods (WIP)

We still need to implement multimethods but that’ll be a simple variation on the dispatch stuff. These show up in both the reader and analyzer, so they’re next.

Loose protocol implementation

Right now, defprotocol expects implementations for all function arities. I think Clojure allows partial implementation, or generates stubs via abstract methods. We need to implement this relaxation, e.g. for IFn, so we can get boostrapping advanced.

Revisiting Lexical Literals

So the original implementation for this fixed one thing only to break another. In an attempt to comport with the SBCL compiler’s pre-baked notion (read ignorance) of the evaluation rules for non-list things, I went down the route of defining a custom code walker to intercept calls to let and friends. This interception would walk the code, and invoke generic functions to coerce custom literals (like vectors) to eval-friendly forms (read, a constructor and args) so I could avoid hacking the compiler.

This worked great and fixed a major problem, where lexical literals were effectively just uneval’d and quoted directly. Now, at eval time, we get custom literals working as in Clojure.

Downside

I broke the expectation that we would be using “actual” persistent vectors as syntactic objects in macros (namely defprotocol, fn, defn). This primarily manifested as an ugly regression in extend-protocol, where CL would complain that we’re trying to apply a generic function from the pvector namespace to something non-pvector.

The short of it is that, depending on the order of eval and macroexpansion, and if we have a (let …) binding anywhere, we stand a good chance of coercing our actual [vector] into a (persistent-vector vector) for eval sake, thus breaking the expectation that literals will be preserved both for eval and code.

Fix?

To get this working, I went down a couple of paths, including looking at hacking into the compiler. Then I realized, if I can just tag the transformed literals, I can identify them and invert the process inside of macros. You’d end up with a ping-ponging of the custom eval bindings walker munging data literals into evalable forms, then macros - prior to expansion - traversing their args and coercing the forms back into actual vectors, maps, and sets, etc. Design wise it’s ugly, but performance wise should be neglible….

So the eventual solution involves using the code walker from SBCL and a couple of generic functions. We now - by convention - mark our literals with the identity function literal, which just wraps the aforementioned constructor.

We then define a function recover-literals that can walk a form and detect invocations of literal, replacing them with the result of calling symbolic on the subform, which evals the arg for literal and gives us back our custom literal.

recover-literals serves as an auxillary function used in the macro-defining-macro defmacro/literal-walker, which will quietly bind the args for the macro definition to a lexical binding where the args (inputs) are bound to their recover-literals results. This basically ensures that, when using defmacro/literal-walker, we always have access to data literals. Simultaneously, our literals now also work seamlessly in eval with custom evaluation semantics, including in lexical bindings.

So far, all tests in examples are passing.

Revisiting Quasiquoting

The method I hacked together for quasiquoting does not work as intended with lexical values in quoted literals. I did some naive variant that’s bone-headed in hindsight, where I tried to eval stuff at read-time to build an actual non-sexp literal datastructure. Turns out, this is not what Clojure does. Clojure does depart from CL’s quasiquotation (via syntax-quote), in that for quoted literals, it emits the actual sexpr construction of the literal. This is similar to my solution for working around lexically scoped literals. The only challenge in SBCL is that the ` reader macro defaults to emitting a quasiquote form. So if we leverage the built-in stuff, we get the quoted form of the sexpr vector constructors expanded from the original quasiquoted literal.

The solution is to hack the ` macro and rebind it to something that’s smart enough to detect that we’re actually quoting a literal. From there, we should just emit the expanded quasiquote result without the quasiquote form preceding it. Or otherwise, expand the quoted result.

Initial testing with persistent vectors is promising…almost there.

Nested Quasiquoting mostly works

Signficant overhaul to the old (wrong) qq infrastructure. Eliminated all readtime eval stuff, we now emit sexprs consistent with Clojure’s implementation (slightly different in form though).

We now have a cleaner quasiquotation model that’s extensible as well.

Generalized usage of sequence lib

Ensured the sequences lib is portable enough to be used, sicne it shows up quite a bit in clojure.core bootstrapping (e.g. macrology). Need to define additional core functions that are seq friendly or coercing.

Destructuring

Destructuring is pretty fundamental and shows up in a lot of places. So common forms like loop, for, let, fn, defmacro and friends all leverage it. We should be able to fairly straightforwardly port it from clojure.core.

Revisited Loop/Recur and variadic version

During porting of more core library functions over, I exposed bugs in the extant labels/macrolet based version of function implementation.

Did some experimenting and switched to the earlier loop-focused with-recur macro. Did some more testing and experimentation, and switch over to inlining arity bodies for multiple-arity function definitions. So named functions work, recur works, loop (without destructuring) is implemented (over loop*).

Our functions now work fine with lazy-seq where before they were creating infinite loops and later not recognizing function symbols for self recur.

ex-info

En route to destructure, I implemented exception handling via ex-info and friends. try-catch-finally is a macro away.

TCO detecting false positives

Need to revisit the tail cail detection code that compiles with-recur invocations, since we seem to be mistaking simple with-let and if-let forms for non-tail calls for some reason. For now, I’ve got warnings rendering, where in reality we want an error to signal (prior behavior).

Rudimentary Parity

We’re approaching basic functionality where I can just copy/paste source from clojure.core and have it work without issue. It turns out, this is a great way to explore the range of language features from the ground up and implement as needed. It might be possible to demo some advent of code solutions in clojure in common lisp before too long…

try

Defined try-catch-finally semantics from clojure.core/try in common-utils:try, which clclojure.base exports. Trivial to implement using handler-case and restart-case.

~ and ~@

Realized that sb-impl::comma-charmacro handles this and splicing (@ form), so all you have to do is add another reader macro binding for ~ using the same as =,=.

cond

Implemented clojure’s cleaner version of cond. Found a subtle error regarding the definition and use of cons. Since we shadow common-lisp:cons, we ended up with an infinite pingpong of the protocol extension of -conj for cons types. Fix was to ensure we use common-lisp:cons for all necessary non-seq cons cell methods.

Hurdles

A couple of big hurdles:

Lisp1 vs Lisp2.

  • rely on macros (like def) to unify symbol and function namespaces, leveraging CL’s environment.
    • This is mostly implemented via unified-let* in lexical.lisp, which is then used in clclojure.base:defn and elsewhere.
    • The only thing missing is arbitrary support for keyword access based on function application, which I have working prototype of, but it’s not valid outside of clojure macros.
      • Better solution is to address this as a compiler pass in the analyzer.
  • The longer route would be writing a custom eval, compiler, etc. Doesn’t look necessary at the moment.

Lexical Scope vs. Top Level

So, Common Lisp as a lisp-2 has multiple namespaces, foremost of which are function and symbol. We already have the top-level (that is, null lexical environment) symbol and function namespaces unified by using setf for symbol-function to make it identical to symbol value….but…..

  • Lexical environments don’t work that way!
    • symbol-function and symbol-value only work on non-lexical symbols.
    • Initial hack was to unify the namespaces by traversing the vars in the let bindings and unifying prior to continued binding.
      • Had a false-positive success since the symbol modifications were actually pointing to non-lexical scoped symbols (stuff from prior REPL interactions).
  • The fix for this is to use a combination of let* and labels, which CAN unify the symbol-value and symbol-function namespaces for lexical vars..
    • Defined a macro, unified-let*, that does this for us:
      • We parse the bindings, detecting if any symbol points to a literal keyword.
      • We ensure keyaccess objects are compiled during macroexpansion, which creates the plumbing for keyword access if it didn’t already exist
        • and we create function definitions for the vars that point to keywords, where the function bindings invoke the funcallable keyword accessor directly.
      • We need to apply this as an implicit body for fn forms as well..

Persistent structures.

  • If we get defprotocol, deftype, etc. up and running, these are already defined in CLJS core.
  • For bootstrapping, using original port of Persistent Vector from JVM clojure, also creating a dumb wrapper on top.
    • Need to add meta field to struct, also atomic locks at some point (unless cljs provides this….)

Protocols.

  • Already implemented as generic functions.
  • Explore alternative implementations if generic functions aren’t speedy enough (doubtful, but haven’t profiled anything).
  • protocol definitions need to be namespace/package qualified.
    • Looks like they are already, at the package level.
  • Need better error messaging on missing protocols
    • TODO: get-protocol should signal.
    • Namespace qualified symbols needed.
      • Currently there’s a potential for collisions, since the protocol registry (a hashtable) doesn’t assoc the protocol name with the package name it was defined it.
      • We can approximate qualified symbols using the dot syntax, even though CL won’t resolve those symbols to a clojure namespace directly.
  • Get variadic protocol implementation verified.

Deftype.

  • shadows CL deftype.
  • current implementation follows defprotcol, appears to work for non-varidiac protocol impls.
  • Look at walking method impl bodies to eliminated unnecesary slot bindings (currently we generally bind all non-shadowed slots).

Multimethods.

  • Initial ideas for multiple dispatch following clojure’s implementation.

Metadata

  • Symbols and collections (anything that supports IMeta) can have a persistent map of metadata in clojure.
  • Used to communicate a lot of information, including type hinting, source code, documentation, etc.
  • Current expectation is to leverage property lists for symbols, and unify with generic functions behind a protocol.

Namespaces

  • CL has packages. Initial hack would be to leverage packages as an anlogue to ns.
  • Longer term solution is implement own ns system via objects.
    • Rather leverage CL where possible.
  • Currently implementation of def exports by default.
    • Looking at introspection possibilites for more easily tagging meta, arglists. custom function objects (experimental) are looking good for this.
    • Should we maintain a parallel collection of namespaces?
      • Based on def and derived forms, we ought to be able to hook in and register stuff.
      • Allows the reflection and introspection we care about / use in clojure.
  • Need to translate between require, refer, import (clojure) and import (cl).

Reader.

CL macros use , and ,@ in place of ~ and ~@ in Clojure.

We’ll need to either cook the common lisp reader, or build a separate clojure reader that will perform the appropriate replacements.

  • Looks like a simple tweak to the ` reader macro will suffice, we’ll just flip the symbols as appropriate.
  • TODO: quasitquoting in clojure (let []) inside of macros is not splicing, need to revisit quoted-children.
    • ex `[,’a] => [,’A] (incorrect)
    • This now appears to be working, via clclojure.readers:quasify.
    • Should be compatible if we switch reader macros for , and ,@ around too.

@ is a literal for #’deref in clojure, comma is whitespace in clojure.

  • Similar, we’ll flip the read-table.

[] denote vectors

  • already have a reader macro in reader.lisp
  • vectors read correctly and obey clojure eval semantics.
  • +There’s an incorrect read going on, [x 2] will currently read when it should throw since x is not defined.+
    • TODO: revisite quoted-children for vectors and the reader fn bracket-reader.
    • If we’re not reading a quoted vec, we need to eval args to persistent-vector ctor.
    • Quasiquoting works except for splicing.

{} denote maps

  • already have a reader macro in readers.lisp.
  • Janky but useful bootstrapping implementation in cowmap.lisp, which is a copy-on-write wrapper around a CL hashtable.
    • This should be enough to bootstrap, where clojure.core actually defines a persistent map implementation for us.
      • We can always opt for an optimized implementation if it makes sense to go for a built-in.
  • Similar problems with quasiquoting.

\#{} denote sets

  • Easy enough to add a dispatch in reader.lisp
  • Trivial implementation based on cowmap, pending.

^ denotes metadata for the reader

  • Trivial to implement as a reader macro, BUT, we need to get metadata working generally.
    • with-meta currently mutates symbol’s plists. Ideally we’d need a corresponding alter-var-root! to do that.
    • need to implement metadata slots for other clojure structures (data literals, persistent lists).

\c vs. #\c for chars

  • Added initial support for clj->cl, need to define printable representation for chars as well.
  • Current holdup is defining a print-method for STANDARD-CHAR, which claims the class doesn’t exist.
    • Looking at print-escape and friends to see if we can hack this. We may need our own printer, outside of the provided REPL.
  • defined a reader macro that coerces \c to #\c.
    • Maybe less useful for bootstrapping.

reading/printing booleans

  • Similar issues as characters. PAIP has a good chapter on this for similar issues with Scheme.

.- field access

  • wrap to calls for CLOS slot

ns.name/fn

  • detect / in symbol name, coerce to qualified internal symbol ns.name::fn

(.someMethod object)

  • .hello is a valid function name in CL…
  • Reader macro for .?
    • need to incorporate . form ala: (. obj someMethod)

::qualified-key

???
is used as a delimiter for package symbols in CL.
  • need to parse the symbol name and dispatch….
  • ::blah -> :BLAH for now…CL reader eliminates redundant colon.
  • Should be simple to modify the reader macro for :

(aget obj field) from cljs…

  • keep? This doesn’t work the same in clj jvm…

Keyword access

Keywords are applicable in Clojure. That means they show up in the function position in forms. This won’t jive in CL directly.

  • Possible option: reader macro for lists, detect presence of keyword in function call positition, if not quoted.
  • Replace keyword with a funcallable thing that has a print-method looking like a keyword?
  • Or try to hack eval (dubious).
  • custom read / eval / print….

Looks like we can just leverage he function namespace to get around this….keywords are “just” symbols….

  • (defun :a (m) (gethash :a m))
  • (defun (setf :A) (new-value m) (setf (gethash :A m) new-value))
  • (defparameter ht (make-hash-table))
  • (setf (:A ht) 3)

So the workaround is:

  • Need a reader macro lists.
  • If we see a keyword in the function position,
    • we define a function for the keyword via defun.
    • define a setf expander that provides hashmap access (alternately, something that’s not mutating).

Looks like that may work inside the existing ecosystem…. Significant progress / experimentation in clclojure.keyworfuncs. However, it’s looking like, to get “full” access (even with some tricky pseudo-keyword class, macros, and the above suff), we’re probably better off hacking eval / compile.

Destructuring.

This may be a bit tricky, although there are a limited number of clojure forms.

  • Given the goal is to read tools.analyzer and tools.reader, we may get by with an un-prettied version of the original source that elides destructuring (e.g. via macroexpansion on the CLJ side) into something simpler to digest for clclojure’s bootstrap.

Seq library.

This shouldn’t be too hard. I already have a lazy list lib prototype as well as generic functions for the basic ops. I think I’ll try to use the protocols defined in the clojurescript version as much possible, rather than baking in a lot of the seq abstraction in the host language like clojure does.

  • For simple bootstrapping, this if fine, but we already get all of this with the CLJS core implementation. So, get the minimums there and gain everything else….

Strings

CL strings are mutable (character arrays), clj/cljs are not…

  • Can we inherit from string to create an immutable type that outlaws setf?
  • I think most string operations (ala the sequence-based ones) return copies (which are mutable).
    • We can prevent setf in clojure mode, providing a pure API for string manipulation…
      • As well as impure….hmm

Regexes

Leverage CL-PPRCE

  • check for reader macro collisions….

Compilation / Analysis

Currently, we project everything to (portable) CL, and let the host do the dirty work for compilation/analysis.

  • Future, port clojure.tools.analyzer
    • Maybe use that for some of our efforts…
  • If we have enough ported, look at using clojure.tools.reader to help as well.

Usage

Currently just clone the repository. Assuming you have ASDF setup properly, you should be able to evaluate (require ‘clclojure) at your Lisp REPL and it’ll compile (if you’re in the project dir in lisp).

Alternately, you can compile and load the .asd file, then use quicklisp to load it from the repl. (ql:quickload :clclojure)

You currently get persistent vectors, literals, defprotocol, extend-protocol.

You can mess with the currently limited clojure evaluator in the :clclojure.base package, although I’m moving in a different direction now that I think I can explot CL better.

You can see rudimentary examples and usage in the :clclojure.example package (example.lisp). TODO: shift to named-readtables to keep clclojure from hijacking your session. Right now, we take over the REPL…..user beware!

I’m a bit new to building Common Lisp projects, so the packaging will likely change as I learn. At some point, if anything useful comes out of this experiment, I may see if it can get pushed to quicklisp.

License

Eclipse Public License, just like Clojure.

About

An experimental port of clojure to common lisp. Also some native common lisp implementations of clojure libraries, like seq, persistent vectors, etc.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published