Lobsters Interview with Claudius

Naver researcher Claudius, maintainer of LispE and TAMGU, discusses his career in symbolic AI and computational linguistics, including his work on the Xerox Incremental Parser and the limitations of rule-based language systems. He reflects on the futility of compressing language into rules despite achieving high-speed parsing and competition wins, leading to a shift toward neuro-symbolic approaches.

@Claudius (https://lobste.rs/~Claudius) maintains LispE (https://github.com/naver/lispe) and previously TAMGU (https://github.com/naver/tamgu) at Naver, combining array (https://github.com/naver/lispe/wiki/5.3-A-la-APL) and logic programming with Haskell features (https://github.com/naver/lispe/wiki/5.4-A-la-Haskell). N.b. the wiki (https://github.com/naver/lispe/wiki) holds the documentation and articles. In this interview, we discuss Lisp and Prolog implementations, array languages, symbolic GOFAI (https://en.wikipedia.org/wiki/GOFAI) and neuro-symbolic AI.
How did you discover programming, come to pursue a PhD etc.?
It's not exactly a recent adventure; I started in 1980 when my father bought a computer for Christmas. Learning Basic, I faced a lot of problems because I didn't really speak English, which most of the documentation was written in. I spent a lot of time trying to understand what the set command did (put a cell on the screen, on this machine). Then I learned to program the Z80 (https://en.wikipedia.org/wiki/Zilog_Z80) processor in machine language and decided to pursue computer science. I got a master's degree from Paris VI (https://en.wikipedia.org/wiki/Pierre_and_Marie_Curie_University). In 1989, I moved to Montreal and started a PhD in computational linguistics. New symbolic ways of implementing grammars were really hot. I implemented a parser for my PhD thesis which was weird, because the rule system was not based on pure context-free grammars, but on a set of categories which could appear on the right-hand side of your rule. People had been hunting for ways to speed this up and the solution was quite silly: Consider each category as a separate 64-bit vector (a long integer) and, for each rule, tag the categories on the right, replacing the categories by their position on the bit vector. The value would become the index of the rule, so you could find the rule to apply from the index.
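The bit-vector trick described above can be illustrated with a small sketch. This is only one possible reading of the idea, not the thesis parser itself: the category names, the rules and the helper functions (mask, find_rule, candidates) are all hypothetical, and a Python integer stands in for the 64-bit long integer.

```python
# Hypothetical category inventory: each category owns one bit position.
CATEGORIES = ["Det", "Adj", "Noun", "Verb", "Prep"]
assert len(CATEGORIES) <= 64  # everything must fit in one 64-bit word
BIT = {name: 1 << position for position, name in enumerate(CATEGORIES)}

def mask(categories):
    """OR together the bits of the given categories."""
    value = 0
    for category in categories:
        value |= BIT[category]
    return value

# Hypothetical rules: (left-hand side, categories allowed on the right).
RULES = [
    ("NP", ["Det", "Adj", "Noun"]),
    ("PP", ["Prep", "Noun"]),
    ("VP", ["Verb"]),
]

# The integer computed from the right-hand side becomes the rule's index.
RULE_INDEX = {mask(rhs): lhs for lhs, rhs in RULES}

def find_rule(categories):
    """Exact lookup: one dictionary access instead of scanning every rule."""
    return RULE_INDEX.get(mask(categories))

def candidates(categories):
    """Rules whose right-hand side covers every observed category."""
    seen = mask(categories)
    return [lhs for lhs, rhs in RULES if seen & ~mask(rhs) == 0]
```

With these assumptions, find_rule(["Noun", "Det", "Adj"]) finds the NP rule regardless of the order in which the categories were seen, and candidates(["Noun"]) keeps only the rules that can still apply, using a single AND per rule.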
From there, I was recruited by PARC's sister, the Xerox Research Centre Europe (XRCE) in Grenoble, where I still live. I spent 20 years with Xerox and another 10 with Naver, who bought the lab (https://www.news.xerox.com/news/NAVER-to-acquire-Xerox-Research-Centre-Europe).
What's it like working as a researcher? You've published many papers, hold patents etc. which isn't a common route in software.
Companies only evaluate researchers in 3 ways: Now, patents or intellectual property aren't what most people think of. In industry, they function as tokens traded between companies for access to other technologies. But their importance is decreasing. I've implemented a lot of software across these years. With my PhD in linguistics, I worked with linguists to speed tools up. As an example, on top of my PhD I wrote the Xerox Incremental Parser (XIP, https://www.slideserve.com/marla/xerox-incremental-parsing), summarized here (https://string.hlt.inesc-id.pt/wiki/XIP) and implemented here (https://github.com/clauderouxster/XIP), which could parse 3,000 words per second.
I've been dwelling on a comment of yours:
I worked for more than 30 years on these systems and they could never work. I implemented a very fast NLP symbolic parser, for which the team I worked with created grammars for 8 languages, including Japanese. In 2007, with a grammar of 60,000 rules, we could parse at a speed of 3,000 words/s (see https://github.com/clauderouxster/XIP for the open-source version). The parser could extract syntactic dependencies, and could use ontologies. But language is like sand: the more you try to grab, the more you leak. There was a kind of futility in trying to compress languages into rules; nothing actually scaled up. Still, we managed to win competitions as late as 2016 with SemEval sentiment analysis, and in 2017, we also ranked first in a legal document extraction campaign organized by IBM, but to no avail. It was a lot of work, and the conclusion was very simple.
We had to push our grammars as far as possible into lexical grammars, which eventually LLMs managed to really implement. We discovered very early that context was all that mattered. We tried to create grammars that would apply to a full paragraph instead of sentences, but then the performance would plummet. The reason why LLMs work is that at each step they compress the whole context into a meaningful vector, which they then use to guide the rest of the generation process. I spent my whole life in the pursuit of a perfect parser with very brilliant people, and I really find it hard to say that not only did we fail, but that LLMs are the response we were looking for. - Claude Roux
It was a very nice journey, but sometimes you have to accept the reality of the world: Symbolic methods have been replaced. Even 10 years ago, I could only dream of the AI technologies we have today. I wanted to be able to talk to a computer, to create things just by speaking. Now, I'm trying to apply everything I learned to new methods. Talking about agents, I'm now implementing PREDIBAG (https://github.com/naver/lispe/wiki/6.21-PREDIBAG), a retrieval-augmented generation (https://en.wikipedia.org/wiki/Retrieval-augmented_generation) system (RAG), in LispE to help harness/constrain LLMs. It's deterministic with restricted unification. So, XIP had dependency rules, connecting nodes from a tree into a dependency like a subject or direct-object dependency in the sentence tree. These were implemented on a first-order logic engine, which inspired me to add Prolog to TAMGU (https://github.com/naver/tamgu). But this was problematic and complicated, with Prolog's true unification process. Prolog has a problem: The closed-world problem. You can only deal with information in the environment, in the knowledge base. So I wanted something simpler with unification, but also backtracking. Now, compiling any language, you start with an AST and Lisp is already an AST...
TAMGU's grammar required 400 rules: for for, for while, to instantiate a variable... The more you want to add and experiment, the more the grammar becomes impossible to manage. This is Python's problem. They have to concoct ever weirder notations to fit more features in. But you can do anything in Lisp without modifying the parser at all. You just open a parenthesis, use a function and that's it. Some complain about parentheses, but most languages are just sugar coating on top, some complicated program translating another syntax into the AST. So, I wanted to show someone how TAMGU worked and thought Lisp would be a clearer way... And I discovered the joy of Lisp again. I can experiment with APL (https://github.com/naver/lispe/wiki/5.3-A-la-APL) or Haskell (https://github.com/naver/lispe/wiki/5.4-A-la-Haskell) in LispE, whatever I want, because I don't have to deal with extra formalisms.
What sort of projects did you work on before TAMGU?
Well, it emerged directly from XIP. We had a team of 20 linguists building grammars for English (with 60,000 rules), French, Spanish, Japanese... But corpora were terrible back then, with distinct encodings etc. At first, I integrated the Python interpreter into XIP, with rules calling Python functions. Unfortunately, Python's very strict about mixing encodings within a string and would fail all the time, causing me to develop a language just to solve this... And scope crept until you could build rules on the fly or execute them on top of a grammar. I extracted this language from XIP, rewrote the interpreter a few times, renamed it a few times... In 2020, I wanted to incorporate PyTorch and got a trainee, who needed to know how TAMGU worked, leading me to LispE... In TAMGU, every instruction and data structure is an object (an instance of a C++ object), derived from the same class to live in the same vectors. Every function and every data structure has its own eval.
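The "everything has its own eval" design can be sketched in a few lines. This is an illustration only: TAMGU does this with C++ classes and virtual methods, while the sketch below uses Python method dispatch as a stand-in, and every class name here (Element, Number, Symbol, Plus, Times, If) is hypothetical rather than taken from TAMGU's source.

```python
class Element:
    """Common base class, so every node can live in the same containers."""
    def eval(self, env):
        raise NotImplementedError

class Number(Element):
    def __init__(self, value):
        self.value = value
    def eval(self, env):
        return self.value  # a constant evaluates to itself

class Symbol(Element):
    def __init__(self, name):
        self.name = name
    def eval(self, env):
        return env[self.name]  # look the variable up in the environment

class Plus(Element):
    def __init__(self, *args):
        self.args = args
    def eval(self, env):
        return sum(arg.eval(env) for arg in self.args)

class Times(Element):
    def __init__(self, *args):
        self.args = args
    def eval(self, env):
        result = 1
        for arg in self.args:
            result *= arg.eval(env)
        return result

class If(Element):
    def __init__(self, test, then, otherwise):
        self.test, self.then, self.otherwise = test, then, otherwise
    def eval(self, env):
        # Only the selected branch is evaluated.
        branch = self.then if self.test.eval(env) else self.otherwise
        return branch.eval(env)

# (+ 1 (* x 3)) built directly as a tree of self-evaluating objects.
tree = Plus(Number(1), Times(Symbol("x"), Number(3)))
```

Calling tree.eval({"x": 4}) walks the tree with no central interpreter loop: each node knows how to evaluate itself and simply asks its children to do the same.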
Going further back, when I started my PhD, Prolog was the way to work with grammars. But its inventor Alain Colmerauer was a friend of my PhD supervisor, so I discovered it from the implementation side first. I learned many tricks from them like indexing rules on the first argument. When you describe your rules, the first argument becomes an index. When you try to execute a new predicate, you try to see if you can use the predicate's first argument to find out which predicate to test (based on the index). When dealing with language, the first argument would often be a word or category (an atom or string); indexing on them speeds selection up a lot. The knowledge base was also implemented with indexes in the background, so instead of trying every element in your knowledge base, you'd only try the ones indexed on the word you were looking at. WordNet (https://en.wikipedia.org/wiki/WordNet) is an interesting corpus with its own inefficient Prolog implementation; someone said it took about 2 minutes to load it into SWI-Prolog, but using this technique I could load it into TAMGU in a few seconds. I don't use RDF and public knowledge bases much anymore though. I really loved implementing stuff in Prolog but unfortunately, Prolog couldn't efficiently handle my idea of associating every category with a position in a bit vector.
I already worked on PREDIBAG with TAMGU (cf. the wiki https://github.com/naver/tamgu/wiki/3.4-PREDIBAG:--Building-Modern-AI-Agents-in-Tamgu's-Prolog), reaching 98% accuracy for the GSM8K (https://huggingface.co/datasets/openai/gsm8k) math dataset with Prolog and a model only able to reach 60%. The Prolog program would ask an LLM to create then answer a new question (creating knowledge before using capabilities), then output a Python program and test whether it outputs the dataset's expected values. PREDIBAG was all about using predicates to explore the implicit graph computed by the rules themselves.
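As an aside on the first-argument indexing mentioned earlier in this answer, here is a minimal sketch of the idea. It is not TAMGU's or any Prolog system's implementation: the KnowledgeBase class, its methods and the sample facts are hypothetical, and it assumes the first argument of a query is always bound.

```python
from collections import defaultdict

class KnowledgeBase:
    """Facts are filed under (predicate name, first argument)."""

    def __init__(self):
        self.by_first_arg = defaultdict(list)

    def add(self, predicate, *args):
        # The first argument (often a word or category) becomes the index key.
        self.by_first_arg[(predicate, args[0])].append(args)

    def query(self, predicate, first, *rest):
        """Yield matching facts; None in `rest` acts as an unbound variable."""
        # Only the bucket for this first argument is scanned,
        # never the whole knowledge base.
        for fact in self.by_first_arg.get((predicate, first), []):
            if len(fact) - 1 != len(rest):
                continue
            if all(want is None or want == have
                   for want, have in zip(rest, fact[1:])):
                yield fact

kb = KnowledgeBase()
kb.add("pos", "men", "noun")
kb.add("pos", "run", "verb")
kb.add("pos", "run", "noun")
```

With this layout, list(kb.query("pos", "run", None)) only inspects the two facts stored under "run", however many other words the knowledge base holds.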
I'm now trying to bring this to the browser via LispE with its lighter, simpler rules. It's such a nice way of working with rules; backtracking is very powerful. It means that you have a single entry point (the name of your predicate) with different functions sharing that same name, which the system would sequentially try. To enrich them, you just add a new rule/implementation instead of if/else hell.
How does LispE fit into the lab and your efforts?
I give regular presentations to the other researchers, but I have other tasks I get evaluated for. You have to remember you're getting paid by the company, so the trick's matching the company's goals with my own experiments. I made a proposal and management let me work on PREDIBAG. In the past, the goal was mostly machine translation. After the fall of the Berlin Wall, machine translation seemed like the solution to welcoming new countries. I must say I hadn't come up with the best solution, but today machine translation's almost solved.
In the past I played with GOFAI grammars between Esperanto and Interlingua, two conlangs with regularized grammar. You made a conlang (Lingvata) with case endings etc. as a translation target/assist for machine translation. How did that go?
At XRCE, we made finite-state lexical transducers (https://www.redalyc.org/pdf/5157/515751735044.pdf) from dictionaries (e.g. from English to French), which were quite compact. LispE has a transducer library (https://github.com/naver/lispe/wiki/5.15-Transducer). I studied Latin at school and thought declension (marking subject, object etc. at the end of the word) would help here. In Spanish, for example, you don't have to use pronouns because the ends of the verb carry the meaning. If the transducer could systematically identify the attributes of a word based on its ending... So I made a system with XIP to generate a Lingvata sentence. These transducers are automatons, graphs.
A lot of stuff is common to many branches of the graph, which you want to merge, compressing the lexicon into less than a MB. In the LispE transducer directory, you can create a document with surface and lemma forms. Giving that to the transducer compiler, you get a compressed system you can use throughout the library. I have transducers for many natural languages and can parse a sentence, returning e.g. man plural noun for "men".
How did you implement LispE's APL/array features? I was really happy to see function conforming where you can e.g. (+ '(1 2 3) '(1 2 3) 3).
In 1984, I was studying computer science and had Yves Escoufier (https://imag.umontpellier.fr/YvesEscoufier/) as my statistics professor. The largest factory in Montpellier was an IBM factory which collaborated with Escoufier to implement his statistical methods in APL. I joined that team. Now, we were implementing this on a new computer with a floating-point chip, so I implemented matrix multiplication in assembly for its APL version. It was possible to make some very interesting programs on top of it and I kept it in the back of my mind. Implementing LispE, I replaced linked lists with vectors via indirection: A list is a pointer to a buffer, shareable by different lists. When sharing a buffer, you can have your own offset (start of the buffer). Since I had those arrays, I investigated APL operators like rho, reduce etc. which were more complicated to implement than I expected. Eh... Cabuchon! It's my cat. He's playing with... Oh no... Every year, I try to do some Advent of Code and with the APL operators, many problems become trivial. Rho, rank, iota (https://github.com/naver/lispe/blob/457a5938807ae1872d098d5f609672bf2d8c5d80/examples/AdventOfCode2021/day13.lisp#L16) are so useful. Because it's Lisp and you don't have to deal with specific formalisms, this is all relatively easy. I did the Game of Life (https://github.com/naver/lispe/wiki/6.20-Conway-Game-of-Life-in-LispE) in this APL-Lisp too.
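The "pointer to a shared buffer plus an offset" representation mentioned above can be sketched as follows. This is a toy model under stated assumptions, not LispE's C++ code: the SharedList class and its car/cdr/to_list methods are hypothetical names, and a plain Python list plays the role of the buffer.

```python
class SharedList:
    """A list is a reference to a buffer plus its own starting offset."""

    def __init__(self, buffer, offset=0):
        self.buffer = buffer  # shared between lists, never copied here
        self.offset = offset  # where this particular list starts

    def __len__(self):
        return len(self.buffer) - self.offset

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        return self.buffer[self.offset + index]

    def car(self):
        """First element of the list."""
        return self[0]

    def cdr(self):
        """Rest of the list: same buffer, offset moved by one, no copy."""
        return SharedList(self.buffer, self.offset + 1)

    def to_list(self):
        """Materialise the visible part (this is the only copying step)."""
        return self.buffer[self.offset:]

full = SharedList([10, 20, 30])
rest = full.cdr()
```

Here rest sees [20, 30] while still pointing at the very same buffer as full, which is presumably what keeps Lisp-style list operations cheap on top of contiguous vectors.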
You implemented LispE in C++ with classes, what inspired this approach?
Because C++ provides you with vtables, which let you make an eval function for every different class. There's an isomorphism between Lisp and the subset of C++ I use. Instead of trying to implement something complicated, I tried to leverage vtables as much as possible. I implemented a subset of Python (https://github.com/naver/lispe/wiki/6.22-Transpiling-Python-into-LispE) in LispE to execute Python within LispE, so I understand Python well. When Python was first implemented around 1990, C++ wasn't exactly a thing. So Guido reinvented the vtable; Python uses its own inefficient vtable-thing. But modern C++ has so many useful features for handling strings, vectors etc. that you don't have to reroll them yourself. I do try to steal features from elsewhere too. I find Rust's notion of borrowing very interesting and, inspired by this, in the LispE Torch (https://github.com/naver/lispe/tree/master/lispetorch) implementation, you can check a structure (e.g. a list of integers) and transform it into a tensor automatically with zero copies.
What other sources of inspiration have you found? You have a la APL, Haskell...
Prolog. Lisp, of course. The naming conventions like setq come from The Roots of Lisp (https://www.paulgraham.com/rootsoflisp.html). Ontology (https://github.com/naver/lispe/wiki/5.11-Ontologies) stuff based on bit-vectors. Many people work on cool things and I understand things better when I implement them myself. I also learned from others' mistakes. The Python API is extremely complex (requiring heroic effort to tame), so the core API is very simple: Just isolate the description of your function (a string) with a pointer to an object and it just works. LibTorch is a gorgeous API, a real work of art. When I started wrapping it in LispE I was amazed by its quality.
I discovered LispE from an absolutely amazing video showing off its awesome shell capabilities. I presume this is your day-to-day shell.
How do you develop APIs and chisel them to perfection?
Through a lot of pain. I've used many APIs in my life and my personal adage is "never trap yourself". Implementing complicated things, we often hit a point where we suddenly don't understand what we're doing anymore, stuck in a complicated, intertwined blob of code... I've gone through that a lot. I often have the feeling that someone else wrote those messes, but in fact it was just another me with less experience... So I try to keep things simple. "Will I be able to read it tomorrow?" Now, the shell interface was very complicated. I didn't want to use curses because it doesn't work on all platforms. I don't mean to disparage Python at all, but I was very dissatisfied with the Python REPL; after entering some functions and reaching a happy result, I found myself wanting to create a file out of it. So you can do lispe -e <file name> to start an editor within the lispe interface, place breakpoints etc. It's really simple, but I love being able to just type ls to execute UNIX commands or v=ls -l to bind the output of ls -l to v. I just wanted to do something for myself, to create something I want to use every day. An example of how I try not to trap myself is with naming. There's a function link which lets you rename almost anything, e.g. link 'plus +. let also works. But in lispe.cxx (https://github.com/naver/lispe/blob/85be18784f0347be47256d8f2c8443a51b72e8c8/src/lispe.cxx#L378), the entrance file to the whole language, set_instruction links surface bindings with numerical IDs (which the compiler actually uses). So you can simply change the names there and recompile LispE with new internal and external names.
What do you think about neuro-symbolic AI?
Verifiable rewards have become quite popular, where the system generates a solution verified by an external program, not just looking up the value in a table of outputs but actually executing code. It's trendy to throw MCP everywhere (exchanging data with an external server).
In the case of PREDIBAG (https://github.com/naver/lispe/wiki/6.21-PREDIBAG), instead of the LLM making all decisions, you can use an intermediary layer (implemented with defpred or LispE's pattern matching rules), tagging the decision to send something to an LLM with a callback function to check whether the output is valid, delegate execution to other components etc. If the system's trying to e.g. import curl, you can already intercept it there. And that's something you can only do with rules. Many people are using other LLMs to "verify" others, which carries no guarantee. In industry, you need guarantees that something won't crash your whole system, and a traceable system. Today, the browser offers maximum security (handling auth, token IDs etc.) and you can use LispE (https://github.com/naver/lispe/wiki/6.17-A-WebAssembly-version-of-LispE) as a WASM library too. The idea's that instead of running Python code in a Docker container etc., you can just execute code in the browser sandbox directly.
Where do you think classic symbolic logic's still a good fit?
For low-memory text generation, I like generating via grammars (https://github.com/naver/lispe/blob/master/examples/patterns/dcgfr.lisp). Relatedly, LispE supports Unicode and can e.g. rename everything in Greek (https://github.com/naver/lispe/wiki/6.15-Programming-is-Greek-to-me...-Literally). Nowadays, LLMs can generate such a grammar for you but in the past it took a lot of time and linguists. When working on the Japanese grammar, they were very surprised to be able to use Japanese and quickly began to employ Japanese for dependency names, functions, categories etc. and I couldn't read a single thing.
You started with Basic, how did you learn C++?
At the time of my PhD, C++ was also the only way to access Mac's graphical environment, so I started to learn it, then decided to implement my parser in it too. Of course, I avoid nightmares like multiple inheritance in C++, but it's improved a lot over the years.
When implementing TAMGU, I was able to reduce the interpreter's complexity step by step, by removing C++ features until it became a Lisp. It's always the same story: Lisp is just an AST. If you keep the AST live in your C++ code, transform it into an evaluation tree, don't try to implement it as byte code (trendy in the 90s) but keep a simple tree where each element will evaluate itself, you end up with something so simple and efficient. In fact, you're just compiling stuff into C++ instances which execute as fast as C++. There's a real elegance in Lisp here. To illustrate this another way, the output of an LLM transformer is a list of probabilities (probable tokens), one of which you select. A complicated language like C++ or Rust has more complicated probability distributions than Lisp, where the fewer tokens following an open parenthesis constitute lower entropy. You can have an opening parenthesis, a token or a closing parenthesis, that's it.
How do you leverage LLMs these days?
LLMs have a lot of knowledge but few competencies. If you constrain them to output knowledge and use that to further constrain results, you'll go far. For context management, I have the system generate log.jsonl and log.py (which queries the other document). Whenever an action is processed, an error's corrected etc., the system adds something to log.jsonl. If it needs to know what happened, it uses log.py to query and display only the relevant/required information (like a date, errors or attempted fixes), reducing tokens. I use LLMs to generate JS UIs, but core LispE doesn't have any AI-generated code because I know the code by heart and LLMs would butcher it and work slower than me.
With infinite time, what would you like to add to LispE?
I don't see any ways of improving LispE performance right now; I need external eyes to help. After dealing with the same code over and over, it becomes difficult to push through the fog and find new ideas.
The reason LispE's faster than Python is very simple: I provide a list of values. In Python, you only have a list of pointers or a dictionary of pointers; you have to use NumPy if you want vector values, but it's not user friendly. When using Python, you're working in 2 worlds at the same time. On the one hand, you have the virtual machine with tokens and bytecode. On the other hand, some C++ library will have no relation to that, making it really hard to understand what's going on, because in fact a large part of it is not actually accessible. In LispE, it will derive everything like the language itself. LispE's lists of values are a good example: to release a list of pointers in Python, you must traverse the list and release each element individually, while in LispE you can just delete the list and that's it. If something in Python was created by an external library, you have to delete the PyObject and the object within it. When creating a Python library or wrapper, for each object you create, you must make a table implementing the pointers to the functions to delete or create elements.
I think this is a good way to validate LLM "thinking" before exploring a path further. PREDIBAG is my main focus at the moment. You have to forgive an old man, but I love the poetic justice of bringing Prolog back to AI.