Lobsters Interview with Claudius

Naver researcher Claudius, maintainer of LispE and TAMGU, discusses his career in symbolic AI and computational linguistics, including his work on the Xerox Incremental Parser and the limitations of rule-based language systems. He reflects on the futility of compressing language into rules despite achieving high-speed parsing and competition wins, leading to a shift toward neuro-symbolic approaches.

@Claudius (https://lobste.rs/~Claudius) maintains LispE (https://github.com/naver/lispe) and previously TAMGU (https://github.com/naver/tamgu) at Naver, combining array (https://github.com/naver/lispe/wiki/5.3-A-la-APL) and logic programming with Haskell features (https://github.com/naver/lispe/wiki/5.4-A-la-Haskell). N.b. the wiki (https://github.com/naver/lispe/wiki) holds the documentation and articles. In this interview, we discuss Lisp and Prolog implementations, array languages, symbolic GOFAI (https://en.wikipedia.org/wiki/GOFAI) and neuro-symbolic AI.

How did you discover programming, come to pursue a PhD etc.?

It's not exactly a recent adventure; I started in 1980 when my father bought a computer for Christmas. Learning Basic, I faced a lot of problems because I didn't really speak English, which most of the documentation was written in. I spent a lot of time trying to understand what the set command did (put a cell on the screen, on this machine). Then I learned to program the Z80 (https://en.wikipedia.org/wiki/Zilog_Z80) processor in machine language and decided to pursue computer science. I got a master's degree from Paris VI (https://en.wikipedia.org/wiki/Pierre_and_Marie_Curie_University).

In 1989, I moved to Montreal and started a PhD in computational linguistics. New symbolic ways of implementing grammars were really hot. I implemented a parser for my PhD thesis which was weird, because the rule system was not based on pure context-free grammars, but a set of categories which could appear on the right-hand side of your rule.
People had been hunting for ways to speed this up and the solution was quite silly: treat the categories as positions in a 64-bit vector (a long integer). For each rule, replace the categories on the right-hand side by their positions in the bit vector; the resulting value becomes the index of the rule, so you can find the rule to apply from the index.

From there, I was recruited by PARC's sister, the Xerox Research Centre Europe (XRCE) in Grenoble, where I still live. I spent 20 years with Xerox and another 10 with Naver, who bought the lab (https://www.news.xerox.com/news/NAVER-to-acquire-Xerox-Research-Centre-Europe).

What's it like working as a researcher? You've published many papers, hold patents etc. which isn't a common route in software.

Companies only evaluate researchers in 3 ways: Now, patents or intellectual property aren't what most people think of. In industry, they function as tokens traded between companies for access to other technologies. But their importance is decreasing.

I've implemented a lot of software across these years. With my PhD in linguistics, I worked with linguists to speed tools up. As an example, on top of my PhD I wrote the Xerox Incremental Parser, XIP (https://www.slideserve.com/marla/xerox-incremental-parsing), summarized here (https://string.hlt.inesc-id.pt/wiki/XIP) and implemented here (https://github.com/clauderouxster/XIP), which could parse 3,000 words per second.

I've been dwelling on a comment of yours:

I worked for more than 30 years on these systems and they could never work. I implemented a very fast NLP symbolic parser, for which the team I worked with created grammars for 8 languages, including Japanese. In 2007, with a grammar of 60,000 rules, we could parse at a speed of 3,000 words/s (see https://github.com/clauderouxster/XIP for the open-source version). The parser could extract syntactic dependencies, and could use ontologies. But language is like sand, the more you try to grab, the more you leak.
There was a kind of futility in trying to compress languages into rules; nothing actually scaled up. Still, we managed to win competitions as late as 2016 with SemEval sentiment analysis, and in 2017, we also ranked first in a legal document extraction campaign organized by IBM, but to no avail. It was a lot of work, and the conclusion was very simple. We had to push our grammars as far as possible into lexical grammars, which eventually LLMs managed to really implement. We discovered very early that context was all that mattered. We tried to create grammars that would apply to a full paragraph instead of sentences, but then the performance would plummet. The reason why LLMs work is that at each step they compress the whole context into a meaningful vector, which they then use to guide the rest of the generation process. I spent my whole life in the pursuit of a perfect parser with very brilliant people, and I really find it hard to say that not only did we fail, but that LLMs are the response we were looking for.
- Claude Roux

It was a very nice journey, but sometimes you have to accept the reality of the world: symbolic methods have been replaced. Even 10 years ago, I could only dream of the AI technologies we have today. I wanted to be able to talk to a computer, to create things just by speaking. Now, I'm trying to apply everything I learned to new methods.

Talking about agents, I'm now implementing PREDIBAG (https://github.com/naver/lispe/wiki/6.21-PREDIBAG), a retrieval-augmented generation (https://en.wikipedia.org/wiki/Retrieval-augmented_generation) (RAG) system, in LispE to help harness/constrain LLMs. It's deterministic with restricted unification.

So, XIP had dependency rules, connecting nodes from a tree into a dependency like a subject or direct-object dependency in the sentence tree. These were implemented on a first-order logic engine, which inspired me to add Prolog to TAMGU (https://github.com/naver/tamgu).
But this was problematic and complicated, with Prolog's true unification process. Prolog has a problem: the closed-world problem. You can only deal with information in the environment, in the knowledge base. So I wanted something simpler with unification, but also backtracking.

Now, compiling any language, you start with an AST and Lisp is already an AST... TAMGU's grammar required 400 rules: for "for", for "while", to instantiate a variable... The more you want to add and experiment, the more the grammar becomes impossible to manage. This is Python's problem. They have to concoct ever weirder notations to fit more features in. But you can do anything in Lisp without modifying the parser at all. You just open a parenthesis, use a function and that's it. Some complain about parentheses, but most languages are just sugar coating on top, some complicated program translating another syntax into the AST.

So, I wanted to show someone how TAMGU worked and thought Lisp would be a clearer way... And I discovered the joy of Lisp again. I can experiment with APL (https://github.com/naver/lispe/wiki/5.3-A-la-APL) or Haskell (https://github.com/naver/lispe/wiki/5.4-A-la-Haskell) in LispE, whatever I want, because I don't have to deal with extra formalisms.

What sort of projects did you work on before TAMGU?

Well, it emerged directly from XIP. We had a team of 20 linguists building grammars for English (with 60,000 rules), French, Spanish, Japanese... But corpora were terrible back then, with distinct encodings etc. At first, I integrated the Python interpreter into XIP, with rules calling Python functions. Unfortunately, Python's very strict about mixing encodings in a string and would fail all the time, causing me to develop a language just to solve this... And scope crept until you could build rules on the fly or execute them on top of a grammar. I extracted this language from XIP, rewrote the interpreter a few times, renamed it a few times...
In 2020, I wanted to incorporate PyTorch and got a trainee, who needed to know how TAMGU worked, leading me to LispE... In TAMGU, every instruction and data structure is an object (an instance of a C++ object), derived from the same class to live in the same vectors. Every function and every data structure has its own eval.

Going further back, when I started my PhD, Prolog was the way to work with grammars. But its inventor Alain Colmerauer was a friend of my PhD supervisor, so I discovered it from the implementation side first. I learned many tricks from them, like indexing rules on the first argument. When you describe your rules, the first argument becomes an index. When you try to execute a new predicate, you try to see if you can use the predicate's first argument to find out which predicate to test (based on the index). When dealing with language, the first argument would often be a word or category (an atom or string); indexing on them speeds selection up a lot. The knowledge base was also implemented with indexes in the background, so instead of trying every element in your knowledge base, you'd only try the ones indexed on the word you were looking at.

WordNet (https://en.wikipedia.org/wiki/WordNet) is an interesting corpus with its own inefficient Prolog implementation; someone said it took about 2 minutes to load it into SWI-Prolog, but using this technique I could load it into TAMGU in a few seconds. I don't use RDF and public knowledge bases much anymore though. I really loved implementing stuff in Prolog but unfortunately, Prolog couldn't efficiently handle my idea of associating every category with a position in a bit vector.

I already worked on PREDIBAG with TAMGU (cf. the wiki: https://github.com/naver/tamgu/wiki/3.4-PREDIBAG:--Building-Modern-AI-Agents-in-Tamgu's-Prolog), reaching 98% accuracy on the GSM8K (https://huggingface.co/datasets/openai/gsm8k) math dataset with Prolog and a model only able to reach 60%.
The Prolog program would ask an LLM to create then answer a new question (creating knowledge before using capabilities), then output a Python program and test whether it outputs the dataset's expected values. PREDIBAG was all about using predicates to explore the implicit graph computed by the rules themselves. I'm now trying to bring this to the browser via LispE with its lighter, simpler rules. It's such a nice way of working with rules; backtracking is very powerful. It means that you have a single entry point (the name of your predicate) with different functions sharing that same name, which the system tries sequentially. To enrich them, you just add a new rule/implementation instead of if/else hell.

How does LispE fit into the lab and your efforts?

I give regular presentations to the other researchers, but I have other tasks I get evaluated for. You have to remember you're getting paid by the company, so the trick's matching the company's goals with my own experiments. I made a proposal and management let me work on PREDIBAG.

In the past, the goal was mostly machine translation. After the fall of the Berlin Wall, machine translation seemed like the solution to welcoming new countries. I must say I hadn't come up with the best solution, but today machine translation's almost solved.

In the past I played with GOFAI grammars between Esperanto and Interlingua, two conlangs with regularized grammar. You made a conlang, Lingvata, with case endings etc. as a translation target/assist for machine translation. How did that go?

At XRCE, we made finite-state lexical transducers (https://www.redalyc.org/pdf/5157/515751735044.pdf) from dictionaries (e.g. from English to French), which were quite compact. LispE has a transducer library (https://github.com/naver/lispe/wiki/5.15-Transducer). I studied Latin at school and thought declension (marking subject, object etc. at the end of the word) would help here.
In Spanish, for example, you don't have to use pronouns because the ending of the verb carries the meaning. If the transducer could systematically identify the attributes of a word based on its ending... So I made a system with XIP to generate a Lingvata sentence.

These transducers are automatons, graphs. A lot of stuff is common to many branches of the graph, which you want to merge, compressing the lexicon into less than a MB. In the LispE transducer directory, you can create a document with surface and lemma forms. Giving that to the transducer compiler, you get a compressed system you can use throughout the library. I have transducers for many natural languages and can parse a sentence, returning e.g. "man plural noun" for "men".

How did you implement LispE's APL/array features? I was really happy to see function conforming where you can e.g. (+ '(1 2 3) '(1 2 3) 3).

In 1984, I was studying computer science and had Yves Escoufier (https://imag.umontpellier.fr/YvesEscoufier/) as my statistics professor. The largest factory in Montpellier was an IBM factory which collaborated with Escoufier to implement his statistical methods in APL. I joined that team. Now, we were implementing this on a new computer with a floating-point chip, so I implemented matrix multiplication in assembly for its APL version. It was possible to make some very interesting programs on top of it and I kept it in the back of my mind.

Implementing LispE, I replaced linked lists with vectors via indirection: a list is a pointer to a buffer, shareable by different lists. When sharing a buffer, you can have your own offset (start of the buffer). Since I had those arrays, I investigated APL operators like rho, reduce etc., which were more complicated to implement than I expected.

Eh... Cabuchon! It's my cat. He's playing with... Oh no...

Every year, I try to do some Advent of Code and with the APL operators, many problems become trivial.
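To make the shared-buffer lists and the APL-style operators mentioned in this answer more concrete, here is a minimal Python sketch. It is not LispE's C++ implementation: the names SharedList, iota and rho are hypothetical, chosen only for illustration, and the sketch assumes that a list is simply a reference to a buffer plus a private start offset.

```python
from functools import reduce
from operator import add


class SharedList:
    """Hypothetical list view: a shared buffer plus this view's own offset."""

    def __init__(self, buffer, offset=0):
        self.buffer = buffer  # shared between views, never copied
        self.offset = offset  # where this particular view starts

    def car(self):
        # First element as seen from this view's offset.
        return self.buffer[self.offset]

    def cdr(self):
        # O(1): same buffer, offset moved one cell to the right.
        return SharedList(self.buffer, self.offset + 1)

    def to_list(self):
        # Materialize the visible elements (this does copy).
        return self.buffer[self.offset:]


def iota(n):
    """APL-style iota: the integers 1..n."""
    return list(range(1, n + 1))


def rho(rows, cols, values):
    """APL-style reshape: a rows x cols matrix, cycling through values."""
    return [[values[(r * cols + c) % len(values)] for c in range(cols)]
            for r in range(rows)]


numbers = SharedList(iota(5))         # view over 1 2 3 4 5
rest = numbers.cdr()                  # view over 2 3 4 5, same buffer
total = reduce(add, rest.to_list())   # APL-style +/ over the view
matrix = rho(2, 3, iota(4))           # reshape 1 2 3 4 into 2 rows of 3
```

In this sketch, taking the rest of a list never copies the underlying buffer, which is the point of the indirection described above; reshape and reduce then work directly on the flat data.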
Rho, rank, iota (https://github.com/naver/lispe/blob/457a5938807ae1872d098d5f609672bf2d8c5d80/examples/AdventOfCode2021/day13.lisp#L16) are so useful. Because it's Lisp and you don't have to deal with specific formalisms, this is all relatively easy. I did the Game of Life (https://github.com/naver/lispe/wiki/6.20-Conway-Game-of-Life-in-LispE) in this APL-Lisp too.

You implemented LispE in C++ with classes. What inspired this approach?

Because C++ provides you with vtables, which let you make an eval function for every different class. There's an isomorphism between Lisp and the subset of C++ I use. Instead of trying to implement something complicated, I tried to leverage vtables as much as possible.

I implemented a subset of Python (https://github.com/naver/lispe/wiki/6.22-Transpiling-Python-into-LispE) in LispE to execute Python within LispE, so I understand Python well. When Python was first implemented around 1990, C++ wasn't exactly a thing. So Guido reinvented the vtable; Python uses its own inefficient vtable-thing. But modern C++ has so many useful features for handling strings, vectors etc., so you don't have to reroll them yourself.

I do try to steal features from elsewhere too. I find Rust's notion of borrowing very interesting and, inspired by this, in the LispE Torch (https://github.com/naver/lispe/tree/master/lispetorch) implementation you can check a structure (e.g. a list of integers) and transform it into a tensor automatically with zero copies.

What other sources of inspiration have you found? You have a la APL, Haskell...

Prolog. Lisp, of course. The naming conventions like setq come from The Roots of Lisp (https://www.paulgraham.com/rootsoflisp.html). Ontology (https://github.com/naver/lispe/wiki/5.11-Ontologies) stuff based on bit-vectors. Many people work on cool things and I understand things better when I implement them myself. I also learned from others' mistakes.
The Python API is extremely complex (requiring heroic effort to tame), so the core API is very simple: just isolate the description of your function (a string) with a pointer to an object and it just works. LibTorch is a gorgeous API, a real work of art. When I started wrapping it in LispE I was amazed by its quality.

I discovered LispE from an absolutely amazing video showing off its awesome shell capabilities. I presume this is your day-to-day shell. How do you develop APIs and chisel them to perfection?

Through a lot of pain. I've used many APIs in my life and my personal adage is "never trap yourself". Implementing complicated things, we often hit a point where we suddenly don't understand what we're doing anymore, stuck in a complicated, intertwined blob of code... I've gone through that a lot. I often have the feeling that someone else wrote those messes, but in fact it was just another me with less experience... So I try to keep things simple. "Will I be able to read it tomorrow?"

Now, the shell interface was very complicated. I didn't want to use curses because it doesn't work on all platforms. I don't mean to disparage Python at all, but I was very dissatisfied with the Python REPL; after entering some functions and reaching a happy result, I found myself wanting to create a file out of it. So you can do lispe -e