Bringing a dead Spring Boot project back to life with Claude


Spring Data Solr stopped active development in 2020 and was archived in 2023. Three years later, after some pairing with Claude, it's now back from the attic.

GitHub link: https://github.com/tomaytotomato/spring-data-solr

If you are using or have used Apache Solr, you will know it's not the "coolest" of search engines out there. However, it is still being actively developed, is very fast and durable, and is easy to use for querying documents.

Some people are even surprised to find out that it uses the same Lucene indexing engine as its newer kid-on-the-block rival, Elasticsearch.

Having used Solr for several projects in my career, I enjoy its powerful search and faceting capabilities. The real power comes from two things working together: a well-defined schema that shapes how your data is indexed, and the Lucene query syntax underneath that lets you slice it in almost any direction with a variety of filters.

For those previous projects I was providing this search capability through a Java backend service using the Spring Framework.

Without going into the nitty-gritty of how the Spring Framework and Spring Boot work: together they are essentially a belt-and-braces dependency management framework that lets you integrate with a whole variety of infrastructure. You can then expose that data to a client with REST, GraphQL, RPC, SOAP (who still uses that?), a CLI etc.

A lot of the ease of integration comes with Spring Boot's starter libraries for these requirements. Just add your dependencies and it wires them up with minimal config.

You can have a play around here - https://start.spring.io/

For a long time there was a starter library for Solr but it stopped being actively developed in 2020 and eventually was doomed to the Spring Attic in 2023.

After a little hacky project and a detour into Spring Boot starter library development, the question became: could the Spring Data Solr library be brought back to life?

Personal Hacky Project #

As developers we love to hack away at stuff. Most days I have random ideas or things I want to build, at work or on my days off. Unfortunately, time is becoming a more restricted and precious commodity in my 30s.

LLMs have turned up, though, and they offer the chance to scratch those intellectual itches and test out ideas that would normally take hours and hours and result in a half-done GitHub project and the unanswered question of "what if?".

My most recent intellectual itch was to make a search engine to index and store obscure local planning documents (public domain), extracting the metadata and other interesting data from them into an index that can be easily searched.

The motivation behind this was that a lot of local council planning websites in the UK have terrible search features and are often limited to things like address or date. See this example of Manchester City Council's website.

It had been years since I used Solr, so it seemed like a good opportunity to try out its latest version. However, when it came to searching for documentation on the best way to integrate it with Spring Boot, I was presented with this nice repository, all archived and gathering dust.

Fair enough, not a major blocker. Still, when it came time to throw together a search API service using Spring Boot, it felt very lacking that there was no Spring Data Solr library I could just add to get a nice repository abstraction and middleware layer.

This left me with my next best port of call which was to use the SolrJ Java client to execute my queries to Solr. This would mean hand rolling my own repositories and interfaces.

This was easy enough to do and the proof of concept worked. When looking at the codebase though you notice that there is a lot of boilerplate.

However if the Spring Data Solr library was still alive and kicking it would have looked more like this:

So it begs the question, couldn't we get Claude to build us a new Spring Data Solr starter library?

What version? #

The last time I used Solr was with Spring Boot 2.4.x, back in 2019/20, which is a long, long time ago in the Spring ecosystem.

With six years having passed, Spring Boot has now reached version 4.x.x and the underlying Spring Framework is on version 7. With its release velocity not slowing anytime soon, it seemed pointless to entertain making a new Spring Data starter library for Solr in anything other than the latest versions.

At the same time, Solr had moved on as well, with the release of version 10.

Fair enough, let's just keep it anchored at those versions.

MVP Functionality #

Spring Data libraries all have features in common that are very handy for a Spring developer to use:

CrudRepository interfaces: provide useful methods for getting started (findById, delete, update, create, findAll)
Derived query methods: Spring Data infers what query to construct based on the method name, e.g. findByTitleContaining(String keyword)
@Query: for more involved queries that can't be handled by derived query methods, e.g. @Query("title:?0 AND author:?1")
Criteria API: fluent Java API for building queries programmatically, e.g. Criteria.where("title").contains("spring").and("price").between(10, 50)
Pagination and sorting: with Pageable support, a must for any dataset greater than 100.
Auto-configuration: add the dependency, set the config values and you have access to Solr via repositories.

So for this experiment to revive a library, we would need all these pieces of functionality present and working.
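To make the Criteria item a little more concrete, here is a minimal, self-contained sketch of how a fluent builder in the style of Criteria.where("title").contains("spring").and("price").between(10, 50) could render a Lucene-style query string. The Criteria class below is a toy written for illustration, not the library's actual implementation. Rendering "contains" as a wildcard match and joining clauses with AND are both assumptions, and no escaping of user input is attempted.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy fluent query builder; a hypothetical stand-in, not the library's Criteria class. */
final class Criteria {
    private final List<String> clauses = new ArrayList<>();
    private String currentField;

    private Criteria(String field) {
        this.currentField = field;
    }

    /** Starts a criteria chain on the given field. */
    static Criteria where(String field) {
        return new Criteria(field);
    }

    /** Switches to another field; all clauses are joined with AND when rendered. */
    Criteria and(String field) {
        this.currentField = field;
        return this;
    }

    /** Assumption: "contains" is rendered as a leading and trailing wildcard match. */
    Criteria contains(String term) {
        // Note: the term is NOT escaped here; a real builder must escape user input.
        clauses.add(currentField + ":*" + term + "*");
        return this;
    }

    /** Inclusive range in Lucene range syntax: field:[low TO high]. */
    Criteria between(Object low, Object high) {
        clauses.add(currentField + ":[" + low + " TO " + high + "]");
        return this;
    }

    /** Renders the collected clauses as a single query string. */
    String toQueryString() {
        return String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        String query = Criteria.where("title").contains("spring")
                .and("price").between(10, 50)
                .toQueryString();
        System.out.println(query); // title:*spring* AND price:[10 TO 50]
    }
}
```

The point of the fluent style is that the caller never concatenates query syntax by hand; the builder owns the rendering, which is also the natural place to add escaping later.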

Claude Recon #

Before we let Claude or any agents loose with a PLAN.md file, it was worth doing some reconnaissance of the area.

What was the state of the existing Spring Data Solr library?
What un-merged and unresolved issues were there on GitHub?
What significant API changes have happened in Spring Framework and Spring Boot?
What features are available in the newest version of Apache Solr?

Claude was excellent at this, especially using a swarm agentic pattern to research multiple disparate sources of information and tie them together. I would recommend playing around with swarms; this one was quite useful.

Some examples of the insights it gave were:

8 unresolved bugs in the original Spring Data Solr project; they should be fixed or avoided by new approaches
A community fork existed but only supported Solr versions < 9
The SolrJ library was split into several artifacts in its latest version compared to older ones: solr-solrj, solr-solrj-jetty, solr-solrj-zookeeper
Spring's health and metrics tooling had changed considerably from v2.x.x to v4.x.x

This automated foray into digital archaeology gave a primer on what would be involved in getting a v.001 implementation off the ground. It was also a relief to see that some issues that affected the older Spring Data Solr library could be bypassed in the newest versions of Spring Data.

The final step was to find a template or archetype to be our guide when assembling this new library. For certain, Claude has been trained on thousands of Java library repos and has opinions and ideas on how to structure one, but I didn't want to stray from the Spring way of doing things. So I opted for a combination of a mature real library from Elasticsearch and this relatively up-to-date Spring Boot starter template provided by ericus20.

Do the build dance #

It was time to then build it and a cunning PLAN.md was hatched, which went roughly like this.

Features to build (essential):
  
  1. Multi-module Maven project — autoconfigure, starter, sample
  2. SolrProperties binding spring.solr.* config
  3. SolrAutoConfiguration creating HttpJdkSolrClient (standalone) or CloudSolrClient (SolrCloud)
  4. SolrTemplate — collection-aware CRUD and query execution wrapping SolrJ
  5. Criteria API — fluent query builder converting to native SolrQuery
  6. SolrRepository<T> — Spring Data repository abstraction with @EnableSolrRepositories
  7. Derived query methods via Spring Data PartTree → SolrQueryCreator
  8. @Query annotation support for raw Solr query strings
  9. SolrHealthIndicator — collection ping with admin endpoint fallback
  10. Highlighting, faceting, and cursor-based deep paging on top of the query layer
  11. Testcontainers integration tests against real Solr 9 and Solr 10
  12. Spring Boot auto-configuration — zero XML, annotation-only setup
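Item 7 is the least obvious piece of the plan, so here is a toy illustration of the idea behind derived query methods: split the method name into parts, map each part to a field and an operator, and bind the arguments in order. Spring Data's real PartTree handles many more keywords (OrderBy, Or, IgnoreCase and so on). The DerivedQuerySketch class, the two keywords it understands and the way it renders them are assumptions made for this sketch, not the library's SolrQueryCreator.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of derived queries; hypothetical, far simpler than Spring Data's PartTree. */
final class DerivedQuerySketch {

    /** Turns e.g. findByAuthorAndPriceBetween + ("rowling", 10, 50) into a query string. */
    static String toSolrQuery(String methodName, Object... args) {
        if (!methodName.startsWith("findBy")) {
            throw new IllegalArgumentException("Only findBy... methods are handled in this sketch");
        }
        // Split on "And" only when followed by an upper-case letter, so "Android" stays whole.
        String[] parts = methodName.substring("findBy".length()).split("And(?=[A-Z])");
        List<String> clauses = new ArrayList<>();
        int argIndex = 0;
        for (String part : parts) {
            if (part.endsWith("Containing")) {
                Object term = args[argIndex++];
                clauses.add(fieldName(part, "Containing") + ":*" + term + "*");
            } else if (part.endsWith("Between")) {
                Object low = args[argIndex++];
                Object high = args[argIndex++];
                clauses.add(fieldName(part, "Between") + ":[" + low + " TO " + high + "]");
            } else {
                // No keyword: treat the part as a simple equality match.
                clauses.add(fieldName(part, "") + ":" + args[argIndex++]);
            }
        }
        return String.join(" AND ", clauses);
    }

    /** Strips the keyword suffix and lower-cases the first letter: "TitleContaining" -> "title". */
    private static String fieldName(String part, String keyword) {
        String property = part.substring(0, part.length() - keyword.length());
        return Character.toLowerCase(property.charAt(0)) + property.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(toSolrQuery("findByTitleContaining", "spring"));
        // title:*spring*
        System.out.println(toSolrQuery("findByAuthorAndPriceBetween", "rowling", 10, 50));
        // author:rowling AND price:[10 TO 50]
    }
}
```

In a real Spring Data module this parsing happens once per repository method at startup, and the arguments are escaped before they reach the query; the sketch skips both to keep the mapping visible.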

Then it was time to release several agents at the implementation and check in at several stages.

I opted for a TDD agent and developer agent approach: unit and integration tests were written first, in a style I like, and an agent then followed up with the implementation.

Most features were small enough that I could then review them locally or on GitHub.

Within a short period of time, a first version was ready to try out, with auto-configuration, SolrTemplate, Criteria API, partial update, pagination, Spring Data repository support, and Testcontainers.

Gathering Feedback #

After playing around with the app locally it looked good to me, but I didn't have any external reviewers I could grab a hold of easily.

What if I could simulate some reviewers to check through the library, each with a bias or niche area of interest? Could Claude express and simulate that well?

I decided to simulate two very important people in the Spring community and get Claude to output code review files.

Rod Johnson; the creator of Spring

"This is a well-structured, cleanly implemented Spring Boot starter that correctly uses Spring Data Commons' extension points and Boot's auto-configuration conventions. It is substantially more complete than I expected for a 0.1.0 artefact."

"Verdict: Not ready for Maven Central as-is. Fix the injection vulnerability, the mutating count(), the broken CrudRepository contract, and remove getSolrClient() from the interface. Then this is a legitimate 0.1.0."

Josh Long; Spring dev and evangelist

"Oh my goodness, where do I begin. I was handed a link to this repository and my first reaction — after reading the name — was an involuntary grin. 'Spring Data Solr Lazarus.' The audacity. The drama. I love it."

"Would I tweet about this? Yes — once the injection issue is fixed. Would I demo it? Absolutely."

This gave some really useful feedback to start building out features and fixing the library so it could become more serious than a two hour hack session with Claude.
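The simulated reviews don't say where the injection issue sat. A common source in this kind of library, and the assumption behind the sketch below, is user input being substituted unescaped into a query template such as "title:?0 AND author:?1". The QueryBinder class and its methods are hypothetical and use only the JDK; SolrJ itself ships ClientUtils.escapeQueryChars for the escaping step.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of placeholder binding with escaping; class and method names are hypothetical. */
final class QueryBinder {
    // Characters with special meaning in the Lucene/Solr query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";
    private static final Pattern PLACEHOLDER = Pattern.compile("\\?(\\d+)");

    /** Prefixes every special character and every whitespace character with a backslash. */
    static String escape(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Replaces ?0, ?1, ... with escaped arguments in a single pass over the template. */
    static String bind(String template, String... args) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        return matcher.replaceAll(match -> {
            int index = Integer.parseInt(match.group(1));
            if (index >= args.length) {
                throw new IllegalArgumentException("No argument for placeholder ?" + index);
            }
            // quoteReplacement keeps the backslashes we just added from being re-interpreted.
            return Matcher.quoteReplacement(escape(args[index]));
        });
    }

    public static void main(String[] args) {
        // Without escaping, the second argument would widen the query to match everything.
        System.out.println(bind("title:?0 AND author:?1", "spring", "x OR *:*"));
        // title:spring AND author:x\ OR\ \*\:\*
    }
}
```

The single-pass replacement matters: replacing placeholders one after another would let an argument that itself contains "?0" be substituted a second time.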

After a few more days of tweaking I reached out to the original creators of the Spring Data Solr library. They gave some pointers on the structure:

Align the template and repository layer with MongoDB or Cassandra rather than using the Elastic or Solr modules as a guide, and draw freely from the other Spring Data implementations to make it feel native to the ecosystem.

It was also interesting to hear from them why the original was archived:

Solr's habit of breaking compatibility with each major upgrade made it impossible to give the module the attention it needed alongside the rest of the Spring Data portfolio.

This gave a nice confirmation that we were on the right track, with just a few more tweaks needed.

Hardening the rough edges #

The following weeks consisted of an iterative approach: reviewing code with Claude, creating GitHub issues and then creating PRs to review those fixes and features.

Some things added to the library that would be of use or prevent annoyance:

Renaming SolrDocument to SolrEntity to prevent a clash with SolrJ's annotations.
Adding instrumentation for Actuator, so that Micrometer can publish metrics to an observation stack like Prometheus.
Adding SolrMappingContext so that the Spring Data middleware can understand what entities, repositories and annotations a user has created in their service and do all the Spring magic to wire it up.
Removing an assumption that SolrRepository<T, ID> can have generic IDs for a SolrEntity. In reality Solr only deals in String IDs, so it became SolrRepository<T>.

After this period it was time to deploy and test it in the wild.

Building and deploying a Sample app #

People love demos and Solr loves documents, so the logical step was to flesh out a Spring demo app and get it deployed somewhere. The default for demos in Spring Boot is the Spring Pet Clinic.

The Pet Clinic app would have worked on Solr but I wanted a dataset with lots of interesting attributes. Claude suggested some book datasets from Kaggle (a website I had never come across).

A 7000 book dataset from Dylan Castilla was chosen.

A few interesting observations at this stage, when extracting, transforming and loading the data into Solr:

Claude loves Python; I do not know Python or care much to learn about it. So I forced it to use Java in a shell scripting fashion, in a way it was using JavaScript (bad joke). Eventually we settled on a mature data curator Java app rather than a script approach.

(It was interesting that it defaulted to Python though, possibly because there are so many GitHub repos and projects out there in Python and because Python is the lingua franca of data scientists. Afterwards I modified my user Claude.md to force it to use a Java approach or shell scripts, but not Python please.)

Once it had downloaded and ingested the book dataset, it by default went with a very low amount to build; 100 books. When prompting against this it then bumped it to 250. It took several prompts to get it up to 1000 books. I am not 100% sure why it went conservative in its ETL process, perhaps it didn't know what a "sensible" amount of data was without human input.

For hosting, I had run out of free AWS and GCP credits, so I wanted somewhere free or cheap. I decided to go with Railway.

It was a novel experience using Railway. However, their agentic assistant caused more harm than good, and their opinionated way of configuring ports and health checks caused a bit of frustration.

You can play with the demo app here - https://spring-data-solr-sample-book-production.up.railway.app/swagger-ui/index.html

A small fail after this deployment was realising that we had added destructive delete and create endpoints to the demo app (imagine what users could've added on that dataset...). A quick push and that was rectified.

Releasing v1.0.0 and next steps #

A few more bits of tidying up and updating docs, and the new library was ready to ship to Maven Central.

So, a newly resurrected library went from zero to Maven Central in a span of 24 days. In real dev hours, though, I would put the number at 21-23, counting checking code, writing issues, testing and playing about.

Was this vibe coded? I would say no. It was nostalgia coded with a desire to rebirth a library back into a modern Spring ecosystem. I have now integrated this starter into my original project, so you could say it was a very roundabout way to build a modular library for a personal project.

My hope though is that it will be of use to people and won't have killed too many dolphins.

A further hope is that it shows devs out there that vibe-coded slop isn't inevitable, and that we can use LLMs to take some of the maintenance load off open source projects and prevent abandonment or dev burnout.

There are still a few open issues on this project, and there will be more features to build and bugs to squash.

If you're using Solr with Spring Boot and want the Spring Data experience back, it's there.

github.com/tomaytotomato/spring-data-solr

Issues and PRs are welcome.

Final Remark:

There are a lot of emotions currently flying around open source and its clash with slop and vibe-coded PRs. We now have projects locking themselves down and rejecting any contributions; on the other hand, we have projects being completely rewritten with Claude in a laissez-faire approach.

We need a happy medium, a balance that keeps projects alive and developed without becoming alien to their original creators.

Code on.

source & further reading

tomaytotomato.com — original article RFV-0001: Request for Vibes