# Ancestry has spent decades digitizing family records. AI is helping speed it up.

> Source: <https://www.businessinsider.com/ancestry-ai-llm-record-transcription-digitization-facial-recognition-2026-6>
> Published: 2026-06-30 15:56:02+00:00


Over the past 42 years, Ancestry has collected more than 71 billion birth certificates, marriage licenses, and other family records from 88 countries to build 148 million family trees.

For most of the genealogy company's history, collecting, labeling, and [organizing that data](https://www.businessinsider.com/how-healthcare-companies-can-prepare-data-for-ai-assisted-management-2025-6) was a painstakingly slow process. Ancestry's employees and third-party vendors would spend months manually inputting data and transcribing international family records. International expansion began in 2001, when the company launched a website in the UK, but adding markets came at a high cost, said [Sriram Thiagarajan](https://www.businessinsider.com/how-companies-are-using-ai-generative-ai-daily-tasks-automation-2024-1), Ancestry's chief technology officer since November 2020.

"The cost for the time taken to digitize this rich content from around the world was the limiting factor for us," Thiagarajan told Business Insider.

Most of Ancestry's AI efforts have been led by Thiagarajan, who joined the company as chief information officer in September 2017. His expanded role came a month before the investment firm Blackstone closed on its $4.7 billion acquisition of Ancestry.

Since then, Ancestry's investments in machine learning and artificial intelligence, along with advancements in traditional and generative AI, have [accelerated the digitization process](https://www.businessinsider.com/how-ancestry-uses-ai-records-processing-2025-8), Thiagarajan said. AI has also made way for new user tools, including AI-powered facial and handwriting recognition technologies, he added.

## Training AI models

Back in 2003, Jackson Reese, a data scientist and software research engineer, was recruited by a friend to join Ancestry as its head of digital imaging in preservation services. At the time, Ancestry had a one-person imaging department digitizing census data, birth and death records, immigration forms, and other historical records, Reese told Business Insider. He was initially hired to bring the company's imaging operations in-house, and within three years he had expanded the digital imaging team to more than 70 employees.

Reese told Business Insider that the expanded team worked with now-outdated technologies such as microfilm scanners, which convert government archives and newspaper clippings into digital files.

Beginning in 2014, Ancestry leaned into early AI projects, developing its own proprietary machine learning models and computer vision systems that could read paper documents, Reese said. This early work continued through 2016.

His team then worked with BERT, a family of natural language processing models that Google released in October 2018, to build more precise data extraction tools. Previously, when the Ancestry team received millions of new birth records, domain experts would review the documents, then pass them to indexers, who transcribed and labeled them, said Reese.

Ancestry would then train its proprietary AI models on this data, with the hope that, after a few rounds of back-and-forth between domain experts, indexers, and data scientists, the AI model's accuracy would be north of 90%.

"That was the best-case scenario; sometimes we'd be at eight, 10, or a dozen iterations to really get a model dialed in," Reese said.

By 2019, Ancestry had incorporated BERT-based models to more quickly process its obituary collection and other record-extraction initiatives. The company also kept employees in the loop to validate the training data and ensure the models were handling the records effectively, said Reese.

## The ChatGPT tipping point

Thiagarajan said that the advent of ChatGPT in November 2022 was another "tipping point in terms of figuring out the art of possibilities." New large language models from OpenAI, Anthropic, and other AI hyperscalers opened up the possibility of speeding up the digitization of unstructured data, including user-generated images, scanned documents, and written stories, he said.

Now that AI is capable of performing record extraction more quickly and accurately, Ancestry can ingest birth records and other data, apply a [blend of proprietary models](https://www.businessinsider.com/ancestry-ai-models-cto-sriram-thiagarajan-2025-9) from OpenAI, Google, and Anthropic as well as open-source AI models, and "fine-tune them a little bit for our use cases," Reese said. He added that the company can process nearly 200 different languages with very little iterative model training.

By September 2023, Ancestry was using LLMs for user-facing features too, said Thiagarajan. Face Match, an [AI-powered facial recognition tool](https://www.businessinsider.com/ai-training-beyond-facial-recognition-gait-detection-heartbeat-sensors-2019-10) that helps users identify people in their family photos, debuted in July 2024.

In April 2025, the company unveiled a document transcription feature that lets customers upload scans as JPG and PNG files and [generate transcriptions](https://www.businessinsider.com/ai-notetaker-work-meetings-tech-save-time-boost-efficiency-2024-12) of their families' handwritten notes. Ancestry's AI Stories, launched in December 2025, allows customers to click on an ancestor's page within the company's database and hear a narrated audio story of their life, read by AI.

## The outcome

By the end of 2025, more than 50% of Ancestry's historical records published on the website were generated using AI, said Thiagarajan. AI has helped triple the rate of content growth, from 800 million records in 2021 to 5.2 billion new records in 2022 and 18.6 billion the following year, according to the company.
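
For readers who want to check the growth figures, the year-over-year multiples can be worked out directly. The short Python sketch below uses only the three counts the company reported. It assumes all three are annual additions of new records and that "the following year" means 2023. The names `records_added` and `growth_multiples` are illustrative and do not come from Ancestry.

```python
# Record counts by year as reported by the company in this article.
# Assumption: each figure is the number of new records added that year.
records_added = {
    2021: 800_000_000,
    2022: 5_200_000_000,
    2023: 18_600_000_000,
}


def growth_multiples(counts):
    """Return {year: that year's count divided by the previous year's count}."""
    years = sorted(counts)
    return {
        later: counts[later] / counts[earlier]
        for earlier, later in zip(years, years[1:])
    }


multiples = growth_multiples(records_added)
for year, ratio in multiples.items():
    print(f"{year}: {ratio:.1f}x the previous year")
```

On these figures, the multiple is 6.5 for 2022 and about 3.6 for 2023, so "triple" is best read as a rough characterization of the trend, not as an exact ratio.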

Ancestry has also continued to roll out customer-facing AI features, including in June 2026, when it added [language translation](https://www.businessinsider.com/reddit-ai-translation-language-2024-9) to its document transcription tool, Thiagarajan said.
