{"slug": "ancestry-has-spent-decades-digitizing-family-records-ai-is-helping-speed-it-up", "title": "Ancestry has spent decades digitizing family records. AI is helping speed it up.", "summary": "Ancestry has spent 42 years digitizing 71 billion family records from 88 countries, a process that was slow and costly. Since 2014, the company has used AI, including machine learning and natural language processing models like BERT, to accelerate digitization and develop new tools such as facial and handwriting recognition. The advent of ChatGPT in 2022 further advanced Ancestry's AI capabilities, reducing the time and cost of expanding its global record collection.", "body_md": "The real-world downstream effects of AI adoption and implementation.\n\nOver the past 42 years, Ancestry has collected more than 71 billion birth certificates, marriage licenses, and other family records from 88 countries to build 148 million family trees.\n\nFor most of the genealogy company's history, collecting, labeling, and [organizing that data](https://www.businessinsider.com/how-healthcare-companies-can-prepare-data-for-ai-assisted-management-2025-6) was a painstakingly slow process. Ancestry's employees and third-party vendors would spend months manually inputting data and transcribing international family records. 
International expansion began in 2001 when the company launched a website in the UK, but adding markets came at a high cost, said [Sriram Thiagarajan](https://www.businessinsider.com/how-companies-are-using-ai-generative-ai-daily-tasks-automation-2024-1), Ancestry's chief technology officer since November 2020.\n\n\"The cost for the time taken to digitize this rich content from around the world was the limiting factor for us,\" Thiagarajan told Business Insider.\n\nMost of Ancestry's AI efforts have been led by Thiagarajan, who joined the company as chief information officer in September 2017. His expanded role came a month before the investment firm Blackstone closed on its $4.7 billion acquisition of Ancestry.\n\nSince then, Ancestry's investments in machine learning and artificial intelligence, along with advancements in traditional and generative AI, have [accelerated the digitization process](https://www.businessinsider.com/how-ancestry-uses-ai-records-processing-2025-8), Thiagarajan said. AI has also made way for new user tools, including AI-powered facial and handwriting recognition technologies, he added.\n\n**Training AI models**\n\nBack in 2003, Jackson Reese, a data scientist and software research engineer, was recruited by a friend to join Ancestry as its head of digital imaging in preservation services. At the time, Ancestry had a one-person imaging department digitizing census data, birth and death records, immigration forms, and other historical records, Reese told Business Insider. 
He was initially hired to bring the company's imaging operations in-house, and he also expanded its digital imaging team to more than 70 employees within three years.\n\nReese told Business Insider that the expanded team worked with now-outdated technologies such as microfilm scanners, which convert government archives and newspaper clippings into digital files.\n\nBeginning in 2014, Ancestry leaned into early AI projects focused on developing its own proprietary machine learning models and computer vision systems that could read paper documents, Reese said. This early work continued through 2016.\n\nHis team then worked with BERT, a family of natural language processing models that Google released in October 2018, to build more precise data extraction tools. Previously, when the Ancestry team received millions of new birth records, domain experts would review the documents, then pass them to indexers, who transcribed and labeled them, said Reese.\n\nAncestry would then train its proprietary AI models on this data, with the hope that, after a few rounds of back-and-forth between domain experts, indexers, and data scientists, the AI model's accuracy would be north of 90%.\n\n\"That was the best-case scenario; sometimes we'd be at eight, 10, or a dozen iterations to really get a model dialed in,\" Reese said.\n\nBy 2019, Ancestry incorporated BERT-based models to more quickly process its obituary collection and other record-extraction initiatives. 
The company also kept employees in the loop to continue to validate the training data and ensure the models were effectively handling the records, said Reese.\n\n**The ChatGPT tipping point**\n\nThiagarajan said that the advent of ChatGPT in November 2022 was another \"tipping point in terms of figuring out the art of possibilities.\" New large language models from OpenAI, Anthropic, and other AI hyperscalers opened up the possibility of speeding up the digitization of unstructured data, including user-generated images, scanned documents, and written stories, he said.\n\nNow that AI is capable of performing record extraction more quickly and accurately, Ancestry can ingest birth records and other data, apply a [blend of proprietary models](https://www.businessinsider.com/ancestry-ai-models-cto-sriram-thiagarajan-2025-9) from OpenAI, Google, and Anthropic, as well as open-source AI models, and \"fine-tune them a little bit for our use cases,\" Reese said. He added that the company can process nearly 200 different languages with very little iterative model training.\n\nBy September 2023, Ancestry was using LLMs for user-facing features too, said Thiagarajan. Face Match, an [AI-powered facial recognition tool](https://www.businessinsider.com/ai-training-beyond-facial-recognition-gait-detection-heartbeat-sensors-2019-10) that helps users identify people in their family photos, debuted in July 2024.\n\nIn April 2025, the company unveiled a document transcription feature that lets customers upload scans as JPG or PNG files and [generate transcriptions](https://www.businessinsider.com/ai-notetaker-work-meetings-tech-save-time-boost-efficiency-2024-12) of their families' handwritten notes. 
Ancestry's AI Stories, launched in December 2025, allows customers to click on an ancestor's page within the company's database and hear a narrated audio story of their life, read by AI.\n\n**The outcome**\n\nBy the end of 2025, more than 50% of Ancestry's historical records published on the website were generated using AI, said Thiagarajan. AI has helped triple the rate of content growth, from 800 million records in 2021 to 5.2 billion new records in 2022 and 18.6 billion the following year, according to the company.\n\nAncestry has also continued to launch external AI use cases, including in June 2026, when it added [language translation](https://www.businessinsider.com/reddit-ai-translation-language-2024-9) to its customer-facing document transcription tool, Thiagarajan said.", "url": "https://wpnews.pro/news/ancestry-has-spent-decades-digitizing-family-records-ai-is-helping-speed-it-up", "canonical_source": "https://www.businessinsider.com/ancestry-ai-llm-record-transcription-digitization-facial-recognition-2026-6", "published_at": "2026-06-30 15:56:02+00:00", "updated_at": "2026-06-30 16:31:13.915270+00:00", "lang": "en", "topics": ["machine-learning", "computer-vision", "natural-language-processing", "ai-products", "ai-tools"], "entities": ["Ancestry", "Sriram Thiagarajan", "Blackstone", "Jackson Reese", "Google", "OpenAI", "ChatGPT", "BERT"], "alternates": {"html": "https://wpnews.pro/news/ancestry-has-spent-decades-digitizing-family-records-ai-is-helping-speed-it-up", "markdown": "https://wpnews.pro/news/ancestry-has-spent-decades-digitizing-family-records-ai-is-helping-speed-it-up.md", "text": "https://wpnews.pro/news/ancestry-has-spent-decades-digitizing-family-records-ai-is-helping-speed-it-up.txt", "jsonld": "https://wpnews.pro/news/ancestry-has-spent-decades-digitizing-family-records-ai-is-helping-speed-it-up.jsonld"}}