{"slug": "build-your-own-transaction-foundation-model-for-financial-intelligence", "title": "Build Your Own Transaction Foundation Model for Financial Intelligence", "summary": "NVIDIA released a developer example for building a transaction foundation model using accelerated computing, demonstrating a roughly 42% relative lift in fraud detection Average Precision over an XGBoost baseline. The model, pre-trained on unlabeled transaction sequences, can be applied to tasks including fraud detection, credit scoring, and lifetime value prediction.", "body_md": "Every swipe, transfer, and payment on a modern financial network encodes a pattern of human behavior. Transaction data is one of the richest signals an enterprise owns. Yet most production use cases for such tabular data still depend on hand-engineered features and rule sets that are brittle, expensive to maintain, and blind to the sequential structure inside a customer history.\n\nFoundation models, pre-trained on large volumes of unlabeled transaction sequences, change this equation by producing general-purpose representations of financial behavior that transfer across a wide array of downstream tasks. A single backbone covers fraud detection, credit scoring, lifetime value prediction, segmentation, personalized recommendations, recurrent-transaction detection, and more.\n\nThe industry signal is strong and accelerating. Innovative financial firms are training transformer-based models on billions of transactions, reporting double-digit relative lifts on production-scale tasks while simultaneously streamlining operations. 
See Stripe’s [payments foundation model](https://stripe.com/us/newsroom/news/sessions-2025), Nubank’s [NuFormer](https://arxiv.org/abs/2507.23267), Visa’s [TransactionGPT](https://arxiv.org/abs/2511.08939), Mastercard’s [large tabular model](https://www.mastercard.com/global/en/news-and-trends/stories/2026/mastercard-new-generative-ai-model.html), Revolut’s [PRAGMA](https://arxiv.org/abs/2604.08649), Plaid’s [transaction foundation model](https://plaid.com/blog/building-transaction-foundation-model-intelligent-finance/), and more.\n\nThe NVIDIA [Build Your Own Transaction Model developer example](https://build.nvidia.com/nvidia/build-your-own-transaction-foundation-model) walks through how to build a transaction foundation model end-to-end using accelerated computing.\n\nYou will progress through five steps in this workflow:\n\n- GPU-accelerated data processing with the NVIDIA CUDA-X library [cuDF](https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf)\n- Custom tokenization with the NVIDIA CUDA-X libraries [cuDF](https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf) and [cuML](https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cuml)\n- Transformer decoder model pretraining from scratch with the [NVIDIA NeMo AutoModel](https://docs.nvidia.com/nemo/automodel/latest/index.html) open library, part of the [NVIDIA NeMo framework](https://github.com/NVIDIA-NeMo/)\n- Extracting learned embeddings\n- Augmenting a downstream fraud classifier with embeddings\n\nBy the end, you will reproduce a roughly 42% relative lift in Average Precision (AP), the area under the precision-recall curve, which captures how well the model ranks fraud across all operating thresholds, over a strong [XGBoost](https://www.nvidia.com/en-us/glossary/xgboost/) baseline on the [IBM TabFormer](https://github.com/IBM/TabFormer) fraud dataset. 
Figure 1, below, shows the end-to-end pipeline.\n\n## Why transformers fit transaction histories\n\nLarge language models learn from sequences of words. During pretraining, a model sees text and learns that words, phrases, and sentences carry meaning through order and context. A transaction foundation model applies the same principle to financial behavior. A sequence such as “paycheck deposit, grocery purchase, transit fare, recurring subscription, card-present restaurant payment” carries information that no single transaction row can express alone.\n\nTransformers are well suited to this structure because self-attention can connect events that sit far apart in history. A fraudulent transaction may only look suspicious when paired with a recent travel pattern or a sudden burst of small authorizations. Traditional tabular features can approximate these patterns, but engineers must decide which windows, aggregates, and rules to build up front. A pretrained transformer learns those relationships directly from the sequence.\n\nThis approach complements other NVIDIA financial AI workflows, including the [NVIDIA AI Blueprint for financial fraud detection](https://developer.nvidia.com/blog/supercharging-fraud-detection-in-financial-services-with-graph-neural-networks/) using graph neural networks (GNNs). GNNs capture relationships across connected entities such as accounts, merchants, devices, and transactions. Transaction foundation models focus on behavioral histories within a customer or account sequence. 
In practice, both methods produce rich embeddings with complementary information that pair naturally.\n\n## Load the data and set a baseline\n\nNotebook `01_dataset_baseline.ipynb` loads the [IBM TabFormer dataset](https://github.com/IBM/TabFormer), roughly 24.4M synthetic card transactions with a ~0.12% fraud rate, directly into GPU memory with [cuDF](https://developer.nvidia.com/topics/ai/data-science/cuda-x-data-science-libraries/cudf).\n\nThe dataset is split temporally by cumulative transaction count: the first 80% of transactions by date are used for training, the next 10% for validation, and the final 10% for testing. The splits therefore occupy disjoint, ordered time windows, which prevents data leakage and reflects real-world production environments.\n\nWith the splits in place, the notebook trains an XGBoost classifier with native GPU acceleration (`tree_method=\"hist\"` and `device=\"cuda\"`) on a 1M-row balanced training sample. Evaluation runs on a 100k stratified holdout that preserves the realistic ~0.1% fraud prevalence.\n\nThe baseline numbers set the bar for the rest of the tutorial:\n\n- Test ROC-AUC: 0.9885\n- Test AP: 0.1238\n\nPay attention to AP rather than ROC-AUC. With a fraud prevalence of ~0.1%, ROC-AUC saturates quickly and hides meaningful differences among the highest-scoring transactions. AP summarizes precision across the full range of recall and responds to improvements where they matter operationally. Every subsequent model in this tutorial is judged by AP first.\n\n## Tokenize transactions on the GPU\n\nGeneral-purpose LLM tokenizers waste capacity on tabular financial data. For example, a byte pair encoding (BPE) tokenizer splits a single transaction into roughly 39 subword tokens, most of which encode commas and dollar signs rather than behavior. 
Notebook `02_seq_preproc_tokenization.ipynb` introduces a custom domain tokenizer that converts each transaction into roughly 12 semantic tokens with a much smaller vocabulary (6,251 symbols vs. 50,257 from BPE).\n\nBeyond higher information density per token, this efficiency fits more than 3x as many transactions into a fixed token budget. Practically speaking, a model with a context window of 4,092 can fit a history of ~315 transactions from the domain tokenizer and only ~102 transactions from a BPE tokenizer.\n\nFigure 2, below, compares token counts per transaction between the two tokenization methods on the same records.\n\nThe domain tokenizer is implemented in [src/tokenizer/financial_pipeline.py](https://github.com/NVIDIA-AI-Blueprints/transaction-foundation-model/blob/main/src/tokenizer/financial_pipeline.py). This flexible pipeline handles amount binning, merchant hashing, hour-of-day, day-of-week, month, card identity, chip type, ZIP3 and state, and customer identity. Every step runs on the GPU through cuDF.\n\nThe tokenizer can be readily adapted to different transaction schemas by adding or replacing individual steps in the modular pipeline. Each step implements a small [BaseTokenizer](https://github.com/NVIDIA-AI-Blueprints/transaction-foundation-model/blob/main/src/tokenizer/base.py) interface, so extending coverage to new fields such as device ID or beneficiary country takes just a short subclass.\n\n## Pretrain with NeMo AutoModel\n\nNeMo AutoModel is a PyTorch-native open-source training library under the NVIDIA NeMo Framework, designed to streamline and scale training and fine-tuning for LLMs and VLMs.\n\nNotebook `03_foundation_model_training.ipynb` pretrains a decoder-only foundation model on the tokenized corpus using causal language modeling. The objective is simple (predict the next token given every previous token), but the supervision signal is dense. 
Every position in a sequence contributes a gradient, so a single packed transaction sequence yields thousands of next-token predictions.\n\nThe model is a compact Llama decoder defined in [configs/pretrain_financial_decoder.yaml](https://github.com/NVIDIA-AI-Blueprints/transaction-foundation-model/blob/main/configs/pretrain_financial_decoder.yaml):\n\n- ~29M parameters\n- Hidden size 512, 8 transformer layers\n- Grouped-Query Attention with 8 query heads and 2 KV heads\n- 8,192-token RoPE context window\n- SwiGLU activation, RMSNorm, domain vocabulary of 6,251 tokens\n\nNeMo AutoModel handles the rest of the stack. Kick off a single-GPU sanity run:\n\n```\npython scripts/train_decoder_model.py \\\n  --config configs/pretrain_financial_decoder.yaml \\\n  --step_scheduler.max_steps 30\n```\n\nThe 30-step demo drops training loss from `ln(6251)≈8.74` (the random-guess baseline for this vocabulary) to around 6.0. To scale the same run to eight GPUs, launch the script with `torchrun --nproc-per-node=8` in place of `python`, with no changes to the script or distributed boilerplate required. Multi-node scaling is straightforward as well. NeMo AutoModel wires up FSDP2 sharding, mixed precision, gradient accumulation, and checkpoint consolidation from the YAML.\n\nCheckpoints land as standard `safetensors` files, which means the trained backbone loads with a one-liner anywhere HuggingFace Transformers is installed:\n\n``` python\nfrom transformers import AutoModelForCausalLM\n\nmodel = AutoModelForCausalLM.from_pretrained(\"models/decoder-foundation-model\")\n```\n\nThe repository ships a full checkpoint trained for 3,000 steps, which Notebooks 04 and 05 load; the 30-step run is for demonstration and validation purposes.\n\nTo swap architectures, edit `model._target_` and `model.config._target_` in the YAML. 
Any HuggingFace-compatible decoder is designed to drop in without training-code changes.\n\n## Extract embeddings at scale\n\nNotebook `04_inference_embedding_extraction.ipynb` turns the pretrained backbone into a feature extractor. It loads the checkpoint with `AutoModelForCausalLM`, requests `output_hidden_states=True`, and pools the final hidden layer down to a 512-dim vector per user history.\n\nFor decoder-only models with causal attention, only the final position has observed the entire sequence, while earlier positions are blind to later tokens. Last-token pooling therefore picks the most informative location in the sequence. The implementation in [src/decoder_inference.py](https://github.com/NVIDIA-AI-Blueprints/transaction-foundation-model/blob/main/src/decoder_inference.py) uses the attention mask to find the last non-pad token per row and gathers its hidden state.\n\nThe extraction loop is a single call:\n\n```\nembeddings = inference.extract_embeddings_batched(\n    padded_ids, batch_size=1024, show_progress=True\n)\n```\n\nThe notebook extracts and saves train, validation, and test embeddings as `.npy` files. It also saves a `metadata.json` describing shapes and row alignment, which Notebook 05 later uses to join embeddings back to the associated raw tabular features.\n\nFigure 3, below, shows a 3D UMAP projection of 50k validation embeddings, colored by merchant industry category and ZIP code. Visible clusters in each field indicate that the backbone has learned semantically coherent representations without ever seeing any target labels during pretraining.\n\n*Figure 3. 3D UMAP projection of 50,000 validation-set transaction embeddings. 
Points colored by merchant industry and user ZIP code each show clear behavioral clusters in the learned representation space*\n\n## Measure lift on a downstream task\n\nNotebook `05_xgboost_fraud_detection.ipynb` answers the billion-dollar question: Can transaction foundation model embeddings move downstream metrics?\n\nIt trains three GPU XGBoost classifiers and evaluates all of them on the same 100k stratified test set:\n\n- Raw: 13 hand-engineered tabular features (the baseline from Step 1)\n- Embeddings: 512-dim foundation-model vectors compressed to 64d with PCA (~78% variance retained)\n- Combined: raw features concatenated with the 64d embeddings, 77d total\n\nTable 1, below, summarizes the test results.\n\n| Model | Features | Test ROC-AUC | Test AP |\n| --- | --- | --- | --- |\n| Raw (baseline) | 13 | 0.9885 | 0.1238 |\n| Embeddings only | 64 | 0.8775 | 0.0123 |\n| Combined | 77 | 0.9925 | 0.1755 |\n\n*Table 1. Downstream fraud-detection results on the TabFormer temporal test split. The combined model delivers a +0.41% ROC-AUC lift and a +41.76% AP lift over the raw-feature baseline*\n\nThe combined model lifts ROC-AUC by 0.41% and AP by 41.76% over the baseline. That AP delta is the operational win: a review team with fixed daily capacity catches materially more fraud at the same workload.\n\nEmbeddings encode the user’s transaction history and provide predictive power, but underperform the baseline when used as the only features. The combined model leverages event-level information from the raw tabular row and sequence-level historical context from embeddings that were learned during pretraining. Figure 4, below, shows the comparison visually.\n\n*Figure 4. Side-by-side comparison of test ROC-AUC and test AP for the three downstream models. 
The combined model (raw features + foundation-model embeddings) wins on both metrics*\n\n## Customize the developer example\n\nThe repository is structured so that each component is swappable independently:\n\n- **Tokenizer:** Adapt the pipeline in [src/tokenizer/](https://github.com/NVIDIA-AI-Blueprints/transaction-foundation-model/tree/main/src/tokenizer) to any transaction schema by adding or replacing steps. Each step is a small subclass of `BaseTokenizer`, so supporting new fields such as device fingerprint, beneficiary country, and merchant country is a short addition.\n- **Model architecture:** Edit `model._target_` and `model.config._target_` in the training YAML to point at any HuggingFace-compatible decoder. The rest of the NeMo training pipeline (data loader, FSDP2, checkpointing, evaluation) stays put.\n- **Downstream task:** Replace XGBoost with any model that consumes fixed-length feature vectors. Churn prediction, customer segmentation, lifetime value regression, next-best-action ranking, and credit scoring all fit the same embedding-plus-head pattern.\n\nThe developer example is also designed to extend to labels other than fraud. Swap `Is Fraud?` in Step 5, above, for any event label that aligns with the user histories encoded by the backbone.\n\n## Get started\n\nYou now have a reference path from raw transaction logs to a pretrained foundation model that augments a downstream classifier, accelerated end-to-end with NVIDIA. 
The three components (a custom tokenizer, a transformer decoder backbone, and an embedding-driven XGBoost head) together deliver a roughly 42% relative AP lift over a strong industry-standard baseline on the TabFormer fraud benchmark.\n\nVisit [build.nvidia.com](https://build.nvidia.com/nvidia/build-your-own-transaction-foundation-model) to deploy the notebooks in a GPU-accelerated environment via [NVIDIA Launchable](https://brev.nvidia.com/launchable/deploy?launchableID=env-3CBNhPVekCGLYa412iKiXGqMwVJ) or in your own environment via the [GitHub repository](https://github.com/NVIDIA-AI-Blueprints/transaction-foundation-model).", "url": "https://wpnews.pro/news/build-your-own-transaction-foundation-model-for-financial-intelligence", "canonical_source": "https://developer.nvidia.com/blog/build-your-own-transaction-foundation-model-for-financial-intelligence/", "published_at": "2026-06-16 20:30:08+00:00", "updated_at": "2026-06-16 20:51:28.513448+00:00", "lang": "en", "topics": ["machine-learning", "large-language-models", "ai-infrastructure", "ai-products", "developer-tools"], "entities": ["NVIDIA", "Stripe", "Nubank", "Visa", "Mastercard", "Revolut", "Plaid", "IBM"], "alternates": {"html": "https://wpnews.pro/news/build-your-own-transaction-foundation-model-for-financial-intelligence", "markdown": "https://wpnews.pro/news/build-your-own-transaction-foundation-model-for-financial-intelligence.md", "text": "https://wpnews.pro/news/build-your-own-transaction-foundation-model-for-financial-intelligence.txt", "jsonld": "https://wpnews.pro/news/build-your-own-transaction-foundation-model-for-financial-intelligence.jsonld"}}