Most form schemas I have seen were designed for the wrong time horizon.
They were designed for the moment of submission.
A responses table that captures field values. A foreign key to a forms table. A few denormalized columns for created time, IP, and user agent. Maybe an is_test flag added later because someone needed it.
This is fine if the only thing you ever do with a response is fire a webhook and forget.
It is not fine if the team is still going to be reading those responses five years later.
This article is about how to design the record layer of a form product so it remains useful long after the form itself has been retired. I will use FORMLOVA as the working example, because it is the codebase I work in. The patterns themselves are not FORMLOVA-specific, but the concrete examples are pulled directly from FORMLOVA's response schema and from the MCP tool surface that operates on it (129 tools across 25 categories, including a dedicated response-management category whose only job is to keep the record honest over time).
Every form product has the same structural asymmetry.
forms lifetime ~ weeks to months
responses lifetime ~ years
The form is the intake surface. It changes when the campaign changes, the legal text changes, the product line shifts, the team rotates. Six months is a long life for a single form.
The responses live in the database long after the form has been deleted or archived. The team will still query them at quarter end, at compliance review, at customer success post-mortems, at year-three product reviews.
This means the response schema has to survive things the form does not.
It has to survive field renames.
It has to survive form deletions.
It has to survive ownership handoffs.
It has to survive product-line restructuring.
It has to survive your own future schema changes.
That is a much harder design problem than "store the submission."
The most common source of long-term pain is field identity that was never designed to be stable.
A response stores {"field_3": "Acme Co."}. Six months later, field_3 has been renamed to field_7 because the form was reordered. The original meaning is now lost unless you can reconstruct it from a Git history nobody reads.
Two-level identity solves this.
type FieldDescriptor = {
// Stable across the life of the response. Never recycled.
stableId: string;
// Semantic name reused across forms. e.g. "company", "consent_marketing".
semanticName: string;
// Position-only id used for current rendering.
renderId: string;
label: { default: string; locales?: Record<string, string> };
};
type ResponseValue = {
stableId: string;
semanticName: string;
value: unknown;
// Snapshot of label at submission time, so future readers can reconstruct context.
labelSnapshot: string;
};
The key idea is that the response keeps both the stable id and a snapshot of the question label as it was the day the response landed. If the team reorganizes the form a year later, the response can still tell you what was actually asked.
This costs a small amount of disk and adds no meaningful runtime overhead, in exchange for legibility that survives every future edit.
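A minimal sketch of how the snapshot might be taken at submission time. The snapshotValues helper and the sample ids are hypothetical and only illustrate the idea; they are not FORMLOVA's actual code. The point is that the label is copied into the response, not referenced from the live field.

```typescript
type FieldDescriptor = {
  stableId: string;
  semanticName: string;
  renderId: string;
  label: { default: string; locales?: Record<string, string> };
};

type ResponseValue = {
  stableId: string;
  semanticName: string;
  value: unknown;
  labelSnapshot: string;
};

// Hypothetical helper: convert a raw submission keyed by renderId
// into stored values keyed by stableId, with the label copied in.
function snapshotValues(
  fields: FieldDescriptor[],
  submitted: Record<string, unknown>,
): ResponseValue[] {
  return fields
    .filter((f) => f.renderId in submitted)
    .map((f) => ({
      stableId: f.stableId,
      semanticName: f.semanticName,
      value: submitted[f.renderId],
      labelSnapshot: f.label.default, // a copy, not a reference to the field
    }));
}

const fieldsAtSubmit: FieldDescriptor[] = [
  {
    stableId: "fld_01",
    semanticName: "company",
    renderId: "field_3",
    label: { default: "Company name" },
  },
];
const storedValues = snapshotValues(fieldsAtSubmit, { field_3: "Acme Co." });

// Later the form is reordered and relabeled. The stored value is unaffected.
fieldsAtSubmit[0].renderId = "field_7";
fieldsAtSubmit[0].label.default = "Organization";
```

After the rename, the stored value still carries fld_01 and the label "Company name", so a reader can tell what was asked without consulting the current form definition.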
FORMLOVA's 29 field types (text, textarea, number, radio, checkbox, dropdown, date, datetime, time, email, phone, url, file_upload, matrix, signature, address, rating_scale, NPS, linear_scale, slider, opinion_scale, ranking, picture_choice, yes_no, country, legal, statement, section_break, hidden_field) all share this two-level identity model. The response carries the stable id, semantic name, and label snapshot. The form definition can keep evolving without invalidating past records.
The second long-term pain point is respondent identity.
In a one-form world, each response is independent. In a multi-form world, the same person fills out many forms over years. If your schema cannot tell that they are the same person, you have a pile of independent rows.
You do not need a heavy identity system. You need a respondent resolution layer.
type RespondentLink = {
// Internal id, stable forever once issued.
respondentId: string;
// The signals used to resolve. Stored so resolution decisions are auditable.
signals: Array<{
kind: "email" | "phone" | "user_id" | "device_hash";
value: string;
confidence: number;
capturedAt: string;
}>;
// Optional consented identity from a logged-in account.
accountId?: string;
};
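This lets you answer cross-form questions, such as what one respondent has submitted across every form over the years, and which signals justified treating those submissions as the same person.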
In FORMLOVA, this is implemented as a single respondent_identifier column on each response. The value is either a normalized email address (when the form collected one) or a salted hash of IP + UserAgent (when it did not). The same person submitting two different forms a year apart resolves to the same identifier when email is present.
You can start with email-based resolution and add more signals over time. The important part is that the respondent id is stable and the resolution signals are auditable.
Bad alternative: tying respondent identity to whatever the form happened to ask. If one form collected email and another collected phone, your respondent table now has split personalities.
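A sketch of email-first resolution with a hashed fallback, under stated assumptions: the "email:" and "anon:" prefixes, the trim-and-lowercase normalization, and the SHA-256 choice are illustrative and are not taken from FORMLOVA's implementation.

```typescript
import { createHash } from "node:crypto";

type Submission = { email?: string; ip: string; userAgent: string };

// Assumption: trimming and lowercasing is enough normalization for this sketch.
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

// Returns a stable identifier: the normalized email when present,
// otherwise a salted hash of IP + user agent.
function respondentIdentifier(s: Submission, salt: string): string {
  if (s.email !== undefined && s.email.trim() !== "") {
    return `email:${normalizeEmail(s.email)}`;
  }
  const digest = createHash("sha256")
    .update(`${salt}|${s.ip}|${s.userAgent}`)
    .digest("hex");
  return `anon:${digest}`;
}

// Two forms, a year apart, different devices: same identifier via email.
const firstVisit = respondentIdentifier(
  { email: " Alex@Example.com ", ip: "203.0.113.5", userAgent: "browser-a" },
  "demo-salt",
);
const secondVisit = respondentIdentifier(
  { email: "alex@example.com", ip: "198.51.100.9", userAgent: "browser-b" },
  "demo-salt",
);
```

The hashed fallback only links submissions from the same IP and user agent, so it is much weaker than email. Keeping the prefix on the identifier makes that difference visible to anyone reading the record later.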
If a team makes operational decisions about a response, those decisions are also memory.
A response that was excluded from analysis as a sales pitch in 2026 should still carry that exclusion in 2029, with the reason and the person who decided.
A response that was tagged "urgent" by the on-call should still show that tag.
A response that was followed up on by sales should still show who replied.
The cheapest way to lose this memory is to bury decisions inside the UI's filter state. Filters are presentation, not persistence.
Decisions belong on the record.
type ResponseDecision = {
kind: "exclude" | "include" | "tag" | "assign" | "status_change";
value: string;
actor: { actorType: "human" | "agent" | "system"; id: string };
reason?: string;
decidedAt: string;
supersedes?: string;
};
type ResponseRecord = {
id: string;
formId: string;
formVersion: number;
respondentId?: string;
receivedAt: string;
values: ResponseValue[];
decisions: ResponseDecision[];
status: "new" | "in_progress" | "done" | "excluded";
spamLabel?: "legitimate" | "sales" | "suspicious";
spamLabelSource?: "auto" | "manual";
tags: string[];
ownership: { ownerId?: string; assignedAt?: string };
archive?: { archivedAt: string; reason: string };
};
The decisions array is append-only. You do not edit history, you supersede it. Five years later, you can still reconstruct who decided what, when, and why.
This is the part most form services skip, because it is invisible at launch. It is also the part that turns the response table into a record.
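A sketch of what append-only with supersede could look like in code. One assumption to flag: the decision type above has no id field, so this sketch adds one so that supersedes has something to point at. The helper names are hypothetical.

```typescript
type ResponseDecision = {
  id: string; // assumption: added here so `supersedes` can reference a decision
  kind: "exclude" | "include" | "tag" | "assign" | "status_change";
  value: string;
  actor: { actorType: "human" | "agent" | "system"; id: string };
  reason?: string;
  decidedAt: string;
  supersedes?: string;
};

// Never mutates the existing log; returns a new array with the decision appended.
function appendDecision(
  log: readonly ResponseDecision[],
  next: ResponseDecision,
): ResponseDecision[] {
  if (log.some((d) => d.id === next.id)) {
    throw new Error(`duplicate decision id: ${next.id}`);
  }
  if (next.supersedes !== undefined && !log.some((d) => d.id === next.supersedes)) {
    throw new Error(`cannot supersede unknown decision: ${next.supersedes}`);
  }
  return [...log, next];
}

// Decisions currently in force: everything that has not been superseded.
function effectiveDecisions(log: readonly ResponseDecision[]): ResponseDecision[] {
  const superseded = new Set<string>();
  for (const d of log) {
    if (d.supersedes !== undefined) superseded.add(d.supersedes);
  }
  return log.filter((d) => !superseded.has(d.id));
}

let decisionLog: ResponseDecision[] = [];
decisionLog = appendDecision(decisionLog, {
  id: "dec_1",
  kind: "exclude",
  value: "sales",
  actor: { actorType: "system", id: "spam-classifier" },
  reason: "auto label: sales",
  decidedAt: "2026-02-01T09:00:00Z",
});
decisionLog = appendDecision(decisionLog, {
  id: "dec_2",
  kind: "include",
  value: "legitimate",
  actor: { actorType: "human", id: "op_17" },
  reason: "existing customer, not a pitch",
  decidedAt: "2026-02-01T10:30:00Z",
  supersedes: "dec_1",
});
const inForce = effectiveDecisions(decisionLog);
```

The log still holds both entries, so the original exclusion and the override that replaced it can both be read back later. Only the override is in force.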
FORMLOVA implements the spam-label part of this with a server-side classifier. After submit, each response on forms with spam_filter_enabled = true is asynchronously classified into legitimate, sales, or suspicious by a lightweight OpenRouter-hosted model (about $0.0002 per response). The label and a source (auto or manual) live on the response. An operator can override the auto label, and the override is also stored as provenance, not as a destructive edit. Three years later, an analyst running a "summarize the last 36 months of inquiries excluding sales pitches" query gets the same answer every time, because the exclusions are state, not heuristics.
The audit log is the second half. Every L1 and above operation in FORMLOVA writes to an audit_logs table with cursor-based pagination. You can query, from chat or the dashboard, every status transition, every team membership change, every webhook configuration update, every workflow change. The audit log is not just for compliance; it is the trail that lets a future teammate understand what happened.
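For readers unfamiliar with cursor-based pagination, here is an in-memory sketch of the idea. The AuditLog shape, the listAuditLogs name, and the use of a monotonically increasing numeric id as the cursor are all assumptions for illustration; the real audit_logs table may differ.

```typescript
type AuditLog = { id: number; action: string; actorId: string; createdAt: string };
type AuditPage = { items: AuditLog[]; nextCursor?: number };

// Assumption: ids increase monotonically, so "id greater than cursor" is a
// stable position even when new rows are appended between page requests.
function listAuditLogs(
  all: readonly AuditLog[],
  limit: number,
  cursor?: number,
): AuditPage {
  const remaining = all
    .slice()
    .sort((a, b) => a.id - b.id)
    .filter((entry) => cursor === undefined || entry.id > cursor);
  const items = remaining.slice(0, limit);
  const hasMore = remaining.length > limit;
  return {
    items,
    nextCursor: hasMore ? items[items.length - 1].id : undefined,
  };
}

const auditRows: AuditLog[] = [1, 2, 3, 4, 5].map((n) => ({
  id: n,
  action: "status_change",
  actorId: "op_17",
  createdAt: `2026-01-0${n}T00:00:00Z`,
}));

const pageOne = listAuditLogs(auditRows, 2);
const pageTwo = listAuditLogs(auditRows, 2, pageOne.nextCursor);
const pageThree = listAuditLogs(auditRows, 2, pageTwo.nextCursor);
```

Unlike offset pagination, a cursor does not shift when rows are inserted while someone is paging, which matters for a table that only ever grows.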
When the form changes, the responses do not change with it. A response collected from Form v3 should keep its v3 context forever, even if v8 is now in production.
type FormVersion = {
formId: string;
version: number;
publishedAt: string;
retiredAt?: string;
schemaSnapshot: FieldDescriptor[];
notes?: string;
};
The responses.formVersion foreign key points at the immutable snapshot. The form table can keep evolving. The record stays legible.
This also makes form retirement safe. A form can be marked retired without endangering its responses. The schema snapshot lives with the version, not with the live form definition.
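A sketch of reading a response against the version it was collected under. The StoredResponse shape and the schemaFor helper are hypothetical; the version data is invented for the example.

```typescript
type FieldDescriptor = {
  stableId: string;
  semanticName: string;
  renderId: string;
  label: { default: string; locales?: Record<string, string> };
};

type FormVersion = {
  formId: string;
  version: number;
  publishedAt: string;
  retiredAt?: string;
  schemaSnapshot: FieldDescriptor[];
  notes?: string;
};

type StoredResponse = { id: string; formId: string; formVersion: number };

// Look up the immutable snapshot the response was collected under.
// A retired version is still a valid lookup target.
function schemaFor(
  versions: readonly FormVersion[],
  response: StoredResponse,
): FieldDescriptor[] {
  const match = versions.find(
    (v) => v.formId === response.formId && v.version === response.formVersion,
  );
  if (match === undefined) {
    throw new Error(`missing snapshot for ${response.formId} v${response.formVersion}`);
  }
  return match.schemaSnapshot;
}

const companyField = (label: string, renderId: string): FieldDescriptor => ({
  stableId: "fld_01",
  semanticName: "company",
  renderId,
  label: { default: label },
});

const formVersions: FormVersion[] = [
  {
    formId: "contact",
    version: 3,
    publishedAt: "2024-01-10T00:00:00Z",
    retiredAt: "2024-09-01T00:00:00Z",
    schemaSnapshot: [companyField("Company name", "field_3")],
  },
  {
    formId: "contact",
    version: 8,
    publishedAt: "2026-01-05T00:00:00Z",
    schemaSnapshot: [companyField("Organization", "field_7")],
  },
];

const oldResponse: StoredResponse = { id: "resp_42", formId: "contact", formVersion: 3 };
const labelAsAsked = schemaFor(formVersions, oldResponse)[0].label.default;
```

Here labelAsAsked resolves to the v3 wording even though v3 is retired and v8 is live.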
In FORMLOVA, the form versioning model is exposed to operators directly. From chat, an operator can ask "what changed between v2 and v3 of this form" and get a structured diff. The restore_form_version MCP tool is in the L3 category, meaning it requires an HMAC-signed confirmation_token (5-minute TTL) before it executes. Restoring a previous version is treated with the same care as deleting a form, because it changes what new responses will look like.
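To make the confirmation step concrete, here is one way an HMAC-signed token with a 5-minute TTL could be issued and checked. The token layout (operation, issue time, signature joined by dots), the function names, and the secret are all assumptions for this sketch; FORMLOVA's actual token format is not described in this article.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TTL_MS = 5 * 60 * 1000;

function signPayload(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Assumed layout: "<operation>.<issuedAtMs>.<hex signature>".
// Operation names must not contain "." in this sketch.
function issueToken(operation: string, secret: string, nowMs: number): string {
  const payload = `${operation}.${nowMs}`;
  return `${payload}.${signPayload(payload, secret)}`;
}

function verifyToken(
  token: string,
  operation: string,
  secret: string,
  nowMs: number,
): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [op, issuedAt, signature] = parts;

  // Compare signatures in constant time to avoid leaking where they differ.
  const given = Buffer.from(signature, "hex");
  const expected = Buffer.from(signPayload(`${op}.${issuedAt}`, secret), "hex");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return false;
  }

  // The token must be for this operation and still inside its TTL.
  const ageMs = nowMs - Number(issuedAt);
  return op === operation && ageMs >= 0 && ageMs <= TTL_MS;
}

const issuedAtMs = 1700000000000;
const confirmationToken = issueToken("restore_form_version", "demo-secret", issuedAtMs);
const acceptedInTime = verifyToken(
  confirmationToken, "restore_form_version", "demo-secret", issuedAtMs + 4 * 60 * 1000,
);
const rejectedExpired = verifyToken(
  confirmationToken, "restore_form_version", "demo-secret", issuedAtMs + 6 * 60 * 1000,
);
```

Binding the operation name into the signed payload means a token issued for one destructive action cannot be replayed against another.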
Long-lived data only stays useful if retention is intentional.
Two policies pay off later:
type RetentionPolicy = {
formId: string;
policy:
| { kind: "keep_forever" }
| { kind: "keep_for_days"; days: number; afterAction: "archive" | "delete" }
| { kind: "keep_until"; date: string; afterAction: "archive" | "delete" };
legalBasis?: string;
};
archive should mean the response leaves the live query path but stays queryable from a clearly separated archive layer.
delete should be reserved for explicit deletion (legal request, retention rules) and should also leave a tombstone so accidental queries do not silently drop counts.
The team's most painful day is the day they need to answer a question from 2024 and discover the table was silently truncated by a "data hygiene" cron job two years ago. The retention policy should never be implicit.
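A sketch of evaluating a policy explicitly, so that nothing is archived or deleted without a rule that says so. The evaluateRetention and toTombstone helpers and the Tombstone shape are hypothetical.

```typescript
type AfterAction = "archive" | "delete";

type RetentionPolicy = {
  formId: string;
  policy:
    | { kind: "keep_forever" }
    | { kind: "keep_for_days"; days: number; afterAction: AfterAction }
    | { kind: "keep_until"; date: string; afterAction: AfterAction };
  legalBasis?: string;
};

type RetentionOutcome = "keep" | AfterAction;

// Hypothetical tombstone: enough to keep counts honest after a delete.
type Tombstone = { responseId: string; formId: string; deletedAt: string; reason: string };

const DAY_MS = 24 * 60 * 60 * 1000;

// Decide what should happen to a response received at `receivedAt` as of `now`.
function evaluateRetention(
  retention: RetentionPolicy,
  receivedAt: string,
  now: Date,
): RetentionOutcome {
  const rule = retention.policy;
  switch (rule.kind) {
    case "keep_forever":
      return "keep";
    case "keep_for_days": {
      const expiresAt = Date.parse(receivedAt) + rule.days * DAY_MS;
      return now.getTime() >= expiresAt ? rule.afterAction : "keep";
    }
    case "keep_until":
      return now.getTime() >= Date.parse(rule.date) ? rule.afterAction : "keep";
  }
}

function toTombstone(responseId: string, formId: string, now: Date, reason: string): Tombstone {
  return { responseId, formId, deletedAt: now.toISOString(), reason };
}

const oneYearThenArchive: RetentionPolicy = {
  formId: "contact",
  policy: { kind: "keep_for_days", days: 365, afterAction: "archive" },
  legalBasis: "legitimate interest",
};

const outcomeEarly = evaluateRetention(
  oneYearThenArchive, "2024-01-01T00:00:00Z", new Date("2024-06-01T00:00:00Z"),
);
const outcomeLate = evaluateRetention(
  oneYearThenArchive, "2024-01-01T00:00:00Z", new Date("2026-01-01T00:00:00Z"),
);
```

Because the outcome is a pure function of the policy, the received time, and the current time, a cleanup job can log exactly which rule caused each archive or delete.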
FORMLOVA's stance here is "data belongs to the operator." Free plan, Standard plan (480 yen/month), and Premium plan (980 yen/month) all keep responses indefinitely; the operator decides if and when to delete. CSV/Excel export is available on every plan. Google Sheets sync is a Standard-plan feature, but the export route stays open at all tiers. The product does not hold the data hostage to a plan upgrade.
The query shapes that matter at year five are not the same as the ones that matter at year one.
Year one queries:
list latest 50 responses
count this week
filter by status
Year five queries:
list all responses from this respondent across forms
list responses tagged urgent across the last 36 months
list responses that were excluded and why
list responses that match a free-text search across snapshots
list responses whose owner has left the company
list responses without follow-up status set
You do not need to over-index in advance. You do need to make sure the schema makes these queries possible without an emergency migration.
Three rules help:
Tags live in a normalized table, not a JSON column, so cross-form aggregation is cheap.
Free-text fields keep their snapshot label, so search results can be presented in context.
Owner is a soft reference. When the owner leaves, the reference stays, and the system can route the response to a new owner instead of orphaning it.
FORMLOVA exposes these year-five queries through the MCP response-management category. The actual tool names map fairly directly to the question shapes above: search_responses, list_responses_by_respondent, list_response_decisions, list_archived_responses. Each tool returns the response with its full provenance: status, spam label, decision history, owner, version, and exclusion reason. An AI client can ask "what did this customer say to us across all our forms" and get the answer in one tool call.
Once you have stable respondent ids and a small shared tag taxonomy, you can build cross-form views without heroic SQL.
A respondent profile becomes a real surface:
Respondent: alex@example.com
Inquiries:
2024-03-12 contact-form unanswered
2024-09-04 webinar-signup attended
2025-02-22 feedback-survey score 3, theme: pricing_confusion
2026-01-08 contact-form owned by sales, status in_progress
This is the surface that makes the response data feel like institutional memory.
The team can answer "what does this customer think of us?" with the actual record, not from collective recollection.
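A sketch of how such a profile could be assembled once every response carries a stable respondent id. The ResponseSummary shape and the respondentTimeline helper are hypothetical, and the rows reuse the example data above plus one unrelated respondent.

```typescript
type ResponseSummary = {
  respondentId: string;
  formSlug: string;
  receivedAt: string; // ISO 8601 timestamp
  note: string;
};

// Collect one respondent's submissions across all forms, oldest first.
function respondentTimeline(
  records: readonly ResponseSummary[],
  respondentId: string,
): string[] {
  return records
    .filter((r) => r.respondentId === respondentId) // filter returns a copy, so sort is safe
    .sort((a, b) => Date.parse(a.receivedAt) - Date.parse(b.receivedAt))
    .map((r) => `${r.receivedAt.slice(0, 10)}  ${r.formSlug}  ${r.note}`);
}

const allResponses: ResponseSummary[] = [
  { respondentId: "alex@example.com", formSlug: "contact-form", receivedAt: "2026-01-08T11:00:00Z", note: "owned by sales, status in_progress" },
  { respondentId: "alex@example.com", formSlug: "webinar-signup", receivedAt: "2024-09-04T08:30:00Z", note: "attended" },
  { respondentId: "sam@example.com", formSlug: "contact-form", receivedAt: "2025-05-01T10:00:00Z", note: "done" },
  { respondentId: "alex@example.com", formSlug: "contact-form", receivedAt: "2024-03-12T09:15:00Z", note: "unanswered" },
  { respondentId: "alex@example.com", formSlug: "feedback-survey", receivedAt: "2025-02-22T16:45:00Z", note: "score 3, theme: pricing_confusion" },
];

const alexTimeline = respondentTimeline(allResponses, "alex@example.com");
```

No join across form-specific tables is needed. The whole view is a filter and a sort on one column, which is what a stable respondent id buys.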
If you have an MCP layer or an AI client connected to your form product, the record layer is also what makes the AI useful at long range.
A model can do a great summary of the last 30 days of responses without much help.
It cannot do a meaningful summary of three years of customer feedback unless the underlying record was designed to be readable across time.
Concretely, the tools you want to expose are not just get_responses(formId). They are:
get_response(responseId) -- full record with decisions and snapshot
list_responses_by_respondent(respondentId) -- cross-form
search_responses(query, range) -- text search across snapshots
list_response_decisions(responseId) -- provenance
list_archived_responses(filter) -- explicit archive access
These are operations on the record, not on the form. They are the ones that let an AI client ask interesting questions of the long tail.
FORMLOVA also exposes get_form_summary and get_live_pulse in the pulse category. These tools return the operational picture of a form (response counts, week-over-week pace, capacity hints, deadline state, recent responses, and an exclude_sales flag). They are read-only L0 tools, so they execute immediately without confirmation. The pulse tools are the AI client's way of asking "what is the operational state of this form right now," and the answers come from the same record layer that supports year-five recall.
This is the design choice that turns a response table into a record.
Common mistake: treat notification and auto-reply as fire-and-forget side effects, logged separately, with no link back to the response.
Better: the response carries the state of every side effect that touched it.
type ResponseSideEffects = {
autoReply: {
state: "not_required" | "pending" | "sent" | "failed";
attempts: number;
lastAttemptAt?: string;
suppressedReason?: "unsubscribe" | "hard_bounce";
};
notification: {
channels: Array<"email" | "slack" | "webhook">;
state: "pending" | "sent" | "failed" | "not_required";
failureReason?: string;
};
followUp: {
requiredBy?: string;
completedAt?: string;
assignedTo?: string;
};
};
Three reasons this matters at year five:
A failed auto-reply that no one knows about looks identical to a delivered auto-reply when only the enabled flag is stored. FORMLOVA explicitly distinguishes auto_reply_state = enabled from auto_reply_state = sent. The phrase "auto-reply enabled is not delivered" is one I keep close, because it is the failure mode that hurts trust most.
A Slack notification that fired does not mean the team is handling the response. The Slack channel is a fan-out; the response status is the ownership. FORMLOVA's reply_to_respondent tool automatically transitions the response status from new to in_progress after a successful send, so the record reflects ownership without anyone clicking through a dashboard.
A retroactive question like "how many auto-reply emails actually went out for this campaign in Q2 of 2024" needs the answer to be a query against the response state, not a forensic dive into 50 different webhook delivery logs.
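With the side-effect state on the record, the Q2 question reduces to a count. A sketch under assumptions: the flattened RecordWithAutoReply shape and the countAutoReplies helper are hypothetical, and the sample rows are invented.

```typescript
type AutoReplyState = "not_required" | "pending" | "sent" | "failed";

type RecordWithAutoReply = {
  id: string;
  formId: string;
  receivedAt: string;
  autoReply: { state: AutoReplyState; attempts: number; lastAttemptAt?: string };
};

// Count auto-reply outcomes for one form in the half-open range [fromIso, toIso).
function countAutoReplies(
  records: readonly RecordWithAutoReply[],
  formId: string,
  fromIso: string,
  toIso: string,
): Record<AutoReplyState, number> {
  const counts: Record<AutoReplyState, number> = {
    not_required: 0,
    pending: 0,
    sent: 0,
    failed: 0,
  };
  const from = Date.parse(fromIso);
  const to = Date.parse(toIso);
  for (const r of records) {
    const receivedMs = Date.parse(r.receivedAt);
    if (r.formId === formId && receivedMs >= from && receivedMs < to) {
      counts[r.autoReply.state] += 1;
    }
  }
  return counts;
}

const campaignRecords: RecordWithAutoReply[] = [
  { id: "r1", formId: "spring-campaign", receivedAt: "2024-04-02T10:00:00Z", autoReply: { state: "sent", attempts: 1 } },
  { id: "r2", formId: "spring-campaign", receivedAt: "2024-05-15T10:00:00Z", autoReply: { state: "failed", attempts: 3 } },
  { id: "r3", formId: "spring-campaign", receivedAt: "2024-06-30T23:59:00Z", autoReply: { state: "sent", attempts: 2 } },
  { id: "r4", formId: "spring-campaign", receivedAt: "2024-07-01T00:00:00Z", autoReply: { state: "sent", attempts: 1 } },
  { id: "r5", formId: "other-form", receivedAt: "2024-05-01T00:00:00Z", autoReply: { state: "sent", attempts: 1 } },
];

const q2Counts = countAutoReplies(
  campaignRecords, "spring-campaign", "2024-04-01T00:00:00Z", "2024-07-01T00:00:00Z",
);
```

The failed reply shows up as its own number instead of hiding behind an enabled flag, which is the distinction the first reason above is about.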
This pattern does not solve everything.
It does not solve the volume problem at scale. If your forms collect millions of responses, you will need partitioning, cold storage, and tighter retention. The pattern is compatible with all of those; it just does not solve them automatically.
It does not solve cross-tenant analytics. Each operator's records belong to that operator. Aggregating across operators is a separate consent question that does not live at the response-schema layer.
It does not solve identity at the level of a real CRM. FORMLOVA's respondent_identifier is a soft identity; it resolves the same person across FORMLOVA forms but does not stitch into Salesforce or HubSpot. The MCP layer makes that handoff possible by exposing the identity, but the actual stitching belongs in a CRM-shaped product.
It does not solve PII compliance on its own. Retention policies have to be explicit and auditable; FORMLOVA stores the legal basis with the policy, but the policy itself is the operator's responsibility.
What it does is stop the response table from quietly becoming useless three years after launch.
The schema you ship today is the schema your future self will be reading at year five.
None of this prevents you from shipping fast.
It does prevent you from ending up at year three with a graveyard of orphan rows that nobody can explain.
The form is temporary.
The record is the product.