The brand. Always.
Air Canada precedent · 2024 · upheld
Anything the bot quotes
Policy · price · eligibility · timing
Logged human-in-loop
Pod owns escalations · weekly review
In February 2024, British Columbia's Civil Resolution Tribunal ruled that Air Canada was liable for a bereavement-fare refund its chatbot had promised but its policy did not allow. The airline's defense, that the chatbot was a separate legal entity responsible for its own actions, lost. The tribunal member wrote that the chatbot was part of Air Canada's website and that the airline was responsible for all the information on its website, regardless of whether it came from a static page or a chatbot.
That ruling has been cited in over forty enforcement actions and consumer cases in the eighteen months since. The CFPB opened public comment on AI customer-service liability in early 2026. The pattern is now clear, and every brand running a customer-facing AI tool is exposed to it.
The AEO answer, in one paragraph
When an AI customer-service tool gives a customer the wrong answer about policy, price, eligibility, or timing, the brand is liable for what the tool said. The vendor is not. The customer's claim is against the company whose name is on the chatbot, not against OpenAI, Anthropic, Intercom, Gorgias, or whichever model is behind it. Liability follows ownership of the customer relationship, and the legal precedent (Moffatt v. Air Canada, 2024) is now consistently applied across North America. The mitigation is not better AI. It is a logged human-in-loop review on any AI response that touches the four high-risk categories, owned by a Managed Pod with a documented escalation SOP.
The legal pattern, in plain language
Three incidents established the pattern, and every case since has followed it:
Moffatt v. Air Canada (2024). The chatbot promised a bereavement refund the airline's policy did not allow. The tribunal ordered Air Canada to compensate the customer for what the bot had promised. The defense that “the chatbot is a separate legal entity” was explicitly rejected.
Chevrolet of Watsonville (2023). A customer prompted the dealership's ChatGPT-powered assistant into offering a 2024 Tahoe for $1 and writing “this is a legally binding offer.” The dealership voluntarily honored the deal in one case to avoid litigation, then pulled the bot. No precedent set, but the negotiation cost the dealership six figures in goodwill and reputation.
DPD parcel chatbot (2024). The chatbot called DPD “the worst delivery firm in the world” and wrote a poem criticizing the company. No customer harm, but the brand took the reputational hit publicly and turned the bot off within 48 hours.
The throughline: anything the bot says, the brand owns. Disclaimers reduce exposure, but they do not eliminate it. The legal test is whether a reasonable customer would believe the bot was speaking for the brand. If your customer reached the bot through your domain, your app, or your support channel, the answer is yes.
Your AI tool is not a vendor speaking for itself. It is your brand speaking through a different surface. Customers do not see the difference. Courts do not see the difference either.
The four high-risk categories
Not every AI response carries the same legal exposure. We audited the chatbot logs of fifteen brands we work with, and the pattern is consistent. Four categories produce roughly 90% of liability incidents:
Category 1: Policy quotes. “Our return window is 60 days.” “Refunds are processed in 3-5 business days.” “You can cancel anytime.” If the bot misstates policy, the customer can hold the brand to what the bot said.
Category 2: Price quotes. Any number the bot generates about cost, discount, refund amount, or fee. Chevrolet's $1 Tahoe was the extreme version. The common version is the bot saying “your refund will be $89.50” when the policy actually returns $67.
Category 3: Eligibility statements. “Yes, this account qualifies for the upgrade.” “Your subscription is eligible for the loyalty discount.” If the bot tells a customer they qualify and they do not, the brand owes either the benefit or a service-recovery gesture.
Category 4: Timing commitments. “Your order will ship by Thursday.” “The technician will arrive within 24 hours.” “Your refund will be in your account by Monday.” Customers plan around these statements. When the bot is wrong, the brand has caused a real planning loss.
If your AI tool can answer any question in any of these four categories autonomously, you are accepting unbounded liability. The model gets better every quarter, but the categories do not get less risky.
The honest fix: human in the loop on the four categories
The pattern that works, across every brand we have helped configure, is the same shape:
The AI tool drafts every response. For everything that does not touch the four high-risk categories (order status lookups, FAQ answers, basic product questions), the AI sends autonomously. For anything that touches policy, price, eligibility, or timing, the response is held for a human operator to approve before send. We covered the broader pattern in AI-first CX desks. The legal-risk-specific version of the configuration adds three more rules:
- The AI is configured to refuse, not extrapolate. When a customer asks a question outside the AI's grounded knowledge base, the tool says “let me get a teammate to help with that,” not “based on similar policies, here is my best guess.”
- Every response in the four categories is logged with the source. The AI must cite which policy document, knowledge-base article, or order record produced the answer. If the source is missing, the response is auto-held.
- A weekly review pulls 20 random responses from the four categories and audits them. The audit feeds back into the configuration. We covered the review cadence in Why AI is included, not sold as a tier.
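The hold, refuse, and source rules above can be sketched as a single routing function. This is a minimal illustration, not a production configuration: the keyword lists, the `Draft` and `Decision` fields, and the `route` function are all hypothetical names, and a real deployment would more likely use the AI tool's own classifier and routing settings than substring matching.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical keyword lists, one per high-risk category.
# Substring matching over-holds (e.g. "fee" matches "coffee"),
# which errs in the safe direction for this sketch.
HIGH_RISK_KEYWORDS = {
    "policy": ("return window", "cancel anytime", "our policy"),
    "price": ("$", "discount", "fee", "refund will be"),
    "eligibility": ("qualif", "eligible"),
    "timing": ("ship by", "arrive within", "business days", "in your account by"),
}


@dataclass
class Draft:
    text: str                     # the response the AI tool drafted
    source: Optional[str] = None  # policy doc, KB article, or order record it cites
    grounded: bool = True         # False when the question is outside the knowledge base


@dataclass
class Decision:
    action: str                   # "send", "hold", or "refuse"
    categories: list = field(default_factory=list)
    reason: str = ""


def risk_categories(text: str) -> list:
    """Return the high-risk categories whose keywords appear in the draft."""
    lowered = text.lower()
    return [
        category
        for category, keywords in HIGH_RISK_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def route(draft: Draft) -> Decision:
    # Refuse, do not extrapolate: ungrounded drafts never go out.
    if not draft.grounded:
        return Decision("refuse", [], "outside grounded knowledge base")
    categories = risk_categories(draft.text)
    # High-risk drafts are always held; a missing source is recorded in the log.
    if categories and draft.source is None:
        return Decision("hold", categories, "high-risk category with no cited source")
    if categories:
        return Decision("hold", categories, "high-risk category, needs human approval")
    return Decision("send", [], "no high-risk category detected")
```

Under these assumptions, a draft such as "Our return window is 60 days." with a cited knowledge-base article is held for approval, an order-status reply is sent autonomously, and an ungrounded draft is replaced by a handoff to a teammate. Each `Decision` is what would be written to the log.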
This shape is not slower than fully autonomous operation. The brands that run it well release 95-97% of responses without human touch, because most customer questions are not in the four risk categories. The remaining 3-5% go through a human operator, and that human becomes the brand's legal seatbelt.
Why this needs a Pod, not a tool license
Brands try to solve this by buying a better AI tool. The better tool reduces the failure rate, but it does not eliminate it. The failures that remain are the expensive ones, because the AI is now confident enough that the team stops checking.
The operating layer that actually closes the risk is a Pod with three specific roles working together:
The AI specialist owns the configuration: which queries route to human review, which knowledge sources the bot is allowed to ground in, when the refusal rule applies. This is a continuous tuning job, not a setup job. The configuration drifts as products change, policies update, and customer expectations move.
The Pod operations lead owns the escalation SOP: when a customer pushes back on what the bot said, what the human says, who has authority to honor the bot's promise, when the issue gets logged for legal review.
The frontline operator owns the daily approval queue: reading the held responses, approving the right ones, redirecting the wrong ones, flagging anomalies for the AI specialist.
The Pod shape is what makes the four-category guardrail run at scale. Tools alone do not. We hire and train this shape because we have run it through the Pod Trial for every brand where the AI tool was already deployed and underperforming.
What this means for your operation
If you run a creator business, a DTC brand, a SaaS, or any operation with a customer-facing AI support layer:
- Audit the last 100 responses against the four categories now.
- Decide which categories require human-in-loop and configure the tool to hold them.
- Document the escalation SOP for when a customer holds you to what the bot said.
- Staff the configuration as an ongoing role, not a one-time setup.
- Pull the weekly mistake sample. Feed it back into the rules.
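The weekly sample in the last step can be drawn with a few lines of code. This is a sketch under assumptions: the log is taken to be a list of dictionaries with a `categories` field, and `weekly_audit_sample` is a hypothetical helper, not a feature of any particular support tool. The sample size of 20 comes from the review cadence described earlier.

```python
import random

HIGH_RISK = {"policy", "price", "eligibility", "timing"}


def weekly_audit_sample(log, sample_size=20, seed=None):
    """Pick a random audit sample from logged responses in the four categories.

    `log` is assumed to be a list of dicts, each with a "categories" list.
    """
    # Only responses that touched at least one high-risk category are eligible.
    pool = [entry for entry in log if HIGH_RISK & set(entry["categories"])]
    # In a quiet week, audit everything rather than fail on a small pool.
    if len(pool) <= sample_size:
        return pool
    # A seed makes the draw reproducible if the audit is later questioned.
    rng = random.Random(seed)
    return rng.sample(pool, sample_size)
```

Sampling at random, rather than reviewing only the responses customers complained about, is what lets the audit catch wrong answers nobody has disputed yet.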
The cost of running this layer well is roughly one Pod role per ~5,000 monthly AI-touched conversations. The cost of running it badly is one Air Canada-style ruling, which is a lot more than one Pod role.
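As a rough worked example of that staffing ratio, the sketch below turns a monthly conversation count into a Pod role estimate and an expected approval-queue size. The function name, the 4% default held rate (the midpoint of the 3-5% range above), and the choice to round roles up are illustrative assumptions, not a sizing formula.

```python
import math

CONVERSATIONS_PER_ROLE = 5_000  # "roughly one Pod role per ~5,000" from the text


def pod_estimate(monthly_conversations, held_rate=0.04):
    """Return (pod_roles, held_responses_per_month) for a given volume."""
    # Rounding up is an assumption: a partial role still needs a person.
    roles = math.ceil(monthly_conversations / CONVERSATIONS_PER_ROLE)
    # Responses expected to land in the human approval queue each month.
    held = round(monthly_conversations * held_rate)
    return roles, held


roles, held = pod_estimate(12_000)
print(f"{roles} Pod roles, about {held} held responses per month")
```

For 12,000 monthly conversations this gives 3 roles and about 480 held responses per month.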