
YEAR
2020

TYPE
Augmented AI
Conversation Design
Information Architecture
Product Management

Designing Trustworthy AI in a Regulated Service Environment

I led the experience design for PRUChat, Prudential’s customer-facing AI assistant, during a period of rising service demand, COVID pressure, and strict insurance compliance. The challenge was not to make AI feel human; it was to make it reliable, useful, and safe enough for a regulated environment.


WHY THIS STILL MATTERS IN 2026

The model changed, but the trust problem hasn't

Today’s AI models are more powerful, but the core design challenge is the same: users need to know when to trust AI, when to question it, and when to escalate.

PRUChat was an early lesson in designing around AI limitations in a high-risk domain. The hybrid architecture, which combined NLP with deterministic compliance flows and was built before the ChatGPT and Claude era, anticipated much of what regulated AI deployment now requires.

IMPACT AT A GLANCE

Delivered $2.3M Annual Savings

BUSINESS TARGET vs ACTUAL PERFORMANCE

Call Volume Reduction: target 25%, actual 32%
Email Inquiry Reduction: target 30%, actual 28%
Average Response Time: target <5 mins, actual 4.2 mins
Successful Self-Service Rate: target 42%, actual 52%
Agent Escalation Rate: target 35%, actual 31%

*Note: These metrics reflect 2020 AI capabilities.

Role

As design lead, I coordinated a cross-functional team of 12+ people spanning business, product, engineering, and automation.

Established shared decision frameworks that enabled non-technical stakeholders to make informed AI product decisions, resulting in 3x faster alignment on compliance requirements.

Shipped a working AI product before the current AI wave, gaining first-hand experience of how NLP-based and pre-scripted bots work and how people use them.

CONTEXT & PROBLEM

Service demand was growing faster than support capacity

Customers needed faster answers outside business hours, while customer service agents were handling rising call and email volume. In insurance, speed alone was not enough. The assistant had to reduce operational load without creating compliance, privacy, or trust risks.

FRAMEWORK

AI for intent.
Structured flows for trust.
Humans for risk.

We made a deliberate architectural decision to keep humans in the loop at consequential decision points because of regulatory requirements.

We used NLP to interpret how customers described their problems, while scripted flows handled regulated or predictable resolutions. PRUChat was designed as an augmented AI system: it kept customer service agents and customers in the decision seat, while using AI to reduce the time, effort, and cognitive load needed to get there.

When confidence was low or the topic was sensitive, the system escalated instead of guessing.

Key Design Decisions

Confidence-based routing

Designed high, medium, and low-confidence states so the assistant could answer, clarify, or escalate.

Compliance-first guardrails

Removed high-risk features, such as personal data handling and document attachments, where privacy and security risk outweighed customer value.

Cost-aware conversation patterns

Used quick replies, concise prompts, summaries, and expandable details to reduce unnecessary conversation length and improve clarity.

Approach


Customer and Financial Consultant Surveys and Feedback Analytics


Competitive Research and Benchmarking


Decision Trees and Chat Flow Generation


Evaluation and Testing


Bot Training, Intent Classification and NLP Planning


Interaction and Conversation Design

DISCOVERY & PLANNING

Contextualising the Landscape

Defining the MVP started with gathering internal and external data to frame the problem and understand what and who shaped it.

IDEATION & SYNTHESIS

Generating and Synthesizing Ideas to Leverage Key Insights

Four cross-functional workshops turned siloed compliance and tech concerns into shared design principles and secured stakeholder buy-in within a month.

CONVERSATION FLOW DESIGN

A Conversation Ecosystem Brought to Life

The ideation stage produced the foundational framework for our conversation ecosystem. Turning it into a working experience required in-depth bot and NLP training combined with systems thinking.

NLP VS PRE-SCRIPTED BOTS

Shaping Our Rationale

Customers speak naturally, while compliance needs structured data. We built a hybrid that handled both, navigating a 30% budget cut and saving $800K while keeping regulatory approval intact.


What are our customers actually asking?

This determines whether NLP is worth the complexity.

How predictable are the solutions?

Structured solutions favour pre-scripted responses.

What's the cost of being wrong?

Compliance and legal liability require exact question phrasing and response accuracy.

How often do conversations go off-script?

Mostly in complex troubleshooting and personalized interactions (e.g. financial advice).
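The four questions above can be read as a simple rubric. The sketch below is a hypothetical illustration, not the rubric the team actually used: `QueryType` and `choose_handling` are invented names, and each boolean stands for the answer to one question for a given query cluster.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class QueryType:
    """Hypothetical profile of one query cluster, one flag per question."""
    varied_phrasing: bool       # What are our customers actually asking?
    predictable_solution: bool  # How predictable are the solutions?
    high_cost_of_error: bool    # What's the cost of being wrong?
    often_off_script: bool      # How often do conversations go off-script?


def choose_handling(q: QueryType) -> Tuple[str, str]:
    """Return (how to detect intent, how to resolve it) for one query type."""
    # NLP earns its complexity only when customers phrase things in many ways.
    detect = "nlp" if q.varied_phrasing else "quick_replies"
    if q.often_off_script and q.high_cost_of_error:
        resolve = "human"          # personalised and high-stakes: humans for risk
    elif q.predictable_solution or q.high_cost_of_error:
        resolve = "scripted_flow"  # exact wording and order for compliance
    else:
        resolve = "nlp_answer"     # low-stakes, open-ended FAQ-style reply
    return detect, resolve


# Example: free-text claim questions with a fixed, regulated resolution.
print(choose_handling(QueryType(True, True, True, False)))
# → ('nlp', 'scripted_flow')
```

Splitting "detect" from "resolve" mirrors the hybrid idea: NLP can interpret the question even when a scripted flow or a human handles the answer.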

HOW I WORKED WITH AI

Intent Confidence Handling

When the AI wasn't sure, we didn't guess. A 3-tier confidence system routed users to answers, confirmation prompts, or a human depending on how clearly intent came through.

High confidence (above 75%): direct answer (a lower bar than ideal, set because of model limitations)
Medium confidence (50-75%): confirmation required
Low confidence (below 50%): immediate human escalation (e.g. Prudential hotline, customer service agent or financial consultant)
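A minimal sketch of the three tiers, assuming the NLP engine returns an intent confidence between 0 and 1. `Route`, `route_intent`, and the `sensitive_topic` flag are hypothetical names; the flag reflects the rule that sensitive topics escalate regardless of confidence. How the original system treated the exact 75% and 50% boundaries is not stated, so the inclusive comparisons here are an assumption.

```python
from enum import Enum


class Route(Enum):
    ANSWER = "answer"      # high confidence: respond directly
    CONFIRM = "confirm"    # medium confidence: ask the user to confirm intent
    ESCALATE = "escalate"  # low confidence: hand off to a human


# Tier boundaries from the list above; inclusive edges are an assumption.
HIGH_THRESHOLD = 0.75
LOW_THRESHOLD = 0.50


def route_intent(confidence: float, sensitive_topic: bool = False) -> Route:
    """Map an intent classifier's confidence score (0.0 to 1.0) to a route."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Sensitive topics bypass the tiers and always go to a human.
    if sensitive_topic:
        return Route.ESCALATE
    if confidence >= HIGH_THRESHOLD:
        return Route.ANSWER
    if confidence >= LOW_THRESHOLD:
        return Route.CONFIRM
    return Route.ESCALATE


print(route_intent(0.82))                        # Route.ANSWER
print(route_intent(0.60))                        # Route.CONFIRM
print(route_intent(0.30))                        # Route.ESCALATE
print(route_intent(0.95, sensitive_topic=True))  # Route.ESCALATE
```

Keeping the thresholds as named constants makes the 75% bar easy to raise later if model accuracy improves.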

I audited customer service logs to identify top query types, mapped them into intent clusters, then designed end-to-end conversation flows in Excel covering decision branches, fallbacks, and agent escalation triggers.


PRECISION VS RECALL

When to Escalate to PRU's Customer Service Team

We asked ourselves:
What's worse, missing something important or including something irrelevant?

Missing something important → Optimize for Recall (catch everything)
Including irrelevant stuff → Optimize for Precision (be selective)

We chose precision over recall: the AI escalates only when it is at least 90% confident that a human is needed. This kept our customer service agents free for the cases that actually required them and made self-service flows more reliable.
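To make the trade-off concrete, this sketch computes precision and recall for escalation decisions at a given threshold. The scores and labels are toy values invented for illustration, not PRUChat data, and `precision_recall` is a hypothetical helper.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[bool],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall of 'escalate to a human' decisions.

    scores: model confidence that a human is needed (0.0 to 1.0)
    labels: True if a human was actually needed
    """
    tp = fp = fn = 0
    for score, needed in zip(scores, labels):
        escalated = score >= threshold
        if escalated and needed:
            tp += 1      # correctly escalated
        elif escalated:
            fp += 1      # escalated, but the bot could have handled it
        elif needed:
            fn += 1      # missed: a human was needed but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data, invented for illustration only.
scores = [0.95, 0.92, 0.85, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, True, False, False]

print(precision_recall(scores, labels, 0.9))  # (1.0, 0.5): selective, misses some
print(precision_recall(scores, labels, 0.5))  # (0.6, 0.75): catches more, more noise
```

Raising the threshold to 0.9 trades recall for precision: every escalation in the toy data is justified, but only half of the cases that needed a human are escalated by this rule.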

SYNTAX MATTERS

Train Keywords, then Phrases

Context bridging is critical. Customers want to see their original words reflected in structured flows, but pre-scripted flows were more appropriate for insurance because regulatory compliance requires collecting specific information in an exact order.

Adding features to our AI increased the potential for conflict: the more capabilities we built, the higher the risk of the AI breaking on overlapping intents. We had to deliberately balance comprehensiveness against accuracy.

Unclear syntax  =  unclear intent  =  confused AI
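One simple way to spot the overlapping intents described above is to compare each pair of intents' training keywords. The sketch below uses Jaccard similarity, a swapped-in heuristic rather than whatever IBM Watson did internally; the intent names, keywords, and the 0.3 default cut-off are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of keywords two intents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlapping_intents(intents: Dict[str, Set[str]],
                        limit: float = 0.3) -> List[Tuple[str, str, float]]:
    """Flag intent pairs whose keyword sets overlap at or above `limit`."""
    flagged = []
    for (name_a, kw_a), (name_b, kw_b) in combinations(sorted(intents.items()), 2):
        score = jaccard(kw_a, kw_b)
        if score >= limit:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


# Hypothetical keyword sets, for illustration only.
intents = {
    "claim_status": {"claim", "status", "check", "submitted"},
    "claim_submit": {"claim", "submit", "form", "document"},
    "policy_renewal": {"policy", "renew", "premium", "due"},
}

print(overlapping_intents(intents, limit=0.1))
# → [('claim_status', 'claim_submit', 0.14)]
```

Flagged pairs are candidates for merging, adding distinguishing phrases, or a clarifying question; a high score alone does not prove the model will confuse them.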


We pulled the top 20 customer questions from analytics to build our FAQ content matrix, prioritising Claims, PRUShield, COVID, and forms. That became the training baseline, which we refined continuously. Token-based pricing then shaped what came next.

REDUCING AVERAGE COST

Optimizing Conversation Design via Token-based Interaction Patterns

A token is roughly equivalent to a word or part of a word that the AI processes.

"Hello" ≈ 1 token; 750 words ≈ 1,000 tokens

With token-based pricing, you pay for every token the AI reads (input) and writes (output).
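The arithmetic behind token-based pricing fits in a few lines. This sketch uses the rule of thumb above (750 words ≈ 1,000 tokens) as a stand-in for a real tokenizer, and the per-1,000-token prices are hypothetical placeholders, not any vendor's rates.

```python
WORDS_PER_1K_TOKENS = 750  # rule of thumb: 750 words ≈ 1,000 tokens


def estimate_tokens(text: str) -> int:
    """Rough token estimate from word count; real tokenizers will differ."""
    word_count = len(text.split())
    return round(word_count * 1000 / WORDS_PER_1K_TOKENS)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """You pay for every token read (input) and every token written (output)."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


# Hypothetical prices: output priced at 4x input.
IN_PRICE, OUT_PRICE = 0.001, 0.004

question = "word " * 750                 # stand-in for a 750-word claim description
in_tokens = estimate_tokens(question)    # 1000
cost = estimate_cost(in_tokens, 300, IN_PRICE, OUT_PRICE)
print(f"{in_tokens} input tokens, total cost ${cost:.4f}")
```

With these placeholder prices, the 300-token reply costs more than the 1,000-token question it answers, which is why response length deserves as much design attention as input length.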

Token-based interactions describe how the AI's token processing affects the way users interact with the system; in other words, how the "token reality" shapes the user experience. Unlike a conventional app that simply responds to clicks, an AI chat agent must "read" and "write" tokens for every word and interaction, which creates its own UX patterns.

Why This Matters


LENGTH DRIVES COST

Insurance claim descriptions and their analysis get longer as you go

Every extra word in a claim description adds input tokens, so analysis costs more as the text grows. Limit document length.


OUTPUT LENGTH IS EXPENSIVE

The longer the AI's response, the more expensive it gets

Output tokens typically cost 2-5x more than input tokens. An AI that generates a 500-word explanation for every question is not always good design.


CONTEXT MEMORY COSTS MONEY

Every earlier message is re-read as input tokens on each new turn

Decide how much conversation history to keep vs starting fresh.
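A short sketch of why history gets expensive, assuming a stateless API where every turn re-sends earlier messages as input. `cumulative_input_tokens` and its `window` parameter are hypothetical; with full history the total grows roughly with the square of the number of turns, while a fixed window keeps growth roughly linear.

```python
from typing import List, Optional


def cumulative_input_tokens(message_tokens: List[int],
                            window: Optional[int] = None) -> int:
    """Total input tokens billed over a conversation.

    message_tokens: token count of each new message, in order.
    window: how many earlier messages to re-send each turn (None = all of them).
    """
    total = 0
    for i, tokens in enumerate(message_tokens):
        start = 0 if window is None else max(0, i - window)
        history = message_tokens[start:i]   # earlier messages re-read this turn
        total += sum(history) + tokens
    return total


ten_turns = [100] * 10  # ten messages of ~100 tokens each
print(cumulative_input_tokens(ten_turns))            # 5500: full history
print(cumulative_input_tokens(ten_turns, window=2))  # 2700: last two messages only
```

Choosing `window` is the "how much history to keep" decision above, expressed as a number.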


Feature Optimization while staying Compliant

Attachment Feature Removal
Analytics showed low usage and real risk: uploaded files could expose personal data protected under the PDPA* and open the door to malicious content. Easier to cut than to patch.

Character Limits for Token Efficiency
Shorter inputs meant less AI confusion, lower token costs, and better responses. A small constraint with outsized returns.

*Singapore Personal Data Protection Act
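A character limit is easy to enforce before any text reaches the AI. In this sketch the 200-character limit, the `check_user_input` helper, and the feedback messages are all hypothetical; the source does not state the limit PRUChat used.

```python
from dataclasses import dataclass

MAX_CHARS = 200  # hypothetical limit, not PRUChat's actual value


@dataclass
class InputCheck:
    accepted: bool
    remaining: int   # characters left; negative when over the limit
    message: str     # feedback shown to the user, empty when accepted


def check_user_input(text: str, max_chars: int = MAX_CHARS) -> InputCheck:
    """Validate a chat message against a character limit before sending it."""
    cleaned = " ".join(text.split())  # collapse repeated whitespace first
    remaining = max_chars - len(cleaned)
    if not cleaned:
        return InputCheck(False, max_chars, "Please type your question.")
    if remaining < 0:
        return InputCheck(False, remaining,
                          f"Please shorten your question to {max_chars} characters or fewer.")
    return InputCheck(True, remaining, "")


print(check_user_input("How do I submit a claim?"))
```

Returning `remaining` lets the interface show a live character counter, so users see the constraint before they hit it.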

Token-Aware Interactions

I designed interactions that work with token processing, not against it. Chunked information became a feature, not a constraint.

DESIGN PATTERNS

Chat Button Suggestions
Low-cost interaction that simplifies information and immediately shows users which actions are available.

Expandable Cards and 'Read More'
Expandable cards show a summary first, with detail on demand. Fewer tokens, more control.
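Both patterns can be expressed in the message model itself. In this hypothetical sketch (all class and field names are invented), a quick reply carries a fixed payload so no free text needs interpreting, and an expandable card calls its `load_detail` function only when the user taps "Read more", so the longer answer is produced or fetched only when someone asks for it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class QuickReply:
    label: str    # button text the user sees
    payload: str  # fixed intent ID sent back, so nothing needs interpreting


@dataclass
class ExpandableCard:
    title: str
    summary: str                     # shown immediately
    load_detail: Callable[[], str]   # produces the long answer on demand
    _detail: Optional[str] = field(default=None, repr=False)

    def read_more(self) -> str:
        """Load the detail on the first tap only; later taps reuse it."""
        if self._detail is None:
            self._detail = self.load_detail()
        return self._detail


@dataclass
class BotMessage:
    text: str
    quick_replies: List[QuickReply] = field(default_factory=list)
    cards: List[ExpandableCard] = field(default_factory=list)


# Hypothetical usage: the detail loader runs only if the user expands the card.
def load_claims_detail() -> str:
    return "Full claims guidance goes here."


msg = BotMessage(
    text="Here's what I found about claims:",
    quick_replies=[QuickReply("Check claim status", "claim_status"),
                   QuickReply("Talk to an agent", "escalate")],
    cards=[ExpandableCard("Submitting a claim", "A short summary.", load_claims_detail)],
)
print(msg.cards[0].summary)      # summary only, no detail loaded yet
print(msg.cards[0].read_more())  # detail loaded on demand
```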


WHAT I WOULD EVOLVE TODAY

How I'd redesign this for the AI-native era

Back then, we built a safe chat assistant using IBM Watson. Today, I'd evolve that into an enterprise-grade AI Copilot layer to pair well with Prudential's enterprise suite.

The real shift isn't just updating the tech; it's moving from a basic support tool to a highly governed system.

The redesign would pair generative understanding with deterministic, rules-based logic, with Agile and Lean UX practices serving as continuous validation loops: running real-time user testing, mitigating AI hallucinations, and enforcing strict compliance at every consequential branch.

1

Ground answers in approved knowledge sources using retrieval-based flows

2

Add source transparency and confidence signals for sensitive answers

3

Summarise conversations before human handoff

4

Track false confidence, escalation accuracy, and unresolved intent patterns

5

Create an AI governance model for what the assistant can answer, ask or escalate
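Points 1 and 5 can be sketched together: a governance table decides what the assistant may answer, ask, or escalate, and answers on regulated topics are allowed only when retrieval finds an approved source. Everything here is hypothetical: the topics, the rules, the one-entry knowledge base, and the keyword-overlap `retrieve` function, which stands in for a real retrieval system such as embedding search.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Set


class Action(Enum):
    ANSWER = "answer"
    ASK = "ask"            # ask a clarifying question
    ESCALATE = "escalate"  # hand off to a human


@dataclass
class TopicRule:
    may_answer: bool    # is the assistant allowed to answer this topic at all?
    needs_source: bool  # must an approved source back the answer?


# Hypothetical governance table; real rules would come from compliance.
POLICY: Dict[str, TopicRule] = {
    "claims_process": TopicRule(may_answer=True, needs_source=True),
    "greeting": TopicRule(may_answer=True, needs_source=False),
    "financial_advice": TopicRule(may_answer=False, needs_source=True),
}

# Hypothetical approved knowledge base: document ID -> text.
APPROVED_KB: Dict[str, str] = {
    "kb-claims-01": "steps to submit a claim form",
}


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, kb: Dict[str, str], min_overlap: int = 2) -> List[str]:
    """Toy retrieval: IDs of approved docs sharing enough words with the query."""
    q = words(query)
    return [doc_id for doc_id, text in kb.items() if len(q & words(text)) >= min_overlap]


def decide(topic: Optional[str], sources: List[str]) -> Action:
    if topic is None:
        return Action.ASK              # intent unclear: clarify, don't guess
    rule = POLICY.get(topic)
    if rule is None or not rule.may_answer:
        return Action.ESCALATE         # unknown or out-of-bounds topic
    if rule.needs_source and not sources:
        return Action.ESCALATE         # no approved source: don't improvise
    return Action.ANSWER


sources = retrieve("How do I submit a claim?", APPROVED_KB)
print(sources, decide("claims_process", sources))
```

Defaulting unknown topics to escalation mirrors the original principle that the system escalated instead of guessing.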


The Shift in Plain Terms

IBM Watson needed thousands of labelled examples to recognise specific phrases. Slightly different wording? It breaks.

Copilot and Claude already understand language; you configure the rules, not the model.

Why it matters for PRUChat

The original PRUChat was capped by Watson's training data and model. A redesign today isn't a UI refresh; it starts from a fundamentally higher capability ceiling. Deterministic where it must be, generative where it helps.

KEY TAKEAWAY

Good AI design knows when not to answer

PRUChat taught me that designing AI products is not about making systems sound intelligent. It is about designing the boundaries around intelligence: what the system can do, what it should not do, and how users are protected when uncertainty appears.

This project shows that I can design AI experiences beyond the interface layer. I can work with ambiguity, technical constraints, compliance risk, business pressure, and imperfect models, then turn them into a usable, measurable, and scalable service experience.

Our results show strong potential for integration across Prudential's digital ecosystem, with opportunities to automate and digitise operational tasks, support our human agents, and give our customers the answers they need immediately.
