NLP pipeline that processes 200-2,000 sales transcripts per engagement. Semantic embeddings, structured extraction via GPT-4o-mini, and corpus-wide pattern analysis. Identified $361K-$531K in revenue opportunities from one 427-transcript engagement.

Scriptal Analysis is a done-for-you service where I take a client's entire library of sales call transcripts and extract systematic intelligence from it. A typical engagement covers 200-2,000 transcripts, representing hundreds of thousands to millions of words of conversation.
The engineering challenge is turning unstructured conversation text into structured, queryable data. Each transcript is split into 800-word chunks with 100-word overlap to preserve context across boundaries. Every chunk gets embedded using OpenAI's text-embedding-3-small model (1536 dimensions) and stored as native PostgreSQL vectors with pgvector. This creates a semantic index across the entire corpus — millions of words searchable by meaning, not just keywords.
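The chunking step can be sketched in a few lines. This is an illustrative version (the function name and exact boundary handling are my assumptions, not the production code); the embedding call it feeds is shown as a comment using OpenAI's standard client API.

```python
def chunk_transcript(text: str, chunk_words: int = 800, overlap: int = 100) -> list[str]:
    """Split a transcript into 800-word chunks with a 100-word overlap,
    so context that straddles a boundary appears in both chunks."""
    words = text.split()
    step = chunk_words - overlap  # advance 700 words per chunk
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # last chunk reached the end of the transcript
    return chunks

# Embedding sketch (model name from the write-up; call shape is OpenAI's
# standard embeddings API):
# from openai import OpenAI
# client = OpenAI()
# resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
# vectors = [d.embedding for d in resp.data]  # 1536 floats each, stored in pgvector
```

The overlap means a sentence cut at word 800 still appears whole at the start of the next chunk, which keeps the semantic index from missing boundary-spanning context.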
The extraction pipeline runs each transcript through GPT-4o-mini with a 500+ line structured prompt that forces JSON output. The model classifies meeting type (sales, support, internal, interview, customer success) and extracts type-specific insights. For sales calls: objections categorized by type (pricing, features, timing, competition, authority, trust, risk, contract) with resolution quality scored. Buying signals scored by strength (strong/medium/weak). Competitor mentions tracked with sentiment. Pain points classified by severity and whether they were addressed. Call quality metrics including discovery questions asked, talk ratios, and longest monologues.
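A heavily condensed sketch of that extraction call follows. The real prompt is 500+ lines; the schema below is a small illustrative subset, and `extract_insights` / `EXTRACTION_PROMPT` are hypothetical names. The forced-JSON mechanism shown (`response_format={"type": "json_object"}`) is OpenAI's standard JSON mode.

```python
import json

# Illustrative subset of the extraction schema; the production prompt
# covers all meeting types, pain points, and call quality metrics too.
EXTRACTION_PROMPT = """You are a sales-call analyst. Return ONLY valid JSON:
{
  "meeting_type": "sales|support|internal|interview|customer_success",
  "objections": [{"type": "pricing|features|timing|competition|authority|trust|risk|contract",
                  "quote": "...", "resolution_quality": 1-5}],
  "buying_signals": [{"strength": "strong|medium|weak", "quote": "..."}],
  "competitor_mentions": [{"name": "...", "sentiment": "positive|neutral|negative"}]
}"""

def extract_insights(transcript: str, client) -> dict:
    """Run one transcript through GPT-4o-mini with forced JSON output."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # guarantees parseable JSON
        messages=[
            {"role": "system", "content": EXTRACTION_PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```

Because the output is guaranteed-parseable JSON, each field maps directly onto a row in one of the normalized tables described below.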
All of this lands in 25+ structured database tables — not a PDF, not a spreadsheet, but normalized relational data that I can query across the entire corpus. "Show me every pricing objection where the rep successfully resolved it" is a database query, not a manual search. "Find all calls where a specific competitor was mentioned negatively" returns results in seconds.
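The "pricing objection" question from above might look something like this as a query. The table and column names are illustrative assumptions, not the real schema, and SQLite stands in here only to demonstrate the query shape; production runs on PostgreSQL.

```python
import sqlite3  # stand-in for the production Postgres, for demonstration only

# "Every pricing objection the rep successfully resolved" as one query.
# Schema (objections, transcripts, resolution_quality) is assumed.
RESOLVED_PRICING_SQL = """
SELECT t.title, o.quote, o.rep_response
FROM objections o
JOIN transcripts t ON t.id = o.transcript_id
WHERE o.category = 'pricing'
  AND o.resolution_quality >= 4   -- "successfully resolved" threshold
ORDER BY o.resolution_quality DESC
"""

def find_resolved_pricing_objections(conn):
    """Run the corpus-wide objection query against any DB-API connection."""
    return conn.execute(RESOLVED_PRICING_SQL).fetchall()
```

The same join-and-filter pattern answers the competitor question: swap the `objections` table for `competitor_mentions` and filter on sentiment.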
The deliverables synthesize these patterns into actionable intelligence. The Executive Summary identifies the biggest opportunities and quick wins. ICP Demographics profiles who actually bought and why, based on patterns across hundreds of real conversations rather than assumptions. The Objection Database documents every objection raised with the specific language reps used to handle them successfully — organized by category and resolution quality so teams know which responses actually work.
Buying Signals catalogs the exact phrases and situations that indicate purchase readiness. The Sales Playbook synthesizes what's working into scripts and talk tracks. Competitor Intelligence extracts everything prospects mentioned about alternatives — positioning, pricing, strengths, weaknesses — organized by competitor with sentiment tracking.
For one engagement, I analyzed 427 transcripts and identified $361K-$531K in immediate revenue opportunities from leads that had gone cold but showed strong buying signals. The structured data made it possible to find patterns that are invisible when reviewing calls one at a time — like discovering that deals mentioning a specific competitor had a 40% lower close rate, or that prospects who asked about implementation timelines were 3x more likely to buy.
A client had 427 sales call transcripts — over 2 million words of conversations — sitting in their meeting recording tool. They knew the data contained patterns about why deals closed or stalled, which objections killed deals, and what their best reps did differently. But there was no way to extract that intelligence systematically. Manual review of 427 calls would take months. Keyword search misses conversational nuance. The insights were locked in unstructured text.
Built an NLP pipeline that chunks transcripts into overlapping segments, embeds them as 1536-dimension vectors with pgvector for semantic search, and runs structured extraction through GPT-4o-mini to classify every objection, buying signal, competitor mention, and pain point into normalized database tables. The structured output enables corpus-wide queries that return results in seconds instead of days of manual review.
Analyzed 427 transcripts (2M+ words) and identified $361K-$531K in immediate revenue opportunities from leads showing strong buying signals that had gone cold. Extracted 17,600+ structured data points across objections, signals, competitors, pain points, and call quality metrics. Surfaced patterns invisible to manual review — like specific competitor mentions correlating with 40% lower close rates.
Transcripts ingested into Supabase PostgreSQL, chunked into overlapping segments, and embedded as native pgvector columns (1536 dimensions, HNSW-indexed). Structured extraction runs through GPT-4o-mini with forced JSON output, classifying meeting types and extracting insights into 25+ normalized tables. Semantic queries combine cosine similarity search with full-text matching for hybrid retrieval. Analysis jobs distributed across self-spawning AWS Lambda workers for parallel processing at scale.
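A hybrid retrieval query of this kind could look roughly like the sketch below. The `<=>` operator is pgvector's cosine distance, and `websearch_to_tsquery` / `ts_rank` are standard PostgreSQL full-text search; the table, column names, distance threshold, and ranking are my assumptions, not the production query.

```python
# Hedged sketch of a hybrid (semantic + keyword) retrieval query.
# Placeholders use psycopg named-parameter style.
HYBRID_QUERY = """
SELECT c.id, c.content,
       1 - (c.embedding <=> %(query_vec)s::vector)          AS semantic_score,
       ts_rank(c.tsv, websearch_to_tsquery(%(query_text)s)) AS keyword_score
FROM chunks c
WHERE c.embedding <=> %(query_vec)s::vector < 0.5  -- cosine-distance cutoff
   OR c.tsv @@ websearch_to_tsquery(%(query_text)s)
ORDER BY semantic_score DESC, keyword_score DESC
LIMIT 20
"""
```

The HNSW index accelerates the `<=>` nearest-neighbor side, while the full-text `tsv` column catches exact terms (product names, competitor names) that embeddings alone can blur.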
