June 2023 – Present
Washington, DC
Software Engineer II, AI & Full Stack · American Express
- Designed and shipped an LLM-powered transaction categorization system (OpenAI API + embedding-based retrieval for few-shot context) that replaced a legacy rules-based pipeline, improving classification accuracy from 78% to 94% on a held-out production dataset of ~50K labeled transactions.
- Built a RAG pipeline over internal transaction metadata using LLaMA embeddings + Elasticsearch as the hybrid vector/keyword store, served via Spring Boot; reduced p95 query latency ~40% through Redis caching and re-ranking optimization.
- Established an LLM evaluation framework with precision/recall tracking, prompt regression tests, and A/B comparisons across model versions; used telemetry to iteratively tune prompts and retrieval parameters based on measured impact.
- Tuned LLM inference tradeoffs (model choice, token budget, batching, caching) to balance latency against accuracy, cutting per-request cost while keeping classification precision above the SLA threshold.
- Led design and code reviews across a 3-engineer team, driving production rollout decisions, architecture proposals, and safe deployment patterns for AI features in a regulated financial environment.
- Partnered cross-functionally with Product and Analytics to translate business requirements into AI features; resolved ingestion/search inconsistencies with idempotent pipelines and validation checks.