AI / LLM Development
AI / LLM product development — multi-LLM, RAG, agents.
We route Claude, Gemini, and OpenAI inside a single codebase, run pgvector- and Upstash-backed RAG, and ship Function-Calling agents to production. Because we designed our own voice-analysis engine (SpeechMap), we judge fast: "this slot only needs an LLM API, this one needs forced alignment, this one is better off without AI at all."
Technical problems we have actually solved in this category
01
Multi-LLM routing in one codebase
Inside one workflow, Claude drafts emails, Gemini analyzes images and video, and OpenAI handles embeddings and structured responses, so each model plays to its strength.
Marea Holdings · in-house history-book · band-ai
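The routing idea above can be sketched as a small task-to-provider table. This is only an illustration under stated assumptions: the `Task` names, the `Route` type, and the `route` helper are hypothetical and not taken from our production code, and a real handler would call the vendor SDK where this sketch only returns a provider label.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    EMAIL_DRAFT = "email_draft"
    IMAGE_ANALYSIS = "image_analysis"
    VIDEO_ANALYSIS = "video_analysis"
    EMBEDDING = "embedding"
    STRUCTURED_OUTPUT = "structured_output"


@dataclass(frozen=True)
class Route:
    provider: str  # label only; a real handler would call this vendor's SDK


# Hypothetical routing table mirroring the split described above.
ROUTES: dict[Task, Route] = {
    Task.EMAIL_DRAFT: Route("anthropic"),
    Task.IMAGE_ANALYSIS: Route("google"),
    Task.VIDEO_ANALYSIS: Route("google"),
    Task.EMBEDDING: Route("openai"),
    Task.STRUCTURED_OUTPUT: Route("openai"),
}


def route(task: Task) -> Route:
    """Look up the provider for a task; fail loudly if none is configured."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"no provider configured for {task}") from None
```

Keeping the table in one place means a model swap is a one-line change, and an unrouted task fails at lookup time instead of silently falling through to a default model.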
02
pgvector RAG (3072-dim, 10 Function-Calling tools)
Supabase + pgvector for the index, plus a 10-tool agent. Retrieval results and reasoning outputs are cached separately to keep token usage and latency in check.
lms-ct (B2B corporate-training LMS)
03
Korean-language chunking
LLM tokenizers split Korean inefficiently, so the same meaning takes more tokens than in English. We pick a chunking strategy per domain: word-based, sentence-based, or sliding window.
lms-ct · jahwalcoop self-help cooperative RAG
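Two of the chunking strategies named above, sentence-based and sliding window, can be sketched as follows. This is a minimal illustration, not the chunker we run in production: the function names, the punctuation-based sentence split, and the default sizes are all assumptions, and whitespace splitting is used here as a rough stand-in for Korean word (eojeol) boundaries.

```python
import re


def sentence_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def sliding_window_chunks(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split on whitespace and emit windows of `size` words sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Sentence packing tends to suit prose where answers sit inside one sentence, while overlapping windows help when context spans sentence boundaries; which one fits depends on the domain.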
04
Anthropic Computer Use applied to a Korean B2B workflow
We applied Computer Use to a mass professor-outreach workflow at KAIST: xlsx parsing → personalized email drafts generated in parallel → bulk dispatch via Resend.
Marea Holdings automation platform
05
Cutting LLM cost to 1/10
Prompt caching, batch APIs, model mix (Haiku for summarization, Sonnet/Opus for precision reasoning), and result caching — four patterns applied together.
veltis-ai-studio · automation
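Two of the four patterns above, model mix and result caching, can be sketched without any provider SDK. Everything named here is an assumption for illustration: the tier labels `small` and `large`, the task names, and the `ResultCache` class are hypothetical, and `call_model` stands for whatever function actually calls the LLM. Prompt caching and batch APIs are provider-side features and are not shown.

```python
import hashlib
import json
from typing import Callable

# Hypothetical tier names; map them to real model IDs in your own config.
TIER_BY_TASK = {"summarize": "small", "classify": "small", "reason": "large"}


def pick_tier(task: str) -> str:
    """Send cheap tasks to the small tier; unknown tasks default to large."""
    return TIER_BY_TASK.get(task, "large")


class ResultCache:
    """In-memory result cache keyed by a hash of (tier, prompt)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(tier: str, prompt: str) -> str:
        payload = json.dumps({"tier": tier, "prompt": prompt},
                             ensure_ascii=False, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, task: str, prompt: str,
                    call_model: Callable[[str, str], str]) -> str:
        tier = pick_tier(task)
        key = self._key(tier, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call_model(tier, prompt)  # only reached on a cache miss
        self._store[key] = result
        return result
```

In a deployed system the dictionary would likely be replaced by a shared store such as Redis with an expiry, so repeated requests across instances skip the model call as well.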
06
AI Agent Function-Calling reliability
When an agent has 10+ tools, models start calling the wrong tool or skipping arguments. We catch this pattern and fix it with tool grouping and previous-turn summarization.
lms-ct · band-ai
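Tool grouping, as described above, can be sketched as exposing only the relevant subset of tools on each turn. The tool names, groups, and keyword lists below are hypothetical, and keyword matching is a deliberately simple stand-in for whatever classifier picks the group in practice. Previous-turn summarization is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    group: str


# Hypothetical tool registry for an LMS-style agent.
TOOLS = [
    Tool("search_courses", "catalog"),
    Tool("get_course_detail", "catalog"),
    Tool("enroll_user", "enrollment"),
    Tool("cancel_enrollment", "enrollment"),
    Tool("completion_report", "reporting"),
    Tool("export_scores", "reporting"),
]

# Hypothetical keywords; a real router might use a classifier or a cheap LLM call.
GROUP_KEYWORDS = {
    "catalog": ("course", "catalog", "search"),
    "enrollment": ("enroll", "register", "cancel"),
    "reporting": ("report", "export", "score"),
}


def select_groups(message: str) -> set[str]:
    """Return the groups whose keywords appear in the message."""
    lowered = message.lower()
    matched = {
        group
        for group, keywords in GROUP_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }
    # Fall back to every group so the agent is never left without tools.
    return matched or set(GROUP_KEYWORDS)


def tools_for_turn(message: str) -> list[Tool]:
    """Tools to expose to the model for this turn, in registry order."""
    groups = select_groups(message)
    return [tool for tool in TOOLS if tool.group in groups]
```

Showing the model two or three tools instead of ten or more narrows the choice it has to make, which is the point of the grouping step.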
Recommended stack
- Anthropic Claude SDK
- Google Gemini
- OpenAI
- pgvector
- Upstash Redis
- Supabase
- Next.js 16
- Cloud Run
Frequently asked questions
- Should I start with Claude API or Gemini API?
- Claude Sonnet/Opus have deeper Korean reasoning; Gemini wins on multimodal (image/video) input and cost efficiency. For text-heavy B2B workflows, start with Claude. For image/video-heavy ones, route to Gemini first. Mixing both inside one codebase is the standard pattern, and Songstark runs that routing code in its own products today.
- pgvector vs Pinecone for RAG?
- If you already use Postgres (Supabase), pgvector is the right answer about 95% of the time: no separate infra, RLS-integrated permissions, and near-zero added cost (storage only). Pinecone makes sense at 100M+ vectors or when you need sub-100ms query latency. lms-ct runs pgvector at 3072 dimensions in production.
- How is AI agent development priced?
- Fewer than 5 Function-Calling tools with a single LLM takes 4–6 weeks. 10+ tools with multi-LLM routing and memory management takes 8–12 weeks. Run-rate cost is typically $200–$2,000/month (LLM tokens + Supabase + Vercel/Cloud Run). Send us the tool list and expected call volume, and we will return a precise quote within one week.
- Can we run on-prem LLMs?
- Yes — Llama 3.3 / Qwen 2.5 / DeepSeek-V3-class open models served via vLLM or Text Generation Inference. Be aware of GPU server cost ($1500+/month) and the model-quality gap (15–30% behind GPT-4o / Claude Sonnet). For most B2B workflows, cloud APIs with data masking + Anthropic's Zero-Data-Retention option give better ROI.
- How does Korean LLM performance compare to English?
- As of April 2026, Claude Sonnet 4.6's Korean reasoning is roughly 90–95% of its English performance, and Gemini 2.5 Pro is similar. Korean text costs 1.5–2× more than English because the same meaning takes more tokens, so always factor that into cost and latency budgets.
- How is data protected when we engage you?
- Standard NDA + isolated environment (separate Supabase project) + Zero-Data-Retention LLM mode (supported by both Anthropic and OpenAI). For medical or clinical data, we add multi-tenant RLS and an anonymization workflow. The clinical SaaS page covers that in detail.
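The 1.5–2× Korean token overhead mentioned in the FAQ above can be turned into a quick budget range. This is a rough sketch: the function name is hypothetical, the multiplier is applied directly to an English-equivalent token count, and the per-token price passed in is a placeholder, not any provider's actual rate.

```python
def korean_cost_range(
    english_tokens: int,
    usd_per_million_tokens: float,
    multiplier_low: float = 1.5,   # lower bound of the overhead quoted in the FAQ
    multiplier_high: float = 2.0,  # upper bound of the overhead quoted in the FAQ
) -> tuple[float, float]:
    """Return a (low, high) USD estimate for Korean text carrying the same meaning."""
    base_cost = english_tokens / 1_000_000 * usd_per_million_tokens
    return (base_cost * multiplier_low, base_cost * multiplier_high)


# Example with a placeholder price of 3.0 USD per million tokens.
low, high = korean_cost_range(english_tokens=1_000_000, usd_per_million_tokens=3.0)
```

Latency budgets can be scaled the same way, since generation time grows with the number of output tokens.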
Related engineering notes
Articles for this cluster will be published soon.
Let's figure out how to wire AI into your product, together.
No sales PM in between. The CEO or a core engineer replies directly within one business day.
