Question 1

Should I start with Claude API or Gemini API?

Accepted Answer

Claude Sonnet/Opus has stronger Korean reasoning depth; Gemini wins on multimodal (image/video) and cost efficiency. Text-heavy B2B workflows: start with Claude. Image/video-heavy: route to Gemini first. Mixing both inside one codebase is the standard pattern, and Songstark runs that routing code in our own products today.

Question 2

pgvector vs Pinecone for RAG?

Accepted Answer

If you already use Postgres (Supabase), pgvector is 95% the right answer — no separate infra, RLS-integrated permissions, near-zero cost (storage only). Pinecone makes sense at 100M+ vectors or when you need sub-100ms precision. lms-ct runs pgvector at 3072 dimensions in production.

Question 3

How is AI agent development priced?

Accepted Answer

Under 5 Function-Calling tools + a single LLM = 4–6 weeks. 10+ tools + multi-LLM routing + memory management = 8–12 weeks. Run-rate cost is typically $200–$2000/month (LLM tokens + Supabase + Vercel/Cloud Run). Send the tool list and expected call volume — we return a precise quote within one week.

Question 4

Can we run on-prem LLMs?

Accepted Answer

Yes — Llama 3.3 / Qwen 2.5 / DeepSeek-V3-class open models served via vLLM or Text Generation Inference. Be aware of GPU server cost ($1500+/month) and the model-quality gap (15–30% behind GPT-4o / Claude Sonnet). For most B2B workflows, cloud APIs with data masking + Anthropic's Zero-Data-Retention option give better ROI.

Question 5

How does Korean LLM performance compare to English?

Accepted Answer

As of April 2026, Claude Sonnet 4.6's Korean reasoning is roughly 90–95% of its English performance, and Gemini 2.5 Pro is similar. Korean tokens cost 1.5–2× more (same meaning takes more tokens), so always factor that into cost and latency budgets.

Question 6

How is data protected when we engage you?

Accepted Answer

Standard NDA + isolated environment (separate Supabase project) + Zero-Data-Retention LLM mode (supported by both Anthropic and OpenAI). For medical or clinical data, we add multi-tenant RLS and an anonymization workflow. The clinical SaaS page covers that in detail.

AI / LLM product development — multi-LLM, RAG, agents.

Technical problems we have actually solved in this category

Multi-LLM routing in one codebase

pgvector RAG (3072-dim, 10 Function-Calling tools)

Korean-language chunking

Anthropic Computer Use applied to Korean B2B workflow

Cutting LLM cost to 1/10

AI Agent Function-Calling reliability

Recommended stack

Frequently asked questions

Related engineering notes

Let's figure out how to wire AI into your product, together.