Groundwork

Wired to Think.

Ship AI-powered features that actually work in production. LLM pipelines, RAG systems, and smart automation wired into your Rails app.

LLM Integration · RAG Pipelines · OpenAI / Anthropic · Vector Search

Most AI features fail in production because they were prototyped in a notebook and never engineered.

LLM Integration
Wire GPT-4, Claude, Gemini into real product flows. Streaming responses, token budgeting, retry with fallback providers. Built as proper services with tests, not inline API calls.
OpenAI / Anthropic · Streaming · Fallback providers
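A minimal sketch of the fallback pattern described above. The provider objects and their `complete` interface are illustrative, not any specific gem's API; a production version would add exponential backoff and per-provider timeouts.

```ruby
# Tries providers in preference order; retries each a few times before
# falling back to the next. Raises only when every provider is exhausted.
class LlmCompletion
  ProviderError = Class.new(StandardError)

  def initialize(providers:, max_retries: 2)
    @providers = providers      # ordered: preferred provider first
    @max_retries = max_retries
  end

  def call(prompt)
    @providers.each do |provider|
      attempts = 0
      begin
        return provider.complete(prompt)
      rescue ProviderError
        attempts += 1
        retry if attempts <= @max_retries
        # retries exhausted: fall through to the next provider
      end
    end
    raise ProviderError, "all providers failed"
  end
end
```

Because the service owns the retry loop, callers never see a transient provider outage, and swapping the preferred provider is a one-line change.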
RAG Pipelines
Retrieval-augmented generation with your data. Ingest documents, chunk intelligently, embed and store, then retrieve only what's relevant at query time. Answers grounded in fact, not hallucination.
Document ingestion · Chunking strategy · Grounded responses
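To make "chunk intelligently" concrete, here is one common baseline strategy: fixed-size windows with overlap, so a fact straddling a boundary still lands whole in at least one chunk. The sizes are placeholders; in practice they get tuned against an eval set.

```ruby
# Split text into overlapping fixed-size chunks. `size` and `overlap`
# are character counts here; token-based chunking works the same way.
def chunk_text(text, size: 500, overlap: 50)
  raise ArgumentError, "overlap must be smaller than size" if overlap >= size

  step = size - overlap
  chunks = []
  pos = 0
  while pos < text.length
    chunks << text[pos, size]   # each chunk repeats the last `overlap` chars
    pos += step
  end
  chunks
end
```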
Vector Search
Semantic similarity search using pgvector in your existing PostgreSQL database. No new infrastructure, no third-party vector DB bill. Embeddings stored and queried where your data already lives.
pgvector · Semantic search · PostgreSQL-native
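Getting there is a single migration, sketched below assuming the pgvector extension and the neighbor gem. The 1536 dimension is an assumption (it matches OpenAI's small embedding models); use whatever your embedding model emits.

```ruby
# Hypothetical migration: enable pgvector, add an embedding column,
# and index it for approximate nearest-neighbour search.
class AddEmbeddingToDocuments < ActiveRecord::Migration[7.1]
  def change
    enable_extension "vector"
    add_column :documents, :embedding, :vector, limit: 1536
    add_index :documents, :embedding, using: :hnsw, opclass: :vector_cosine_ops
  end
end
```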
AI Agents & Tool Calling
Multi-step reasoning agents that can call your app's own functions. Search the database, send an email, update a record, then report back. Controllable, observable, with human-in-the-loop checkpoints.
Function calling · Tool use · Human-in-the-loop
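The core of an agent like this is a tool-dispatch step with an approval gate in front of it. A hedged sketch, with the tool registry and approver interface as assumptions; real agents would parse the provider's function-call format before reaching this point.

```ruby
# Dispatches a model-requested tool call, but only after a checkpoint
# (a human reviewer, or an automatic policy) approves it.
class ToolAgent
  def initialize(tools:, approver: ->(_name, _args) { true })
    @tools = tools        # e.g. { "search" => ->(args) { ... } }
    @approver = approver  # human-in-the-loop checkpoint
  end

  def dispatch(name, args)
    tool = @tools.fetch(name) { raise ArgumentError, "unknown tool: #{name}" }
    return "skipped by reviewer" unless @approver.call(name, args)

    tool.call(args)
  end
end
```

Every call goes through one choke point, which is what makes the agent observable: log inside `dispatch` and you have a full audit trail of what the model did.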
Prompt Engineering
Production prompts are software, not chat messages. We version them, test them against evals, and iterate based on real failure cases. No prompt spaghetti six months in.
Prompt versioning · Evals · System prompt design
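"Prompts as software" can be as simple as a versioned registry instead of inline strings. The registry shape below is an assumption; the point is that each version is addressable, diffable, and testable in isolation.

```ruby
# Prompts live as versioned data. Rolling back a bad prompt is a
# one-character change, and evals can pin the exact version they test.
PROMPTS = {
  "summarize" => {
    "v1" => "Summarize the following text:\n%{text}",
    "v2" => "Summarize in three bullet points, citing the source:\n%{text}"
  }
}.freeze

def render_prompt(name, version, **vars)
  PROMPTS.fetch(name).fetch(version) % vars
end
```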
Cost & Latency Control
Token usage, latency, and cost per call tracked from day one. Caching repeated queries, batching where possible, routing cheap queries to cheaper models. AI that doesn't burn the budget.
Token budgeting · Response caching · Model routing
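Routing and caching compose naturally in one service. A sketch under stated assumptions: the length-based routing heuristic and the in-memory cache are placeholders for whatever classifier and cache store a real app uses.

```ruby
# Routes short prompts to a cheap model, long ones to a strong model,
# and caches responses so repeated queries cost nothing.
class ModelRouter
  def initialize(cheap:, strong:, threshold: 200)
    @cheap, @strong, @threshold = cheap, strong, threshold
    @cache = {}
  end

  def complete(prompt)
    @cache[prompt] ||= begin
      model = prompt.length < @threshold ? @cheap : @strong
      model.call(prompt)
    end
  end
end
```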

The tools we actually use in production, not the ones that demo well.

  • OpenAI / Anthropic
    gpt-4o  ·  stream: true
    Streaming responses with real-time token counting. Built as a proper service layer with retries, fallback providers, and structured output parsing.
  • pgvector / Embeddings
    # app/models/document.rb
    has_neighbors :embedding

    # Semantic search at query time
    Document.nearest_neighbors(
      :embedding,
      query_vec,
      distance: "cosine"
    ).limit(8)
    Embeddings stored in the same PostgreSQL database your app already uses. No new infrastructure, no extra operational cost.
  • RAG Pipeline
    📄 Ingest → ✂️ Chunk → 📐 Embed → 🔍 Retrieve → 💬 Generate
    Each stage is a testable unit. Chunking strategy, embedding model, retrieval k, and prompt template tuned independently against an eval set.
  • Rails + AI Services
    # app/services/ai/summarizer.rb
    class AI::Summarizer
      def call(text, model: :claude)
        client = provider_for(model)
        client.complete(prompt(text))
      end
    end
    Provider-agnostic service layer. Swap models without touching callers. Tested with fixtures, not live API calls.
  • Evals & Cost Monitoring
    AI PIPELINE METRICS
    Avg latency: 840ms  ·  Cost / call: $0.004  ·  Eval pass rate: 94%
    You know your token cost before the month-end bill. Eval pass rates tracked against a fixed test set — regression alerts when accuracy drops.
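Knowing the cost per call before the bill arrives takes nothing more than recording token counts as they come back from the provider. A minimal sketch; the per-1K-token prices below are placeholders, not any provider's real rates.

```ruby
# Accumulates per-call cost from token counts, using configured
# input/output prices per 1,000 tokens.
class CostTracker
  attr_reader :total

  def initialize(input_per_1k:, output_per_1k:)
    @in_rate, @out_rate = input_per_1k, output_per_1k
    @total = 0.0
  end

  # Returns this call's cost and adds it to the running total.
  def record(input_tokens:, output_tokens:)
    cost = (input_tokens / 1000.0) * @in_rate +
           (output_tokens / 1000.0) * @out_rate
    @total += cost
    cost
  end
end
```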
01
What is the scope of your return policy?
context policy.pdf · orders_table · faq_docs
Define

We start with the problem, not the model. What question needs answering? What data exists? What does a good answer look like? Context boundaries set here prevent hallucinations later.

02
returns_within_30d: PASS
international_shipping: PASS
gift_card_refund: FAIL
Evaluate

We build an eval set from real user questions before writing the first prompt. Every iteration is measured. When the score stops going up, we stop and tell you what it can and can't do.
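An eval runner can start very small. The sketch below checks answers with a substring match, which is an assumption for illustration; real evals typically use graded rubrics or a judge model.

```ruby
# Runs every eval case through the answering function and reports
# per-case pass/fail plus an overall pass rate.
def run_evals(cases, answerer)
  results = cases.map do |c|
    answer = answerer.call(c[:question])
    { name: c[:name], pass: answer.include?(c[:expect]) }
  end
  pass_rate = results.count { |r| r[:pass] }.fdiv(results.size)
  [results, pass_rate]
end
```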

03
Deploying AI pipeline v2.4 →
Kamal deploy · Eval gate passed
Ship

AI features deploy through the same Kamal pipeline as everything else. No model deployments detached from the app. Eval gate in CI — if accuracy drops, the deploy blocks.
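The eval gate itself is a few lines run in CI before the deploy step. A sketch; the 90% floor is a placeholder threshold, set per project.

```ruby
# Fails the build when the eval pass rate drops below the agreed floor.
def eval_gate(pass_rate, floor: 0.9)
  if pass_rate < floor
    raise "eval gate failed: #{(pass_rate * 100).round}% < #{(floor * 100).round}%"
  end
  true
end
```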

04
day 1: $9.40 · day 7: $5.90 · day 14: $2.10
Optimise

Cache hit rates, cost per user, token savings from smarter chunking. We watch the numbers weekly and come back with a second pass once real traffic patterns emerge.
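Cache hit rate is only a number you can watch if the cache counts its own traffic. A minimal sketch of that instrumentation; in a Rails app this logic would typically wrap `Rails.cache` rather than an in-memory hash.

```ruby
# A fetch-through cache that tracks hits and misses, so hit rate is
# an observable metric rather than a guess.
class TrackedCache
  attr_reader :hits, :misses

  def initialize
    @store, @hits, @misses = {}, 0, 0
  end

  def fetch(key)
    if @store.key?(key)
      @hits += 1
      @store[key]
    else
      @misses += 1
      @store[key] = yield   # compute once, serve from cache after
    end
  end

  def hit_rate
    total = @hits + @misses
    total.zero? ? 0.0 : @hits.fdiv(total)
  end
end
```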

Still copying from ChatGPT?

Tell us where AI would save your users time. We'll tell you honestly what's worth building and what's a distraction.