Groundwork

Wired to Think.

Ship AI-powered features that actually work in production. LLM pipelines, RAG systems, and smart automation wired into your Rails app.

LLM Integration · RAG Pipelines · OpenAI / Anthropic · Vector Search

Most AI features fail in production because they were prototyped in a notebook and never engineered.

LLM Integration
Wire GPT-4, Claude, Gemini into real product flows. Streaming responses, token budgeting, retry with fallback providers. Built as proper services with tests, not inline API calls.
OpenAI / Anthropic · Streaming · Fallback providers
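A minimal sketch of the fallback pattern described above. The provider objects and their `complete` interface are illustrative, not any specific gem's API; a production version would add exponential backoff and per-provider timeouts.

```ruby
# Tries providers in preference order; retries each a few times before
# falling back to the next. Raises only when every provider is exhausted.
class LlmCompletion
  ProviderError = Class.new(StandardError)

  def initialize(providers:, max_retries: 2)
    @providers = providers      # ordered: preferred provider first
    @max_retries = max_retries
  end

  def call(prompt)
    @providers.each do |provider|
      attempts = 0
      begin
        return provider.complete(prompt)
      rescue ProviderError
        attempts += 1
        retry if attempts <= @max_retries
        # retries exhausted: fall through to the next provider
      end
    end
    raise ProviderError, "all providers failed"
  end
end
```

Because the service owns the retry loop, callers never see a transient provider outage, and swapping the preferred provider is a one-line change.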
RAG Pipelines
Retrieval-augmented generation with your data. Ingest documents, chunk intelligently, embed and store, then retrieve only what's relevant at query time. Answers grounded in fact, not hallucination.
Document ingestion · Chunking strategy · Grounded responses
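To make "chunk intelligently" concrete, here is one common baseline strategy: fixed-size windows with overlap, so a fact straddling a boundary still lands whole in at least one chunk. The sizes are placeholders; in practice they get tuned against an eval set.

```ruby
# Split text into overlapping fixed-size chunks. `size` and `overlap`
# are character counts here; token-based chunking works the same way.
def chunk_text(text, size: 500, overlap: 50)
  raise ArgumentError, "overlap must be smaller than size" if overlap >= size

  step = size - overlap
  chunks = []
  pos = 0
  while pos < text.length
    chunks << text[pos, size]   # each chunk repeats the last `overlap` chars
    pos += step
  end
  chunks
end
```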
Vector Search
Semantic similarity search using pgvector in your existing PostgreSQL database. No new infrastructure, no third-party vector DB bill. Embeddings stored and queried where your data already lives.
pgvector · Semantic search · PostgreSQL-native
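Getting there is a single migration, sketched below assuming the pgvector extension and the neighbor gem. The 1536 dimension is an assumption (it matches OpenAI's small embedding models); use whatever your embedding model emits.

```ruby
# Hypothetical migration: enable pgvector, add an embedding column,
# and index it for approximate nearest-neighbour search.
class AddEmbeddingToDocuments < ActiveRecord::Migration[7.1]
  def change
    enable_extension "vector"
    add_column :documents, :embedding, :vector, limit: 1536
    add_index :documents, :embedding, using: :hnsw, opclass: :vector_cosine_ops
  end
end
```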
AI Agents & Tool Calling
Multi-step reasoning agents that can call your app's own functions. Search the database, send an email, update a record, then report back. Controllable, observable, with human-in-the-loop checkpoints.
Function calling · Tool use · Human-in-the-loop
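The core of an agent like this is a tool-dispatch step with an approval gate in front of it. A hedged sketch, with the tool registry and approver interface as assumptions; real agents would parse the provider's function-call format before reaching this point.

```ruby
# Dispatches a model-requested tool call, but only after a checkpoint
# (a human reviewer, or an automatic policy) approves it.
class ToolAgent
  def initialize(tools:, approver: ->(_name, _args) { true })
    @tools = tools        # e.g. { "search" => ->(args) { ... } }
    @approver = approver  # human-in-the-loop checkpoint
  end

  def dispatch(name, args)
    tool = @tools.fetch(name) { raise ArgumentError, "unknown tool: #{name}" }
    return "skipped by reviewer" unless @approver.call(name, args)

    tool.call(args)
  end
end
```

Every call goes through one choke point, which is what makes the agent observable: log inside `dispatch` and you have a full audit trail of what the model did.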
Prompt Engineering
Production prompts are software, not chat messages. We version them, test them against evals, and iterate based on real failure cases. No prompt spaghetti six months in.
Prompt versioning · Evals · System prompt design
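"Prompts as software" can be as simple as a versioned registry instead of inline strings. The registry shape below is an assumption; the point is that each version is addressable, diffable, and testable in isolation.

```ruby
# Prompts live as versioned data. Rolling back a bad prompt is a
# one-character change, and evals can pin the exact version they test.
PROMPTS = {
  "summarize" => {
    "v1" => "Summarize the following text:\n%{text}",
    "v2" => "Summarize in three bullet points, citing the source:\n%{text}"
  }
}.freeze

def render_prompt(name, version, **vars)
  PROMPTS.fetch(name).fetch(version) % vars
end
```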
Cost & Latency Control
Token usage, latency, and cost per call tracked from day one. Caching repeated queries, batching where possible, routing cheap queries to cheaper models. AI that doesn't burn the budget.
Token budgeting · Response caching · Model routing
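Routing and caching compose naturally in one service. A sketch under stated assumptions: the length-based routing heuristic and the in-memory cache are placeholders for whatever classifier and cache store a real app uses.

```ruby
# Routes short prompts to a cheap model, long ones to a strong model,
# and caches responses so repeated queries cost nothing.
class ModelRouter
  def initialize(cheap:, strong:, threshold: 200)
    @cheap, @strong, @threshold = cheap, strong, threshold
    @cache = {}
  end

  def complete(prompt)
    @cache[prompt] ||= begin
      model = prompt.length < @threshold ? @cheap : @strong
      model.call(prompt)
    end
  end
end
```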

The tools we actually use in production, not the ones that demo well.

  • OpenAI / Anthropic
    gpt-4o  ·  stream: true
    Streaming responses with real-time token counting. Built as a proper service layer with retries, fallback providers, and structured output parsing.
  • pgvector / Embeddings
    # app/models/document.rb
    has_neighbors :embedding

    # Semantic search at query time
    Document.nearest_neighbors(
      :embedding,
      query_vec,
      distance: "cosine"
    ).limit(8)
    Embeddings stored in the same PostgreSQL database your app already uses. No new infrastructure, no extra operational cost.
  • RAG Pipeline
    📄 Ingest → ✂️ Chunk → 📐 Embed → 🔍 Retrieve → 💬 Generate
    Each stage is a testable unit. Chunking strategy, embedding model, retrieval k, and prompt template tuned independently against an eval set.
  • Rails + AI Services
    # app/services/ai/summarizer.rb
    class AI::Summarizer
      def call(text, model: :claude)
        client = provider_for(model)
        client.complete(prompt(text))
      end
    end
    Provider-agnostic service layer. Swap models without touching callers. Tested with fixtures, not live API calls.
  • Evals & Cost Monitoring
    AI PIPELINE METRICS
    Avg latency: 840ms  ·  Cost / call: $0.004  ·  Eval pass rate: 94%
    You know your token cost before the month-end bill. Eval pass rates tracked against a fixed test set — regression alerts when accuracy drops.
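Knowing the cost per call before the bill arrives takes nothing more than recording token counts as they come back from the provider. A minimal sketch; the per-1K-token prices below are placeholders, not any provider's real rates.

```ruby
# Accumulates per-call cost from token counts, using configured
# input/output prices per 1,000 tokens.
class CostTracker
  attr_reader :total

  def initialize(input_per_1k:, output_per_1k:)
    @in_rate, @out_rate = input_per_1k, output_per_1k
    @total = 0.0
  end

  # Returns this call's cost and adds it to the running total.
  def record(input_tokens:, output_tokens:)
    cost = (input_tokens / 1000.0) * @in_rate +
           (output_tokens / 1000.0) * @out_rate
    @total += cost
    cost
  end
end
```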
01
What is the scope of your return policy?
context policy.pdf · orders_table · faq_docs
Define

We start with the problem, not the model. What question needs answering? What data exists? What does a good answer look like? Context boundaries set here prevent hallucinations later.

02
returns_within_30d: PASS
international_shipping: PASS
gift_card_refund: FAIL
Evaluate

We build an eval set from real user questions before writing the first prompt. Every iteration is measured. When the score stops going up, we stop and tell you what it can and can't do.
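An eval runner can start very small. The sketch below checks answers with a substring match, which is an assumption for illustration; real evals typically use graded rubrics or a judge model.

```ruby
# Runs every eval case through the answering function and reports
# per-case pass/fail plus an overall pass rate.
def run_evals(cases, answerer)
  results = cases.map do |c|
    answer = answerer.call(c[:question])
    { name: c[:name], pass: answer.include?(c[:expect]) }
  end
  pass_rate = results.count { |r| r[:pass] }.fdiv(results.size)
  [results, pass_rate]
end
```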

03
Deploying AI pipeline v2.4 →
Kamal deploy · Eval gate passed
Ship

AI features deploy through the same Kamal pipeline as everything else. No model deployments detached from the app. Eval gate in CI — if accuracy drops, the deploy blocks.
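The eval gate itself is a few lines run in CI before the deploy step. A sketch; the 90% floor is a placeholder threshold, set per project.

```ruby
# Fails the build when the eval pass rate drops below the agreed floor.
def eval_gate(pass_rate, floor: 0.9)
  if pass_rate < floor
    raise "eval gate failed: #{(pass_rate * 100).round}% < #{(floor * 100).round}%"
  end
  true
end
```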

04
day 1: $9.40 · day 7: $5.90 · day 14: $2.10
Optimise

Cache hit rates, cost per user, token savings from smarter chunking. We watch the numbers weekly and come back with a second pass once real traffic patterns emerge.
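Cache hit rate is only a number you can watch if the cache counts its own traffic. A minimal sketch of that instrumentation; in a Rails app this logic would typically wrap `Rails.cache` rather than an in-memory hash.

```ruby
# A fetch-through cache that tracks hits and misses, so hit rate is
# an observable metric rather than a guess.
class TrackedCache
  attr_reader :hits, :misses

  def initialize
    @store, @hits, @misses = {}, 0, 0
  end

  def fetch(key)
    if @store.key?(key)
      @hits += 1
      @store[key]
    else
      @misses += 1
      @store[key] = yield   # compute once, serve from cache after
    end
  end

  def hit_rate
    total = @hits + @misses
    total.zero? ? 0.0 : @hits.fdiv(total)
  end
end
```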

Still copying from ChatGPT?

Tell us where AI would save your users time. We'll tell you honestly what's worth building and what's a distraction.