
# Consuming Items Hub

`docs/CONSUMING_ITEMS_HUB.md`

**Who this is for.** You (or a teammate, or an LLM agent, or another app like Brickston) want to read from the Items Hub data lake once sources have landed rows in the inbox and the extractor has normalized them. This page shows the three supported access patterns, with working examples.

**The short version.** Three ways in, ranked by coupling:

  1. Direct SQL to Neon — tightest, lowest-latency, full expressive power. For services you own end-to-end (Brickston, internal scripts).
  2. REST `/api/*` on the Items Hub Vercel deployment — safest for external or LLM consumers. Serverless, rate-limited by Vercel, no secrets sprayed around.
  3. Brickston's proxy router (`/api/items-hub/*` on the Brickston FastAPI) — only relevant if you're already inside Brickston's auth/RBAC surface. This is how Brickston surfaces Items Hub context on property / entity / deal pages.

All three read the same Neon schema. There is one source of truth (`items.*` in Neon); these are just three ways to get at it.


## Pattern 1 — Direct SQL to Neon

**When to use:** internal services that already have Neon creds (Brickston backend, admin scripts, scheduled-task runners). Shortest path, full SQL expressivity.

**Credentials.** `DATABASE_URL` for runtime pooled traffic, `DATABASE_URL_UNPOOLED` for migrations / long transactions. Read-only? Make a dedicated Neon role — the project's pooled connection is owner-level and should not leave trusted boundaries.

```bash
# Mac / Linux quick one-liner
URL="postgresql://…"  # from .env.local or Keychain
psql "$URL" -c "SELECT display_name, tenant, status FROM items.source_config ORDER BY sort_order;"
```

### The tables (stable contract)

### Useful query cookbook

```sql
-- All open tasks for a tenant, prioritized the way the dashboard does
SELECT t.title, t.due_date, t.priority_score, e.name AS entity
FROM items.tasks t
LEFT JOIN items.entities e ON e.id = t.entity_id
WHERE t.status = 'open' AND t.tenant = 'lfi'
ORDER BY
  CASE WHEN t.due_date < NOW() THEN 0 ELSE 1 END,
  t.due_date ASC NULLS LAST,
  COALESCE(t.priority_score, 0) DESC;
```
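If a consumer wants the same ordering client-side, the `ORDER BY` above translates to a sort key like this — a sketch; the sample rows and function name are made up:

```python
from datetime import date

def priority_key(task: dict, today: date):
    """Mirror the SQL ORDER BY: overdue first, then due date ascending
    with missing due dates last, then priority_score descending."""
    due = task.get("due_date")
    overdue = 0 if (due is not None and due < today) else 1
    # Python can't compare None with dates, so push missing due dates to the end.
    due_key = (due is None, due or date.max)
    score = -(task.get("priority_score") or 0)
    return (overdue, due_key, score)

tasks = [
    {"title": "File permit", "due_date": date(2024, 1, 5), "priority_score": 2},
    {"title": "No due date", "due_date": None, "priority_score": 9},
    {"title": "Overdue rent demand", "due_date": date(2023, 12, 1), "priority_score": 1},
]
ordered = sorted(tasks, key=lambda t: priority_key(t, today=date(2024, 1, 1)))
# Overdue rent demand, then File permit, then No due date
```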

```sql
-- Everything that mentions an entity (name match, tenant-scoped)
WITH target AS (
  SELECT id FROM items.entities
  WHERE tenant = 'mosser' AND name ILIKE '%Ji Won%' LIMIT 1
)
SELECT 'task'     AS kind, t.title AS body, t.created_at FROM items.tasks t
  WHERE t.entity_id = (SELECT id FROM target)
UNION ALL
SELECT 'decision' AS kind, d.title, d.created_at FROM items.decisions d
  WHERE (SELECT id FROM target) = ANY(d.entity_ids)
UNION ALL
SELECT 'document' AS kind, dc.title, dc.extracted_at FROM items.documents dc
  WHERE (SELECT id FROM target) = ANY(dc.entity_ids)
ORDER BY 3 DESC LIMIT 100;
```

```sql
-- Semantic nearest-neighbor over everything (pgvector)
-- Requires: the query vector as text like '[0.123,-0.456,...]' (1536 floats)
SELECT source_table, source_id, text,
       1 - (embedding <=> $1::vector) AS similarity
FROM items.embeddings
WHERE tenant IN ('lfi','mosser')
ORDER BY embedding <=> $1::vector
LIMIT 20;
```
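The `$1` parameter above is the query embedding rendered as a pgvector text literal. A small helper for building one — a sketch; the helper name is illustrative, and the 1536-dim check comes from the comment above:

```python
def to_pgvector_literal(vec: list[float]) -> str:
    """Format a float list as the '[f1,f2,...]' text literal pgvector accepts."""
    if len(vec) != 1536:
        raise ValueError(f"expected 1536 dims, got {len(vec)}")
    # repr() keeps full float precision; pgvector parses plain decimal notation.
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

literal = to_pgvector_literal([0.0] * 1536)
# pass `literal` as the $1 bind parameter, cast server-side with ::vector
```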

```sql
-- Extractor queue depth per source
SELECT c.display_name, c.status,
       COUNT(*) FILTER (WHERE ii.processed_at IS NULL) AS queue,
       COUNT(*) FILTER (WHERE ii.error IS NOT NULL)     AS errors
FROM items.source_config c
LEFT JOIN items.inbox_items ii ON ii.source = c.source
GROUP BY c.display_name, c.status
ORDER BY queue DESC;
```

### Pooled vs unpooled


## Pattern 2 — REST `/api/*` on Items Hub

**When to use:** any consumer that shouldn't have Neon creds — LLM agents, third-party dashboards, mobile clients, curl from a workstation. These endpoints are the stable public contract.

**Base URL.** `https://lfiqdataflow-git-main-lfiq.vercel.app`

**Auth.** Read endpoints (search, sources, overall) are public for now. Write endpoints (`POST /api/ingest/*`) require `x-ingest-secret: $INGEST_SECRET`. If you expose this deployment externally, gate reads with Vercel Auth or an Edge-middleware check.

### Available endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/api/health` | Liveness + DB ping. |
| GET | `/api/overall` | Aggregate counters the home header uses (today / 7d / queue / open tasks / …). |
| GET | `/api/sources` | One row per source in `source_config`, with recent counts + the health shape the dashboard grid consumes. |
| GET | `/api/search?q=&tenant=&table=&since_days=&limit=` | Semantic search. Returns `source_db`/`source_table`/`source_id`/`text`/`similarity` hits; caller hydrates from the owning table. |
| POST | `/api/search` with `{ query, tenant?, table?, since_days?, limit?, entity_id? }` | Same as GET but body-based; used by Brickston's proxy. |
| POST | `/api/extract` with `{ limit? }` | Manually drain the inbox queue (the Mac-local scheduled task uses this; external callers normally don't). |
| GET | `/api/priorities/digest?format=html\|json&tenant=?` | Daily priorities digest. `html` returns email-safe inline-styled HTML (fetch from PKM-AI-Workflow → forward to Outlook); `json` returns `{tasks, commitments, counts}`. Public read, no auth. |
| POST | `/api/ingest/<source>` | Write rows into `items.inbox_items`. Secret required. This is how data gets in, not out. |
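A write through `POST /api/ingest/<source>` looks like this. A sketch that only assembles the request — `build_ingest_request` and the payload shape are illustrative; the header name and path come from the table above, and actually sending it (e.g. with `httpx.post`) is left to the caller:

```python
import json
import os

BASE = "https://lfiqdataflow-git-main-lfiq.vercel.app"

def build_ingest_request(source: str, payload: dict, secret: str):
    """Assemble the URL, headers, and JSON body for POST /api/ingest/<source>."""
    url = f"{BASE}/api/ingest/{source}"
    headers = {"content-type": "application/json", "x-ingest-secret": secret}
    return url, headers, json.dumps(payload)

url, headers, body = build_ingest_request(
    "market",
    {"summary": "example row"},               # payload shape is illustrative
    os.environ.get("INGEST_SECRET", "dev"),   # real secret comes from the env
)
```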

### Example — semantic search from a shell

```bash
curl -sS 'https://lfiqdataflow-git-main-lfiq.vercel.app/api/search?q=Ji+Won+follow+up+on+870+Oak&tenant=lfi&limit=5' \
  | jq '.results[] | { similarity, source_table, occurred_at, text }'
```

### Example — semantic search from Python

```python
import httpx

r = httpx.post(
    "https://lfiqdataflow-git-main-lfiq.vercel.app/api/search",
    json={
        "query": "870 Oak loan draw schedule",
        "tenant": "lfi",
        "since_days": 180,
        "limit": 10,
    },
    timeout=30,
)
for hit in r.json()["results"]:
    print(f"{hit['similarity']:.3f}  {hit['source_table']}  {hit['text'][:120]}")
```
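Search returns pointers (`source_table`, `source_id`) plus snippet text; the caller hydrates full rows from the owning table. One way to do that efficiently is to bucket hits per table first, so each table needs one `WHERE id = ANY($1)` query instead of N point lookups. A sketch with made-up hits:

```python
from collections import defaultdict

def group_for_hydration(hits: list[dict]) -> dict[str, list[str]]:
    """Bucket search hits by source_table for batched hydration queries."""
    by_table: dict[str, list[str]] = defaultdict(list)
    for hit in hits:
        by_table[hit["source_table"]].append(hit["source_id"])
    return dict(by_table)

hits = [
    {"source_table": "tasks", "source_id": "t1"},
    {"source_table": "documents", "source_id": "d1"},
    {"source_table": "tasks", "source_id": "t2"},
]
buckets = group_for_hydration(hits)
# {"tasks": ["t1", "t2"], "documents": ["d1"]}
```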

### Example — embed an Items Hub source card in another app

```ts
const res = await fetch("https://lfiqdataflow-git-main-lfiq.vercel.app/api/sources");
const { rows } = await res.json();
const granola = rows.find((r: any) => r.source === "granola");
// { displayName, lastSuccessAt, last7dRecords, backlog, ... } — render directly
```

## Pattern 3 — Brickston proxy (`/api/items-hub/*`)

**When to use:** you're building inside Brickston and want to surface Items Hub context on existing pages. The proxy wraps direct Neon reads + the Items Hub search endpoint so Brickston can enforce its own RBAC and session auth before letting a user see any rows.

**Where it lives.** Brickston backend at `02-brickston2.ai/backend/app/routers/items_hub.py`. Requires two env vars to be set on the Brickston backend:

```bash
BRICKSTON_ITEMS_HUB_DATABASE_URL=postgresql://…   # same Neon URL as Items Hub
BRICKSTON_ITEMS_HUB_INGEST_SECRET=…               # to proxy writes if ever needed
```

### Endpoints (on the Brickston backend)

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/api/items-hub/health` | Confirms the second asyncpg pool is alive. |
| GET | `/api/items-hub/entity/{name}/activity?since_days=90&limit=50` | Fuzzy-match entity by name; return open tasks + decisions + recent inbox mentions. |
| GET | `/api/items-hub/property/{code}/activity?since_days=180&limit=50` | Same shape, keyed on property code / address. |
| GET | `/api/items-hub/tasks/open?tenant=&limit=100` | Open-task roll-up. |
| POST | `/api/items-hub/search` | Proxies Items Hub `/api/search`. |
| POST | `/api/items-hub/events/market` | Proxies writes to `/api/ingest/market` with the ingest secret injected server-side. |

### How Brickston UI consumes it

`apps/web/lib/api/hooks/use-items-hub.ts` exposes TanStack Query hooks:

```tsx
import { useEntityActivity, usePropertyActivity, useItemsHubSearch } from "@/lib/api/hooks/use-items-hub";

// On a property page:
<ItemsHubActivity propertyCode={property.name} sinceDays={180} />
```

The `ItemsHubActivity` component (`apps/web/components/domain/items-hub-activity.tsx`) renders a two-column "Activity" tab with open tasks + decisions + recent mentions. Graceful fallback: if the backend is missing `BRICKSTON_ITEMS_HUB_DATABASE_URL`, the tab shows a "not configured" hint instead of 500-ing.


## Writing into Items Hub (for completeness)

Any new source follows the convergent pattern:

  1. A trigger runs (Anthropic cloud / Windows Claude Desktop / Mac login-item).
  2. The trigger POSTs JSON to `/api/ingest/<source>` with `x-ingest-secret`.
  3. The row lands in `items.inbox_items` with a source-specific idempotency key.
  4. The Mac-local scheduled task `items-hub-extract` drains the queue every 10 minutes.
  5. Normalized rows land in `items.tasks` / `commitments` / `decisions` / `entities` / `documents`, plus a row in `items.embeddings`.
  6. Dashboard + consumers see the new data on the next query.
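Step 3 depends on a source-specific idempotency key so a re-delivered record upserts instead of duplicating. The actual scheme lives in each ingest route; a plausible sketch (function name and key format are hypothetical) hashes the source plus the upstream record's stable id:

```python
import hashlib

def idempotency_key(source: str, external_id: str) -> str:
    """Hypothetical key scheme: stable per (source, external id), so the same
    upstream record re-POSTed maps to the same inbox row."""
    digest = hashlib.sha256(f"{source}:{external_id}".encode()).hexdigest()
    return f"{source}:{digest[:16]}"

k1 = idempotency_key("granola", "meeting-2024-01-05T09:00")
k2 = idempotency_key("granola", "meeting-2024-01-05T09:00")
assert k1 == k2  # deterministic: re-ingesting the same record is a no-op
```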

New source checklist (for future expansion):

  1. Add a row to `app/lib/sources.ts` with `status: 'designed'` and the next free `sort_order`.
  2. Run `npm run bootstrap:sources` (upserts into `items.source_config`).
  3. Add a new `/api/ingest/<source>/route.ts`, or reuse `/api/ingest/market` if the payload is a summary-style blob.
  4. Add a payload kind clause to the extractor's prompt dispatch in `app/lib/extractor.ts`.
  5. Register a trigger (cloud scheduled task, Windows Claude Desktop, or LaunchAgent) that POSTs to the new endpoint.
  6. The first successful POST auto-promotes the card to `live`.
  7. Update `docs/DATA_DICTIONARY.md` with the new source's row + payload contract.

## When things look broken


## See also