Dashboard

The Neon dashboard provides real-time visibility into agent execution, evaluation results, and performance trends.

Overview

The dashboard is built with Next.js 15 and React 19, providing:

  • Trace visualization — Hierarchical span trees with timing
  • Evaluation tracking — Real-time progress and results
  • Score analytics — Trends, distributions, and comparisons
  • Component analysis — Cross-component correlation

Pages

Home Dashboard (/)

The main dashboard shows:

  • Recent Traces — Latest agent executions with status
  • Active Runs — In-progress evaluation runs
  • Score Summary — Pass rates and trends
  • Quick Filters — Time range, agent, status
┌─────────────────────────────────────────────────────────────┐
│  Neon Dashboard                                    [Filters]│
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │ Traces      │  │ Pass Rate   │  │ Avg Score   │         │
│  │ 1,234       │  │ 87%         │  │ 0.82        │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
│                                                             │
│  Recent Traces                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ agent-run-001  │ ✓ ok  │ 1.2s  │ 5 spans │ 2m ago   │  │
│  │ agent-run-002  │ ✗ err │ 3.4s  │ 8 spans │ 5m ago   │  │
│  │ agent-run-003  │ ✓ ok  │ 0.8s  │ 3 spans │ 8m ago   │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                             │
│  Active Evaluation Runs                                     │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ core-tests  │ ████████░░ 80%  │ 8/10 cases │ 2m     │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Trace Viewer (/traces/[id])

Detailed view of a single trace execution:

Span Tree — Hierarchical view of all operations

agent-run (1.2s)
├── planning (50ms)
├── generation: llm-call (800ms)
│   ├── model: claude-3-5-sonnet
│   └── tokens: 150 → 200
├── tool: web-search (200ms)
│   ├── status: success
│   └── results: 5
└── generation: response (150ms)
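A span tree like this one can be rebuilt from a flat list of spans in which each span records its parent. The sketch below is illustrative only: the `Span` shape (`id`, `parentId`, `name`, `durationMs`) and the `renderTree` helper are hypothetical, not the dashboard's actual schema or code.

```typescript
// Hypothetical flat span record; parentId is null for the root span.
interface Span {
  id: string
  parentId: string | null
  name: string
  durationMs: number
}

// Render spans as an indented tree using box-drawing connectors.
function renderTree(spans: Span[]): string {
  // Index children by parent id, keeping the input order of siblings.
  const byParent = new Map<string | null, Span[]>()
  for (const span of spans) {
    const siblings = byParent.get(span.parentId) ?? []
    siblings.push(span)
    byParent.set(span.parentId, siblings)
  }

  const lines: string[] = []
  // head: connector printed before this span; indent: prefix for its children
  const walk = (span: Span, head: string, indent: string): void => {
    lines.push(`${head}${span.name} (${span.durationMs}ms)`)
    const children = byParent.get(span.id) ?? []
    children.forEach((child, i) => {
      const last = i === children.length - 1
      walk(child, indent + (last ? '└── ' : '├── '), indent + (last ? '    ' : '│   '))
    })
  }
  for (const root of byParent.get(null) ?? []) walk(root, '', '')
  return lines.join('\n')
}

const example: Span[] = [
  { id: 'root', parentId: null, name: 'agent-run', durationMs: 1200 },
  { id: 's1', parentId: 'root', name: 'planning', durationMs: 50 },
  { id: 's2', parentId: 'root', name: 'llm-call', durationMs: 800 },
  { id: 's3', parentId: 'root', name: 'web-search', durationMs: 200 },
  { id: 's4', parentId: 'root', name: 'response', durationMs: 150 },
]
console.log(renderTree(example))
```

Indexing children once up front keeps the render linear in the number of spans, regardless of how deep the tree is.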

Timeline — Waterfall visualization

|--planning--|
             |--------llm-call--------|
                                      |--search--|
                                                 |--response--|
0ms         50ms                     850ms     1050ms       1200ms
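Each bar in the waterfall is positioned by its start offset from the beginning of the trace and sized by its duration. A minimal sketch of that layout step, assuming spans carry hypothetical `startMs` (relative to the trace start) and `durationMs` fields and that bars are drawn on a fixed number of text columns; `layoutWaterfall` is an illustrative helper, not the `TraceTimeline` implementation.

```typescript
// Hypothetical span timing, relative to the start of the trace.
interface TimedSpan {
  name: string
  startMs: number
  durationMs: number
}

interface Bar {
  name: string
  offset: number // empty columns before the bar
  width: number // bar length in columns, at least 1 so short spans stay visible
}

// Scale every span onto `columns` text columns covering the whole trace.
function layoutWaterfall(spans: TimedSpan[], columns: number): Bar[] {
  if (spans.length === 0) return []
  // Math.max(1, ...) avoids dividing by zero when all spans are empty.
  const totalMs = Math.max(1, ...spans.map((s) => s.startMs + s.durationMs))
  const scale = columns / totalMs
  return spans.map((s) => ({
    name: s.name,
    offset: Math.round(s.startMs * scale),
    width: Math.max(1, Math.round(s.durationMs * scale)),
  }))
}

const bars = layoutWaterfall(
  [
    { name: 'planning', startMs: 0, durationMs: 50 },
    { name: 'llm-call', startMs: 50, durationMs: 800 },
    { name: 'search', startMs: 850, durationMs: 200 },
    { name: 'response', startMs: 1050, durationMs: 150 },
  ],
  24,
)
for (const b of bars) console.log(' '.repeat(b.offset) + '#'.repeat(b.width) + ' ' + b.name)
```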

Span Details — Click any span to see:

  • Input/output content
  • Token counts and costs
  • Timing breakdown
  • Custom attributes
  • Associated scores

Scores Panel — Evaluation scores for this trace

┌────────────────────────────────────────┐
│ Scores                                 │
├────────────────────────────────────────┤
│ tool_selection    0.95  ████████████░ │
│ reasoning         0.82  █████████░░░░ │
│ grounding         0.78  ████████░░░░░ │
│ overall           0.85  █████████░░░░ │
└────────────────────────────────────────┘
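In this example, `overall` (0.85) equals the unweighted mean of the three scorer values: (0.95 + 0.82 + 0.78) / 3 = 0.85. The per-case Score column in the Run Detail results table follows the same pattern, for example (1.0 + 0.84) / 2 = 0.92. Treating the overall score as a plain average is an assumption drawn from these numbers; the real aggregation could weight scorers, and `overallScore` is a hypothetical helper.

```typescript
// Scorer name -> score in [0, 1], e.g. { tool_selection: 0.95 }.
type ScoreMap = Record<string, number>

// Unweighted mean of all scorer values, rounded to two decimals for display.
// Returns NaN when there are no scores to average.
function overallScore(scores: ScoreMap): number {
  const values = Object.values(scores)
  if (values.length === 0) return NaN
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length
  return Math.round(mean * 100) / 100
}

console.log(overallScore({ tool_selection: 0.95, reasoning: 0.82, grounding: 0.78 }))
```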

Trace Comparison (/traces/diff)

Compare two traces side-by-side:

┌─────────────────────────┬─────────────────────────┐
│ Baseline (v1.2.2)       │ Candidate (v1.2.3)      │
├─────────────────────────┼─────────────────────────┤
│ agent-run (1.5s)        │ agent-run (1.2s) ↓      │
│ ├── planning (80ms)     │ ├── planning (50ms) ↓   │
│ ├── llm-call (900ms)    │ ├── llm-call (800ms) ↓  │
│ ├── search (400ms)      │ ├── search (200ms) ↓    │
│ └── response (120ms)    │ └── response (150ms) ↑  │
├─────────────────────────┼─────────────────────────┤
│ Score: 0.78             │ Score: 0.85 ↑           │
└─────────────────────────┴─────────────────────────┘

Key differences are highlighted:

  • 🟢 Improvements (faster, higher score)
  • 🔴 Regressions (slower, lower score)
  • 🟡 Changed (different structure)
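The comparison can be sketched as a per-span diff. The version below matches spans by name, treats a shorter duration as an improvement and a longer one as a regression, and reports spans present on only one side as structural changes. Matching by name, the `SpanTimings` shape, and `diffTraces` are assumptions for illustration; spans with duplicate names would need a stronger key, such as their path in the tree.

```typescript
// Hypothetical map of span name -> duration in milliseconds.
type SpanTimings = Record<string, number>

type Change = 'improved' | 'regressed' | 'unchanged' | 'added' | 'removed'

interface SpanDiff {
  name: string
  baselineMs?: number
  candidateMs?: number
  change: Change
}

function diffTraces(baseline: SpanTimings, candidate: SpanTimings): SpanDiff[] {
  const names = Array.from(new Set([...Object.keys(baseline), ...Object.keys(candidate)]))
  return names.map((name): SpanDiff => {
    const b: number | undefined = baseline[name]
    const c: number | undefined = candidate[name]
    let change: Change
    if (b === undefined) change = 'added' // only in candidate: structure changed
    else if (c === undefined) change = 'removed' // only in baseline: structure changed
    else if (c < b) change = 'improved' // faster
    else if (c > b) change = 'regressed' // slower
    else change = 'unchanged'
    return { name, baselineMs: b, candidateMs: c, change }
  })
}

const diff = diffTraces(
  { planning: 80, 'llm-call': 900, search: 400, response: 120 },
  { planning: 50, 'llm-call': 800, search: 200, response: 150 },
)
for (const d of diff) console.log(`${d.name}: ${d.baselineMs}ms -> ${d.candidateMs}ms (${d.change})`)
```

With the timings from the comparison above, planning, llm-call, and search come out as improved and response as regressed.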

Evaluation Runs (/runs)

List of evaluation runs:

Run      Suite       Status        Progress  Pass Rate  Duration
run-001  core-tests  ✓ completed   10/10     90%        45s
run-002  regression  ⏳ running     5/20      80%
run-003  edge-cases  ✗ failed      3/10      30%        12s

Run Detail (/eval-runs/[id])

Detailed view of an evaluation run:

Progress Tracker — Real-time updates

┌──────────────────────────────────────────────────────┐
│ Evaluation Run: core-tests                           │
│ Status: Running                                      │
│                                                      │
│ Progress: ████████████░░░░░░░░ 60%                  │
│ Completed: 6/10 cases                               │
│ Passed: 5  Failed: 1                                │
│                                                      │
│ Elapsed: 2m 30s                                      │
└──────────────────────────────────────────────────────┘
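The tracker's figures follow directly from the per-case results: 6 of 10 cases completed gives 60%, and on a 20-cell bar that is 12 filled cells. A sketch of that arithmetic; `summarizeRun` and `progressBar` are hypothetical helpers, not the `EvalRunProgress` component's code.

```typescript
type CaseStatus = 'pass' | 'fail'

interface RunProgress {
  total: number
  completed: number
  passed: number
  failed: number
  percent: number // completed / total as a whole-number percentage
}

// Summarize the results of finished cases against the run's total case count.
function summarizeRun(results: CaseStatus[], total: number): RunProgress {
  const passed = results.filter((r) => r === 'pass').length
  const percent = total === 0 ? 0 : Math.round((results.length / total) * 100)
  return { total, completed: results.length, passed, failed: results.length - passed, percent }
}

// Text progress bar: filled cells proportional to percent, clamped to [0, cells].
function progressBar(percent: number, cells = 20): string {
  const filled = Math.min(cells, Math.max(0, Math.round((percent / 100) * cells)))
  return '█'.repeat(filled) + '░'.repeat(cells - filled)
}

const progress = summarizeRun(['pass', 'pass', 'fail', 'pass', 'pass', 'pass'], 10)
console.log(`Progress: ${progressBar(progress.percent)} ${progress.percent}%`)
console.log(`Completed: ${progress.completed}/${progress.total} cases`)
console.log(`Passed: ${progress.passed}  Failed: ${progress.failed}`)
```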

Results Table — Per-case breakdown

Case           Status  Score  tool_selection  llm_judge  Duration
weather-query  ✓ pass  0.92   1.0             0.84       1.2s
math-query     ✓ pass  0.88   1.0             0.76       0.8s
complex-query  ✗ fail  0.45   0.3             0.60       2.1s

Score Distribution — Histogram of scores


   8 │       ████
   6 │    ████████
   4 │ ████████████
   2 │ ████████████████
     └─────────────────────
       0.0  0.5  0.8  1.0
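A histogram like this comes from counting scores into bins. The sketch assumes equal-width bins over [0, 1]; the dashboard's actual bin edges are not documented (the axis above is labelled 0.0, 0.5, 0.8, 1.0), and `histogram` is a hypothetical helper.

```typescript
// Count scores in [0, 1] into `binCount` equal-width bins. Out-of-range values
// are clamped, and a score of exactly 1.0 lands in the last bin.
function histogram(scores: number[], binCount: number): number[] {
  if (!Number.isInteger(binCount) || binCount < 1) {
    throw new RangeError('binCount must be a positive integer')
  }
  const counts = new Array<number>(binCount).fill(0)
  for (const s of scores) {
    const clamped = Math.min(Math.max(s, 0), 1)
    const bin = Math.min(Math.floor(clamped * binCount), binCount - 1)
    counts[bin] += 1
  }
  return counts
}

console.log(histogram([0.1, 0.45, 0.5, 0.92, 1.0], 4))
```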

Analytics (/analytics)

Score trends and analysis over time.

Score Trends — Time series charts

Score over Time
1.0 │                    ╭───╮
0.8 │     ╭──────────────╯   ╰───
0.6 │ ────╯
0.4 │
    └──────────────────────────────
      Jan    Feb    Mar    Apr
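A trend line like this is typically drawn from per-period averages rather than raw scores. The sketch below averages hypothetical `{ timestamp, value }` records per UTC calendar day, in the spirit of the `groupBy="day"` option on the `ScoreTrends` component; the record shape, UTC day boundaries, and `dailyTrend` itself are assumptions.

```typescript
// Hypothetical raw score record.
interface ScorePoint {
  timestamp: Date
  value: number
}

interface TrendPoint {
  day: string // YYYY-MM-DD, UTC
  mean: number
  count: number
}

// Average score values per UTC calendar day, returned in chronological order.
function dailyTrend(points: ScorePoint[]): TrendPoint[] {
  const buckets = new Map<string, { sum: number; count: number }>()
  for (const p of points) {
    const day = p.timestamp.toISOString().slice(0, 10) // UTC date part
    const bucket = buckets.get(day) ?? { sum: 0, count: 0 }
    bucket.sum += p.value
    bucket.count += 1
    buckets.set(day, bucket)
  }
  return Array.from(buckets.entries())
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([day, { sum, count }]) => ({ day, mean: sum / count, count }))
}

console.log(
  dailyTrend([
    { timestamp: new Date('2024-01-02T09:00:00Z'), value: 0.9 },
    { timestamp: new Date('2024-01-01T10:00:00Z'), value: 0.6 },
  ]),
)
```

Keeping the count alongside the mean lets a chart show how many scores back each point, which matters for days with only a handful of runs.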

Component Health — Health status by component

Component     Score  Pass Rate  Trend  Health
weather-tool  0.95   98%               🟢 healthy
search-tool   0.72   85%               🟡 warning
calculator    0.45   60%               🔴 critical

Correlation Matrix — Cross-component correlation

              weather  search  calc
  weather      1.00    0.65   0.23
  search       0.65    1.00   0.45
  calc         0.23    0.45   1.00
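The text does not say which correlation measure the matrix uses. A common choice is the Pearson coefficient over aligned score series, for example each component's score on the same set of runs; the sketch below assumes that, and `pearson` and `correlationMatrix` are illustrative helpers.

```typescript
// Pearson correlation between two equal-length series. Returns NaN when either
// series has zero variance, since the coefficient is undefined there.
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error('series must have equal length >= 2')
  }
  const n = xs.length
  const meanX = xs.reduce((a, b) => a + b, 0) / n
  const meanY = ys.reduce((a, b) => a + b, 0) / n
  let cov = 0
  let varX = 0
  let varY = 0
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX
    const dy = ys[i] - meanY
    cov += dx * dy
    varX += dx * dx
    varY += dy * dy
  }
  return cov / Math.sqrt(varX * varY)
}

// Symmetric matrix of pairwise correlations, in the key order of `series`.
function correlationMatrix(series: Record<string, number[]>): number[][] {
  const names = Object.keys(series)
  return names.map((a) => names.map((b) => pearson(series[a], series[b])))
}

// Made-up sample scores, aligned by run.
console.log(correlationMatrix({ weather: [0.95, 0.9, 0.97], search: [0.7, 0.65, 0.8] }))
```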

Human Feedback (/feedback)

Collect human preferences for RLHF training.

Preference Collection

┌─────────────────────────────────────────────────────────┐
│ Which response is better?                               │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────────────────┐  ┌─────────────────────┐      │
│  │ Response A          │  │ Response B          │      │
│  │                     │  │                     │      │
│  │ The weather in      │  │ Tokyo weather:      │      │
│  │ Tokyo is currently  │  │ 72°F, sunny.       │      │
│  │ 72°F with sunny     │  │                     │      │
│  │ skies...            │  │                     │      │
│  │                     │  │                     │      │
│  │      [Select A]     │  │      [Select B]     │      │
│  └─────────────────────┘  └─────────────────────┘      │
│                                                         │
│  [ ] Both are good    [ ] Both are bad    [Skip]       │
│                                                         │
└─────────────────────────────────────────────────────────┘
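Pairwise judgments like this are commonly exported as prompt/chosen/rejected triples for reward-model training. A sketch of that conversion, assuming a hypothetical `FeedbackItem` record and dropping ties ("Both are good", "Both are bad") and skips because they express no preference; how the dashboard actually stores or exports feedback is not described here.

```typescript
// Hypothetical record produced by the "Which response is better?" form.
type Choice = 'A' | 'B' | 'both_good' | 'both_bad' | 'skip'

interface FeedbackItem {
  prompt: string
  responseA: string
  responseB: string
  choice: Choice
}

// One pairwise training example.
interface PreferencePair {
  prompt: string
  chosen: string
  rejected: string
}

// Only a strict preference yields a pair; ties and skips are dropped.
function toPreferencePairs(items: FeedbackItem[]): PreferencePair[] {
  const out: PreferencePair[] = []
  for (const item of items) {
    if (item.choice === 'A') {
      out.push({ prompt: item.prompt, chosen: item.responseA, rejected: item.responseB })
    } else if (item.choice === 'B') {
      out.push({ prompt: item.prompt, chosen: item.responseB, rejected: item.responseA })
    }
  }
  return out
}

console.log(
  toPreferencePairs([
    { prompt: 'Weather in Tokyo?', responseA: 'long answer', responseB: 'short answer', choice: 'B' },
  ]),
)
```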

Components

DashboardFiltersBar

Filter traces by multiple criteria:

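// Shape of the `filters` prop (the alias name is illustrative)
type DashboardFilters = {
  dateRange: { start: Date; end: Date }
  agentId: string
  status: 'ok' | 'error' | 'all'
  minScore: number
  tags: string[]
}

// `filters` is a DashboardFilters value, e.g. held in component state
<DashboardFiltersBar
  filters={filters}
  onChange={(next: DashboardFilters) => { /* update filter state */ }}
/>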
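To make the filter semantics concrete, here is a client-side predicate over a hypothetical `TraceSummary` row. In practice the filtering may well happen in the backend query instead; the row shape, `matchesFilters`, and the rules that an empty `agentId` means "any agent", the date range is inclusive, and every selected tag must be present are all assumptions.

```typescript
interface DashboardFilters {
  dateRange: { start: Date; end: Date }
  agentId: string
  status: 'ok' | 'error' | 'all'
  minScore: number
  tags: string[]
}

// Hypothetical trace summary as shown in the trace list.
interface TraceSummary {
  agentId: string
  startedAt: Date
  status: 'ok' | 'error'
  score: number
  tags: string[]
}

// A trace matches only when it passes every filter.
function matchesFilters(trace: TraceSummary, f: DashboardFilters): boolean {
  const t = trace.startedAt.getTime()
  return (
    t >= f.dateRange.start.getTime() &&
    t <= f.dateRange.end.getTime() &&
    (f.agentId === '' || trace.agentId === f.agentId) &&
    (f.status === 'all' || trace.status === f.status) &&
    trace.score >= f.minScore &&
    f.tags.every((tag) => trace.tags.includes(tag))
  )
}
```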

TraceTimeline

Hierarchical span visualization:

<TraceTimeline
  trace={traceWithSpans}
  selectedSpanId={spanId}
  onSpanSelect={(span) => { /* show details */ }}
  highlightErrors={true}
  showTiming={true}
/>

SpanDetail

Detailed view of a single span:

<SpanDetail
  span={span}
  scores={spanScores}
  showInput={true}
  showOutput={true}
  showAttributes={true}
/>

EvalRunProgress

Real-time progress tracking:

<EvalRunProgress
  runId={runId}
  onComplete={(result) => { /* handle */ }}
  pollIntervalMs={1000}
/>

ScoreTrends

Time series score charts:

<ScoreTrends
  data={trendData}
  scorers={['tool_selection', 'llm_judge']}
  dateRange={{ start, end }}
  groupBy="day"
/>

CorrelationHeatmap

Cross-component correlation:

<CorrelationHeatmap
  correlations={correlationMatrix}
  components={componentList}
  onCellClick={(pair) => { /* drill down */ }}
/>

TraceSelector

Multi-trace selection for comparison:

<TraceSelector
  traces={traceList}
  selected={selectedIds}
  onSelect={(ids) => { /* update */ }}
  maxSelection={2}
/>
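With `maxSelection={2}`, the selector has to decide what happens when a third trace is picked. One possible policy, sketched below as a hypothetical pure helper, drops the oldest selection so the newest picks are kept; the component might instead ignore extra clicks, which the text above does not specify.

```typescript
// Toggle a trace id in the selection while enforcing a maximum size.
// Policy (assumed): when over the limit, drop the oldest selected ids.
function toggleSelection(selected: string[], id: string, maxSelection: number): string[] {
  if (selected.includes(id)) return selected.filter((s) => s !== id)
  const next = [...selected, id]
  return next.length > maxSelection ? next.slice(next.length - maxSelection) : next
}

console.log(toggleSelection(['trace-1', 'trace-2'], 'trace-3', 2))
```

Returning a new array instead of mutating `selected` fits React state updates such as the `onSelect` callback.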

Keyboard Shortcuts

Shortcut  Action
j / k     Navigate traces up/down
Enter     Open selected trace
Esc       Close modal / deselect
f         Focus filter input
r         Refresh data
?         Show shortcuts help
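A shortcut table like this maps naturally to a lookup from `KeyboardEvent.key` values to actions. The sketch below is hypothetical: the action names, the vim-style reading of j/k (j moves down, k moves up), and the rule that every shortcut except Esc is ignored while typing in a form field are assumptions.

```typescript
type ShortcutAction = 'moveDown' | 'moveUp' | 'open' | 'close' | 'focusFilter' | 'refresh' | 'help'

// The subset of a keyboard event this mapping needs; in a browser handler,
// targetIsEditable would be derived from the event target (input, textarea, ...).
interface KeyInput {
  key: string // KeyboardEvent.key, e.g. 'j', 'Enter', 'Escape', '?'
  targetIsEditable: boolean
}

// A Map avoids accidental hits on Object.prototype keys such as 'toString'.
const SHORTCUTS = new Map<string, ShortcutAction>([
  ['j', 'moveDown'],
  ['k', 'moveUp'],
  ['Enter', 'open'],
  ['Escape', 'close'],
  ['f', 'focusFilter'],
  ['r', 'refresh'],
  ['?', 'help'],
])

// Resolve a key press to an action; only Esc works while typing in a field.
function shortcutFor(input: KeyInput): ShortcutAction | undefined {
  if (input.targetIsEditable && input.key !== 'Escape') return undefined
  return SHORTCUTS.get(input.key)
}

console.log(shortcutFor({ key: 'j', targetIsEditable: false }))
```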

Theming

The dashboard supports light and dark themes:

Dark Theme (Default)

  • Deep charcoal backgrounds (#0a0c10)
  • Vibrant neon accents (cyan, magenta, lime)
  • Glow effects on interactive elements

Light Theme

  • Soft blue-white gradients
  • Professional accent colors
  • Clean, high-contrast text

Toggle via the theme button in the header.

API Integration

The dashboard uses tRPC for type-safe API calls:

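// frontend/lib/trpc.ts
import { httpBatchLink } from '@trpc/client'
import { createTRPCReact } from '@trpc/react-query'
// AppRouter is the type exported by the server's root router (import path assumed)
import type { AppRouter } from '../server/routers/_app'

// Typed React Query hooks generated from the router type
export const trpc = createTRPCReact<AppRouter>()

// Client instance; provide it via <trpc.Provider client={trpcClient} queryClient={queryClient}>
export const trpcClient = trpc.createClient({
  links: [
    httpBatchLink({
      url: '/api/trpc',
    }),
  ],
})

// Usage in components rendered under the provider
const traces = trpc.traces.list.useQuery({ limit: 50 })
const trace = trpc.traces.get.useQuery({ traceId })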

Real-Time Updates

The dashboard uses polling for live updates:

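import { useQuery } from '@tanstack/react-query'

// Poll for run progress every second until the run reaches a terminal state.
// getRunStatus(runId) is assumed to resolve to an object with a `status` field.
const { data: run } = useQuery({
  queryKey: ['runStatus', runId],
  queryFn: () => getRunStatus(runId),
  // TanStack Query v5 passes the query here; returning false stops polling
  refetchInterval: (query) => {
    const status = query.state.data?.status
    return status === 'completed' || status === 'failed' ? false : 1000
  },
})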

For high-frequency updates, WebSocket support is planned.

Performance

Query Optimization

  • ClickHouse queries are optimized for time-range filters
  • Trace list uses cursor-based pagination
  • Span trees are loaded on demand (not all at once)
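Cursor (keyset) pagination fetches the next page by continuing after the sort key of the last row already shown, instead of skipping an offset, so page boundaries stay stable while new traces arrive. The in-memory sketch below illustrates the idea with a hypothetical `TraceRow` and a `startedAt:id` cursor format; in a database the same comparison becomes a condition on the sort key plus `ORDER BY ... LIMIT`. None of these names come from the dashboard's code.

```typescript
// Hypothetical trace row; the list is ordered newest first by (startedAt, id).
interface TraceRow {
  id: string
  startedAt: number // epoch milliseconds
}

interface Page {
  items: TraceRow[]
  nextCursor: string | null // null when there are no more rows
}

// The cursor encodes the sort key of the last row on the previous page.
function encodeCursor(row: TraceRow): string {
  return `${row.startedAt}:${row.id}`
}

// Return up to `limit` rows that sort strictly after the cursor.
// `rows` must already be sorted by startedAt desc, then id desc.
function pageAfter(rows: TraceRow[], cursor: string | null, limit: number): Page {
  let start = 0
  if (cursor !== null) {
    const sep = cursor.indexOf(':')
    const cursorTs = Number(cursor.slice(0, sep))
    const cursorId = cursor.slice(sep + 1)
    // Keyset condition: (startedAt, id) < (cursorTs, cursorId) in descending order
    const idx = rows.findIndex(
      (r) => r.startedAt < cursorTs || (r.startedAt === cursorTs && r.id < cursorId),
    )
    start = idx === -1 ? rows.length : idx
  }
  const items = rows.slice(start, start + limit)
  const hasMore = start + items.length < rows.length
  const last = items[items.length - 1]
  return { items, nextCursor: hasMore && last !== undefined ? encodeCursor(last) : null }
}
```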

Caching

  • React Query provides client-side caching
  • Stale data shown while refetching
  • Cache invalidation on mutations

Lazy Loading

  • Large trace outputs loaded on demand
  • Score details fetched when expanded
  • Images and charts are rendered only once they scroll into view (using IntersectionObserver)