Failed Jobs Dashboard

/admin/jobs — the FailedJobsView wraps the JobQueuesTab component and lists every BullMQ job that exhausted its retries across the platform. Today this is a list view only — there is no detail panel, replay button, dismiss action, or filter UI.

What ends up here

The platform runs background jobs for:

content-publish — publishing scheduled posts to GBP, Facebook, Instagram.
monitor — checking profile status, syncing reviews, daily metric snapshot.
reports — generating scheduled / on-demand reports.
billing — Stripe webhook processing, invoice generation, subscription state changes.
notifications — outbound email and in-app notification dispatch.
campaigns — series generation, bulk operations, multi-platform syndication.

A job lands here when:

It threw an unhandled exception.
It exhausted its automatic retry count (typically 3 with exponential backoff).
An upstream API returned a persistent error.

The list view

Backed by GET /admin/jobs/failed (list) and GET /admin/jobs/counts (aggregate counts). Each row shows:

Queue name (content-publish, monitor, etc.).
Job type (publish_entry, fetch_reviews, etc.).
Organization — which org's data this job was working on.
Timestamp of failure.
Retry count.
Error class (NetworkError, AuthError, ValidationError, etc.).
Truncated error message.

Use this for at-a-glance health and to spot patterns ("everything is failing in content-publish since 14:00 — what changed?"). For deeper inspection, query MongoDB / Redis directly today.

Common job types and what failure means

Job	Failure usually means
`publish_entry`	GBP API hiccup, expired OAuth, post policy violation.
`fetch_reviews`	OAuth revoked, profile suspended.
`daily_metric_snapshot`	GBP per-metric 403 (metric N/A for profile), or GA4 broken connection.
`send_notification`	Mailgun delivery failed (bad email address, bounce, deliverability block).
`stripe_webhook_process`	Webhook payload changed shape, or the processor has a bug. Always investigate.
`image_generate`	Gemini quota / safety filter rejection.
`text_generate`	Anthropic quota / rate-limit.
`generate_report`	DB query timeout, large dataset, GCS write failure.

Alerting (today)

The platform sends email alerts on permanently-failed jobs in billing and content-publish queues via jobAlerts.ts. The destination is MAILGUN_FROM_EMAIL.

🚧 Coming Soon — vote at /roadmap
Per-job drill-down: full payload, full error stack trace, all prior retry attempts, and related context (org / profile / case links) for any failed job.

🚧 Coming Soon — vote at /roadmap
Replay, Dismiss, and Bulk Replay actions: re-queue a failed job with the same payload (e.g. after fixing an OAuth token or waiting out an upstream outage), or mark it permanently resolved with a reason.

🚧 Coming Soon — vote at /roadmap
Filters by org, queue, error class, and age, plus a "has been retried by admin" toggle for triage workflows.

Next: Onboarding Stalls Triage

Failed Jobs Dashboard ​

What ends up here ​

The list view ​

Common job types and what failure means ​

Alerting (today) ​

Failed Jobs Dashboard

What ends up here

The list view

Common job types and what failure means

Alerting (today)