
Failed Jobs Dashboard

/admin/jobs — the FailedJobsView wraps the JobQueuesTab component and lists every BullMQ job that exhausted its retries across the platform. Today this is a list view only — there is no detail panel, replay button, dismiss action, or filter UI.

What ends up here

The platform runs background jobs for:

  • content-publish — publishing scheduled posts to GBP, Facebook, Instagram.
  • monitor — checking profile status, syncing reviews, taking the daily metric snapshot.
  • reports — generating scheduled / on-demand reports.
  • billing — Stripe webhook processing, invoice generation, subscription state changes.
  • notifications — outbound email and in-app notification dispatch.
  • campaigns — series generation, bulk operations, multi-platform syndication.

A job lands here once it has exhausted its automatic retry count (typically 3, with exponential backoff). The underlying cause is usually one of:

  • The job kept throwing an unhandled exception.
  • An upstream API kept returning a persistent error that did not clear between retries.
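To show what "exhausted its retry count" means in time, here is a minimal sketch of a retry schedule. It assumes a retry count of 3 maps to BullMQ's `attempts: 3` (BullMQ counts the first run as an attempt) and uses a hypothetical 1-second base delay. The `RetryOptions` type and `retryDelays` helper are illustrative, not platform code. The doubling follows BullMQ's documented built-in exponential backoff, delay * 2^(attemptsMade - 1), and ignores jitter.

```typescript
// Local type mirroring the retry-related part of BullMQ job options.
// Values are assumptions; check each queue's producer for the real settings.
interface RetryOptions {
  attempts: number; // total attempts, including the first run
  backoff: { type: "exponential"; delay: number }; // base delay in ms
}

const typicalRetry: RetryOptions = {
  attempts: 3,
  backoff: { type: "exponential", delay: 1000 }, // hypothetical 1 s base
};

// Delay (ms) before each automatic retry: delay * 2^(attemptsMade - 1).
// A job with N attempts gets N - 1 retries, then moves to the failed set.
function retryDelays(opts: RetryOptions): number[] {
  const delays: number[] = [];
  for (let attemptsMade = 1; attemptsMade < opts.attempts; attemptsMade++) {
    delays.push(Math.round(opts.backoff.delay * 2 ** (attemptsMade - 1)));
  }
  return delays;
}

console.log(retryDelays(typicalRetry)); // → [ 1000, 2000 ]
```

Under those assumptions a job gets two automatic retries, about 1 s and 2 s after the first and second failures, and appears on this dashboard when the third attempt fails.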

The list view

Backed by GET /admin/jobs/failed (list) and GET /admin/jobs/counts (aggregate counts). Each row shows:

  • Queue name (content-publish, monitor, etc.).
  • Job type (publish_entry, fetch_reviews, etc.).
  • Organization — which org's data this job was working on.
  • Timestamp of failure.
  • Retry count.
  • Error class (NetworkError, AuthError, ValidationError, etc.).
  • Truncated error message.
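To make the row fields concrete, here is a sketch of one row as a TypeScript type plus a one-line formatter. The field names (`queue`, `jobType`, `orgId`, `retryCount`, and so on), the 80-character truncation limit, and both helpers are assumptions for illustration; the actual response of GET /admin/jobs/failed may be shaped differently.

```typescript
// Hypothetical shape of one row from GET /admin/jobs/failed.
interface FailedJobRow {
  queue: string;        // e.g. "content-publish"
  jobType: string;      // e.g. "publish_entry"
  orgId: string;        // organization whose data the job was processing
  failedAt: string;     // ISO-8601 timestamp of the final failure
  retryCount: number;   // retry count shown in the list
  errorClass: string;   // e.g. "NetworkError"
  errorMessage: string; // full message; the list shows a truncated copy
}

// Cut text to at most `max` characters, ending with an ellipsis when cut.
function truncate(text: string, max = 80): string {
  return text.length <= max ? text : text.slice(0, max - 1) + "…";
}

// Render a row as a single pipe-separated line, e.g. for a terminal.
function formatRow(row: FailedJobRow): string {
  return [
    row.queue,
    row.jobType,
    row.orgId,
    row.failedAt,
    `retries=${row.retryCount}`,
    row.errorClass,
    truncate(row.errorMessage),
  ].join(" | ");
}
```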

Use this for at-a-glance health checks and to spot patterns ("everything in content-publish has been failing since 14:00; what changed?"). For deeper inspection, you currently have to query MongoDB or Redis directly.
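Spotting a pattern like that amounts to counting failures per queue and error class within a recent window. A minimal sketch, assuming rows shaped like the hypothetical `FailedJob` type below (for example fetched from GET /admin/jobs/failed, or pulled from MongoDB / Redis by hand); `failureHotspots` is an illustrative name, not an existing helper.

```typescript
// Minimal row shape for this sketch (hypothetical field names).
interface FailedJob {
  queue: string;
  errorClass: string;
  failedAt: string; // ISO-8601
}

// Count failures per "queue / errorClass" at or after `since`, sorted by
// count descending, so a spike such as "content-publish / AuthError" tops the list.
function failureHotspots(jobs: FailedJob[], since: Date): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const job of jobs) {
    if (new Date(job.failedAt).getTime() < since.getTime()) continue; // outside window
    const key = `${job.queue} / ${job.errorClass}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```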

Common job types and what failure means

Job                      Failure usually means
publish_entry            GBP API hiccup, expired OAuth, or a post policy violation.
fetch_reviews            OAuth revoked, or the profile is suspended.
daily_metric_snapshot    A GBP per-metric 403 (the metric is not available for that profile), or a broken GA4 connection.
send_notification        Mailgun delivery failure (bad email address, bounce, or deliverability block).
stripe_webhook_process   The webhook payload changed shape, or the processor has a bug. Always investigate.
image_generate           Gemini quota exhaustion or a safety-filter rejection.
text_generate            Anthropic quota exhaustion or rate limiting.
generate_report          DB query timeout, a large dataset, or a GCS write failure.
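If you triage from a script rather than the UI, the job-type table can be encoded as a lookup. This is a sketch only: `FAILURE_HINTS`, `ALWAYS_INVESTIGATE`, and `triageHint` are hypothetical names, and the hint strings restate the table. A `Map` is used instead of a plain object so that a job type such as `constructor` can't accidentally resolve to a property inherited from `Object.prototype`.

```typescript
// Job type -> what a failure usually means (restated from the table).
const FAILURE_HINTS = new Map<string, string>([
  ["publish_entry", "GBP API hiccup, expired OAuth, or post policy violation"],
  ["fetch_reviews", "OAuth revoked or profile suspended"],
  ["daily_metric_snapshot", "GBP per-metric 403 or broken GA4 connection"],
  ["send_notification", "Mailgun delivery failure (bad address, bounce, block)"],
  ["stripe_webhook_process", "Payload changed shape or processor bug"],
  ["image_generate", "Gemini quota or safety-filter rejection"],
  ["text_generate", "Anthropic quota or rate limit"],
  ["generate_report", "DB query timeout, large dataset, or GCS write failure"],
]);

// Job types whose failures should always be investigated.
const ALWAYS_INVESTIGATE = new Set<string>(["stripe_webhook_process"]);

function triageHint(jobType: string): { hint: string; mustInvestigate: boolean } {
  return {
    hint: FAILURE_HINTS.get(jobType) ?? "Unknown job type: inspect payload and stack trace",
    mustInvestigate: ALWAYS_INVESTIGATE.has(jobType),
  };
}
```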

Alerting (today)

The platform sends an email alert, via jobAlerts.ts, whenever a job in the billing or content-publish queue fails permanently. The destination address is MAILGUN_FROM_EMAIL.
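The "permanently failed" condition can be expressed as a small predicate. This is a hypothetical sketch, not the contents of jobAlerts.ts: it declares a local `FailedJobEvent` type whose fields mirror BullMQ's `Job` (`queueName`, `attemptsMade`, `opts.attempts`) so it stands alone, and it treats a job as permanently failed once `attemptsMade` reaches the configured attempts (1 when unset).

```typescript
// Local stand-in for the BullMQ Job fields this check needs.
interface FailedJobEvent {
  queueName: string;
  attemptsMade: number;          // attempts that have failed so far
  opts: { attempts?: number };   // configured total attempts
}

// Queues that alert by email on permanent failure, per this page.
const ALERTED_QUEUES = new Set<string>(["billing", "content-publish"]);

// True only when no retries remain and the job's queue is alerted.
function shouldEmailAlert(job: FailedJobEvent): boolean {
  const maxAttempts = job.opts.attempts ?? 1; // no retries if attempts is unset
  const permanentlyFailed = job.attemptsMade >= maxAttempts;
  return permanentlyFailed && ALERTED_QUEUES.has(job.queueName);
}
```

In a BullMQ worker, a check like this would typically run inside a `worker.on("failed", (job, err) => ...)` listener before sending mail; confirm how your BullMQ version counts `attemptsMade` at that point.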


🚧 Coming Soon — vote at /roadmap

Per-job drill-down: full payload, full error stack trace, all prior retry attempts, and related context (org / profile / case links) for any failed job.

🚧 Coming Soon — vote at /roadmap

Replay, Dismiss, and Bulk Replay actions: re-queue a failed job with the same payload (e.g. after fixing an OAuth token or waiting out an upstream outage), or mark it permanently resolved with a reason.

🚧 Coming Soon — vote at /roadmap

Filters by org, queue, error class, and age, plus a "has been retried by admin" toggle for triage workflows.

