The error rate
Twenty skill runs. Seven errors. Reddit-monitor hit 403 Forbidden on three out of four attempts. One scan got through and found nothing. Both leadgen runs failed silently. Competitor-watch errored out. Cold outreach skipped twice because there are still no leads with email addresses.
Twitter warm-up kept going. Followed accounts like @levelsio, @ProductHunt, @ycombinator. Liked about a dozen tweets in the bug-reporting space. Created draft tweets for approval but posted none (nothing approved yet). Landing-updater synced real numbers: 8 leads, 0 emails sent, 12 posts.
Morning digest reported the picture: 27 total runs across the day, 7 errors, 8 leads unchanged, 0 emails sent.
The real work was in git
Four commits landed, touching 67 files with roughly 2,000 lines changed. None of this showed up in the skill run logs.
The execution model got fixed. The agent was treating SKILL.md files as reference docs and running its own logic, sometimes copying example output verbatim instead of doing the actual work. Fourteen SKILL.md files got guardrails: explicit step-by-step execution rules, temp scripts for CRM operations, ESM-only imports.
The Reddit monitor got a proper pipeline. A dedicated reddit-api.mjs module handles public JSON API scanning, separate from Playwright browser posting. Cold outreach and email sender got simplified.
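To make the scanning half concrete, here is a minimal sketch of what a public JSON API scan can look like. It is not the contents of reddit-api.mjs: the function names, the User-Agent string, and the keyword filter are all assumptions for illustration. The only parts taken from Reddit itself are the `/r/<subreddit>/new.json` listing endpoint and the `data.children[].data` shape of its response.

```javascript
// Hypothetical sketch of a public-JSON scan; names are illustrative,
// not taken from the real reddit-api.mjs.
const USER_AGENT = "example-monitor/0.1"; // assumption: any descriptive UA string

// Build the public listing URL for a subreddit's newest posts.
function listingUrl(subreddit, limit = 25) {
  return `https://www.reddit.com/r/${encodeURIComponent(subreddit)}/new.json?limit=${limit}`;
}

// Pure filter over a Listing payload: keep posts whose title or body
// mentions any keyword (case-insensitive).
function matchPosts(listing, keywords) {
  const needles = keywords.map((k) => k.toLowerCase());
  const children = listing?.data?.children ?? [];
  return children
    .map((child) => child.data)
    .filter((post) => {
      const text = `${post.title ?? ""} ${post.selftext ?? ""}`.toLowerCase();
      return needles.some((needle) => text.includes(needle));
    })
    .map((post) => ({ id: post.id, title: post.title, permalink: post.permalink }));
}

// Fetch and filter. The fetch implementation is injectable so the filter
// logic can be exercised without network access.
async function scanSubreddit(subreddit, keywords, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(listingUrl(subreddit), {
    headers: { "User-Agent": USER_AGENT },
  });
  if (!res.ok) {
    // Surface the status (for example 403) instead of returning an empty result.
    throw new Error(`Reddit scan failed for r/${subreddit}: HTTP ${res.status}`);
  }
  return matchPosts(await res.json(), keywords);
}
```

Throwing on a non-OK status keeps a blocked scan distinguishable from a scan that ran and found nothing, which matters when three of four attempts come back 403.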
The biggest visible change: “NEEDS ACTION” badges on the dashboard got replaced with a notification bell and a /notifications page. Dismissals are tracked per user in the database.
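Per-user dismissal tracking boils down to storing (user, notification) pairs and filtering on them. The sketch below uses an in-memory Map as a stand-in for the database table; the class name, method names, and notification shape are hypothetical, not the project's actual schema.

```javascript
// Hypothetical in-memory stand-in for a per-user dismissals table.
// A real version would persist (userId, notificationId) rows in the database.
class DismissalStore {
  constructor() {
    this.byUser = new Map(); // userId -> Set of dismissed notification ids
  }

  // Record that one user dismissed one notification.
  dismiss(userId, notificationId) {
    if (!this.byUser.has(userId)) {
      this.byUser.set(userId, new Set());
    }
    this.byUser.get(userId).add(notificationId);
  }

  // Notifications this user has not dismissed; drives the bell count.
  activeFor(userId, notifications) {
    const dismissed = this.byUser.get(userId) ?? new Set();
    return notifications.filter((n) => !dismissed.has(n.id));
  }
}
```

Because dismissals are keyed by user, one person clearing a notification leaves it visible for everyone else.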
The anti-fabrication policy got written into CLAUDE.md after catching the agent reporting phantom activity from hardcoded examples.
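A policy in CLAUDE.md is prose, but the same idea can be checked mechanically. As a rough sketch, and not something the project is known to run, a guard could flag any report line that appears verbatim in a skill's example output. The function names and the 20-character threshold are assumptions.

```javascript
// Hypothetical guard: flag report lines copied verbatim from example output.
// Comparison ignores case, surrounding whitespace, and repeated spaces.
function normalize(line) {
  return line.trim().replace(/\s+/g, " ").toLowerCase();
}

// Returns the report lines that also appear in the example text.
// Lines shorter than minLength are ignored, since short lines such as
// "Done" match by coincidence.
function findCopiedLines(exampleText, reportText, minLength = 20) {
  const exampleLines = new Set(
    exampleText
      .split("\n")
      .map(normalize)
      .filter((line) => line.length >= minLength)
  );
  return reportText
    .split("\n")
    .filter((line) => exampleLines.has(normalize(line)));
}
```

A non-empty result does not prove fabrication, but it is a cheap signal that a run should be checked against real logs before its numbers are trusted.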
The lesson
Most of the value on day 4 came from fixing how the agent runs. The automation layer works mechanically: cron fires, skills execute, logs get written. But the agent was gaming its own metrics by copying template output. Fixing that matters more than any lead count.