Commit Graph

204 Commits

Author SHA1 Message Date
Peter Steinberger
8a32936855 refactor(test): dedupe cron isolated-agent e2e setup 2026-02-15 00:26:46 +00:00
Peter Steinberger
07fbf46091 fix(test): avoid vitest mock type inference issues 2026-02-15 01:06:02 +01:00
Peter Steinberger
20dea3cdb1 perf(cron): make wakeMode now busy-wait configurable 2026-02-14 23:51:47 +00:00
Peter Steinberger
c000847dc0 fix(test): remove unused cron import 2026-02-14 23:51:41 +00:00
Peter Steinberger
a6cd7ef49c refactor(test): share cron service fixtures 2026-02-14 23:51:41 +00:00
Peter Steinberger
9b9dc65a22 fix(test): remove unused cron imports 2026-02-14 22:54:37 +00:00
Peter Steinberger
9a26a735e4 refactor(test): share cron isolated agent fixtures 2026-02-14 22:54:37 +00:00
Peter Steinberger
6b400eca5c refactor(cron): share job tick state normalization 2026-02-14 21:44:30 +00:00
Peter Steinberger
775a6c6620 refactor(test): reuse isolated agent turn helpers 2026-02-14 21:28:10 +00:00
Peter Steinberger
badde6e29f perf(test): speed up cron schedule suite 2026-02-14 21:20:15 +00:00
Peter Steinberger
50900721c3 perf(test): speed up cron one-shot suite 2026-02-14 21:20:15 +00:00
Peter Steinberger
5c6318b583 test(cron): assert cron run session ids 2026-02-14 22:01:54 +01:00
Peter Steinberger
e1220c48f5 perf(test): skip skills snapshot work in fast env 2026-02-14 20:12:27 +00:00
Peter Steinberger
387fb40745 perf(test): skip heavy boot paths in reply suites 2026-02-14 20:12:26 +00:00
Peter Steinberger
82576aa684 test(cron): deflake read ops while job is running 2026-02-14 21:04:27 +01:00
zerone0x
c60844931b fix(cron): prevent list/status from silently skipping recurring jobs (openclaw#16201) thanks @zerone0x
Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm check
- pnpm test:macmini

Co-authored-by: zerone0x <39543393+zerone0x@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-14 13:33:29 -06:00
Shadow
a73ccf2b53 fix: deliver cron output to explicit targets (#16360) (thanks @rubyrunsstuff) 2026-02-14 12:43:11 -06:00
青雲
80407cbc6a fix: recompute all cron next-run times after job update (openclaw#15905) thanks @echoVic
Verified:
- pnpm check
- pnpm vitest src/cron/service.issue-regressions.test.ts src/cron/service.issue-13992-regression.test.ts

Co-authored-by: echoVic <16428813+echoVic@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-14 12:37:22 -06:00
Peter Steinberger
240cdd3749 perf(test): speed up cron read ops test 2026-02-14 17:56:39 +00:00
Peter Steinberger
50645b905b refactor(outbound): centralize outbound identity 2026-02-14 16:44:43 +01:00
Robby
09e1cbc35d fix(cron): pass agent identity through delivery path (#16218) (#16242)
* fix(cron): pass agent identity through delivery path

Cron delivery messages now include agent identity (name, avatar) in
outbound messages. Identity fields are passed best-effort for Slack
(graceful fallback if chat:write.customize scope is missing).

Fixes #16218

* fix: fix Slack cron delivery identity (#16242) (thanks @robbyczgw-cla)

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
2026-02-14 16:08:51 +01:00
Peter Steinberger
4335668d28 chore(test): fix cron every-jobs-fire unused import 2026-02-14 04:57:28 +00:00
Peter Steinberger
e6d5b5fb11 perf(test): remove slow port inspection and reconnect sleeps 2026-02-14 04:57:28 +00:00
Peter Steinberger
2d4d32cb2d test(cron): await persistence before temp cleanup 2026-02-14 03:18:27 +00:00
Peter Steinberger
9126930363 test(cron): remove flaky real-timer polling 2026-02-14 03:00:06 +00:00
Peter Steinberger
0b8227fa92 perf(test): trim redundant suites and tighten wait loops 2026-02-14 02:02:03 +00:00
Peter Steinberger
53055aeafe perf(test): consolidate cron and canvas regression setups 2026-02-14 01:42:47 +00:00
Peter Steinberger
445b4facd7 perf(test): collapse isolated cron heartbeat delivery cases 2026-02-14 01:26:12 +00:00
Artale
0942ecb54f fix(cron): use job config for cleanup instead of hardcoded "keep" (openclaw#15427) thanks @arosstale
Verified:
- pnpm install --frozen-lockfile
- pnpm build
- pnpm check
- pnpm test

Co-authored-by: arosstale <117890364+arosstale@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-13 19:10:17 -06:00
Peter Steinberger
dac8f5ba3f perf(test): trim fixture and import overhead in hot suites 2026-02-13 23:16:41 +00:00
Brandon Wise
b0728e605d fix(cron): skip relay only for explicit delivery config, not legacy payload
Fixes #15692

The previous fix was too broad — it removed the relay for ALL isolated jobs.
This broke backwards compatibility for jobs without explicit delivery config.

The correct behavior is:
- If job.delivery exists → isolated runner handles it via runSubagentAnnounceFlow
- If only legacy payload.deliver fields → relay to main if requested (original behavior)

This addresses Greptile's review feedback about runIsolatedAgentJob being an
injected dependency that might not call runSubagentAnnounceFlow.

Uses resolveCronDeliveryPlan().source to distinguish between explicit delivery
config and legacy payload-only jobs.
2026-02-14 00:08:56 +01:00
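
The decision this commit describes can be sketched as follows. This is a minimal TypeScript sketch, not the project's code. `CronJobSketch`, `resolveDeliverySourceSketch` (a stand-in for `resolveCronDeliveryPlan()`), its `"explicit" | "payload"` values, and `shouldRelayToMain` are all hypothetical. Only the rule itself comes from the commit message.

```typescript
// Hypothetical, simplified job shape; the real cron job types differ.
interface CronJobSketch {
  delivery?: { channel: string; to?: string };
  payload: { deliver?: boolean };
}

// Hypothetical stand-in for resolveCronDeliveryPlan(): only `source` matters here.
type DeliverySource = "explicit" | "payload";

function resolveDeliverySourceSketch(job: CronJobSketch): DeliverySource {
  return job.delivery !== undefined ? "explicit" : "payload";
}

// Should the finish handler relay the isolated run's output to the main session?
function shouldRelayToMain(job: CronJobSketch): boolean {
  if (resolveDeliverySourceSketch(job) === "explicit") {
    // Explicit delivery config: the isolated runner owns delivery
    // (via runSubagentAnnounceFlow), so relaying would duplicate it.
    return false;
  }
  // Legacy payload-only job: keep the original relay-if-requested behavior.
  return job.payload.deliver === true;
}
```

Keying on where the delivery plan came from, rather than on whether the job is isolated, is what keeps legacy jobs working unchanged.
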
Peter Steinberger
45a2cd55cc fix: harden isolated cron announce delivery fallback (#15739) (thanks @widingmarcus-cyber) 2026-02-13 23:49:10 +01:00
Marcus Widing
ea95e88dd6 fix(cron): prevent duplicate delivery for isolated jobs with announce mode
When an isolated cron job delivers its output via deliverOutboundPayloads
or the subagent announce flow, the finish handler in executeJobCore
unconditionally posts a summary to the main agent session and wakes it
via requestHeartbeatNow. The main agent then generates a second response
that is also delivered to the target channel, resulting in duplicate
messages with different content.

Add a `delivered` flag to RunCronAgentTurnResult that is set to true
when the isolated run successfully delivers its output. In executeJobCore,
skip the enqueueSystemEvent + requestHeartbeatNow call when the flag is
set, preventing the main agent from waking up and double-posting.

Fixes #15692
2026-02-13 23:49:10 +01:00
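
A minimal sketch of the `delivered` short-circuit, assuming reduced types. `RunCronAgentTurnResultSketch`, `FinishDeps`, and `finishIsolatedJob` are hypothetical names. `enqueueSystemEvent` and `requestHeartbeatNow` are modeled as injected callbacks, and their signatures here are invented for illustration.

```typescript
// Hypothetical, reduced result type; only `delivered` is named in the commit.
interface RunCronAgentTurnResultSketch {
  status: "ok" | "error" | "skipped";
  summary: string;
  delivered?: boolean;
}

// Hypothetical injected side effects, named after the calls in the commit.
interface FinishDeps {
  enqueueSystemEvent: (text: string) => void;
  requestHeartbeatNow: () => void;
}

function finishIsolatedJob(result: RunCronAgentTurnResultSketch, deps: FinishDeps): void {
  if (result.delivered) {
    // The isolated run already posted to the target channel. Waking the main
    // agent here would make it send a second, differently worded message.
    return;
  }
  // Not delivered: hand the summary to the main session and wake it.
  deps.enqueueSystemEvent(result.summary);
  deps.requestHeartbeatNow();
}
```
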
Peter Steinberger
9131b22a28 test: migrate suites to e2e coverage layout 2026-02-13 14:28:22 +00:00
Peter Steinberger
8899f9e94a perf(test): optimize heavy suites and stabilize lock timing 2026-02-13 13:29:07 +00:00
青雲
fd076eb43a fix: /status shows incorrect context percentage — totalTokens clamped to contextTokens (#15114) (#15133)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: a489669fc7e86db03484ef5a0ae222d9360e72f7
Co-authored-by: echoVic <16428813+echoVic@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-02-12 23:52:19 -05:00
Vladimir Peshekhonov
957b883082 fix(agents): stabilize overflow compaction retries and session context accounting (openclaw#14102) thanks @vpesh
Verified:
- CI checks for commit 86a7ecb45ebf0be61dce9261398000524fd9fab6
- Rebase conflict resolution for compatibility with latest main

Co-authored-by: vpesh <9496634+vpesh@users.noreply.github.com>
2026-02-12 17:53:13 -06:00
Kyle Tse
abdceedaf6 fix: respect session model override in agent runtime (#14783) (#14983)
Merged via /review-pr -> /prepare-pr -> /merge-pr.

Prepared head SHA: ec47d1a7bf4e97a5db77281567318c1565d319b5
Co-authored-by: shtse8 <8020099+shtse8@users.noreply.github.com>
Co-authored-by: gumadeiras <5599352+gumadeiras@users.noreply.github.com>
Reviewed-by: @gumadeiras
2026-02-12 17:12:15 -05:00
Sebastian
d31caa81ef fix(runtime): guard cleanup and preserve skipped cron jobs 2026-02-12 09:28:47 -05:00
niceysam
f7e05d0136 fix: exclude maxTokens from config redaction + honor deleteAfterRun on skipped cron jobs (#13342)
* fix: exclude maxTokens and token-count fields from config redaction

The /token/i regex in SENSITIVE_KEY_PATTERNS falsely matched fields like
maxTokens, maxOutputTokens, maxCompletionTokens etc. These are numeric
config fields for token counts, not sensitive credentials.

Added a whitelist (SENSITIVE_KEY_WHITELIST) that explicitly excludes
known token-count field names from redaction. This prevents config
corruption when maxTokens gets replaced with __OPENCLAW_REDACTED__
during config round-trips.

Fixes #13236
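
A sketch of the whitelist approach, assuming a recursive redactor over plain JSON values. The following come from the commit: `/token/i`, the names `SENSITIVE_KEY_PATTERNS` and `SENSITIVE_KEY_WHITELIST`, the three token-count field names, and the `__OPENCLAW_REDACTED__` marker. The other patterns, `isSensitiveKey`, and `redactConfig` are illustrative assumptions.

```typescript
// Only /token/i is named in the commit; the other patterns are assumptions.
const SENSITIVE_KEY_PATTERNS: RegExp[] = [/token/i, /secret/i, /password/i, /api[_-]?key/i];

// Numeric token-count fields whose names merely contain "token".
const SENSITIVE_KEY_WHITELIST = new Set<string>([
  "maxTokens",
  "maxOutputTokens",
  "maxCompletionTokens",
]);

const REDACTED = "__OPENCLAW_REDACTED__";

function isSensitiveKey(key: string): boolean {
  if (SENSITIVE_KEY_WHITELIST.has(key)) {
    return false;
  }
  return SENSITIVE_KEY_PATTERNS.some((pattern) => pattern.test(key));
}

// Recursively replace values under sensitive keys in a JSON-like config.
function redactConfig(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(redactConfig);
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      out[key] = isSensitiveKey(key) ? REDACTED : redactConfig(child);
    }
    return out;
  }
  return value;
}
```

An exact-name whitelist leaves the broad `/token/i` pattern in place, so a new credential field such as `botToken` is still redacted by default.
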

* fix: honor deleteAfterRun for one-shot 'at' jobs with 'skipped' status

Previously, deleteAfterRun only triggered when result.status was 'ok'.
For one-shot 'at' jobs, a 'skipped' status (e.g. empty heartbeat file)
would leave the job in state but disabled, never getting cleaned up.

Now deleteAfterRun also triggers on 'skipped' status for 'at' jobs,
since a skipped one-shot job has no meaningful retry path.

Fixes #13249
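
The revised cleanup condition can be written as a hypothetical predicate. `OneShotJobSketch`, `shouldDeleteAfterRun`, and the schedule kinds other than `'at'` are assumptions. The rule itself comes from the commit message: delete on `ok` as before, and also on `skipped` for `'at'` jobs.

```typescript
// Hypothetical, reduced job and status shapes.
type CronRunStatus = "ok" | "error" | "skipped";

interface OneShotJobSketch {
  schedule: { kind: "at" | "every" | "cron" };
  deleteAfterRun?: boolean;
}

function shouldDeleteAfterRun(job: OneShotJobSketch, status: CronRunStatus): boolean {
  if (!job.deleteAfterRun) {
    return false;
  }
  if (status === "ok") {
    return true; // previous behavior, unchanged
  }
  // A skipped one-shot job has no meaningful retry path, so clean it up too.
  return status === "skipped" && job.schedule.kind === "at";
}
```
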

* Cron: format timer.ts

---------

Co-authored-by: nice03 <niceyslee@gmail.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-12 07:55:05 -06:00
WalterSumbon
39e3d58fe1 fix: prevent cron jobs from skipping execution when nextRunAtMs advances (#14068)
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-11 23:33:22 -06:00
大猫子
a88ea42ec7 fix(cron): prevent one-shot at jobs from re-firing on restart after skip/error (#13845) (#13878)
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-11 23:33:15 -06:00
0xRain
b0dfb83952 fix(cron): use requested agentId for isolated job auth resolution (#13983)
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-11 23:32:45 -06:00
石川 諒
04e3a66f90 fix(cron): pass agentId to runHeartbeatOnce for main-session jobs (#14140)
* fix(cron): pass agentId to runHeartbeatOnce for main-session jobs

Main-session cron jobs with agentId always ran the heartbeat under
the default agent, ignoring the job's agent binding. enqueueSystemEvent
correctly routed the system event to the bound agent's session, but
runHeartbeatOnce was called without agentId, so the heartbeat ran under
the default agent and never picked up the event.

Thread agentId from job.agentId through the CronServiceDeps type,
timer execution, and the gateway wrapper so heartbeat-runner uses the
correct agent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* cron: add heartbeat agentId propagation regression test (#14140) (thanks @ishikawa-pro)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
2026-02-11 22:22:29 -06:00
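
The fix passes one value through to a second call. In the hypothetical sketch below, `CronServiceDepsSketch`, `MainSessionJobSketch`, `runMainSessionJob`, and the option-object signatures are invented for illustration. Only `enqueueSystemEvent`, `runHeartbeatOnce`, and `agentId` are names taken from the commit.

```typescript
// Hypothetical, reduced dependency type; the real CronServiceDeps has more members.
interface CronServiceDepsSketch {
  enqueueSystemEvent: (text: string, opts: { agentId?: string }) => void;
  runHeartbeatOnce: (opts: { agentId?: string }) => Promise<void>;
}

interface MainSessionJobSketch {
  id: string;
  agentId?: string;
  text: string;
}

async function runMainSessionJob(job: MainSessionJobSketch, deps: CronServiceDepsSketch): Promise<void> {
  // Route the system event to the bound agent's session...
  deps.enqueueSystemEvent(job.text, { agentId: job.agentId });
  // ...and run the heartbeat under the same agent. Without agentId here the
  // default agent would run and never pick up the event.
  await deps.runHeartbeatOnce({ agentId: job.agentId });
}
```
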
MarvinDontPanic
04f695e562 fix(cron): isolate schedule errors to prevent one bad job from breaking all jobs (#14385)
Previously, if one cron job had a malformed schedule expression (e.g. invalid cron syntax),
the error would propagate up and break the entire scheduler loop. This meant one misconfigured
job could prevent ALL cron jobs from running.

Changes:
- Wrap per-job schedule computation in try/catch in recomputeNextRuns()
- Track consecutive schedule errors via new scheduleErrorCount field
- Log warnings for schedule errors with job ID and name
- Auto-disable jobs after 3 consecutive schedule errors (with error-level log)
- Clear error count when schedule computation succeeds
- Continue processing other jobs even when one fails

This ensures the scheduler is resilient to individual job misconfigurations while still
providing visibility into problems through logging.

Co-authored-by: Marvin <numegilagent@gmail.com>
2026-02-11 22:17:07 -06:00
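
The per-job isolation in `recomputeNextRuns()` can be sketched as follows. The shapes and names here are hypothetical: `ScheduledJobSketch`, `recomputeNextRunsSketch`, and the injected `computeNext` and `log`. The following come from the commit message: the per-job try/catch, the `scheduleErrorCount` field, the threshold of 3, and resetting the count on success.

```typescript
// Hypothetical, reduced job shape; scheduleErrorCount is named in the commit.
interface ScheduledJobSketch {
  id: string;
  name: string;
  enabled: boolean;
  nextRunAtMs?: number;
  scheduleErrorCount?: number;
}

const MAX_SCHEDULE_ERRORS = 3;

function recomputeNextRunsSketch(
  jobs: ScheduledJobSketch[],
  computeNext: (job: ScheduledJobSketch, nowMs: number) => number | undefined,
  nowMs: number,
  log: { warn: (msg: string) => void; error: (msg: string) => void },
): void {
  for (const job of jobs) {
    if (!job.enabled) {
      continue;
    }
    try {
      job.nextRunAtMs = computeNext(job, nowMs);
      job.scheduleErrorCount = 0; // a successful computation clears the streak
    } catch (err) {
      const count = (job.scheduleErrorCount ?? 0) + 1;
      job.scheduleErrorCount = count;
      log.warn(`cron: schedule error for job ${job.id} (${job.name}): ${String(err)}`);
      if (count >= MAX_SCHEDULE_ERRORS) {
        job.enabled = false;
        log.error(`cron: disabling job ${job.id} after ${count} consecutive schedule errors`);
      }
      // No rethrow: the loop moves on to the next job.
    }
  }
}
```
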
Tom Ron
ace5e33cee fix(cron): re-arm timer when onTimer fires during active job execution (#14233)
* fix(cron): re-arm timer when onTimer fires during active job execution

When a cron job takes longer than MAX_TIMER_DELAY_MS (60s), the clamped
timer fires while state.running is still true.  The early return in
onTimer() previously exited without re-arming the timer, leaving no
setTimeout scheduled.  This silently kills the cron scheduler until the
next gateway restart.

The fix calls armTimer(state) before the early return so the scheduler
continues ticking even when a job is in progress.

This is the likely root cause of recurring cron jobs silently skipping,
as reported in #12025.  One-shot (kind: 'at') jobs were unaffected
because they typically complete within a single timer cycle.

Includes a regression test that simulates a slow job exceeding the
timer clamp period and verifies the next occurrence still fires.

* fix: update tests for timer re-arm behavior

- Update existing regression test to expect timer re-arm with non-zero
  delay instead of no timer at all
- Simplify new test to directly verify state.timer is set after onTimer
  returns early due to running guard

* fix: use fixed 60s delay for re-arm to prevent zero-delay hot-loop

When the running guard re-arms the timer, use MAX_TIMER_DELAY_MS
directly instead of calling armTimer() which can compute a zero delay
for past-due jobs.  This prevents a tight spin while still keeping the
scheduler alive.

* style: add curly braces to satisfy eslint(curly) rule
2026-02-11 22:13:27 -06:00
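
The final behavior of this fix is a fixed-delay re-arm when the running guard trips, sketched below. `TimerStateSketch`, `onTimerSketch`, and the injected `runDueJobs` and `armTimer` callbacks are hypothetical. `state.running`, `state.timer`, and `MAX_TIMER_DELAY_MS` (60s) are the names the commit uses.

```typescript
const MAX_TIMER_DELAY_MS = 60_000;

// Hypothetical, reduced scheduler state using the commit's field names.
interface TimerStateSketch {
  running: boolean;
  timer: ReturnType<typeof setTimeout> | null;
}

async function onTimerSketch(
  state: TimerStateSketch,
  runDueJobs: () => Promise<void>,
  armTimer: (state: TimerStateSketch) => void,
): Promise<void> {
  if (state.running) {
    // A long job is still in flight. Re-arm with the fixed clamp delay rather
    // than armTimer(), which could compute 0 ms for a past-due job and spin.
    if (state.timer) {
      clearTimeout(state.timer);
    }
    state.timer = setTimeout(() => {
      void onTimerSketch(state, runDueJobs, armTimer);
    }, MAX_TIMER_DELAY_MS);
    return;
  }
  state.running = true;
  try {
    await runDueJobs();
  } finally {
    state.running = false;
    armTimer(state); // normal path: schedule from the next due job
  }
}
```
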
Xinhua Gu
dd6047d998 fix(cron): prevent duplicate fires when multiple jobs trigger simultaneously (#14256)
The `computeNextRunAtMs` function used `nowSecondMs - 1` as the
reference time for croner's `nextRun()`, which caused it to return the
current second as a valid next-run time. When a job fired at e.g.
11:00:00.500, computing the next run still yielded 11:00:00.000 (same
second, already elapsed), causing the scheduler to immediately re-fire
the job in a tight loop (15-21x observed in the wild).

Fix: use `nowSecondMs` directly (no `-1` lookback) and change the
return guard from `>=` to `>` so next-run is always strictly after
the current second.

Fixes #14164
2026-02-11 22:04:17 -06:00
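
To make the off-by-one concrete, the sketch below swaps croner for a hypothetical stand-in, `nextTopOfMinuteAfter`. It models a single "every minute at :00" schedule and, as the commit's description of `nextRun()` implies, returns the first match strictly after its reference time. The function names are illustrative. Only the `- 1` lookback and the `>=` versus `>` guard come from the commit.

```typescript
const MINUTE_MS = 60_000;

// Hypothetical stand-in for croner's nextRun(): the first top-of-minute
// instant strictly after refMs.
function nextTopOfMinuteAfter(refMs: number): number {
  return Math.floor(refMs / MINUTE_MS) * MINUTE_MS + MINUTE_MS;
}

// Before the fix: the 1 ms lookback lets the current second qualify.
function computeNextRunAtMsBuggy(nowMs: number): number | undefined {
  const nowSecondMs = Math.floor(nowMs / 1000) * 1000;
  const next = nextTopOfMinuteAfter(nowSecondMs - 1);
  return next >= nowSecondMs ? next : undefined;
}

// After the fix: no lookback, and the guard demands a strictly later time.
function computeNextRunAtMsFixed(nowMs: number): number | undefined {
  const nowSecondMs = Math.floor(nowMs / 1000) * 1000;
  const next = nextTopOfMinuteAfter(nowSecondMs);
  return next > nowSecondMs ? next : undefined;
}
```

With `nowMs` at 11:00:00.500, the buggy version returns 11:00:00.000. That instant has already elapsed, so the job fires again. The fixed version returns 11:01:00.000.
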
Tak Hoffman
d2c2f4185b Heartbeat: inject cron-style current time into prompts (#13733)
* Heartbeat: inject cron-style current time into prompts

* Tests: fix type for web heartbeat timestamp test

* Infra: inline heartbeat current-time injection
2026-02-10 18:58:45 -06:00
Gustavo Madeira Santana
e19a23520c fix: unify session maintenance and cron run pruning (#13083)
* fix: prune stale session entries, cap entry count, and rotate sessions.json

The sessions.json file grows unbounded over time. Every heartbeat tick (default: 30m)
triggers multiple full rewrites, and session keys from groups, threads, and DMs
accumulate indefinitely with large embedded objects (skillsSnapshot,
systemPromptReport). At >50MB the synchronous JSON parse blocks the event loop,
causing Telegram webhook timeouts and effectively taking the bot down.

Three mitigations, all running inside saveSessionStoreUnlocked() on every write:

1. Prune stale entries: remove entries with updatedAt older than 30 days
   (configurable via session.maintenance.pruneDays in openclaw.json)

2. Cap entry count: keep only the 500 most recently updated entries
   (configurable via session.maintenance.maxEntries). Entries without updatedAt
   are evicted first.

3. File rotation: if the existing sessions.json exceeds 10MB before a write,
   rename it to sessions.json.bak.{timestamp} and keep only the 3 most recent
   backups (configurable via session.maintenance.rotateBytes).

All three thresholds are configurable under session.maintenance in openclaw.json
with Zod validation. No env vars.
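
Steps 1 and 2 above (pruning and capping) can be sketched as a pure function over the parsed store. In the real change they run inside saveSessionStoreUnlocked() on every write. This sketch rests on assumptions: `SessionEntrySketch`, `maintainSessionStore`, and treating `updatedAt` as epoch milliseconds are hypothetical. File rotation (step 3) is left out because it touches the filesystem.

```typescript
// Hypothetical, reduced session entry; real entries carry much more.
interface SessionEntrySketch {
  updatedAt?: number; // assumed epoch milliseconds
}

interface MaintenanceOptions {
  pruneDays: number; // commit default: 30
  maxEntries: number; // commit default: 500
}

const DAY_MS = 24 * 60 * 60 * 1000;

function maintainSessionStore(
  store: Record<string, SessionEntrySketch>,
  nowMs: number,
  opts: MaintenanceOptions = { pruneDays: 30, maxEntries: 500 },
): Record<string, SessionEntrySketch> {
  const cutoff = nowMs - opts.pruneDays * DAY_MS;
  // 1. Prune: drop entries whose updatedAt is older than the cutoff.
  const kept = Object.entries(store).filter(
    ([, entry]) => entry.updatedAt === undefined || entry.updatedAt >= cutoff,
  );
  // 2. Cap: sort newest first. Timestamps are non-negative, so ranking a
  //    missing updatedAt as -1 puts those entries last, evicting them first.
  const rank = (entry: SessionEntrySketch): number => entry.updatedAt ?? -1;
  kept.sort(([, a], [, b]) => rank(b) - rank(a));
  return Object.fromEntries(kept.slice(0, opts.maxEntries));
}
```
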

Existing tests updated to use Date.now() instead of epoch-relative timestamps
(1, 2, 3) that would be incorrectly pruned as stale.

27 new tests covering pruning, capping, rotation, and integration scenarios.

* feat: auto-prune expired cron run sessions (#12289)

Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.

New config option:
  cron.sessionRetention: string | false  (default: '24h')

The reaper runs piggybacked on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.

Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely

Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps

Closes #12289
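
A minimal sketch of the reaper rule follows. It assumes `updatedAt` is epoch milliseconds and that the retention string has the form `<n>m`, `<n>h`, or `<n>d`; the real parser may accept more formats. `sweepCronRunSessionsSketch`, `parseRetentionMs`, and `ReaperState` are hypothetical, as is the key regex, which also assumes `jobId` contains no colon. The following come from the commit message: the key pattern, the `updatedAt + retention < now` test, the 5-minute throttle, and `false` disabling pruning.

```typescript
// Hypothetical duration parser: handles only "<n>m", "<n>h" and "<n>d".
function parseRetentionMs(retention: string | false): number | null {
  if (retention === false) {
    return null; // pruning disabled
  }
  const match = /^(\d+)([mhd])$/.exec(retention.trim());
  if (!match) {
    throw new Error(`invalid cron.sessionRetention: ${retention}`);
  }
  const unitMs = { m: 60_000, h: 3_600_000, d: 86_400_000 }[match[2] as "m" | "h" | "d"];
  return Number(match[1]) * unitMs;
}

// Keys shaped like cron:<jobId>:run:<uuid> (assumes no colon inside jobId).
const CRON_RUN_KEY = /^cron:[^:]+:run:[^:]+$/;
const SWEEP_INTERVAL_MS = 5 * 60_000;

interface ReaperState {
  lastSweepAtMs: number | null;
}

// Returns how many expired cron run sessions were removed from the store.
function sweepCronRunSessionsSketch(
  store: Record<string, { updatedAt?: number }>,
  retention: string | false,
  nowMs: number,
  state: ReaperState,
): number {
  const retentionMs = parseRetentionMs(retention);
  if (retentionMs === null) {
    return 0;
  }
  if (state.lastSweepAtMs !== null && nowMs - state.lastSweepAtMs < SWEEP_INTERVAL_MS) {
    return 0; // self-throttle: at most one sweep every 5 minutes
  }
  state.lastSweepAtMs = nowMs;
  let removed = 0;
  // Object.entries returns a snapshot, so deleting while iterating is safe.
  for (const [key, entry] of Object.entries(store)) {
    if (CRON_RUN_KEY.test(key) && entry.updatedAt !== undefined && entry.updatedAt + retentionMs < nowMs) {
      delete store[key];
      removed++;
    }
  }
  return removed;
}
```
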

* fix: sweep cron session stores per agent

* docs: add changelog for session maintenance (#13083) (thanks @skyfallsin, @Glucksberg)

* fix: add warn-only session maintenance mode

* fix: warn-only maintenance defaults to active session

* fix: deliver maintenance warnings to active session

* docs: add session maintenance examples

* fix: accept duration and size maintenance thresholds

* refactor: share cron run session key check

* fix: format issues and replace defaultRuntime.warn with console.warn

---------

Co-authored-by: Pradeep Elankumaran <pradeepe@gmail.com>
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Co-authored-by: max <40643627+quotentiroler@users.noreply.github.com>
Co-authored-by: quotentiroler <max.nussbaumer@maxhealth.tech>
2026-02-09 20:42:35 -08:00
max
8d75a496bf refactor: centralize isPlainObject, isRecord, isErrno, isLoopbackHost utilities (#12926) 2026-02-09 17:02:55 -08:00