feat(shipote): quota enforce + cgroup memory.max + pipeline restart (phase L)

- WorkspaceSpec.quota_enforce: QuotaEnforcement holds one QuotaAction
  (None|Log|Kill) per resource (mem, nproc). reap_dead applies the
  policy; Kill uses stop_with_grace(ZERO).
- ente_incarnate::cgroup::apply_rlimits_to_cgroup writes memory.max and
  pids.max. WorkspaceManager::create_with_id calls it when
  soma.cgroup.path is set and delegation is available. The kernel
  OOM-kills when the limit is exceeded; the call fails silently if
  there is no delegation.
- PipelineSpec.restart_on_failure: bool. register_pipeline_supervisor
  retains the spec; reap_dead detects all-dead + any-failed → pushes to
  a queue; the daemon reaper drains it and relaunches the WHOLE
  pipeline (the intermediate pipes do not allow a partial restart).
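
As a rough illustration of the second bullet, the sketch below writes `memory.max` and `pids.max` into a cgroup v2 directory using only the standard library. `CgroupLimits`, `write_cgroup_limits` and `try_apply_limits` are hypothetical names chosen for this example; the real signature of `apply_rlimits_to_cgroup` is not shown in this commit. The best-effort wrapper is an assumption about what "fails silently" means here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical limits holder; the real rlimit types behind
/// `soma.rlimits` are not shown in this commit.
#[derive(Debug, Clone, Copy, Default)]
pub struct CgroupLimits {
    /// Memory ceiling in bytes; `None` writes "max" (no limit).
    pub memory_max_bytes: Option<u64>,
    /// Ceiling on the number of tasks; `None` writes "max" (no limit).
    pub pids_max: Option<u64>,
}

/// cgroup v2 limit files accept either a number or the literal "max".
fn limit_value(v: Option<u64>) -> String {
    match v {
        Some(n) => n.to_string(),
        None => "max".to_string(),
    }
}

/// Writes `memory.max` and `pids.max` under `cgroup_dir`.
/// Returns the first I/O error, e.g. when the directory is not writable.
pub fn write_cgroup_limits(cgroup_dir: &Path, limits: &CgroupLimits) -> io::Result<()> {
    fs::write(cgroup_dir.join("memory.max"), limit_value(limits.memory_max_bytes))?;
    fs::write(cgroup_dir.join("pids.max"), limit_value(limits.pids_max))?;
    Ok(())
}

/// Best-effort variant: swallows the error and reports success as a bool,
/// so a missing or non-delegated cgroup does not abort workspace creation.
pub fn try_apply_limits(cgroup_dir: &Path, limits: &CgroupLimits) -> bool {
    write_cgroup_limits(cgroup_dir, limits).is_ok()
}
```

With limits written this way, the kernel does the enforcing, so the daemon does not need to poll memory usage itself.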

82 tests pass (ente-incarnate 16, nouser-core 27, shipote-card 8,
shipote-core 24, shipote-discern 5, yahweh-provider-fs 3).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sergio
2026-05-11 10:22:46 +00:00
parent 324a0c2d5d
commit 4c9d1b4c1d
7 changed files with 401 additions and 5 deletions
@@ -94,6 +94,34 @@ pub struct WorkspaceSpec {
/// Policy applied when the workspace terminates.
#[serde(default)]
pub on_exit: ExitPolicy,
/// Automatic enforcement policy applied when a resource exceeds the
/// rlimit declared in `soma.rlimits`. Default = accounting only
/// (None): the quota report keeps working, but nothing is killed.
#[serde(default)]
pub quota_enforce: QuotaEnforcement,
}
/// Action taken when a resource exceeds its limit. Applied per resource
/// (mem, nproc, ...).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum QuotaAction {
/// Accounting only: the breach shows up in `workspace_quota`.
#[default]
None,
/// Log the breach (at the daemon's info level).
Log,
/// Kill every live command in the workspace (SIGKILL, no grace period).
Kill,
}
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct QuotaEnforcement {
#[serde(default)]
pub mem: QuotaAction,
#[serde(default)]
pub nproc: QuotaAction,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -189,6 +217,11 @@ pub struct PipelineSpec {
pub edges: Vec<FlowEdge>,
#[serde(default)]
pub discern: DiscernPolicy,
/// If `true` and any command in the pipeline exits with a non-zero code,
/// the daemon relaunches the WHOLE pipeline (stop + new run_pipeline).
/// Useful for continuous-processing pipelines.
#[serde(default)]
pub restart_on_failure: bool,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -468,6 +501,7 @@ mod subst_tests {
}],
edges: vec![],
discern: DiscernPolicy::default(),
restart_on_failure: false,
};
let out = substitute_vars(&spec, &vars).unwrap();
assert_eq!(out.label, "p-renamed");
@@ -487,6 +521,7 @@ mod subst_tests {
nodes: vec![],
edges: vec![],
discern: DiscernPolicy::default(),
restart_on_failure: false,
};
let out = substitute_vars(&spec, &vars).unwrap();
assert_eq!(out.label, "p-${UNDEFINED}");
@@ -509,6 +544,7 @@ mod tests {
scope: FlowScope::Public,
}],
on_exit: ExitPolicy::Reap,
quota_enforce: Default::default(),
}
}
@@ -566,6 +602,7 @@ mod tests {
to_input: "y".into(),
}],
discern: DiscernPolicy::default(),
restart_on_failure: false,
};
assert!(p.validate().is_err());
}
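
The restart rule from the commit message (every command dead, at least one failed, relaunch the whole pipeline) could be sketched roughly as below. This is a minimal illustration, not the daemon's code: `CommandState`, `SupervisedPipeline`, `needs_restart` and `collect_restarts` are hypothetical names, and the real bookkeeping inside `reap_dead` is not part of this diff. Only `restart_on_failure` and the all-dead + any-failed condition come from the commit.

```rust
use std::collections::VecDeque;

/// Hypothetical per-command state tracked by a supervisor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommandState {
    Running,
    /// The command has exited with this status code.
    Exited(i32),
}

/// Hypothetical supervisor entry retained for one registered pipeline.
#[derive(Debug, Clone)]
pub struct SupervisedPipeline {
    pub label: String,
    pub restart_on_failure: bool,
    pub commands: Vec<CommandState>,
}

/// True when restart is enabled, every command has exited, and at least
/// one of them exited with a non-zero code.
pub fn needs_restart(p: &SupervisedPipeline) -> bool {
    if !p.restart_on_failure || p.commands.is_empty() {
        return false;
    }
    let all_dead = p
        .commands
        .iter()
        .all(|c| matches!(c, CommandState::Exited(_)));
    let any_failed = p
        .commands
        .iter()
        .any(|c| matches!(c, CommandState::Exited(code) if *code != 0));
    all_dead && any_failed
}

/// One reap pass: queue the label of every pipeline that must be relaunched.
/// A separate reaper would later drain the queue and rerun each pipeline whole.
pub fn collect_restarts(pipelines: &[SupervisedPipeline], queue: &mut VecDeque<String>) {
    for p in pipelines {
        if needs_restart(p) {
            queue.push_back(p.label.clone());
        }
    }
}
```

Waiting for all commands to be dead before queueing avoids relaunching a pipeline while some of its stages still hold the intermediate pipes open.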