Set up Kubernetes manifests for omnirepo agent workloads

t-422 · Work · Task
Created 1 month ago · Updated 1 month ago

Description

Set up Kubernetes PVC and manifests for running agents on the omnirepo.

Context

To run agent jobs on the omnirepo in Kubernetes, we need:

1. A PersistentVolumeClaim to hold the workspace
2. Job templates for running agent tasks
3. Initial setup to clone the repo into the PVC

Current State

  • No k8s manifests exist for agent workloads
  • Agent images exist but aren't in a registry yet (see related task)
  • No workspace persistence

Goals

1. Create k8s manifests for omnirepo agent workloads
2. Document the setup process
3. Make it easy to run agent tasks via kubectl

Implementation

1. Create Namespace

k8s/agents/namespace.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: agents

2. Create Secrets

k8s/agents/secrets.yaml (template only; the real key is set via kubectl and never committed):

apiVersion: v1
kind: Secret
metadata:
  name: agent-secrets
  namespace: agents
type: Opaque
stringData:
  anthropic-api-key: ""  # Set via: kubectl create secret ...

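Create the Secret with kubectl rather than applying the template; `kubectl create` fails if `agent-secrets` already exists: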

kubectl create secret generic agent-secrets \
  --namespace=agents \
  --from-literal=anthropic-api-key="$ANTHROPIC_API_KEY"
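Rotating the key with the command above means deleting the Secret first. A minimal sketch of an idempotent alternative, assuming kubectl 1.18 or newer (for `--dry-run=client`): render the Secret locally and pipe it to `kubectl apply`, which creates or updates it. The function name `upsert_agent_secret` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create or update agent-secrets without failing when the
# Secret already exists. Requires kubectl >= 1.18 for --dry-run=client.
upsert_agent_secret() {
  # Fail early with a clear message instead of storing an empty key.
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    return 1
  fi

  # Render the Secret manifest locally, then let apply create or update it.
  kubectl create secret generic agent-secrets \
    --namespace=agents \
    --from-literal=anthropic-api-key="$ANTHROPIC_API_KEY" \
    --dry-run=client -o yaml \
    | kubectl apply -f -
}
```

Run it as `ANTHROPIC_API_KEY=... upsert_agent_secret` both for the first setup and for every later rotation.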

3. Create PVC

k8s/agents/workspace-pvc.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: omnirepo-workspace
  namespace: agents
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  # storageClassName: standard  # Adjust for your cluster
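Whether the claim binds right away depends on the storage class. Classes with `volumeBindingMode: WaitForFirstConsumer` (used by many default classes, including kind's) keep the PVC `Pending` until a pod mounts it, so the init job in step 4 is what triggers binding. `ReadWriteOnce` also means the volume can be mounted read-write from a single node only, so concurrent agent jobs must be scheduled onto the same node. A minimal polling sketch, assuming kubectl is configured for the cluster; `wait_for_pvc_bound` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the PVC phase until it is Bound or a timeout expires.
# Usage: wait_for_pvc_bound [timeout_seconds]
wait_for_pvc_bound() {
  local timeout="${1:-120}" waited=0 phase
  while [ "$waited" -lt "$timeout" ]; do
    # A PVC's .status.phase is Pending, Bound, or Lost.
    phase=$(kubectl get pvc omnirepo-workspace -n agents \
      -o jsonpath='{.status.phase}' 2>/dev/null || true)
    if [ "$phase" = "Bound" ]; then
      echo "omnirepo-workspace is Bound"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "omnirepo-workspace still '${phase:-unknown}' after ${timeout}s" >&2
  return 1
}
```

With a `WaitForFirstConsumer` class, call this after applying `init-workspace.yaml`, not before.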

4. Create Init Job (Clone Repo)

k8s/agents/init-workspace.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: init-omnirepo
  namespace: agents
spec:
  ttlSecondsAfterFinished: 300
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: git
        image: alpine/git:latest
        command:
          - /bin/sh
          - -c
          - |
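            set -e
            if [ -d /workspace/.git ]; then
              echo "Repo already exists, pulling latest..."
              cd /workspace && git pull
            else
              echo "Cloning repo..."
              # The volume root may not be empty (ext4 volumes usually contain
              # lost+found), so clone elsewhere and move only .git into place.
              git clone --no-checkout https://github.com/your-org/omnirepo.git /tmp/omnirepo
              mv /tmp/omnirepo/.git /workspace/.git
              cd /workspace && git reset --hard
            fi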
        volumeMounts:
        - name: workspace
          mountPath: /workspace
      volumes:
      - name: workspace
        persistentVolumeClaim:
          claimName: omnirepo-workspace
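Waiting on `condition=complete` alone blocks until the timeout when a Job fails, because a failed Job never gets that condition. Checking the Job's conditions rather than its pod counters also matters for `init-omnirepo`, which keeps the default `backoffLimit` of 6, so one failed pod does not mean the Job has failed. A minimal sketch, with `wait_for_job` as a hypothetical helper name; it assumes the Job still exists (with `ttlSecondsAfterFinished: 300`, the init job is deleted five minutes after finishing):

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until a Job in the agents namespace completes or fails.
# Usage: wait_for_job JOB_NAME [timeout_seconds]
wait_for_job() {
  local job="$1" timeout="${2:-600}" waited=0 conditions
  while [ "$waited" -lt "$timeout" ]; do
    # Types of all conditions whose status is "True", space-separated.
    conditions=$(kubectl get job "$job" -n agents \
      -o jsonpath='{.status.conditions[?(@.status=="True")].type}') || return 1
    case " $conditions " in
      *" Complete "*)
        echo "job/$job succeeded"
        return 0 ;;
      *" Failed "*)
        echo "job/$job failed; see: kubectl logs job/$job -n agents" >&2
        return 1 ;;
    esac
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out waiting for job/$job" >&2
  return 1
}
```

For example, `wait_for_job init-omnirepo 300` after applying `init-workspace.yaml`; the same helper works for agent jobs.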

5. Create Job Template

k8s/agents/job-template.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: agent-task-PLACEHOLDER
  namespace: agents
spec:
  activeDeadlineSeconds: 3600
  backoffLimit: 0
  ttlSecondsAfterFinished: 86400  # Clean up after 24h
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: agent
        image: ghcr.io/your-org/agent-base:latest
        args: ["--json", "TASK_PLACEHOLDER"]
        env:
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              name: agent-secrets
              key: anthropic-api-key
        - name: AGENT_WORKSPACE
          value: /workspace
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2"
        volumeMounts:
        - name: workspace
          mountPath: /workspace
        workingDir: /workspace
      volumes:
      - name: workspace
        persistentVolumeClaim:
          claimName: omnirepo-workspace

6. Create Helper Script

k8s/agents/run-agent.sh:

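#!/usr/bin/env bash
set -euo pipefail

TASK="${1:?usage: run-agent.sh TASK [RUN_ID]}"
RUN_ID="${2:-$(date +%s)}"
JOB_NAME="agent-task-${RUN_ID}"  # matches metadata.name in job-template.yaml

# Escape the task for the YAML double-quoted string in the template (\ and "),
# then for the sed replacement (\, & and the | delimiter).
yaml_task=$(printf '%s' "$TASK" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
sed_task=$(printf '%s' "$yaml_task" | sed -e 's/[\\&|]/\\&/g')

# Substitute the full job name first and the task last: a bare s/PLACEHOLDER/
# would also rewrite TASK_PLACEHOLDER, and the task text must not be re-scanned.
sed -e "s|agent-task-PLACEHOLDER|${JOB_NAME}|" \
    -e "s|TASK_PLACEHOLDER|${sed_task}|" \
  k8s/agents/job-template.yaml | kubectl apply -f -

echo "Created job: $JOB_NAME"
echo "Watch logs: kubectl logs -f job/$JOB_NAME -n agents"
echo "Delete: kubectl delete job $JOB_NAME -n agents"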
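The script passes `RUN_ID` straight into the Job name. Job names must be valid DNS-1123 names, and because the Job controller copies the name into a `job-name` label on its pods, names longer than 63 characters fail. A conservative up-front check is to require a DNS-1123 label (lowercase alphanumerics and `-`, alphanumeric at both ends, at most 63 characters). A minimal sketch, assuming the `agent-task-` prefix from job-template.yaml; `validate_job_name` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: accept only DNS-1123 labels (lowercase alphanumerics
# and '-', alphanumeric at both ends, at most 63 characters).
validate_job_name() {
  local name="$1"
  local re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
  if [ "${#name}" -gt 63 ]; then
    echo "job name '$name' is longer than 63 characters" >&2
    return 1
  fi
  if ! [[ "$name" =~ $re ]]; then
    echo "job name '$name' is not a valid DNS-1123 label" >&2
    return 1
  fi
}
```

Calling `validate_job_name "agent-task-${RUN_ID}" || exit 1` before the `sed` step turns a confusing API error into an immediate, readable one.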

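Or, for a quick smoke test of the image, use kubectl directly. A job created this way skips the template, so its pod has no workspace volume and no ANTHROPIC_API_KEY: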

kubectl create job agent-$(date +%s) \
  --namespace=agents \
  --image=ghcr.io/your-org/agent-base:latest \
  -- agent --json "fix the bug"

7. Document Usage

k8s/agents/README.md:

# Agent Kubernetes Setup

## Initial Setup (once)

1. Create namespace and secrets:
   ```bash
   kubectl apply -f k8s/agents/namespace.yaml
   kubectl create secret generic agent-secrets \
     --namespace=agents \
     --from-literal=anthropic-api-key="$ANTHROPIC_API_KEY"
   ```

2. Create workspace PVC:
   ```bash
   kubectl apply -f k8s/agents/workspace-pvc.yaml
   ```

3. Initialize workspace with repo:
   ```bash
   kubectl apply -f k8s/agents/init-workspace.yaml
   kubectl logs -f job/init-omnirepo -n agents
   ```

## Running Tasks

Create a job (this pod gets no workspace volume or API key; use k8s/agents/run-agent.sh for tasks that need the workspace):

kubectl create job fix-bug-123 \
  --namespace=agents \
  --image=ghcr.io/your-org/agent-base:latest \
  -- agent --json "fix the bug in auth.py"

Watch logs:

kubectl logs -f job/fix-bug-123 -n agents | summarize

Delete when done:

kubectl delete job fix-bug-123 -n agents

## Steering (when implemented)

kubectl exec job/fix-bug-123 -n agents -- \
  sh -c 'echo "try a different approach" >> .steering'

Files to Create

  • k8s/agents/namespace.yaml
  • k8s/agents/workspace-pvc.yaml
  • k8s/agents/init-workspace.yaml
  • k8s/agents/job-template.yaml
  • k8s/agents/run-agent.sh
  • k8s/agents/README.md

Testing

1. Apply manifests to a test cluster (kind, k3s, or a real cluster)
2. Run the init job and verify the repo is cloned
3. Run a simple agent task
4. Verify logs are visible
5. Verify workspace changes persist
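Steps 2 and 5 can be checked without the agent image. A minimal smoke-test sketch, assuming the `alpine/git` image used by the init job; `run_in_workspace` is a hypothetical helper that applies a throwaway Job mounting `omnirepo-workspace`, waits for it, and prints its logs. The command is embedded in a YAML double-quoted string, so it must not contain double quotes, and a failing check surfaces as a 120-second `kubectl wait` timeout.

```bash
#!/usr/bin/env bash
# Hypothetical smoke-test helper: run one shell command in a throwaway Job
# that mounts the workspace PVC, then print the Job's logs.
# Usage: run_in_workspace JOB_NAME 'shell command without double quotes'
run_in_workspace() {
  local job="$1" cmd="$2"
  kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: ${job}
  namespace: agents
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 300
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: check
        image: alpine/git:latest
        command: ["/bin/sh", "-c", "${cmd}"]
        volumeMounts:
        - name: workspace
          mountPath: /workspace
      volumes:
      - name: workspace
        persistentVolumeClaim:
          claimName: omnirepo-workspace
EOF
  # Only success is awaited here; a failed Job shows up as a timeout.
  kubectl wait --for=condition=complete "job/${job}" -n agents --timeout=120s &&
    kubectl logs "job/${job}" -n agents
}
```

For example, `run_in_workspace smoke-clone 'git -C /workspace log -1 --oneline'` confirms the clone, and running `'date > /workspace/.smoke-marker'` in one job and `'cat /workspace/.smoke-marker'` in a second confirms that writes persist across pods. Use a fresh job name each time, since a finished Job keeps its name for five minutes.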

Dependencies

  • A Kubernetes cluster (kind for local, or cloud)
  • kubectl configured
  • Agent images in registry (see related task)

Acceptance Criteria

  • [ ] All manifests created and valid
  • [ ] Namespace and secrets set up
  • [ ] PVC created and bound
  • [ ] Init job clones repo successfully
  • [ ] Agent jobs can run and access workspace
  • [ ] Logs streamable via kubectl
  • [ ] README documents full workflow

Timeline (3)

💬 [human] · 1 month ago

Added k8s manifests and README for agent namespace, PVC, init job, job template, and run-agent helper script (with SSH secret and registry pull secret guidance).

🔄 [human] Open → Done · 1 month ago