
Troubleshooting

Common issues and solutions for Garnet deployment and operation.

Installation Issues

Kubernetes: Pods Stuck in Pending

Symptoms:
kubectl get pods -n garnet
# NAME           READY   STATUS    RESTARTS   AGE
# jibril-xyz     0/1     Pending   0          5m
Diagnosis:
kubectl describe pod -n garnet
kubectl get events -n garnet --sort-by='.lastTimestamp'
Common causes:
Error: 0/10 nodes available: insufficient cpu/memory
Fix: Lower the agent's resource requests:
resources:
  requests:
    cpu: 50m      # Reduce from 100m
    memory: 64Mi  # Reduce from 128Mi
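Error: pods "jibril-xyz" is forbidden: unable to validate against any pod security policy
Fix: Enable privileged pods in your PodSecurityPolicy (PSP). PSP was removed in Kubernetes 1.25; on newer clusters, use Pod Security Standards instead and label the garnet namespace with pod-security.kubernetes.io/enforce=privileged. A PSP for older clusters: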
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: garnet-privileged
spec:
  privileged: true
  allowedCapabilities:
    - SYS_ADMIN
    - NET_ADMIN
Error: 0/10 nodes available: node(s) didn't match node selector
Fix: Check your node labels:
kubectl get nodes --show-labels
Update nodeSelector in values.yaml or remove it:
nodeSelector: {}  # Allow all nodes
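
To triage a Pending pod quickly, the three error messages above can be matched against the event output. This is a minimal sketch, not part of Garnet: classify_pending_reason is a hypothetical helper, and the patterns assume the scheduler wording matches the messages quoted in this section.

```shell
# Hypothetical helper: read event or describe output on stdin and print
# one hint for each cause from this section that appears in the text.
classify_pending_reason() (
  events=$(cat)
  found=0
  if printf '%s\n' "$events" | grep -Eqi 'insufficient (cpu|memory)'; then
    echo "resources: lower the agent's CPU/memory requests"
    found=1
  fi
  if printf '%s\n' "$events" | grep -qi 'pod security policy'; then
    echo "security: allow privileged pods for the garnet namespace"
    found=1
  fi
  if printf '%s\n' "$events" | grep -Eqi "didn't match .*selector"; then
    echo "scheduling: check node labels against nodeSelector"
    found=1
  fi
  # Fall back to a generic hint when nothing matched.
  [ "$found" -eq 1 ] || echo "unknown: inspect the full event output"
)
```

Usage: kubectl get events -n garnet --sort-by='.lastTimestamp' | classify_pending_reason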

GitHub Actions: Step Fails

Symptoms:
Error: Garnet authentication failed (401 Unauthorized)
Diagnosis:
1. Verify Secret Exists

# Check if secret is set
gh secret list | grep GARNET
Should show GARNET_API_TOKEN.
2. Test Token

curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.garnet.ai/v1/health
Should return {"status":"ok"}.
3. Regenerate Token

If the token is invalid, generate a new one at dashboard.garnet.ai and update the GARNET_API_TOKEN secret.
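
The token test can be scripted so that the HTTP status, rather than the response body, decides the next step. A minimal sketch: both function names are hypothetical, and the meanings attached to 403 and 000 on the health endpoint are assumptions based on the errors mentioned elsewhere in this guide.

```shell
# Hypothetical helper: map an HTTP status code to a suggested next step.
interpret_health_status() {
  case "$1" in
    200) echo "ok: token accepted" ;;
    401) echo "unauthorized: token is invalid or expired, regenerate it" ;;
    403) echo "forbidden: token lacks permissions for this project" ;;
    000) echo "no response: check DNS, firewall, or proxy settings" ;;
    *)   echo "unexpected status $1: include it in a support request" ;;
  esac
}

# Call the health endpoint from step 2 and interpret only the status code.
# Expects GARNET_API_TOKEN to be set in the calling shell.
check_garnet_token() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${GARNET_API_TOKEN}" \
    https://api.garnet.ai/v1/health)
  interpret_health_status "$code"
}
```

Usage: GARNET_API_TOKEN=... check_garnet_token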

Connectivity Issues

No Events Appearing in Dashboard

Symptoms:
  • Agent shows “Connected” in logs
  • But no events in Dashboard → Events
Diagnosis (Kubernetes):
# Check agent logs
kubectl logs -l app=jibril -n garnet --tail=100

# Look for:
# ✓ Good: "Connected to Garnet Platform"
# ✓ Good: "Event batch sent successfully"
# ✗ Bad: "Failed to send events: 403 Forbidden"
Common causes:
Fix: Regenerate the token with the correct project permissions:
Dashboard → Settings → API Tokens → Create new token
Test connectivity:
kubectl run test-curl --image=curlimages/curl --rm -it -- \
  curl -v https://api.garnet.ai/v1/health
Fix: Update firewall rules to allow outbound HTTPS to api.garnet.ai.
Cause: Workload isn’t making outbound connections.
Fix: Trigger test traffic:
kubectl run test --image=curlimages/curl --rm -it -- \
  curl https://example.com
Check Dashboard → Events for the curl request.
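
Counting the log messages listed above gives a quick hint about which cause applies. A minimal sketch: summarize_agent_logs is a hypothetical helper, and it assumes the agent logs use exactly the message strings quoted in this section.

```shell
# Hypothetical helper: count the good and bad log messages from this
# section. Reads agent logs on stdin and prints a one-line summary.
summarize_agent_logs() {
  awk '
    /Connected to Garnet Platform/  { connected++ }
    /Event batch sent successfully/ { sent++ }
    /Failed to send events/         { failed++ }
    END {
      printf "connected=%d sent=%d failed=%d\n", connected, sent, failed
    }
  '
}
```

Usage: kubectl logs -l app=jibril -n garnet --tail=100 | summarize_agent_logs

A non-zero failed count points at the token or firewall causes. A connected agent with sent=0 and failed=0 suggests the workload is simply not producing outbound traffic.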

Detection Issues

Too Many False Positives

Symptoms:
  • Many Issues for legitimate traffic
  • False positive rate >5%
Diagnosis: Dashboard → Issues → Review recent Issues
Look for patterns:
  • Same domain flagged repeatedly?
  • Specific micro-context generating noise?
Solutions:
1. Extend Baseline Period

Dashboard → Settings → Baselining → Period: 14 days (from 7)
This gives the agent more time to learn normal behavior.
2. Create Allow Policy

For known-good domains that vary frequently:
name: "CDN allowlist"
type: allow
rules:
  - pattern: "*.cloudfront.net"
  - pattern: "*.fastly.net"
3. Review Micro-Context Granularity

For GitHub Actions, very granular contexts (workflow + job + step) can cause false positives if steps vary between runs.
Workaround: Use broader policies for CI/CD environments.
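
Before creating an allow policy, it can help to preview which of the noisy domains a set of patterns would cover. This is a minimal sketch under a loud assumption: how Garnet evaluates patterns is not specified here, so the code treats "*" as a shell-style glob. Both function names are hypothetical.

```shell
# Assumption: patterns behave like shell globs, so "*.cloudfront.net"
# matches any subdomain but not the bare domain "cloudfront.net".
domain_matches() {
  # $1 = domain, $2 = pattern
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

# Read domains on stdin (one per line) and print those that none of the
# patterns given as arguments would cover.
unmatched_domains() {
  while IFS= read -r domain; do
    matched=0
    for pattern in "$@"; do
      if domain_matches "$domain" "$pattern"; then
        matched=1
        break
      fi
    done
    [ "$matched" -eq 1 ] || printf '%s\n' "$domain"
  done
}
```

Usage: unmatched_domains '*.cloudfront.net' '*.fastly.net' < flagged-domains.txt

Domains that are still printed would keep generating Issues after the policy is applied.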

No Detections (Expected Malicious Traffic Not Caught)

Symptoms:
  • Manually triggered unknown egress not appearing as Issue
Diagnosis:
1. Check if Domain is in Baseline

Dashboard → Events → Filter by domain
If the domain appears in past events, it is already part of the baseline.
2. Verify Enforce Mode

In detect-only mode, connections are allowed but should still create Issues.
Check: Dashboard → Agents → Mode column
3. Check Policy Overrides

Dashboard → Policies → Check whether an allow policy matches the domain.
Policies override the baseline.
Test with a guaranteed-unknown domain:
curl https://test-garnet-unknown-$(date +%s).com
The timestamp makes the domain name unique for each second, so it cannot already be in the baseline.

Performance Issues

High CPU Usage

Symptoms:
kubectl top pods -n garnet
# NAME        CPU    MEMORY
# jibril-xyz  500m   256Mi   # >200m is high
Causes:
Diagnosis:
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>
# Check number of pods on this node
Fix: Increase CPU limits or reduce monitoring scope:
resources:
  limits:
    cpu: 500m  # Increase from 200m

# OR reduce scope
env:
  - name: GARNET_SAMPLE_RATE
    value: "0.5"  # Sample 50% of events
Fix: Upgrade to latest agent version:
helm repo update
helm upgrade jibril garnet/jibril -n garnet --reuse-values
Newer releases include eBPF optimizations.

High Memory Usage

Symptoms:
kubectl top pods -n garnet
# NAME        CPU   MEMORY
# jibril-xyz  100m  512Mi   # >256Mi is high
Causes:
Fix: Increase memory limits and buffer size:
resources:
  limits:
    memory: 512Mi

env:
  - name: GARNET_BUFFER_SIZE
    value: "8192"  # Increase from default 4096
Diagnosis: Memory usage increases steadily over days.
Fix: Restart pods:
kubectl rollout restart daemonset/jibril -n garnet
If the issue persists, contact support@garnet.ai and include the pod logs.
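
The CPU and memory thresholds quoted in the two sections above (200m and 256Mi) can be checked across all agent pods at once. A minimal sketch: flag_high_usage is a hypothetical helper that assumes the usual three-column kubectl top pods output (name, CPU, memory).

```shell
# Hypothetical helper: read "kubectl top pods" output on stdin and print
# pods above the thresholds from this guide.
# $1 = CPU limit in millicores (default 200), $2 = memory limit in Mi (default 256)
flag_high_usage() {
  awk -v cpu_max="${1:-200}" -v mem_max="${2:-256}" '
    # Convert "500m" or "2" (whole cores) to millicores.
    function cpu_millicores(v) {
      if (v ~ /m$/) { sub(/m$/, "", v); return v + 0 }
      return v * 1000
    }
    # Convert "256Mi", "1Gi", or "2048Ki" to mebibytes.
    function mem_mebibytes(v) {
      if (v ~ /Gi$/) { sub(/Gi$/, "", v); return v * 1024 }
      if (v ~ /Ki$/) { sub(/Ki$/, "", v); return v / 1024 }
      sub(/Mi$/, "", v)
      return v + 0
    }
    $1 == "NAME" || NF < 3 { next }   # skip the header and short lines
    {
      cpu = cpu_millicores($2)
      mem = mem_mebibytes($3)
      if (cpu > cpu_max) printf "%s high-cpu %dm\n", $1, cpu
      if (mem > mem_max) printf "%s high-memory %dMi\n", $1, mem
    }
  '
}
```

Usage: kubectl top pods -n garnet | flag_high_usage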

Enforce Mode Issues

Legitimate Traffic Being Blocked

Symptoms:
  • Application errors like ConnectionError or EPERM
  • Dashboard shows Issue with verdict=blocked
Immediate fix (< 1 minute):
# Create emergency allow policy
name: "Emergency: Unblock example.com"
type: allow
scope: global
rules:
  - pattern: "example.com"
Apply via Dashboard → Policies → Create.
Long-term fix:
  1. Review why the domain wasn’t in the baseline
  2. Add it to the corporate allowlist if it recurs
  3. Extend the baseline period if it is too short

Enforce Mode Not Blocking

Symptoms:
  • Mode set to enforce
  • Unknown egress detected (Issue created)
  • But connection NOT blocked (verdict=detected)
Diagnosis (Kubernetes):
kubectl get pods -n garnet -o yaml | grep -A5 "env:"
Check that GARNET_MODE is set to enforce.
Fix:
# Kubernetes
helm upgrade jibril garnet/jibril \
  --set mode=enforce \
  --namespace garnet \
  --reuse-values

# Verify
kubectl rollout status daemonset/jibril -n garnet
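
The grep in the diagnosis step shows only the first few lines after each env: key, which can miss the mode setting. The sketch below pulls the GARNET_MODE value out of the pod YAML directly. It assumes the variable is set with a plain value: field (not valueFrom); garnet_modes is a hypothetical helper name.

```shell
# Hypothetical helper: read "kubectl get pods -o yaml" output on stdin and
# print the GARNET_MODE value of every container that sets it.
garnet_modes() {
  awk '
    # Remember that the next "value:" line belongs to GARNET_MODE.
    /- name: GARNET_MODE[ \t]*$/ { want = 1; next }
    want && /^[ \t]*value:/ {
      gsub(/^[ \t]*value:[ \t]*|"/, "")   # strip the key and any quotes
      print
      want = 0
      next
    }
    /- name:/ { want = 0 }                # a different variable started
  '
}
```

Usage: kubectl get pods -n garnet -o yaml | garnet_modes | sort | uniq -c

Every agent pod should report enforce after the rollout completes.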

Common Error Messages

eBPF program failed to load

Error in logs:
ERROR: Failed to load eBPF program: operation not permitted
Causes:
Fix:
securityContext:
  privileged: true
Check kernel version:
kubectl debug node/NODE_NAME -it --image=ubuntu -- uname -r
Must be >=5.8. If older, upgrade node OS.
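Temporary fix (if SELinux is blocking the agent; run as root on the affected node):
setenforce 0  # Set SELinux to permissive mode until the next reboot
Permanent fix: Create an SELinux policy for Garnet (contact support).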
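
The kernel requirement above (>=5.8) can be checked in a script instead of by eye. A minimal sketch: kernel_at_least is a hypothetical helper that compares only the major and minor numbers of a release string such as the output of uname -r.

```shell
# Hypothetical helper: succeed if a kernel release string is at least the
# given major.minor version.
# $1 = release (e.g. "5.15.0-91-generic"), $2 = min major, $3 = min minor
kernel_at_least() {
  major=${1%%.*}              # text before the first dot
  rest=${1#*.}                # text after the first dot
  minor=${rest%%[!0-9]*}      # leading digits of the remainder
  [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}
```

Usage: kernel_at_least "$(uname -r)" 5 8 && echo "kernel is new enough"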

Connection refused to api.garnet.ai

Error in logs:
ERROR: Failed to connect: connection refused
Causes:
Test:
kubectl run test --image=curlimages/curl --rm -it -- \
  curl -v https://api.garnet.ai
Fix: Update network policy:
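apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-garnet-egress
  namespace: garnet
spec:
  podSelector:
    matchLabels:
      app: jibril
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector: {}  # Pods in the same namespace
    - ports:
        - protocol: TCP
          port: 443  # HTTPS to api.garnet.ai
    - ports:  # DNS, needed to resolve api.garnet.ai
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53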
Fix: Set proxy environment variables:
env:
  - name: HTTPS_PROXY
    value: "http://proxy.corp.example.com:8080"
  - name: NO_PROXY
    value: ".svc.cluster.local"

Getting Help

Still stuck? Contact support with:
  1. Platform: GitHub Actions or Kubernetes
  2. Agent version: helm list -n garnet or workflow logs
  3. Error logs: Last 100 lines
  4. Issue ID: If related to a specific Issue
Email: support@garnet.ai
Include:
# Kubernetes diagnostics
kubectl get pods -n garnet -o wide
kubectl logs -l app=jibril -n garnet --tail=200
kubectl describe pod <pod-name> -n garnet
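
The diagnostics above can be gathered into a single file to attach to the email. A minimal sketch: collect_garnet_diagnostics and the default file name are hypothetical, and the commands are the ones already listed in this section plus helm list for the agent version.

```shell
# Hypothetical helper: write the Kubernetes diagnostics from this section
# to one file and print the file name.
# $1 = pod name, $2 = output file (default: garnet-diagnostics.txt)
collect_garnet_diagnostics() {
  pod=$1
  out=${2:-garnet-diagnostics.txt}
  {
    echo "== pods =="
    kubectl get pods -n garnet -o wide
    echo "== logs (last 200 lines) =="
    kubectl logs -l app=jibril -n garnet --tail=200
    echo "== describe $pod =="
    kubectl describe pod "$pod" -n garnet
    echo "== helm release =="
    helm list -n garnet
  } > "$out" 2>&1
  echo "$out"
}
```

Usage: collect_garnet_diagnostics <pod-name>

Review the file before sending it, since logs and pod descriptions can contain environment details you may not want to share.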
