Containers provide process isolation but share the host kernel, making them fundamentally different from virtual machines. A misconfigured container can give an attacker a direct path to full host control and, in orchestrated environments, the entire cluster. Organizations consistently underestimate this attack surface.
Container Security Model
- Namespaces -- PID, network, mount, user, UTS, IPC, and cgroup namespaces provide logical isolation, but were not designed as security boundaries against hostile processes.
- Cgroups -- Limit resource consumption (CPU, memory, I/O) but do not prevent access to sensitive host resources when other boundaries are weakened.
- Capabilities -- Docker drops most Linux capabilities by default, but --privileged or explicit grants restore dangerous permissions like CAP_SYS_ADMIN.
- Seccomp and AppArmor/SELinux -- The default seccomp profile blocks ~44 syscalls; overly permissive custom profiles, or running without a profile at all, significantly weaken isolation.
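To make the capability point concrete, here is a minimal sketch that decodes the effective capability mask of the current process from /proc/self/status and reports a few high-risk capabilities. The function names and the choice of which capabilities count as "dangerous" are assumptions made for illustration; the bit numbers come from the Linux capability definitions.

```python
# Bit numbers for a few high-risk capabilities (from <linux/capability.h>).
# Which ones to flag is an illustrative choice, not an exhaustive list.
DANGEROUS_CAPS = {
    "CAP_NET_ADMIN": 12,
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_RAWIO": 17,
    "CAP_SYS_PTRACE": 19,
    "CAP_SYS_ADMIN": 21,
}


def parse_cap_eff(status_text):
    """Extract the CapEff hex mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def dangerous_caps(mask):
    """Return the names of flagged capabilities that are set in the mask."""
    return sorted(name for name, bit in DANGEROUS_CAPS.items() if mask & (1 << bit))


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as fh:
            found = dangerous_caps(parse_cap_eff(fh.read()))
        print("dangerous capabilities:", ", ".join(found) or "none")
    except (OSError, ValueError) as exc:
        print("could not read capabilities:", exc)
```

Run inside a container, an empty result suggests the default capability set is in place, while CAP_SYS_ADMIN in the output is a strong hint that the container was started with --privileged or an explicit grant.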
Container Escape Vectors
- Privileged containers -- The --privileged flag disables nearly all isolation: full /dev access, all capabilities, no seccomp, no AppArmor/SELinux.
- Mounted Docker socket -- Access to /var/run/docker.sock gives effective root on the host, because an attacker can ask the daemon to create new containers that mount the host filesystem.
- Sensitive host paths -- Mounting /, /etc, /root provides direct host file access. Writable mounts enable host modification.
- Kernel exploits -- Because the kernel is shared, an exploitable kernel vulnerability (CVE-2022-0185, CVE-2022-0847 Dirty Pipe, CVE-2020-14386) can lead to full host escape.
- SYS_PTRACE abuse -- With CAP_SYS_PTRACE and a PID namespace shared with the host (--pid=host), an attacker can attach to host processes to inject shellcode or read memory.
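The configuration-based vectors above can be checked from the host side. The following is a rough sketch that audits the JSON produced by docker inspect for a single container; the function name, the finding strings, and the list of sensitive paths are assumptions chosen for illustration, and kernel exploits are out of scope for a check like this.

```python
# Host paths whose presence as a mount source is treated as sensitive here.
# The list is illustrative and should be adapted to the environment.
SENSITIVE_SOURCES = ("/", "/etc", "/root", "/var/run/docker.sock", "/run/docker.sock")
RISKY_CAPS = ("SYS_ADMIN", "SYS_PTRACE", "ALL")


def audit_container(info):
    """Return a list of risky settings found in one `docker inspect` entry."""
    findings = []
    host = info.get("HostConfig") or {}
    if host.get("Privileged"):
        findings.append("privileged mode")
    for cap in host.get("CapAdd") or []:
        if cap.upper().replace("CAP_", "") in RISKY_CAPS:
            findings.append(f"added capability {cap}")
    if host.get("PidMode") == "host":
        findings.append("host PID namespace")
    for opt in host.get("SecurityOpt") or []:
        if opt.endswith("unconfined"):
            findings.append(f"security option {opt}")
    for mount in info.get("Mounts") or []:
        source = mount.get("Source", "")
        if source in SENSITIVE_SOURCES:
            mode = "rw" if mount.get("RW") else "ro"
            findings.append(f"sensitive mount {source} ({mode})")
    return findings
```

One way to use it is to load the output of docker inspect with json.loads, which yields a list, and call audit_container on each element.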
Detection from Inside
- /.dockerenv -- Docker creates this file at container root.
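- /proc/1/cgroup -- Contains docker/kubepods/containerd references inside a container, while the host shows root-level paths. With cgroup v2 and a private cgroup namespace the container may also show just /, so treat this check as one signal among several.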
- Environment variables -- KUBERNETES_SERVICE_HOST, Docker Compose service names reveal the orchestration platform.
- Limited PID namespace -- ps aux shows very few processes vs. hundreds on the host.
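The indicators listed above can be combined into a small heuristic. This is a sketch only: the function names and the process-count threshold are hypothetical, and none of the signals is conclusive on its own.

```python
import os

# Hypothetical threshold: containers often run only a handful of processes.
FEW_PROCESSES = 20


def classify(dockerenv_exists, cgroup_text, environ, process_count):
    """Return human-readable hints that the process runs inside a container."""
    hints = []
    if dockerenv_exists:
        hints.append("/.dockerenv exists")
    for marker in ("docker", "kubepods", "containerd"):
        if marker in cgroup_text:
            hints.append(f"/proc/1/cgroup mentions {marker}")
    if "KUBERNETES_SERVICE_HOST" in environ:
        hints.append("KUBERNETES_SERVICE_HOST is set")
    if process_count < FEW_PROCESSES:
        hints.append(f"only {process_count} visible processes")
    return hints


def collect():
    """Gather the inputs from the local system and classify them."""
    try:
        with open("/proc/1/cgroup") as fh:
            cgroup_text = fh.read()
    except OSError:
        cgroup_text = ""
    try:
        process_count = sum(1 for name in os.listdir("/proc") if name.isdigit())
    except OSError:
        process_count = FEW_PROCESSES  # unknown, so do not count it as a hint
    return classify(os.path.exists("/.dockerenv"), cgroup_text, os.environ, process_count)


if __name__ == "__main__":
    for hint in collect():
        print(hint)
```

Keeping classify free of file access makes it easy to exercise with captured data from a target, while collect does the actual probing.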
Escape Techniques
- Docker socket -- docker run -v /:/hostfs -it alpine chroot /hostfs gives a root shell on the host. Use curl against the Docker API if the CLI is unavailable.
- Privileged mode -- mount /dev/sda1 /mnt (the device name varies by host) to access the host filesystem. Write SSH keys or crontab entries for persistence.
- CAP_SYS_ADMIN via cgroup release_agent -- Create a cgroup, point release_agent at a script path that resolves on the host, and trigger it by emptying the cgroup. The script executes on the host as root because the kernel launches the release agent outside the container's mount namespace.
- runc vulnerabilities -- CVE-2019-5736 allowed a malicious container to overwrite the host runc binary when the container was started or entered with docker exec. Runtime vulnerabilities like this require no special container configuration.
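When neither the docker CLI nor curl is present, the socket exposure can still be confirmed with a few lines of standard-library code. The sketch below only queries the Docker Engine API's /version endpoint over the Unix socket and does not create containers; the class and function names are hypothetical.

```python
import http.client
import json
import os
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=2.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_socket_version(socket_path="/var/run/docker.sock"):
    """Return the daemon version if the socket answers, otherwise None."""
    if not os.path.exists(socket_path):
        return None
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/version")
        response = conn.getresponse()
        if response.status != 200:
            return None
        return json.loads(response.read()).get("Version")
    except (OSError, http.client.HTTPException, ValueError):
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    version = docker_socket_version()
    print("Docker API reachable, version", version) if version else print("no usable Docker socket")
```

A non-None result means the process can talk to the daemon, which in practice is equivalent to root on the host.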
Image Security
- Base image vulnerabilities -- Outdated base layers inherit all known CVEs. Scan with Trivy or Grype before deployment.
- Secrets in layers -- Immutable layers preserve secrets even when "removed" in later layers. docker history --no-trunc reveals embedded credentials.
- Multi-stage builds -- Copy only final artifacts to minimal runtime images, excluding build tools and source code.
- CI/CD scanning -- Trivy, Snyk Container, and Anchore can fail builds on critical CVEs.
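As a lightweight complement to the scanners above, the text output of docker history --no-trunc can be searched for credential-like strings. This is a rough sketch: the patterns and names are illustrative assumptions, will produce false positives and negatives, and do not replace a dedicated secret scanner.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "assignment": re.compile(
        r"\b\w*(password|passwd|secret|token|api_?key)\w*\s*=\s*\S+", re.IGNORECASE
    ),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(history_text):
    """Return (line_number, pattern_name) pairs for suspicious history lines."""
    hits = []
    for lineno, line in enumerate(history_text.splitlines(), 1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

The input could come from subprocess.run(["docker", "history", "--no-trunc", image], capture_output=True, text=True).stdout, where image is whatever tag is being reviewed.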
Kubernetes-Specific Issues
- Service account tokens -- Mounted at /var/run/secrets/kubernetes.io/serviceaccount/token by default. If the service account has elevated RBAC permissions, the token lets an attacker call the API server to list secrets or create pods.
- Unauthenticated etcd -- Stores all cluster state including base64-encoded secrets. Port 2379 without auth exposes every secret.
- Kubelet API (10250) -- If anonymous authentication is enabled and authorization is permissive, unauthenticated requests can execute commands in any pod on the node.
- Pod Security Admission bypass -- Namespace label or exemption misconfigurations allow privileged pods in restricted namespaces.
- RBAC misconfigurations -- Overly permissive ClusterRoleBindings and wildcard permissions enable cluster-wide privilege escalation.
- Secrets as environment variables -- Visible in /proc/[pid]/environ. Mount as files with restrictive permissions instead.
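Two of the issues above, auto-mounted service account tokens and secrets passed as environment variables, are visible directly in a pod manifest. The following sketch inspects a manifest that has already been parsed into a dictionary (for example from kubectl get pod -o json); the function name and message wording are assumptions.

```python
def pod_secret_findings(pod):
    """Flag token auto-mounting and secrets injected as environment variables."""
    spec = pod.get("spec", {})
    findings = []
    # The field defaults to true when absent. It can also be disabled on the
    # ServiceAccount object, which this manifest-only check cannot see.
    if spec.get("automountServiceAccountToken") is not False:
        findings.append("service account token is auto-mounted")
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        for env in container.get("env") or []:
            if "secretKeyRef" in (env.get("valueFrom") or {}):
                findings.append(f"{name}: secret exposed as env var {env.get('name')}")
        for source in container.get("envFrom") or []:
            if "secretRef" in source:
                findings.append(f"{name}: whole secret imported via envFrom")
    return findings
```

An empty list means neither pattern appears in the manifest; it says nothing about the RBAC permissions bound to the service account, which need a separate review.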
Supply Chain and Network
- Typosquatting -- Malicious images with names similar to official ones on public registries. Verify sources and use image signing (cosign, Docker Content Trust).
- Network segmentation -- By default, Kubernetes allows all pod-to-pod communication. NetworkPolicy restricts ingress/egress at the pod level, provided the cluster's network plugin enforces it.
- Service mesh -- Istio/Linkerd provide mutual TLS, encryption, and Layer 7 access control between services.
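A common starting point for network segmentation is a default-deny NetworkPolicy per namespace, with explicit allow rules layered on top. The sketch below builds such a policy as a dictionary and prints it as JSON; the policy name and the example namespace are placeholders.

```python
import json


def default_deny_policy(namespace):
    """Build a NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches all pods in the namespace.
            "podSelector": {},
            # Listing both types without rules denies ingress and egress.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    print(json.dumps(default_deny_policy("example-namespace"), indent=2))
```

Since kubectl accepts JSON manifests, the printed output can be piped to kubectl apply -f -. Denying egress also blocks DNS lookups, so an allow rule for the cluster DNS service is usually needed alongside it.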
Tools and Hardening
- deepce -- Container enumeration, privilege escalation, and escape detection from inside.
- CDK -- Container penetration toolkit for information gathering and exploitation.
- Trivy -- Image and cluster vulnerability scanning for CVEs, misconfigurations, and secrets.
- Falco -- Runtime monitoring detecting anomalous behavior (shell spawning, sensitive file access).
- kube-hunter / peirates -- Kubernetes penetration testing for privilege escalation and lateral movement.
Hardening: run rootless containers, enforce read-only filesystems, apply strict security contexts (runAsNonRoot, drop ALL capabilities), enable Pod Security Admission in enforce mode, and regularly scan images and cluster configurations.
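The security-context items in this checklist can be verified mechanically. Here is a minimal sketch that reports which of them a parsed pod manifest is missing; the function name and messages are assumptions, and it only covers settings visible in the pod spec, not rootless runtimes or admission configuration.

```python
def hardening_gaps(pod):
    """Report containers that miss the security-context settings listed above."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext") or {}
    gaps = []
    for container in spec.get("containers", []):
        ctx = container.get("securityContext") or {}
        name = container.get("name", "<unnamed>")
        # runAsNonRoot may be set per container or inherited from the pod.
        if ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot")) is not True:
            gaps.append(f"{name}: runAsNonRoot is not true")
        if ctx.get("readOnlyRootFilesystem") is not True:
            gaps.append(f"{name}: root filesystem is writable")
        dropped = (ctx.get("capabilities") or {}).get("drop") or []
        if "ALL" not in [cap.upper() for cap in dropped]:
            gaps.append(f"{name}: capabilities are not dropped with ALL")
        if ctx.get("privileged"):
            gaps.append(f"{name}: privileged is enabled")
    return gaps
```

A check like this fits naturally into a CI step that fails when the returned list is non-empty, complementing Pod Security Admission rather than replacing it.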