Docker

Docker is a platform for developing, shipping, and running applications inside lightweight, portable, and self-sufficient containers. At its core, Docker uses operating system-level virtualization to package applications with their dependencies into standardized units called containers. These containers run on a shared kernel, making them more efficient than traditional virtual machines, which include a separate operating system for each instance. Docker relies on images, which are blueprints of the container’s contents, and provides tools to manage, distribute, and execute them.

How Docker Works

Namespaces: Isolation Mechanism

Namespaces are a core feature of the Linux kernel that Docker uses to provide isolation. They create separate environments for containers, isolating processes, network interfaces, file systems, and IPC mechanisms. Key namespaces Docker uses include:

PID namespace: Isolates process IDs, ensuring that processes in one container don’t see or affect those in another.
NET namespace: Provides each container with its own virtual network stack, including interfaces, IP addresses, and routing tables.
MNT namespace: Isolates the set of mount points a container sees, so each container has its own root file system.
UTS namespace: Allows a container to have its own hostname and domain name.
IPC namespace: Isolates interprocess communication resources like message queues and semaphores.
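
On a Linux host, the namespaces a process belongs to are visible as links under /proc/<pid>/ns. The sketch below reads those links for the current process. The function name list_namespaces is illustrative and not part of Docker, and the function simply returns an empty mapping on systems without /proc.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target looks like "pid:[4026531836]".
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some entries may be unreadable without extra privileges.
            continue
    return namespaces


if __name__ == "__main__":
    for name, identifier in list_namespaces().items():
        print(f"{name:20s} {identifier}")
```

Two processes that share a namespace report the same identifier for it. Running the script on the host and again inside a container would normally show different identifiers for pid, net, mnt, uts, and ipc.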

cgroups: Resource Management

Control groups (cgroups) are another kernel feature Docker leverages to manage and limit resource usage for containers. With cgroups, Docker can:

Limit CPU usage: By defining shares or quotas for CPU cycles.
Restrict memory: By capping the maximum memory usage.
Constrain disk I/O: By setting read/write bandwidth limits.
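Tag network traffic: The net_cls and net_prio controllers (cgroups v1) can classify and prioritize a container’s packets. Docker has no built-in bandwidth limit, so throttling requires external tools such as tc.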

This ensures resource isolation and prevents one container from monopolizing system resources.
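
As a rough illustration of how such limits become kernel settings, the sketch below converts a CPU count and a memory string into the values written to the cgroup v2 files cpu.max and memory.max. It assumes the commonly used scheduler period of 100000 microseconds and binary size suffixes; the helper names cpu_max and memory_bytes are hypothetical.

```python
# Binary multipliers for the size suffixes accepted in this sketch.
_UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def cpu_max(cpus: float, period_us: int = 100_000) -> str:
    """Format a CPU limit as '<quota> <period>' (both in microseconds)."""
    quota_us = int(round(cpus * period_us))
    return f"{quota_us} {period_us}"


def memory_bytes(limit: str) -> int:
    """Convert a size such as '500m' or '1g' into a byte count."""
    text = limit.strip().lower()
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)  # no suffix: the value is already in bytes


if __name__ == "__main__":
    print("cpu.max    ->", cpu_max(1.5))
    print("memory.max ->", memory_bytes("500m"))
```

A limit of 1.5 CPUs means the container may use 150000 microseconds of CPU time in every 100000-microsecond period, summed across all cores.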

Union File Systems: Copy-on-Write (CoW)

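Docker uses union file systems (chiefly OverlayFS, the basis of the default overlay2 storage driver) to implement its layered storage mechanism. Other storage drivers, such as Btrfs and ZFS, achieve the same layering with file-system snapshots rather than a union mount, and the older AUFS driver is deprecated. This approach allows Docker to create lightweight and efficient images by: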

Layering: Each Dockerfile instruction that changes the file system (RUN, COPY, ADD) creates a new layer; other instructions only record metadata.
Sharing layers: Common layers between images are reused, reducing storage space and improving build speed.
Copy-on-Write: When a container modifies a file from its image, the file is copied to the container’s writable layer. This ensures the base image remains immutable while allowing container-specific changes.

How CoW Works

Read-Only Base Layers: When a container is created from an image, it uses the image’s layers as read-only base layers. These layers are shared between all containers derived from the same image, saving disk space.

Writable Container Layer: Each container gets a thin, writable layer on top of the image’s read-only layers. Any changes made by the container (like modifying files or creating new ones) are stored in this writable layer.

Copy-on-Modify: When a container attempts to modify a file in a read-only layer, Docker doesn’t directly alter the file. Instead, the file is first copied from the read-only layer to the container’s writable layer (a step known as copy-up), and the modification is applied only to that copy. The original layers remain untouched, so other containers can keep reusing them.

Unchanged Data: Files that are not modified remain in the shared, read-only layers, reducing redundancy and improving performance.
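
The four points above can be modeled in a few lines. This is a toy in-memory simulation, not how OverlayFS is implemented: layers are plain mappings from path to content, and the names make_layer and Container are invented for the example.

```python
from types import MappingProxyType


def make_layer(files):
    """Freeze a {path: content} dict so it behaves like a read-only image layer."""
    return MappingProxyType(dict(files))


class Container:
    def __init__(self, image_layers):
        self.image_layers = tuple(image_layers)  # shared, bottom layer first
        self.writable = {}                       # thin per-container layer

    def read(self, path):
        # The writable layer shadows the image layers; upper layers shadow lower ones.
        if path in self.writable:
            return self.writable[path]
        for layer in reversed(self.image_layers):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, text):
        # Copy-up: bring the file into the writable layer before changing it.
        if path not in self.writable:
            self.writable[path] = self.read(path)
        self.writable[path] += text


image = [
    make_layer({"/etc/motd": "hello\n"}),
    make_layer({"/app/config": "debug=false\n"}),
]
first = Container(image)
second = Container(image)
first.append("/app/config", "verbose=true\n")
```

After the append, only first sees the new line. The second container and the image layers still hold the original content, and unmodified files such as /etc/motd are never copied.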

Docker Networking

Docker creates isolated virtual networks for containers using network namespaces and virtual Ethernet interfaces. Containers can connect to one of several network types:

Bridge network: Containers communicate through a virtual bridge.
Host network: Containers share the host’s network stack.
Overlay network: Spans multiple hosts, enabling container communication across them.
Macvlan: Assigns each container its own MAC address so that it appears as a separate physical device on the host’s network.

Docker uses iptables and routing rules to manage traffic and isolate networks.
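
To make the bridge case more concrete, the sketch below hands out addresses from a single subnet, the way containers attached to one bridge end up on the same IP network. It assumes the subnet 172.17.0.0/16, which is the usual default for Docker’s built-in bridge, and assigns addresses sequentially; real allocation is done by Docker’s address management and may differ. The function name plan_bridge_addresses is hypothetical.

```python
import ipaddress


def plan_bridge_addresses(container_names, subnet="172.17.0.0/16"):
    """Return (gateway, {container: ip}) using sequential addresses from subnet."""
    network = ipaddress.ip_network(subnet)
    hosts = iter(network.hosts())
    gateway = next(hosts)  # first usable address is given to the bridge itself
    assignments = {name: str(ip) for name, ip in zip(container_names, hosts)}
    return str(gateway), assignments


if __name__ == "__main__":
    gateway, addresses = plan_bridge_addresses(["web", "db"])
    print("bridge gateway:", gateway)
    for name, ip in addresses.items():
        print(f"{name}: {ip}")
```

Because both containers sit in the same subnet behind the same bridge, they can reach each other directly, while traffic to other networks goes through the gateway address.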

Docker Images and Containers

Images: Docker images are read-only templates composed of multiple layers. Layers are stored in the local image cache, and Docker uses a content-addressable storage model to ensure efficiency and consistency.

Containers: Containers are runtime instances of images. A container consists of the image plus a writable layer where changes are made.
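
Content-addressable storage means a layer is looked up by a digest of its bytes rather than by a name. The sketch below is a minimal in-memory model of that idea using SHA-256; the LayerStore class is invented for illustration and is not Docker’s storage code.

```python
import hashlib


class LayerStore:
    """Store blobs under the SHA-256 digest of their content."""

    def __init__(self):
        self._blobs = {}

    @staticmethod
    def digest_of(content: bytes) -> str:
        return "sha256:" + hashlib.sha256(content).hexdigest()

    def put(self, content: bytes) -> str:
        digest = self.digest_of(content)
        # Identical content maps to the same key, so it is stored only once.
        self._blobs.setdefault(digest, content)
        return digest

    def get(self, digest: str) -> bytes:
        content = self._blobs[digest]
        # The key doubles as an integrity check.
        if self.digest_of(content) != digest:
            raise ValueError("stored content does not match its digest")
        return content

    def __len__(self):
        return len(self._blobs)


store = LayerStore()
base = b"base operating system files"
image_a = [store.put(base), store.put(b"application A")]
image_b = [store.put(base), store.put(b"application B")]
```

Both images reference the same base digest, so the store holds three blobs rather than four. That is the efficiency part; the digest check in get is the consistency part.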

Container Runtime

The container runtime is the component that starts and manages containers. Docker originally relied on LXC and later on its own libcontainer library. Today the Docker daemon delegates container lifecycle management to containerd, which in turn invokes runc, a low-level runtime that implements the Open Container Initiative (OCI) runtime specification. runc is the piece that interacts with the Linux kernel to create namespaces, set up cgroups, and launch the container process.

Dockerfile

A Dockerfile is a text file containing a set of instructions to automate the process of building a Docker image. It defines the base image, software dependencies, environment configurations, and commands to be executed within the container. Each instruction is a build step, and instructions that modify the file system add a layer to the final image, enabling efficient caching and reuse. Dockerfiles are used to create consistent, portable, and reproducible environments for applications. By passing a Dockerfile to docker build, you generate an image that can be shared and deployed across various systems.

Key Directives

Command	Explanation
`FROM`	Specifies the base image for building the container.
`RUN`	Executes commands in the container during build time.
`CMD`	Provides the default command to run when the container starts.
`ENTRYPOINT`	Defines the main process to execute as the container starts.
`COPY`	Copies files or directories from the host to the container.
`ADD`	Similar to `COPY` but also supports fetching remote URLs and unpacking archives.
`WORKDIR`	Sets the working directory for subsequent commands in the Dockerfile.
`ENV`	Sets environment variables in the container.
`EXPOSE`	Documents that the container listens on a specific network port; it does not publish the port to the host by itself.
`ARG`	Defines build-time variables that can be passed via `docker build`.
`LABEL`	Adds metadata to the image.
`VOLUME`	Specifies a directory to store persistent data, mounting it as a volume.
`USER`	Sets the user to execute commands in the container.

CMD vs ENTRYPOINT

Both CMD and ENTRYPOINT in Dockerfiles specify the command that will be executed when a container starts. However, they differ in flexibility, behavior, and use cases:

Aspect	`CMD`	`ENTRYPOINT`
Purpose	Specifies a default command to run in the container.	Defines the container’s main process.
Overriding	Replaced entirely by any arguments given after the image name in `docker run`.	Arguments given to `docker run` are appended to the entrypoint; the entrypoint itself is replaced only with `--entrypoint`.
Usage	Acts as a fallback default.	Defines the container’s main application or behavior.
Format	Shell form (`CMD command arg1`) or exec form (`CMD ["executable", "arg1"]`).	Shell form (`ENTRYPOINT command arg1`) or exec form (`ENTRYPOINT ["executable", "arg1"]`); exec form is preferred because shell form ignores `CMD` and `docker run` arguments.

When to Use CMD

Use CMD when you want to provide a default command that can easily be overridden. This is particularly useful when you want the user to have flexibility in defining container behavior.

Example: A container that defaults to running a Python script but allows the user to specify other scripts.

FROM python:3.9-slim
COPY script.py /app/script.py
WORKDIR /app
CMD ["python", "script.py"]

Run with docker run:

Defaults: docker run my-python-app executes python script.py.
Override: docker run my-python-app python other-script.py runs other-script.py instead.

When to Use ENTRYPOINT

Use ENTRYPOINT when you want to define a fixed command that always runs, so that docker run arguments only supply options to that command. (The entrypoint can still be replaced explicitly with docker run --entrypoint.) This is ideal for containers that serve a single-purpose application.

Example: A container that always runs a web server.

FROM nginx:alpine
ENTRYPOINT ["nginx", "-g", "daemon off;"]

Run with docker run:

Defaults: docker run my-nginx runs nginx -g "daemon off;".
Arguments: docker run my-nginx -c /custom/config.conf appends -c /custom/config.conf to the entrypoint command.

Using Both Together

CMD and ENTRYPOINT can be combined to provide default arguments for an entrypoint.

Example: A script that accepts user arguments but has defaults:

FROM ubuntu
ENTRYPOINT ["echo"]
CMD ["Hello, world!"]

Run with docker run:

Defaults: docker run my-echo outputs Hello, world!.
Override CMD: docker run my-echo Goodbye outputs Goodbye.
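
The interaction between the two instructions can be summarized as a small function. The sketch below models only the exec form and follows the documented rules: arguments after the image name replace CMD, and --entrypoint replaces the entrypoint and clears the image’s default CMD. The function resolve_command and its parameter names are hypothetical.

```python
def resolve_command(entrypoint=None, cmd=None, run_args=None, entrypoint_override=None):
    """Return the argv a container would start with (exec form only)."""
    effective_entrypoint = list(entrypoint or [])
    effective_cmd = list(cmd or [])

    if entrypoint_override is not None:
        # docker run --entrypoint ...: the image's ENTRYPOINT and CMD are both dropped.
        effective_entrypoint = list(entrypoint_override)
        effective_cmd = []

    if run_args:
        # Arguments after the image name replace CMD entirely.
        effective_cmd = list(run_args)

    return effective_entrypoint + effective_cmd


if __name__ == "__main__":
    print(resolve_command(entrypoint=["echo"], cmd=["Hello, world!"]))
    print(resolve_command(entrypoint=["echo"], cmd=["Hello, world!"], run_args=["Goodbye"]))
```

Applied to the three Dockerfiles in this section, the function reproduces each of the docker run outcomes described above.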

Building Images

A Docker build packages an application and its dependencies into a portable image. A Dockerfile defines the steps to create that image. Instructions that change the file system each add a new layer, and efficient image building relies on understanding and optimizing these layers.

Layering

Layers are file-system diffs produced by Dockerfile instructions such as RUN, COPY, and ADD. They are immutable and shared across images, which saves disk space and speeds up builds through caching.

Common Instructions and Their Impact

FROM: Defines the base image. Starts a new layer hierarchy.
RUN: Executes commands in the shell. Generates a new layer.
COPY/ADD: Copies files into the image. Each instruction creates a layer.
CMD/ENTRYPOINT: Sets the default container command but does not create layers.
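
Layer reuse during a build depends on the order of instructions: once one step changes, every later step has to run again. The sketch below is a toy model of that chaining, in which each step’s cache key is derived from its parent’s key, the instruction text, and a digest of any copied files. The real builder computes its keys differently; cache_keys and reusable_steps are invented names.

```python
import hashlib


def cache_keys(steps):
    """steps: list of (instruction_text, input_digest). Returns one key per step."""
    keys = []
    parent = ""
    for instruction, input_digest in steps:
        material = "\n".join([parent, instruction, input_digest]).encode()
        parent = hashlib.sha256(material).hexdigest()
        keys.append(parent)
    return keys


def reusable_steps(old_keys, new_keys):
    """Count leading steps whose keys match, i.e. steps served from cache."""
    count = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        count += 1
    return count


def build(script_digest):
    return cache_keys([
        ("FROM python:3.9-slim", ""),
        ("WORKDIR /app", ""),
        ("COPY script.py /app/script.py", script_digest),
        ('CMD ["python", "script.py"]', ""),
    ])


first_build = build("digest-of-script-v1")
second_build = build("digest-of-script-v2")  # script.py was edited
```

Here reusable_steps(first_build, second_build) is 2: the FROM and WORKDIR steps are reused, while the COPY step and everything after it are rebuilt. This is why rarely changing instructions are usually placed early in a Dockerfile.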

Multistage Builds

Multistage builds enable you to use multiple FROM instructions in a Dockerfile. Each FROM starts a new stage with its own base image and a fresh file system, while all stages share the same build context. Typically, you use one stage for building and another for producing the final lightweight image.

Example

# Build stage
FROM golang:1.20 AS builder
WORKDIR /app
COPY . .
# Disable cgo so the binary is statically linked and runs on Alpine (musl libc)
RUN CGO_ENABLED=0 go build -o myapp

# Final image
FROM alpine:latest
WORKDIR /app
COPY --from=builder /app/myapp .
CMD ["./myapp"]

Benefits:

Size Reduction: Excludes unnecessary build tools and dependencies in the final image.
Separation of Concerns: Keeps build and runtime environments distinct.
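
To see how stages relate to each other, the following sketch splits a Dockerfile into stages and records which earlier stages each one copies from. It is a simplified line-based reader written for this illustration (no line continuations or heredocs), and parse_stages is a hypothetical helper, not a Docker API.

```python
DOCKERFILE = """\
# Build stage
FROM golang:1.20 AS builder
WORKDIR /app
COPY . .
RUN go build -o myapp

# Final image
FROM alpine:latest
WORKDIR /app
COPY --from=builder /app/myapp .
CMD ["./myapp"]
"""


def parse_stages(dockerfile_text):
    """Return one dict per stage: base image, optional name, stages copied from."""
    stages = []
    for raw_line in dockerfile_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        keyword = parts[0].upper()
        if keyword == "FROM":
            # Ignore flags such as --platform=... when locating the image name.
            args = [p for p in parts[1:] if not p.startswith("--")]
            name = args[2] if len(args) >= 3 and args[1].upper() == "AS" else None
            stages.append({"base": args[0], "name": name, "copies_from": []})
        elif keyword == "COPY" and stages:
            for token in parts[1:]:
                if token.startswith("--from="):
                    stages[-1]["copies_from"].append(token[len("--from="):])
    return stages


stages = parse_stages(DOCKERFILE)
```

For the example Dockerfile this yields two stages. Only the last stage becomes the final image, and its single link to the builder stage is the COPY --from line, which is why the Go toolchain never reaches the shipped image.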

When to Use Multistage Builds:

Applications requiring compilation (e.g., Go, Java, C++).
Complex CI/CD pipelines where artifacts need isolation.
When image size or build reproducibility is critical.

When Not to Use Multistage Builds:

Small or simple applications that have no compilation step and no lengthy build process.
Scenarios where build time is more critical than image size (e.g., local experimentation).

Common Interview Questions

Easy Questions

What is Docker?

Docker is a platform for developing, shipping, and running applications using containerization. Containers allow developers to package applications with their dependencies into a single unit. This ensures consistency across different environments.

What is a Docker image?

A Docker image is a lightweight, standalone, and executable software package that contains the application code, runtime, libraries, and dependencies. It serves as a blueprint for creating Docker containers.

What is a Docker container?

A Docker container is a runtime instance of a Docker image. Containers are isolated environments where applications run, sharing the host OS kernel but remaining isolated from other containers.

How do you start a Docker container?

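To create and start a new container, use the docker run command followed by the image name, e.g., docker run nginx. This pulls the image (if not available locally), creates a container from it, and starts it. To start an existing stopped container, use docker start <container_id>.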

What is the purpose of docker ps?

The docker ps command lists all running containers. Use docker ps -a to include stopped containers in the list.

What is the difference between docker stop and docker kill?

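The command docker stop stops a container gracefully: it sends SIGTERM to the main process so cleanup code can run, and sends SIGKILL only if the process has not exited after a grace period (10 seconds by default). The command docker kill stops a container immediately by sending SIGKILL by default, so the process gets no chance to clean up.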

What is Docker Hub?

Docker Hub is a cloud-based repository where developers can find and share container images. It hosts both official and user-contributed images.

What is a Dockerfile?

A Dockerfile is a script containing instructions to build a Docker image. It specifies the base image, application code, dependencies, and configurations.

What is the purpose of docker-compose?

Docker Compose is a tool used to define and manage multi-container applications. It uses a YAML file to configure services, networks, and volumes.

How do you delete a Docker container?

Use the docker rm <container_id> command to delete a container. Add the -f flag to remove a running container forcefully.

Medium Questions

What is the difference between a virtual machine and a Docker container?

Virtual machines virtualize an entire OS, including the kernel, and are heavier. Docker containers share the host OS kernel, making them lightweight and faster to start.

How do you expose ports in Docker?

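Ports are published using the -p flag during docker run, e.g., docker run -p 8080:80 nginx. This maps port 8080 on the host to port 80 inside the container. The EXPOSE instruction in a Dockerfile only documents a port; -p is what makes it reachable from the host.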
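
The value passed to -p follows the pattern [host_ip:]host_port:container_port[/protocol]. The sketch below parses the common forms of that string to make the order of the fields explicit. It is a simplified reader (no port ranges or bracketed IPv6 addresses), and PortMapping and parse_publish are hypothetical names.

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]
    host_port: Optional[int]  # None means the host port is chosen automatically
    container_port: int
    protocol: str


def parse_publish(spec: str) -> PortMapping:
    """Parse a publish spec such as '8080:80' or '127.0.0.1:8080:80/udp'."""
    address, _, protocol = spec.partition("/")
    protocol = protocol or "tcp"
    parts = address.split(":")
    if len(parts) == 1:
        return PortMapping(None, None, int(parts[0]), protocol)
    if len(parts) == 2:
        return PortMapping(None, int(parts[0]), int(parts[1]), protocol)
    if len(parts) == 3:
        host_port = int(parts[1]) if parts[1] else None
        return PortMapping(parts[0], host_port, int(parts[2]), protocol)
    raise ValueError(f"unsupported publish spec: {spec!r}")


if __name__ == "__main__":
    print(parse_publish("8080:80"))
```

The host side always comes first and the container side last, which is the detail most often reversed by mistake.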

What is the purpose of Docker volumes?

Docker volumes persist data generated by containers. They allow sharing data between containers or between the host and containers, ensuring data durability.

What is the COPY instruction in a Dockerfile?

The COPY instruction copies files or directories from the host filesystem into the Docker image. Example: COPY ./app /app.

What is the purpose of Docker networking?

Docker networking allows containers to communicate with each other or the external world. Docker provides different network drivers like bridge, host, and overlay.

How do you restart a stopped container?

Use the docker start <container_id> command to restart a stopped container. Use docker restart for a running container.

What is the difference between ENTRYPOINT and CMD in a Dockerfile?

Both define default commands for a container. ENTRYPOINT is the main command and is less flexible, while CMD provides default arguments that can be overridden.

What are the common Docker network types?

Docker provides the following network types:

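Bridge: Default network for containers on a single host.
Host: Shares the host’s network namespace.
Overlay: Spans multiple hosts so containers on different machines can communicate.
Macvlan: Gives each container its own MAC address on the host’s physical network.
None: No networking for the container.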

How do you clean up unused Docker objects?

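Use the docker system prune command to remove stopped containers, unused networks, dangling images, and unused build cache. Add -a to also remove all images not used by any container, and --volumes to include unused volumes.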

What is the purpose of the docker exec command?

The docker exec command runs a command inside a running container. For example, docker exec -it <container_id> bash starts an interactive bash session (use sh for images that do not ship bash, such as Alpine-based ones).

Hard Questions

How does Docker handle isolation between containers?

Docker uses Linux namespaces for isolation (process, network, and mount) and cgroups for resource allocation and control. This ensures containers are isolated yet lightweight.

What are multi-stage builds in Docker?

Multi-stage builds allow using multiple FROM instructions in a Dockerfile to create lightweight images by copying only the necessary artifacts from build stages. This reduces image size.

How do you troubleshoot a failing container?

Check container logs using docker logs <container_id>, inspect configuration with docker inspect, and access a shell in the container with docker exec.

What is the difference between docker-compose up and docker-compose run?

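The command docker-compose up creates and starts all services defined in the Compose file. The command docker-compose run starts a one-off container for a single service in order to run a given command. It also starts the services that service depends on (unless --no-deps is given), but it does not publish the service’s ports unless --service-ports is used.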

How do you secure Docker containers?

Use minimal base images.
Run containers as non-root users.
Restrict network access.
Regularly update images and dependencies.

What is the ONBUILD instruction in Dockerfile?

The ONBUILD instruction triggers instructions in child images during their build. It’s useful for base images that other images extend.

How do you monitor resource usage of Docker containers?

Use the docker stats command to view real-time CPU, memory, and I/O usage. Alternatively, integrate with monitoring tools like Prometheus and Grafana.

What is a dangling Docker image?

Dangling images are image layers that have no tag and are not referenced by any tagged image. They are typically left behind when a rebuild assigns an existing tag to a new image. Remove them with docker image prune.

How do you limit resources for Docker containers?

Use flags like --memory and --cpus in the docker run command to set memory and CPU limits, e.g., docker run --memory="500m" --cpus="1.0" nginx.

How does Docker Swarm differ from Kubernetes?

Docker Swarm is Docker’s native orchestration tool, simpler to set up but less feature-rich. Kubernetes is more complex and provides advanced features like auto-scaling and custom resource definitions.