This article answers the question: what is Docker? Specifically, what is Docker to a developer, and why would we use it?
To give a quick and rough explanation of what Docker does: Docker is an alternative to virtualization and virtual machines. Compared to a virtual machine, a Docker container uses far less disk space (similar containers share data), starts very quickly, and is easy to script. It also removes the “it works on my machine” scenarios.
To give a quick example, running the following will download and start an Nginx web server:
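As a sketch, the command might look like this (the detached flag and port mapping are one common choice, not the only one):

```shell
# Download the official nginx image from hub.docker.com (if not already cached)
# and start it in the background, mapping host port 8080 to port 80 in the container.
docker run -d -p 8080:80 --name my-nginx nginx
```

Once it is running, the server answers on http://localhost:8080 on the host machine.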
Now that looks powerful and interesting! Let’s dive in!
Container and Image
First, let’s review some keywords used in the Docker ecosystem.
- Image – A blueprint of what to create (Think of this being a class in the software world). An image defines what Docker will build and what steps to build it. This is immutable. Think of it like Microsoft Windows, you don’t change Windows itself, but you do change your personal files and applications.
- Container – A running, interactive instance of the image (Think of this as an object based on a class in the software world). This is mutable. Think of it like your personal files and applications on Windows, they do change with time.
Size of Virtual Machines
To answer the question “What is Docker and why would a developer use it?”, one of the selling features is its small size.
Virtualization stores the operating system, file system, environment configuration and applications as one giant blob. If a change is required, let’s say we wanted to install some different applications without affecting the original copy, a full copy and paste of that blob is required.
Most of the virtual machines I worked on last year were over 100GB in size, meaning that whenever I switched to a different client’s project, I had to move giant files to and from my disk. Obviously that approach is not sustainable.
Layered File System
Docker uses a layered file system, meaning that if 1,000 containers based on the same Ubuntu 15.04 image are running at once, they all share the same underlying read-only image file system. The additional disk space each container uses is therefore tiny.
The illustration below shows how this is accomplished, with distinct layers identified by hashes:
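The same layering can be inspected from the command line for any local image (assuming the node:alpine image used later in this article has been pulled):

```shell
# List the layers that make up an image, newest first.
# Each row corresponds to one Dockerfile instruction and shows
# the instruction that created the layer and the layer's size.
docker history node:alpine
```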
A Dockerfile describes how to make a Docker Image. The Dockerfile is a great way to see how the layering works within Docker.
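The bullet points below walk through a Dockerfile along these lines. As a representative sketch (the package installed, paths and command are illustrative assumptions, not the article’s exact file):

```dockerfile
# Start from the pre-built node:alpine image on hub.docker.com
FROM node:alpine

# Mount a folder from the host file system into the container
VOLUME ["/usr/src/app"]

# Default working directory, pointing at the mounted code
WORKDIR /usr/src/app

# Each RUN line creates a separate, cacheable layer
RUN npm install -g nodemon

# Allow traffic on port 8080
EXPOSE 8080

# Only one CMD instruction is allowed per Dockerfile
CMD ["nodemon", "app.js"]
```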
- FROM – Tells Docker that the image should start by downloading the pre-built node:alpine image and build on top of it. hub.docker.com is a library full of pre-made Docker images. When Docker runs this “recipe” it sees that it needs node:alpine and checks its cache to grab it. If it does not find it there, it reaches out to the internet to grab it.
- VOLUME – Tells Docker to mount a folder on the host file system into the Docker instance
- WORKDIR – Default start location pointing to the mounted code
- RUN – Think of this as running commands on the system directly. Lots of these RUN statements can be added to get/install/configure software or the environment. For each RUN line in the Dockerfile, a separate layer is created. This means that if more statements are made after this point, Docker doesn’t rebuild everything, just layers after this point. Speedy! Also any other images with the same steps will share the same image file.
- EXPOSE – Open a hole in the firewall to allow traffic on port 8080
- CMD – Run the command nodemon with app.js as a parameter. Side note: only one CMD instruction is allowed in a Dockerfile.
Aside: Alpine is a super lightweight distribution of Linux, coming in at around 5MB, which is roughly a 2 second download on a regular internet connection (if it is not already stored in cache).
Running this Dockerfile will spin up a running Node.js application with code mounted off the host file system (the machine Docker is running on). New developers starting on a project can simply grab and run the small (kb) Dockerfile and have a working environment in the blink of an eye.
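Building and running such a Dockerfile might look like the following sketch (the image tag, mount path and port mapping are illustrative assumptions):

```shell
# Build an image from the Dockerfile in the current directory
docker build -t my-node-app .

# Run it, mounting the host's code folder and publishing port 8080
docker run -p 8080:8080 -v "$(pwd)":/usr/src/app my-node-app
```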
It can be concluded that this is a very quick, easy and efficient way to get an environment up and running.
Before creating your own Image, check hub.docker.com for a pre-built image. If that doesn’t exist, find an image that is close to desired and build on top.
Docker states that it solves the problem of “It works on my machine” issues. The same filesystem (image) will be reproduced for all, mitigating any install differences. This is a good feature for sharing work between developer machines or even putting Images on the cloud.
- If an image doesn’t exist locally, downloads are blazingly fast, as Images are typically only megabytes in size
- If an image exists locally but has changed slightly, only the layers that changed are downloaded. Docker generates a unique hash for each layer added to the Image; if a layer hasn’t changed, it is simply reused
- Docker containers run on the host machine’s operating system kernel – they start instantly and use less compute and RAM than Virtual Machines
Open Source software fits the paradigm of Docker and a layered filesystem well. Most Open Source software is made up of small, distinct pieces of functionality that typically get pulled into larger software builds (think bower, npm, ruby gems, apt-get and so on). Reasons that Open Source fits the Docker and container model well:
- Open Source software is small and distinct
- It installs easily via command line – great for the RUN commands shown in the Dockerfile example
- Open Source software hosted on public repositories and Content Delivery Networks (CDNs) allows “recipes” and scripts to build the required software
- Each RUN command is a layer on the image
- For the most part you don’t have to worry about licensing, distribution and running on various server architectures (Development, Multi-compute systems on the cloud)
Isolated and Contained
An interesting feature the Docker ecosystem offers is the ability to link multiple Images together into a more complex system with Docker Compose. For example, one image could run Node.js with access to the host machine’s code folder and another image could run MySQL. This separation makes it possible to swap the database out for another, such as Amazon’s Aurora.
Rather than MySQL and the Node.js application both running in the same Image, they run in two separate Images. If another project needs just Node.js and no database, the Node.js Image can be built upon without a redundant MySQL server running alongside it. This follows the Single Responsibility Principle and allows other people to take the image and build on top of it.
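A minimal docker-compose.yml along those lines might look like the following sketch (the service names, image versions, paths and credentials are illustrative assumptions):

```yaml
version: "3"
services:
  app:
    build: .                  # the Node.js Dockerfile described earlier
    ports:
      - "8080:8080"
    volumes:
      - ./code:/usr/src/app   # mount the host's code folder
    depends_on:
      - db
  db:
    image: mysql:5.7          # swap this image out for another database
    environment:
      MYSQL_ROOT_PASSWORD: example
```

Running `docker compose up` would start both containers and let the app service reach the database by its service name, `db`.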
Hopefully this has given some insight into the question “What is Docker?” and why it might be useful to a developer. It provides a series of wins, including small size, fast start-up times, ease of distribution and normalization of environments.