Docker Deep Dive



Containers are white hot at the moment, especially Docker and Kubernetes.

Reckon: to think or believe.
e.g. I reckon you will love it.

waffle: to talk or write a lot without giving any useful information or any clear answers

modular: consisting of separate parts that, when combined, form a complete whole.

Primitive: the simplest element available in a programming language; a primitive is the smallest 'unit of processing' available to a programmer.

Refactoring: the process of altering an application's source code without changing its external behaviour.
The purpose of code refactoring is to improve some of the non-functional properties of the code, such as readability, complexity, maintainability and extensibility.

Swapping: the act of swapping two variables refers to mutually exchanging the values of the variables.
Usually this is done with data in memory, e.g. in a program:


x := 1
y := 0

x, y = y, x // swap (Go's multiple assignment)
// After the swap, x becomes 0 and y becomes 1.

Architecture Big Picture


All a container really is is a ring-fenced area of an operating system, with some limits on how many system resources it can use, and that's it.

Now, to build them we leverage a bunch of low-level kernel stuff; in particular we use namespaces and control groups.



Kernel Internals:

We use two main building blocks when we are building containers:

Namespaces
Control Groups

Both of them are Linux kernel primitives.

Namespaces are about isolation.
Control Groups are about grouping objects and setting limits.

A Docker container is basically an organised collection of namespaces.

So a container is its own isolated grouping of these namespaces.

So it's got its own process ID table with PID 1 and everything, its own network namespace with an eth0 interface and IP address, its own root file system, etc.
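To make that concrete, here is a minimal sketch in Go (Linux only, run as root) of the underlying kernel primitive. This is not how Docker is implemented, just an illustration: it starts a shell inside its own PID, UTS (hostname), mount and network namespaces.

package main

import (
    "os"
    "os/exec"
    "syscall"
)

func main() {
    cmd := exec.Command("/bin/sh")
    cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
    // Give the shell its own PID, hostname (UTS), mount and network namespaces.
    cmd.SysProcAttr = &syscall.SysProcAttr{
        Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWUTS |
            syscall.CLONE_NEWNS | syscall.CLONE_NEWNET,
    }
    if err := cmd.Run(); err != nil {
        panic(err)
    }
}

Inside that shell, echo $$ prints 1, because the process really is PID 1 in its own PID namespace, and ip link shows only a loopback interface, because it has its own network namespace.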





And it's secure; it's a secure boundary.



And we can create more; each one is isolated, looking and feeling like a standalone OS.


Control Groups: group processes and impose limits, i.e. container A should get only this much CPU, RAM and disk space, and similarly for the other containers.
Limits are imposed on each container.
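To make "impose limits" concrete, here is a rough sketch in Go against the standard cgroup v2 file interface. It assumes cgroup v2 is mounted at /sys/fs/cgroup and must run as root; the group name "demo" is made up. Docker does this kind of thing for you under the hood.

package main

import (
    "os"
    "path/filepath"
)

func must(err error) {
    if err != nil {
        panic(err)
    }
}

func main() {
    // Create a new cgroup called "demo" (hypothetical name for this sketch).
    cg := "/sys/fs/cgroup/demo"
    if err := os.Mkdir(cg, 0755); err != nil && !os.IsExist(err) {
        panic(err)
    }
    // Cap memory at 256 MiB and CPU at half a core (50ms of CPU per 100ms period).
    must(os.WriteFile(filepath.Join(cg, "memory.max"), []byte("268435456"), 0644))
    must(os.WriteFile(filepath.Join(cg, "cpu.max"), []byte("50000 100000"), 0644))
}

Writing a container's PID into demo/cgroup.procs would then put it, and everything it spawns, under those limits.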


Modern containers also leverage capabilities, seccomp and a bunch of other technologies to add security.

But CPU, RAM and disk space limits are at the very centre, and everything else is like icing on the cake.


The Docker Engine

History:
Docker came out of a company called dotCloud.
In the beginning it was a Python tool called dc: d for dot and c for cloud.
It was basically a wrapper for LXC and AUFS.

AUFS is a union file system.
LXC is a bunch of tools for interfacing with the container primitives in the kernel.

Now, I know, Docker gets all the props for making containers popular, and rightly so, but it's only fair to say that it really started with LXC. Oh, I like that. So, props to LXC. Anyway, this relationship, with LXC, didn't last long.

As soon as Docker got popular, things got complicated. One of the issues was just the sheer pace things were developing at. I mean, Docker was developing as a technology and as an ecosystem, and at the same time, and not by coincidence by the way, LXC started cranking up development as well. So, of course, the inevitable happened: stuff started breaking, specifically changes in LXC would break Docker, which is no surprise, right? I mean, Docker being reliant on an external tool like LXC, which was so insanely integral to the project and at the same time so out of Docker's control, look, it was never going to be plain sailing.

What Docker really needed was something that did the job of LXC but was under their control. Enter libcontainer. So, libcontainer is pretty much a like-for-like replacement for LXC that interfaces with the container stuff in the kernel, but it's under Docker's control, and this was a key move.


Anyway, right, by this time the dc tool had become Docker, and things were developing like crazy, and before we knew it, Docker had become a monolith.

Now, we call this the daemon, by the way, but rather than being lightweight and fast like it was supposed to be, it got bloated and slow. I mean, look at it. It's doing everything. It's implementing the HTTP server and REST API, images, builds, registry stuff, networking, storage, authentication, you name it. It was bloated, it lost its mojo, and it's ironic, right? Because on the one hand Docker was leading the charge towards microservices, but on the other hand Docker itself was a monolith. Like, what? And nobody was happy.


For the Docker folks, iterating on a monolith just wasn't fun. For the ecosystem, well, they wanted to work with Docker, but really just the runtime stuff. They didn't care about all this other stuff. Well, actually, sometimes they did. I'll give you an example.
So, Kubernetes was out there, positioning itself as a container orchestrator, and it was using Docker as its runtime. But obviously, pulling in Docker meant it was also getting all of this other stuff, including Docker's own built-in orchestrator. Uh-uh. So, Kubernetes, as an orchestrator, was shipping with Docker, which had a competing orchestrator already built in. Talk about mental!
Safe to say, the ecosystem wasn't loving the bloat. It was killing usability, composability, simplicity, security, you name it. And users weren't ecstatic either. But everything's fixable, right?


So Docker set about this massive project of breaking it all apart and refactoring the core plumbing stuff into separate tools. And at about the same time as this, the Open Container Initiative started making its way onto the stage, and we started getting some standards, specifically an image spec and a container runtime spec.

Fast forward to today and things look like this:


On Linux it works like this:
The client asks the daemon for a new container.
The daemon gets containerd to start and manage the containers, and runc, at the OCI layer, actually builds them.
runc, by the way, is the reference implementation of the OCI runtime spec, and it's the default runtime for a vanilla installation of Docker.
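To make that first step, the client asking the daemon for a new container, a bit more concrete, here is a rough sketch in Go of the kind of HTTP request involved: a POST to the daemon's containers/create endpoint over the local Unix socket. It's an illustration, not the real CLI code; the image and command are placeholders, and the alpine image is assumed to be pulled already.

package main

import (
    "bytes"
    "context"
    "fmt"
    "io"
    "net"
    "net/http"
)

func main() {
    // Talk to the local Docker daemon over its Unix socket, the same way the CLI does.
    httpc := http.Client{
        Transport: &http.Transport{
            DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
                return net.Dial("unix", "/var/run/docker.sock")
            },
        },
    }

    // Ask the daemon to create a container (roughly what `docker create alpine echo hi` does).
    body := bytes.NewBufferString(`{"Image": "alpine", "Cmd": ["echo", "hi"]}`)
    resp, err := httpc.Post("http://docker/containers/create", "application/json", body)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    out, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.Status, string(out)) // e.g. 201 Created {"Id":"...","Warnings":[]}
}

The response carries the new container's ID, and a follow-up POST to /containers/{id}/start is what actually starts it.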


Create a new container on Linux

Generally speaking, we use the Docker client to create containers.
The client takes the run command and posts it as an API request to the containers/create endpoint in the daemon.
But guess what? All this engine refactoring has left the daemon without any code to create or run containers. Seriously, the daemon no longer even knows how to create containers; all that logic has been ripped out and implemented in containerd and the OCI layer.
So to create the container, the daemon calls out to containerd over a gRPC API on a local Unix socket, and even though containerd has got "container" in its name, even it can't actually create a container. What? Yeah, that's right.
All the logic to interface with the namespaces and stuff in the kernel is implemented at the OCI layer by runc.
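For a feel of what that gRPC call to containerd looks like, here is a hedged sketch using containerd's Go client, following the shape of containerd's own getting-started example (import paths and option names vary a bit between containerd versions, so treat it as an illustration rather than production code):

package main

import (
    "context"

    "github.com/containerd/containerd"
    "github.com/containerd/containerd/cio"
    "github.com/containerd/containerd/namespaces"
    "github.com/containerd/containerd/oci"
)

func main() {
    // Connect to containerd's gRPC API on its local Unix socket.
    client, err := containerd.New("/run/containerd/containerd.sock")
    if err != nil {
        panic(err)
    }
    defer client.Close()

    // containerd is multi-tenant; Docker uses its own namespace ("moby"),
    // "example" and "demo" here are just made-up names for this sketch.
    ctx := namespaces.WithNamespace(context.Background(), "example")

    // Pull an image, create a container from it, then create and start a task.
    image, err := client.Pull(ctx, "docker.io/library/alpine:latest", containerd.WithPullUnpack)
    if err != nil {
        panic(err)
    }
    container, err := client.NewContainer(ctx, "demo",
        containerd.WithNewSnapshot("demo-snapshot", image),
        containerd.WithNewSpec(oci.WithImageConfig(image)),
    )
    if err != nil {
        panic(err)
    }
    task, err := container.NewTask(ctx, cio.NewCreator(cio.WithStdio))
    if err != nil {
        panic(err)
    }
    if err := task.Start(ctx); err != nil {
        panic(err)
    }
}

It's the NewTask/Start step that causes containerd to spin up a shim and invoke runc, which ties in with the list below.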

  • You, or the init system (e.g. systemd), start the daemon.
  • The daemon starts containerd, which is a daemon process (a long-running process).
  • containerd creates a shim process for every container, and runc creates the container and then exits.
  • So runc gets called for every new container, but it does not stick around. The shim does, though, and there is one shim per container.
  • So you can see here how containerd effectively manages multiple runcs, or shims. And that's pretty much the process. And you know what? Architecturally, it's great.

I mean, it's modular, composable and reusable.

containerd and runc here are potentially swappable, definitely runc: you can swap that out for pretty much any OCI-compliant runtime. And they're both reusable as well,
so both are easily reused by other players in the ecosystem. And like I was saying, it's all good.




But you know what? All of this de-coupling of the container from the daemon, and even from containerd, lets us do really cool stuff, like restarting the daemon and containerd without affecting running containers. And if you do Docker in production, oh my goodness, you will know what a huge deal this is. I mean, Docker is cranking out new versions of the engine like nobody's business, and upgrading them in the past, when an upgrade would kill all of your running containers, let's just say it was a challenge. But now we can restart the daemon and containerd, leave all of our containers running, and when they come back up they just re-discover the running containers and reconnect to the shims. It's a beautiful thing.


For every container that you create, a new runc process forks it and then exits, leaving every container with its own shim process connecting it back to containerd. Okay, so it's clearly a one-to-many relationship between containerd and shims. So, you only ever have one containerd process running on a system, and it's a daemon process, right, long-lived? But runc isn't; that just starts a container and then bows out.





