Understanding Kubernetes Readiness Probes and Host Configuration

Today, I'd like to share an interesting learning experience I had with Kubernetes and service configuration. It taught me valuable lessons about readiness probes in Kubernetes and about the importance of correctly setting the host for your application in a microservices environment.

The Problem

One fine day, while working on a Kubernetes deployment, I noticed an alarming error message being reported consistently:

Warning  Unhealthy  118s (x21 over 3m33s)  kubelet  Readiness probe failed: Get "http://10.52.1.198:5000/api/v1/db/": dial tcp 10.52.1.198:5000: connect: connection refused

This indicated that the readiness probe in my Kubernetes Pod was failing.
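For reference, this event shows up in the Pod's event list; you can surface it with kubectl, where <pod-name> is a placeholder for your Pod's name:

kubectl describe pod <pod-name>

The Events section at the bottom of that output lists probe failures like the one above.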

The Readiness Probe

For readers unfamiliar with the concept, a readiness probe is how Kubernetes determines when a Pod is ready to accept traffic. A readiness probe can fail for several reasons: the application isn't running, the startup time exceeds the initialDelaySeconds parameter, there is a networking issue, the probe targets an incorrect or inaccessible port or path, or the container is starved of resources. Troubleshooting therefore meant working down this list to pinpoint the exact cause.
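For context, a readiness probe of the kind that produced the error above is declared on the container in the Deployment manifest. Here is a minimal sketch; the path and port match my error message, while initialDelaySeconds and periodSeconds are illustrative values, not necessarily the ones I used:

readinessProbe:
  httpGet:
    path: /api/v1/db/
    port: 5000
  initialDelaySeconds: 5
  periodSeconds: 10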

The Diagnosis and Solution

In my case, after a bit of investigation, I found that my application was binding to 127.0.0.1. And there was the problem, one that may not seem obvious without a deeper understanding of network interfaces and their settings!

Clarifying the Host Mystery

Let's backtrack a little and discuss host settings like 127.0.0.1 and 0.0.0.0. In networking, 127.0.0.1 is the loopback address, also known as localhost. A program uses this IP when it needs to connect to a server running on the same machine. A server bound to it does not listen for requests from external networks. But when you run an application inside a container, whether under plain Docker or under Kubernetes, the container itself is effectively an isolated "machine".

This is where 0.0.0.0 comes into the picture. 0.0.0.0 stands for all IPv4 addresses on the local machine. If a host has two IP addresses, 192.168.1.1 and 10.1.2.1, and runs a server that listens on 0.0.0.0, the server is reachable at both of those IPs. So when your app runs inside a container and listens on 0.0.0.0, it accepts connections on every network interface of the container.
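To make the distinction concrete, here is a minimal, self-contained sketch using Python's standard library (purely illustrative, not my actual app):

from http.server import HTTPServer, BaseHTTPRequestHandler

class PingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer any GET with a plain 200 so we can test reachability.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

# Bound to loopback: only clients on the same machine (or inside the same
# container/network namespace) can connect.
# server = HTTPServer(("127.0.0.1", 5000), PingHandler)

# Bound to all interfaces: clients on any network the host is attached to can connect.
server = HTTPServer(("0.0.0.0", 5000), PingHandler)
server.serve_forever()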

When I made my application listen on 0.0.0.0:5000 instead of 127.0.0.1:5000, it could accept connections on all network interfaces within the container, including connections coming from outside the container. The readiness probe, which is sent by the kubelet from outside the container, could therefore succeed.

I confirmed this by checking the Pod's events again and seeing no further probe failures. The application was also handling incoming traffic correctly, which verified the solution.

Setting the Host IP

Setting the host IP to 0.0.0.0 when launching the app with uvicorn was as simple as executing:

uvicorn main:app --host 0.0.0.0 --port 5000
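If you prefer starting the server from Python rather than the command line, the equivalent looks like this (assuming your ASGI app object is named app inside main.py):

import uvicorn

if __name__ == "__main__":
    # Bind to all interfaces so the kubelet's HTTP probe can reach the app.
    uvicorn.run("main:app", host="0.0.0.0", port=5000)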

Deep Dive into the Problem

Now that the problem is solved, let's dig a little deeper into why the app worked locally but failed when deployed to Kubernetes.

When we run an application on our local system and set it to listen on 127.0.0.1, the application accepts connections only from the same machine. For all intents and purposes, this worked fine in a local development environment, since both the application and the client making requests to it (a browser, Postman, etc.) were on the same machine.
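For example, with the app bound only to loopback, a request issued from the same machine still succeeds:

curl http://127.0.0.1:5000/api/v1/db/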

However, in a Kubernetes environment the scenario changes. Inside a cluster, different applications run in their own containers and usually their own pods, and each pod behaves like a separate, isolated machine with its own IP address. Consequently, an application listening only on localhost is unreachable from outside its pod; in particular, the kubelet's HTTP readiness probe targets the pod's IP address, so it can never reach a loopback-only server. (Containers within the same pod are the exception: they share a network namespace and can talk to each other over localhost.)

For an application to be reachable from outside its own container, it needs to listen on a network interface that others can route to. In most cases, setting the application to listen on 0.0.0.0 does the job: technically, 0.0.0.0 refers to all available network interfaces on the host system, which inside a pod includes the interface bound to the pod's IP.
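One way to verify which address your process is actually bound to is to exec into the running pod and list the listening sockets (where <pod-name> is a placeholder, and ss may need to be replaced with netstat -tlnp depending on what the image ships):

kubectl exec -it <pod-name> -- ss -tlnp

A loopback-only bind shows 127.0.0.1:5000 in the Local Address column; after the fix, it shows 0.0.0.0:5000.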

Conclusion

Getting a Kubernetes deployment to work correctly involves understanding various components, from containerization and network interfaces to probes. Diagnosing a problem is an iterative process of checking one element at a time.

This journey gave me better insight into network interfaces in the context of a containerized application. I learned that an application listening on localhost (or 127.0.0.1) cannot serve traffic originating from outside its host, which in this case is the container.

My key takeaway is to consider and understand the environment in which the code is going to run, especially when deploying across different stages (development, testing, and production). Even a seemingly small detail, such as which IP a service listens on, can make a significant difference.