<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://www.hoelzel.it/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.hoelzel.it/" rel="alternate" type="text/html" /><updated>2026-04-01T15:23:54+00:00</updated><id>https://www.hoelzel.it/feed.xml</id><title type="html">{ Hoelzel.IT }</title><subtitle>Random ramblings of an IT veteran for future documentation purposes.</subtitle><author><name>Johannes Hölzel</name></author><entry><title type="html">All roads will lead you to Azure</title><link href="https://www.hoelzel.it/compliance/2024/09/05/All-roads-lead-to-azure-eventually.html" rel="alternate" type="text/html" title="All roads will lead you to Azure" /><published>2024-09-05T00:00:00+00:00</published><updated>2024-09-05T00:00:00+00:00</updated><id>https://www.hoelzel.it/compliance/2024/09/05/All-roads-lead-to-azure-eventually</id><content type="html" xml:base="https://www.hoelzel.it/compliance/2024/09/05/All-roads-lead-to-azure-eventually.html"><![CDATA[<p>Kubernetes is the Swiss Army knife of container orchestration. It’s versatile, flexible, and can be deployed on just about any platform, from bare metal servers at Hetzner to cloud environments like DigitalOcean. But if you’re the type who rolls your own RKE2 clusters, you know that handmade infrastructure doesn’t just save you money; it gives you better control and tighter security than you’ll ever get from a typical cloud provider.</p>

<p>This flexibility is crucial, especially in a multicloud world where you’re picking the best tools for the job rather than being stuck with a one-size-fits-all approach. But here’s the rub: with all this flexibility comes complexity, especially when it comes to maintaining compliance across different environments. Kubernetes is a beast when it comes to managing containers, but it doesn’t automatically solve the compliance challenges that come with a diverse infrastructure.</p>

<h3 id="the-realities-of-a-multicloud-world">The Realities of a Multicloud World</h3>

<p>When you’re running RKE2 clusters on Hetzner and DigitalOcean, you’re not just trying to avoid vendor lock-in; you’re playing to the strengths of each platform. Hetzner gives you affordable, high-performance bare metal servers where you control every detail, from networking to security policies. DigitalOcean, on the other hand, offers quick scaling and a streamlined interface that makes it easier to manage additional resources without the overhead. This setup lets you optimize for cost, performance, and control.</p>

<p>But as anyone who’s managed a multicloud environment knows, the more platforms you bring into the mix, the more challenging it becomes to maintain a consistent security posture. Each platform has its own tools, configurations, and idiosyncrasies, which means you’re constantly juggling different security models. Without a solid strategy, this complexity can quickly spiral into a compliance nightmare.</p>

<h3 id="compliance-beyond-the-clusters">Compliance: Beyond the Clusters</h3>

<p>Securing your Kubernetes clusters is critical, but it’s only one piece of the compliance puzzle. True compliance means extending your security measures to every endpoint: every laptop, mobile device, and VM that touches your infrastructure. In a world where a single compromised endpoint can undo the security of your entire environment, you can’t afford to overlook this.</p>

<h4 id="hard-drive-encryption-the-basics-done-right">Hard Drive Encryption: The Basics, Done Right</h4>

<p>Hard drive encryption is where it all starts. If you’re dealing with sensitive data (and let’s face it, who isn’t?), you need to ensure that every device is encrypted. BitLocker, integrated with Office 365, is your go-to for Windows environments. It’s not just about ticking a box for compliance; it’s about making sure that if a device is lost or stolen, the data on it is safe. The key here is enforcing encryption across all devices, ensuring no exceptions slip through the cracks.</p>

<h4 id="advanced-threat-protection-more-than-just-antivirus">Advanced Threat Protection: More Than Just Antivirus</h4>

<p>Basic antivirus software might have cut it a decade ago, but today’s threats are far more sophisticated. You need advanced threat protection that doesn’t just rely on signature-based detection. Microsoft Defender, built into Office 365, offers real-time behavioral analysis, detecting and responding to threats as they emerge. This is about more than just stopping malware; it’s about catching anomalous behavior before it can escalate into something worse.</p>

<h4 id="automated-updates-close-the-gaps">Automated Updates: Close the Gaps</h4>

<p>Keeping every device up to date with the latest patches is crucial, especially in a distributed team where employees might be scattered across different locations. With Intune, you can automate updates across all your devices, ensuring that every endpoint is running the latest, most secure software. This isn’t just about convenience; it’s about closing vulnerabilities before they can be exploited. In a compliance-driven world, patch management isn’t optional; it’s essential.</p>

<h4 id="remote-wipe-protecting-data-when-things-go-wrong">Remote Wipe: Protecting Data When Things Go Wrong</h4>

<p>We all know that things go wrong. Devices get lost, stolen, or compromised. When that happens, you need to be able to remotely wipe those devices to protect your data. Intune’s integration with Azure AD gives you that capability. Whether it’s a laptop left in a cab or a mobile device stolen from an airport, you can ensure that your data doesn’t end up in the wrong hands. This isn’t just a nice-to-have; it’s a critical part of your compliance strategy.</p>

<h4 id="auditing-and-verification-proving-compliance-every-time">Auditing and Verification: Proving Compliance, Every Time</h4>

<p>Implementing security measures is one thing; proving they’re in place and effective is another. When it comes to compliance, you need to be able to demonstrate that your controls are working as intended. With Office 365, you get the tools to monitor and report on the security status of every device in your fleet. This means you can provide the evidence needed for audits without having to dig through endless logs or manually compile reports. In the world of compliance, being able to prove you’re compliant is just as important as actually being compliant.</p>

<h3 id="teleport-securing-kubernetes-access">Teleport: Securing Kubernetes Access</h3>

<p>Now let’s shift focus to securing access to your Kubernetes clusters. In a multicloud environment, you can’t afford to rely on old-school access methods. This is where Teleport comes in. Teleport is more than just an access proxy; it’s a security gateway designed to handle the unique challenges of a distributed, containerized environment.</p>

<h4 id="multi-factor-authentication-mfa-strengthen-your-security">Multi-Factor Authentication (MFA): Strengthen Your Security</h4>

<p>Relying on passwords alone is a recipe for disaster. Teleport integrates seamlessly with your existing identity providers to enforce MFA, adding an essential layer of security to your Kubernetes environments. MFA isn’t optional anymore; it’s the bare minimum for securing access in any modern infrastructure. By requiring multiple forms of verification, you significantly reduce the risk of unauthorized access, even if credentials are compromised.</p>
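
<p>As a minimal sketch of what this looks like in practice (assuming a self-hosted Teleport cluster; the values here are placeholders), requiring a second factor is a small change in <code class="language-plaintext highlighter-rouge">teleport.yaml</code> on the auth server:</p>

<pre><code class="language-yaml"># teleport.yaml (auth server) - minimal sketch, values are placeholders
auth_service:
  enabled: yes
  authentication:
    type: local                # or an SSO connector via your identity provider
    second_factor: webauthn    # require a hardware key or platform authenticator
</code></pre>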

<h4 id="session-recording-keeping-a-close-eye">Session Recording: Keeping a Close Eye</h4>

<p>Teleport’s session recording feature is a game-changer for both compliance and security. It records every action taken during a session, giving you a detailed audit trail that’s invaluable for both internal reviews and compliance audits. If something goes wrong, these logs allow you to pinpoint exactly what happened and who was involved. In environments where compliance is non-negotiable, session recording isn’t just helpful; it’s essential.</p>
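
<p>Configuration-wise, recording is controlled centrally on the auth server. A minimal sketch, assuming a recent self-hosted Teleport:</p>

<pre><code class="language-yaml"># teleport.yaml (auth server) - minimal sketch
auth_service:
  session_recording: "node"    # record sessions on the node itself
  # "proxy" records at the proxy instead; "off" disables recording entirely
</code></pre>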

<h4 id="granular-access-controls-principle-of-least-privilege">Granular Access Controls: Principle of Least Privilege</h4>

<p>Granular access controls are a cornerstone of security, and Teleport excels here. By enforcing the principle of least privilege, you ensure that users only have access to the resources they need, nothing more. This minimizes the potential damage from both insider threats and external attacks. With Teleport, you can define who can access what, under what conditions, and for how long, giving you fine-grained control over your infrastructure.</p>
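
<p>Here is a hedged sketch of such a role in Teleport; the group names and labels are illustrative, not a recommendation:</p>

<pre><code class="language-yaml"># Teleport role - sketch; groups and labels are illustrative
kind: role
version: v5
metadata:
  name: staging-dev
spec:
  allow:
    kubernetes_groups: ["developers"]   # mapped to Kubernetes RBAC groups
    kubernetes_labels:
      env: "staging"                    # only clusters labeled env=staging
  options:
    max_session_ttl: 8h                 # access expires automatically
</code></pre>

<p>You would create it with <code class="language-plaintext highlighter-rouge">tctl create -f</code> and map users or SSO groups to the role through your identity provider.</p>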

<h4 id="secure-global-access-no-matter-where-you-are">Secure Global Access: No Matter Where You Are</h4>

<p>In today’s world, your team is likely spread out across multiple locations, maybe even multiple continents. Secure access from anywhere isn’t just a convenience; it’s a necessity. Teleport enables secure, compliant access to your Kubernetes clusters from any location, ensuring that your team can work from anywhere without compromising on security.</p>

<h3 id="real-world-implementation-rke2-teleport-and-office-365-in-action">Real-World Implementation: RKE2, Teleport, and Office 365 in Action</h3>

<p>Let’s take a practical look at how this all comes together in a real-world scenario. Suppose you’re running RKE2 clusters on Hetzner and DigitalOcean, with Office 365 handling your endpoint security. Here’s how you’d ensure compliance across this diverse, multicloud environment.</p>

<h4 id="provisioning-with-intune">Provisioning with Intune</h4>

<p>First, every laptop is provisioned through Microsoft Intune. This ensures that the moment a device is powered on, it’s configured according to your security policies. BitLocker encryption is enabled, Microsoft Defender is up and running, and automated Windows updates are in place, all without the need for manual setup. This zero-touch provisioning approach ensures that every device starts off compliant, with no gaps or oversights.</p>

<h4 id="automated-teleport-installation">Automated Teleport Installation</h4>

<p>Using Group Policy Objects (GPOs), Teleport is automatically deployed on all devices. This ensures secure, logged access to Kubernetes clusters from day one. No one has to worry about whether they’ve got the right tools installed or if their access is secure; Teleport takes care of it all, ensuring that every connection is authenticated, authorized, and auditable.</p>

<h4 id="integration-and-immediate-productivity">Integration and Immediate Productivity</h4>

<p>With everything pre-configured, employees can start working the moment they receive their devices. They power up, connect, and within minutes they’re accessing the resources they need, securely and in compliance with company policies. This approach not only maximizes productivity but also ensures that security isn’t sacrificed for the sake of convenience.</p>

<h4 id="continuous-compliance-monitoring">Continuous Compliance Monitoring</h4>

<p>Office 365 and Intune provide continuous monitoring of all devices, ensuring ongoing compliance. Regular audits are conducted to verify encryption status, antivirus definitions, and software update levels. Teleport’s logging and session recording features also play a critical role in ensuring that access to Kubernetes clusters is always compliant with your security policies.</p>

<h3 id="audit-logging-the-foundation-of-compliance">Audit Logging: The Foundation of Compliance</h3>

<p>Audit logging isn’t just a technical requirement; it’s the foundation of any effective compliance strategy. Kubernetes provides native logging for API requests, which gives you a baseline level of visibility into user activities within your clusters. But for environments with strict compliance requirements, you need more.</p>
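
<p>Native audit logging is driven by a policy file handed to the API server. A minimal sketch follows (the rules are illustrative; with RKE2 you would wire the file in through the kube-apiserver arguments in its config):</p>

<pre><code class="language-yaml"># audit-policy.yaml - sketch, referenced via --audit-policy-file
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log access to sensitive objects with metadata only, never request bodies
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
  # Catch-all; production policies are usually far more selective
  - level: RequestResponse
</code></pre>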

<h4 id="enhanced-logging-with-teleport">Enhanced Logging with Teleport</h4>

<p>Teleport elevates your logging capabilities, capturing every action taken during a session. This level of detail is essential for both security and compliance, allowing you to quickly identify and respond to issues as they arise. Whether it’s for troubleshooting, auditing, or forensic analysis, having detailed logs is crucial for maintaining control over your environment.</p>

<h4 id="siem-integration">SIEM Integration</h4>

<p>For organizations that need to centralize their monitoring and make sense of large volumes of log data, integrating Kubernetes logs with a Security Information and Event Management (SIEM) system is essential. SIEM integration allows you to detect anomalies in real-time, correlate events across your infrastructure, and streamline the process of generating compliance reports. It’s about turning raw log data into actionable insights that keep your infrastructure secure and compliant.</p>
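
<p>One common way to ship audit events to a SIEM is the API server’s webhook backend, which POSTs batched events to an HTTP collector. The config file uses the kubeconfig format; the endpoint below is a placeholder:</p>

<pre><code class="language-yaml"># audit-webhook.yaml - sketch, passed via --audit-webhook-config-file
apiVersion: v1
kind: Config
clusters:
  - name: siem-collector
    cluster:
      server: https://siem.example.internal/k8s-audit   # placeholder endpoint
contexts:
  - name: default
    context:
      cluster: siem-collector
      user: ""
current-context: default
</code></pre>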

<h3 id="managing-apple-devices-with-intune-and-apple-business-manager">Managing Apple Devices with Intune and Apple Business Manager</h3>

<p>In a world where your infrastructure is only as secure as its weakest link, managing Apple devices can’t be an afterthought. If your organization uses Apple devices, Intune, combined with Apple Business Manager, provides a seamless way to manage these devices with the same rigor you apply to your other endpoints.</p>

<h4 id="seamless-integration-with-apple-business-manager">Seamless Integration with Apple Business Manager</h4>

<p>Apple Business Manager integrates with Intune to streamline the management of Apple devices. This means devices can be automatically enrolled in Intune right out of the box, ensuring they comply with your security policies from the moment they’re powered on. This is especially useful for organizations that need to manage a large fleet of devices with minimal manual intervention.</p>

<h4 id="enforcing-security-policies">Enforcing Security Policies</h4>

<p>Just like with Windows devices, Intune allows you to enforce security policies on Apple devices. This includes everything from enforcing encryption to managing software updates and applying advanced threat protection. And if a device is lost or stolen, you can remotely wipe it, ensuring that sensitive data remains protected.</p>

<h4 id="unified-compliance-reporting">Unified Compliance Reporting</h4>

<p>By managing both Windows and Apple devices through Intune, you get a unified view of your entire endpoint environment. This simplifies the process of auditing and ensures that all devices, regardless of platform, are held to the same security standards. In a multicloud, multi-device world, having a single pane of glass to manage compliance is a significant advantage.</p>

<h3 id="wrapping-it-up-securing-and-managing-compliance-in-a-multicloud-world">Wrapping It Up: Securing and Managing Compliance in a Multicloud World</h3>

<p>Running Kubernetes in a multicloud environment gives you the flexibility to optimize your infrastructure, but it also adds complexity, especially when it comes to security and compliance. By integrating tools like Teleport, Office 365, and Intune, you can effectively manage this complexity, ensuring that your infrastructure is not only functional but also secure and compliant.</p>

<p>Ensuring compliance across diverse environments requires a comprehensive approach: one that secures every endpoint, controls access to your clusters, and maintains detailed audit logs. With the right tools in place, you can build an infrastructure that’s both powerful and secure, keeping your operations running smoothly, no matter where your infrastructure or your team is located.</p>]]></content><author><name>Johannes Hölzel</name></author><category term="compliance" /><category term="teleport" /><category term="kubernetes" /><category term="intune" /><category term="opinion" /><category term="Office 365" /><summary type="html"><![CDATA[Kubernetes is the Swiss Army knife of container orchestration. It’s versatile, flexible, and can be deployed on just about any platform, from bare metal servers at Hetzner to cloud environments like DigitalOcean. But if you’re the type who rolls your own RKE2 clusters, you know that handmade infrastructure doesn’t just save you money; it gives you better control and tighter security than you’ll ever get from a typical cloud provider.]]></summary></entry><entry><title type="html">Gaining Total Control of Your Kubernetes Nodes with Custom Images</title><link href="https://www.hoelzel.it/kubernetes/2024/09/05/kubernetes-custom-images.html" rel="alternate" type="text/html" title="Gaining Total Control of Your Kubernetes Nodes with Custom Images" /><published>2024-09-05T00:00:00+00:00</published><updated>2024-09-05T00:00:00+00:00</updated><id>https://www.hoelzel.it/kubernetes/2024/09/05/kubernetes-custom-images</id><content type="html" xml:base="https://www.hoelzel.it/kubernetes/2024/09/05/kubernetes-custom-images.html"><![CDATA[<p>When managing Kubernetes clusters, ensuring that every node is secure, consistent, and optimized is crucial. We’ve all experienced situations where nodes behave unexpectedly due to configuration drift, outdated software, or poorly maintained base images. A powerful solution to these problems is using <strong>custom images</strong>, the OS-level equivalent of well-crafted container images. These images guarantee that every node you provision is identical, secure, and optimized for your workloads. In this article, we’ll dive deep into how they provide total control, enhance security, and streamline operations in Kubernetes clusters, particularly when used with RKE2.</p>

<h3 id="what-are-custom-images">What Are Custom Images?</h3>

<p>Custom images, sometimes called “golden images,” are immutable, pre-configured system templates that include the operating system, necessary software, and specific configurations tailored to your environment. They serve as the foundation for every node in a Kubernetes cluster, whether it’s a control plane, worker node, or specialized component like a load balancer.</p>

<p>They are similar to container images in that they strip away unnecessary components, leaving only what’s essential for your Kubernetes environment. These images can be versioned and reused, ensuring consistency, security, and performance across nodes and environments.</p>

<h3 id="why-could-they-be-essential-for-kubernetes-deployments">Why could they be Essential for Kubernetes Deployments</h3>

<h4 id="1-consistency-across-your-cluster">1. Consistency Across Your Cluster</h4>

<p>One of the biggest challenges in Kubernetes environments is maintaining consistency across nodes, especially when scaling dynamically. Over time, configuration drift can cause nodes to behave unpredictably, leading to hard-to-diagnose issues.</p>

<p>Custom images eliminate this risk by ensuring that every node starts from an identical, pre-tested configuration. Whether you’re spinning up control plane nodes or worker nodes, each node will have the same OS version, configuration settings, and software stack. This uniformity makes troubleshooting easier and ensures that scaling operations are smooth and predictable.</p>

<p>The consistency provided by them also extends across environments, such as development, staging, and production. With the same image in each environment, you can be confident that any issues found in testing won’t reappear due to configuration differences in production.</p>

<h4 id="2-security-by-design">2. Security by Design</h4>

<p>Public cloud images often come with extra packages, services, or outdated components that introduce security vulnerabilities. Additionally, these images are updated according to the cloud provider’s schedule, which means you might not always have the latest security patches applied when needed.</p>

<p>With custom images, <strong>you control what’s included</strong>. You can:</p>
<ul>
  <li>Start with a minimal OS, stripping out unnecessary components to reduce the attack surface.</li>
  <li><strong>Apply security hardening</strong> by disabling unnecessary services, enforcing secure SSH configurations, and locking down permissions.</li>
  <li>Ensure compliance with industry standards, such as <strong>CIS benchmarks</strong>, right from the start.</li>
</ul>

<p>By maintaining and regularly updating your own images, you ensure that every node is fully patched and secure when it’s provisioned. This <strong>immutable infrastructure</strong> approach eliminates configuration drift and ensures that each node is in a known, secure state from the moment it joins your cluster.</p>

<h4 id="3-speeding-up-node-provisioning">3. Speeding Up Node Provisioning</h4>

<p>One of the biggest benefits of such images is the speed at which they allow you to provision new nodes. Traditional setups rely on cloud-init scripts or post-boot configuration steps, which can be slow and error-prone. If a script fails, you could end up with an incomplete node configuration, leading to instability.</p>

<p>With custom images, all the required configurations, binaries, and tools, such as <strong>RKE2</strong>, container runtimes, and monitoring agents, are baked into the image. This means nodes are ready to join your cluster as soon as they boot, without the need for lengthy initialization scripts. In dynamic environments where nodes are frequently scaled up or down, this <strong>rapid provisioning</strong> significantly reduces operational delays.</p>

<h4 id="4-full-control-over-your-node-environment">4. Full Control Over Your Node Environment</h4>

<p>Using cloud provider images often means you’re stuck with whatever the provider includes, such as pre-installed vendor-specific agents or unnecessary software that might not align with your Kubernetes environment. Worse, the cloud provider can update these images without notice, introducing unexpected changes to your infrastructure.</p>

<p>With custom images, <strong>you’re in control</strong>. You select the base OS, the installed software, and the kernel version. You can fine-tune system configurations and security policies to meet the specific needs of your Kubernetes workloads. For example:</p>
<ul>
  <li><strong>Optimize kernel parameters</strong> for performance in Kubernetes environments.</li>
  <li><strong>Remove unnecessary services</strong> and packages to minimize the attack surface and improve node efficiency.</li>
  <li><strong>Customize security policies</strong> to enforce organization-wide standards.</li>
</ul>

<p>They also allow for <strong>vendor independence</strong>, ensuring that your nodes are configured identically across multiple cloud providers or on-premise environments. This flexibility is particularly important for organizations using a <strong>multi-cloud</strong> or <strong>hybrid-cloud</strong> strategy.</p>

<h4 id="5-embedding-custom-scripts-and-self-written-programs">5. Embedding Custom Scripts and Self-Written Programs</h4>

<p>One of the biggest advantages of custom images is the ability to <strong>embed your own tools, scripts, and self-written programs</strong> directly into the image, far exceeding what public cloud images can offer.</p>

<p>For example:</p>
<ul>
  <li><strong>Custom Go programs</strong> can be pre-installed to automate node-specific tasks, like gathering enhanced metrics for autoscaling decisions or enforcing workload-specific policies.</li>
  <li><strong>Automation scripts</strong> can be embedded to handle logging, monitoring, and security scanning without needing additional setup post-deployment.</li>
  <li>You can pre-install <strong>custom agents</strong> to monitor network activity or handle distributed logging, ensuring every node is configured consistently.</li>
</ul>

<p>This flexibility allows you to create nodes that are fully prepared to handle the exact tasks and configurations required, eliminating the need for additional manual setup and providing a level of customization that public cloud images cannot match.</p>

<h3 id="real-world-example-financial-app-deployment-on-hetzner-and-digitalocean">Real-World Example: Financial App Deployment on Hetzner and DigitalOcean</h3>

<p>In a real-world scenario, I worked with a financial application where custom images were implemented across infrastructure on Hetzner and DigitalOcean. The primary goal was to ensure fast, consistent node provisioning while maintaining strict security standards.</p>

<p>Here’s how they were implemented:</p>
<ul>
  <li><strong>Compliance</strong>: Every image was built to comply with <strong>CIS benchmarks</strong>, starting from a hardened base OS. This ensured that each node was secure from the start, with unnecessary services disabled and access tightly controlled.</li>
  <li><strong>Role-Specific Optimization</strong>: Different images were created for control plane nodes, worker nodes, agents, and load balancers. Each image was optimized for its specific task, ensuring that nodes were tailored to their role in the cluster.</li>
  <li><strong>Go-Based Access Proxy</strong>: The <strong>Teleport Access Proxy</strong> (written in Go) was baked into the images to provide secure, auditable access to each node. This ensured compliance with strict security and access control policies, with minimal post-deployment configuration.</li>
  <li><strong>Fast Provisioning</strong>: Node provisioning times were reduced to minutes, even in a highly regulated environment. With everything pre-configured in the image, nodes could immediately join the cluster without needing time-consuming cloud-init scripts or additional configuration steps.</li>
</ul>

<p>The remaining infrastructure, such as networking, firewalls, and VPCs, was managed through <strong>Terraform</strong>, allowing for a highly automated and consistent deployment process. This combination of custom images and Terraform greatly simplified the process of deploying compliant, secure, and scalable infrastructure.</p>

<h3 id="avoiding-cloud-provider-limitations">Avoiding Cloud Provider Limitations</h3>

<p>Cloud provider images often come with <strong>vendor-specific tools</strong>, logging agents, and configurations that might not be relevant to your Kubernetes environment. Additionally, they are updated on the provider’s timeline, which can leave you with outdated packages or unanticipated changes.</p>

<p>With custom images, you <strong>eliminate these limitations</strong>:</p>
<ul>
  <li>You can build the image exactly as needed, without the bloat of vendor-specific agents or unnecessary services.</li>
  <li>You control when updates are applied, ensuring that your infrastructure remains secure and stable on your terms, not the provider’s.</li>
  <li>In <strong>multi-cloud environments</strong>, using custom images allows for consistent node configurations across different cloud platforms, simplifying operations and avoiding vendor lock-in.</li>
</ul>

<h3 id="automating-image-updates-and-infrastructure-with-terraform">Automating Image Updates and Infrastructure with Terraform</h3>

<p>Maintaining and updating images requires a streamlined process to ensure that they remain secure and up to date with the latest patches and software releases. Integrating your image-building pipeline with <strong>CI/CD automation</strong> and <strong>Terraform</strong> helps ensure that your infrastructure is always in sync with the latest configurations.</p>

<h4 id="1-cicd-integration-for-automated-image-building">1. CI/CD Integration for Automated Image Building</h4>
<p>By integrating custom image builds into your CI/CD pipeline, you can automatically trigger new builds whenever there are security patches or software updates. A typical workflow looks like this, with a code sketch after the list:</p>
<ul>
  <li><strong>Trigger a Build</strong>: The CI/CD pipeline triggers an image build whenever a patch is released or a configuration change is required.</li>
  <li><strong>Automated Testing</strong>: Once the image is built, automated tests are run to ensure it works as expected. This includes security scans, performance tests, and conformance checks.</li>
  <li><strong>Versioning and Rollout</strong>: Each image is versioned, ensuring that changes are tracked and that you can easily roll back to a previous version if needed. Once tested, the new image is rolled out incrementally to ensure stability.</li>
</ul>
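
<p>A hedged sketch of the trigger-and-test portion as a GitHub Actions workflow; the job name and make targets are illustrative, and a fuller build example appears later in this post:</p>

<pre><code class="language-yaml"># Sketch - job name and make targets are illustrative
name: Rebuild Golden Images

on:
  schedule:
    - cron: "0 4 * * 1"   # weekly rebuild to pick up fresh patches
  workflow_dispatch:

jobs:
  build_and_test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Packer
        uses: hashicorp/setup-packer@v1

      - name: Build images
        run: make build          # assumed make target

      - name: Test images
        run: make test-images    # security scans / conformance checks (assumed)
</code></pre>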

<h4 id="2-terraform-for-infrastructure-provisioning">2. Terraform for Infrastructure Provisioning</h4>
<p>While custom images take care of node configuration, <strong>Terraform</strong> manages the surrounding infrastructure, such as networking, firewalls, and VPCs. Terraform automates the provisioning of the entire Kubernetes environment, ensuring that:</p>
<ul>
  <li>Nodes are deployed with the correct network configurations.</li>
  <li><strong>Firewall rules</strong> are enforced, ensuring that only necessary ports are open.</li>
  <li>The infrastructure is versioned and tracked, making it easy to replicate environments or roll back changes.</li>
</ul>

<p>Using Terraform allows for a fully automated, <strong>infrastructure-as-code</strong> approach to managing Kubernetes deployments, from node configuration to networking and security.</p>

<h3 id="edge-computing-and-custom-images-expanding-kubernetes-beyond-the-cloud">Edge Computing and Custom Images: Expanding Kubernetes Beyond the Cloud</h3>

<p>As Kubernetes expands beyond traditional cloud environments into <strong>edge computing</strong>, custom images become even more valuable. Edge nodes are often resource-constrained and operate in environments with limited network connectivity. Custom images can be optimized to handle these unique requirements by:</p>
<ul>
  <li>Stripping down the OS and minimizing the resource footprint to improve performance on constrained hardware.</li>
  <li>
    <p>Preloading essential components and enabling local caching to reduce reliance on network connectivity.</p>
  </li>
  <li>Ensuring that nodes can function autonomously when disconnected from the control plane.</li>
</ul>

<p>Specialized images designed for edge environments allow Kubernetes to be deployed in remote locations with minimal infrastructure while still ensuring consistency and security.</p>

<h3 id="long-term-strategy-evolving-your-infrastructure">Long-Term Strategy: Evolving Your Infrastructure</h3>

<p>Custom images are not just about solving today’s problems; they’re about future-proofing your Kubernetes infrastructure. As workloads evolve, compliance standards become stricter, and new technologies like <strong>AI/ML</strong> or <strong>edge computing</strong> take hold, they provide a flexible foundation that can adapt to new requirements without significant reengineering.</p>

<h4 id="1-modular-and-adaptable">1. Modular and Adaptable</h4>
<p>Images can be designed to be modular, allowing you to build images optimized for specific workloads. For example, you might have:</p>
<ul>
  <li>A lightweight image for resource-constrained environments (e.g., edge nodes).</li>
  <li>A high-performance image optimized for AI/ML workloads with pre-installed libraries like TensorFlow or PyTorch.</li>
</ul>

<h4 id="2-collaboration-between-devops-and-development-teams">2. Collaboration Between DevOps and Development Teams</h4>
<p>Custom images help ensure that development, staging, and production environments are consistent. By embedding standard tools, libraries, and runtime environments directly into the images, you reduce the likelihood of “works on my machine” or “works in staging” issues. This accelerates collaboration between DevOps and development teams, enabling faster debugging and fewer production issues.</p>

<h3 id="3-they-life-in-git">3. They life in GIT</h3>
<p>These images can easily withstand auditing procedures when you follow proper GitOps practices. Not only are environments clearly built with a version history, but with GitHub Actions, building your image becomes a workflow as simple as this:</p>

<pre><code class="language-YAML">name: Build All DigitalOcean Snapshots

on:
  workflow_dispatch:

jobs:
  build_digitalocean:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Packer
        uses: hashicorp/setup-packer@v1

      - name: Build DigitalOcean snapshots
        run: make build-digitalocean
        env:
          DO_TOKEN: ${{ secrets.DO_TOKEN }}
          DISCORD_WEBHOOK_URL: ${{ secrets.DISCORD_WEBHOOK_URL }}

</code></pre>
<h2 id="deep-dive-automating-custom-kubernetes-images-with-packer">Deep Dive: Automating Custom Kubernetes Images with Packer</h2>

<p>When managing Kubernetes clusters in any environment, using custom images ensures consistency, security, and speed during node provisioning. <strong>HashiCorp Packer</strong> simplifies the creation of these images, automating their build process so they can be pre-configured, secure, and Kubernetes-ready.</p>

<p>In this chapter, we’ll walk through using Packer to build a custom <strong>K3s</strong> image for <strong>DigitalOcean</strong>, ensuring the image is optimized, hardened, and lean. This approach minimizes the need for manual node configuration, making your infrastructure more efficient and reliable.
This is of course not a complete example, but I think it will get the point across.</p>

<h4 id="packer-template-for-digitalocean">Packer Template for DigitalOcean</h4>

<p>This template provisions a <strong>K3s-ready</strong> image on DigitalOcean using Ubuntu, with firewall settings, SSH hardening, and clean-up steps to ensure the image is secure and lightweight. Importantly, it includes a reset of <strong>cloud-init</strong> to ensure fresh configuration on new instances; if you like, you can even initialize the image itself with a cloud-config.</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">source</span> <span class="s2">"digitalocean"</span> <span class="s2">"ubuntu-k3s"</span> <span class="p">{</span>
  <span class="nx">image</span>       <span class="p">=</span> <span class="s2">"ubuntu-20-04-x64"</span>
  <span class="nx">region</span>      <span class="p">=</span> <span class="s2">"nyc3"</span>
  <span class="nx">size</span>        <span class="p">=</span> <span class="nx">var</span><span class="err">.</span><span class="nx">instance_size</span>
  <span class="nx">ssh_username</span> <span class="p">=</span> <span class="s2">"root"</span>
<span class="p">}</span>

<span class="nx">build</span> <span class="p">{</span>
  <span class="nx">name</span>    <span class="p">=</span> <span class="s2">"ubuntu-k3s-build"</span>
  <span class="nx">sources</span> <span class="p">=</span> <span class="p">[</span><span class="s2">"source.digitalocean.ubuntu-k3s"</span><span class="p">]</span>

  <span class="nx">provisioner</span> <span class="s2">"shell"</span> <span class="p">{</span>
    <span class="nx">inline</span> <span class="p">=</span> <span class="p">[</span>
      <span class="c1"># Update the system and install necessary tools</span>
      <span class="s2">"apt-get update -y"</span><span class="p">,</span>
      <span class="s2">"apt-get upgrade -y"</span><span class="p">,</span>
      <span class="s2">"apt-get install -y curl ufw"</span><span class="p">,</span>

      <span class="c1"># Set up firewall rules for security</span>
      <span class="s2">"ufw allow OpenSSH"</span><span class="p">,</span>       <span class="c1"># SSH access</span>
      <span class="s2">"ufw allow 6443/tcp"</span><span class="p">,</span>      <span class="c1"># K3s API port</span>
      <span class="s2">"ufw allow 8472/udp"</span><span class="p">,</span>      <span class="c1"># Flannel networking for K3s</span>
      <span class="s2">"ufw allow 10250/tcp"</span><span class="p">,</span>     <span class="c1"># Kubelet communication</span>
      <span class="s2">"ufw enable"</span><span class="p">,</span>              <span class="c1"># Enable firewall</span>

      <span class="c1"># Disable password-based SSH for added security</span>
      <span class="s2">"sed -i 's/^#PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config"</span><span class="p">,</span>
      <span class="s2">"systemctl restart sshd"</span><span class="p">,</span>

      <span class="c1"># Install K3s (lightweight Kubernetes)</span>
      <span class="s2">"curl -sfL https://get.k3s.io | sh -"</span><span class="p">,</span>

    <span class="p">]</span>
  <span class="p">}</span>

  <span class="c1"># Clean up to minimize image size</span>
  <span class="nx">provisioner</span> <span class="s2">"shell"</span> <span class="p">{</span>
    <span class="nx">inline</span> <span class="p">=</span> <span class="p">[</span>
      <span class="s2">"apt-get clean"</span><span class="p">,</span>                       <span class="c1"># Clean apt cache</span>
      <span class="s2">"rm -rf /var/lib/apt/lists/*"</span><span class="p">,</span>         <span class="c1"># Remove apt lists</span>
      <span class="s2">"rm -rf /tmp/*"</span><span class="p">,</span>                       <span class="c1"># Clean /tmp</span>
      <span class="s2">"rm -rf /var/tmp/*"</span>                    <span class="c1"># Clean /var/tmp</span>
    <span class="p">]</span>
  <span class="p">}</span>

  <span class="c1"># Reset cloud-init so the image is fresh for future instances</span>
  <span class="nx">provisioner</span> <span class="s2">"shell"</span> <span class="p">{</span>
    <span class="nx">inline</span> <span class="p">=</span> <span class="p">[</span>
      <span class="c1"># Stop cloud-init to reset its state</span>
      <span class="s2">"systemctl stop cloud-init"</span><span class="p">,</span>

      <span class="c1"># Clean up cloud-init logs and state to ensure new instance gets fresh initialization</span>
      <span class="s2">"rm -rf /var/lib/cloud/"</span><span class="p">,</span>
      <span class="s2">"rm -rf /var/log/cloud-init.log /var/log/cloud-init-output.log"</span><span class="p">,</span>

      <span class="c1"># Ensure cloud-init runs on new instances</span>
      <span class="s2">"touch /etc/cloud/cloud-init.disabled"</span><span class="p">,</span>   <span class="c1"># Temporarily disable cloud-init</span>
      <span class="s2">"rm /etc/cloud/cloud-init.disabled"</span>       <span class="c1"># Re-enable for the next boot</span>
    <span class="p">]</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="lets-dive-into-the-details">Let’s dive into the details</h3>

<ol>
  <li><strong>Source Configuration</strong>:
    <ul>
      <li>The source uses <strong>Ubuntu 20.04</strong> on DigitalOcean, with flexibility to select the instance size via the <code class="language-plaintext highlighter-rouge">var.instance_size</code> variable.</li>
      <li>The image is built in the <strong>nyc3</strong> region, and <code class="language-plaintext highlighter-rouge">root</code> is used as the SSH user during the image build process.</li>
    </ul>
  </li>
  <li><strong>Provisioning</strong>:
    <ul>
      <li><strong>System Updates and Tools</strong>: Updates the OS and installs essential tools like <strong>curl</strong> and <strong>UFW</strong> (firewall).</li>
      <li><strong>Firewall Setup</strong>: Configures UFW to allow only required ports:
        <ul>
          <li><strong>SSH (OpenSSH)</strong>: For secure SSH access.</li>
          <li><strong>K3s API (6443/tcp)</strong>: To allow K3s communication.</li>
          <li><strong>Flannel (8472/udp)</strong>: For Kubernetes networking.</li>
          <li><strong>Kubelet (10250/tcp)</strong>: Ensures Kubelet communication.</li>
        </ul>
      </li>
      <li><strong>SSH Hardening</strong>: Disables password-based SSH access to enforce key-based authentication, adding another layer of security.</li>
      <li><strong>K3s Installation</strong>: Downloads and installs <strong>K3s</strong>, the lightweight Kubernetes distribution, making the node ready for your cluster.</li>
    </ul>
  </li>
  <li><strong>Image Cleanup</strong>:
    <ul>
      <li>After provisioning, the template runs a cleanup process to remove unnecessary files, package lists, and temporary data. This step is essential for keeping the image lightweight and secure by reducing the potential attack surface and improving node performance.</li>
    </ul>
  </li>
  <li><strong>Resetting Cloud-Init</strong>:
    <ul>
      <li><strong>Cloud-init</strong> is reset to ensure that when a new instance is created from this image, it runs a fresh cloud-init cycle, pulling the correct instance-specific data like networking and metadata configurations.</li>
      <li>Logs and cloud-init states are wiped clean, guaranteeing that each new instance starts without residual configuration from the original build process.</li>
    </ul>
  </li>
</ol>

<h3 id="why-this-template-works-for-kubernetes">Why This Template Works for Kubernetes</h3>

<p>This Packer template is designed to make <strong>DigitalOcean</strong> Kubernetes nodes more efficient and secure. It ensures that each node is:</p>
<ul>
  <li><strong>Pre-configured and consistent</strong>: With all necessary software and security settings baked into the image, every node will behave the same way, reducing potential configuration drift.</li>
  <li><strong>Hardened for security</strong>: The template applies some security best practices, including firewall rules and SSH hardening, minimizing potential attack vectors. You can also see how easily we could go much deeper if we wanted to.</li>
  <li><strong>Optimized for Kubernetes</strong>: K3s is pre-installed, and the node is verified as ready to join the cluster, reducing the need for post-boot configurations.</li>
</ul>

<p>By automating the creation of custom Kubernetes images using Packer, you gain control over how your nodes are built, ensuring they are <strong>secure</strong>, <strong>consistent</strong>, and <strong>ready to scale</strong>. This template demonstrates how to efficiently build K3s-ready nodes that are hardened, optimized, and require minimal manual setup. By integrating Packer into your Kubernetes workflow, you can shape nodes exactly the way you want while saving time, increasing operational security, and improving the overall efficiency of your infrastructure.</p>

<h4 id="what-now">What now?</h4>

<p>With this setup, you can initialize the Kubernetes node simply by providing the K3s config through cloud-init in Terraform. Or through the provider API. Or even through Kubernetes operators ;).</p>
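
<p>For illustration, a minimal cloud-init payload could drop the K3s config into place before the preinstalled binary starts; the server address and token are placeholders you would feed in from Terraform:</p>

<pre><code class="language-yaml">#cloud-config
# Sketch - values are placeholders, wired in via Terraform variables
write_files:
  - path: /etc/rancher/k3s/config.yaml
    permissions: "0600"
    content: |
      server: https://10.0.0.10:6443   # existing control plane endpoint
      token: "REPLACE_ME"              # cluster join token
runcmd:
  # Assumes the server variant baked in above; worker images would use the k3s-agent unit
  - systemctl restart k3s
</code></pre>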

<h3 id="tldr-custom-images-as-the-backbone-of-kubernetes-success">TLDR: Custom Images as the Backbone of Kubernetes Success</h3>

<p>In today’s fast-paced cloud-native environments, where scalability, security, and flexibility are critical, <strong>custom images</strong> are the backbone of successful Kubernetes deployments. By building and maintaining images tailored to your workloads, you ensure that your nodes are consistently configured, secure, and ready to meet the demands of production environments, whether in the cloud, at the edge, or on-prem.</p>

<p>From <strong>fast provisioning</strong> and <strong>automated updates</strong> to <strong>resource optimization</strong> and <strong>enhanced resilience</strong>, they are a powerful tool for any organization leveraging Kubernetes. When combined with automation tools like <strong>Terraform</strong> and integrated into a broader <strong>DevOps</strong> strategy, they offer the control and flexibility needed to manage modern infrastructure at scale.</p>

<p>By adopting golden images, you are not only solving the challenges of today but also preparing your infrastructure for the future, whether that involves scaling across multiple clouds, expanding to edge environments, or adopting new workloads. They give you the foundation to build a Kubernetes environment that is predictable, secure, and adaptable to the ever-changing demands of modern cloud-native applications.</p>]]></content><author><name>Johannes Hölzel</name></author><category term="kubernetes" /><category term="ci" /><category term="kubernetes-orchestration" /><summary type="html"><![CDATA[When managing Kubernetes clusters, ensuring that every node is secure, consistent, and optimized is crucial. We’ve all experienced situations where nodes behave unexpectedly due to configuration drift, outdated software, or poorly maintained base images. A powerful solution to these problems is using custom images, the OS-level equivalent of well-crafted container images. These images guarantee that every node you provision is identical, secure, and optimized for your workloads. In this article, we’ll dive deep into how they provide total control, enhance security, and streamline operations in Kubernetes clusters, particularly when used with RKE2.]]></summary></entry><entry><title type="html">Building Resilience with kube-probesim</title><link href="https://www.hoelzel.it/kubernetes/2024/09/02/kube-probesim.html" rel="alternate" type="text/html" title="Building Resilience with kube-probesim" /><published>2024-09-02T00:00:00+00:00</published><updated>2024-09-02T00:00:00+00:00</updated><id>https://www.hoelzel.it/kubernetes/2024/09/02/kube-probesim</id><content type="html" xml:base="https://www.hoelzel.it/kubernetes/2024/09/02/kube-probesim.html"><![CDATA[<p>As the creator of <code class="language-plaintext highlighter-rouge">kube-probesim</code>, I wanted to solve a specific problem: simulating how Kubernetes applications handle liveness and readiness probe failures. This tool was born out of a need to quickly replicate real-world failure conditions like random probe failures, latency spikes, and failing external dependencies in a controlled manner.</p>

<p>For many Kubernetes developers and operators, building resilient systems requires more than just passing happy path tests. You need to understand how your application behaves under stress or failure. Kubernetes’ liveness and readiness probes are crucial in this respect, but testing them without good tools can be tricky. That’s where <code class="language-plaintext highlighter-rouge">kube-probesim</code> comes in. Let me explain how you can leverage it and how easy it is to deploy, thanks to its availability in the GitHub Container Registry.</p>

<hr />

<h4 id="why-kube-probesim">Why kube-probesim?</h4>

<p>When I first started using Kubernetes in production, one of the challenges was ensuring applications were resilient enough to withstand failures, especially during scale-ups, unexpected traffic spikes, or external service downtimes. Probes (liveness and readiness) play a crucial role in ensuring the stability of the system, but testing these probes in realistic scenarios is difficult. <code class="language-plaintext highlighter-rouge">kube-probesim</code> simplifies this.</p>

<p>Here’s what it does:</p>

<ul>
  <li><strong>Probe Failures</strong>: Simulate random or time-triggered liveness/readiness probe failures.</li>
  <li><strong>Latency Simulation</strong>: Introduce configurable network latencies.</li>
  <li><strong>External Dependency Failure</strong>: Simulate downstream service failures.</li>
  <li><strong>Network Partitioning</strong>: Test behavior under simulated network issues.</li>
</ul>

<p>The goal is to empower developers and operators to test probe handling under controlled, configurable failure conditions without introducing a lot of complexity.</p>

<hr />

<h4 id="deploying-kube-probesim-from-github-container-registry">Deploying kube-probesim from GitHub Container Registry</h4>

<p>Since I’m hosting <code class="language-plaintext highlighter-rouge">kube-probesim</code> in the GitHub Container Registry, deploying it into any Kubernetes cluster is a breeze. No need to build the image yourself; just pull the container directly from the registry.</p>

<p>Here’s how you can do it in your own environment:</p>

<ol>
  <li>
    <p><strong>Add kube-probesim Deployment</strong></p>

    <p>Create a Kubernetes deployment YAML with the GitHub-hosted container image:</p>

    <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Deployment</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">kube-probesim</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">1</span>
  <span class="na">selector</span><span class="pi">:</span>
    <span class="na">matchLabels</span><span class="pi">:</span>
      <span class="na">app</span><span class="pi">:</span> <span class="s">kube-probesim</span>
  <span class="na">template</span><span class="pi">:</span>
    <span class="na">metadata</span><span class="pi">:</span>
      <span class="na">labels</span><span class="pi">:</span>
        <span class="na">app</span><span class="pi">:</span> <span class="s">kube-probesim</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">containers</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">kube-probesim</span>
        <span class="na">image</span><span class="pi">:</span> <span class="s">ghcr.io/jhoelzel/kube-probesim:latest</span>
        <span class="na">ports</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">containerPort</span><span class="pi">:</span> <span class="m">8080</span>
        <span class="na">env</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">FAILURE_RATE</span>
          <span class="na">value</span><span class="pi">:</span> <span class="s2">"</span><span class="s">20"</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">LATENCY</span>
          <span class="na">value</span><span class="pi">:</span> <span class="s2">"</span><span class="s">50"</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">LIVENESS_FAIL_AFTER_TIME</span>
          <span class="na">value</span><span class="pi">:</span> <span class="s2">"</span><span class="s">60"</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">READINESS_FAIL_AFTER_TIME</span>
          <span class="na">value</span><span class="pi">:</span> <span class="s2">"</span><span class="s">120"</span>
</code></pre></div>    </div>

    <p>This YAML uses the official <code class="language-plaintext highlighter-rouge">kube-probesim</code> container from GitHub’s Container Registry, configuring it with a 20% failure rate, 50ms of added latency, and timed liveness/readiness failures after 60 and 120 seconds. A sketch of wiring probes against the simulator follows this list.</p>
  </li>
  <li>
    <p><strong>Deploy it</strong></p>

    <p>Apply the YAML to your Kubernetes cluster:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl apply <span class="nt">-f</span> kube-probesim-deployment.yaml
</code></pre></div>    </div>

    <p>Kubernetes will pull the image directly from GitHub and start the <code class="language-plaintext highlighter-rouge">kube-probesim</code> pod with the parameters you’ve set.</p>
  </li>
</ol>
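
<p>For Kubernetes to actually act on the simulated failures, the container spec needs liveness and readiness probes pointed at the simulator. Here’s a sketch; the <code class="language-plaintext highlighter-rouge">/healthz</code> and <code class="language-plaintext highlighter-rouge">/readyz</code> paths are assumptions, so check the kube-probesim README for the exact routes it serves:</p>

<pre><code class="language-yaml"># Probe wiring - sketch; paths are assumed, goes under the container spec
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /readyz
            port: 8080
          periodSeconds: 5
</code></pre>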

<hr />

<h4 id="real-world-scenarios">Real-World Scenarios</h4>

<p>Here’s how I’ve used <code class="language-plaintext highlighter-rouge">kube-probesim</code> in the wild:</p>

<h5 id="1-random-failures">1. <strong>Random Failures</strong></h5>
<p>To simulate a 10% failure rate across both probes, with a 100ms response delay:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   <span class="nv">FAILURE_RATE</span><span class="o">=</span>10 <span class="nv">LATENCY</span><span class="o">=</span>100 ./kube-probesim
</code></pre></div></div>

<p>This allows me to test how Kubernetes reschedules pods when failures start happening unpredictably, helping ensure the system can handle such events without downtime.</p>

<h5 id="2-failing-external-dependencies">2. <strong>Failing External Dependencies</strong></h5>
<p>Using the <code class="language-plaintext highlighter-rouge">/dependency</code> endpoint with <code class="language-plaintext highlighter-rouge">FAIL_DEPENDENCY=true</code> let me test scenarios where downstream services were unavailable and tune timeouts and retries for critical service dependencies.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   <span class="nv">FAIL_DEPENDENCY</span><span class="o">=</span><span class="nb">true</span> ./kube-probesim
</code></pre></div></div>

<hr />

<h4 id="observability-and-monitoring">Observability and Monitoring</h4>

<p>If you’re running <code class="language-plaintext highlighter-rouge">kube-probesim</code> in a production-like environment, don’t forget to integrate your monitoring and logging stack (e.g., Prometheus or Grafana). Observing probe failures over time will help you understand how Kubernetes reacts and will give you insights into system recovery times and automatic pod rescheduling.</p>
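
<p>If you run the Prometheus Operator together with kube-state-metrics, a small alert rule makes the simulated failures visible at a glance; this is a sketch, and the threshold is arbitrary:</p>

<pre><code class="language-yaml"># PrometheusRule - sketch; assumes prometheus-operator and kube-state-metrics
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-probesim-restarts
spec:
  groups:
    - name: probesim
      rules:
        - alert: ProbesimRestartSpike
          expr: increase(kube_pod_container_status_restarts_total{container="kube-probesim"}[10m]) > 3
          labels:
            severity: info
          annotations:
            summary: "kube-probesim restarted more than 3 times in 10 minutes"
</code></pre>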

<h4 id="tldr-a-tool-to-simulate-probe-failure">TLDR: A tool to simulate probe failure</h4>

<p>The beauty of <code class="language-plaintext highlighter-rouge">kube-probesim</code> lies in its flexibility and how easy it is to deploy thanks to its GitHub Container Registry image. Whether you need to test random failures, network issues, or external service dependencies, <code class="language-plaintext highlighter-rouge">kube-probesim</code> lets you simulate these failures in a safe, controlled environment. It’s a simple but powerful way to make your applications more resilient before they go live in production.</p>

<h3 id="next-steps">Next Steps:</h3>
<ul>
  <li><strong>Get the image from the <a href="https://github.com/jhoelzel/kube-probesim/pkgs/container/kube-probesim">GitHub Container Registry</a>.</strong></li>
  <li><strong>Deploy <code class="language-plaintext highlighter-rouge">kube-probesim</code> in your staging environment, and start testing!</strong></li>
</ul>

<p>Remember, production is unforgiving. <code class="language-plaintext highlighter-rouge">kube-probesim</code> gives you the edge by helping you identify and fix failure points <em>before</em> they cause issues for real users.</p>]]></content><author><name>Johannes Hölzel</name></author><category term="kubernetes" /><category term="kubernetes-problems" /><category term="cli" /><summary type="html"><![CDATA[As the creator of kube-probesim, I wanted to solve a specific problem: simulating how Kubernetes applications handle liveness and readiness probe failures. This tool was born out of a need to quickly replicate real-world failure conditions like random probe failures, latency spikes, and failing external dependencies in a controlled manner.]]></summary></entry><entry><title type="html">go_wait_for_k8s</title><link href="https://www.hoelzel.it/kubernetes/2024/09/01/go-wait-for-k8s.html" rel="alternate" type="text/html" title="go_wait_for_k8s" /><published>2024-09-01T00:00:00+00:00</published><updated>2024-09-01T00:00:00+00:00</updated><id>https://www.hoelzel.it/kubernetes/2024/09/01/go-wait-for-k8s</id><content type="html" xml:base="https://www.hoelzel.it/kubernetes/2024/09/01/go-wait-for-k8s.html"><![CDATA[<p>In Kubernetes, ensuring that dependent services are ready before your application starts can be a critical task. For instance, your application might rely on a PostgreSQL database, and you need to make sure the database is fully initialized and ready to accept connections before the app itself starts. Handling this properly often requires custom scripts or tools to manage readiness checks, which can get complex and error-prone.</p>

<p>This is where <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> comes in. Designed to run as an <code class="language-plaintext highlighter-rouge">InitContainer</code> within your Kubernetes pods, it ensures that critical dependencies are ready before your main application container starts. This approach guarantees that your application only launches when its dependencies are fully operational, making your deployments more robust and reliable.</p>

<h3 id="the-problem-ensuring-dependencies-are-ready">The Problem: Ensuring Dependencies are Ready</h3>

<p>When deploying applications that depend on other services, like a web application that requires a database or a cache, it’s crucial to ensure those services are ready before your application attempts to interact with them. If not handled properly, your application might fail to start or exhibit erratic behavior due to unavailable resources.</p>

<p>For example:</p>

<ul>
  <li><strong>Database Dependencies</strong>: Your application might fail to start or encounter connection issues if the database is not ready.</li>
  <li><strong>Service Dependencies</strong>: Microservices might require other services to be available and ready before they can function correctly.</li>
</ul>

<p>Traditionally, this has been managed with custom scripts or by building readiness checks directly into the application. However, these methods can be cumbersome and are often prone to failure.</p>
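
<p>For illustration, a hand-rolled wait script might look something like this hypothetical sketch - the host, port, and retry budget are made-up values, and it assumes a netcat build that supports <code class="language-plaintext highlighter-rouge">-z</code>:</p>

<pre><code class="language-SH">#!/bin/sh
# naive wait loop: poll PostgreSQL's port until it accepts TCP connections
RETRIES=30
until nc -z postgres.app.svc.cluster.local 5432; do
  RETRIES=$((RETRIES - 1))
  if [ "$RETRIES" -le 0 ]; then
    echo "gave up waiting for postgres" &gt;&amp;2
    exit 1
  fi
  echo "postgres not ready yet, retrying..."
  sleep 2
done
echo "postgres is reachable"
</code></pre>

<p>Every such script hard-codes hosts and timeouts, needs to be baked into an image or mounted as a volume, and only proves that a port is open - not that the resource behind it is actually ready.</p>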

<h3 id="how-go_wait_for_k8s-solves-this-problem">How <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> Solves This Problem</h3>

<p><code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> is specifically designed to run as an <code class="language-plaintext highlighter-rouge">InitContainer</code> in your Kubernetes pod, ensuring that the necessary dependencies are ready before your main application container starts. It interacts with the Kubernetes API to check the readiness of specific resources like pods, deployments, or services.</p>

<h4 id="why-use-an-initcontainer">Why Use an InitContainer?</h4>

<p>Using an <code class="language-plaintext highlighter-rouge">InitContainer</code> ensures that <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> runs before any other containers in the pod. The <code class="language-plaintext highlighter-rouge">InitContainer</code> blocks the startup of the main application container until the specified conditions are met, guaranteeing that your application only starts when its dependencies are available.</p>

<h3 id="example-use-case-waiting-for-a-postgresql-database">Example Use Case: Waiting for a PostgreSQL Database</h3>

<p>Let’s consider a common scenario where your application relies on a PostgreSQL database. You want to ensure that the PostgreSQL deployment is fully rolled out and the pod is ready before your application starts. Here’s how you can use <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> to handle this scenario.</p>

<h4 id="step-1-define-the-initcontainer-in-your-pod-spec">Step 1: Define the <code class="language-plaintext highlighter-rouge">InitContainer</code> in Your Pod Spec</h4>

<p>In your application’s Kubernetes deployment, you would define an <code class="language-plaintext highlighter-rouge">InitContainer</code> that runs <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> to wait for the PostgreSQL deployment to be ready.</p>

<p>Here’s an example deployment YAML:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Deployment</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">my-app</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">app</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">1</span>
  <span class="na">selector</span><span class="pi">:</span>
    <span class="na">matchLabels</span><span class="pi">:</span>
      <span class="na">app</span><span class="pi">:</span> <span class="s">my-app</span>
  <span class="na">template</span><span class="pi">:</span>
    <span class="na">metadata</span><span class="pi">:</span>
      <span class="na">labels</span><span class="pi">:</span>
        <span class="na">app</span><span class="pi">:</span> <span class="s">my-app</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">containers</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">my-app</span>
        <span class="na">image</span><span class="pi">:</span> <span class="s">my-app-image:latest</span>
        <span class="na">ports</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">containerPort</span><span class="pi">:</span> <span class="m">8080</span>
      <span class="na">initContainers</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">wait-for-postgres</span>
        <span class="na">image</span><span class="pi">:</span> <span class="s">ghcr.io/jhoelzel/go_wait_for_k8s:latest</span>
        <span class="na">args</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="s2">"</span><span class="s">--namespace=app"</span>
        <span class="pi">-</span> <span class="s2">"</span><span class="s">--resource=deployment"</span>
        <span class="pi">-</span> <span class="s2">"</span><span class="s">--name=postgres"</span>
        <span class="pi">-</span> <span class="s2">"</span><span class="s">--condition=available"</span>
</code></pre></div></div>

<p>In this YAML:</p>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">InitContainer</code> named <code class="language-plaintext highlighter-rouge">wait-for-postgres</code> uses the <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> image.</li>
  <li>It waits for the PostgreSQL deployment in the <code class="language-plaintext highlighter-rouge">app</code> namespace to reach the <code class="language-plaintext highlighter-rouge">available</code> condition, meaning all pods in the deployment are ready.</li>
  <li>Only after this check passes will the main application container (<code class="language-plaintext highlighter-rouge">my-app</code>) start.</li>
</ul>

<h4 id="step-2-deploy-to-kubernetes">Step 2: Deploy to Kubernetes</h4>

<p>You can deploy this configuration to your Kubernetes cluster using <code class="language-plaintext highlighter-rouge">kubectl</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl apply <span class="nt">-f</span> my-app-deployment.yaml
</code></pre></div></div>

<p>With this setup, the <code class="language-plaintext highlighter-rouge">InitContainer</code> ensures that your application does not start until the PostgreSQL deployment is fully ready, preventing issues related to premature starts.</p>
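
<p>You can watch the gate in action while the deployment rolls out. These are standard <code class="language-plaintext highlighter-rouge">kubectl</code> commands; the exact log output naturally depends on the tool’s version:</p>

<pre><code class="language-SH"># the pod reports Init:0/1 while wait-for-postgres is still blocking
kubectl get pods -n app -w
# inspect what the InitContainer is waiting on (picks a pod from the deployment)
kubectl logs -n app deploy/my-app -c wait-for-postgres
</code></pre>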

<h3 id="using-go_wait_for_k8s-for-other-dependencies">Using <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> for Other Dependencies</h3>

<p>While the PostgreSQL example is common, <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> can be used to wait for various types of resources. Here are a few examples:</p>

<ul>
  <li>
    <p><strong>Waiting for a Redis Pod</strong>: Ensure that a Redis pod is ready before starting your caching service.</p>

    <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">initContainers</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">wait-for-redis</span>
  <span class="na">image</span><span class="pi">:</span> <span class="s">ghcr.io/jhoelzel/go_wait_for_k8s:latest</span>
  <span class="na">args</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">--namespace=cache"</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">--resource=pod"</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">--name=redis-pod"</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">--condition=ready"</span>
</code></pre></div>    </div>
  </li>
  <li>
    <p><strong>Waiting for an API Service</strong>: Ensure that a critical API service is available before your microservice starts.</p>

    <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">initContainers</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">wait-for-api</span>
  <span class="na">image</span><span class="pi">:</span> <span class="s">ghcr.io/jhoelzel/go_wait_for_k8s:latest</span>
  <span class="na">args</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">--namespace=services"</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">--resource=deployment"</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">--name=api-service"</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">--condition=available"</span>
</code></pre></div>    </div>
  </li>
</ul>

<h3 id="security-and-rbac-considerations">Security and RBAC Considerations</h3>

<p>Running <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> as an <code class="language-plaintext highlighter-rouge">InitContainer</code> also integrates well with Kubernetes’ RBAC (Role-Based Access Control) system. By assigning a service account with minimal permissions to the <code class="language-plaintext highlighter-rouge">InitContainer</code>, you can ensure that it only has access to the resources it needs to query.</p>

<p>Here’s an example of how you might configure RBAC for <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code>:</p>

<h4 id="create-a-role-with-minimal-permissions">Create a Role with Minimal Permissions</h4>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">rbac.authorization.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Role</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">app</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">go-wait-for-k8s-role</span>
<span class="na">rules</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">apiGroups</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">"</span><span class="pi">]</span>
  <span class="na">resources</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">pods"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">deployments"</span><span class="pi">]</span>
  <span class="na">verbs</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">get"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">list"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">watch"</span><span class="pi">]</span>
</code></pre></div></div>

<h4 id="bind-the-role-to-a-service-account">Bind the Role to a Service Account</h4>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">rbac.authorization.k8s.io/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">RoleBinding</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">go-wait-for-k8s-binding</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">app</span>
<span class="na">subjects</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">kind</span><span class="pi">:</span> <span class="s">ServiceAccount</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">go-wait-for-k8s-sa</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">app</span>
<span class="na">roleRef</span><span class="pi">:</span>
  <span class="na">kind</span><span class="pi">:</span> <span class="s">Role</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">go-wait-for-k8s-role</span>
  <span class="na">apiGroup</span><span class="pi">:</span> <span class="s">rbac.authorization.k8s.io</span>
</code></pre></div></div>
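
<p>Note that the RoleBinding above references a service account that has to exist first. A minimal manifest for it, assuming the same <code class="language-plaintext highlighter-rouge">app</code> namespace, could look like this:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ServiceAccount</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">go-wait-for-k8s-sa</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">app</span>
</code></pre></div></div>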

<p>Assign this service account to the pod that runs the <code class="language-plaintext highlighter-rouge">InitContainer</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">spec</span><span class="pi">:</span>
  <span class="na">initContainers</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">wait-for-postgres</span>
    <span class="na">image</span><span class="pi">:</span> <span class="s">ghcr.io/jhoelzel/go_wait_for_k8s:latest</span>
    <span class="na">args</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s2">"</span><span class="s">--namespace=app"</span>
    <span class="pi">-</span> <span class="s2">"</span><span class="s">--resource=deployment"</span>
    <span class="pi">-</span> <span class="s2">"</span><span class="s">--name=postgres"</span>
    <span class="pi">-</span> <span class="s2">"</span><span class="s">--condition=available"</span>
    <span class="na">serviceAccountName</span><span class="pi">:</span> <span class="s">go-wait-for-k8s-sa</span>
</code></pre></div></div>

<p>This setup ensures that <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> can only access the necessary Kubernetes resources, adhering to the principle of least privilege.</p>

<h3 id="tldr-a-simple-init-container-to-wati-for-your-resources">TLDR: a simple init container to wati for your resources</h3>

<p><code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> is a simple yet powerful tool designed to enhance the reliability of your Kubernetes deployments. By running as an <code class="language-plaintext highlighter-rouge">InitContainer</code>, it ensures that critical dependencies are ready before your application starts, reducing the risk of failed starts and other issues related to unavailable resources.</p>

<p>Whether you’re managing databases, services, or any other type of Kubernetes resource, <code class="language-plaintext highlighter-rouge">go_wait_for_k8s</code> provides a streamlined way to handle readiness checks natively within your Kubernetes environment. With minimal setup, you can make your applications more resilient and your deployments more predictable.</p>

<p>For more information, to see the source code, or to contribute, visit the <a href="https://github.com/jhoelzel/go_wait_for_k8s">GitHub repository</a>. As always, feedback and contributions are welcome. Let’s keep making Kubernetes a more robust platform, one InitContainer at a time.</p>]]></content><author><name>Johannes Hölzel</name></author><category term="kubernetes" /><category term="automation" /><category term="cli" /><category term="kubernetes-probes" /><summary type="html"><![CDATA[In Kubernetes, ensuring that dependent services are ready before your application starts can be a critical task. For instance, your application might rely on a PostgreSQL database, and you need to make sure the database is fully initialized and ready to accept connections before the app itself starts. Handling this properly often requires custom scripts or tools to manage readiness checks, which can get complex and error-prone.]]></summary></entry><entry><title type="html">Kuberntes Access Proxies</title><link href="https://www.hoelzel.it/kubernetes/2024/09/01/k8s-access-proxy.html" rel="alternate" type="text/html" title="Kuberntes Access Proxies" /><published>2024-09-01T00:00:00+00:00</published><updated>2024-09-01T00:00:00+00:00</updated><id>https://www.hoelzel.it/kubernetes/2024/09/01/k8s-access-proxy</id><content type="html" xml:base="https://www.hoelzel.it/kubernetes/2024/09/01/k8s-access-proxy.html"><![CDATA[<p>Kubernetes has revolutionized the way we manage and deploy applications by providing powerful tools for orchestrating containers at scale. However, as your Kubernetes environment grows, so does the complexity of managing access. Kubernetes’ native Role-Based Access Control (RBAC) is essential but insufficient on its own to address the challenges of scale, security, and compliance. This is where an access proxy, such as Teleport, becomes indispensable.</p>

<p>This post will explore why using an access proxy is critical for your Kubernetes deployment, discuss the importance of robust RBAC from the outset, and provide an in-depth look at alternatives like <strong>authentik</strong> and others, helping you choose the right tool for your environment.</p>

<hr />

<h3 id="the-complexity-of-kubernetes-access-management">The Complexity of Kubernetes Access Management</h3>

<p>When you start with Kubernetes, managing access seems straightforward. Kubernetes offers RBAC, allowing you to define permissions for users and service accounts to perform specific actions. However, as you add more teams, services, and layers of abstraction, the complexity increases exponentially.</p>

<h4 id="challenges-of-rbac-management-at-scale">Challenges of RBAC Management at Scale</h4>

<p>RBAC in Kubernetes allows for fine-grained access control, but it comes with significant challenges:</p>

<ul>
  <li>
    <p><strong>Complexity of Policies:</strong> As the number of users and roles grows, maintaining and auditing RBAC policies manually becomes a Herculean task. Misconfigurations are common, leading to either overly permissive roles that introduce security risks or overly restrictive ones that hinder productivity.</p>
  </li>
  <li>
    <p><strong>Lack of Visibility:</strong> Without proper tools, it’s challenging to track who has access to what resources, when they accessed them, and what actions they performed. This lack of visibility can lead to unauthorized access going unnoticed.</p>
  </li>
  <li>
    <p><strong>Operational Overhead:</strong> As the environment scales, the manual effort required to manage RBAC increases, leading to higher operational overhead and potential errors.</p>
  </li>
</ul>

<p>In large environments, these challenges make it clear that relying solely on native RBAC is not sustainable. This is where an access proxy comes into play.</p>

<hr />

<h3 id="what-is-an-access-proxy-and-why-you-need-one">What Is an Access Proxy and Why You Need One</h3>

<p>An access proxy acts as a centralized gateway that manages, secures, and logs all access to your Kubernetes clusters and other infrastructure. By funneling all access through this proxy, you gain better control, enhanced security, and detailed auditing capabilities.</p>

<h4 id="centralized-access-management">Centralized Access Management</h4>

<p>One of the key advantages of an access proxy like Teleport is centralized access management. With a single point of control, you can enforce consistent security policies across your infrastructure, whether for SSH access, Kubernetes clusters, databases, or internal web applications.</p>

<ul>
  <li>
    <p><strong>Consistency:</strong> Centralized management ensures that security policies are applied uniformly across all environments, reducing the risk of misconfigurations.</p>
  </li>
  <li>
    <p><strong>Efficiency:</strong> It simplifies the administration of access controls, making it easier to manage who has access to what without needing to touch each individual component separately.</p>
  </li>
  <li>
    <p><strong>Reduced Attack Surface:</strong> By centralizing access, you minimize the number of entry points an attacker could exploit, effectively reducing your overall attack surface.</p>
  </li>
</ul>

<h4 id="compliance-and-auditing-made-easy">Compliance and Auditing Made Easy</h4>

<p>Compliance with regulations like GDPR, HIPAA, and SOC 2 requires detailed records of access to sensitive data. An access proxy simplifies this by automatically logging all access attempts and actions, providing a comprehensive audit trail that can be used for compliance reporting.</p>

<ul>
  <li>
    <p><strong>Audit Logs:</strong> Detailed, immutable logs of every access attempt and action provide the visibility needed to demonstrate compliance during audits.</p>
  </li>
  <li>
    <p><strong>Session Recording:</strong> Advanced features like session recording allow you to capture and replay sessions, providing a forensic tool for investigating suspicious activity and proving compliance.</p>
  </li>
</ul>

<hr />

<h3 id="teleport-a-deep-dive-into-its-features">Teleport: A Deep Dive into Its Features</h3>

<p>Teleport is a popular choice as an access proxy for Kubernetes due to its robust feature set tailored to cloud-native environments. Here’s what makes it stand out:</p>

<h4 id="unified-access-management">Unified Access Management</h4>

<p>Teleport offers unified access management across various protocols and services:</p>

<ul>
  <li>
    <p><strong>SSH and Kubernetes Access:</strong> Manage SSH access and Kubernetes API access with the same set of RBAC policies, ensuring consistency across different parts of your infrastructure.</p>
  </li>
  <li>
    <p><strong>Database Access:</strong> Extend your access management to databases, allowing centralized control over who can access and modify data, with detailed logging of all interactions.</p>
  </li>
  <li>
    <p><strong>Web Application Access:</strong> Use Teleport to manage access to internal web applications, ensuring only authorized users can access critical tools and dashboards.</p>
  </li>
</ul>

<h4 id="comprehensive-compliance-features">Comprehensive Compliance Features</h4>

<p>Teleport’s compliance features are designed to meet the needs of highly regulated industries:</p>

<ul>
  <li>
    <p><strong>Detailed Audit Logs:</strong> Logs every access attempt, whether successful or not, and every command executed, providing an auditable trail for compliance and security.</p>
  </li>
  <li>
    <p><strong>Session Recording:</strong> Capture entire user sessions, which can be replayed to review actions taken, crucial for both security investigations and compliance audits.</p>
  </li>
  <li>
    <p><strong>Integration with SIEMs:</strong> Teleport integrates with popular Security Information and Event Management (SIEM) tools, allowing you to aggregate logs and correlate them with other security data for comprehensive monitoring.</p>
  </li>
</ul>

<h4 id="easy-integration-and-deployment">Easy Integration and Deployment</h4>

<p>Teleport is designed for easy deployment and integration into existing environments:</p>

<ul>
  <li>
    <p><strong>Helm Charts:</strong> Official Helm charts allow for straightforward installation within Kubernetes, enabling rapid deployment and scaling of Teleport; a minimal install sketch follows this list.</p>
  </li>
  <li>
    <p><strong>API and CI/CD Integration:</strong> Teleport provides a robust API, allowing integration with existing CI/CD pipelines, ensuring that access controls are enforced throughout your development lifecycle.</p>
  </li>
</ul>
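
<p>To make the Helm route concrete, a minimal install might look like the sketch below. The chart repository and chart name come from Teleport’s public Helm charts; the namespace and <code class="language-plaintext highlighter-rouge">clusterName</code> are example values, so consult the Teleport docs for a production-grade values file.</p>

<pre><code class="language-SH"># add Teleport's chart repository and install the cluster chart
helm repo add teleport https://charts.releases.teleport.dev
helm repo update
helm install teleport-cluster teleport/teleport-cluster \
  --create-namespace --namespace teleport-cluster \
  --set clusterName=teleport.example.com
</code></pre>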

<hr />

<h3 id="alternatives-to-teleport-a-closer-look">Alternatives to Teleport: A Closer Look</h3>

<p>While Teleport is a powerful solution, there are other tools that might be a better fit depending on your specific needs. Let’s explore <strong>authentik</strong> and other alternatives.</p>

<h4 id="authentik">Authentik</h4>

<p><strong>Overview:</strong> Authentik is an open-source identity provider (IdP) designed to offer maximum flexibility and integration capabilities. While it’s primarily an identity provider, it also functions effectively as an access management tool in Kubernetes environments.</p>

<p><strong>Key Features:</strong></p>
<ul>
  <li>
    <p><strong>Identity Provider (IdP):</strong> Authentik serves as a central identity provider, supporting Single Sign-On (SSO) across a wide range of applications and services.</p>
  </li>
  <li>
    <p><strong>Multi-Factor Authentication (MFA):</strong> Supports MFA, enhancing the security of access to critical resources.</p>
  </li>
  <li>
    <p><strong>Custom Workflows:</strong> Authentik stands out for its ability to implement custom authentication and authorization workflows, giving administrators the flexibility to tailor the solution to their specific needs.</p>
  </li>
  <li>
    <p><strong>Protocol Support:</strong> Supports all major identity protocols, including OAuth2, SAML, LDAP, and SCIM, making it highly versatile for integration with various services and applications.</p>
  </li>
</ul>

<p><strong>Use Cases:</strong> Authentik is ideal for organizations looking for an open-source, highly customizable identity management solution that integrates well with existing infrastructure, including Kubernetes. Its ability to adapt to complex environments makes it a strong alternative for those who need more than what traditional access proxies offer.</p>

<h4 id="boundary-by-hashicorp">Boundary by HashiCorp</h4>

<p><strong>Overview:</strong> Boundary is a tool by HashiCorp that provides secure, identity-based access management, particularly focused on dynamic cloud environments.</p>

<p><strong>Key Features:</strong></p>
<ul>
  <li>
    <p><strong>Fine-Grained Access Controls:</strong> Boundary provides strict identity-based access controls, ensuring that only authenticated users can access sensitive systems.</p>
  </li>
  <li>
    <p><strong>Session Logging:</strong> Offers comprehensive session logging and auditing capabilities, essential for compliance and security monitoring.</p>
  </li>
  <li>
    <p><strong>Seamless Integration:</strong> Works seamlessly with other HashiCorp tools, such as Vault for secrets management, making it a good fit for organizations already using the HashiCorp stack.</p>
  </li>
</ul>

<p><strong>Use Cases:</strong> Boundary is well-suited for organizations that require secure, auditable remote access to critical infrastructure, particularly those already using HashiCorp tools.</p>

<h4 id="strongdm">StrongDM</h4>

<p><strong>Overview:</strong> StrongDM offers a unified approach to managing access across databases, servers, and Kubernetes clusters, with a strong focus on ease of use and comprehensive auditing.</p>

<p><strong>Key Features:</strong></p>
<ul>
  <li>
    <p><strong>Centralized Access Management:</strong> StrongDM centralizes access control across various systems, providing a single interface for managing permissions.</p>
  </li>
  <li>
    <p><strong>Automated Auditing:</strong> Automatically logs all access activities, creating a complete audit trail that is essential for compliance.</p>
  </li>
  <li>
    <p><strong>Ease of Use:</strong> Known for its user-friendly interface, StrongDM simplifies the management of access across multiple platforms, making it accessible even to non-experts.</p>
  </li>
</ul>

<p><strong>Use Cases:</strong> StrongDM is particularly beneficial for organizations that need a unified access management solution that is easy to deploy and use, with strong compliance features.</p>

<hr />

<h3 id="what-does-it-all-mean">What does it all mean?</h3>

<p>As your Kubernetes environment grows, so does the complexity of managing access. Relying solely on Kubernetes’ native RBAC can leave you vulnerable to security risks and compliance issues. Implementing an access proxy like Teleport, authentik, or another solution is essential for centralizing access control, enhancing security, and ensuring compliance.</p>

<p>Teleport offers a comprehensive solution with features tailored for cloud-native environments, while authentik provides a flexible, open-source alternative with strong identity management capabilities. Boundary and StrongDM offer additional options, each with its own strengths, depending on your specific requirements.</p>

<p>Take the time to evaluate your current access management setup and consider integrating an access proxy to secure your Kubernetes environment. The peace of mind that comes from knowing your access controls are robust and compliant is invaluable.</p>

<hr />

<h3 id="further-readingreferences">Further Reading/References</h3>

<ul>
  <li><strong>Teleport Documentation:</strong> <a href="https://goteleport.com/docs/">Teleport Docs</a></li>
  <li><strong>authentik Documentation:</strong> <a href="https://docs.goauthentik.io/">authentik Docs</a></li>
  <li><strong>Boundary by HashiCorp:</strong> <a href="https://www.hashicorp.com/products/boundary">Boundary</a></li>
  <li><strong>StrongDM Overview:</strong> <a href="https://www.strongdm.com/features">StrongDM Features</a></li>
</ul>]]></content><author><name>Johannes Hölzel</name></author><category term="kubernetes" /><category term="teleport" /><category term="access proxy" /><category term="kubernetes access" /><summary type="html"><![CDATA[Kubernetes has revolutionized the way we manage and deploy applications by providing powerful tools for orchestrating containers at scale. However, as your Kubernetes environment grows, so does the complexity of managing access. Kubernetes’ native Role-Based Access Control (RBAC) is essential but insufficient on its own to address the challenges of scale, security, and compliance. This is where an access proxy, such as Teleport, becomes indispensable.]]></summary></entry><entry><title type="html">Streamlining Helm Chart Management with Argo Helm Versioner</title><link href="https://www.hoelzel.it/devops/2024/08/31/argo-helm-versioner.html" rel="alternate" type="text/html" title="Streamlining Helm Chart Management with Argo Helm Versioner" /><published>2024-08-31T00:00:00+00:00</published><updated>2024-08-31T00:00:00+00:00</updated><id>https://www.hoelzel.it/devops/2024/08/31/argo-helm-versioner</id><content type="html" xml:base="https://www.hoelzel.it/devops/2024/08/31/argo-helm-versioner.html"><![CDATA[<p>For anyone managing Kubernetes environments with Argo CD, keeping Helm charts up-to-date is a routine but essential task. Argo Helm Versioner is a straightforward tool that helps automate this process, ensuring your deployments stay current without adding extra overhead.</p>

<h3 id="why-helm-chart-versioning-can-be-a-headache">Why Helm Chart Versioning Can Be a Headache</h3>

<p>If you’ve ever found yourself manually checking which versions of Helm charts are running in your environment, you know how tedious and error-prone it can be. Argo Helm Versioner takes this off your plate by automatically scanning your project directories, identifying Argo CD applications that use Helm charts, and comparing the deployed versions with the latest available versions in the Helm repositories.</p>

<h3 id="streamlining-your-devops-workflow-with-argo-helm-versioner">Streamlining Your DevOps Workflow with Argo Helm Versioner</h3>

<p>Argo Helm Versioner integrates seamlessly into your existing workflows. Whether you’re running it as part of your CI/CD pipeline, performing regular checks, or just doing a one-off audit, this tool simplifies the process:</p>

<ul>
  <li><strong>Deep Directory Scanning</strong>: No more manual hunting. The tool finds every relevant YAML file, even if it’s buried deep in your directory structure.</li>
  <li><strong>Accurate Version Comparison</strong>: By leveraging semantic versioning, Argo Helm Versioner ensures that even minor differences between versions are detected.</li>
  <li><strong>Clear Output</strong>: The tool presents results in an easy-to-read table, showing you at a glance which applications need an update.</li>
</ul>

<h3 id="real-world-example-using-argo-helm-versioner-with-popular-helm-charts">Real-World Example: Using Argo Helm Versioner with Popular Helm Charts</h3>

<p>Let’s explore how this might look with some widely-used Helm charts. Suppose you’re managing a Kubernetes environment that includes several popular services:</p>

<ul>
  <li><strong>Nginx Ingress Controller</strong>: A crucial component for managing external access to your services.</li>
  <li><strong>Prometheus</strong>: The go-to solution for monitoring and alerting.</li>
  <li><strong>Grafana</strong>: Paired with Prometheus for powerful data visualization.</li>
  <li><strong>Redis</strong>: Often used as a caching layer or message broker.</li>
  <li><strong>ElasticSearch</strong>: Commonly deployed for search and analytics.</li>
</ul>

<p>Here’s what the output might look like after running Argo Helm Versioner:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Application             FilePath                                    Current Version   Latest Version   Status
nginx-ingress           /apps/nginx-ingress/argo-app.yaml           4.0.6             4.3.0            Update available
prometheus              /apps/prometheus/argo-app.yaml              14.8.0            15.4.0           Update available
grafana                 /apps/grafana/argo-app.yaml                 6.17.4            7.2.0            Update available
redis                   /apps/redis/argo-app.yaml                   15.4.0            15.4.0           Up-to-date
elasticsearch           /apps/elasticsearch/argo-app.yaml           7.10.1            7.12.1           Update available
</code></pre></div></div>

<p>In this scenario:</p>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">nginx-ingress</code>, <code class="language-plaintext highlighter-rouge">prometheus</code>, <code class="language-plaintext highlighter-rouge">grafana</code>, and <code class="language-plaintext highlighter-rouge">elasticsearch</code> services all have newer versions available, prompting you to review and potentially update these applications.</li>
  <li>The <code class="language-plaintext highlighter-rouge">redis</code> service is up-to-date, giving you peace of mind that no action is needed there.</li>
</ul>

<p>This output gives you a clear, concise overview of the status of your Helm charts, enabling you to quickly address any outdated services.</p>

<h3 id="customizing-and-extending-argo-helm-versioner">Customizing and Extending Argo Helm Versioner</h3>

<p>While Argo Helm Versioner is designed to be simple, it’s also flexible. You can easily integrate it with other tools in your pipeline or extend its functionality to suit your specific needs. Whether you want to automate updates, trigger notifications, or just keep a clean report of your Helm chart versions, this tool has you covered.</p>

<h3 id="practical-use-cases-a-handy-tool-for-daily-operations">Practical Use Cases: A Handy Tool for Daily Operations</h3>

<p>Imagine you’re managing a microservices architecture with several teams. Argo Helm Versioner can help ensure that all your services are using the most current and secure versions of Helm charts. It’s not about solving world problems-it’s about making your life a bit easier by automating a routine task that, left unchecked, could lead to issues down the road.</p>

<h3 id="conclusion-a-simple-tool-for-a-simple-task">Conclusion: A Simple Tool for a Simple Task</h3>

<p>Argo Helm Versioner is exactly what it sounds like: a tool to help you keep your Helm chart versions in check. It’s not trying to be more than it is. It’s a practical, no-frills solution for those who want to automate version checks and avoid the headache of manual updates. If you’re using Helm and Argo CD, it’s worth a look. For more information and to get started, check out the <a href="https://github.com/jhoelzel/argo-helm-versioner/tree/main">Argo Helm Versioner GitHub repository</a>. Contributions and feedback are welcome as the tool continues to evolve based on real-world needs.</p>]]></content><author><name>Johannes Hölzel</name></author><category term="devops" /><category term="argo" /><category term="helm" /><category term="kubernetes" /><category term="ci/cd" /><summary type="html"><![CDATA[For anyone managing Kubernetes environments with Argo CD, keeping Helm charts up-to-date is a routine but essential task. Argo Helm Versioner is a straightforward tool that helps automate this process, ensuring your deployments stay current without adding extra overhead.]]></summary></entry><entry><title type="html">Demystifying etcd</title><link href="https://www.hoelzel.it/kubernetes/2023/05/08/what-is-etcd.html" rel="alternate" type="text/html" title="Demystifying etcd" /><published>2023-05-08T00:00:00+00:00</published><updated>2023-05-08T00:00:00+00:00</updated><id>https://www.hoelzel.it/kubernetes/2023/05/08/what-is-etcd</id><content type="html" xml:base="https://www.hoelzel.it/kubernetes/2023/05/08/what-is-etcd.html"><![CDATA[<p>By now, you are no strangers to the challenges of managing distributed systems. Enter etcd(<code class="language-plaintext highlighter-rouge">pronounced /ˈɛtsiːdiː/,</code>), the unsung hero that quietly toils behind the scenes to keep these systems in check. <strong>etcd is an open source distributed key-value store tailor-made to hold and administer the crucial information that distributed systems</strong>, such as Kubernetes, need to function smoothly.</p>

<p>At the heart of distributed systems lie multiple machines working in harmony to achieve a shared objective. Kubernetes utilizes etcd as its reliable data store, providing a consistent view of the system’s status, including clusters, pods, and running application instances. If etcd were to pen an autobiography, it might be titled <strong>“From Linux Directory to Distributed Data Store: My Journey,”</strong> because the name “etcd” originates from a naming convention within the Linux directory structure: in UNIX, all system configuration files for a single system are contained in a folder called “/etc,” and the “d” in “etcd” stands for “distributed.”</p>

<p>But <em>why etcd</em>, you ask?</p>

<p>This versatile tool is:</p>

<ul>
  <li>Fully replicated: Every node has access to the <em>entire data store</em>, ensuring data availability and redundancy.</li>
  <li>Highly available: With no single point of failure, etcd <em>can gracefully handle hardware failures</em> and network partitions.</li>
  <li>Reliably consistent: Every data read returns the most recent write, avoiding conflicts or discrepancies.</li>
  <li>Fast: etcd boasts a benchmark of 10,000 writes per second.</li>
  <li>Secure: Automatic TLS, optional SSL client certificate authentication, and role-based access controls protect sensitive data.</li>
  <li>Simple: Applications can interact with etcd using standard HTTP/JSON tools.</li>
</ul>

<p>One might say that etcd is like a high-performance engine: it works best with fast storage disks (SSDs are highly recommended).</p>
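
<p>To get a feel for the “simple” part, here is a quick key-value round trip with <code class="language-plaintext highlighter-rouge">etcdctl</code> - a sketch that assumes a reachable local member and the v3 API:</p>

<pre><code class="language-SH"># store and read back a value
etcdctl put /config/max-replicas "5"
etcdctl get /config/max-replicas
# watch the key for changes (blocks until interrupted) - the same
# primitive Kubernetes builds its reconciliation on
etcdctl watch /config/max-replicas
</code></pre>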

<h2 id="built-on-the-raft-consensus-algorithm">Built on the Raft consensus algorithm</h2>

<p>Etcd ensures data store consistency across all nodes in a cluster, maintaining highly available and consistently replicated copies of the data store. Raft elects a leader node that manages replication for the followers, creating a harmonious and efficient system.</p>

<p>The leader accepts requests from clients and forwards them to follower nodes. <strong>The leader verifies that a majority of follower nodes have stored each new request and only then applies the entry to its local state machine</strong>. If this is not the case, the leader retries until all followers have stored all log entries consistently. This gives every process that queries the system the same view of the data, no matter which node it asks. Furthermore, if a follower node fails to receive a message from the leader within a specified time interval, an election is held automatically: the follower declares itself a candidate, and the remaining nodes vote for it or for any other candidate based on availability.</p>

<p>In the worst-case scenario, where the majority of etcd nodes fail, the cluster will not accept any more writes and will simply maintain its state. The cluster can only recover from a majority failure once a majority of members become available again and reach consensus; for example, a five-member cluster has a quorum of three, so it can lose two members before writes stop. If a majority cannot be restored, the operator must perform disaster recovery to bring the cluster back manually.</p>

<p>In the realm of Kubernetes, etcd plays a vital role as the primary key-value store for cluster state data. Kubernetes leverages etcd’s “watch” function to monitor data and reconfigure itself when changes arise. This allows Kubernetes to maintain an ideal state and quickly respond to any discrepancies. <em>The “watch” function stores values representing the actual and ideal state of the cluster</em> and can initiate a response when they diverge. This is the little bit of magic that lies in every Kubernetes cluster and makes it work in the first place.</p>

<h2 id="etcd-vs-redis">Etcd VS Redis</h2>

<p>Now, let’s address the elephant in the room: etcd vs. Redis. While both are open source tools, their primary functions differ significantly. Redis is an in-memory data store that excels in speed; etcd trades off some of that speed for greater reliability and guaranteed consistency, making it the go-to choice for storing and managing distributed system configuration data. Redis supports a wider variety of data types and structures than etcd and has much faster read/write performance. However, etcd has superior fault tolerance, stronger failover, and continuous data availability capabilities. <em>Most importantly, etcd persists all stored data to disk</em>.</p>

<h2 id="etcd-vs-sql">Etcd VS SQL</h2>

<p>And what about etcd vs. MySQL and PostgreSQL? While all three are data storage solutions, etcd is a distributed key-value store specifically designed for distributed systems, while MySQL and PostgreSQL are traditional relational databases adept at handling structured data with relationships. Choosing the right data storage solution is like selecting the perfect pair of shoes: it depends on the occasion, or in this case, the specific requirements of your system.</p>

<p>In a nutshell, etcd is a steadfast sentinel keeping distributed systems, like Kubernetes, in harmony. Its design prioritizes fully replicated, highly available, reliable, fast, secure, and simple data storage. With its superior fault tolerance and data persistence capabilities, etcd is the ideal candidate for storing and managing distributed system configuration information, standing steadfast in its duty to maintain the state of distributed systems.</p>

<p>Traditional databases like MySQL and PostgreSQL prioritize consistency and durability, but their fault tolerance capabilities depend on the specific setup and configuration. While etcd is ideal for managing critical information in distributed systems, MySQL and PostgreSQL are versatile relational databases that handle structured data with relationships.</p>

<blockquote>
  <p>NOTE: Of course you can also store cluster state in a relational database; <a href="https://k3s.io">K3s</a> will happily do that for you. But once you actually get it up and running, your MySQL setup will be overwhelmed by state queries. I have tried on many occasions to make this work, and an effective system WILL require absolute fine-tuning on your end.</p>
</blockquote>

<h2 id="etcds-ecosystem-and-extensibility">Etcd’s Ecosystem and Extensibility</h2>

<p>etcd’s importance extends beyond Kubernetes, as it can also be utilized to coordinate critical system and metadata across clusters of various distributed applications. Its simplicity and consistency make it an attractive choice for any distributed system that requires a reliable data store.</p>

<p>Its API enables seamless integration with a myriad of applications, and developers can easily interact with it using popular programming languages like Go, Python, Java, and more. Moreover, etcd’s extensible architecture allows it to act as a foundation for building custom distributed systems tailored to specific use cases and requirements.</p>

<blockquote>
  <p>NOTE: While you definitely can, it is not recommended to modify the state of a Kubernetes cluster directly without going through the API.</p>
</blockquote>

<p>The etcd community continues to grow and evolve, contributing to the development and enhancement of it. A plethora of third-party tools, libraries, and frameworks have emerged to simplify the deployment, management, and monitoring of etcd clusters. Some notable tools in the etcd ecosystem include:</p>

<p><em><a href="https://github.com/etcd-io/etcd/tree/main/etcdctl">etcdctl</a></em>: A command-line client for managing etcd.
<em><a href="https://github.com/openshift/cluster-etcd-operator">etcd-operator</a></em>: A Kubernetes operator that automates etcd cluster management tasks.
<em><a href="https://github.com/kubernetes-sigs/etcdadm">etcdadm</a></em>: a command-line tool for operating an etcd cluster. It makes it easy to create a new cluster, add a member to, or remove a member from an existing cluster. Its user experience is inspired by kubeadm.
<em><a href="https://github.com/giantswarm/etcd-backup-operator">etcd-backup-operator</a></em> takes backups of ETCD instances on both the control plane and tenant clusters.
<em><a href="https://github.com/gtamas/etcdmanager">ETCD Manager</a></em> Provide an efficient, modern GUI for desktop</p>

<p>A more extensive list of tools and libraries can be found in the <a href="https://etcd.io/docs/v3.5/integrations/">etcd docs themselves</a>.</p>

<h2 id="things-to-consider-when-deploying-and-managing-etcd">Things to consider when Deploying and Managing etcd</h2>

<p>As custodians of complex distributed systems, engineers need to understand the importance of deploying and managing etcd with care. Adhering to best practices is essential to ensure the health and performance of an etcd cluster. Some key recommendations include:</p>

<ul>
  <li><strong>Hardware considerations</strong>: Use SSDs for storage, as etcd’s performance relies heavily on disk speed. Provide adequate memory and CPU resources to accommodate the cluster’s workloads.</li>
  <li><strong>Configuration</strong>: Tune etcd’s configuration parameters to optimize performance for your specific use case, considering factors like network latency and data size.</li>
  <li><strong>Monitoring and alerting</strong>: Implement monitoring solutions, such as Prometheus and Grafana, to track etcd performance metrics and set up alerts for potential issues.</li>
  <li><strong>Backup and recovery</strong>: Regularly create and test backups of your etcd data to ensure swift recovery in the event of data loss or corruption; a snapshot sketch follows this list.</li>
  <li><strong>Security</strong>: Implement role-based access controls, TLS, and SSL client certificate authentication to protect sensitive configuration data and restrict access to authorized personnel only.</li>
</ul>

<h2 id="in-conclusion">In Conclusion</h2>

<p>etcd is a robust and reliable distributed key-value store that quietly underpins distributed systems like Kubernetes. Its fully replicated, highly available, reliable, fast, secure, and simple design ensures that it remains a steadfast guardian of critical information in distributed systems. With its unique advantages over Redis, MySQL, and PostgreSQL in the realm of distributed system configuration management, etcd has earned its place as an essential tool for distributed systems.</p>

<p>As the etcd ecosystem continues to flourish, and best practices for its deployment and management are widely adopted, the future looks promising for this unsung hero. With etcd, distributed systems can thrive and achieve their full potential, all while maintaining harmony and consistency in the face of ever-growing complexity.</p>

<h3 id="sources">Sources</h3>
<ul>
  <li><a href="https://etcd.io/docs/v3.5/">etcd docs</a></li>
  <li><a href="https://etcd.io/docs/v3.5/dev-guide/">Developer Guide</a></li>
  <li><a href="https://etcd.io/docs/v3.5/op-guide/">Operation Guide</a></li>
</ul>]]></content><author><name>Johannes Hölzel</name></author><category term="kubernetes" /><category term="kubernetes" /><category term="kubernetes-state" /><category term="etcd" /><summary type="html"><![CDATA[By now, you are no strangers to the challenges of managing distributed systems. Enter etcd(pronounced /ˈɛtsiːdiː/,), the unsung hero that quietly toils behind the scenes to keep these systems in check. etcd is an open source distributed key-value store tailor-made to hold and administer the crucial information that distributed systems, such as Kubernetes, need to function smoothly.]]></summary></entry><entry><title type="html">Fixing a Kubernetes Namespace Stuck in Terminating State</title><link href="https://www.hoelzel.it/kubernetes/2023/05/04/fix-stuck-namespaces.html" rel="alternate" type="text/html" title="Fixing a Kubernetes Namespace Stuck in Terminating State" /><published>2023-05-04T00:00:00+00:00</published><updated>2023-05-04T00:00:00+00:00</updated><id>https://www.hoelzel.it/kubernetes/2023/05/04/fix-stuck-namespaces</id><content type="html" xml:base="https://www.hoelzel.it/kubernetes/2023/05/04/fix-stuck-namespaces.html"><![CDATA[<p>Kubernetes Namespaces play a crucial role in managing resources within a Kubernetes cluster. They allow for the organization and isolation of resources, enabling efficient management of large-scale applications. However, occasionally, Kubernetes Namespaces can become stuck in the Terminating state, preventing us from creating new resources or updating existing ones. In this article, I will delve into various potential causes for this behavior and outline some strategies for you to overcome these obstacles.</p>

<h2 id="diagnosing-and-troubleshooting-stuck-namespaces">Diagnosing and Troubleshooting Stuck Namespaces</h2>

<p>Before diving into the potential causes, it is essential to diagnose and troubleshoot stuck Namespaces using Kubernetes events and logs. Common patterns and error messages will often point you straight at the underlying issue.</p>

<p>The first step is always to retrieve events within the Namespace:</p>

<pre><code class="language-SH">kubectl get events -n &lt;namespace_name&gt;
</code></pre>

<p>Once you have identified an issue, you can review the logs of related resources, such as Pods, to pinpoint its cause:</p>

<pre><code class="language-SH">kubectl logs -n &lt;namespace_name&gt; &lt;pod_name&gt; [&lt;container_name&gt;]
# to view the logs of all containers of a pod
kubectl logs -n &lt;namespace_name&gt; &lt;pod_name&gt; --all-containers
# to follow the logs in real-time use
kubectl logs -n &lt;namespace_name&gt; &lt;pod_name&gt; [&lt;container_name&gt;] --follow
# for already deleted container logs use 
kubectl logs -n &lt;namespace_name&gt; &lt;pod_name&gt; [&lt;container_name&gt;] --previous

</code></pre>

<h2 id="cause-1-finalizers">Cause 1: Finalizers</h2>

<p>Finalizers are an essential aspect of the Kubernetes resource lifecycle, ensuring that specific actions occur before an object is deleted. These actions might include releasing external resources, cleaning up associated data, or notifying other components of the deletion. Kubernetes uses finalizers to implement graceful deletion for resources, allowing them to complete any required cleanup tasks before the object is permanently removed.</p>

<p>However, finalizers can sometimes prevent a Namespace from being deleted if the required cleanup tasks are not completed or if a custom finalizer is implemented incorrectly. In such cases, it’s crucial to identify and remove finalizers within the Namespace to allow for its smooth deletion. Custom finalizers or non-standard resources, such as Custom Resource Definitions (CRDs), might also be preventing the Namespace from being deleted.</p>

<p>In order to identify and address finalizer-related issues in a Namespace:</p>

<ul>
  <li>List the Namespace’s configuration in JSON format to find existing finalizers:</li>
</ul>

<pre><code class="language-SH">kubectl get namespace &lt;namespace_name&gt; -o json
</code></pre>

<ul>
  <li>Examine the finalizers field under metadata to identify any finalizers associated with the Namespace.</li>
</ul>

<p>If the finalizers appear to be standard Kubernetes finalizers, investigate the Namespace’s resources and their statuses to identify any issues preventing the finalizers from completing their tasks.</p>

<p>As for custom or non-standard finalizers, review the associated components or controllers responsible for implementing the finalizers. Ensure they are functioning correctly and completing their intended tasks. Fix any issues or misconfigurations that might be preventing the finalizers from completing.</p>

<ul>
  <li>If the finalizers are still causing issues, you can consider removing them manually as a last resort. Be aware that manually removing finalizers can result in incomplete cleanup tasks, potentially leaving orphaned resources or data:</li>
</ul>

<pre><code class="language-SH">kubectl patch namespace &lt;namespace_name&gt; -p '{"metadata":{"finalizers":[]}}' --type=merge
</code></pre>

<p>This command will remove all finalizers from the Namespace, allowing it to be deleted. However, use this approach cautiously and only when necessary, as it bypasses the standard deletion process.</p>
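
<p>One caveat: Namespaces also carry finalizers under <code class="language-plaintext highlighter-rouge">spec.finalizers</code> (typically <code class="language-plaintext highlighter-rouge">kubernetes</code>), which the metadata patch above does not touch. Clearing those requires the Namespace’s finalize subresource - the following sketch assumes <code class="language-plaintext highlighter-rouge">jq</code> is installed:</p>

<pre><code class="language-SH">kubectl get namespace &lt;namespace_name&gt; -o json \
  | jq '.spec.finalizers = []' \
  | kubectl replace --raw "/api/v1/namespaces/&lt;namespace_name&gt;/finalize" -f -
</code></pre>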

<h2 id="cause-2-admission-webhooks">Cause 2: Admission Webhooks</h2>

<p>Admission webhooks, including both mutating and validating types, play a vital role in the Kubernetes API request processing pipeline. These webhooks intercept and modify or validate requests, ensuring they meet specific criteria before being processed further. However, they can sometimes interfere with the deletion of resources and cause a Namespace to become stuck in the Terminating state. To address this issue, follow these steps:</p>

<ul>
  <li>List the configured admission webhooks in your cluster to identify any that might impact the Namespace you are trying to delete:</li>
</ul>

<pre><code class="language-SH">kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations
</code></pre>

<ul>
  <li>
    <p>Examine the listed webhooks and their configurations to determine if any of them are preventing the deletion of resources within the Namespace. For example, a webhook might be configured with a rule that prevents the deletion of specific resources or returns an error upon deletion attempts.</p>
  </li>
  <li>
    <p>Temporarily disable or modify the problematic webhook configuration to allow the Namespace to be deleted. Be cautious when disabling webhooks, as it may have unintended side effects on other resources in the cluster.</p>
  </li>
</ul>

<p>For instance, to disable a validating webhook temporarily, you can edit its configuration:</p>

<pre><code class="language-SH">kubectl edit validatingwebhookconfigurations &lt;webhook_configuration_name&gt;
</code></pre>

<ul>
  <li>Set the <strong>failurePolicy</strong> to <em>Ignore</em> and save the changes. This action will cause Kubernetes to ignore any errors returned by the webhook and proceed with the deletion process. A scriptable alternative is sketched below.</li>
</ul>
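
<p>If you prefer a one-shot, scriptable change over an interactive edit, the same modification can be applied with a JSON patch - this sketch assumes the webhook you need to change is the first entry in the configuration’s <code class="language-plaintext highlighter-rouge">webhooks</code> list:</p>

<pre><code class="language-SH">kubectl patch validatingwebhookconfigurations &lt;webhook_configuration_name&gt; \
  --type=json \
  -p='[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Ignore"}]'
</code></pre>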

<blockquote>
  <p>NOTE: After resolving the issue and successfully deleting the Namespace, <strong>remember to re-enable or restore the webhook configuration</strong> to its original state to maintain the intended functionality of your cluster.</p>
</blockquote>

<p>By identifying and addressing any issues related to admission webhooks, you can efficiently resolve Namespace deletion problems and ensure a smooth Kubernetes experience. <strong>Keep in mind that modifying webhook configurations <em>will</em> have broader implications on your cluster, so exercise caution and thoroughly understand the impact of any changes before implementing them.</strong></p>

<h2 id="cause-3-cluster-stability-and-health">Cause 3: Cluster Stability and Health</h2>

<p>In some cases, a Namespace may be stuck due to underlying issues with the Kubernetes cluster itself. To ensure the cluster is healthy:</p>

<ul>
  <li>Check the health of the cluster components:</li>
</ul>

<pre><code class="language-SH">kubectl get componentstatuses
</code></pre>

<p>A healthy output should look like this:</p>
<pre><code class="language-SH">Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE   ERROR
scheduler            Healthy   ok        
controller-manager   Healthy   ok   
</code></pre>

<p>For Kubernetes versions v1.19 and above, since componentstatuses is deprecated, you can use the following command:</p>

<pre><code class="language-SH">kubectl get --raw='/readyz?verbose'
</code></pre>

<p>A healthy output should look like this:</p>
<pre><code class="language-SH">[+]ping ok
[+]log ok
[+]etcd ok
[+]etcd-readiness ok
[+]informer-sync ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-legacy-token-tracking-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]shutdown ok
readyz check passed
</code></pre>

<ul>
  <li>
    <p>Investigate and resolve any unhealthy component statuses before attempting to delete the Namespace. Each component might have unique troubleshooting steps depending on the issue. For example, if the etcd component is unhealthy, you may need to troubleshoot and repair the etcd cluster.</p>
  </li>
  <li>
    <p>Ensure that the Kubernetes control plane components, such as the API server, controller manager, and scheduler, are running correctly. Verify that there are no error messages or issues in the logs of these components.</p>
  </li>
  <li>
    <p>Check the health of the worker nodes in your cluster:</p>
  </li>
</ul>

<pre><code class="language-SH">kubectl get nodes -o wide
NAME     STATUS   ROLES                  AGE   VERSION        INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                      CONTAINER-RUNTIME
ultron   Ready    control-plane,master   23d   v1.26.3+k3s1   172.17.66.235   &lt;none&gt;        Rancher Desktop WSL Distribution   5.15.90.1-microsoft-standard-WSL2   docker://20.10.21    
</code></pre>

<p>Investigate and resolve any issues with nodes that are in a <em>NotReady</em> state or have other issues.</p>

<ul>
  <li>If all of the above yields nothing, verify that the kubelet service is running properly on each node by checking its logs and status.</li>
</ul>
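<p>On systemd-based nodes, that boils down to something like the following sketch (service name and log tooling vary by distribution; k3s, for instance, runs the kubelet embedded in its own service):</p>

<pre><code class="language-SH">systemctl status kubelet
journalctl -u kubelet --since "1 hour ago" --no-pager | tail -n 50
</code></pre>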

<p>By thoroughly examining the health and stability of your Kubernetes cluster, you can identify and resolve issues that might be causing a Namespace to become stuck in the Terminating state. Ensuring your cluster’s overall health will not only resolve Namespace deletion problems but also contribute to a smooth and efficient Kubernetes experience.</p>

<h2 id="cause-4-custom-resource-definitions-crds-and-namespaces">Cause 4: Custom Resource Definitions (CRDs) and Namespaces</h2>

<p>Custom Resource Definitions (CRDs) can play a role in Namespaces being stuck in the Terminating state. CRDs extend the Kubernetes API by defining new resource types, which may affect the deletion process. To handle CRDs in the context of stuck Namespaces:</p>

<p>List all CRDs in your cluster:</p>

<pre><code class="language-SH">kubectl get crds
</code></pre>
<p>Inspect the CRDs to determine if any are associated with the Namespace you are trying to delete.</p>
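<p>Since the <code class="language-plaintext highlighter-rouge">all</code> alias does not cover custom resources, a common idiom to enumerate every namespaced resource type that still has live instances in the Namespace, including custom ones, is:</p>

<pre><code class="language-SH">kubectl api-resources --verbs=list --namespaced -o name \
  | xargs -n 1 kubectl get --show-kind --ignore-not-found -n &lt;namespace_name&gt;
</code></pre>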

<p>Delete any associated CRD instances within the Namespace:</p>

<pre><code class="language-SH">kubectl delete &lt;custom_resource_name&gt;.&lt;api_group&gt; -n &lt;namespace_name&gt; 
</code></pre>
<p>If necessary, remove the CRD itself after deleting its instances:</p>

<pre><code class="language-SH">kubectl delete crd &lt;custom_resource_name&gt;.&lt;api_group&gt;
</code></pre>

<h2 id="cause-5-resource-quotas-and-stuck-namespaces">Cause 5: Resource Quotas and Stuck Namespaces</h2>

<p>Resource quotas can also impact Namespaces getting stuck in the Terminating state. Quotas limit the amount of resources that can be consumed within a Namespace, and exceeding these limits may cause issues.</p>

<p>Check the resource quotas for the Namespace:</p>

<pre><code class="language-SH">kubectl get resourcequotas -n &lt;namespace_name&gt;
</code></pre>

<p>If the resource usage is exceeding the quota, you may need to adjust the quota or reduce resource consumption within the Namespace.</p>
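<p>To compare current usage against the configured limits side by side, describe the quota:</p>

<pre><code class="language-SH">kubectl describe resourcequota &lt;resource_quota_name&gt; -n &lt;namespace_name&gt;
</code></pre>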

<p>To update the resource quota, edit the ResourceQuota object:</p>

<pre><code class="language-SH">kubectl edit resourcequota &lt;resource_quota_name&gt; -n &lt;namespace_name&gt;
</code></pre>

<p>Adjust the limits as needed, then save and exit the editor.</p>

<h2 id="the-namespace-deletion-process">The Namespace Deletion Process</h2>

<p>After identifying and addressing the potential cause(s) behind the stuck Namespace, proceed with the Namespace deletion process:</p>

<p>Eliminate all resources within the Namespace (note that <code class="language-plaintext highlighter-rouge">all</code> covers only the common resource types, such as Pods, Services, and Deployments, not custom ones):</p>

<pre><code class="language-SH">kubectl delete all --all -n &lt;namespace_name&gt;
</code></pre>

<p>Force Namespace deletion:</p>

<p>The kubectl delete namespace command offers additional parameters to help you tailor the deletion process to your specific requirements:</p>

<ul>
  <li>
    <p><code class="language-plaintext highlighter-rouge">--grace-period</code>: This parameter sets the duration in seconds that the system should wait before forcefully deleting the Namespace. By setting the grace period to 0 (<code class="language-plaintext highlighter-rouge">--grace-period=0</code>), you instruct Kubernetes to skip the waiting period and immediately proceed with the deletion.</p>
  </li>
  <li>
    <p><code class="language-plaintext highlighter-rouge">--force</code>: The <code class="language-plaintext highlighter-rouge">--force</code> parameter ensures that the Namespace is deleted forcefully, bypassing the default graceful deletion process. This can be helpful in cases where the Namespace is stuck in the Terminating state due to the various issues discussed earlier.</p>
  </li>
</ul>

<pre><code class="language-SH">kubectl delete namespace &lt;namespace_name&gt; --grace-period=0 --force
</code></pre>

<p>Reevaluate the Namespace status:</p>

<pre><code class="language-SH">kubectl get namespaces
</code></pre>
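<p>If the Namespace is still shown as Terminating, its status conditions usually name the resources or API groups that are blocking deletion:</p>

<pre><code class="language-SH">kubectl get namespace &lt;namespace_name&gt; -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.message}{"\n"}{end}'
</code></pre>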

<p>Recreate the Namespace, if necessary:</p>

<pre><code class="language-SH">kubectl create namespace &lt;namespace_name&gt;
</code></pre>

<p>By understanding and addressing these potential causes, senior engineers can effectively resolve a Namespace stuck in the Terminating state.</p>

<p>There you have it: these comprehensive strategies will enable you to enjoy a more efficient and smoother Kubernetes experience, ensuring optimal performance and reliability in your container orchestration system.</p>]]></content><author><name>Johannes Hölzel</name></author><category term="kubernetes" /><category term="kubernetes" /><category term="kubernetes-namespaces" /><category term="kubernetes-problems" /><summary type="html"><![CDATA[Kubernetes Namespaces play a crucial role in managing resources within a Kubernetes cluster. They allow for the organization and isolation of resources, enabling efficient management of large-scale applications. However, occasionally, Kubernetes Namespaces can become stuck in the Terminating state, preventing us from creating new resources or updating existing ones. In this article, I will delve into various potential causes for this behavior and outline some strategies for you to overcome these obstacles.]]></summary></entry><entry><title type="html">Kubernetes Headless Services</title><link href="https://www.hoelzel.it/kubernetes/2023/05/01/Headless-Services.html" rel="alternate" type="text/html" title="Kubernetes Headless Services" /><published>2023-05-01T00:00:00+00:00</published><updated>2023-05-01T00:00:00+00:00</updated><id>https://www.hoelzel.it/kubernetes/2023/05/01/Headless-Services</id><content type="html" xml:base="https://www.hoelzel.it/kubernetes/2023/05/01/Headless-Services.html"><![CDATA[<p><strong>Kubernetes services</strong> are abstractions that define a logical set of pods and a policy to access them. Services enable loose coupling between pods and <strong>provide a stable IP address</strong>, DNS name, and load balancing for distributing network traffic.</p>

<p><strong>Headless Services</strong>, by contrast, <strong>do not provide a stable IP address</strong>. They are a special type of Kubernetes service that allows for more direct control over pod-to-pod communication and service discovery, which is particularly useful for stateful applications and custom load balancing scenarios.</p>

<h2 id="what-are-kubernetes-headless-services">What are Kubernetes Headless Services?</h2>

<p>Simply put, a headless service is a Kubernetes <strong>service without a cluster IP</strong>, allowing pods to be reached directly via their individual IP addresses. This facilitates direct communication between pods and bypasses the default load balancing provided by Kubernetes.</p>

<p>Unlike ClusterIP services, which provide a single IP address that load balances traffic to the backend pods, headless services expose each pod’s IP address directly. This is useful when pods need to be individually addressed, when custom load balancing is required, or when you simply want the endpoints of your service listed without exposing a single point of access.</p>

<p>The primary use cases for headless services are:</p>

<ol>
  <li>
    <p><strong>Stateful applications</strong>: Headless services are essential for stateful applications that require stable network identities and direct communication between instances, such as databases and message brokers.</p>
  </li>
  <li>
    <p><strong>Service discovery</strong>: Headless services enable custom service discovery mechanisms, allowing applications to discover and communicate with individual pod instances.</p>
  </li>
  <li>
    <p><strong>Custom load balancing</strong>: By exposing individual pod IPs, headless services allow for the implementation of custom load balancing strategies tailored to specific application requirements.</p>
  </li>
</ol>

<h2 id="headless-services-with-a-regular-deployment">Headless Services with a regular Deployment</h2>

<p>First, let’s create a Deployment for our headless service. Headless services are primarily used with StatefulSets, but in order to demonstrate why, we will start with a Deployment.</p>

<p>This YAML file creates a deployment for a sample application with 3 replicas.</p>

<pre><code class="language-YAML">apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: busybox:latest
        args:
        - /bin/sh
        - -c
        - while true; do { echo -e 'HTTP/1.1 200 OK\n\nHello from Busybox!'; } | nc -l -p 80; done
        ports:
        - containerPort: 80
</code></pre>

<p>Now let’s create the corresponding headless service:</p>

<pre><code class="language-YAML">apiVersion: v1
kind: Service
metadata:
  name: my-headless-service
spec:
  # Set clusterIP to None to create a headless service
  clusterIP: None
  # Selector to match the desired set of pods
  selector:
    app: my-app
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
</code></pre>

<p>As you can see, there is no magic: the only real difference between a headless and a “normal” service is that the headless service will not provision a clusterIP for you.</p>

<p>Apply the YAML configuration using <code class="language-plaintext highlighter-rouge">kubectl apply -f &lt;filename.yaml&gt;</code> to create the headless service.</p>

<pre><code class="language-SH">$ kubectl apply -f ./example.yaml
deployment.apps/my-app-deployment created
service/my-headless-service created     
</code></pre>

<p>Verify the creation of the headless service using <code class="language-plaintext highlighter-rouge">kubectl get services</code> and ensure that the ‘CLUSTER-IP’ field is set to ‘None’.</p>

<pre><code class="language-SH">$ kubectl get service my-headless-service -o wide
NAME                  TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE   SELECTOR
my-headless-service   ClusterIP   None         &lt;none&gt;        80/TCP    90s   app=my-app
</code></pre>

<p>If you want to list all endpoints of your service you can simply do so with <code class="language-plaintext highlighter-rouge">kubectl get endpoints &lt;myservice&gt;</code> which will look something like this:</p>

<pre><code class="language-SH">$ kubectl get endpoints my-headless-service
NAME                  ENDPOINTS                                            AGE
my-headless-service   10.42.0.124:80,10.42.0.125:80,10.42.0.126:80   4m7s
</code></pre>

<p>Or, if you would like some more specific information, you can output the endpoints as JSON like this:</p>

<pre><code class="language-SH">$ kubectl get endpoints my-headless-service --output=json
{
    "apiVersion": "v1",
    "kind": "Endpoints",
    "metadata": {
        "annotations": {
            "endpoints.kubernetes.io/last-change-trigger-time": "2023-05-02T08:58:35Z"
        },
        "creationTimestamp": "2023-05-02T08:58:20Z",
        "labels": {
            "service.kubernetes.io/headless": ""
        },
        "name": "my-headless-service",
        "namespace": "default",
        "resourceVersion": "169091",
        "uid": "c431ad71-40e8-4140-987f-654e596e957d"
    },
    "subsets": [
        {
            "addresses": [
                {
                    "ip": "10.42.0.137",
                    "nodeName": "ultron",
                    "targetRef": {
                        "kind": "Pod",
                        "name": "my-app-deployment-6694968687-c6rv9",
                        "namespace": "default",
                        "uid": "32a435fb-7938-4a6d-a4c0-ad0439a9dabe"
                    }
                },
                {
                    "ip": "10.42.0.138",
                    "nodeName": "ultron",
                    "targetRef": {
                        "kind": "Pod",
                        "name": "my-app-deployment-6694968687-9djtn",
                        "namespace": "default",
                        "uid": "e45f0d0e-292e-408c-99a1-ecdc7ccaa3cc"
                    }
                },
                {
                    "ip": "10.42.0.139",
                    "nodeName": "ultron",
                    "targetRef": {
                        "kind": "Pod",
                        "name": "my-app-deployment-6694968687-ln985",
                        "namespace": "default",
                        "uid": "fd77afaf-8b5f-4994-9e12-c9ba780ce44f"
                    }
                }
            ],
            "ports": [
                {
                    "name": "http",
                    "port": 80,
                    "protocol": "TCP"
                }
            ]
        }
    ]
}
</code></pre>

<h3 id="discovering-pods-with-dns">Discovering pods with DNS</h3>
<p>Now let us create a dnsutils container in the same namespace. With it, we can perform a dig command to query the A records for the headless service:</p>
<pre><code class="language-SH">$ kubectl run -i --tty dnsutils --image=tutum/dnsutils --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
# 
dig my-headless-service.default.svc.cluster.local A

; &lt;&lt;&gt;&gt; DiG 9.9.5-3ubuntu0.2-Ubuntu &lt;&lt;&gt;&gt; my-headless-service.default.svc.cluster.local A
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 20692
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;my-headless-service.default.svc.cluster.local. IN A

;; ANSWER SECTION:
my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.137
my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.139
my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.138

;; Query time: 0 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Tue May 02 08:55:22 U
</code></pre>

<p>As you can see in the Answer Section, the IPs of all replicas have been returned to us. This lets us connect to the pods directly; however, the DNS name remains the same for all of them.
Most DNS client implementations will not rotate through the returned IPs for you; they simply connect to the first one and provide no load balancing whatsoever.</p>
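<p>If you want every replica to actually receive traffic, your client has to iterate over the returned records itself. A minimal sketch from inside the dnsutils pod (assuming the image also ships wget):</p>

<pre><code class="language-SH">for ip in $(dig +short my-headless-service.default.svc.cluster.local A); do
  wget -qO- "http://$ip:80"
done
</code></pre>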

<h3 id="using-srv-records-for-service-discovery">Using SRV records for service discovery</h3>

<p>SRV records are a type of DNS record that provides additional information beyond just the IP address of a service. They are particularly useful for service discovery within a cluster, allowing applications to obtain information about available pod instances and their respective ports.</p>

<p>An SRV record is a DNS record with the format: <code class="language-plaintext highlighter-rouge">_service._protocol.&lt;service_name&gt;.&lt;namespace&gt;.svc.cluster.local</code>. It contains information about the service’s hostname, port, protocol, and priority. Applications can perform DNS lookups using the SRV record to discover the available pod instances and their corresponding ports, allowing them to establish connections and communicate with individual pods.</p>

<p>This is especially useful for stateful applications that require direct pod-to-pod communication, as well as for custom load balancing scenarios where the built-in Kubernetes load balancing is not sufficient.</p>

<p>In order to check this out, we again need a dnsutils container in the same namespace. Then we can perform a dig command to query the SRV records for the headless service:</p>

<pre><code class="language-SH">$ kubectl run -i --tty dnsutils --image=tutum/dnsutils --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
# 
dig _http._tcp.my-headless-service.default.svc.cluster.local SRV

; &lt;&lt;&gt;&gt; DiG 9.9.5-3ubuntu0.2-Ubuntu &lt;&lt;&gt;&gt; _http._tcp.my-headless-service.default.svc.cluster.local SRV
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 61360
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 4
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_http._tcp.my-headless-service.default.svc.cluster.local. IN SRV

;; ANSWER SECTION:
_http._tcp.my-headless-service.default.svc.cluster.local. 5 IN SRV 0 33 80 10-42-0-138.my-headless-service.default.svc.cluster.local.
_http._tcp.my-headless-service.default.svc.cluster.local. 5 IN SRV 0 33 80 10-42-0-137.my-headless-service.default.svc.cluster.local.
_http._tcp.my-headless-service.default.svc.cluster.local. 5 IN SRV 0 33 80 10-42-0-139.my-headless-service.default.svc.cluster.local.

;; ADDITIONAL SECTION:
10-42-0-137.my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.137
10-42-0-139.my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.139
10-42-0-138.my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.138

;; Query time: 0 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Tue May 02 08:58:58 UTC 2023
;; MSG SIZE  rcvd: 703
</code></pre>

<blockquote>
  <p>NOTE: Headless Services for Deployments automatically create DNS records for each pod, following the pattern <code class="language-plaintext highlighter-rouge">&lt;pod-ip-with-dashes&gt;.&lt;service_name&gt;.&lt;namespace&gt;.svc.cluster.local</code>.</p>
</blockquote>

<p>Now, using this information, let’s try it out by spinning up a busybox and running wget:</p>

<pre><code class="language-SH">$ kubectl run -i --tty busybox --image=busybox --restart=Never -- sh 
If you don't see a command prompt, try pressing enter.
/ # wget -O- http://10-42-0-138.my-headless-service.default.svc.cluster.local:80
Connecting to 10-42-0-138.my-headless-service.default.svc.cluster.local:80 (10.42.0.138:80)
writing to stdout
Hello from Busybox!
-                    100% |*****************************************************************************************************************************************************************************************************************************|    20  0:00:00 ETA
written to stdout
</code></pre>

<p>As you can see, the DNS name of the pod uses the <em>pod IP as DNS hostname</em>, and thus it is not going to be stable across new iterations of the pod, or even when scaling the deployment up or down.
Let us find out how the same scenario plays out with a StatefulSet:</p>

<h2 id="using-a-satefullset">Using a SatefullSet</h2>

<p>Now that we have seen what happens with a Deployment, let’s check what happens if we change it to a StatefulSet.</p>

<p>In order to do that, we have to change our YAML like this:</p>

<pre><code class="language-YAML">apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app-statefulset
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  serviceName: my-headless-service
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: busybox:latest
        args:
        - /bin/sh
        - -c
        - while true; do { echo -e 'HTTP/1.1 200 OK\n\nHello from Busybox!'; } | nc -l -p 80; done
        ports:
        - containerPort: 80
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: my-headless-service
spec:
  # Set clusterIP to None to create a headless service
  clusterIP: None
  # Selector to match the desired set of pods
  selector:
    app: my-app
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80

</code></pre>

<p>As you may see, I have changed Deployment to StatefulSet and added a serviceName field in the spec section that points to the headless service we have defined. We have also added a volumeClaimTemplates field to the spec section, which defines a persistent volume claim template that can be used by the StatefulSet’s pods to store data.</p>

<p>The StatefulSet maintains a unique identity for each of its pods and ensures that they are created in a predictable order. It is useful when you need to manage stateful applications, like databases, where each instance requires a unique identity and persistent storage.</p>

<pre><code class="language-SH">kubectl run -i --tty dnsutils --image=tutum/dnsutils --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
dig my-headless-service.default.svc.cluster.local A

; &lt;&lt;&gt;&gt; DiG 9.9.5-3ubuntu0.2-Ubuntu &lt;&lt;&gt;&gt; my-headless-service.default.svc.cluster.local A
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 18912
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;my-headless-service.default.svc.cluster.local. IN A

;; ANSWER SECTION:
my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.148
my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.143
my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.146

;; Query time: 0 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Tue May 02 15:00:11 UTC 2023
;; MSG SIZE  rcvd: 257
</code></pre>

<p>So far the Answer section looks pretty much the same as for the Deployment, but watch what happens if we query the SRV record:</p>

<pre><code class="language-SH"># 
dig _http._tcp.my-headless-service.default.svc.cluster.local SRV

; &lt;&lt;&gt;&gt; DiG 9.9.5-3ubuntu0.2-Ubuntu &lt;&lt;&gt;&gt; _http._tcp.my-headless-service.default.svc.cluster.local SRV
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 12868
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 4
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_http._tcp.my-headless-service.default.svc.cluster.local. IN SRV

;; ANSWER SECTION:
_http._tcp.my-headless-service.default.svc.cluster.local. 5 IN SRV 0 33 80 my-app-statefulset-0.my-headless-service.default.svc.cluster.local.
_http._tcp.my-headless-service.default.svc.cluster.local. 5 IN SRV 0 33 80 my-app-statefulset-1.my-headless-service.default.svc.cluster.local.
_http._tcp.my-headless-service.default.svc.cluster.local. 5 IN SRV 0 33 80 my-app-statefulset-2.my-headless-service.default.svc.cluster.local.

;; ADDITIONAL SECTION:
my-app-statefulset-1.my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.146
my-app-statefulset-0.my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.143
my-app-statefulset-2.my-headless-service.default.svc.cluster.local. 5 IN A 10.42.0.148

;; Query time: 0 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Tue May 02 15:01:52 UTC 2023
;; MSG SIZE  rcvd: 757
</code></pre>

<blockquote>
  <p>NOTE: Headless services automatically create DNS records for each pod, following the pattern <code class="language-plaintext highlighter-rouge">&lt;hostname&gt;.&lt;service_name&gt;.&lt;namespace&gt;.svc.cluster.local</code>.</p>
</blockquote>

<h3 id="why-headless-services-work-best-with-statefullsets">Why headless services work best with StatefullSets</h3>

<p>When a StatefulSet is created, Kubernetes assigns a unique identity to each of its pods, in the form of an ordinal index. The SRV records returned by the dig command reflect this by including the ordinal index in the hostname of each pod. For example, my-app-statefulset-0 refers to the first pod in the StatefulSet.</p>

<p>They are predominantly used with StatefulSets because they enable DNS-based service discovery for each individual pod, using the unique hostname assigned by Kubernetes in a stable manner. This allows applications running inside the pods to easily discover and connect to other pods in the StatefulSet, using their unique identities. This <strong>stable network identity</strong> is crucial for stateful applications that require a consistent identity for data replication, leader election, or other distributed systems tasks.</p>
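<p>For example, reusing the busybox pod from earlier, the first replica can always be reached under its stable name, no matter how often the underlying pod is recreated:</p>

<pre><code class="language-SH">wget -O- http://my-app-statefulset-0.my-headless-service.default.svc.cluster.local:80
</code></pre>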

<p>While it is possible to use a headless service with a Deployment, some of the advantages of using a headless service with a StatefulSet do not apply to a Deployment. Here’s why:</p>

<ol>
  <li>
    <p>Stable network identity: In a StatefulSet, each pod gets a unique and stable hostname based on the StatefulSet’s name and an index (e.g., my-app-statefulset-0, my-app-statefulset-1). Deployments, on the other hand, create pods with random hashes in their names (e.g., my-app-deployment-6f8d6b94c7). Thus, pods created by a Deployment do not have stable hostnames, which makes it difficult to maintain a consistent network identity for stateful applications that rely on it for data replication, leader election, or other tasks. Also, as we have seen, the DNS hostname of a Deployment’s pod is derived from its IP.</p>
  </li>
  <li>
    <p>Direct access to individual pods: While headless services can still provide direct access to individual pods in a Deployment, the lack of a stable network identity makes it harder to manage communication between specific instances or replicas. Without stable hostnames, clients need to rely on other mechanisms, such as labels or annotations, to identify and communicate with specific instances. This adds complexity to the system and might not be suitable for certain stateful applications. Of course you can query the SRV entry in many iterations, but you need to make sure that your application takes care of it.</p>
  </li>
  <li>
    <p>Ordered and controlled scaling: StatefulSets provide ordered and controlled scaling, ensuring that pods are created and terminated in a specific order (based on their index). This is particularly important for stateful applications where the order of scaling operations can affect data consistency or availability. Deployments, however, scale pods without any specific order, which could lead to issues in stateful applications.</p>
  </li>
  <li>
    <p>Persistent storage: StatefulSets can leverage volumeClaimTemplates, which ensure that each pod gets its own unique and persistent storage. When a pod is rescheduled, <em>it retains the same storage</em>, allowing for seamless data recovery. Deployments do not have this native capability, making it challenging to manage stateful applications that require persistent storage.</p>
  </li>
</ol>

<p>Therefore, while headless services can be used with both StatefulSets and Deployments, the advantages of using a headless service with a StatefulSet do not apply to Deployments because of the lack of stable network identity, unordered scaling, and absence of native persistent storage support.</p>

<h3 id="finishing-up">Finishing up</h3>

<p>Kubernetes headless services provide a powerful way to manage direct pod-to-pod communication, service discovery, and custom load balancing. They are particularly useful for stateful applications and situations where the default load balancing provided by Kubernetes is insufficient and are an important tool in a developer’s arsenal.</p>]]></content><author><name>Johannes Hölzel</name></author><category term="kubernetes" /><category term="kubernetes" /><category term="kubernetes-services" /><category term="kubernetes-headless-services" /><summary type="html"><![CDATA[Kubernetes services are abstractions that define a logical set of pods and a policy to access them. Services enable loose coupling between pods and provide a stable IP address, DNS name, and load balancing for distributing network traffic.]]></summary></entry><entry><title type="html">Embracing the Kubernetes Downward API</title><link href="https://www.hoelzel.it/kubernetes/2023/04/25/Pod-info-mounted.html" rel="alternate" type="text/html" title="Embracing the Kubernetes Downward API" /><published>2023-04-25T00:00:00+00:00</published><updated>2023-04-25T00:00:00+00:00</updated><id>https://www.hoelzel.it/kubernetes/2023/04/25/Pod-info-mounted</id><content type="html" xml:base="https://www.hoelzel.it/kubernetes/2023/04/25/Pod-info-mounted.html"><![CDATA[<p>In the realm of Kubernetes, it is sometimes important to expose certain Pod and container fields to containers running within the Pod. 
These approaches, collectively known as the <em>downward API</em>, enable developers to harness the power of Pod and container fields within their containers.</p>

<p>The Kubernetes Downward API is an essential mechanism that facilitates containers in a Kubernetes cluster to obtain metadata regarding themselves or the environment in which they operate, which enables us to make more solid decisions. It enables a container to gather information about its configuration and additional metadata. This information is valuable for generating configuration files, implementing context-aware behavior, or conducting monitoring.</p>

<p>The Downward API can be employed through two methods:</p>

<p><strong>Environment Variables</strong>: By utilizing the env or envFrom field in the container’s specification, pod or container metadata can be injected into a container’s environment variables. The metadata is exposed as environment variables and can be accessed by the applications operating within the container.</p>

<p><strong>Volumes</strong>: The Downward API can also populate files within a volume. This is achieved using a DownwardAPIVolume, a Kubernetes volume type capable of exposing pod metadata as files in a directory. The volume can subsequently be mounted into the container’s filesystem at a specified path.</p>

<p>Today I am going to make sure that you understand both and can use them with confidence in your deployments. But first I want to make clear that it sadly <strong>does not support all the fields in the podspec</strong>.
This is a detailed list of fields that can be used with it and the methods they are available through (Environment variables, Volume files, or both):</p>

<ol>
  <li>metadata.name (Both)
    <ul>
      <li>The name of the pod.</li>
    </ul>
  </li>
  <li>metadata.namespace (Both)
    <ul>
      <li>The namespace the pod is running in.</li>
    </ul>
  </li>
  <li>metadata.labels (Volume files)
    <ul>
      <li>A set of key-value pairs (labels) attached to the pod.</li>
    </ul>
  </li>
  <li>metadata.annotations (Volume files)
    <ul>
      <li>A set of key-value pairs (annotations) providing additional non-identifying information about the pod.</li>
    </ul>
  </li>
  <li>status.podIP (Environment variables)
    <ul>
      <li>The IP address of the pod.</li>
    </ul>
  </li>
  <li>spec.nodeName (Environment variables)
    <ul>
      <li>The name of the node the pod is running on.</li>
    </ul>
  </li>
  <li>status.hostIP (Environment variables)
    <ul>
      <li>The IP address of the node where the pod is running.</li>
    </ul>
  </li>
  <li>spec.serviceAccountName (Environment variables)
    <ul>
      <li>The name of the service account associated with the pod.</li>
    </ul>
  </li>
  <li>metadata.uid (Both)
    <ul>
      <li>The unique identifier (UID) assigned to the pod by the Kubernetes system.</li>
    </ul>
  </li>
  <li>resources.requests.cpu (Environment variables)
    <ul>
      <li>The amount of CPU requested by the container.</li>
    </ul>
  </li>
  <li>resources.requests.memory (Environment variables)
    <ul>
      <li>The amount of memory requested by the container.</li>
    </ul>
  </li>
  <li>resources.limits.cpu (Environment variables)
    <ul>
      <li>The maximum amount of CPU allowed for the container.</li>
    </ul>
  </li>
  <li>resources.limits.memory (Environment variables)
    <ul>
      <li>The maximum amount of memory allowed for the container.</li>
    </ul>
  </li>
</ol>

<h2 id="using-the-downwards-api-with-volumes">Using the downwards API with volumes</h2>

<blockquote>
  <p>Note: the downward API for volumes currently is pretty limited. The only supported values are: “metadata.annotations”, “metadata.labels”, “metadata.name”, “metadata.namespace”, “metadata.uid”</p>
</blockquote>

<p>In order to expose the pod’s labels and annotations, I will create a Pod with a single container and project Pod-level fields into the running container as files. The Pod manifest provided below defines a <code class="language-plaintext highlighter-rouge">downwardAPI</code> volume, which the container mounts at <code class="language-plaintext highlighter-rouge">/etc/downwardapi</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Pod</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">learning-about-downwardapi</span>
  <span class="na">labels</span><span class="pi">:</span>
    <span class="na">zone</span><span class="pi">:</span> <span class="s">DE-Falkenstein</span>
    <span class="na">cluster</span><span class="pi">:</span> <span class="s">test-meadow</span>
  <span class="na">annotations</span><span class="pi">:</span>
    <span class="na">app</span><span class="pi">:</span> <span class="s">downwardAPI</span>
    <span class="na">team</span><span class="pi">:</span> <span class="s">devops</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">containers</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">demo-container</span>
      <span class="na">image</span><span class="pi">:</span> <span class="s">registry.k8s.io/busybox</span>
      <span class="na">command</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">sh"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">-c"</span><span class="pi">]</span>
      <span class="na">args</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">while </span><span class="no">true</span><span class="s">; do</span>
          <span class="s">if [[ -e /etc/downwardapi/labels ]]; then</span>
            <span class="s">echo -en 'Labels:\n'; cat /etc/downwardapi/labels; fi;</span>
          <span class="s">if [[ -e /etc/downwardapi/annotations ]]; then</span>
            <span class="s">echo -en '\nAnnotations:\n'; cat /etc/downwardapi/annotations; fi;</span>
          <span class="s">sleep 5;</span>
        <span class="s">done;</span>
      <span class="na">volumeMounts</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">downwardapi</span>
          <span class="na">mountPath</span><span class="pi">:</span> <span class="s">/etc/downwardapi</span>
  <span class="na">volumes</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">downwardapi</span>
      <span class="na">downwardAPI</span><span class="pi">:</span>
        <span class="na">items</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">labels"</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.labels</span>
          <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">annotations"</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.annotations</span>
          <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">podName"</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.name</span>
          <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Namespace"</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.namespace</span>              
</code></pre></div></div>

<h2 id="reviewing-the-container-definition">reviewing the container definition</h2>

<p>The main components are briefly described below:</p>

<ul>
  <li>
    <p>image: The container image is sourced from the registry.k8s.io repository, with the specific image being “busybox”, a lightweight and versatile Linux distribution. In this case it will help us output our stored information.</p>
  </li>
  <li>
    <p>args: This is a list of arguments that will be passed to the shell command. In this case, it contains a single argument that is a multi-line shell script. The shell script does the following:
  a. It runs an infinite loop using the “while true” construct.
  b. Within the loop, it checks if the file “/etc/downwardapi/labels” exists by using the “-e” flag in the conditional expression. If it exists, it prints the header “Labels:” and then the contents of the file using “cat”.
  c. Similarly, it checks if the file “/etc/downwardapi/annotations” exists, and if so, prints the header “Annotations:” and the contents of the file using “cat”.
  d. The loop then pauses for 5 seconds using the “sleep 5” command before reiterating through the process.</p>
  </li>
</ul>

<p>Therefore, we define a Pod called “learning-about-downwardapi” that runs a BusyBox image and executes a shell script, which continuously checks for the existence of two files (/etc/downwardapi/labels and /etc/downwardapi/annotations), and prints their contents if they exist, before pausing for 5 seconds and repeating the process.
We can observe that the Pod features a downwardAPI volume, and the container mounts the volume at /etc/downwardapi as files, so the pod can access the information.
The first item specifies that the value of the Pod’s metadata.labels field should be saved in a file called ‘labels’. 
The second item specifies that the value of the Pod’s annotations field should be stored in a file named ‘annotations’.</p>

<h2 id="applying-and-testing">applying and testing</h2>

<p>Now that we have prepared our pod definition, lets apply it with:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl apply <span class="nt">-f</span> ./downwardsapitest.yaml 
</code></pre></div></div>
<p>And since I have not provided any namespace, it will be in the default namespace and we can retrieve the logs with:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl logs learning-about-downwardapi
</code></pre></div></div>

<p>and our output looks something like this:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Labels:
<span class="nv">cluster</span><span class="o">=</span><span class="s2">"test-meadow"</span>
<span class="nv">zone</span><span class="o">=</span><span class="s2">"DE-Falkenstein"</span>
Annotations:
<span class="nv">app</span><span class="o">=</span><span class="s2">"downwardAPI"</span>
kubectl.kubernetes.io/last-applied-configuration<span class="o">=</span><span class="s2">"{...}"</span>
kubernetes.io/config.seen<span class="o">=</span><span class="s2">"2023-04-25T07:29:39.669534852Z"</span>
kubernetes.io/config.source<span class="o">=</span><span class="s2">"api"</span>
</code></pre></div></div>

<p>As you can see, all of the labels and annotations have successfully been mounted into the pod, and we can access them through the container file system.
This is particularly useful when your code makes decisions based on these parameters.</p>

<p>You can also simply exec into the running pod itself to verify the files’ existence:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nb">exec</span> <span class="nt">-it</span> learning-about-downwardapi <span class="nt">--</span> sh
</code></pre></div></div>

<p>where <em>cat</em> will give you the same results:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/# <span class="nb">cat</span> /etc/downwardapi/labels
<span class="nv">cluster</span><span class="o">=</span><span class="s2">"test-meadow"</span>
<span class="nv">zone</span><span class="o">=</span><span class="s2">"DE-Falkenstein"</span>/ <span class="c"># </span>
</code></pre></div></div>

<p>What is more interesting, though, are the actual contents of the folder <em>/etc/downwardapi</em>! Let me show you by running <em>ls</em>:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/# <span class="nb">cd</span> /etc/downwardapi/
/etc/downwardapi <span class="c"># ls -la</span>
total 4
drwxrwxrwt    3 root     root           120 Apr 25 07:37 <span class="nb">.</span>
drwxr-xr-x    1 root     root          4096 Apr 25 07:37 ..
drwxr-xr-x    2 root     root            80 Apr 25 07:37 ..2023_04_25_07_37_05.3522933447
lrwxrwxrwx    1 root     root            32 Apr 25 07:37 ..data -&gt; ..2023_04_25_07_37_05.3522933447
lrwxrwxrwx    1 root     root            18 Apr 25 07:37 annotations -&gt; ..data/annotations
lrwxrwxrwx    1 root     root            13 Apr 25 07:37 labels -&gt; ..data/labels
</code></pre></div></div>

<p>In the generated output, it is evident that both the labels and annotations files are located within a temporary subdirectory. In our specific example, the subdirectory is denoted as “..2023_04_25_07_37_05.3522933447”. Within the “/etc/downwardapi” directory, “..data” functions as a symbolic link that connects to the aforementioned temporary subdirectory. Additionally, within the same “/etc/downwardapi” directory, both the labels and annotations serve as symbolic links, so they can be updated!</p>

<p>Utilizing symbolic links facilitates dynamic and atomic updates of the metadata. This is achieved through writing updates to a new temporary directory, followed by an atomic update of the “..data” symlink employing the “rename(2)” system call. If you are used to deploying without kubernetes, you may remember this approach from countless release-deploy scripts out there.</p>
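<p>You can observe this swap directly by resolving the symlink before and after an update:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ..data always points at the currently active snapshot directory
readlink /etc/downwardapi/..data
</code></pre></div></div>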

<p>Let’s patch our pod with a new annotation to see how that works.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl patch pod learning-about-downwardapi <span class="nt">-p</span> <span class="s1">'{"metadata":{"annotations":{"updatestatus":"in-progress"}}}'</span>
pod/learning-about-downwardapi patched  
</code></pre></div></div>

<p>and we exec our running pod again with</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nb">exec</span> <span class="nt">-it</span> learning-about-downwardapi <span class="nt">--</span> sh
</code></pre></div></div>

<p>and check out our mounted folder:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /etc/downwardapi/
/etc/downwardapi <span class="c"># ls -la</span>
total 4
drwxrwxrwt    3 root     root           120 Apr 25 07:47 <span class="nb">.</span>
drwxr-xr-x    1 root     root          4096 Apr 25 07:37 ..
drwxr-xr-x    2 root     root            80 Apr 25 07:47 ..2023_04_25_07_47_52.1041351476
lrwxrwxrwx    1 root     root            32 Apr 25 07:47 ..data -&gt; ..2023_04_25_07_47_52.1041351476
lrwxrwxrwx    1 root     root            18 Apr 25 07:37 annotations -&gt; ..data/annotations
lrwxrwxrwx    1 root     root            13 Apr 25 07:37 labels -&gt; ..data/labels
</code></pre></div></div>

<p>As you can see, the date has changed, so let’s review our data:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/etc/downwardapi <span class="c"># cat annotations </span>
<span class="nv">app</span><span class="o">=</span><span class="s2">"downwardAPI"</span>
kubectl.kubernetes.io/last-applied-configuration<span class="o">=</span><span class="s2">"{...}"</span>
kubernetes.io/config.seen<span class="o">=</span><span class="s2">"2023-04-25T07:37:05.107154061Z"</span>
kubernetes.io/config.source<span class="o">=</span><span class="s2">"api"</span>
<span class="nv">team</span><span class="o">=</span><span class="s2">"devops"</span>
<span class="nv">updatestatus</span><span class="o">=</span><span class="s2">"in-progress"</span>
</code></pre></div></div>

<p>And there you have it: this way you can mount critical information directly into your pods, and of course it does not stop at labels and annotations. You can mount any of the supported podspec fields this way.</p>

<h2 id="why-is-this-important">Why is this important</h2>

<p>Pods are considered immutable once they are created. If you need to change the environment variables for a running pod, you must create a new pod with the updated environment variables.
However <strong>Labels and annotations are indeed exceptions to the immutability of pods.</strong></p>

<p>Using this technique together with code that uses, for instance, a filesystem watcher, your program will be able to adapt to changes in annotations and labels, which are widely used with operators and controllers.
<strong>There is a huge opportunity here to have your pod react in real time to changes in the outside world, without actually stopping the container and incurring downtime!</strong></p>
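<p>A minimal sketch of such a watcher, assuming an image that ships inotify-tools (stock busybox does not):</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh
# block until kubelet atomically swaps the ..data symlink, then re-read the metadata
while true; do
  inotifywait -qq -e create -e moved_to /etc/downwardapi/
  echo 'labels updated:'
  cat /etc/downwardapi/labels
done
</code></pre></div></div>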

<h1 id="using-the-downwards-api-with-environment-variables">Using the Downwards API with Environment variables</h1>

<p>Environment variables are not as flexible as the mounted volumes, as I have demonstrated in the last chapter, but they are still useful for the information they contain.
<strong>You cannot directly patch environment variables in a running pod.</strong> If you need to change the environment variables for a running pod, you must create a new one with the updated environment variables and terminate the old one first.
In the case of a Deployment, StatefulSet, or DaemonSet, you can update the environment variables in the respective template spec, which will trigger a rolling update to replace the existing pods with new ones that have the updated environment variables.
They do, however, offer invaluable information, as you can see in the example below:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Pod</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">learning-about-downwardapi</span>
  <span class="na">labels</span><span class="pi">:</span>
    <span class="na">zone</span><span class="pi">:</span> <span class="s">DE-Falkenstein</span>
    <span class="na">cluster</span><span class="pi">:</span> <span class="s">test-meadow</span>
  <span class="na">annotations</span><span class="pi">:</span>
    <span class="na">app</span><span class="pi">:</span> <span class="s">downwardAPI</span>
    <span class="na">team</span><span class="pi">:</span> <span class="s">devops</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">containers</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">demo-container</span>
      <span class="na">image</span><span class="pi">:</span> <span class="s">registry.k8s.io/busybox</span>
      <span class="na">command</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">sh"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">-c"</span><span class="pi">]</span>
      <span class="na">args</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">while </span><span class="no">true</span><span class="s">; do</span>
          <span class="s">echo -en 'Pod Name:\n'; echo $POD_NAME;</span>
          <span class="s">echo -en '\nNamespace:\n'; echo $POD_NAMESPACE;</span>
          <span class="s">echo -en '\nPod IP:\n'; echo $POD_IP;</span>
          <span class="s">echo -en '\nNode Name:\n'; echo $NODE_NAME;</span>
          <span class="s">echo -en '\nHost IP:\n'; echo $HOST_IP;</span>
          <span class="s">echo -en '\nService Account Name:\n'; echo $SERVICE_ACCOUNT_NAME;</span>
          <span class="s">echo -en '\nUID:\n'; echo $POD_UID;</span>
          <span class="s">echo -en '\nCPU Request:\n'; echo $CPU_REQUEST;</span>
          <span class="s">echo -en '\nCPU Limit:\n'; echo $CPU_LIMIT;</span>
          <span class="s">echo -en '\nMemory Request:\n'; echo $MEMORY_REQUEST;</span>
          <span class="s">echo -en '\nMemory Limit:\n'; echo $MEMORY_LIMIT;</span>
          <span class="s">sleep 5;</span>
        <span class="s">done;</span>
      <span class="na">env</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">POD_NAME</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.name</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">POD_NAMESPACE</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.namespace</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">POD_IP</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">status.podIP</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">NODE_NAME</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">spec.nodeName</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">HOST_IP</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">status.hostIP</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">SERVICE_ACCOUNT_NAME</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">spec.serviceAccountName</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">POD_UID</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.uid</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">CPU_REQUEST</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">resourceFieldRef</span><span class="pi">:</span>
              <span class="na">containerName</span><span class="pi">:</span> <span class="s">demo-container</span>
              <span class="na">resource</span><span class="pi">:</span> <span class="s">requests.cpu</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">CPU_LIMIT</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">resourceFieldRef</span><span class="pi">:</span>
              <span class="na">containerName</span><span class="pi">:</span> <span class="s">demo-container</span>
              <span class="na">resource</span><span class="pi">:</span> <span class="s">limits.cpu</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">MEMORY_REQUEST</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">resourceFieldRef</span><span class="pi">:</span>
              <span class="na">containerName</span><span class="pi">:</span> <span class="s">demo-container</span>
              <span class="na">resource</span><span class="pi">:</span> <span class="s">requests.memory</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">MEMORY_LIMIT</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">resourceFieldRef</span><span class="pi">:</span>
              <span class="na">containerName</span><span class="pi">:</span> <span class="s">demo-container</span>
              <span class="na">resource</span><span class="pi">:</span> <span class="s">limits.memory</span>
      <span class="na">resources</span><span class="pi">:</span>
        <span class="na">requests</span><span class="pi">:</span>
          <span class="na">cpu</span><span class="pi">:</span> <span class="s">100m</span>
          <span class="na">memory</span><span class="pi">:</span> <span class="s">100Mi</span>
        <span class="na">limits</span><span class="pi">:</span>
          <span class="na">cpu</span><span class="pi">:</span> <span class="s">200m</span>
          <span class="na">memory</span><span class="pi">:</span> <span class="s">200Mi</span>


</code></pre></div></div>
<h1 id="mount-a-specific-annotation-as-environment-variable">mount a specific annotation as environment variable</h1>
<p>Edit: 04.10.23</p>

<p>One of my readers pointed out that it is indeed possible to mount a specific annotation as an environment variable into a pod by using the fieldPath <code class="language-plaintext highlighter-rouge">metadata.annotations['yourAnnotation']</code>, which is quite useful when you only need a single annotation. Note that the key must be wrapped in single quotes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>      env:
        - name: SOME_ANNOTATION
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['k8s.io/some-annotation']
</code></pre></div></div>
<p><strong>However</strong>, please be aware that, as opposed to a volume mount, environment variables set this way will not be updated at runtime if the annotation changes. So while it is indeed possible, please use it with care.</p>
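<p>If you need the value to track annotation changes at runtime, a downwardAPI volume item is the better fit, since the kubelet refreshes downward API files on its sync interval. A minimal sketch, reusing the hypothetical annotation key from above (the volume name is illustrative):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  volumes:
    - name: annotation-volume
      downwardAPI:
        items:
          # the annotation is written as a file named "some-annotation"
          # under the volume's mountPath, and is updated by the kubelet
          - path: "some-annotation"
            fieldRef:
              fieldPath: metadata.annotations['k8s.io/some-annotation']
</code></pre></div></div>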

<h1 id="combining-it-all-for-all-the-information">combining it all for all the information</h1>

<p>Finally, let's combine the environment variables with the volume mounts and check all the information together. This does not change anything we have learned so far, but it gives us a complete overview of the information we can read from inside the pod:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Pod</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">learning-about-downwardapi</span>
  <span class="na">labels</span><span class="pi">:</span>
    <span class="na">zone</span><span class="pi">:</span> <span class="s">DE-Falkenstein</span>
    <span class="na">cluster</span><span class="pi">:</span> <span class="s">test-meadow</span>
  <span class="na">annotations</span><span class="pi">:</span>
    <span class="na">app</span><span class="pi">:</span> <span class="s">downwardAPI</span>
    <span class="na">team</span><span class="pi">:</span> <span class="s">devops</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">containers</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">demo-container</span>
      <span class="na">image</span><span class="pi">:</span> <span class="s">registry.k8s.io/busybox</span>
      <span class="na">command</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">sh"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">-c"</span><span class="pi">]</span>
      <span class="na">args</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">while </span><span class="no">true</span><span class="s">; do</span>
          <span class="s">echo -en 'Pod Name:\n'; echo $POD_NAME;</span>
          <span class="s">echo -en '\nNamespace:\n'; echo $POD_NAMESPACE;</span>
          <span class="s">echo -en '\nPod IP:\n'; echo $POD_IP;</span>
          <span class="s">echo -en '\nNode Name:\n'; echo $NODE_NAME;</span>
          <span class="s">echo -en '\nHost IP:\n'; echo $HOST_IP;</span>
          <span class="s">echo -en '\nService Account Name:\n'; echo $SERVICE_ACCOUNT_NAME;</span>
          <span class="s">echo -en '\nUID:\n'; echo $POD_UID;</span>
          <span class="s">echo -en '\nCPU Request:\n'; echo $CPU_REQUEST;</span>
          <span class="s">echo -en '\nCPU Limit:\n'; echo $CPU_LIMIT;</span>
          <span class="s">echo -en '\nMemory Request:\n'; echo $MEMORY_REQUEST;</span>
          <span class="s">echo -en '\nMemory Limit:\n'; echo $MEMORY_LIMIT;</span>
          <span class="s">if [[ -e /etc/downwardapi/labels ]]; then</span>
            <span class="s">echo -en '\nLabels:\n'; cat /etc/downwardapi/labels; fi;</span>
          <span class="s">if [[ -e /etc/downwardapi/annotations ]]; then</span>
            <span class="s">echo -en '\nAnnotations:\n'; cat /etc/downwardapi/annotations; fi;</span>
          <span class="s">sleep 5;</span>
        <span class="s">done;</span>
      <span class="na">env</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">POD_NAME</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.name</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">POD_NAMESPACE</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.namespace</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">POD_IP</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">status.podIP</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">NODE_NAME</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">spec.nodeName</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">HOST_IP</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">status.hostIP</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">SERVICE_ACCOUNT_NAME</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">spec.serviceAccountName</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">POD_UID</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.uid</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">CPU_REQUEST</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">resourceFieldRef</span><span class="pi">:</span>
              <span class="na">containerName</span><span class="pi">:</span> <span class="s">demo-container</span>
              <span class="na">resource</span><span class="pi">:</span> <span class="s">requests.cpu</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">CPU_LIMIT</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">resourceFieldRef</span><span class="pi">:</span>
              <span class="na">containerName</span><span class="pi">:</span> <span class="s">demo-container</span>
              <span class="na">resource</span><span class="pi">:</span> <span class="s">limits.cpu</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">MEMORY_REQUEST</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">resourceFieldRef</span><span class="pi">:</span>
              <span class="na">containerName</span><span class="pi">:</span> <span class="s">demo-container</span>
              <span class="na">resource</span><span class="pi">:</span> <span class="s">requests.memory</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">MEMORY_LIMIT</span>
          <span class="na">valueFrom</span><span class="pi">:</span>
            <span class="na">resourceFieldRef</span><span class="pi">:</span>
              <span class="na">containerName</span><span class="pi">:</span> <span class="s">demo-container</span>
              <span class="na">resource</span><span class="pi">:</span> <span class="s">limits.memory</span>
      <span class="na">volumeMounts</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">downwardapi</span>
          <span class="na">mountPath</span><span class="pi">:</span> <span class="s">/etc/downwardapi</span>
      <span class="na">resources</span><span class="pi">:</span>
        <span class="na">requests</span><span class="pi">:</span>
          <span class="na">cpu</span><span class="pi">:</span> <span class="s">100m</span>
          <span class="na">memory</span><span class="pi">:</span> <span class="s">100Mi</span>
        <span class="na">limits</span><span class="pi">:</span>
          <span class="na">cpu</span><span class="pi">:</span> <span class="s">200m</span>
          <span class="na">memory</span><span class="pi">:</span> <span class="s">200Mi</span>
  <span class="na">volumes</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">downwardapi</span>
      <span class="na">downwardAPI</span><span class="pi">:</span>
        <span class="na">items</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">labels"</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.labels</span>
          <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">annotations"</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.annotations</span>
          <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">podName"</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
              <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.name</span>
          <span class="pi">-</span> <span class="na">path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Namespace"</span>
            <span class="na">fieldRef</span><span class="pi">:</span>
                <span class="na">fieldPath</span><span class="pi">:</span> <span class="s">metadata.namespace</span>
</code></pre></div></div>
<p>And let’s check out the example output:</p>

<pre><code class="language-SH">kubectl logs learning-about-downwardapi 
Pod Name:
learning-about-downwardapi

Namespace:
default

Pod IP:
10.42.0.112

Node Name:
ultron

Host IP:
172.17.66.235

Service Account Name:
default

UID:
afca6d22-a1df-4787-a947-bccf6b1015f8

CPU Request:
1

CPU Limit:
1

Memory Request:
104857600

Memory Limit:
209715200

Labels:
cluster="test-meadow"
zone="DE-Falkenstein"
Annotations:
app="downwardAPI"
kubectl.kubernetes.io/last-applied-configuration="{...}"
kubernetes.io/config.seen="2023-04-25T11:31:49.978886812Z"
kubernetes.io/config.source="api"
team="devops"
</code></pre>
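<p>One detail that tends to surprise people: the CPU request and limit both show up as <code class="language-plaintext highlighter-rouge">1</code> even though the container asked for 100m and 200m. Without an explicit divisor, <code class="language-plaintext highlighter-rouge">resourceFieldRef</code> values are rounded up to whole units (whole cores for CPU), which is also why memory is reported as an exact byte count. If you would rather see millicores, you can set a divisor. A minimal sketch (the variable name is just an example):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>        - name: CPU_REQUEST_MILLIS
          valueFrom:
            resourceFieldRef:
              containerName: demo-container
              resource: requests.cpu
              # report the value in units of 1m: 100m becomes "100"
              divisor: 1m
</code></pre></div></div>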

<p>There you have it: extended information about the pod, mounted directly into it. The Kubernetes Downward API provides a powerful and flexible way to expose pod metadata and container resources to the containers running within a pod. By making use of environment variables and volume mounts, developers can access essential information about the pod, such as its name, namespace, labels, and annotations, as well as the resource requests and limits of the container.</p>

<p>Exposing this information within containers has several practical applications. For instance, it enables applications to adjust their behavior according to the available resources or the environment they are running in. Additionally, it can facilitate the development of monitoring and logging tools that can access and report the pod’s metadata, making it easier to track and manage applications deployed in a Kubernetes cluster.</p>
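<p>As a concrete example of resource-aware behavior, a common pattern for Go workloads is to derive <code class="language-plaintext highlighter-rouge">GOMAXPROCS</code> from the container's CPU limit via the downward API (the container name here is a placeholder):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>      env:
        # Go reads GOMAXPROCS at startup; the default divisor rounds
        # the CPU limit up to whole cores, which is what we want here
        - name: GOMAXPROCS
          valueFrom:
            resourceFieldRef:
              containerName: app
              resource: limits.cpu
</code></pre></div></div>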

<p>Furthermore, it helps create self-aware applications that can dynamically adapt to their environment, improving overall system resilience and responsiveness. This feature can be particularly useful when dealing with auto-scaling, rolling updates, and other scenarios where applications must gracefully handle changes in their environment.</p>

<p>The Kubernetes Downward API can be an invaluable tool for developers and operators working with containerized applications in a cluster environment. By providing easy access to critical pod metadata and container resources, it allows for more robust, flexible, and adaptive applications that can better handle the ever-changing landscape of modern distributed systems.</p>]]></content><author><name>Johannes Hölzel</name></author><category term="kubernetes" /><category term="kubernetes" /><category term="kubernetes-pod" /><category term="kubernetes-downward-api" /><summary type="html"><![CDATA[In the realm of Kubernetes, it is sometimes important to expose certain Pod and container fields to containers running within the Pod. These approaches, collectively known as the downward API, enable developers to harness the power of Pod and container fields within their containers.]]></summary></entry></feed>