When Your Kubernetes API Starts Failing: A Lesson in Control Plane Exposure

Recently, I ran into a cluster issue that, at first glance, looked like a fairly typical internal problem: intermittent Kubernetes API errors, with timeouts, failed requests, and general instability.

The kind of issue that usually sends you digging into:

  • resource pressure
  • misbehaving workloads
  • logging or monitoring pipelines

It turned out to be none of those.


The Symptoms

The cluster began showing signs of API instability:

  • Intermittent request failures
  • Increased latency from the Kubernetes API
  • Errors appearing across multiple components

At first, the suspicion fell on internal services. But nothing obvious stood out:

  • Resource usage was within expected bounds
  • No clear spike in workload activity
  • No obvious misconfiguration
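When the internal signals come up empty, one thing worth checking is the API server's own error breakdown. The API server exports an apiserver_request_total metric labelled by HTTP status code, and a large share of 401/403/429 responses points at outside callers rather than your own workloads. The snippet below is a sketch that parses a hard-coded sample so it is self-contained; on a real cluster the text would come from kubectl get --raw /metrics, and the numbers here are invented:

```python
import re

# Hypothetical sample of API server metrics; on a real cluster this text
# would come from: kubectl get --raw /metrics
metrics = """\
apiserver_request_total{code="200",resource="pods",verb="GET"} 52310
apiserver_request_total{code="401",resource="pods",verb="GET"} 8117
apiserver_request_total{code="403",resource="secrets",verb="LIST"} 964
apiserver_request_total{code="429",resource="pods",verb="LIST"} 1502
"""

# Tally request counts per HTTP status code.
counts: dict[str, int] = {}
for line in metrics.splitlines():
    m = re.match(r'apiserver_request_total\{.*?code="(\d+)".*?\}\s+(\d+)', line)
    if m:
        counts[m.group(1)] = counts.get(m.group(1), 0) + int(m.group(2))

# A high share of 401/403/429 responses suggests unauthenticated or
# throttled external callers rather than a misbehaving internal workload.
suspicious = sum(v for c, v in counts.items() if c in ("401", "403", "429"))
total = sum(counts.values())
print(f"suspicious share: {suspicious / total:.1%}")
```

A healthy cluster is dominated by 2xx responses from known service accounts; a wall of 401s is a different kind of problem.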

The Turning Point

After raising the issue with the provider, they reported:

Unauthorised activity detected against the Kubernetes control plane.

That immediately reframed the problem.

This wasn’t an internal failure — it was external pressure on the API server.


The Root Cause

The Kubernetes API endpoint was more exposed than it should have been.

That meant:

  • External actors could reach the control plane
  • Requests (malicious or not) were hitting the API server
  • The control plane was under unnecessary load

Even without successful authentication, this can:

  • Increase latency
  • Trigger rate limiting
  • Cause intermittent failures for legitimate traffic

In short: your cluster can degrade even if no one actually gets access.
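To see why, here is a toy token-bucket simulation. This is my own sketch, not how the API server is actually implemented (Kubernetes uses its API Priority and Fairness mechanism), but the effect is similar: once junk traffic exhausts a shared budget, legitimate requests start failing intermittently even though nothing inside the cluster changed.

```python
class TokenBucket:
    """Toy rate limiter: refills `rate` tokens/second, capped at `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, at: float) -> bool:
        # Refill for the elapsed time, then try to spend one token.
        self.tokens = min(self.burst, self.tokens + (at - self.last) * self.rate)
        self.last = at
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Legitimate traffic alone: 5 requests/second against a 10/second limit.
bucket = TokenBucket(rate=10, burst=10)
alone_ok = sum(bucket.allow(t / 5) for t in range(50))

# The same 5 req/s mixed with 100 junk requests/second from a scanner:
# the bucket drains and legitimate requests begin failing intermittently.
bucket = TokenBucket(rate=10, burst=10)
loaded_ok = 0
for tick in range(1000):            # 10 seconds, one junk request per 10 ms
    t = tick / 100
    bucket.allow(t)                 # unauthenticated junk request
    if tick % 20 == 0:              # every 200 ms, a legitimate request
        loaded_ok += bucket.allow(t)

print(f"alone: {alone_ok}/50 allowed, under load: {loaded_ok}/50 allowed")
```

The legitimate caller never changed its behaviour; it simply lost a contended budget to traffic that was never going to authenticate anyway.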


The Fix

The solution was straightforward:

Restrict access to the Kubernetes API using IP-based ACLs.

Only trusted sources were allowed:

  • Admin networks
  • VPN endpoints
  • Known automation systems
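The check a firewall or cloud ACL performs is easy to picture. Below is a minimal sketch using Python's standard ipaddress module and made-up CIDR ranges; on managed platforms this is normally a provider setting (for example, authorized networks on GKE or authorized IP ranges on AKS) rather than code you write:

```python
from ipaddress import ip_address, ip_network

# Hypothetical trusted sources: admin network, VPN range, automation host.
ALLOWED = [
    ip_network("10.0.0.0/8"),        # internal admin networks
    ip_network("198.51.100.0/24"),   # VPN egress range (documentation prefix)
    ip_network("203.0.113.7/32"),    # known automation host (documentation prefix)
]

def is_allowed(source_ip: str) -> bool:
    """Return True if the client IP falls inside any trusted range."""
    addr = ip_address(source_ip)
    return any(addr in net for net in ALLOWED)

print(is_allowed("10.2.3.4"))      # internal admin -> True
print(is_allowed("203.0.113.7"))   # automation host -> True
print(is_allowed("45.67.89.10"))   # internet scanner -> False
```

Everything outside the list is dropped before it ever reaches the API server, which is the whole point.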

As soon as this was implemented:

  • API errors stopped
  • Latency returned to normal
  • Cluster stability was restored

Why This Matters

Kubernetes makes it easy to expose the API server, especially in managed environments.

But “accessible” doesn’t mean “safe”.

If your control plane is reachable from the internet, you are:

  • Increasing your attack surface
  • Allowing unnecessary traffic to hit critical components
  • Relying entirely on authentication as your first line of defence

Key Takeaways

Lock Down the Control Plane

The Kubernetes API should not be broadly accessible unless absolutely required.

Use:

  • IP allow lists (ACLs)
  • Private endpoints
  • VPN or bastion access

Don’t Assume Internal Causes

API instability doesn’t always originate inside the cluster.

Always consider:

  • External traffic
  • Probing or scanning
  • Exposure misconfiguration

Authentication Isn’t Enough

Even failed requests consume resources: the API server still has to accept the connection, terminate TLS, and reject the call.

Blocking traffic at the network layer stops that work before it starts, which none of the following can do on their own:

  • RBAC
  • Tokens
  • Authentication layers

Monitor Control Plane Access

Where possible, enable:

  • API audit logs
  • Request rate monitoring
  • Connection metrics

These provide early warning of unusual behaviour.
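As a sketch of what request-rate monitoring can surface, the snippet below tallies 4xx responses per caller from a few fabricated audit-log lines. The sourceIPs and responseStatus fields are real parts of a Kubernetes audit event, but the data here is invented:

```python
import json
from collections import Counter

# Fabricated audit-log lines in the shape Kubernetes audit events use
# (each event records the caller's source IPs and the response code).
log_lines = [
    '{"sourceIPs": ["10.0.4.2"], "responseStatus": {"code": 200}}',
    '{"sourceIPs": ["198.51.100.23"], "responseStatus": {"code": 401}}',
    '{"sourceIPs": ["198.51.100.23"], "responseStatus": {"code": 401}}',
    '{"sourceIPs": ["198.51.100.23"], "responseStatus": {"code": 429}}',
    '{"sourceIPs": ["10.0.4.7"], "responseStatus": {"code": 200}}',
]

# Count failed (4xx) requests per source address.
failures = Counter()
for line in log_lines:
    event = json.loads(line)
    if 400 <= event["responseStatus"]["code"] < 500:
        failures[event["sourceIPs"][0]] += 1

# One unfamiliar address producing a pile of 401s is exactly the kind of
# early warning worth alerting on.
for ip, n in failures.most_common():
    print(ip, n)
```

In practice you would feed this from your audit backend or log pipeline rather than a hard-coded list, and alert when an unknown source crosses a threshold.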


A Simple Mental Model

Think of the Kubernetes API like SSH on a server.

You wouldn’t leave SSH open to the internet without restrictions—even with strong authentication.

The same principle applies here.


Final Thoughts

This issue was a useful reminder:

Not all incidents originate from within your cluster.

Sometimes, the problem is simply that your control plane is too easy to reach.

Restricting access is low effort, high impact, and immediately effective.

If you haven’t reviewed your Kubernetes API exposure recently, now is a good time.