May 18, 2020

Kubernetes policies with Gatekeeper

Introduction

Gatekeeper is a validating webhook that enforces CRD-based policies executed by Open Policy Agent. In a previous post we went into detail about OPA; this post supersedes it. The differences between OPA and Gatekeeper are listed here.

In this post we will explore Gatekeeper, starting with a policy that requires a given label to be present on every namespace.

In upcoming posts we will implement the following policies, as described here:

  • Constraint enforcing resource limits on pods
  • Constraint enforcing dropping the NET_RAW capability
  • Constraint enforcing a read-only root file system
  • Constraint enforcing unprivileged pods
  • Constraint enforcing running containers as non-root

All examples as well as the installation of Gatekeeper are on GitHub.

Gatekeeper vs. Pod Security Policies

Pod security policies may have been the right answer for establishing better standards and defaults, but their future seems uncertain.

Install Gatekeeper

The installation of Gatekeeper is covered in this readme.

A simple policy

Create

The policy we will create ensures that all namespaces are created with the label application. The example can be found here. It is composed of:

  • a ConstraintTemplate templates/k8s_required_labels_template.yaml defining a CRD of kind K8sRequiredLabels, with a Rego rule checking whether the labels of the object the constraint is applied to contain all labels in the array passed as a parameter
  • a constraint constraints/all-ns-must-have-application-label.yaml of kind K8sRequiredLabels (defined in the ConstraintTemplate above), requiring that namespaces have the label application
  • a dry-run constraint constraints/all-ns-must-have-application-label_dryrun.yaml that lets you test the impact before enforcing anything

Create the ConstraintTemplate

kubectl -n gatekeeper-system apply -f example1/templates/k8s_required_labels_template.yaml
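For readers who do not have the repository at hand, a template along these lines could look as follows. This is a sketch modeled on the K8sRequiredLabels example from the Gatekeeper documentation, assuming the templates.gatekeeper.sh/v1beta1 API; the actual file in the repository may differ in details such as the metadata name.

```yaml
# Sketch of example1/templates/k8s_required_labels_template.yaml
# (assumption: modeled on the Gatekeeper docs, the actual file may differ)
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  # must be the lowercase form of the CRD kind below
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # schema for the constraint's `parameters` field
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          # label keys present on the reviewed object
          provided := {label | input.review.object.metadata.labels[label]}
          # label keys requested via the constraint's parameters
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
```

The sprintf format yields messages of the form you must provide labels: {"application"}, which is what the status and admission error output in this post report.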

Test the impact (dryrun)

In order to understand what impact a constraint will have on your workloads, Gatekeeper offers the setting enforcementAction: dryrun. As the name suggests, a constraint with this setting only reports the violations it finds, without denying any request.

Gatekeeper does not only check constraints when objects are created or updated: its audit process also periodically evaluates existing objects against them. To avoid surprises or side effects, you should therefore always start with a dry run to understand what effect the constraint may have on existing workloads.
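A dry-run constraint for this policy could look like the sketch below. It assumes the K8sRequiredLabels kind defined by the ConstraintTemplate and reuses the name all-ns-must-have-application-label from the status query that follows; the actual file in the repository may differ.

```yaml
# Sketch of example1/constraints/all-ns-must-have-application-label_dryrun.yaml
# (assumption: the actual file in the repository may differ)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: all-ns-must-have-application-label
spec:
  # report violations only, do not deny admission requests
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: [""]        # "" is the core API group, which contains Namespace
        kinds: ["Namespace"]
  parameters:
    labels: ["application"]
```

Removing the enforcementAction line (deny is the default) turns this into an enforcing constraint. If the enforcing file uses the same metadata.name, applying it later updates this object in place rather than creating a second constraint.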

Apply the dryrun constraint:

kubectl -n gatekeeper-system apply -f example1/constraints/all-ns-must-have-application-label_dryrun.yaml

and then check its status:

kubectl -n gatekeeper-system get K8sRequiredLabels all-ns-must-have-application-label -o yaml

In the status section of the YAML you will see something like:

apiVersion: constraints.gatekeeper.sh/v1beta1
[...]
status:
  auditTimestamp: "2020-05-18T10:53:42Z"
  byPod:
  - enforced: true
    id: gatekeeper-controller-manager-84c78cfb7f-sl68s
    observedGeneration: 1
  totalViolations: 12
  violations:
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"application"}'
    name: default
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"application"}'
    name: kube-public
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"application"}'
    name: kube-node-lease
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"application"}'
    name: gatekeeper-system

Since only namespaces are affected, the side effects are limited. However, Gatekeeper’s audit will still check these namespaces and report the violations over and over.

To avoid this you should either:

  • Fix the affected namespaces
  • Create exceptions in the excludedNamespaces section of the constraint’s definition.
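As a sketch of the second option, the match section of the constraint could exclude the system namespaces reported above. excludedNamespaces is a field of spec.match in Gatekeeper constraints; which namespaces to exclude is a judgment call, and the list below is only an example.

```yaml
# Fragment of a K8sRequiredLabels constraint: only spec.match is shown
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
    # namespaces listed here are not evaluated against this constraint
    excludedNamespaces:
      - kube-system
      - kube-public
      - kube-node-lease
      - gatekeeper-system
```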

Apply the constraint

Once you are sure of the impact and have fixed any foreseeable side effects, you can apply the “real” constraint:

kubectl -n gatekeeper-system apply -f example1/constraints/all-ns-must-have-application-label.yaml

Test

To test the policy we have two namespace definitions:

  • example1/resources/bad-ns.yaml is missing the label application
  • example1/resources/good-ns.yaml has the label application
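The two files could look roughly like this. Only the name good-prod-ns is known from the output shown for good-ns.yaml; the name of the bad namespace and the label value are hypothetical. Assuming the template only compares label keys, any value of application satisfies the constraint.

```yaml
# Sketch of example1/resources/good-ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: good-prod-ns
  labels:
    application: my-app    # hypothetical value
---
# Sketch of example1/resources/bad-ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: bad-prod-ns        # hypothetical name
  # no "application" label, so the admission request is denied
```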

kubectl apply -f example1/resources/bad-ns.yaml returns the following:

Error from server ([denied by all-ns-must-have-application-label] you must provide labels: {"application"}): error when creating "example1/resources/bad-ns.yaml": admission webhook "validation.gatekeeper.sh" denied the request: [denied by all-ns-must-have-application-label] you must provide labels: {"application"}

kubectl apply -f example1/resources/good-ns.yaml returns:

namespace/good-prod-ns created

Roll-out strategy

It is important to have a strategy for rolling out Gatekeeper. It is crucial to find a way of increasing the security level of your cluster without breaking the developer experience, as failing to do so will hurt acceptance. Assuming that you already have a cluster with workloads in place and plan to introduce Gatekeeper, I suggest the following approach:

  1. Identify all namespaces that should be excluded from any policy checking permanently (e.g. kube-system), and add these to your constraints as excludedNamespaces.
  2. As a first step, create your constraints to support opt-in. This can be achieved by defining constraints that only apply if the namespace is labeled accordingly, e.g. gklimits: enabled to enable the constraint checking that pods have resource limits in place.
  3. Always deploy your constraints with enforcementAction: dryrun first, and monitor their status carefully. Once you feel confident, remove dryrun to let Gatekeeper enforce the constraints, and keep monitoring their status.
  4. Pick teams who are mature enough to understand the benefits of policy by design and plan the roll-out with them: they will appreciate the additional level of security, but also understand that they may face impediments related to their early-adopter role.
  5. Track adoption by querying namespaces for the gk*: enabled labels.
  6. Once the early adopters have proven that your policies can scale, communicate a transition phase to all teams. The tracking will help you identify teams that are lagging behind.
  7. Put an exception system in place, and communicate to the teams that in case of non-compliance they can request an exception, but will need to provide good arguments why they cannot comply with the policies.
  8. Change the constraints from whitelisting to blacklisting, e.g. by changing the namespace selector to match gk* NotIn [disabledbyexception].
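Steps 2 and 8 can be expressed with a namespaceSelector in the constraint's match section. The sketch below shows only the relevant fragment for a hypothetical resource-limits constraint on pods; the label key gklimits and the value disabledbyexception are the examples from the list above, not Gatekeeper conventions.

```yaml
# Step 2 (opt-in): only pods in namespaces labeled gklimits=enabled are checked
spec:
  enforcementAction: dryrun              # step 3: start in dry-run mode
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system"]  # step 1: permanent exclusions
    namespaceSelector:
      matchLabels:
        gklimits: enabled
---
# Step 8 (opt-out): every namespace is checked unless it was granted an exception
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system"]
    namespaceSelector:
      matchExpressions:
        # NotIn also matches namespaces that carry no gklimits label at all
        - key: gklimits
          operator: NotIn
          values: ["disabledbyexception"]
```

With labels like these, tracking adoption (step 5) comes down to a label query such as kubectl get namespaces -l gklimits=enabled.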

Conclusion

Gatekeeper offers a way to formalize rules declaratively using the Rego language. The language certainly comes with a steep learning curve, but the model for declaring and deploying policies is very flexible.

Rolling out policies also requires caution, testing and monitoring in order to avoid acceptance issues and to make sure that the increased level of security and manageability of your cluster is perceived as a benefit.

Further reading

Content licensed under CC BY 4.0