Introduction
Gatekeeper is a validating webhook that enforces CRD-based policies executed by Open Policy Agent. In a previous post, we went into detail about OPA; this post supersedes it. The differences between OPA and Gatekeeper are listed here.
In this post we will explore Gatekeeper and start by implementing a policy that requires a given label to be present at the namespace level.
In upcoming posts we will implement further policies, as described here:
- Constraint enforcing pods to have resource limits
- Constraint enforcing dropping NET_RAW
- Constraint enforcing read-only file system
- Constraint enforcing unprivileged pods
- Constraint enforcing running containers as non-root
All examples, as well as the installation of Gatekeeper, are on GitHub.
Gatekeeper vs. Pod Security Policies
Pod security policies may have been the right answer for establishing better standards and defaults, but their future seems uncertain.
Install Gatekeeper
The installation of Gatekeeper is covered in this readme.
A simple policy
Create
The policy we will create ensures that all namespaces are created with a label application. The example can be found here. It is composed of:
- a ConstraintTemplate templates/k8s_required_labels_template.yaml defining a CRD of kind K8sRequiredLabels, with a rego rule checking whether the labels of the object the constraint is applied to contain the array of labels passed as a parameter
- a constraint constraints/all-ns-must-have-application-label.yaml of kind K8sRequiredLabels (defined in the ConstraintTemplate above), requiring that namespaces have a label application
- a dryrun constraint constraints/all-ns-must-have-application-label_dryrun.yaml to allow you to test the impact
Create the ConstraintTemplate
kubectl -n gatekeeper-system apply -f example1/templates/k8s_required_labels_template.yaml
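For orientation, here is a minimal sketch of what such a ConstraintTemplate could look like. It is modeled on the well-known required-labels example from the Gatekeeper project and is an assumption, not a copy of the file in the repository, which may differ in its details.

```yaml
# Sketch only: assumed content, not the repository's actual template.
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels          # lowercase form of the kind defined below
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels    # the kind that constraints will use
      validation:
        # Schema of the `parameters` field of a constraint
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          # Label keys present on the object under review
          provided := {label | input.review.object.metadata.labels[label]}
          # Label keys required by the constraint's parameters
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
```

The rule produces a violation whenever at least one required label key is missing, which is consistent with the message format shown in the outputs later in this post.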
Test the impact (dryrun)
In order to understand what impact a constraint will have on your workloads, Gatekeeper offers the action enforcementAction: dryrun. As the name suggests, applying this constraint will only show you the impact it will have, but not enforce it.
Due to how Gatekeeper works, constraints are checked not only when objects are created or changed, but also at regular intervals against existing objects. To avoid surprises or side effects, you should always do a dry run first to understand what effect the constraint may have on existing workloads.
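As an illustration, a dryrun constraint for the namespace label policy might look roughly like the sketch below. The name and kind are taken from the commands in this post; the remaining fields are assumptions about how the file in the repository is written.

```yaml
# Sketch only: assumed content of a dryrun constraint.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: all-ns-must-have-application-label
spec:
  # Report violations in the status, but do not deny requests
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: [""]            # core API group
        kinds: ["Namespace"]
  parameters:
    labels: ["application"]        # label keys that must be present
```

Removing the enforcementAction line (or setting it to deny, the default) turns the same definition into an enforcing constraint.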
Apply the dryrun constraint:
kubectl -n gatekeeper-system apply -f example1/constraints/all-ns-must-have-application-label_dryrun.yaml
and then check its status:
kubectl -n gatekeeper-system get K8sRequiredLabels all-ns-must-have-application-label -o yaml
In the status section of the yaml you will see something like:
apiVersion: constraints.gatekeeper.sh/v1beta1
[...]
status:
  auditTimestamp: "2020-05-18T10:53:42Z"
  byPod:
  - enforced: true
    id: gatekeeper-controller-manager-84c78cfb7f-sl68s
    observedGeneration: 1
  totalViolations: 12
  violations:
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"application"}'
    name: default
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"application"}'
    name: kube-public
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"application"}'
    name: kube-node-lease
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"application"}'
    name: gatekeeper-system
As only namespaces are impacted, the side effects are limited. Gatekeeper will nevertheless keep checking and reporting these violations over and over.
To avoid this you should either:
- Fix the affected namespaces
- Create exceptions in the section excludedNamespaces of the constraint’s definition.
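A sketch of the second option is shown below. Only the spec of the constraint is shown, and the list of namespaces is an assumption based on the violations reported above; whether excludedNamespaces is available, and how it behaves for Namespace objects themselves, depends on your Gatekeeper version, so verify it with a dry run.

```yaml
# Sketch only: fragment of a constraint's spec with assumed exclusions.
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
    # Namespaces that the constraint should never be evaluated against
    excludedNamespaces:
      - default
      - kube-public
      - kube-node-lease
      - gatekeeper-system
  parameters:
    labels: ["application"]
```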
Apply the constraint
Once you are sure of the impact and have fixed any foreseeable side-effects you can apply the “real” constraint:
kubectl -n gatekeeper-system apply -f example1/constraints/all-ns-must-have-application-label.yaml
Test
To test the policy we have two namespace definitions:
- example1/resources/bad-ns.yaml is missing the label application
- example1/resources/good-ns.yaml has the label application
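The two definitions could look roughly like this. The name good-prod-ns is taken from the output further down; the name bad-ns and the label value my-app are hypothetical placeholders, since the policy only checks that the label key exists.

```yaml
# Sketch only: assumed content of the two test namespaces.
# bad-ns.yaml: no "application" label, expected to be denied
apiVersion: v1
kind: Namespace
metadata:
  name: bad-ns                     # hypothetical name
---
# good-ns.yaml: carries the "application" label, expected to be admitted
apiVersion: v1
kind: Namespace
metadata:
  name: good-prod-ns
  labels:
    application: my-app            # hypothetical value
```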
kubectl apply -f example1/resources/bad-ns.yaml
returns the following:
Error from server ([denied by all-ns-must-have-application-label] you must provide labels: {"application"}): error when creating "example1/resources/bad-ns.yaml": admission webhook "validation.gatekeeper.sh" denied the request: [denied by all-ns-must-have-application-label] you must provide labels: {"application"}
kubectl apply -f example1/resources/good-ns.yaml
returns:
namespace/good-prod-ns created
Roll-out strategy
It is important to have a strategy for rolling out Gatekeeper. The goal is to raise the security level of your cluster without breaking the developer experience, as failing to do so will hurt acceptance. Assuming that you already have a cluster with workloads in place and plan to introduce Gatekeeper, I suggest the following approach:
- Identify all namespaces that should be excluded from any policy checking permanently (e.g. kube-system), and add these to your constraints as excludedNamespaces.
- In a first step, create your constraints to support opt-in. This can be achieved by defining constraints that are only applicable if the namespace is labeled accordingly, e.g. gklimits: enabled for enabling the constraint checking that pods have resource limits in place.
- Always deploy your constraints with enforcementAction: dryrun as a first step, and monitor their status carefully. Once you feel confident, remove dryrun and let Gatekeeper enforce the constraints, and keep monitoring the status.
- Pick teams who are mature enough to understand the benefits of policy by design and plan the roll-out with them: they will appreciate the additional level of security, but also understand that there will potentially be impediments related to their early-adopter role.
- Track the adoption by querying for the gk*: enabled namespace labels.
- Once the early adopters have proven that your policies can scale, communicate a transition phase to all teams. The tracking will help you identify teams who are behind.
- Put an exception system in place, and communicate to the teams that in case of non-compliance they can request an exception, but will need to provide good arguments why they cannot comply with the policies.
- Change the constraints from whitelisting to blacklisting, e.g. by changing the namespace selector to match on gk* NotIn [disabledbyexception].
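The opt-in and opt-out phases described above can be sketched as two variants of a constraint's match section. The label key gklimits comes from the example above; the use of namespaceSelector with these values is an assumption about how you would wire it up, and depending on your Gatekeeper version namespaces may need to be synced for the selector to work.

```yaml
# Sketch only: two alternative match fragments for the same constraint.
# Phase 1 (opt-in): only namespaces labeled gklimits=enabled are checked
spec:
  match:
    namespaceSelector:
      matchLabels:
        gklimits: enabled
---
# Phase 2 (opt-out): every namespace is checked unless it has been
# granted an exception via gklimits=disabledbyexception
spec:
  match:
    namespaceSelector:
      matchExpressions:
        - key: gklimits
          operator: NotIn
          values: ["disabledbyexception"]
```

With standard Kubernetes label selector semantics, NotIn also matches namespaces that do not carry the label at all, which is what makes the second variant a default-on policy.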
Conclusion
Gatekeeper lets you formalize rules declaratively using the Rego language. The language certainly comes with a steep learning curve, but the model for declaring and deploying policies is very flexible.
Rolling out policies also requires caution, testing and monitoring, both to avoid acceptance issues and to make sure that the increased security or manageability of your cluster is perceived as a benefit.