This article was originally published on Heptio's blog by Joe Beda. With their kind permission, we’re sharing it here for Codeship readers.
This is the sixth part in a multi-part series that examines multiple angles of how to think about and apply “cloud native” thinking.
Note: this post doesn’t cover all of the angles around security in the new “cloud native” world. Also, while I’m not a security expert, it is something that I’ve paid attention to throughout my career. Treat this as one part of a map of things to consider.
Security is still a big question in the cloud native world. Old techniques don’t apply cleanly and so, initially, cloud native may appear to be a step backward. But this brave new world also introduces opportunities.
Container Image Security
There are quite a few tools that help users audit their container images to ensure that they are fully patched. I don’t have a strong opinion on the various options there.
The real problem: what do you do once you find a vulnerable container image? This is a place where the market hasn’t provided a great set of solutions.
Once a vulnerable image is found, the problem shifts from a technical issue to a process and workflow issue.
You will want to identify which groups within your organization are impacted, where in your container image “tree” to fix the problem, and how best to test and push out a new patched version.
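To make the image “tree” idea concrete, here is a minimal sketch, assuming you can record which base image each image was built from. The image names and the `PARENT` mapping are hypothetical; a real registry or build system would have to supply this lineage.

```python
from collections import defaultdict, deque

# Hypothetical lineage: each image maps to the base image it was built from.
PARENT = {
    "org/base-os:1": None,
    "org/python-runtime:3": "org/base-os:1",
    "org/java-runtime:8": "org/base-os:1",
    "team-a/api:42": "org/python-runtime:3",
    "team-b/worker:7": "org/python-runtime:3",
    "team-c/billing:3": "org/java-runtime:8",
}


def rebuild_order(parent, patched_image):
    """List every image built on top of `patched_image`, parents before children."""
    children = defaultdict(list)
    for image, base in parent.items():
        if base is not None:
            children[base].append(image)

    order = []
    queue = deque(sorted(children[patched_image]))
    while queue:  # breadth-first walk down the tree
        image = queue.popleft()
        order.append(image)
        queue.extend(sorted(children[image]))
    return order


# Patching the shared Python runtime only affects the two images built on it.
print(rebuild_order(PARENT, "org/python-runtime:3"))
```

Fixing the problem higher in the tree (for example in `org/base-os:1`) repairs more images with one patch, but it also means more rebuilds and more teams to coordinate with.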
CI/CD (continuous integration/continuous deployment) is a critical piece of the puzzle as it will enable automated and quick release processes for the new images. Furthermore, integration with orchestration systems will enable you to identify which users are using which vulnerable images.
It will also allow you to verify that a new fixed version is actually being run in production. Finally, policy in your deployment system can help prevent new containers from being launched with a known bad image. (In the Kubernetes world, this policy is called admission.)
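As a rough illustration of those two ideas (finding who is affected, and refusing to launch known bad images), here is a small sketch. It is not the Kubernetes admission API; the `Workload` record, the `KNOWN_BAD` set, and both functions are hypothetical names, and the workload inventory is assumed to come from your orchestrator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    owner: str
    image: str


# Hypothetical list of images flagged by an image scanner.
KNOWN_BAD = {"team-a/api:41", "org/python-runtime:2"}


def impacted_owners(workloads, known_bad):
    """Group running workloads that use a known-bad image by owning team."""
    impacted = {}
    for w in workloads:
        if w.image in known_bad:
            impacted.setdefault(w.owner, []).append(w.name)
    return impacted


def admit(image, known_bad):
    """Decide whether a new container may start; returns (allowed, reason)."""
    if image in known_bad:
        return False, f"image {image} is on the known-bad list"
    return True, "ok"


running = [
    Workload("api-1", "team-a", "team-a/api:41"),
    Workload("api-2", "team-a", "team-a/api:42"),
    Workload("worker-1", "team-b", "team-b/worker:7"),
]
print(impacted_owners(running, KNOWN_BAD))
print(admit("team-a/api:41", KNOWN_BAD))
```

Running the same inventory query after a rollout is also one way to confirm that the fixed version is what is actually in production.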
Microservice and Network Security
But even if everything running on your cluster is patched, that doesn’t guarantee your network is free of untrusted activity.
Traditional network-based security tools don’t work well in a world of dynamically scheduled, short-lived containers. Short-lived containers may not be around long enough to be scanned by traditional scanning tools. And by the time a report is generated, the container in question may be gone.
With dynamic orchestrators, IPs don’t have long-term meaning and can be reused automatically. The solution is to integrate network analysis tools with the orchestrator so that logical names (and other metadata) can be used in addition to raw IP addresses. This will likely make alerts more easily actionable.
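Here is a minimal sketch of that kind of integration, assuming the orchestrator can export a history of which workload held which IP and when. The `Assignment` record and the alert fields (`src_ip`, `time`) are hypothetical, not any particular tool's format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    ip: str
    workload: str
    start: float          # time the workload received the IP
    end: Optional[float]  # None means the workload still holds the IP


def resolve(assignments, ip, at):
    """Return the workload that held `ip` at time `at`, or None if unknown."""
    for a in assignments:
        if a.ip == ip and a.start <= at and (a.end is None or at < a.end):
            return a.workload
    return None


def enrich_alert(alert, assignments):
    """Attach a logical workload name to a raw network alert."""
    workload = resolve(assignments, alert["src_ip"], alert["time"])
    return {**alert, "src_workload": workload or "unknown"}


# The same IP was reused by two different workloads over time.
history = [
    Assignment("10.0.0.5", "frontend", start=0, end=100),
    Assignment("10.0.0.5", "batch-job", start=100, end=None),
]
print(enrich_alert({"src_ip": "10.0.0.5", "time": 50}, history))
print(enrich_alert({"src_ip": "10.0.0.5", "time": 150}, history))
```

The lookup is keyed on both IP and time because the IP alone is ambiguous once addresses are reused.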
Many of the networking technologies leverage encapsulation to implement an “IP per container.” This can create issues for network tracing and inspection tools. They will have to be adapted if such networking systems are deployed in production. Luckily, much of this has standardized on VXLAN, VLANs, or no encapsulation/virtualization so support can be leveraged across many such systems.
However, in my opinion, the biggest issues are around microservices.
When there are many services running in production, it is necessary to ensure that only authorized clients are calling any particular service. Furthermore, with reuse of IPs, clients need to know that they are speaking with the correct service. As of now, this is largely an unsolved problem. There are two (non-mutually exclusive) ways to approach this problem.
First, you can use the more flexible networking systems, along with the opportunity to implement host-level firewall rules (outside any container), to enable fine-grained access policies for which containers can call which other containers. I’ve been calling this approach network micro-segmentation.
The challenge here is one of configuring such policy in the face of dynamic scheduling. While it is still early, there are multiple companies working to make this easier through support in the network, coordination with the orchestrator, and higher-level application definitions.
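The sketch below illustrates why dynamic scheduling makes this hard. Policy is written against logical service names, but firewall rules operate on IPs, so the rules have to be recompiled whenever the orchestrator moves a container. The `POLICY` table, the `placements` mapping, and both functions are hypothetical; no real firewall or orchestrator API is used.

```python
# Policy by logical name: callee -> set of services allowed to call it.
POLICY = {
    "billing": {"checkout"},
    "checkout": {"frontend"},
}


def compile_rules(policy, placements):
    """Expand name-based policy into allowed (src_ip, dst_ip) pairs.

    `placements` maps each service name to the set of IPs its containers
    currently hold, as reported by the orchestrator.
    """
    rules = set()
    for callee, callers in policy.items():
        for caller in callers:
            for src in placements.get(caller, ()):
                for dst in placements.get(callee, ()):
                    rules.add((src, dst))
    return rules


def is_allowed(rules, src_ip, dst_ip):
    """Default deny: only explicitly compiled pairs may communicate."""
    return (src_ip, dst_ip) in rules


placements = {
    "frontend": {"10.0.0.1"},
    "checkout": {"10.0.0.2"},
    "billing": {"10.0.0.3"},
}
rules = compile_rules(POLICY, placements)
print(is_allowed(rules, "10.0.0.2", "10.0.0.3"))  # checkout -> billing
print(is_allowed(rules, "10.0.0.1", "10.0.0.3"))  # frontend -> billing
```

If `billing` is rescheduled onto a new IP, the old rules are stale until `compile_rules` runs again with fresh placements, which is the coordination problem described above.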
One big caveat: micro-segmentation becomes less effective the more widely any specific service is used. If a service has hundreds of callers, simple “access implies authorization” models are no longer effective.
The second approach is for applications to play a larger role in implementing authentication and encryption inside the datacenter. This approach continues to work as services take on many clients and become “soft multi-tenant” inside a large organization. It requires a system of identity for production services.
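To show the shape of application-level authentication, here is a deliberately simplified sketch in which a caller presents a signed statement of its service identity and the receiving service checks it before authorizing the call. It uses a shared-secret HMAC purely as a stand-in; that is an assumption made for brevity, a production identity system would more likely rest on certificates, and all function names here are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, service: str, expires: int) -> str:
    """Issue a token asserting `service` as the caller until `expires`."""
    payload = f"{service}|{expires}"
    mac = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{mac}"


def verify(key: bytes, token: str, now: int):
    """Return the caller's service name if the token is valid, else None."""
    try:
        service, expires, mac = token.rsplit("|", 2)
        expires_at = int(expires)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(key, f"{service}|{expires}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        return None  # forged or tampered token
    if now >= expires_at:
        return None  # expired token
    return service


def authorize(caller, allowed_callers):
    """Authorization is a separate step from authentication."""
    return caller is not None and caller in allowed_callers


key = b"demo-only-secret"  # assumption: distributed out of band
token = sign(key, "checkout", expires=1000)
caller = verify(key, token, now=500)
print(caller, authorize(caller, {"checkout"}))
```

Unlike the network-level approach, the check here does not depend on the caller's IP, so it still holds when addresses are reused and when a service has hundreds of callers.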
As a side project, I’ve started SPIFFE (Secure Production Identity Framework For Everyone). The ideas behind it are proven inside companies such as Google but haven’t been widely deployed elsewhere.
Security is a deep topic and I’m sure that there are threats and considerations not listed here. This will have to be an ongoing discussion.