In Securing the Cloud: Part 1, we looked at the ways in which developers at CloudBees manage credentials. In today's post, we'll look at how we manage security around remote login and remote development.
As in the previous post, because a large portion of our infrastructure runs in the Amazon Web Services (AWS) environment, this post focuses specifically on that platform.
Remote Server Login
One major advantage of using the CloudBees Platform as a Service (PaaS) is that you no longer have to manage servers. On our platform, developers develop, deploy and scale applications with minimal server interaction.
Behind the scenes, however, CloudBees engineers do need to manage the server lifecycle. That applies to the instances that run customer code, and also to web proxying layers, databases, Git/SVN repositories and many other administrative systems. In the previous post, we discussed the credentials that allow developers to see, and perhaps manage, the lifecycle of these servers. We also need to control who can log in remotely to these machines to perform maintenance or fix problems as they occur. Finally, we need to limit traffic from the outside world so that applications keep working while malicious attempts to break into the systems are blocked.
Locked Down Access
Our first strategy is to make extensive use of EC2 security groups and rules. Each of our instances serves a particular role, so each is tied to a security group that reflects that role. Our application servers, our proxying layer and our databases each have separate EC2 security groups attached to them. On the DEV@cloud side, our Jenkins controller instances, the executor machines and the proxying layer also have their own EC2 security groups.
Within these security groups, we restrict outside traffic to only the ports needed. We also limit internal traffic between security groups to the places where components need to "talk" to each other. For example, our web proxying layer accepts outside traffic on ports 80 and 443, and nothing else. Our application servers accept no outside traffic at all. They only accept connections on specific ports coming from the web proxying layer. This tiered, locked-down approach ensures that attackers looking for a backdoor into our environment find none.
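The tiered rules above can be modeled in a few lines. The following Python sketch is a simplified, hypothetical model. It is not our actual configuration or the EC2 API. The group names `web-proxy` and `app-server` and the application port 8080 are assumptions made for illustration. As with EC2 security groups, rules only grant access, so any connection that matches no rule is denied.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One ingress rule: a port plus either a CIDR range or a source group."""
    port: int
    cidr: Optional[str] = None          # e.g. "0.0.0.0/0" for the whole internet
    source_group: Optional[str] = None  # traffic from members of another group


# Hypothetical groups mirroring the tiers described above.
SECURITY_GROUPS = {
    # Proxy tier: only HTTP and HTTPS, from anywhere.
    "web-proxy": [Rule(80, cidr="0.0.0.0/0"), Rule(443, cidr="0.0.0.0/0")],
    # App tier: no outside traffic; one port, reachable only from the proxy tier.
    "app-server": [Rule(8080, source_group="web-proxy")],
}


def is_allowed(dest_group: str, port: int, src_ip: str,
               src_group: Optional[str] = None) -> bool:
    """Return True if any ingress rule on dest_group admits the connection.

    Rules only grant access; a connection that matches no rule is denied.
    """
    for rule in SECURITY_GROUPS.get(dest_group, []):
        if rule.port != port:
            continue
        if rule.cidr is not None and ip_address(src_ip) in ip_network(rule.cidr):
            return True
        if rule.source_group is not None and rule.source_group == src_group:
            return True
    return False
```

For instance, `is_allowed("app-server", 8080, "198.51.100.20")` returns `False` because the caller is not in the proxy group. The same call with `src_group="web-proxy"` returns `True`.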
Of course, we still DO need backdoors into the systems so that our own team can get in and perform administrative tasks. Most commonly this means remote login (SSH) to a server. It also includes access to backend web interfaces for monitoring application health or observing application metrics while solving issues.
To keep these backdoors as secure as possible, we hide them all behind a Virtual Private Network (VPN) that is accessible only to CloudBees developers. We use OpenVPN, a userspace SSL VPN that tunnels traffic over UDP. Each developer who needs access is given a private key for the VPN. Once connected to the VPN, the developer can reach the ports needed to get into the systems.
Note that this doesn't mean they automatically have access INTO the systems. It only means they have access to the mechanisms for getting into them. Case in point: once on the VPN, developers can reach port 22 (SSH) on our various machines. However, they still don't necessarily have the keys needed to log in to those systems. SSH keys are handled by a separate credentialing and distribution mechanism, on an as-needed basis.
This two-layer approach gives us a high level of security while keeping things usable for our development team.
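The two layers can be expressed as two independent checks that must both pass. The Python sketch below is hypothetical. The VPN subnet `10.8.0.0/24` is an assumption borrowed from OpenVPN's sample server configuration. The host names and key fingerprints are placeholders, not real data.

```python
from ipaddress import ip_address, ip_network

# Assumption: VPN clients are assigned addresses from this subnet
# (the one used in OpenVPN's sample server config; a real setup may differ).
VPN_SUBNET = ip_network("10.8.0.0/24")

# Hypothetical per-host sets of authorized SSH key fingerprints. These are
# distributed separately from VPN access, on an as-needed basis.
AUTHORIZED_KEYS = {
    "db-1": {"SHA256:alice-key"},
    "app-7": {"SHA256:alice-key", "SHA256:bob-key"},
}


def can_reach_ssh(src_ip: str) -> bool:
    """Layer 1: port 22 is only reachable from addresses on the VPN."""
    return ip_address(src_ip) in VPN_SUBNET


def can_log_in(src_ip: str, host: str, key_fingerprint: str) -> bool:
    """Layer 2: reaching port 22 is not enough; the key must be authorized on that host."""
    return can_reach_ssh(src_ip) and key_fingerprint in AUTHORIZED_KEYS.get(host, set())
```

Keeping the checks separate mirrors the point above: being on the VPN grants reachability, not authorization.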
While it provides security, the VPN can also be a source of friction. Maintenance, or an unplanned outage of the VPN system itself, can halt developer progress across the entire system. In effect, the VPN becomes a single point of failure for our team's ability to handle system-level issues, should they occur.
To handle this, we allow our administrators to make temporary rule changes to the EC2 security groups, so that work on system issues can continue if the VPN itself becomes a bottleneck. For example, an administrator can open SSH access to the specific external IP address a developer is using, letting that developer log in while bypassing the VPN. Only an administrator can make this change.
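Here is a sketch of what such a temporary rule might look like. The function below builds an ingress permission in the shape used by the EC2 API (`IpProtocol`, `FromPort`, `ToPort`, and `IpRanges` entries with `CidrIp` and `Description`). The function name and the "TEMPORARY" description convention are hypothetical. The key idea is that the rule is scoped to a single address with a `/32` CIDR rather than opened to the world.

```python
from datetime import datetime, timezone
from ipaddress import ip_address
from typing import Optional


def temporary_ssh_permission(developer_ip: str, reason: str,
                             now: Optional[datetime] = None) -> dict:
    """Build an EC2-style IpPermissions entry opening SSH (TCP 22) to one IPv4 host.

    Hypothetical helper: the description format is an illustrative convention
    that makes temporary rules easy to spot in later audits.
    """
    addr = ip_address(developer_ip)  # raises ValueError on malformed input
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4; IPv6 rules use Ipv6Ranges")
    now = now or datetime.now(timezone.utc)
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{
            "CidrIp": f"{addr}/32",  # /32 = exactly one address, never 0.0.0.0/0
            "Description": f"TEMPORARY {now:%Y-%m-%d}: {reason}",
        }],
    }
```

With boto3, the resulting dict could be passed as `ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=[perm])` and later removed by passing the same permission to `revoke_security_group_ingress`.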
In addition, an external script audits our security group rules every night. It compares the current state of the rules with a known-good state stored in a Git repository, notes any deviations and generates an email. This lets all administrators keep tabs on rule changes and ensure that "temporary" changes are either reverted or made permanent by adding them to the Git repository of "good" rules.
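The core of such an audit is a set difference between the rules recorded in Git and the rules actually in place. The Python sketch below assumes both sides are dicts mapping a group name to a list of EC2-style `IpPermissions` entries. Live state could be built from boto3's `describe_security_groups()` response as `{g["GroupName"]: g["IpPermissions"] for g in resp["SecurityGroups"]}`, and the known-good side loaded from a JSON file in the Git checkout. The file layout and function names are assumptions, and the email step is omitted.

```python
def flatten(groups: dict) -> set:
    """Turn {group_name: [IpPermissions entries]} into a set of comparable tuples.

    Only IPv4 CIDR rules are handled here; group-to-group rules
    (UserIdGroupPairs) and IPv6 ranges would need the same treatment.
    """
    rules = set()
    for group, permissions in groups.items():
        for perm in permissions:
            for ip_range in perm.get("IpRanges", []):
                rules.add((group, perm["IpProtocol"], perm.get("FromPort"),
                           perm.get("ToPort"), ip_range["CidrIp"]))
    return rules


def diff_rules(known_good: dict, live: dict):
    """Return (unexpected, missing): rules present only live, and only in Git."""
    expected, actual = flatten(known_good), flatten(live)
    # key=str keeps sorting safe when some rules have no ports (None).
    return sorted(actual - expected, key=str), sorted(expected - actual, key=str)


def report(known_good: dict, live: dict):
    """Build the body of the nightly email, or None if nothing has drifted."""
    unexpected, missing = diff_rules(known_good, live)
    if not unexpected and not missing:
        return None
    lines = ["Security group rules differ from the known-good state in Git:"]
    lines += [f"  + {rule}" for rule in unexpected]  # added outside of Git
    lines += [f"  - {rule}" for rule in missing]     # in Git but not live
    return "\n".join(lines)
```

A temporary `/32` SSH rule opened during a VPN outage would show up under `+` every night until it is either revoked or committed to the repository.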
We feel that our VPN approach, coupled with regular auditing of security group rules against a known standard, gives us a very high level of overall security around externally facing access to our critical infrastructure. This, in turn, provides our customers with the highest levels of security against intrusion and potential data theft.
In my third and final post on the topic of security, we'll look at how we manage credential access to external services that developers may need to use.
-- Caleb Tennis, Elite Developer
Read Parts 1 and 3 in Caleb's Securing the Cloud blog series: