In his talk “From DevOps to NoOps”, Mario Cruz, CTO of Choose Digital, described how to automate manual tasks to achieve a NoOps organization using Jenkins, AWS and Docker.
For Mario, NoOps is not about eliminating Ops; it is about automating manual processes, and it is the end state of adopting a DevOps culture. Or, quoting Forrester: a DevOps focus on collaboration evolves into a NoOps focus on automation. At Choose Digital, the developers own the complete process, from writing code through production deployment. By using AWS Elastic Beanstalk and Docker they can scale up and down automatically. In his view, Docker and containers are the best enabler for adopting DevOps, because the same artifact runs on a developer's machine and in production.
Mario mentioned that Jenkins is a game changer for continuous build, deployment and testing, and for closing the feedback loop. They use DEV@Cloud for the same reason they use AWS: running that infrastructure is not their core business, and they prefer to rely on services from companies with the expertise to run anything that is not core to it. On their journey to adopt Docker they developed several Docker-related plugins, which they are now discarding in favor of the ones recently announced by CloudBees, such as the Traceability plugin, which provides a very important feature for auditing and compliance.
For deployment, Choose Digital uses Blue-Green deployment: they create a new environment and update the Route53 CNAMEs once the new deployment passes a set of tests run by Jenkins, and they even run Netflix Chaos Monkey. With the Beanstalk swap environment URLs feature, both the old and the new deployments can run at the same time, and reverting a broken deployment is just a matter of switching the CNAME back to the previous URL, without needing a new deployment. The old environments are kept around for about 2 days to account for caching and to ensure all users have moved to the new environment.
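The cutover and rollback logic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Choose Digital's implementation: `CnameRouter` is a hypothetical in-memory stand-in for the DNS records (it does not call Route53 or Beanstalk), and `smoke_test` stands for whatever checks Jenkins runs against the new environment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CnameRouter:
    """Hypothetical in-memory stand-in for a CNAME table (not an AWS API)."""
    records: Dict[str, str] = field(default_factory=dict)
    previous: Dict[str, str] = field(default_factory=dict)

    def point(self, name: str, target: str) -> None:
        # Remember the old target so a rollback is only a pointer switch.
        if name in self.records:
            self.previous[name] = self.records[name]
        self.records[name] = target

    def rollback(self, name: str) -> str:
        # Swap current and previous targets; nothing is redeployed.
        old = self.previous[name]
        self.previous[name] = self.records[name]
        self.records[name] = old
        return old


def blue_green_cutover(
    router: CnameRouter,
    name: str,
    green_url: str,
    smoke_test: Callable[[str], bool],
) -> bool:
    """Point `name` at `green_url` only if the smoke test passes."""
    if not smoke_test(green_url):
        return False  # blue keeps serving traffic, green can be inspected
    router.point(name, green_url)
    return True


if __name__ == "__main__":
    router = CnameRouter()
    router.point("shop.example.com", "blue.example.com")
    switched = blue_green_cutover(
        router, "shop.example.com", "green.example.com", lambda url: True
    )
    print(switched, router.records["shop.example.com"])
```

Because the old environment is still running, `rollback` only has to restore the previous target, which mirrors the "switch the CNAME back" step in the talk.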
Only parts of the stack are replaced in each deployment, because replacing the whole stack at peak time takes around 34 minutes. Deploying only small parts of the AWS Elastic Beanstalk stack lets them deploy faster and more often. For some complex cases, such as database migrations, features are turned off by default and turned on at low-traffic hours.
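The "off by default, on at low-traffic hours" approach can be illustrated with a small feature-flag sketch. The talk does not say how the flags are implemented, so everything here is an assumption: the `FeatureFlag` class, the helper names, and the 02:00 to 05:00 window are hypothetical.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class FeatureFlag:
    name: str
    enabled: bool = False  # shipped turned off by default


def in_low_traffic_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls in [start, end), allowing a wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def maybe_enable(
    flag: FeatureFlag,
    now: time,
    start: time = time(2, 0),  # assumed window start, not from the talk
    end: time = time(5, 0),    # assumed window end, not from the talk
) -> bool:
    """Turn the flag on only inside the low-traffic window; never turn it off."""
    if in_low_traffic_window(now, start, end):
        flag.enabled = True
    return flag.enabled


if __name__ == "__main__":
    flag = FeatureFlag("new-schema-reads")
    print(maybe_enable(flag, time(14, 0)))  # afternoon: stays off
    print(maybe_enable(flag, time(3, 0)))   # inside the window: turned on
```

Decoupling the code deployment from the moment the feature becomes active is what lets a risky change, such as one depending on a database migration, ship with the rest of the release.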
After deployment, logs and metrics are important; for example, NewRelic has proven very helpful for understanding performance issues. Using these metrics, the deployments are scaled automatically from around 25 servers to around 250 at peak time.
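As a rough illustration of metric-driven scaling within those bounds, the sketch below derives a server count from a load metric and clamps it to the 25 to 250 range mentioned in the talk. The choice of metric (requests per second) and the per-server capacity are assumptions made for the example; the talk does not specify which metrics drive the scaling.

```python
import math

MIN_SERVERS = 25    # approximate floor mentioned in the talk
MAX_SERVERS = 250   # approximate peak mentioned in the talk


def desired_servers(requests_per_second: float, capacity_per_server: float) -> int:
    """Servers needed for the current load, clamped to the allowed range.

    Both parameters are hypothetical: the metric and the per-server
    capacity would come from monitoring and load testing.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    needed = math.ceil(requests_per_second / capacity_per_server)
    return max(MIN_SERVERS, min(MAX_SERVERS, needed))


if __name__ == "__main__":
    for load in (1_000, 10_000, 100_000):
        print(load, desired_servers(load, capacity_per_server=100))
```

The clamp keeps a baseline of capacity during quiet periods and caps cost during spikes; a managed service would normally apply the same idea through its own scaling policies.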