This is a simple summary of the time we sat down at Betfair and questioned why we were still applying a highly manual, so-called ‘industry best practice’ process to all our production application releases, regardless of how they were being deployed. We went back to the whiteboard to see how we could make it better.
First up, a disclaimer – this entry is less tech and more process orientated. Both matter to me – I’m a techie at heart (jack-of-all-trades, master-of-none) who moved into Operations management and landed in the world of process improvement.
At the start of this journey, every Production change ticket was manually logged, manually peer reviewed, manually security reviewed and manually reviewed by the Change & Release team. Not quick, not automated and not clever. This was frustrating for Development teams, who had to wait for others to ‘approve’ their work, and it slowed down production deployments (‘Pace’ is very much a thing at Betfair). It was frustrating for the Security and Change & Release teams, who were constantly perceived as blockers and carried the unwanted tag of ‘gatekeepers of Production’. It also made all these teams essentially operational – everyone is waiting on you to deploy, so you need to be available almost all the time. A lot of the original reasons for setting things up like this were completely logical but, as they say, the road to hell is paved with good intentions. Betfair operates in a number of highly regulated international jurisdictions, and requires significant levels of compliance and regulatory reporting on top of the more usual security/change/release requirements. Some existing practices were also hangovers from a time of significant separation between Dev & Ops, when our Development teams did not always support their components in production…
So what did we do? Collaboration is the key. Working with the SRE, Security, Compliance and Development teams, we focused on outcomes: ensuring that code could be rapidly and safely deployed to production on demand, that every change was tracked and audited, and that every change met our security and quality requirements. As a range of other companies have demonstrated, automated, repeatable pipelines are the key to fast-paced code-to-production systems. We embraced this approach and agreed to move from manually reviewing and approving each and every release to reviewing and approving an automated CD pipeline. This means that once a pipeline is approved, it is available for development teams to use for automatic code deployment to Production. It also means that the requirements for security testing, auditing, and regulatory and compliance reporting must be met by the pipelines without manual intervention.
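To make the "approve the pipeline, not the release" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the stage names, the `Pipeline` class and the `approve`/`can_deploy` helpers are illustrative assumptions, not Betfair's actual tooling. It only shows the shape of the rule: a pipeline can be approved once if it contains the mandatory stages, and after that any deployment through it needs no further manual sign-off.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stage names standing in for the security testing, auditing
# and compliance reporting requirements described in the text.
REQUIRED_STAGES = frozenset({"security_tests", "audit_trail", "compliance_report"})


@dataclass
class Pipeline:
    name: str
    stages: List[str]
    approved: bool = False


def missing_stages(pipeline: Pipeline) -> set:
    """Return the mandatory stages this pipeline does not include."""
    return set(REQUIRED_STAGES) - set(pipeline.stages)


def approve(pipeline: Pipeline) -> None:
    """One-off review step: approve only if every mandatory stage is present."""
    missing = missing_stages(pipeline)
    if missing:
        raise ValueError(
            f"{pipeline.name} is missing required stages: {sorted(missing)}"
        )
    pipeline.approved = True


def can_deploy(pipeline: Pipeline) -> bool:
    """Per-release check: no human in the loop, just the pipeline's status."""
    return pipeline.approved and not missing_stages(pipeline)
```

A team would call `approve` once when the pipeline is reviewed, and the orchestration layer would call `can_deploy` on every release. Re-checking the stages at deploy time is a deliberate choice in this sketch, so that a pipeline edited after approval does not silently keep its approved status.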
This still left plenty to do in different areas – you can read more about the security automation piece in this blog entry: https://betsandbits.com/2015/05/21/automating-security-turning-it-up-to-11/. Once we had agreed the approach in principle, everything else was relatively straightforward, although we adopted a pragmatic approach to avoid some automation pitfalls (https://xkcd.com/1319/). We built an interface (in Python) between our ticketing system and our CD orchestration tooling allowing for change tickets to be created & updated on the fly as production deployments happened. We agreed with our compliance teams which components (if any) needed prior regulator engagement. We also cemented our ‘you build it you own it’ philosophy, as this approach means our Delivery teams and Managers take full ownership of the quality, security, stability and scalability of their components in production.
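The ticketing interface mentioned above can be pictured roughly as follows. This is a hedged sketch only: the post does not describe the real implementation, so the `InMemoryTicketing` class is a stand-in for whatever ticketing system client is used, and the `DeploymentRecorder` hooks, ticket fields and status values are all assumptions made for illustration.

```python
import itertools
from datetime import datetime, timezone


class InMemoryTicketing:
    """Hypothetical stand-in for a real ticketing system client."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.tickets = {}

    def create(self, summary, fields):
        ticket_id = f"CHG-{next(self._ids)}"
        self.tickets[ticket_id] = {
            "summary": summary,
            "status": "in_progress",
            "history": [],
            **fields,
        }
        return ticket_id

    def update(self, ticket_id, status, note):
        ticket = self.tickets[ticket_id]
        ticket["status"] = status
        # Keep a timestamped trail so every change stays auditable.
        ticket["history"].append((datetime.now(timezone.utc).isoformat(), note))


class DeploymentRecorder:
    """Turns deployment events from the CD tooling into change tickets."""

    def __init__(self, ticketing):
        self.ticketing = ticketing
        self._open = {}  # (component, version) -> ticket id

    def on_deploy_started(self, component, version, pipeline):
        ticket_id = self.ticketing.create(
            summary=f"Deploy {component} {version}",
            fields={"component": component, "version": version, "pipeline": pipeline},
        )
        self._open[(component, version)] = ticket_id
        return ticket_id

    def on_deploy_finished(self, component, version, succeeded):
        ticket_id = self._open.pop((component, version))
        outcome = "succeeded" if succeeded else "failed"
        status = "closed_successful" if succeeded else "closed_failed"
        self.ticketing.update(ticket_id, status, f"deployment {outcome}")
        return ticket_id
```

In this sketch the orchestration tooling would call `on_deploy_started` when a production deployment begins and `on_deploy_finished` when it ends, so the change record is a by-product of the deployment rather than a prerequisite for it.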
Are we there yet?
Writing the above summary makes it feel obvious and simple, especially with the benefit of hindsight, but it needed its fair share of selling, supporting and reinforcing. It went against the grain for some people, especially those used to a more traditional approach to compliance, application security and change & release management. It is also only going to work well in an environment and culture with the right framework, such as a top-down belief in Developers owning their components in production, and strong collaboration so that different areas can fully understand some of the less obvious requirements and keep our auditors and regulators as happy as our customers.
We currently push hundreds of changes a week, with an appetite for more, and have a devolved change process which allows for easy scaling and lets teams own their destiny. Clear bottlenecks were removed, and the focus has moved on – all of which fits well with our philosophy of ensuring any process is lightweight and enables our teams…
A lot of my time was spent aligning ways of thinking rather than workflows. I found some of these links helpful to share with others, to help them explore the various approaches that others have taken. There is some great reference material out there by far more gifted writers than me:
Great overview of traditional process meeting a DevOps culture from Gene Kim: http://www.theitsmreview.com/2014/03/trust-devops-movement-fits-perfectly-itsm/
Jez Humble takes a more detailed look at Continuous Delivery and Change Management on his blog: http://continuousdelivery.com/2010/11/continuous-delivery-and-itil-change-management/
Mandatory Allspaw link (a slide or quote from John Allspaw has been in pretty much every presentation I’ve done in the past couple of years…) – detailed thoughts on automation and how far to take it: http://www.kitchensoap.com/2012/09/21/a-mature-role-for-automation-part-i/
/Tom Walters is the Global Head of Service Management at Paddy Power Betfair