‘Amazon’s approach to automated testing’ at re:Invent 2019
My thoughts and notes from this session.
This was a chalk talk, with the speakers setting the scene for 10 minutes and then answering questions and going into more depth based on interest.
The right-hand side is the cost of quality over a timeline. The methods for reducing risk are cheaper the closer they sit to the inception of a piece of functionality. Think ‘clear understanding & alignment between builders, product managers and design’, ‘continuous validation and iteration’, ‘unit testing’ etc. on the left, and means like ‘customer feedback’ and ‘monitoring’ on the right-hand side. The longer you defer detection of a problem, the more it costs to correct it.
The ‘testing bow-tie’ (left) is Amazon’s way of saying that the traditional testing pyramid is not the answer in a service-oriented world built on cloud technologies. Whereas the testing pyramid may have been a useful way of thinking when building a physical product, where feedback is deferred far into the future, in today’s world you have to turn the pyramid on its side and accept that there exist other means of tackling risk: strategies for gradually rolling out a change, A/B-testing value, and monitoring and metrics of usage and performance. Testing and quality assurance do not stop at the top of the traditional pyramid.
Amazon’s approach to test coverage is especially useful. They realize that the percentage (as in % of lines, functions or branches covered) is not helpful in itself; we all know that you can write tests that traverse a large number of lines, functions or branches but do not actually test anything. Instead they use coverage as additional information when processing pull requests: ‘you changed these lines but they have no tests covering them, you probably should do something about that’.
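To make the idea concrete, here is a minimal sketch (not Amazon’s actual tooling) of such a diff-coverage check: given the lines a pull request changed and the lines the test suite covers, it reports the changed-but-uncovered lines. The data shapes and the file name are made up for illustration; in practice the inputs would come from parsing the diff and a coverage report (e.g. coverage.py’s JSON output).

```python
# Hypothetical sketch: flag changed lines that no test covers.

def uncovered_changes(changed_lines: dict[str, set[int]],
                      covered_lines: dict[str, set[int]]) -> dict[str, set[int]]:
    """Return, per file, the changed lines that have no test coverage."""
    result = {}
    for path, lines in changed_lines.items():
        missing = lines - covered_lines.get(path, set())
        if missing:
            result[path] = missing
    return result

if __name__ == "__main__":
    # Invented example data standing in for diff + coverage report parsing.
    changed = {"orders/service.py": {10, 11, 42}}
    covered = {"orders/service.py": {10, 11}}
    for path, lines in uncovered_changes(changed, covered).items():
        print(f"{path}: lines {sorted(lines)} changed but not covered by tests")
```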
Covered lines do not matter; the ones not covered do
A lot of tackling risk at Amazon also comes down to the way they learn from failures, and grow that local learning into organisational learning.
Team composition and responsibilities
The teams are responsible for developing and running services. This requires all the typical activities from development, testing and monitoring, and more importantly learning about the service they are providing and about the problems they are solving for the customer. Over the long term the teams become a significant source of innovation for the service. The term ‘engineer’ was used with no distinction between developing, testing etc.; the assumption is that you do what is needed for your service.

The separation of teams is based on crisp contracts, defining how the interaction between them should take place both functionally and non-functionally (e.g. expected performance). Changing the contracts has intentionally been made a big deal, and is not undertaken lightly. Teams use dependency dashboards to keep an eye on how the services they depend on are doing: ‘is the contract being fulfilled?’ This streamlines problem resolution. Amazon’s approach in general closely resembles what Marty Cagan calls Dedicated Product Teams.
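To illustrate what an executable version of such a contract might look like, here is a hedged sketch of a consumer-side check. The endpoint, response shape and latency budget are all invented for illustration, not anything the speakers showed; the point is that both the functional and the non-functional parts of the contract are encoded as assertions a dependent team can run continuously.

```python
# Illustrative consumer-side contract check (names and numbers are invented).

import json
import time
import urllib.request

SERVICE_URL = "https://inventory.internal.example/items/123"  # hypothetical
MAX_LATENCY_SECONDS = 0.2  # the agreed non-functional contract

def check_contract() -> None:
    start = time.monotonic()
    with urllib.request.urlopen(SERVICE_URL, timeout=2) as resp:
        body = json.load(resp)
    elapsed = time.monotonic() - start

    # Functional part of the contract: the agreed response shape.
    assert {"item_id", "quantity"} <= body.keys(), "response shape changed"
    # Non-functional part: the agreed latency budget.
    assert elapsed <= MAX_LATENCY_SECONDS, f"too slow: {elapsed:.3f}s"

if __name__ == "__main__":
    check_contract()
    print("contract fulfilled")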
The hosts did say that there are dedicated QA people at Amazon, but they are not people you off-load testing to; instead they act as teachers of the quality and testing skill-sets, helping the teams do what is needed, auditing and teaching. There were some exceptions to this as well, as you would expect in such a large organisation. Manual testing as a role mostly only exists in the case of physical products.
The teams are expected to provide a health dashboard displaying various metrics of the service’s performance. It can be drilled down into, all the way to details such as test reports, but those are by no means the primary focus. Above all, the dashboard should answer the questions “can a user of this service perform the three most important operations” and “does it operate relatively fast”.
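As a rough illustration of where such a dashboard’s data could come from, the sketch below probes three hypothetical ‘most important operations’ and records whether each one succeeded and stayed within a latency budget. All operation names and numbers are placeholders I invented, not anything from the session.

```python
# Sketch of a synthetic "top three operations" probe feeding a health
# dashboard. The operations and the latency budget are hypothetical.

import time

def op_search() -> None: ...        # stand-ins for real user-facing calls
def op_add_to_cart() -> None: ...
def op_checkout() -> None: ...

CRITICAL_OPS = {"search": op_search,
                "add_to_cart": op_add_to_cart,
                "checkout": op_checkout}
LATENCY_BUDGET = 0.5  # seconds; "does it operate relatively fast"

def probe() -> dict[str, dict]:
    health = {}
    for name, op in CRITICAL_OPS.items():
        start = time.monotonic()
        try:
            op()
            ok = True
        except Exception:
            ok = False
        elapsed = time.monotonic() - start
        health[name] = {"ok": ok, "fast": elapsed <= LATENCY_BUDGET}
    return health

if __name__ == "__main__":
    print(probe())
```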
In order to mitigate risk, the teams always deploy to one region at a time, observing their metrics and limiting the blast radius.
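A toy sketch of that idea: deploy region by region, and halt (and roll back) the moment the metrics look unhealthy. Every function here is a stand-in of my own invention, not Amazon’s actual pipeline tooling.

```python
# Illustrative region-by-region rollout loop (all functions are stand-ins).

REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-2"]

def deploy(region: str, version: str) -> None:
    print(f"deploying {version} to {region}")  # stand-in for the real pipeline

def metrics_healthy(region: str) -> bool:
    return True  # stand-in: consult the real alarms/dashboards here

def roll_back(region: str) -> None:
    print(f"rolling back {region}")

def staged_rollout(version: str) -> None:
    for region in REGIONS:                     # one region at a time
        deploy(region, version)
        if not metrics_healthy(region):
            roll_back(region)                  # contain the blast radius
            raise RuntimeError(f"rollout halted in {region}")

if __name__ == "__main__":
    staged_rollout("v1.2.3")
```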
Load testing is part of the teams’ work, and it is based on extensive analytics and customer modeling; they have a good understanding of how customers (and the customer base as a whole) will behave during key events.
A lot of the performance/load testing is about stating hypotheses and then testing them. Amazon often does this in production, as in many cases that is the fastest way for them to learn.
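As a small illustration of that hypothesis-driven framing, the sketch below checks a statement like ‘p99 latency stays under 250 ms’. The endpoint and the numbers are invented, and a real load test would of course drive far more concurrent traffic than this sequential probe; the point is only that the test is phrased as a hypothesis with a clear pass/fail outcome.

```python
# Sketch of a hypothesis-driven latency check (endpoint and numbers invented).

import time
import urllib.request

TARGET = "https://service.internal.example/health"  # hypothetical endpoint
HYPOTHESIS_P99_MS = 250  # invented latency budget
REQUESTS = 200

def measure() -> list[float]:
    samples = []
    for _ in range(REQUESTS):
        start = time.monotonic()
        urllib.request.urlopen(TARGET, timeout=2).read()
        samples.append((time.monotonic() - start) * 1000)
    return samples

if __name__ == "__main__":
    latencies = sorted(measure())
    p99 = latencies[int(len(latencies) * 0.99) - 1]
    verdict = "holds" if p99 <= HYPOTHESIS_P99_MS else "is rejected"
    print(f"p99 = {p99:.1f} ms; the hypothesis {verdict}")
```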
Trunk-based development was the most common ‘branching strategy’, but as with many things at Amazon, they realize that there should be an easy-to-start-with default way of working and tooling, and that the teams should be able to decide on the right way in their own context.
Developer productivity is a huge thing at Amazon, and some of their best people work on increasing the productivity of the average engineer. This often means making the recommended way or tool so easy to use that it for the most part leads people that way.
When the teams are not sure what they are creating (product discovery, proofs of concept etc.), they do significantly less test automation. They seem to be very conscious of this, channeling their capacity to the right things at any given time.
All in all an interesting session; most of this is not exactly new, but seeing it all tied together, with the possibility to ask specific questions, was great.
From re:Invent 2019 I plan on summarizing my thoughts on at least the following sessions:
- Amazon’s approach to building resilient services
- Deployments at Amazon
- How to fail successfully at Amazon