High-performing engineering teams and the Holy Grail

A presentation at DevOps KC Meetup - March 2023 in Kansas City, MO, USA by Jeremy Meiss

Slide 1

Metrics, High-performing teams, and the Holy Grail

Slide 2

Slide 3

Slide 4

Jeremy Meiss, Director, DevRel & Community

Slide 5

So, back to the tech industry…

Slide 6

Slide 7

Slide 8

Slide 9

CI/CD benchmarks for high-performing teams: duration, mean time to recovery, success rate, and throughput.

Slide 10

Slide 11

Slide 12

So what does the data say?

Slide 13

Duration: the foundation of software engineering velocity. Duration measures the average time, in minutes, required to move a unit of work through your pipeline.
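As a rough illustration of that definition, the Python sketch below averages pipeline durations in minutes. It assumes each run is available as a hypothetical `(started_at, finished_at)` pair of `datetime` objects; real CI platforms expose this data in their own formats.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration_minutes(runs: list[tuple[datetime, datetime]]) -> float:
    """Average wall-clock time, in minutes, from start to finish of each run."""
    if not runs:
        raise ValueError("no runs to average")
    return mean((end - start).total_seconds() / 60 for start, end in runs)


# Hypothetical sample: three runs lasting 6, 8, and 10 minutes.
t0 = datetime(2023, 3, 1, 9, 0)
runs = [
    (t0, t0 + timedelta(minutes=6)),
    (t0, t0 + timedelta(minutes=8)),
    (t0, t0 + timedelta(minutes=10)),
]
print(mean_duration_minutes(runs))  # 8.0
```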

Slide 14

Slide 15

Slide 16

Duration benchmark: builds of 10 minutes or less

“a good rule of thumb is to keep your builds to no more than ten minutes. Many developers who use CI follow the practice of not moving on to the next task until their most recent checkin integrates successfully. Therefore, builds taking longer than ten minutes can interrupt their flow.” — Paul M. Duvall (2007). Continuous Integration: Improving Software Quality and Reducing Risk.
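To check a pipeline against the 10-minute rule of thumb, a short Python sketch like this one could list the builds that exceed it and report what share stayed within it. The build IDs and durations are made-up sample data, and the input shape (a dict mapping build ID to minutes) is an assumption, not any CI provider's API.

```python
# Threshold taken from the rule of thumb above: builds of 10 minutes or less.
BENCHMARK_MINUTES = 10.0


def over_benchmark(durations: dict[str, float], limit: float = BENCHMARK_MINUTES) -> list[str]:
    """Return IDs of builds that took longer than `limit` minutes, slowest first."""
    slow = [(minutes, build_id) for build_id, minutes in durations.items() if minutes > limit]
    return [build_id for _, build_id in sorted(slow, reverse=True)]


def share_within_benchmark(durations: dict[str, float], limit: float = BENCHMARK_MINUTES) -> float:
    """Fraction of builds that finished in `limit` minutes or less."""
    if not durations:
        raise ValueError("no builds to evaluate")
    within = sum(1 for minutes in durations.values() if minutes <= limit)
    return within / len(durations)


# Hypothetical sample data: build ID -> duration in minutes.
builds = {"build-101": 7.5, "build-102": 14.0, "build-103": 9.9, "build-104": 11.2}
print(over_benchmark(builds))          # ['build-102', 'build-104']
print(share_within_benchmark(builds))  # 0.5
```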

Slide 17

Duration: What the data shows
Benchmark: 5-10 minutes

Slide 18

Improving test coverage
Add unit, integration, UI, and end-to-end testing across all app layers.
Incorporate code coverage tools into pipelines to identify inadequate testing.
Include static and dynamic security scans to catch vulnerabilities.
Incorporate TDD practices by writing tests during the design phase.
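One way to act on "incorporate code coverage tools into pipelines" is a small gate step that fails the build when coverage drops below a floor. This Python sketch assumes a Cobertura-style XML report (the format coverage.py writes with `coverage xml`), whose root `<coverage>` element carries an overall `line-rate` attribute. The 80% floor and the `coverage_gate.py` file name are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

# Hypothetical coverage floor; choose one that fits your project.
MIN_LINE_RATE = 0.80


def line_rate(root: ET.Element) -> float:
    """Overall line coverage (0.0-1.0) from the root <coverage> element of a Cobertura report."""
    return float(root.attrib["line-rate"])


def coverage_gate(root: ET.Element, minimum: float = MIN_LINE_RATE) -> int:
    """Return a process exit code: 0 if coverage meets `minimum`, 1 otherwise."""
    rate = line_rate(root)
    if rate < minimum:
        print(f"Line coverage {rate:.1%} is below the {minimum:.0%} floor", file=sys.stderr)
        return 1
    print(f"Line coverage {rate:.1%} meets the {minimum:.0%} floor")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example pipeline step (hypothetical file names): python coverage_gate.py coverage.xml
    report_root = ET.parse(sys.argv[1]).getroot()
    sys.exit(coverage_gate(report_root))
```

If you only use coverage.py, its built-in `coverage report --fail-under=80` does the same job without a custom script; a standalone gate like this is mainly useful when several tools emit the same Cobertura format.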

Slide 19

Slide 20

Mean time to recovery (MTTR): the average time required to go from a failed build signal to a successful pipeline run
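The definition can be made concrete with a small Python sketch that walks a branch's run history in time order, starts a clock at the first failure of each red streak, and stops it at the next passing run. The `(finished_at, passed)` input shape is an assumption, and how to treat failures that have not recovered yet is a judgment call; this version leaves them out.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


def mttr_minutes(runs: list[tuple[datetime, bool]]) -> Optional[float]:
    """Mean minutes from the first failed run of each red streak to the next passing run.

    `runs` holds (finished_at, passed) pairs for a single branch. Failures that
    have not recovered yet are ignored. Returns None if nothing recovered.
    """
    recoveries = []
    failed_at = None  # when the current red streak began, if any
    for finished_at, passed in sorted(runs):
        if not passed and failed_at is None:
            failed_at = finished_at
        elif passed and failed_at is not None:
            recoveries.append((finished_at - failed_at).total_seconds() / 60)
            failed_at = None
    return mean(recoveries) if recoveries else None


# Hypothetical history for one branch.
t0 = datetime(2023, 3, 1, 9, 0)
history = [
    (t0, True),
    (t0 + timedelta(minutes=10), False),   # branch goes red
    (t0 + timedelta(minutes=25), False),   # still red; streak began at +10
    (t0 + timedelta(minutes=40), True),    # recovered after 30 minutes
    (t0 + timedelta(minutes=60), False),
    (t0 + timedelta(minutes=150), True),   # recovered after 90 minutes
]
print(mttr_minutes(history))  # 60.0
```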

Slide 21

Mean time to recovery is indicative of resilience

Slide 22

Slide 23

“A key part of doing a continuous build is that if the mainline build fails, it needs to be fixed right away. The whole point of working with CI is that you’re always developing on a known stable base.” — Martin Fowler (2006). “Continuous Integration.” MartinFowler.com, 1 May 2006.

Slide 24

MTTR benchmark: 60 minutes or less on default branches

Slide 25

MTTR: What the data shows
Benchmark: 60 minutes

Slide 26

Treat your default branch as the lifeblood of your project

Slide 27

Getting to faster recovery times
Treat your default branch as the lifeblood of your project.
Set up instant alerts for failed builds using services like Slack, Twilio, or PagerDuty.
Write clear, informative error messages for your tests that allow you to quickly diagnose the problem and focus your efforts in the right place.
SSH into the failed build machine to debug in the remote test environment. Doing so gives you access to valuable troubleshooting resources, including log files, running processes, and directory paths.
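As an illustration of instant failure alerts, this Python sketch posts a message to a Slack incoming webhook, which accepts a JSON body with a `text` field. The `SLACK_WEBHOOK_URL` environment variable, the project and branch names, and the job URL are hypothetical placeholders; many CI platforms also offer ready-made Slack integrations that may make a script like this unnecessary.

```python
import json
import os
import urllib.request


def build_failure_message(project: str, branch: str, job_url: str) -> dict:
    """Slack incoming-webhook payload naming the project, branch, and failing job."""
    return {"text": f"Build failed on {project}/{branch}: {job_url}"}


def send_slack_alert(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to a Slack incoming webhook; returns the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical: the webhook URL is stored as a CI secret exposed as an env var.
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if url:
        message = build_failure_message("my-app", "main", "https://ci.example.com/jobs/123")
        send_slack_alert(url, message)
```

Putting the branch and a direct link to the failing job in the alert follows the same idea as informative test error messages: whoever gets paged can go straight to the problem.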

Slide 28

Success rate: the number of passing runs divided by the total number of runs over a period of time
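A minimal Python sketch of that ratio, assuming a hypothetical `Run` record with a branch name, a finish time, and a pass/fail flag. The optional `branch` filter shows how the same calculation can be narrowed to a single branch, such as the default branch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Run:
    branch: str
    finished_at: datetime
    passed: bool


def success_rate(
    runs: list[Run], start: datetime, end: datetime, branch: Optional[str] = None
) -> Optional[float]:
    """Passing runs / total runs finished in [start, end), optionally for one branch."""
    window = [
        r for r in runs
        if start <= r.finished_at < end and (branch is None or r.branch == branch)
    ]
    if not window:
        return None  # no runs in the period, so the rate is undefined
    return sum(r.passed for r in window) / len(window)


# Hypothetical sample: 7 of 8 runs pass on main, 2 of 4 on a feature branch.
t0 = datetime(2023, 3, 1)
runs = [Run("main", t0 + timedelta(hours=i), i != 3) for i in range(8)]
runs += [Run("feature-x", t0 + timedelta(hours=i), i % 2 == 0) for i in range(4)]
week = (t0, t0 + timedelta(days=7))
print(success_rate(runs, *week))                 # 0.75
print(success_rate(runs, *week, branch="main"))  # 0.875
```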

Slide 29

Slide 30

Success rate benchmark: 90%+ on default branches

Slide 31

Success rate: What the data shows
Benchmark: 90%+ on default branches

Slide 32

Throughput: the average number of workflow runs that an organization completes on a given project per day
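Throughput is a simple count divided by a number of days, as in this Python sketch for a single project. It assumes you already have the completion date of each workflow run; counting days with no runs as zero is a choice made here, not something the definition specifies.

```python
from datetime import date


def throughput_per_day(run_dates: list[date], start: date, end: date) -> float:
    """Average completed workflow runs per day for one project, over [start, end] inclusive.

    Days with no runs count as zero, so quiet days pull the average down.
    """
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end must not be before start")
    completed = sum(1 for d in run_dates if start <= d <= end)
    return completed / days


# Hypothetical week: 5 runs on Mar 1, 8 on Mar 2, 8 on Mar 6, none on the other days.
run_dates = [date(2023, 3, 1)] * 5 + [date(2023, 3, 2)] * 8 + [date(2023, 3, 6)] * 8
print(throughput_per_day(run_dates, date(2023, 3, 1), date(2023, 3, 7)))  # 3.0
```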

Slide 33

Slide 34

Slide 35

So what throughput is ideal?

Slide 36

Throughput benchmark: it depends.

Slide 37

Throughput: What the data shows
Benchmark: at the speed of your business

Slide 38

Slide 39

Slide 40

High-Performing Teams in 2023

Slide 41

Platform teams and their impact

Slide 42

Slide 43

The impact of Platform Teams
Duration: Identify and eliminate impediments to developer velocity. Set guardrails and enforce quality standards across projects. Alleviate as much developer cognitive load from day-to-day work as possible.
MTTR: Set up effective monitoring and alerting systems, and track recovery time. Config- and Infrastructure-as-Code tools limit the potential for misconfiguration errors. Standardize test suites and CI pipeline configs, i.e., shareable config templates and policies across the org.
Success rate: Look at MTTR and shorten recovery time first. Set a baseline success rate, then aim for continuous improvement. Actively monitor, streamline, and parallelize pipelines.
Throughput: Map goals to the reality of internal and external business situations. Capture a baseline, then monitor for deviations. Be mindful of patterns and the influence of external factors.

Slide 44

Slide 45

Thank you.
timeline.jerdog.me | IAmJerdog | jerdog | /in/jeremymeiss | @jerdog@hachyderm.io
For feedback and swag: circle.ci/jeremy