High-performing engineering teams and the Holy Grail

A presentation at Developer Week CloudX 2023 in August 2023 in San Mateo, CA, USA by Jeremy Meiss

Slide 1

High-performing Engineering Teams, and the Holy Grail

Slide 2

Slide 3

Slide 4

Jeremy Meiss Director, DevRel & CircleCI

Slide 5

So back to the tech industry…

Slide 6

Slide 7

Slide 8

Slide 9

Image: Consumer Choice Center

Slide 10

CI/CD Benchmarks for high-performing teams
- Duration
- Mean time to resolve
- Success rate
- Throughput

Slide 11

Slide 12

Duration: the foundation of software engineering velocity. It measures the average time in minutes required to move a unit of work through your pipeline.
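
As a minimal sketch of the definition above, assuming each pipeline run is recorded with hypothetical `started_at` and `finished_at` timestamps (illustrative field names, not any CI provider's API), the average duration in minutes could be computed like this:

```python
from datetime import datetime
from statistics import mean


def average_duration_minutes(runs):
    """Mean wall-clock time, in minutes, across completed pipeline runs."""
    durations = [
        (run["finished_at"] - run["started_at"]).total_seconds() / 60
        for run in runs
    ]
    return mean(durations)


# Hypothetical sample data: one 8-minute run and one 12-minute run.
runs = [
    {"started_at": datetime(2023, 8, 1, 9, 0), "finished_at": datetime(2023, 8, 1, 9, 8)},
    {"started_at": datetime(2023, 8, 1, 10, 0), "finished_at": datetime(2023, 8, 1, 10, 12)},
]
print(average_duration_minutes(runs))  # 10.0
```

A mean is used here because the slide says "average"; a median or percentile may describe a skewed distribution better, but that is a choice beyond what the slide states.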

Slide 13

Slide 14

Slide 15

Slide 16

Duration benchmark: <=10 minute builds
“A good rule of thumb is to keep your builds to no more than ten minutes. Many developers who use CI follow the practice of not moving on to the next task until their most recent check-in integrates successfully. Therefore, builds taking longer than ten minutes can interrupt their flow.”
– Paul M. Duvall (2007). Continuous Integration: Improving Software Quality and Reducing Risk

Slide 17

Duration: What the data shows
Benchmark: 5-10 mins

Slide 18

Improving test coverage
- Add unit, integration, UI, and end-to-end testing across all app layers
- Add code coverage into pipelines to identify inadequate testing
- Include static and dynamic security scans to catch vulnerabilities
- Incorporate TDD practices by writing tests during the design phase

Slide 19

Slide 20

Mean time to recovery (MTTR): the average time required to go from a failed build signal to a successful pipeline run
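
One way to read this definition, sketched under assumptions: runs on a single branch are given as hypothetical `(finished_at, passed)` tuples, and each recovery is timed from the first failure in a run of failures to the next passing run. The talk does not specify how consecutive failures are handled, so that part is an interpretation.

```python
from datetime import datetime
from statistics import mean


def mean_time_to_recovery_minutes(runs):
    """Average minutes from the first failed run in a streak to the next passing run.

    `runs` is an iterable of (finished_at, passed) tuples for one branch.
    Returns None when no failure has been followed by a passing run.
    """
    recoveries = []
    failed_since = None
    for finished_at, passed in sorted(runs, key=lambda run: run[0]):
        if not passed and failed_since is None:
            failed_since = finished_at  # the branch just went red
        elif passed and failed_since is not None:
            recoveries.append((finished_at - failed_since).total_seconds() / 60)
            failed_since = None  # the branch is green again
    return mean(recoveries) if recoveries else None


# Hypothetical history: one 40-minute outage and one 20-minute outage.
history = [
    (datetime(2023, 8, 1, 9, 0), True),
    (datetime(2023, 8, 1, 9, 30), False),
    (datetime(2023, 8, 1, 9, 45), False),
    (datetime(2023, 8, 1, 10, 10), True),
    (datetime(2023, 8, 1, 11, 0), False),
    (datetime(2023, 8, 1, 11, 20), True),
]
print(mean_time_to_recovery_minutes(history))  # 30.0
```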

Slide 21

Slide 22

“A key part of doing a continuous build is that if the mainline build fails, it needs to be fixed right away. The whole point of working with CI is that you’re always developing on a known stable base.” – Martin Fowler (2006). “Continuous Integration.” Web blog post. MartinFowler.com

Slide 23

MTTR benchmark: <=60 min MTTR on default branches

Slide 24

MTTR: What the data shows
Benchmark: 60 mins

Slide 25

Treat your default branch as the lifeblood of your project

Slide 26

Getting to faster recovery times
- Treat default branch as the lifeblood of your project
- Set up instant alerts for failed builds (Slack, PagerDuty, etc.)
- Write clear, informative error messages for your tests
- SSH into the failed build machine to debug the remote test environment

Slide 27

Success rate: number of passing runs divided by the total number of runs over a period of time
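
The ratio above is simple enough to sketch directly. The status strings `"success"` and `"failed"` are assumed labels for illustration, not taken from a specific CI system.

```python
def success_rate(statuses):
    """Passing runs divided by total runs; None when there are no runs."""
    if not statuses:
        return None
    passed = sum(1 for status in statuses if status == "success")
    return passed / len(statuses)


# Hypothetical period: nine passing runs and one failed run.
period = ["success"] * 9 + ["failed"]
print(success_rate(period))  # 0.9
```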

Slide 28

Failed signals are not all bad

Slide 29

Success rate benchmark: 90%+ success rate on default branches

Slide 30

Success rate: What the data shows
Benchmark: 90%+ on default

Slide 31

Throughput: average number of workflow runs that an organization completes on a given project per day
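
A minimal sketch of this average, assuming the completion date of each workflow run on a project is available as a `date`. The function and parameter names are hypothetical, and days with no runs are counted in the denominator, which is one possible reading of "per day".

```python
from datetime import date


def throughput_per_day(run_dates, period_start, period_end):
    """Average completed workflow runs per calendar day, both endpoints inclusive."""
    days = (period_end - period_start).days + 1
    in_period = [d for d in run_dates if period_start <= d <= period_end]
    return len(in_period) / days


# Hypothetical project: six runs completed over a three-day period.
completed = [
    date(2023, 8, 1), date(2023, 8, 1), date(2023, 8, 1),
    date(2023, 8, 2),
    date(2023, 8, 3), date(2023, 8, 3),
]
print(throughput_per_day(completed, date(2023, 8, 1), date(2023, 8, 3)))  # 2.0
```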

Slide 32

Slide 33

Slide 34

Throughput benchmark

Slide 35

Throughput benchmark: It depends.

Slide 36

Throughput: What the data shows
Benchmark: at the speed of your business

Slide 37

Throughput is the metric most dependent on the other metrics

Slide 38

Slide 39

High-performing teams in 2023

Slide 40

The impact of Platform teams

Slide 41

Platform Teams, DevOps, and YOU

Slide 42

No, DevOps is not dead

Slide 43

Slide 44

The Rise of Platform Teams

Slide 45

Slide 46

Platform Perspective: Duration
- Identify and eliminate impediments to developer velocity
- Set guardrails and enforce quality standards across projects
- Standardize test suites & CI configs (shareable configs / policies)
- Welcome failed pipelines, i.e. fast failure
- Actively monitor, streamline, & parallelize pipelines across the org

Slide 47

Platform Perspective: MTTR
- Emphasize the value of deploy-ready default branches
- Set up effective monitoring & alerting systems, track recovery time
- Limit frequency & severity of broken builds w/ role-based policies
- Config- and Infrastructure-as-Code tools limit misconfig potential
- Actively monitor, streamline, & parallelize pipelines across the org

Slide 48

Platform Perspective: Success Rate
- With low success rates, look at MTTR & shorten recovery time first
- Set baseline success rate, aim for continuous improvement, look for flaky tests or test coverage gaps
- Be mindful of patterns & influence of external factors, i.e. decline on Fridays, holidays, etc.
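
For the "look for flaky tests" point, one common heuristic (an assumption here, not something the slides prescribe) is to flag any test that has both passed and failed on the same commit. The sketch below assumes test results are available as hypothetical `(test_name, commit_sha, passed)` tuples.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Return names of tests that both passed and failed on the same commit."""
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in results:
        outcomes[(test_name, commit_sha)].add(passed)
    flaky = {name for (name, _sha), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


# Hypothetical results: test_login flips on commit abc123 without a code change.
results = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),
    ("test_checkout", "abc123", True),
    ("test_checkout", "def456", False),
]
print(find_flaky_tests(results))  # ['test_login']
```

`test_checkout` is not flagged because its failure coincides with a different commit, so it may be a genuine regression rather than flakiness.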

Slide 49

Platform Perspective: Throughput
- Map goals to reality of internal & external business situations, i.e. customer expectations, competitive landscape, codebase complexity, etc.
- Capture a baseline, monitor for deviations
- Alleviate as much developer cognitive load from day-to-day work as possible
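
"Capture a baseline, monitor for deviations" could be sketched as follows. The baseline window of five days and the 50% tolerance are arbitrary, illustrative parameters, not values from the talk.

```python
from statistics import mean


def deviations_from_baseline(daily_counts, baseline_days=5, tolerance=0.5):
    """Flag days whose run count differs from the baseline by more than `tolerance`.

    The baseline is the mean of the first `baseline_days` entries. Returns the
    baseline and a list of (day_index, count) pairs for the flagged later days.
    """
    baseline = mean(daily_counts[:baseline_days])
    flagged = [
        (index, count)
        for index, count in enumerate(daily_counts[baseline_days:], start=baseline_days)
        if abs(count - baseline) > tolerance * baseline
    ]
    return baseline, flagged


# Hypothetical daily workflow-run counts for one project.
counts = [10, 12, 8, 10, 10, 11, 3, 9, 20]
baseline, flagged = deviations_from_baseline(counts)
print(baseline, flagged)
```

With this sample data the baseline is 10 runs per day, and the days with 3 and 20 runs fall outside the tolerance band.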

Slide 50

2023 State of Software Delivery Report: go.jmeiss.me/SoSDR2023

Slide 51

Thank You.
timeline.jerdog.me
@IAmJerdog
@jerdog
/in/jeremymeiss
@jerdog@hachyderm.io

Slide 52
