QA Operating Model

Tithely QA Strategy

A unified quality model across 8 product teams — better releases, fewer defect escapes, faster feedback. AI generates the tests; SDETs are the quality engineers who judge them.

8 product teams · 9-person QE team · 4 KPIs · 18-month horizon

The pipeline — every ticket flows left to right through four gates

Gate 1: PR → Gate 2: Merge → Gate 3: Staging → Gate 4: Production

How to read this deck: it is the operating manual for the QA function — the directions from the QA Manager (Josh Partridge). It defines how every ticket moves to production, what gates block, how we measure coverage, and what every embedded SDET owns. Use the checklists; they persist your progress.

The Model

The Tithely QA Model

Standardize QA across all 8 product teams so we ship higher-quality releases, faster — with one clear gate, one feedback loop, and one definition of done.

The strategic shift

QA Engineer → SDET. Every quality engineer writes code; every test is automated. SDETs are not test authors — they are quality engineers who own the bar.
One SDET per team. Each of the 8 product teams has a dedicated embedded SDET — no shared resources.
Parallel, not sequential. SDETs run alongside developers from refinement onward — quality is built in, not inspected at the end.
AI-first automation. Claude generates ~95% of tests; the SDET reviews, judges, and approves every one.

Two goals — never conflate them

Surge target — End of Q3 2026

Critical paths (P0) green on every high-risk surface within 90 days of full SDET placement. This is not 80% coverage.

Comprehensive horizon — Month 18

80% P1 regression coverage of critical-path journeys. The long game: full regression depth across all 8 teams.

Reporting discipline: "critical paths green" (Q3) and "80% coverage" (Month 18) are two different goals. Never report one as if it were the other to leadership.

Process Flow

QA Process Flow — Ticket to Production

The operational foundation of the strategy: exactly when each test type runs, which environment it targets, and who is responsible. Tests live in a separate test repo that GitHub Actions pulls from at each pipeline stage.

Code-level gates (before QA receives the build): unit tests pass · linting clean · SonarQube quality gate · Snyk scan · unit coverage ≥ 80%

QA Environment (test & quality gates): P0 smoke · P1 regression · DAST security scan · performance/load · multi-device (browsers + iOS/Android)

Production: QA Manager sign-off · canary release (gradual rollout) · smoke + health checks · live monitoring

P0 vs P1 — the test classification

P0 — critical happy path. Journeys that, if broken, cause system-wide customer-visible failure. Surge target: 100% by Q3.
P1 — full regression. The full scenario including edge cases and negative paths. Targets 80–90% functional coverage; builds to Month 18.
A journey is not fully covered until both a P0 and a P1 test exist for it.

The feedback loop

Any gate failure triggers an immediate Slack alert to the SDET and developer with the failing report. The build does not advance until the SDET judges it green. The nomenclature is aligned with bug priorities — a P0 test failure is a release blocker.

CI/CD Gates

The Four-Stage Gate Design

Every team uses the Guild-maintained GitHub Actions template. Teams may add steps; they may not remove or loosen gates. Each stage has a time budget — exceeding it by >20% is a CI failure.

1 · PR — <3 min

Unit tests · linting · type check · SonarQube quality gate · Snyk dependency scan · unit coverage ≥ 80%.

BLOCKING — PR cannot merge

2 · Merge — <8 min

Integration tests · Pact contract verification · P0 critical-path smoke suite (tagged tests from the test repo) · Claude coverage-gap report (non-blocking).

BLOCKING — merge to main fails

3 · Staging — <20 min

P1 full regression (all journeys) · Percy visual snapshot · axe-core a11y audit · OWASP ZAP DAST scan · performance regression (p95 vs baseline).

BLOCKING — staging deploy fails

4 · Production — manual

Josh sign-off · SDET confirms no open regressions · rollback plan documented · canary deploy (30 min synthetic monitoring before full rollout).

BLOCKING — requires explicit approval

Why budgets are enforced as failures: a slow gate gets bypassed. Keeping PR under 3 min and merge under 8 min is what keeps the gate trusted and the feedback loop tight.
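The budget rule above can be sketched as a small check. This is a minimal illustration, not the Guild template's actual implementation: the helper name `exceedsBudget` and the way elapsed time is supplied are assumptions, while the budgets and the 20% tolerance come from the gate design.

```typescript
// Stage budgets in minutes, taken from the four-stage gate design.
const STAGE_BUDGET_MIN = { pr: 3, merge: 8, staging: 20 } as const;
type Stage = keyof typeof STAGE_BUDGET_MIN;

// Hypothetical helper: a run fails the budget check when its elapsed
// time exceeds the stage budget by more than the tolerance (20%).
function exceedsBudget(stage: Stage, elapsedMin: number, tolerance = 0.2): boolean {
  return elapsedMin > STAGE_BUDGET_MIN[stage] * (1 + tolerance);
}
```

For example, a PR run of 3.5 minutes stays within tolerance, while a 4-minute run would be reported as a CI failure.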

Gate Reference

Blocking Checks & Flake Quarantine

What blocks, who owns it, and what happens on failure. Quarantine is the pressure valve that keeps flaky tests from eroding trust in the gate.

Check	Stage	Owner	Failure action
Unit coverage <80%	PR	Dev + SDET	PR blocked; dev adds tests before re-review
SonarQube / Snyk critical	PR	Guild template	PR blocked; resolve, accept with justification, or patch within 48 hrs
Pact contract broken	Merge	Embedded SDET	Merge blocked; consumer or provider fixes and republishes
Smoke (P0) suite failure	Merge	Embedded SDET	Merge blocked; SDET triages within 30 min — fix or quarantine
Claude coverage-gap report	Merge	Embedded SDET	Non-blocking; posted as PR comment; SDET acts within the sprint
E2E (P1) failure (>1 non-quarantined)	Staging	Embedded SDET	Deploy blocked; fix or quarantine within 2 hrs
axe-core critical a11y	Staging	Guild + SDET	Deploy blocked; must resolve before release
Performance p95 >15% over baseline	Staging	Guild stewards	Deploy blocked; eng lead alerted; root cause required
OWASP ZAP high-severity	Staging	Guild + DevSecOps	Deploy blocked; remediation ticket; exception needs CISO sign-off
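As an illustration of the performance row in the table above, the sketch below computes a nearest-rank p95 from latency samples and compares it with a stored baseline. The function names and the nearest-rank method are assumptions; the real gate runs in k6 and may compute percentiles differently.

```typescript
// Nearest-rank p95: the smallest sample such that at least 95% of
// samples are less than or equal to it.
function p95(samplesMs: number[]): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Gate rule from the table: block when p95 is more than 15% over baseline.
function p95Regressed(currentMs: number[], baselineP95Ms: number, threshold = 0.15): boolean {
  return p95(currentMs) > baselineP95Ms * (1 + threshold);
}
```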

Flake quarantine: a test is flaky if it fails non-deterministically on 2+ of the last 5 runs with no code change. The SDET tags it @quarantine within 24 hrs — it still runs and logs to the flake dashboard, but is excluded from blocking gates and does not count toward coverage. Resolution SLA: 5 business days to fix or delete. A re-promoted test must pass 10 consecutive runs before re-entering the gate. @ai-generated flake rate is tracked separately — if it exceeds the human baseline, the prompts get a Guild review.
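The quarantine rule can be expressed as two small predicates. This is a sketch under stated assumptions: the `RunResult` shape and the `codeChanged` flag are hypothetical, and "non-deterministic" is approximated here as "failed on a run where the code under test did not change".

```typescript
// Hypothetical record of one CI run of a single test.
type RunResult = { passed: boolean; codeChanged: boolean };

// Flaky per the rule above: 2+ failures in the last 5 runs with no code change.
function isFlaky(history: RunResult[]): boolean {
  const lastFive = history.slice(-5);
  const unexplainedFailures = lastFive.filter((r) => !r.passed && !r.codeChanged).length;
  return unexplainedFailures >= 2;
}

// Re-promotion: the 10 most recent runs must all pass.
function canRepromote(history: RunResult[]): boolean {
  const lastTen = history.slice(-10);
  return lastTen.length === 10 && lastTen.every((r) => r.passed);
}
```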

KPIs

Coverage Definition & The 4 KPIs

Coverage = % of identified critical-path journeys covered by at least one passing automated P0 or P1 test in Zephyr Scale. Not line coverage. Not test-case count. A journey is "covered" when an automated test exists, passes on the last 5 consecutive CI runs, and is tagged to that journey in Zephyr. This is the single definition used in every dashboard, gate check, and leadership report.
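A minimal sketch of this definition, assuming a hypothetical `Journey` record exported from Zephyr Scale that carries the recent CI results of the automated test tagged to it (the field names are illustrative, not Zephyr's API):

```typescript
type Journey = {
  id: string;
  // CI results for the automated test tagged to this journey, newest last.
  // Empty when no automated test is tagged.
  recentRuns: boolean[];
};

// Covered: an automated test exists and passed the last 5 consecutive runs.
function isCovered(journey: Journey): boolean {
  const lastFive = journey.recentRuns.slice(-5);
  return lastFive.length === 5 && lastFive.every(Boolean);
}

// Coverage = covered journeys as a percentage of all identified journeys.
function coveragePercent(journeys: Journey[]): number {
  if (journeys.length === 0) return 0;
  const covered = journeys.filter(isCovered).length;
  return (covered * 100) / journeys.length;
}
```

Note that a journey with a test that passed only four of its last five runs counts as uncovered, which is stricter than "a test exists".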

The four metrics — tracked monthly

P0/P1 bug volume: < 8 / mo. A 25% reduction, tied to the org-wide OKR (Q4 2026).

Defect escape rate: < 5%. Production bugs ÷ total bugs ('Found in Environment = Production' in Jira). Month 18.

P1 regression coverage: 80%. % of journeys with a passing P1 test. Month 18.

Flaky test rate: < 2%. Non-deterministic failures ÷ total. Month 18.
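The two ratio metrics can be computed as below. This is a sketch: the function names are hypothetical, and reading the flaky-rate denominator as "total test runs in the period" is an assumption, since the metric definition only says "total".

```typescript
// Defect escape rate: production bugs as a percentage of all bugs found.
function escapeRatePercent(productionBugs: number, totalBugs: number): number {
  return totalBugs === 0 ? 0 : (productionBugs * 100) / totalBugs;
}

// Flaky test rate: non-deterministic failures as a percentage of the total.
// Assumption: "total" means total test runs in the reporting period.
function flakyRatePercent(nonDeterministicFailures: number, totalRuns: number): number {
  return totalRuns === 0 ? 0 : (nonDeterministicFailures * 100) / totalRuns;
}

// Month-18 targets: escape rate below 5% and flaky rate below 2%.
function meetsMonth18Targets(escapePct: number, flakyPct: number): boolean {
  return escapePct < 5 && flakyPct < 2;
}
```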

Milestone targets

Milestone	P0 coverage	P1 coverage	Escape rate	Flaky rate	Date
P0 surge complete	100%	baseline	baseline	baseline	End of Q3 2026
Gate 1 — all teams reporting	100%	≥30%	<15%	<10%	Month 6
Gate 2 — coverage build	100%	≥60%	<10%	<5%	Month 10
Gate 3 — year-1 review	100%	≥70%	<7%	<3%	Month 12
Comprehensive horizon	100%	≥80%	<5%	<2%	Month 18

Surge Plan

Surge Coverage Plan & 90-Day Burn-Down

The surge target is critical paths green on every high-risk surface within 90 days of the last SDET being placed and productive — infrastructure first, coverage second.

High-risk surfaces — day-90 targets (measured from July 1)

Team	Paths	Day 30	Day 60	Day 90
Enterprise Giving	12	4	8	12/12
SMB / YoY Giving	11	3	7	10/11
People Participation	12	3	7	10/12
People Core	12	2	5	9/12
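As a quick arithmetic check on the table above, the day-90 targets can be summed across the four high-risk teams. The data below is copied from the table; the variable names are illustrative.

```typescript
const highRiskSurfaces = [
  { team: "Enterprise Giving", paths: 12, day90: 12 },
  { team: "SMB / YoY Giving", paths: 11, day90: 10 },
  { team: "People Participation", paths: 12, day90: 10 },
  { team: "People Core", paths: 12, day90: 9 },
];

// Sum total paths and day-90 green targets across the high-risk teams.
const surgeTotals = highRiskSurfaces.reduce(
  (acc, t) => ({ paths: acc.paths + t.paths, day90: acc.day90 + t.day90 }),
  { paths: 0, day90: 0 },
);
// surgeTotals is { paths: 47, day90: 41 }, i.e. 41 of 47 high-risk paths green at day 90.
```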

Medium / lower risk — paths green by month 6

Comms & Content (8 paths) — email/SMS send, unsubscribe, bounce, template editor
Org Mgmt & Tooling (9 paths) — billing, RBAC admin, integration settings
Growth Team (7 paths) — onboarding, trial conversion, feature flags
Elvanto (10 paths) — service planning, roster, volunteer scheduling

How Giving hits day-90 green

Giving is treated as a migration project, not a standard ramp. The Enterprise Giving SDET runs a dedicated 2-week Stripe surface-mapping spike (days 1–14). Every critical path gets a Stripe sandbox from day 1 — no path is marked green without a passing test against the sandbox.

Register total: 81 journeys across 8 teams (47 high-risk, 34 medium/lower-risk) · 41/47 high-risk green at surge.

First Win

P0 Pipeline Wiring — First Steps Per Team

Wiring even a handful of tagged P0 tests into the QA-deploy gate proves the flow and delivers visible value before full coverage is built. Coverage % is irrelevant at this stage — the goal is to close the loop.

The wiring pattern — same for every team

Step 1: Tag 3–5 existing Zephyr cases that cover a critical happy path as @P0. That's enough to start.
Step 2: Add a GitHub Actions workflow that triggers on every deploy to QA, checks out the central test repo, runs only @P0 tests, and posts pass/fail to Slack within 5 minutes.
Step 3: Set the workflow as a blocking status check. Any QA deploy with a failing P0 test stops and notifies the developer immediately.
Done looks like: developer deploys to QA → P0 suite runs automatically → result in Slack in under 5 minutes.

Tagging order (highest priority first)

Week 2: Enterprise Giving (Stripe migration), SMB / YoY Giving
Week 3: People Participation, People Core
Weeks 4–6: Comms, Org Mgmt, Elvanto, Growth

P0 trigger template

.github/workflows/p0-suite.yml

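# Runs the @P0 suite against QA. The QA deploy job is assumed to dispatch this workflow.
name: P0 suite
on:
  workflow_dispatch:
    inputs:
      environment:
        description: Target environment
        type: string
        default: qa
jobs:
  p0-suite:
    runs-on: ubuntu-latest
    steps:
      # Check out the central test repo rather than the product repo
      - uses: actions/checkout@v4
        with:
          repository: tithely/qa-test-repo
          token: ${{ secrets.QA_REPO_TOKEN }}
      # Node version is an assumption; match the test repo's toolchain
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # A fresh runner has no Playwright browsers installed
      - run: npx playwright install --with-deps
      - run: npx playwright test --grep @P0
        env:
          BASE_URL: ${{ vars.QA_BASE_URL }}
          STRIPE_TEST_KEY: ${{ secrets.STRIPE_TEST_KEY }}
      - name: Post result to Slack
        if: always()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            { "text": "P0 suite: ${{ job.status }} on QA deploy" }
        env:
          # v1 of the action reads the webhook from env; the secret name is an assumption
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK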

The config is identical for every team — the only variable is which tests get the @P0 tag first.

Team Shape

Team Shape & SDET Assignments

9 QE members in total (the QA Manager plus 8 SDETs) cover 8 product teams. All SDETs report directly to the QA Manager. One dedicated SDET is embedded per team, with no shared resources.

Headcount & reporting

Role	Count	Notes
QA Manager	1	Josh Partridge — strategy, Guild, vendor mgmt, escalations, final prod sign-off
Internal SDET	1	Embedded on a high-risk product team
Build Online SDETs	7	Embedded across the remaining 7 teams
Total	9	One SDET in each of the 8 teams

What an embedded SDET owns

Build P0 & P1 automation against the team's critical paths
Use and contribute to the standard framework + prompt library
Improve quality in refinement and implementation, not just at the end
Report coverage, escapes, and flake rate weekly

Risk tier & surge priority

Team	Risk	Surge priority
Enterprise Giving	High	Day 90: 12/12 green
SMB / YoY Giving	High	Day 90: 10/11 green
People Participation	High	Day 90: 10/12 green
People Core	High	Day 90: 9/12 green
Comms & Content	Medium	Paths green by month 6
Org Mgmt & Tooling	Medium	Paths green by month 6
Elvanto	Medium	Paths green by month 6
Growth Team	Lower	Paths green by month 6

Concentration risk is real: 7 of 9 are from one vendor. Mitigated by IP ownership of all test code, a versioned Guild prompt library, a 60-day knowledge-transfer requirement, and a secondary vendor identified by month 3.

AI-First

AI-First Automation Approach

Building this coverage across 8 teams in 18 months is not feasible at human-only throughput. Claude generates the tests; the SDET is the quality engineer who reviews every one before it ships.

What Claude generates

Happy-path tests from specs · edge-case parameterisation
API scaffold from OpenAPI schemas · negative-path enumeration
Regression tests from bug reports
Coverage-gap analysis on PR diffs
Boilerplate — data factories, page objects, fixtures

What the SDET must supply

Domain judgment (Stripe migration risk, child-safety rules)
Risk-based prioritisation — which tests run first
Verifying assertions actually catch real bugs
Upstream requirements review · exploratory & adversarial testing
Prompt quality, library curation, mutation-test oversight

Mutation testing — the real quality signal. Flake rate alone does not prove tests catch bugs. Monthly, Stryker (JS/TS) or PITest (Java) introduces a small deliberate code change (e.g. > to ≥). A good test kills the mutant — it fails. A tautology test does not. Target: ≥70% mutation score by month 12; teams below 60% get their prompts reviewed.

The tautology-test problem. The most common AI failure mode is a test that passes regardless of what the code does. Before approving any AI-generated test, the SDET must be able to answer: "Can I make this test fail by introducing a real bug?" "Claude generated it" is never sufficient to approve.
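The difference between a tautology test and a test that kills a mutant can be shown in a few lines. Everything here is a hypothetical stand-in: `exceedsLimit` is not Tithely code, and the mutant is written by hand to mimic the `>` to `>=` change a mutation tool would introduce.

```typescript
type Impl = (amountCents: number, limitCents: number) => boolean;

// Code under test (hypothetical): true only when the amount is over the limit.
const exceedsLimit: Impl = (amountCents, limitCents) => amountCents > limitCents;

// Hand-written mutant of the kind a mutation tool generates: > changed to >=.
const exceedsLimitMutant: Impl = (amountCents, limitCents) => amountCents >= limitCents;

// Tautology test: only checks that a boolean comes back, so it passes
// for any implementation and the mutant survives.
function tautologyTest(impl: Impl): boolean {
  return typeof impl(500, 500) === "boolean";
}

// Boundary test: pins behaviour exactly at the limit, so the mutant fails it.
function boundaryTest(impl: Impl): boolean {
  return impl(500, 500) === false && impl(501, 500) === true;
}
```

Both tests pass against the real code, but only `boundaryTest` fails against the mutant. That is the property the SDET is asked to confirm before approving a generated test.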

Stripe Migration

Stripe Migration Coverage

The highest-priority testing surface in the strategy. It touches both Giving teams, runs in parallel with SDET onboarding, and has zero tolerance for defect escape given the financial and trust implications.

Payment methods

Card (3DS, soft/hard decline, card update) · ACH (verification, micro-deposit, return codes) · Apple Pay · Google Pay · card-present kiosks (NFC, receipt).

SCA / 3DS flows

3DS2 frictionless / challenge / failed · SCA exemptions (low-value, MIT, recurring) · 3DS1 fallback · recurring off-session re-auth. All against Stripe's test card matrix.

Recurring lifecycle

Create (weekly/monthly/annual) · schedule change · pause/resume · cancel · Smart Retries · dunning · leap-year & month-end edge cases.

Refunds & disputes

Full / partial refund · ACH refund timing · chargeback webhook handling · evidence submission · multi-fund refund allocation.

API authz & HMAC fuzzing (not just happy-path). The transactions HMAC issue class requires dedicated security testing: HMAC signature-bypass attempts, replay attacks outside the timestamp window, malformed webhook payloads, and missing/invalid idempotency keys. These run in the staging ZAP DAST scan and as a dedicated Stripe authz suite in the Giving merge gate.
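To make the negative cases concrete, here is a minimal sketch of a signature check with a replay window, using Node's built-in `crypto` module. The signed-payload format (`timestamp.body`), the 300-second tolerance, and the function names are assumptions for illustration; they are not taken from the transactions service or from Stripe's SDK.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical signer, used here only to build test inputs.
function signWebhook(body: string, timestampSec: number, secret: string): string {
  return createHmac("sha256", secret).update(`${timestampSec}.${body}`).digest("hex");
}

// Hypothetical verifier covering the cases a fuzz suite should probe.
function verifyWebhook(
  body: string,
  timestampSec: number,
  signatureHex: string,
  secret: string,
  nowSec: number,
  toleranceSec = 300,
): boolean {
  // Replay protection: reject timestamps outside the tolerance window.
  if (Math.abs(nowSec - timestampSec) > toleranceSec) return false;
  const expected = createHmac("sha256", secret).update(`${timestampSec}.${body}`).digest();
  // Malformed hex decodes to a shorter buffer and fails the length check.
  const provided = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return provided.length === expected.length && timingSafeEqual(provided, expected);
}
```

A dedicated authz suite would assert that a tampered body, a stale timestamp, and a malformed or missing signature are all rejected, not only that a valid request is accepted.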

Environment strategy: dedicated Stripe sandbox per environment tier · test card matrix automated in the Giving data factory · webhook replay tool (Stripe CLI) for local dev · no production Stripe keys in CI · Stripe API version pinned (upgrades need Giving SDET sign-off) · sandbox refresh on every staging deploy.

Standards

Test Standards & Patterns

Consistency across 8 teams is what makes the shared library and prompt library work. These apply to all test code — AI-generated or human-written.

AAA structure

Arrange: data factory only — no raw DB inserts, no hardcoded IDs, no shared state between tests.
Act: one action per test.
Assert: explicit, specific assertions on outcome — never on implementation detail; no multiple independent assertions per test.
Mutation check: before approving, articulate what code change would make this test fail.
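A minimal sketch of the AAA shape, written without a test framework so it stands alone. The `Donation` type, `buildDonation` factory, and `submitDonation` stand-in are hypothetical; in the real suite these would be the Guild data factory and the system under test, driven through Playwright.

```typescript
type Donation = { amountCents: number; cardExpired: boolean };
type SubmitResult = { ok: true } | { ok: false; error: string };

// Hypothetical data factory: fresh data per call, no hardcoded IDs or shared state.
function buildDonation(overrides: Partial<Donation> = {}): Donation {
  return { amountCents: 2500, cardExpired: false, ...overrides };
}

// Stand-in for the system under test.
function submitDonation(donation: Donation): SubmitResult {
  return donation.cardExpired ? { ok: false, error: "card_declined" } : { ok: true };
}

function submitDonation_withExpiredCard_showsDeclinedError(): void {
  // Arrange: data comes from the factory only.
  const donation = buildDonation({ cardExpired: true });
  // Act: exactly one action.
  const result = submitDonation(donation);
  // Assert: one specific outcome, not an implementation detail.
  if (result.ok || result.error !== "card_declined") {
    throw new Error(`expected card_declined, got ${JSON.stringify(result)}`);
  }
}
```

The mutation check for this test is easy to state: if `submitDonation` stopped rejecting expired cards, the assertion would fail.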

Page objects & shared utilities

All Playwright tests use Guild-maintained page objects — no raw DOM selectors in test files.
Data factories handle all test data; auth/session helpers are shared.
Stripe test-card constants live in the Giving data factory — no hardcoded card numbers anywhere else.

Naming conventions

Test names: [subject]_[condition]_[expectedOutcome]

submitDonation_withExpiredCard_showsDeclinedError
fetchDonorHistory_whenLoggedOut_returns401
recurringCharge_onMonthEnd_rollsToValidDate
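The convention is simple enough to lint. The pattern below is an assumption inferred from the three examples (three camelCase segments, each starting with a lowercase letter, joined by underscores); it is not an existing Guild rule.

```typescript
// subject_condition_expectedOutcome, each segment in camelCase.
const TEST_NAME_PATTERN = /^[a-z][A-Za-z0-9]*_[a-z][A-Za-z0-9]*_[a-z][A-Za-z0-9]*$/;

// Hypothetical lint helper for test names.
function isValidTestName(name: string): boolean {
  return TEST_NAME_PATTERN.test(name);
}
```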

Tag taxonomy

Tag	Meaning
`@P0` / `@P1`	Critical happy path / full regression
`@journey-[team]-[id]`	e.g. `@journey-giving-003`
`@ai-generated`	Flake rate tracked separately
`@quarantine`	Excluded from blocking gate
`@regression-[bug-id]`	Written from an escaped defect

Manual cases (Zephyr Scale): precondition, step-by-step actions, expected result per step, pass/fail criteria, journey tag. Exploratory and hardware protocols follow their own logged standard.

Maintenance

Iteration, Maintenance & Exploratory

Tests are code — they decay and need maintenance. This is who owns what, and how the loop closes. SDETs also gate quality before dev starts, because AI amplifies whatever is in the requirements.

The maintenance loop

Trigger	Process	Owner	SLA
Feature changes	SDET reviews diff in PR; updates affected tests (re-prompts if AI-generated) before merge	Embedded SDET	Same sprint
Feature removed	Archive/delete tests; remove journeys from Zephyr; update prompt library	Embedded SDET	Same sprint
Escaped defect	Root-cause the missed journey; write `@regression-[bug-id]` test; document prompt gap	SDET + Josh	Within 5 business days of fix
Exploratory finding	If it's a register gap, add the journey and prompt Claude for coverage; else file a bug	Embedded SDET	Register updated in 2 business days
Mutation score <60%	Guild steward reviews prompts; SDET re-generates weak suite sections	Guild steward + SDET	Same monthly cycle

Exploratory cadence — 100% human

One 60–90 min charter-driven session per sprint per SDET, minimum (Giving: two). Protected in sprint planning.
Focus: new features, AI coverage gaps, recent incidents, low mutation-score areas.
Notes logged in Jira within 24 hrs; gaps feed the register within 2 business days.

Upstream requirements validation

Refinement: SDET reviews stories for testability. Are the acceptance criteria (AC) specific enough for AI to generate tests from?
Definition of Ready: no SDET sign-off → story doesn't start dev.
Definition of Done: dev can't close a story without SDET QE sign-off in Jira.

Security

Security Testing Depth

Security is integrated into the pipeline and tied to the existing security remediation program. Findings log directly into that backlog — not a separate board.

Test type	Tool	Stage	Blocking?	Scope
Dependency scan	Snyk	PR	Yes (crit/high)	All dependencies; zero critical findings allowed; high findings need a remediation plan within 48 hrs
SAST	SonarQube	PR	Yes	SQLi, XSS, hardcoded secrets, insecure deserialization
Secret scanning	GitHub Adv. Security	PR	Yes	Stripe keys, API tokens, DB credentials
DAST (authenticated)	OWASP ZAP	Staging	Yes (high)	All auth'd endpoints; OWASP Top 10; session, CSRF
API authz fuzzing	ZAP + k6	Staging (Giving)	Yes	HMAC bypass, replay, missing auth, IDOR, scope escalation
Stripe webhook authz	Custom suite	Merge (Giving)	Yes	Signature validation, replay window, malformed/missing HMAC
Penetration test	External vendor	Quarterly	N/A	Full surface → remediation backlog

Severity mapping: Critical = P0 (block release) · High = P1 (48 hr SLA) · Medium = P2 (next sprint) · Low = backlog. The QA Manager reviews the remediation backlog weekly and flags any item that has slipped its SLA. The monthly QA report includes a security burn-down alongside coverage.
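The severity mapping and the weekly SLA check can be sketched as a lookup table. The type and function names are hypothetical. Only the High tier has an hour-based SLA in the mapping above, so it is the only tier given one here.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Triage = { priority: "P0" | "P1" | "P2" | "backlog"; action: string; slaHours?: number };

// Mapping as stated above: Critical = P0, High = P1, Medium = P2, Low = backlog.
const SEVERITY_MAP: Record<Severity, Triage> = {
  critical: { priority: "P0", action: "block release" },
  high: { priority: "P1", action: "48 hr SLA", slaHours: 48 },
  medium: { priority: "P2", action: "next sprint" },
  low: { priority: "backlog", action: "backlog" },
};

// Hypothetical weekly-review helper: has a finding outlived its hour-based SLA?
function hasSlippedSla(severity: Severity, ageHours: number): boolean {
  const sla = SEVERITY_MAP[severity].slaHours;
  return sla !== undefined && ageHours > sla;
}
```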

Toolchain

Toolchain & Technology Stack

One stack across all 8 teams. Deviations require Guild approval. Free / open-source first — adopt these before evaluating paid alternatives.

Category	Tool	Notes
AI test generation	Claude (Code + API)	Primary test author; SDET reviews all output
E2E / UI	Playwright	Multi-browser; TS-first; Claude generates to this framework
API / contract	Supertest / REST Assured + Pact	Pact Broker hosted by Guild (self-hosted to start)
Unit	Jest / JUnit / pytest	Matched to each team's language stack
Mutation testing	Stryker (JS/TS) / PITest (Java)	Monthly; ≥70% score target by month 12
Load / perf	k6	Budgets & profiles enforced as a staging CI gate
Visual regression	Percy / Chromatic	Snapshot diffs on every staging deploy
Accessibility	axe-core + Playwright	WCAG 2.1 AA; 0 critical violations gate
DAST / SAST / deps	OWASP ZAP / SonarQube / Snyk	Authenticated scan + PR gates
Test management	Zephyr Scale	Journey register, coverage %, AI-vs-human tracking
Dashboards	Grafana + Allure	Coverage, escape rate, mutation score, flake, perf trends
CI/CD	GitHub Actions	4-stage gate design; Guild templates

Realistic year-1 tooling cost: ~$8k–$15k. Most of the stack is open source. Net-new paid spend is usage-based Anthropic API, Chromatic Pro, Zephyr Scale, and SonarCloud — self-host the Pact Broker and use OSS k6 to start.

Roadmap

Roadmap & Programme Gates

June 2026 is the placement window (PI2 begins Jun 24). Month 1 = July 2026 — full operational start. All surge targets (Day 90) are measured from July 1.

Programme gates & KPI milestones

Milestone	When	Target
Day 90 — Surge target	End of Q3 2026	Critical paths green on all 4 high-risk surfaces
Gate 1 — all teams live	Month 6 (Dec 2026)	≥30% coverage; P0 suites active on 8/8 teams
Gate 2 — coverage build	Month 10 (Apr 2027)	≥60% coverage; escape <10%; mutation ≥60%
Gate 3 — year-1 report	Month 12 (Jun 2027)	≥70%; vendor conversion recommendation
Comprehensive horizon	Month 18 (Dec 2027)	≥80%; escape <5%; flaky <2%; mutation ≥70%

Program increments

PI2 (Jun 24 – Sep 1): foundation buildout + 90-day surge on all four high-risk surfaces.
PI3 (Sep 2 – Nov 10): medium-risk teams to critical-path green; hit Gate 1.
PI4 (Nov 11 →): coverage build toward Gate 2 and beyond.

Activating the plan — what's next

Raise PI2 epics in Jira under Production Confidence — one per team; align with each PM + EM before raising.
Road show with product/eng — walk each team through this plan; gather feedback.
Wire the first P0 suites to the QA-deploy gate (see "First Win").
Raise PI3 epics next cycle — medium-risk teams' P0 + start of P1 regression.

Onboarding

SDET Onboarding & AI Orientation

Every SDET completes the 30-day baseline track. Those without prior AI testing experience add a 3–5 day AI orientation in week 1. This is your first month — track it.

30-day baseline track

Week 1: product walkthrough with PM · codebase overview with dev lead · dev env + Stripe sandbox setup · Guild intro · review your team's critical-path register.
Week 2: Playwright hands-on (write one critical-path E2E manually) · CI walkthrough · Jira + Zephyr setup · shared utilities · Grafana + Allure orientation.
Week 3: first sprint planning with your team · identify top 3 journey gaps · draft your surge coverage plan · review the Guild prompt library.
Week 4: submit first AI-generated PR · gap analysis to Josh · participate in first Guild meeting.

3–5 day AI orientation

Day 1–2: Claude Code + API hands-on; prompt structure; context injection with journey register + API schema.
Day 3–4: generate a test from a real spec; compare to a manual equivalent; spot tautology tests; practice the re-prompt loop.
Day 5: run Stryker on sample output; interpret mutation score; debrief with Josh.

Resources & references

Playwright

E2E framework: selectors, fixtures, page objects, the @tag grep we use to wire P0 suites.

playwright.dev/docs/intro

Stryker Mutator

Mutation testing for JS/TS — how mutants are generated and how to read a mutation score.

stryker-mutator.io/docs

Pact (Contract Testing)

Consumer/provider contracts, the Pact Broker, and merge-gate verification.

docs.pact.io

k6 Load Testing

Load profiles, thresholds, and the p95 regression gate enforced at staging.

k6.io/docs

OWASP ZAP

Authenticated DAST scans and API authz fuzzing for the Giving pipeline.

zaproxy.org/docs

Stripe Testing

Test card matrix, SCA/3DS test scenarios, and the webhook replay CLI.

docs.stripe.com/testing

What good looks like at 6 months: you independently own your team's critical-path suite, can articulate the risk profile of any failing test, contribute prompts to the Guild library, and run exploratory sessions without a charter from Josh.