A Saturday Morning Conversation
I was having coffee with a colleague who works in Washington state government IT. We were talking about agentic engineering—how I've been using AI agents to tackle complex software problems—when the conversation took a turn I didn't expect.
"We have this accessibility mandate," they said. "Every agency in the state has to bring their websites and web applications into compliance with WCAG 2.2. And we're stuck."
Stuck wasn't an exaggeration. Their team maintains a portfolio of web applications spanning over a decade of Microsoft development: Classic ASP pages that nobody wants to touch, ASP.NET WebForms applications with auto-generated markup, .NET Framework 4.0 MVC applications, newer .NET Core applications, and everything in between. Many use Telerik controls—a third-party component library that generates its own HTML, much of it outside the developers' direct control.
The standard they're held to has 334 individual checklist items organized across five sections. For each page of each application, someone needs to evaluate every applicable item and document what passes, what fails, and what needs to change.
"We have people who are great at building features and fixing bugs," my colleague continued, "but nobody on the team has deep accessibility expertise. Some of these frameworks—especially the WebForms apps—we barely understand the markup they generate. How do we even assess what needs to change, let alone change it?"
I recognized the problem immediately. It wasn't a lack of motivation or funding. It was the paralysis of scope. When you don't know how big the problem is, you can't plan, you can't prioritize, and you can't make progress.
"Let me try something," I said. "Give me the PDF of the standards you're held to, and point me at one of your public-facing applications. I'll build a proof of concept this morning."
They sent me the PDF. I opened my laptop. By lunch, we had a working system.
The Mandate and the Reality
Every state agency in Washington faces the same requirement: bring web properties into conformance with WCAG 2.2 AA. This isn't optional. It's not a nice-to-have. It's a mandate with real deadlines, real oversight, and real consequences for the people who can't access government services when compliance fails.
The intent is exactly right. Government websites serve everyone—including people who are blind, have low vision, can't use a mouse, have cognitive disabilities, or rely on assistive technologies. When a CAPTCHA image has no text alternative, a blind user literally cannot reset their password. When form fields lack programmatic labels, a screen reader user can't tell what information is being requested. These aren't edge cases. These are people being locked out of services they have a right to access.
The challenge isn't the why. It's the how.
The Stack Reality
If every state agency ran modern single-page applications on a current framework, this would be hard but tractable. The reality is far messier:
| Framework | Era | Markup Control | Accessibility Challenge |
|---|---|---|---|
| Classic ASP | Late 1990s | Full control, but spaghetti code | Nobody wants to refactor, and the developers who wrote it are long gone |
| ASP.NET WebForms | 2002-2012 | **Low** — framework generates HTML from server controls | The `<asp:GridView>` renders a `<table>` you don't directly control. `<asp:TextBox>` may or may not produce a `<label>` association depending on how it's configured |
| .NET Framework MVC | 2009-2019 | High — Razor views produce predictable HTML | Better, but older views weren't written with ARIA attributes or landmark roles in mind |
| .NET Core MVC | 2016-present | High | Most accessible of the legacy stacks, but still needs explicit work |
| Telerik Controls | Any era | **Very low** — renders its own DOM, often complex nested markup | The control library handles rendering; you configure accessibility through API properties that may or may not exist for your version |
Every application in the portfolio is different. Every framework has different constraints. The team maintaining them has to be expert in all of them—or more realistically, they're competent in one or two and inherited the rest.
The Scope Problem
WCAG 2.2 AA isn't a simple checklist you can knock out in an afternoon. It's 334 individual requirements organized across five sections:
- Structure and Semantics — page titles, language attributes, heading hierarchy, landmarks, tables, iframes
- Links and Navigation — link purpose, keyboard accessibility, skip links, focus order, navigation consistency
- Images and Visual Design — alt text, color contrast ratios, text reflow, responsive design, visual cues
- User Input, Forms, and Dynamic Content — form labels, ARIA roles, keyboard operability, validation feedback, custom widgets
- Multimedia, Animations, and Motion — captions, audio descriptions, autoplay controls, flashing content
Each item is classified as either Required (must comply for AA conformance) or Best Practice (should comply for optimal accessibility). For every page of every application, each applicable item needs to be evaluated.
Most teams look at this list and freeze. Where do you start? Which pages matter most? Which failures are blocking real users right now versus which are technical best practices? Without answers to these questions, teams either do nothing or do everything—both of which are bad strategies.
What the Standard Actually Requires
Before showing how we approached the audit, it's worth understanding what the WCAG 2.2 standard looks like in practice. Not every item applies to every page, and understanding the structure helps explain how the AI agents specialize.
The Five Sections
Section 1: Structure and Semantics is foundational. If your page doesn't have a `lang` attribute on the `<html>` element, a descriptive `<title>`, a logical heading hierarchy (`<h1>` through `<h6>`), and landmark regions (`<header>`, `<nav>`, `<main>`, `<footer>`), everything downstream suffers. Screen readers use this structure to navigate; without it, users lose the very wayfinding the standard exists to guarantee.
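As a sketch, a page skeleton that satisfies these structural items might look like the following. The title and content here are placeholders, not MERIT's actual markup:

```html
<!DOCTYPE html>
<!-- Illustrative skeleton only; title and content are placeholders. -->
<html lang="en">
<head>
  <title>Reset Your Password - Example Agency Portal</title>
</head>
<body>
  <header>
    <nav aria-label="Primary">…site navigation…</nav>
  </header>
  <main>
    <h1>Reset Your Password</h1>
    <h2>Step 1: Verify Your Identity</h2>
    …
  </main>
  <footer>…agency contact and policy links…</footer>
</body>
</html>
```

With this structure in place, a screen reader user can jump directly between landmarks and headings instead of reading the page linearly.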
Section 2: Links and Navigation requires that every link have a discernible purpose, that keyboard users can navigate without a mouse, that skip links let users bypass repeated header content, and that navigation patterns are consistent across pages. This is where the "MERIT Help" links that go to `href="#"` (nowhere) get flagged.
Section 3: Images and Visual Design covers alt text on images, color contrast ratios (4.5:1 for normal text, 3:1 for large text), text reflow at mobile widths without horizontal scrolling, and ensuring that color isn't the sole means of conveying information. This is measurable, data-driven work—you can compute contrast ratios from CSS values.
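The contrast check really is pure arithmetic. Here is a minimal sketch of the WCAG 2.x relative-luminance and contrast-ratio formulas in TypeScript; the function names are mine, not from the proof of concept:

```typescript
// Sketch of the WCAG 2.x contrast-ratio computation. Function names
// are illustrative, not taken from the project's source.

/** Relative luminance of an sRGB color given as 0-255 channel values. */
function relativeLuminance(r: number, g: number, b: number): number {
  const channel = (c: number): number => {
    const s = c / 255;
    // Piecewise sRGB-to-linear conversion from the WCAG definition
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

/** Contrast ratio between two colors; always >= 1 (lighter over darker). */
function contrastRatio(
  fg: [number, number, number],
  bg: [number, number, number]
): number {
  const l1 = relativeLuminance(...fg);
  const l2 = relativeLuminance(...bg);
  const [lighter, darker] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (lighter + 0.05) / (darker + 0.05);
}

// Black on white yields the maximum possible ratio, 21:1.
// WCAG AA requires at least 4.5:1 for normal text, 3:1 for large text.
console.log(contrastRatio([0, 0, 0], [255, 255, 255]).toFixed(1)); // "21.0"
```

Because the inputs are just the computed colors from the rendered CSS, this is exactly the kind of mechanical check an agent can run on every text node of every page.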
Section 4: User Input, Forms, and Dynamic Content is the largest section and the one that hits legacy frameworks hardest. Every form input needs a programmatically associated `<label>`. Required fields need to be designated. Error messages need to be linked to the fields they describe. Custom widgets need proper ARIA roles, states, and properties. Keyboard focus management must be intentional, not accidental.
This is where WebForms and Telerik applications struggle most. A `<label>` element needs a `for` attribute matching the input's `id`, but WebForms generates IDs like `ctl00_ContentPlaceHolder1_txtEmail`, and the label association depends on how the developer configured the server control.
Section 5: Multimedia, Animations, and Motion covers captions, audio descriptions, autoplay controls, and flashing content limits. Most government web applications have minimal multimedia, so this section is frequently N/A—but when it applies, the requirements are strict (e.g., no more than 3 flashes per second).
Required vs. Best Practice
Not everything in the standard is mandatory. Required items must be implemented for WCAG AA conformance. Best Practice items should be implemented but won't cause a conformance failure if missing. This distinction matters for prioritization—a team should fix all Required violations before addressing Best Practice recommendations.
The Proof of Concept: AI-Powered Accessibility Auditing
Here's where Saturday morning gets interesting.
What Are Agentic Engineering Workflows?
For readers who haven't encountered this term: agentic engineering uses AI models not as chatbots you ask questions to, but as autonomous agents that perform work. You give an agent a task, tools, and context. It plans its approach, executes steps, makes decisions, and produces results—with human oversight at key checkpoints.
The tools I use are Claude Code as the AI backbone and Playwright for browser automation. Claude Code can navigate web pages, take screenshots, inspect the DOM, simulate keyboard interactions, and evaluate what it finds against a set of standards.
The Architecture: Five Specialist Agents
Rather than having one agent try to evaluate all 334 checklist items—which would overwhelm its context and produce shallow results—I built a team of five specialist agents, each expert in one section of the WCAG standard:
The orchestrator navigates to each page, captures a screenshot (visual evidence) and a DOM snapshot (semantic evidence), extracts color values for contrast measurement, and runs a keyboard tab sequence to test focus order. Then it dispatches all five specialist agents in parallel; each receives the evidence plus its specific section of the standard.
Each specialist evaluates every applicable item in their section and returns structured findings: what passed, what failed, the WCAG criterion reference, severity, and a proposed fix.
The orchestrator collects all five agents' results and assembles a page report.
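The fan-out/merge step can be sketched as ordinary code. This is an illustrative simplification, not the project's actual implementation: all type and function names are hypothetical, and the evidence-gathering (Playwright screenshots, DOM capture, tab sequencing) is stubbed down to a couple of fields:

```typescript
// Hypothetical sketch of the orchestrator's fan-out/merge step.
// All names are invented for illustration; evidence capture is stubbed.

interface Finding {
  section: number;        // 1-5, matching the five WCAG checklist sections
  criterion: string;      // e.g. "2.4.1"
  status: "pass" | "fail" | "na";
  severity?: "critical" | "high" | "medium" | "low";
  proposedFix?: string;
}

interface Evidence {
  url: string;
  domSnapshot: string;
  // screenshot, extracted colors, tab sequence, etc. would also live here
}

// Each specialist evaluates one section of the standard against the evidence.
type Specialist = (evidence: Evidence) => Promise<Finding[]>;

/** Dispatch all specialists in parallel and merge results into one report. */
async function auditPage(
  evidence: Evidence,
  specialists: Specialist[]
): Promise<Finding[]> {
  const perSection = await Promise.all(specialists.map((s) => s(evidence)));
  return perSection.flat();
}
```

The design choice worth noting is the parallelism: because each specialist only needs the shared evidence bundle plus its own slice of the standard, the five evaluations are independent and can run concurrently.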
Why This Works Across Any Framework
This is the key insight for government teams dealing with heterogeneous stacks: the audit evaluates the rendered output, not the source code.
It doesn't matter if your page was generated by WebForms, Razor, Telerik, or hand-written HTML. The agent opens a browser, navigates to the URL, and evaluates what arrives in the DOM. A missing `<label>` association is a missing `<label>` association whether the page was built in 2004 or 2024.
This means one audit process works against your entire portfolio. No framework-specific tooling. No source code access required. No build pipeline integration needed for the initial assessment.
The full source code for this proof of concept is available at github.com/NotMyself/dcyf-accessibility.
Real Results: Auditing DCYF's MERIT Application
The application my colleague pointed me to was MERIT—Washington's Managed Education and Registry Information Tool, a professional development and workforce registry operated by the Department of Children, Youth, and Families (DCYF).
What We Audited
MERIT has nine publicly accessible pages (everything behind the login requires credentials we didn't have):
| Page | Route | What It Does |
|---|---|---|
| Welcome | `/MERIT/Home/Welcome` | Landing page with sign-in and training search |
| Find Training (Hub) | `/MERIT/Search` | Navigation hub for three search types |
| Find Training (Search) | `/MERIT/Search/Trainings` | Search form for training courses |
| Find Trainers | `/MERIT/Search/Trainers` | Search form for approved trainers |
| Find Organizations | `/MERIT/Search/Organizations` | Search form for organizations |
| Sign In / Register | `/MERIT/Home/SignInRegister` | Login and registration forms |
| Recover Username | `/MERIT/Home/RecoverUsername` | Account recovery form |
| Forgot Password | `/MERIT/Public/ForgotPassword.aspx` | Password reset with CAPTCHA |
| Find STARS ID | `/MERIT/Home/FindStarsId` | ID lookup form |
Notice that last page: `ForgotPassword.aspx`. That's a legacy WebForms page living alongside the modern MVC pages. This is exactly the mixed-stack reality state agencies deal with.
The Numbers
| Metric | Value |
|---|---|
| Standards items evaluated | 2,484 (across 9 pages) |
| Pass | 974 (39%) |
| Fail | 185 (7%) |
| N/A | 1,325 (53%) |
| **Critical findings** | **10** |
| **High findings** | **28** |
| Medium findings | 87 |
| Low findings | 60 |
The high N/A count is expected—most pages don't have multimedia (Section 5 is largely N/A), and the simpler pages don't have forms (most of Section 4 is N/A). What matters are the failures, especially Critical and High.
The Most Impactful Findings
The Forgot Password page uses an image CAPTCHA—a distorted picture of letters and numbers that users must type to proceed. This CAPTCHA has no alt text and no audio alternative. A blind user visiting this page to reset their password encounters an image they cannot perceive, with no alternative method to complete the challenge.
This means a blind person literally cannot reset their MERIT password. That's not a technical inconvenience—it's a complete access barrier to a state government service.
The fix: replace the custom CAPTCHA with a modern accessible alternative like reCAPTCHA v3 or Cloudflare Turnstile, which handle verification without requiring users to solve visual puzzles.
Across every page with a form—seven of the nine pages audited—form inputs use adjacent text as visual labels but don't use `<label>` elements with `for` attributes to create a programmatic association. This means screen reader users hear the input element but not what it's for.
On the Sign In page, a screen reader user Tab-navigating to the username field hears "edit text" instead of "Username: edit text." They have to guess what each field expects.
This is the most pervasive finding. Every form on the site has this issue. The fix is straightforward—add `<label for="inputId">` elements—but it needs to be applied everywhere.
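The before/after for a typical field looks like this (the `id` value is illustrative, not MERIT's actual markup):

```html
<!-- Before: the text is only visually adjacent; a screen reader
     announces the field as just "edit text" -->
<span>Username</span>
<input type="text" id="username" />

<!-- After: the for/id pair creates the programmatic association,
     so the field is announced as "Username: edit text" -->
<label for="username">Username</label>
<input type="text" id="username" />
```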
None of the nine pages have a "skip to main content" link. Keyboard-only users must Tab through every header element, navigation link, and search bar before reaching the main content on every single page they visit. On pages with extensive navigation, this means dozens of Tab presses before reaching the first form field.
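A conventional skip-link pattern places the link as the very first focusable element and keeps it visually hidden until it receives keyboard focus. The class and `id` names below are illustrative:

```html
<!-- Skip link: first focusable element on the page -->
<body>
  <a href="#main-content" class="skip-link">Skip to main content</a>
  <header>…banner, navigation, search…</header>
  <main id="main-content">…</main>

  <style>
    /* Off-screen by default; revealed only on keyboard focus */
    .skip-link { position: absolute; left: -9999px; }
    .skip-link:focus { left: 8px; top: 8px; }
  </style>
</body>
```

A keyboard user's first Tab press then offers a one-keystroke jump past the repeated header content.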
The Forgot Password page (`ForgotPassword.aspx`) is a legacy WebForms page with a fundamentally different architecture from the rest of the site. It uses table-based layout instead of CSS, has no `<main>` landmark, its page title is the generic site name rather than identifying the page purpose, and the CAPTCHA instructions reference visual content only ("Type the characters you see in the image").
This page alone accounts for 4 of the 10 Critical findings.
What the Reports Look Like
Each page gets a detailed report with findings organized by WCAG section, severity ratings, and proposed fixes. Here's what a findings table looks like:
| # | Topic | WCAG | Level | Severity | Description | Proposed Fix |
|---|---|---|---|---|---|---|
| 1 | Missing Main Landmark | 2.4.1 | Required | High | No `<main>` landmark. Screen reader landmark navigation skips core content. | Wrap content in `<main>` element. |
| 2 | Missing Skip Link | 2.4.1 | Required | High | No skip link. First Tab stop is "Sign In" button. | Add skip link as first focusable element. |
| 3 | MERIT Help Dead Link | 2.4.4 | Required | High | `href="#"` goes nowhere—purpose is indiscernible. | Fix with valid URL or convert to button. |
The executive summary rolls up all page findings with severity counts, per-section compliance rates, and prioritized remediation recommendations.
Where This Goes Next
What I built on a Saturday morning is a proof of concept. It demonstrates that AI agents can meaningfully evaluate web applications against WCAG standards and produce actionable findings. But it's a starting point, not a finished product.
Here's what becomes possible with further development, documented in the project's ENHANCEMENTS.md:
1. Exact Code Fixes
The current audit evaluates the rendered output without access to source code. Given access to the application source, the agents can produce exact fixes—file path, line number, and corrective code.
This works across technology stacks because the agents understand each framework's markup generation patterns. A WebForms fix looks different from an MVC fix, which looks different from a .NET Core fix. The agent knows that adding a `<label>` in WebForms means configuring the `AssociatedControlID` property on an `<asp:Label>` server control, while in MVC it means adding `@Html.LabelFor()` in the Razor view.
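As a hedged sketch, the same label fix in each stack might look like this. Control IDs and the model property are invented for illustration, and the rendered ClientID prefix varies with the page's control hierarchy:

```
<%-- WebForms: AssociatedControlID emits a <label for="..."> that matches
     the control's auto-generated ClientID. IDs are illustrative. --%>
<asp:Label ID="lblEmail" runat="server" Text="Email address"
           AssociatedControlID="txtEmail" />
<asp:TextBox ID="txtEmail" runat="server" />

@* MVC Razor: LabelFor/TextBoxFor generate a matching for/id pair. *@
@Html.LabelFor(m => m.Email, "Email address")
@Html.TextBoxFor(m => m.Email)
```

Both produce the same accessible outcome in the DOM; what differs is where the fix lives in the source, which is exactly why source-aware agents need framework knowledge.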
2. Azure Boards Integration
Given access to your organization's Azure DevOps project, the system can create detailed epics, stories, and tasks for resolving accessibility issues. Each finding becomes a work item with the WCAG reference, severity, affected pages, and remediation steps. Your team gets a structured backlog instead of a PDF to interpret.
3. Pull Request Review
Given access to Azure DevOps pull requests, the system can scan proposed changes for accessibility violations and post feedback directly on the PR. This catches new violations before they reach production—a shift-left approach that prevents accessibility debt from accumulating.
4. Automated Fixes
Given even more autonomy, the system can apply fixes and submit pull requests for team review. It can also update a submitted PR to address issues discovered during its own review. Your team reviews and approves rather than writes every fix from scratch.
5. Additional Standards
The same architecture extends beyond accessibility. Other evaluations can scan source code for OWASP Top 10 security issues, enforce coding standards across a codebase, or flag known anti-patterns. The pattern is general: define the standard, build specialist agents, run them against the codebase.
Why This Matters for Your Agency
Visibility Before Remediation
You can't fix what you can't see. The most common failure mode I observe is teams attempting to remediate without first understanding the scope. They pick a page, start fixing issues they notice, and six months later realize they've addressed 3% of the problem.
An automated audit gives you a complete inventory of compliance gaps across your entire portfolio in hours, not months. You see every Critical finding that's blocking real users right now, every High finding that's creating significant barriers, and every Medium and Low finding that needs attention eventually.
Prioritization That Makes Sense
Not all findings are equal. A CAPTCHA that blocks blind users from resetting their password is a fundamentally different problem than a footer link that could use more descriptive text. The severity framework maps directly to action:
| Severity | What It Means | Action |
|---|---|---|
| **Critical** | Completely blocks access for one or more disability groups | Fix immediately—people are being locked out right now |
| **High** | Significant barrier that makes features very difficult to use | Fix in the current sprint/cycle |
| **Medium** | Moderate impact or best practice with significant effect | Plan for the next cycle |
| **Low** | Best practice not followed, minor impact | Address as part of ongoing maintenance |
This gives leadership a way to allocate resources based on impact rather than guessing.
Framework-Agnostic
The audit evaluates rendered HTML. It doesn't care if your page was generated by WebForms, MVC, Telerik, or hand-written markup. This means one process covers your entire portfolio—no framework-specific tooling, no source code access needed for the initial assessment.
Repeatable
Run the audit before remediation to establish a baseline. Run it again after fixes to verify compliance. Run it quarterly to catch regressions. The process produces comparable results every time, giving you a measurable compliance trajectory instead of subjective assessments.
Built on a Saturday Morning
This proof of concept went from a conversation over coffee to a working system producing real findings against a production application in a single morning. That's not because the problem is trivial—it's because agentic engineering workflows compress effort in ways that weren't possible even a year ago.
Imagine what a focused, multi-week engagement could deliver across your agency's full portfolio.
The Path Forward
The accessibility mandate is achievable. It requires the right approach, the right tools, and realistic expectations about what automated auditing can and can't do.
What AI Auditing Does Well
- Systematically evaluates every applicable standard against every page
- Catches site-wide issues (missing skip links, unlabeled forms) that manual review might inconsistently flag
- Measures contrast ratios, validates heading hierarchies, checks ARIA attributes—the mechanical, repetitive work that humans do slowly and inconsistently
- Produces structured, prioritized findings that development teams can act on directly
What Still Needs Human Judgment
- Evaluating whether alt text meaningfully describes an image (an agent can verify alt text exists, but judging quality requires context)
- Testing with actual screen readers and assistive technologies
- Assessing cognitive load and readability
- Evaluating touch interactions on mobile devices
- Making remediation decisions when fixes involve design trade-offs
The right model is AI auditing for coverage and consistency, with human expertise for judgment calls and validation. The AI handles the 334-item checklist so your accessibility specialists can focus on the problems that require human insight.
Let's Talk
I build custom agentic engineering workflows that solve complex software problems. The accessibility audit demonstrated here is one application of a general approach: define the problem clearly, build specialized agents, and let them work systematically at a scale that human effort alone can't match.
If your agency is facing the accessibility mandate and your team is struggling with scope, legacy systems, or expertise gaps, I can help:
- Audit your portfolio — run the accessibility evaluation against your public-facing and internal applications, across any framework, and deliver prioritized findings your team can act on
- Build remediation workflows — create agents that understand your specific technology stack and produce exact code fixes, work items, and pull requests
- Establish continuous compliance — integrate accessibility review into your development pipeline so new violations are caught before they reach production
The mandate is real. The deadline is real. But the path forward is clearer than you might think.
Get in touch to discuss what a targeted engagement could look like for your agency.
The full proof of concept source code is available at github.com/NotMyself/dcyf-accessibility. The audit results for the DCYF MERIT application are published at notmyself.github.io/dcyf-accessibility.
