A Saturday Morning Conversation
I was having coffee with a colleague who works in Washington state government IT. We were talking about agentic engineering—how I've been using AI agents to tackle complex software problems—when the conversation took a turn I didn't expect.
"We have this accessibility mandate," they said. "Every agency in the state has to bring their websites and web applications into compliance with WCAG 2.2. And we're stuck."
Stuck wasn't an exaggeration. Their team maintains a portfolio of web applications spanning over a decade of Microsoft development: Classic ASP pages that nobody wants to touch, ASP.NET WebForms applications with auto-generated markup, .NET Framework 4.0 MVC applications, newer .NET Core applications, and everything in between. Many use Telerik controls—a third-party component library that generates its own HTML, much of it outside the developers' direct control.
The standard they're held to has 334 individual checklist items organized across five sections. For each page of each application, someone needs to evaluate every applicable item and document what passes, what fails, and what needs to change.
"We have people who are great at building features and fixing bugs," my colleague continued, "but nobody on the team has deep accessibility expertise. Some of these frameworks—especially the WebForms apps—we barely understand the markup they generate. How do we even assess what needs to change, let alone change it?"
I recognized the problem immediately. It wasn't a lack of motivation or funding. It was the paralysis of scope. When you don't know how big the problem is, you can't plan, you can't prioritize, and you can't make progress.
"Let me try something," I said. "Give me the PDF of the standards you're held to, and point me at one of your public-facing applications. I'll build a proof of concept this morning."
They sent me the PDF. I opened my laptop. By lunch, we had a working system.
The Mandate and the Reality
Every state agency in Washington faces the same requirement: bring web properties into conformance with WCAG 2.2 AA. This isn't optional. It's not a nice-to-have. It's a mandate with real deadlines, real oversight, and real consequences for the people who can't access government services when compliance fails.
The intent is exactly right. Government websites serve everyone—including people who are blind, have low vision, can't use a mouse, have cognitive disabilities, or rely on assistive technologies. When a CAPTCHA image has no text alternative, a blind user literally cannot reset their password. When form fields lack programmatic labels, a screen reader user can't tell what information is being requested. These aren't edge cases. These are people being locked out of services they have a right to access.
The challenge isn't the why. It's the how.
The Stack Reality
If every state agency ran modern single-page applications on a current framework, this would be hard but tractable. The reality is far messier:
| Framework | Era | Markup Control | Accessibility Challenge |
|---|---|---|---|
| Classic ASP | Late 1990s | Full control, but spaghetti code | Nobody wants to refactor, and the developers who wrote it are long gone |
| ASP.NET WebForms | 2002-2012 | **Low** — framework generates HTML from server controls | The `<asp:GridView>` renders a `<table>` you don't directly control. `<asp:TextBox>` may or may not produce a `<label>` association depending on how it's configured |
| .NET Framework MVC | 2009-2019 | High — Razor views produce predictable HTML | Better, but older views weren't written with ARIA attributes or landmark roles in mind |
| .NET Core MVC | 2016-present | High | Most accessible of the legacy stacks, but still needs explicit work |
| Telerik Controls | Any era | **Very low** — renders its own DOM, often complex nested markup | The control library handles rendering; you configure accessibility through API properties that may or may not exist for your version |
Every application in the portfolio is different. Every framework has different constraints. The team maintaining them has to be expert in all of them—or more realistically, they're competent in one or two and inherited the rest.
The Scope Problem
WCAG 2.2 AA isn't a simple checklist you can knock out in an afternoon. It's 334 individual requirements organized across five sections:
- Structure and Semantics — page titles, language attributes, heading hierarchy, landmarks, tables, iframes
- Links and Navigation — link purpose, keyboard accessibility, skip links, focus order, navigation consistency
- Images and Visual Design — alt text, color contrast ratios, text reflow, responsive design, visual cues
- User Input, Forms, and Dynamic Content — form labels, ARIA roles, keyboard operability, validation feedback, custom widgets
- Multimedia, Animations, and Motion — captions, audio descriptions, autoplay controls, flashing content
Each item is classified as either Required (must comply for AA conformance) or Best Practice (should comply for optimal accessibility). For every page of every application, each applicable item needs to be evaluated.
Most teams look at this list and freeze. Where do you start? Which pages matter most? Which failures are blocking real users right now versus which are technical best practices? Without answers to these questions, teams either do nothing or do everything—both of which are bad strategies.
What the Standard Actually Requires
Before showing how we approached the audit, it's worth understanding what the WCAG 2.2 standard looks like in practice. Not every item applies to every page, and understanding the structure helps explain how the AI agents specialize.
The Five Sections
Section 1: Structure and Semantics is foundational. If your page doesn't have a `lang` attribute on the `<html>` element, a descriptive `<title>`, a logical heading hierarchy (`<h1>` through `<h6>`), and landmark regions (`<header>`, `<nav>`, `<main>`, `<footer>`), everything downstream suffers. Screen readers use this structure to navigate; without it, users lose the very wayfinding the standard exists to guarantee.
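As a sketch, a page skeleton that satisfies these structural items might look like the following. The title and content here are placeholders, not MERIT's actual markup:

```html
<!DOCTYPE html>
<!-- Illustrative skeleton only; title and content are placeholders. -->
<html lang="en">
<head>
  <title>Reset Your Password - Example Agency Portal</title>
</head>
<body>
  <header>
    <nav aria-label="Primary">…site navigation…</nav>
  </header>
  <main>
    <h1>Reset Your Password</h1>
    <h2>Step 1: Verify Your Identity</h2>
    …
  </main>
  <footer>…agency contact and policy links…</footer>
</body>
</html>
```

With this structure in place, a screen reader user can jump directly between landmarks and headings instead of reading the page linearly.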
Section 2: Links and Navigation requires that every link have a discernible purpose, that keyboard users can navigate without a mouse, that skip links let users bypass repeated header content, and that navigation patterns are consistent across pages. This is where the "MERIT Help" links that go to `href="#"` (nowhere) get flagged.
Section 3: Images and Visual Design covers alt text on images, color contrast ratios (4.5:1 for normal text, 3:1 for large text), text reflow at mobile widths without horizontal scrolling, and ensuring that color isn't the sole means of conveying information. This is measurable, data-driven work—you can compute contrast ratios from CSS values.
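The contrast check really is pure arithmetic. Here is a minimal sketch of the WCAG 2.x relative-luminance and contrast-ratio formulas in TypeScript; the function names are mine, not from the proof of concept:

```typescript
// Sketch of the WCAG 2.x contrast-ratio computation. Function names
// are illustrative, not taken from the project's source.

/** Relative luminance of an sRGB color given as 0-255 channel values. */
function relativeLuminance(r: number, g: number, b: number): number {
  const channel = (c: number): number => {
    const s = c / 255;
    // Piecewise sRGB-to-linear conversion from the WCAG definition
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

/** Contrast ratio between two colors; always >= 1 (lighter over darker). */
function contrastRatio(
  fg: [number, number, number],
  bg: [number, number, number]
): number {
  const l1 = relativeLuminance(...fg);
  const l2 = relativeLuminance(...bg);
  const [lighter, darker] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (lighter + 0.05) / (darker + 0.05);
}

// Black on white yields the maximum possible ratio, 21:1.
// WCAG AA requires at least 4.5:1 for normal text, 3:1 for large text.
console.log(contrastRatio([0, 0, 0], [255, 255, 255]).toFixed(1)); // "21.0"
```

Because the inputs are just the computed colors from the rendered CSS, this is exactly the kind of mechanical check an agent can run on every text node of every page.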
Section 4: User Input, Forms, and Dynamic Content is the largest section and the one that hits legacy frameworks hardest. Every form input needs a programmatically associated `<label>`. Required fields need to be designated. Error messages need to be linked to the fields they describe. Custom widgets need proper ARIA roles, states, and properties. Keyboard focus management must be intentional, not accidental.
This is where WebForms and Telerik applications struggle most. A `<label>` element needs a `for` attribute matching the input's `id`, but WebForms generates IDs like `ctl00_ContentPlaceHolder1_txtEmail`, and the label association depends on how the developer configured the server control.
Section 5: Multimedia, Animations, and Motion covers captions, audio descriptions, autoplay controls, and flashing content limits. Most government web applications have minimal multimedia, so this section is frequently N/A—but when it applies, the requirements are strict (e.g., no more than 3 flashes per second).
Required vs. Best Practice
Not everything in the standard is mandatory. Required items must be implemented for WCAG AA conformance. Best Practice items should be implemented but won't cause a conformance failure if missing. This distinction matters for prioritization—a team should fix all Required violations before addressing Best Practice recommendations.
The Proof of Concept: AI-Powered Accessibility Auditing
Here's where Saturday morning gets interesting.
What Are Agentic Engineering Workflows?
For readers who haven't encountered this term: agentic engineering uses AI models not as chatbots you ask questions to, but as autonomous agents that perform work. You give an agent a task, tools, and context. It plans its approach, executes steps, makes decisions, and produces results—with human oversight at key checkpoints.
The tools I use are Claude Code as the AI backbone and Playwright for browser automation. Claude Code can navigate web pages, take screenshots, inspect the DOM, simulate keyboard interactions, and evaluate what it finds against a set of standards.
The Architecture: Five Specialist Agents
Rather than having one agent try to evaluate all 334 checklist items—which would overwhelm its context and produce shallow results—I built a team of five specialist agents, each expert in one section of the WCAG standard:
The orchestrator navigates to each page, captures a screenshot (visual evidence) and a DOM snapshot (semantic evidence), extracts color values for contrast measurement, and runs a keyboard tab sequence to test focus order. Then it dispatches all five specialist agents in parallel; each receives the evidence plus its specific section of the standard.
Each specialist evaluates every applicable item in their section and returns structured findings: what passed, what failed, the WCAG criterion reference, severity, and a proposed fix.
The orchestrator collects all five agents' results and assembles a page report.
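The fan-out/merge step can be sketched as ordinary code. This is an illustrative simplification, not the project's actual implementation: all type and function names are hypothetical, and the evidence-gathering (Playwright screenshots, DOM capture, tab sequencing) is stubbed down to a couple of fields:

```typescript
// Hypothetical sketch of the orchestrator's fan-out/merge step.
// All names are invented for illustration; evidence capture is stubbed.

interface Finding {
  section: number;        // 1-5, matching the five WCAG checklist sections
  criterion: string;      // e.g. "2.4.1"
  status: "pass" | "fail" | "na";
  severity?: "critical" | "high" | "medium" | "low";
  proposedFix?: string;
}

interface Evidence {
  url: string;
  domSnapshot: string;
  // screenshot, extracted colors, tab sequence, etc. would also live here
}

// Each specialist evaluates one section of the standard against the evidence.
type Specialist = (evidence: Evidence) => Promise<Finding[]>;

/** Dispatch all specialists in parallel and merge results into one report. */
async function auditPage(
  evidence: Evidence,
  specialists: Specialist[]
): Promise<Finding[]> {
  const perSection = await Promise.all(specialists.map((s) => s(evidence)));
  return perSection.flat();
}
```

The design choice worth noting is the parallelism: because each specialist only needs the shared evidence bundle plus its own slice of the standard, the five evaluations are independent and can run concurrently.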
Why This Works Across Any Framework
This is the key insight for government teams dealing with heterogeneous stacks: the audit evaluates the rendered output, not the source code.
It doesn't matter if your page was generated by WebForms, Razor, Telerik, or hand-written HTML. The agent opens a browser, navigates to the URL, and evaluates what arrives in the DOM. A missing `<label>` association is a missing `<label>` association whether the page was built in 2004 or 2024.
This means one audit process works against your entire portfolio. No framework-specific tooling. No source code access required. No build pipeline integration needed for the initial assessment.
The full source code for this proof of concept is available at github.com/NotMyself/dcyf-accessibility.
Real Results: Auditing DCYF's MERIT Application
The application my colleague pointed me to was MERIT—Washington's Managed Education and Registry Information Tool, a professional development and workforce registry operated by the Department of Children, Youth, and Families (DCYF).
What We Audited
MERIT has nine publicly accessible pages (everything behind the login requires credentials we didn't have):
| Page | Route | What It Does |
|---|---|---|
| Welcome | `/MERIT/Home/Welcome` | Landing page with sign-in and training search |
| Find Training (Hub) | `/MERIT/Search` | Navigation hub for three search types |
| Find Training (Search) | `/MERIT/Search/Trainings` | Search form for training courses |
| Find Trainers | `/MERIT/Search/Trainers` | Search form for approved trainers |
| Find Organizations | `/MERIT/Search/Organizations` | Search form for organizations |
| Sign In / Register | `/MERIT/Home/SignInRegister` | Login and registration forms |
| Recover Username | `/MERIT/Home/RecoverUsername` | Account recovery form |
| Forgot Password | `/MERIT/Public/ForgotPassword.aspx` | Password reset with CAPTCHA |
| Find STARS ID | `/MERIT/Home/FindStarsId` | ID lookup form |
Notice that last page: `ForgotPassword.aspx`. That's a legacy WebForms page living alongside the modern MVC pages. This is exactly the mixed-stack reality state agencies deal with.
The Numbers
| Metric | Value |
|---|---|
| Standards items evaluated | 2,484 (across 9 pages) |
| Pass | 974 (39%) |
| Fail | 185 (7%) |
| N/A | 1,325 (53%) |
| **Critical findings** | **10** |
| **High findings** | **28** |
| Medium findings | 87 |
| Low findings | 60 |
The high N/A count is expected—most pages don't have multimedia (Section 5 is largely N/A), and the simpler pages don't have forms (most of Section 4 is N/A). What matters are the failures, especially Critical and High.
The Most Impactful Findings
The Forgot Password page uses an image CAPTCHA—a distorted picture of letters and numbers that users must type to proceed. This CAPTCHA has no alt text and no audio alternative. A blind user visiting this page to reset their password encounters an image they cannot perceive, with no alternative method to complete the challenge.
This means a blind person literally cannot reset their MERIT password. That's not a technical inconvenience—it's a complete access barrier to a state government service.
The fix: replace the custom CAPTCHA with a modern accessible alternative like reCAPTCHA v3 or Cloudflare Turnstile, which handle verification without requiring users to solve visual puzzles.
Across every page with a form—seven of the nine pages audited—form inputs use adjacent text as visual labels but don't use `<label>` elements with `for` attributes to create a programmatic association. This means screen reader users hear the input element but not what it's for.
On the Sign In page, a screen reader user Tab-navigating to the username field hears "edit text" instead of "Username: edit text." They have to guess what each field expects.
This is the most pervasive finding. Every form on the site has this issue. The fix is straightforward—add `<label for="inputId">` elements—but it needs to be applied everywhere.
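The before/after for a typical field looks like this (the `id` value is illustrative, not MERIT's actual markup):

```html
<!-- Before: the text is only visually adjacent; a screen reader
     announces the field as just "edit text" -->
<span>Username</span>
<input type="text" id="username" />

<!-- After: the for/id pair creates the programmatic association,
     so the field is announced as "Username: edit text" -->
<label for="username">Username</label>
<input type="text" id="username" />
```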
None of the nine pages have a "skip to main content" link. Keyboard-only users must Tab through every header element, navigation link, and search bar before reaching the main content on every single page they visit. On pages with extensive navigation, this means dozens of Tab presses before reaching the first form field.
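A conventional skip-link pattern places the link as the very first focusable element and keeps it visually hidden until it receives keyboard focus. The class and `id` names below are illustrative:

```html
<!-- Skip link: first focusable element on the page -->
<body>
  <a href="#main-content" class="skip-link">Skip to main content</a>
  <header>…banner, navigation, search…</header>
  <main id="main-content">…</main>

  <style>
    /* Off-screen by default; revealed only on keyboard focus */
    .skip-link { position: absolute; left: -9999px; }
    .skip-link:focus { left: 8px; top: 8px; }
  </style>
</body>
```

A keyboard user's first Tab press then offers a one-keystroke jump past the repeated header content.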
The Forgot Password page (`ForgotPassword.aspx`) is a legacy WebForms page with a fundamentally different architecture from the rest of the site. It uses table-based layout instead of CSS, has no `<main>` landmark, its page title is the generic site name rather than identifying the page purpose, and the CAPTCHA instructions reference visual content only ("Type the characters you see in the image").
This page alone accounts for 4 of the 10 Critical findings.
What the Reports Look Like
Each page gets a detailed report with findings organized by WCAG section, severity ratings, and proposed fixes. Here's what a findings table looks like:
| # | Topic | WCAG | Level | Severity | Description | Proposed Fix |
|---|---|---|---|---|---|---|
| 1 | Missing Main Landmark | 2.4.1 | Required | High | No `<main>` landmark. Screen reader landmark navigation skips core content. | Wrap content in `<main>` element. |
| 2 | Missing Skip Link | 2.4.1 | Required | High | No skip link. First Tab stop is "Sign In" button. | Add skip link as first focusable element. |
| 3 | MERIT Help Dead Link | 2.4.4 | Required | High | `href="#"` goes nowhere—purpose is indiscernible. | Fix with valid URL or convert to button. |
The executive summary rolls up all page findings with severity counts, per-section compliance rates, and prioritized remediation recommendations.
Where This Goes Next
What I built on a Saturday morning is a proof of concept. It demonstrates that AI agents can meaningfully evaluate web applications against WCAG standards and produce actionable findings. But it's a starting point, not a finished product.
Here's what becomes possible with further development, documented in the project's ENHANCEMENTS.md:
1. Exact Code Fixes
The current audit evaluates the rendered output without access to source code. Given access to the application source, the agents can produce exact fixes—file path, line number, and corrective code.
This works across technology stacks because the agents understand each framework's markup generation patterns. A WebForms fix looks different from an MVC fix, which looks different from a .NET Core fix. The agent knows that adding a `<label>` in WebForms means configuring the `AssociatedControlID` property on an `<asp:Label>` server control, while in MVC it means adding `@Html.LabelFor()` in the Razor view.
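As a hedged sketch, the same label fix in each stack might look like this. Control IDs and the model property are invented for illustration, and the rendered ClientID prefix varies with the page's control hierarchy:

```
<%-- WebForms: AssociatedControlID emits a <label for="..."> that matches
     the control's auto-generated ClientID. IDs are illustrative. --%>
<asp:Label ID="lblEmail" runat="server" Text="Email address"
           AssociatedControlID="txtEmail" />
<asp:TextBox ID="txtEmail" runat="server" />

@* MVC Razor: LabelFor/TextBoxFor generate a matching for/id pair. *@
@Html.LabelFor(m => m.Email, "Email address")
@Html.TextBoxFor(m => m.Email)
```

Both produce the same accessible outcome in the DOM; what differs is where the fix lives in the source, which is exactly why source-aware agents need framework knowledge.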
2. Azure Boards Integration
Given access to your organization's Azure DevOps project, the system can create detailed epics, stories, and tasks for resolving accessibility issues. Each finding becomes a work item with the WCAG reference, severity, affected pages, and remediation steps. Your team gets a structured backlog instead of a PDF to interpret.
3. Pull Request Review
Given access to Azure DevOps pull requests, the system can scan proposed changes for accessibility violations and post feedback directly on the PR. This catches new violations before they reach production—a shift-left approach that prevents accessibility debt from accumulating.
4. Automated Fixes
Given even more autonomy, the system can apply fixes and submit pull requests for team review. It can also update a submitted PR to address issues discovered during its own review. Your team reviews and approves rather than writes every fix from scratch.
5. Additional Standards
The same architecture extends beyond accessibility. Other evaluations can scan source code for OWASP Top 10 security issues, enforce coding standards across a codebase, or flag known anti-patterns. The pattern is general: define the standard, build specialist agents, run them against the codebase.
Why This Matters for Your Agency
Visibility Before Remediation
You can't fix what you can't see. The most common failure mode I observe is teams attempting to remediate without first understanding the scope. They pick a page, start fixing issues they notice, and six months later realize they've addressed 3% of the problem.
An automated audit gives you a complete inventory of compliance gaps across your entire portfolio in hours, not months. You see every Critical finding that's blocking real users right now, every High finding that's creating significant barriers, and every Medium and Low finding that needs attention eventually.
Prioritization That Makes Sense
Not all findings are equal. A CAPTCHA that blocks blind users from resetting their password is a fundamentally different problem than a footer link that could use more descriptive text. The severity framework maps directly to action:
| Severity | What It Means | Action |
|---|---|---|
| **Critical** | Completely blocks access for one or more disability groups | Fix immediately—people are being locked out right now |
| **High** | Significant barrier that makes features very difficult to use | Fix in the current sprint/cycle |
| **Medium** | Moderate impact or best practice with significant effect | Plan for the next cycle |
| **Low** | Best practice not followed, minor impact | Address as part of ongoing maintenance |
This gives leadership a way to allocate resources based on impact rather than guessing.
Framework-Agnostic
The audit evaluates rendered HTML. It doesn't care if your page was generated by WebForms, MVC, Telerik, or hand-written markup. This means one process covers your entire portfolio—no framework-specific tooling, no source code access needed for the initial assessment.
Repeatable
Run the audit before remediation to establish a baseline. Run it again after fixes to verify compliance. Run it quarterly to catch regressions. The process produces comparable results every time, giving you a measurable compliance trajectory instead of subjective assessments.
Built on a Saturday Morning
This proof of concept went from a conversation over coffee to a working system producing real findings against a production application in a single morning. That's not because the problem is trivial—it's because agentic engineering workflows compress effort in ways that weren't possible even a year ago.
Imagine what a focused, multi-week engagement could deliver across your agency's full portfolio.
The Path Forward
The accessibility mandate is achievable. It requires the right approach, the right tools, and realistic expectations about what automated auditing can and can't do.
What AI Auditing Does Well
- Systematically evaluates every applicable standard against every page
- Catches site-wide issues (missing skip links, unlabeled forms) that manual review might inconsistently flag
- Measures contrast ratios, validates heading hierarchies, checks ARIA attributes—the mechanical, repetitive work that humans do slowly and inconsistently
- Produces structured, prioritized findings that development teams can act on directly
What Still Needs Human Judgment
- Evaluating whether alt text meaningfully describes an image (an agent can verify alt text exists, but judging quality requires context)
- Testing with actual screen readers and assistive technologies
- Assessing cognitive load and readability
- Evaluating touch interactions on mobile devices
- Making remediation decisions when fixes involve design trade-offs
The right model is AI auditing for coverage and consistency, with human expertise for judgment calls and validation. The AI handles the 334-item checklist so your accessibility specialists can focus on the problems that require human insight.
Let's Talk
I build custom agentic engineering workflows that solve complex software problems. The accessibility audit demonstrated here is one application of a general approach: define the problem clearly, build specialized agents, and let them work systematically at a scale that human effort alone can't match.
If your agency is facing the accessibility mandate and your team is struggling with scope, legacy systems, or expertise gaps, I can help:
- Audit your portfolio — run the accessibility evaluation against your public-facing and internal applications, across any framework, and deliver prioritized findings your team can act on
- Build remediation workflows — create agents that understand your specific technology stack and produce exact code fixes, work items, and pull requests
- Establish continuous compliance — integrate accessibility review into your development pipeline so new violations are caught before they reach production
The mandate is real. The deadline is real. But the path forward is clearer than you might think.
Get in touch to discuss what a targeted engagement could look like for your agency.
The full proof of concept source code is available at github.com/NotMyself/dcyf-accessibility. The audit results for the DCYF MERIT application are published at notmyself.github.io/dcyf-accessibility.
