Why Evaluation Literacy Is Becoming the Most Valuable AI Skill in Design
Before we get into it, a quick note: this is a long read.
It comes out of the last two years of hands-on project work, plus many conversations with design leaders, researchers, mentees, and product professionals trying to make sense of the same shift from different angles. I have watched the questions change in real time.
Less: which tool should I learn?
More: what is my role worth when the work becomes easier to generate but harder to trust?
That is the question underneath this article.

There was a time when a designer could prove their value by showing the work.
The wireframes. The polished flows. The deck full of user quotes. The neat little system of components and annotations that made a messy product look under control.
That time is ending.
Not because design is becoming less important. Because the center of gravity is moving.
AI can now generate screens, draft journeys, summarize research, write interface copy, scaffold code, and produce endless variants before lunch. Recent survey data points to the same conclusion from multiple angles: the 2025 State of AI in Design report says 89% of designers report AI has improved their workflow, while 96% say they developed AI skills through self-directed learning. Maze’s 2026 Future of User Research report adds a related signal from research teams: 69% now use AI in at least some part of their workflow. That tells you both how quickly adoption has moved and how immature the operating model still is. That changes the economics of execution. It does not eliminate the need for design. It changes what design is paid to do.
The new value is upstream. It lives in deciding when AI should be used at all, where automation stops, how trust is earned, how failure is handled, how behavior is measured, and who is accountable when a system acts wrong with confidence.
That is the shift most teams still have not fully absorbed.
The future of UX is not artifact creation at a higher speed. It is AI system orchestration. And the practical skill that sits at the center of that shift is evaluation literacy.
Not prompting in isolation. Not tool fluency. Not another carousel of AI-generated mockups.
Evaluation literacy is the ability to define what good looks like, test whether the system is doing it, catch the ways it fails, and improve behavior over time.
That is where design starts becoming strategic again.
The real shift: from deterministic software to probabilistic systems
Most UX methods were built for software that behaved like a machine. Input went in, output came out, and the designer’s job was to reduce friction, eliminate ambiguity, and make the path clear.
AI systems break that bargain.
Traditional software is deterministic. If the logic is sound, the same input should produce the same output every time. Correctness is mostly binary. Error handling is usually about edge cases, bugs, or user mistakes.
AI systems are different. They are probabilistic. They generate responses based on statistical inference, context, and model behavior that can shift over time. The same input can lead to slightly different outputs. Sometimes the output is useful but incomplete. Sometimes it is polished and wrong. Sometimes it looks certain when it should hesitate. Model drift and latent bias make that uncertainty harder to manage over time. That same shift also shows up in current evaluation guidance: generative systems are variable by nature, which is why strong AI products need explicit recovery paths, evaluation loops, and better signals for uncertainty instead of pretending every answer is final.
For teams working in regulated or high-risk contexts, this is not just a design problem. It is increasingly a literacy, oversight, and compliance problem, too.
In a deterministic product, the designer optimizes control. In a probabilistic product, the designer must manage uncertainty.
This is not a semantic difference. It changes what “good UX” means.
| Dimension | Deterministic UX | Probabilistic UX |
|---|---|---|
| Core logic | Rules-based: the same input should produce the same output | Statistical inference: the same input can produce different outputs |
| Definition of correctness | Mostly binary: right or wrong | Contextual: useful, partial, uncertain, or wrong |
| Primary design goal | Control, clarity, and friction reduction | Uncertainty management, recovery, and governability |
| Main failure pattern | Bugs, broken logic, and obvious edge cases | Hallucinations, drift, bias, overconfidence, and weak handoffs |
| User trust model | Trust comes from consistency and predictability | Trust comes from transparency, reversibility, and oversight |
| Role of the designer | Map the ideal path and reduce deviation | Shape behavior under ambiguity and design for recovery |
| Key interface patterns | Validation, guardrails, and error prevention | Confidence gradients, undo, escalation, and feedback loops |
| What measurement looks like | Task success, drop-off, completion, usability issues | Correction rates, retries, escalations, drift, and quality over time |
| What good looks like | Smooth, predictable, efficient interaction | Explainable, recoverable, governable interaction |
| Operating model implication | Design reviews and release checks are often enough | Continuous evaluation, instrumentation, and governance are required |
A good AI experience is not one that feels magical for thirty seconds in a demo. It is one that remains understandable, recoverable, and governable when the model behaves like a model instead of a brochure.
That means new design primitives start to matter (two of them are sketched in code after this list):
- confidence gradients instead of false certainty;
- reversibility instead of one-way automation;
- escalation paths instead of fake self-sufficiency;
- telemetry instead of intuition alone;
- policy and governance instead of hand-wavy trust language.
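Two of those primitives, confidence gradients and escalation paths, are concrete enough to sketch. Here is a minimal TypeScript sketch, assuming a hypothetical 0-to-1 confidence score arriving from the model layer; the thresholds and type names are illustrative, and a real product would calibrate the cutoffs against evaluation data rather than hard-coding them.

```typescript
// Minimal sketch: mapping model confidence to interface treatment.
// Thresholds and type names are illustrative assumptions, not a real API;
// production cutoffs should come from calibration against eval data.

type Treatment =
  | { kind: "present"; undoable: true }              // confident: show it, keep it reversible
  | { kind: "present_with_caveat"; caveat: string }  // uncertain: show it, but say so
  | { kind: "escalate"; route: "human_review" };     // low confidence: hand off, don't guess

function treatmentFor(confidence: number): Treatment {
  if (confidence >= 0.9) {
    return { kind: "present", undoable: true };
  }
  if (confidence >= 0.6) {
    return {
      kind: "present_with_caveat",
      caveat: "This answer may be incomplete. Review before acting.",
    };
  }
  return { kind: "escalate", route: "human_review" };
}
```

The point is not the specific numbers. It is that uncertainty becomes an explicit branch in the interface logic instead of a silent property of the output.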
Designers used to map ideal paths. Now they need to shape systems that remain usable under ambiguity.
That is a harder job. It is also a more valuable one.
Why artifact creation is being repriced
The market is already paying for the shift
If you misunderstand what AI is automating, you will misunderstand where your role needs to go.
A lot of visible design labor is becoming cheaper.
AI can already help produce flows, summarize calls, cluster notes, generate components, rewrite copy, and convert prompts into rough prototypes. It can compress the time between idea and artifact so aggressively that many activities once treated as proof of seniority are starting to look like baseline production tasks.
That does not mean craft is dead. It means craft alone is no longer enough.
The old model rewarded people for making the artifact. The new model increasingly rewards people for defining the system behind the artifact.
| Dimension | Old design value stack | New design value stack |
|---|---|---|
| Primary source of value | Producing artifacts | Shaping system behavior |
| What gets rewarded | Wireframes, flows, mockups, polished deliverables | Judgment, orchestration, evaluation, and governance |
| Core unit of work | Screen, flow, component, deck | Behavior, system, policy, and quality loop |
| Definition of seniority | Better craft, faster output, cleaner handoff | Better decisions, stronger operating models, clearer accountability |
| Relationship to research | Gather findings and synthesize insights | Turn insights into rubrics, datasets, measures, and action |
| Relationship to engineering | Handoff and collaboration | Ongoing instrumentation, evaluation, and system stewardship |
| Role of design systems | Consistency and reuse | Constraint engine for quality, compliance, and AI output |
| Trust and risk | Often handled downstream or separately | Designed into the product through controls and oversight |
| Accessibility | Frequently audited late | Embedded as a structural requirement from the start |
| What makes someone hard to replace | Taste, craft, and execution speed | Evidence-backed judgment, cross-functional influence, and the ability to govern uncertainty |
Defining the system behind the artifact includes questions like:
- Should this workflow be automated or augmented?
- What is the acceptable failure rate?
- What should the system do when confidence is low?
- When should a human be required to intervene?
- How do we know whether the AI is actually improving the customer outcome?
- What metrics tell us about whether trust is increasing or collapsing?
- What compliance threshold must be met before this ships?
Those are design questions now. Serious ones.
The teams that still treat AI as a speed layer on top of old UX practice will create more output. The teams that treat AI behavior as a governed product surface will create better products.
The economic signal is already visible. PwC’s 2025 AI Jobs Barometer reports that workers with AI skills commanded a 56% wage premium in 2024. Indeed Hiring Lab reports that the share of U.S. job postings referencing GenAI more than tripled between September 2023 and September 2024. Autodesk’s 2025 AI Jobs Report adds a second signal: mentions of AI skills in general U.S. job listings rose 56.1% year to date through April 2025. Routine production work is being pushed downward toward automation, while AI-literate workers are being repriced upward. In practical terms, the market is beginning to pay more for the designer who can evaluate, govern, and operationalize AI than for the designer who can merely produce more screens faster.
There is a difference.
Evaluation literacy is the hinge skill
From design taste to release criteria
Every market shift creates a lot of noise. People latch onto the loudest visible skill and mistake it for the durable one.
That happened with prompt engineering.
Prompting matters. It is part of the work. But prompting by itself is becoming a weak moat. The stronger differentiator is the ability to evaluate and improve AI behavior in production conditions.
Evaluation literacy means treating model behavior the way strong product teams treat any critical product surface: something to define, test, observe, measure, and improve deliberately.
A designer or researcher with evaluation literacy can do five things that many AI-assisted practitioners still cannot (the first two are sketched in code after this list):
- They can define a rubric. Not vague taste. Actual criteria. Groundedness. Relevance. Tone. Refusal quality. Escalation logic. Time saved. Error severity. Trust impact.
- They can build or shape a dataset. Not a giant benchmark from nowhere. A working set of realistic tasks, edge cases, and failure examples drawn from actual product use.
- They can instrument behavior. They know which signals matter: correction rates, undo rates, successful completion, abandonment, escalation to human support, time to confidence, and time to recovery.
- They can read failure patterns. They do not stop at “the AI was wrong.” They distinguish between grounding failure, reasoning failure, tone failure, unsafe action, broken handoff, inaccessible output, compliance drift, or over-automation.
- They can turn those failures into the next iteration. That is the real work. Not admiring the demo. Tightening the loop.
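To make the first two capabilities concrete, here is a minimal sketch of a rubric and a scored test case in TypeScript. The criteria mirror the list above; the weights, the 0-to-2 scale, and the `scoreCase` helper are illustrative assumptions, not an established standard.

```typescript
// Minimal sketch of a rubric and a scored test case.
// Criteria mirror the list above; weights and the 0-2 scale are
// illustrative assumptions, not an established standard.

type Criterion =
  | "groundedness"
  | "relevance"
  | "tone"
  | "refusal_quality"
  | "escalation_logic";

interface RubricItem {
  criterion: Criterion;
  weight: number;       // relative importance, summing to 1.0
  description: string;  // what "good" means, written down, not implied
}

interface TestCase {
  input: string;             // a realistic task drawn from actual product use
  expectedBehavior: string;  // what the system should do, in plain language
  scores: Partial<Record<Criterion, 0 | 1 | 2>>; // 0 = fail, 1 = partial, 2 = pass
}

const rubric: RubricItem[] = [
  { criterion: "groundedness", weight: 0.4, description: "Claims are traceable to provided sources." },
  { criterion: "relevance", weight: 0.3, description: "The answer addresses the user's actual task." },
  { criterion: "refusal_quality", weight: 0.3, description: "Out-of-scope requests are declined clearly, with a next step." },
];

// Weighted score for one case: unscored criteria count as failures.
function scoreCase(testCase: TestCase): number {
  return rubric.reduce(
    (total, item) => total + item.weight * ((testCase.scores[item.criterion] ?? 0) / 2),
    0,
  );
}
```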
This is why evaluation literacy is becoming the most valuable AI skill in design. It converts AI from a flashy generator of possibilities into a governable system that can earn trust over time.
In enterprise environments, this is where the argument becomes operational. Evaluation literacy becomes a release discipline. It belongs inside the delivery pipeline, tied to definitions of done, quality gates, and audit trails. A serious AI feature should move through the same kind of operational scrutiny that teams already expect for security, performance, and privacy. OpenAI’s current evaluation guidance makes the same point directly: serious teams need eval-driven development, with datasets, regression tests, trace-based review, and explicit criteria becoming part of the operating model rather than an afterthought. That is the difference between a convincing demo and a governable product surface.
Evaluation is no longer a side exercise after design. It is a governance milestone before release.
It also creates a common language across design, research, product, engineering, compliance, and leadership. That matters more than people think.
When a team cannot agree on what counts as good, it cannot scale quality. When it cannot scale quality, it cannot safely scale AI.
The designer’s new operating model: design, instrument, evaluate, govern
Where teams break
One reason teams get stuck is that they try to bolt AI onto a pre-AI workflow. The roles, rituals, and definitions of done stay the same while the product itself becomes less predictable.
That does not hold.
The sequence of design, instrumentation, evaluation, and governance is useful conceptually, but enterprise teams cannot afford to treat it as a series of slow, separate phases. At scale, the four have to run concurrently, operationalized inside the pipeline.
A better model looks like this:
1. DESIGN the behavior and the control model
This is where human judgment starts. You decide whether AI belongs in the workflow, what the user is actually trying to accomplish, what the system should attempt autonomously, and where human control must remain obvious.
That includes prompt-as-interface writing, interaction design for agents, tool visibility, fallback behavior, and failure recovery. It also includes saying no when AI is the wrong pattern.
2. INSTRUMENT the behavior
If you cannot observe it, you cannot improve it.
Instrumentation means building the event structure and metrics that tell you how the AI feature behaves in the wild. Not just output quality, but user behavior around that output. Do users accept it, edit it, retry it, escalate, or abandon the flow after it acts?
Without that layer, you are designing based on theater.
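As a minimal sketch of that event structure, assuming a generic `track` function: the event names and fields below are illustrative, not any particular analytics vendor's schema.

```typescript
// Minimal sketch of instrumentation around an AI output.
// Event names and fields are illustrative assumptions, not a vendor schema.

type AiOutputEvent =
  | { type: "ai_output_shown"; outputId: string; confidence: number }
  | { type: "ai_output_accepted"; outputId: string }
  | { type: "ai_output_edited"; outputId: string; editDistance: number }
  | { type: "ai_output_retried"; outputId: string; attempt: number }
  | { type: "ai_output_escalated"; outputId: string; route: "human_support" }
  | { type: "ai_flow_abandoned"; outputId: string; msSinceShown: number };

// Stand-in for whatever event pipeline the team already has.
function track(event: AiOutputEvent): void {
  console.log(JSON.stringify(event));
}

// Usage: log behavior *around* the output, not just the output itself.
track({ type: "ai_output_shown", outputId: "out_123", confidence: 0.72 });
track({ type: "ai_output_edited", outputId: "out_123", editDistance: 41 });
```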
3. EVALUATE against enterprise rubrics
This is where the work gets serious.
You need test sets, rubrics, regression checks, scenario coverage, edge cases, human review, and ongoing comparison between what the system was expected to do and what it actually did.
And the rubric cannot stop at usefulness. It has to include accessibility, inclusion, policy compliance, and failure severity. Good AI behavior is not merely helpful. It must also be compliant, reversible, explainable, and safe enough for the context.
That means AI-generated UI and AI-driven actions should be audited for accessibility in the same way teams audit for security and privacy. If the output fails WCAG expectations, breaks semantic structure, weakens keyboard flow, or creates inaccessible states, then it is not high-quality output. It is a release risk. This is already moving from theory to automated governance workflows. Miro says teams using its automated design-system governance workflows report 60% less time on design review and save an average of two hours per week on compliance and accessibility checks. At the standards level, WCAG 2.2 and the W3C Design Tokens work make the bigger point: accessibility and system structure cannot be bolted on after generation. They have to be embedded in the system that generates the output.
Evaluation is not a one-time gate. It is an operating discipline.
In strong DesignOps environments, the evaluation layer feeds directly into CI/CD. Prompts, components, flows, and model updates should trigger regression checks, rubric-based reviews, accessibility validation, and risk-tier approvals before release. The point is not to slow delivery. It is to prevent teams from shipping polished uncertainty with no evidence behind it.
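One way to picture that hook into the pipeline is a release gate that aggregates evaluation results and blocks on regressions or accessibility failures. This is a minimal sketch; the thresholds, field names, and pass/block rule are assumptions for illustration.

```typescript
// Minimal sketch of a rubric-based release gate in a delivery pipeline.
// Thresholds, field names, and the pass/block rule are illustrative assumptions.

interface EvalRunResult {
  meanRubricScore: number;       // 0..1, averaged over the test set
  regressions: number;           // cases that passed last release and fail now
  accessibilityFailures: number; // e.g., WCAG checks failed by generated UI
}

function releaseGate(result: EvalRunResult, baselineScore: number): "pass" | "block" {
  if (result.accessibilityFailures > 0) return "block";       // accessibility is a release criterion
  if (result.regressions > 0) return "block";                 // no silent quality backsliding
  if (result.meanRubricScore < baselineScore) return "block"; // must not fall below baseline
  return "pass";
}
```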
4. GOVERN the behavior
As soon as AI moves from generating content to taking actions, governance stops being optional.
Governance includes risk classification, approval thresholds, auditability, privacy constraints, incident response, role clarity, and explicit boundaries around agency. It is product behavior translated into operating rules.
Operationally, that means model changes trigger review, high-risk actions require human approval, accessibility and compliance failures block release, and teams keep an evidence trail of what changed, what passed, what failed, and who signed off.
That is why governance is increasingly a design concern. Because the user experiences those rules as part of the product.
Policy is no longer outside the interface. Policy is the interface.
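A minimal sketch of what that looks like when it reaches code, assuming three illustrative risk tiers; the tier names, fields, and rules are placeholders, not a regulatory scheme.

```typescript
// Minimal sketch of a risk-tiered action policy.
// Tiers, fields, and rules are illustrative assumptions.

type RiskTier = "low" | "medium" | "high";

interface ActionPolicy {
  tier: RiskTier;
  autoExecute: boolean;           // may the system act without asking?
  requiresHumanApproval: boolean;
  auditLog: boolean;              // keep an evidence trail of what happened
}

const policies: Record<RiskTier, ActionPolicy> = {
  low:    { tier: "low",    autoExecute: true,  requiresHumanApproval: false, auditLog: true },
  medium: { tier: "medium", autoExecute: false, requiresHumanApproval: false, auditLog: true }, // suggest only
  high:   { tier: "high",   autoExecute: false, requiresHumanApproval: true,  auditLog: true },
};

// The user experiences these rules as product behavior:
// a "low" action just happens (reversibly); a "high" action waits for a person.
function canActAutonomously(tier: RiskTier): boolean {
  return policies[tier].autoExecute;
}
```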
The durable skill stack for the next phase of UX
Think in layers, not shopping lists
People keep asking which AI skills to learn as if the answer were a shopping list. It is more useful to think in layers.
The durable skill stack looks something like this.
1. AI literacy and pattern fit
You need to understand what different models are good at, where they fail, what they cost, and when they should not be used. This is not technical purity. It is a product judgment.
A designer who cannot distinguish between a deterministic rule and a probabilistic guess will make expensive mistakes.
2. Designing AI behaviors
This includes conversational flows, multi-turn repair, handoff moments, confidence communication, reversible actions, and clear boundaries between suggestion and action.
The quality of AI UX is often less about the first answer than the recovery path after the answer is incomplete, wrong, or overconfident.
3. Evaluation literacy
This is the hinge skill. It links design to evidence. It turns opinion into an observable quality. It creates a bridge between research insight and product behavior.
4. Data and instrumentation literacy
You do not need to become a data scientist. But you do need to think in metrics, event structures, and evidence loops. If AI is changing behavior at the system level, designers and researchers need to see the behavior at the system level.
5. Governance, risk, and compliance fluency
Privacy, bias, escalation logic, disclosure, auditability, accessibility, consent, and accountability are no longer abstract concerns delegated to another team downstream. They shape the interaction model directly.
The enterprise designer who cannot speak in terms of controls, evidence, and auditability will struggle to influence what actually ships.
6. AI-ready design systems
Design systems are evolving from consistency libraries into machine-readable constraint engines. Semantic naming, tokens, structured patterns, accessibility requirements, and design-to-code governance now affect how reliably AI can generate, adapt, and scale interface output.
That includes structural accessibility. Contrast rules, semantic roles, focus behavior, component states, error handling, and content patterns cannot live as optional cleanup. If those constraints are not embedded in the system, the model will generate faster than the team can govern.
In other words, the design system is becoming part operating system, part policy layer. It no longer just speeds up production. It constrains what acceptable output looks like across brand, accessibility, and compliance. The ecosystem is already moving that way. The W3C Design Tokens Community Group published its first stable specification in October 2025, and Figma’s MCP server brings design context directly into agentic workflows. The direction is clear: design systems are becoming more machine-readable, more enforceable, and more central to the quality of AI output. In enterprise settings, that also means they become part of the compliance layer.
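To illustrate "machine-readable constraint engine," here is a small token structure in TypeScript, loosely modeled on the `$value`/`$type` shape of the W3C Design Tokens format; the specific tokens and the naive lookup check are illustrative assumptions.

```typescript
// Minimal sketch of tokens as enforceable constraints, loosely modeled on
// the $value/$type shape of the W3C Design Tokens format.
// Specific tokens and the naive check below are illustrative assumptions.

const tokens = {
  color: {
    "text.primary": { $type: "color", $value: "#1a1a1a" },
    "bg.surface":   { $type: "color", $value: "#ffffff" },
  },
  focus: {
    "ring.width":   { $type: "dimension", $value: "2px" }, // focus must stay visible
  },
} as const;

// A constraint the generation pipeline can enforce, not just document:
// generated UI may only reference tokens that actually exist.
function isKnownColorToken(name: string): boolean {
  return name in tokens.color;
}
```

The enforcement detail is the point: if generation can only reference tokens that exist, brand and accessibility constraints travel with every generated screen.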
Messy systems produce messy AI outputs. That sentence will age well.
7. Systems thinking and organizational influence
The best AI design work is rarely isolated to a screen. It lives across product, engineering, data, legal, support, security, and leadership. Designers who can work across that mess will become more valuable than those who remain trapped in deliverables.
The rise of agent experience and machine experience
The API is becoming part of the experience
A lot of design thinking still assumes the user is a person staring at a screen. That assumption is getting weaker.
We are entering an era where products are increasingly used, filtered, negotiated, or interpreted by agents. Some will act on behalf of the business. Some will act on behalf of the customer. Some will mediate between the two.
This is where the shift from UX to AX, agent experience, becomes useful. Sometimes the experience is still screen-based. Sometimes it becomes headless, agent-mediated, or effectively agent-to-agent.
That means the interface is no longer the full product.
The structure underneath starts mattering more: semantic clarity, API predictability, machine-readable systems, permission logic, headless architectures, and rules that agents can navigate without guessing.
That structure also carries accessibility. A system that is semantically weak, structurally inconsistent, or careless about states and labels does not just confuse assistive technology. It also creates ambiguity for the model and fragility in the agent workflow.
This is where machine experience stops being a side note and becomes a structural requirement.
For years, design could hide complexity behind elegant visuals. Agents are less forgiving. They need structure, explicit intent, and stable rules. A hidden action inside a clever interface might delight a human. It can completely fail an agent.
That is why the role is evolving from maker to curator of programmatic intent. Anthropic’s introduction to the Model Context Protocol and Figma’s MCP documentation both point in the same direction: semantic structure, headless architecture, API predictability, and permissions logic are becoming part of the product experience itself.
The strongest way to understand this shift is through a centaur model of intelligence. Humans still frame the problem, set boundaries, interpret meaning, and carry accountability. Machines handle speed, recall, pattern processing, and execution at scale. Good orchestration does not mean handing the work over. It means designing the partnership so that machine action remains legible, governable, and subordinate to human judgment.
In that world, the API is not a backend detail. It is part of the experience architecture.
Design value is moving below the visible surface.
That does not make interface design irrelevant. It means interface design is no longer the only canvas that matters.
The entry-level problem nobody should ignore
The apprenticeship ladder is part of the system
Every profession tells itself it can automate the junior work and keep the senior thinking. That fantasy eventually runs into a pipeline problem.
Design is now running into it.
If AI handles more of the production tasks that once trained junior designers and researchers, then the old apprenticeship ladder starts to collapse. Teams save time in the short term while quietly starving the future supply of people who know how to reason their way through complexity. The labor-market evidence is hard to ignore: SignalFire reports that Big Tech new-grad hires fell 25% from 2023 to 2024, Stanford Digital Economy Lab researchers found that employment for software developers aged 22 to 25 fell nearly 20% from its late-2022 peak by September 2025, and Hult International Business School reported that 37% of leaders would rather hire AI than a recent graduate. This is not just a hiring problem. It is a capability pipeline problem.
This matters for leaders and mid-career practitioners alike.
The answer is not to protect old workflows out of nostalgia. The answer is to redesign the apprenticeship model.
Junior talent should spend less time polishing artifacts no one remembers and more time learning how to verify AI output, identify failure modes, map risk, build evidence trails, and explain tradeoffs.
Senior talent should spend less time acting as bottleneck reviewers and more time codifying quality through prompt and model curation, scorecards, heuristics, evaluation rubrics, governance rules, and eval-driven development.
That is a better apprenticeship for the next decade anyway.
The people who learn to audit, explain, and improve AI systems early will likely become stronger strategists later.
What design leaders should do now
Build the governance model before the scale model
This shift is not just about personal skill development. It is about operating model design.
If you lead a design team, here are the moves that matter most.
- Stop measuring progress only through output volume. Faster production is useful, but it is not enough. Add metrics around correction, trust, recovery, accessibility conformance, and quality over time.
- Redefine roles around capability coverage. You need some mix of AI product literacy, agent interaction design, evaluation, governance, instrumentation, accessibility, and system stewardship. Do not assume the old role titles cover that work.
- Make design systems machine-readable and enforceable. If AI is now part of your production flow, your system needs stronger semantic discipline than before.
- Formalize evaluation rituals inside delivery. Build review cadences around datasets, rubrics, incident analysis, accessibility failures, drift, and real-world behavior instead of just screen reviews.
- Treat accessibility as a release criterion, not a cleanup task. If AI-generated UI or AI-supported flows create barriers, the system is not ready.
- Teach teams to think in risk tiers. Low-risk automation, medium-risk suggestion, high-risk human approval. This should not be hidden in policy decks. It should show up in the product logic.
- Design for auditability. Teams need evidence trails: what changed, which rubric was applied, which thresholds were met, what was escalated, and who approved release. Without that, governance becomes theater.
- Align the work with emerging regulatory and risk frameworks. NIST-style controls, enterprise audit expectations, and AI accountability requirements are not abstract legal concerns. They shape what responsible design leadership looks like now.
- Protect human accountability. AI can accelerate execution. It cannot carry responsibility for what your company ships.
- Redesign junior development. Move early-career talent toward verification, analysis, and structured judgment instead of repetitive production work alone.
What mid-career designers and researchers should do now
Build proof, not panic
The market is noisy. The temptation is to either panic or over-index on tools.
Both are mistakes.
A better path is to build proof.
If you are mid-career, you do not need to become an ML engineer. You do need to become harder to flatten.
The simplest way to do that is to create evidence that you can operate at the system level. The strongest portfolio move is still the simplest: one proof point, even a prototype, that shows you can design, instrument, and evaluate an AI feature end to end.
Build that project so it shows you can:
- frame a problem where AI may or may not belong;
- design an AI behavior with clear user control;
- define a rubric for success and failure;
- instrument a few meaningful signals;
- run a small evaluation loop;
- produce a lightweight governance or risk artifact;
- show how accessibility and inclusion were treated as design constraints, not cleanup work.
That single project will say more about your future value than ten AI-generated screens.
You should also strengthen the parts of your judgment AI does not replace well: systems thinking, decision framing, research interpretation, service design, stakeholder alignment, and the ability to explain tradeoffs without hiding behind jargon.
This is not romantic humanism. It is market positioning.
A better way to talk about the future of UX
Avoid the two bad stories
A lot of writing about AI and design swings between two bad extremes.
One says everything is changing, and no one is safe. The other says nothing important is changing, and good design will always win.
Neither is useful.
The truth is more specific.
Some parts of UX are being commoditized.
Some parts are becoming more strategic.
Some roles will narrow.
Some will split.
Some people will get trapped in faster execution.
Some will move into orchestration.
That is the actual shift.
The future likely belongs to designers and researchers who can do three things at once:
- understand human behavior;
- shape machine behavior; and
- build the evidence loop that keeps the system honest.
That is a bigger job than traditional UX. It is also closer to where design should have been heading all along.
The strongest supporting evidence for that claim is not philosophical. It is operational: Maze’s 2026 Future of User Research report says 69% of research teams now use AI in at least some portion of their work, while vendors like Dovetail and Marvin are openly positioning AI as a way to cut analysis time and return hours to researchers. The execution layer is speeding up. The value of judgment is rising with it. When synthesis gets cheaper, interpretation becomes the real leverage.
Conclusion
Design is not disappearing into AI.
It is being asked to grow up.
The craft still matters. The interface still matters. The story still matters. But they are no longer enough on their own.
The work now is to turn uncertain systems into usable ones, impressive outputs into accountable behavior, and AI from a generator of artifacts into a component of products people can actually trust.
That is why the old title of “maker” is starting to feel too small.
The next valuable designer is not just someone who can produce more. It is someone who can define, measure, and improve the behavior of systems that do not behave the same way twice.
That is the job. Evaluation literacy is how you start proving you can do it.
“How do I help someone know whether it was done well?”
— John Maeda
What kind of experience will your team be known for when the product starts acting on its own?
Key takeaways
- AI is shifting UX from artifact production toward system orchestration, where behavior, measurement, and governance matter more than output volume alone.
- Evaluation literacy is becoming the hinge skill because it turns probabilistic AI behavior into something teams can test, improve, and trust.
- The designers who gain leverage now will be the ones who can design AI behaviors, instrument them, evaluate them, and govern them responsibly.
Selected evidence and endnotes
- Economic repricing of AI capability
- PwC — 2025 Global AI Jobs Barometer — https://www.pwc.com/gx/en/services/ai/ai-jobs-barometer.html
- Indeed Hiring Lab — Rapid Growth in GenAI Job Postings: Expectations and Surprises — https://www.hiringlab.org/2024/11/21/growth-in-ai-job-postings-trends-and-surprises/
- Autodesk — 2025 AI Jobs Trends Report (PDF) — https://adsknews.autodesk.com/app/uploads/2025/06/2025-Autodesk-AI-Jobs-Trends-Report-AI-The-Future-of-Work.pdf
- Adoption and capability gap
- State of AI in Design 2025 — https://www.stateofaidesign.com/
- Foundation Capital — The State of AI in Design: What we learned from 400+ designers — https://foundationcapital.com/ideas/the-state-of-ai-in-design-what-we-learned-from-400-designers
- Human-centered AI and risk management
- Google PAIR — People + AI Guidebook — https://pair.withgoogle.com/guidebook/
- NIST — AI Risk Management Framework — https://www.nist.gov/itl/ai-risk-management-framework
- NIST — Generative AI Profile (NIST AI 600-1) — https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence
- Evaluation literacy and eval-driven development
- OpenAI — Evaluation best practices — https://developers.openai.com/api/docs/guides/evaluation-best-practices/
- OpenAI — Agent evals — https://developers.openai.com/api/docs/guides/agent-evals/
- OpenAI — Evals guide — https://developers.openai.com/api/docs/guides/evals/
- Accessibility as a release constraint
- W3C — Web Content Accessibility Guidelines (WCAG) 2.2 — https://www.w3.org/TR/WCAG22/
- W3C WAI — WCAG Overview — https://www.w3.org/WAI/standards-guidelines/wcag/
- NIST AI RMF 1.0 (PDF) — https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf
- Design systems as AI infrastructure
- W3C Design Tokens Community Group — Design Tokens specification reaches first stable version — https://www.w3.org/community/design-tokens/2025/10/28/design-tokens-specification-reaches-first-stable-version/
- Design Tokens Format Module 2025.10 — https://www.designtokens.org/TR/2025.10/format/
- Figma — Introducing our MCP server: Bringing Figma into your workflow — https://www.figma.com/blog/introducing-figma-mcp-server/
- Figma — Design Context, Everywhere You Build — https://www.figma.com/blog/design-context-everywhere-you-build/
- Machine Experience, agents, and MCP
- Anthropic — Introducing the Model Context Protocol — https://www.anthropic.com/news/model-context-protocol
- Model Context Protocol — Official docs — https://modelcontextprotocol.io/docs/getting-started/intro
- Figma Developer Docs — Figma MCP server — https://developers.figma.com/docs/figma-mcp-server/
- Research acceleration and the rising premium on judgment
- Maze — The Future of User Research Report 2026 — https://maze.co/resources/user-research-report/
- Maze Blog — The Future of User Research Report 2026: Trends & Insights — https://maze.co/blog/future-user-research-2026/
- Dovetail — AI Analysis — https://dovetail.com/product/ai-analysis/
- HeyMarvin — AI in User Research — https://heymarvin.com/resources/ai-in-ux-research
- AssemblyAI — How AI helps Marvin’s users spend 60% less time analyzing research data — https://www.assemblyai.com/blog/ai-helps-marvins-users-spend-60-less-time-analyzing-research-data
- DesignOps, governance, and compliance workflows
- Miro — Design System Governance: How to Keep Design and Code in Sync — https://miro.com/research-and-design/design-system-governance/
- Miro — AI Playbook for Design System Governance — https://miro.com/ai-playbooks/design-system-governance/
- Miro Blog — The new governance and compliance framework for AI agents — https://miro.com/blog/agent-automation-development-lifecycle/
- Entry-level disruption and the apprenticeship problem
- SignalFire — The SignalFire State of Tech Talent Report 2025 — https://www.signalfire.com/blog/signalfire-state-of-talent-report-2025
- Stanford Digital Economy Lab — Canaries in the Coal Mine? (PDF) — https://digitaleconomy.stanford.edu/app/uploads/2025/11/CanariesintheCoalMine_Nov25.pdf
- Stanford Digital Economy Lab — Canaries in the Coal Mine? publication page — https://digitaleconomy.stanford.edu/publication/canaries-in-the-coal-mine-six-facts-about-the-recent-employment-effects-of-artificial-intelligence/
- Hult International Business School — New survey reveals traditional undergraduate education is failing to prepare students for the future of work — https://www.hult.edu/blog/wi_skills_survey/