Build Smarter Analytics Assistants with Fabric Data Agents and Copilot Studio
📎 Slide deck: Build Smarter Analytics Assistants with Fabric Data Agents - Piotr Prussak.pdf
Speaker: Piotr Prussak — Data & AI Architect (PL-300, DP-600, DP-700, AI-102, CSPO)
Key Takeaways
- Modeling is the #1 lever for agent accuracy — not the AI model
- Column descriptions are the single highest-ROI grounding mechanism
- F2 is enough to start — Copilot/AI included on all paid SKUs since April 2025
- Build a golden test set (20–50 questions) BEFORE deploying to production
- Re-evaluate every 3 months — this space moves faster than enterprise planning cycles
- Pattern: Copilot Studio = orchestrator, Fabric Data Agent = domain expert
Session Roadmap
- Honest Caveats — what you need to know before investing time
- AI Solutions Landscape — Data Agents, Copilot Studio, and where they fit
- Setup, Prerequisites & Costs — what it takes to get started
- Solution Walkthroughs — three data scenarios, increasing complexity
- Deep Dives — modeling, schema design, grounding, and testing
- Decision Guides — take-home frameworks
Honest Caveats
Caveat #1: This Is Preview
- Fabric Data Agents + Copilot Studio integration is currently in preview
- New features ship monthly (MCP endpoints, M365 Copilot integration, ontology support all landed in last 6 months)
- SLAs, performance guarantees, full docs not yet final
- Build to learn, not to bet the farm — yet
Caveat #2: Microsoft Follows Adoption
- Microsoft invests in features that get used — mothballs what doesn't stick
- Precedents: Cortana Intelligence Suite and Power BI Dataflows v1 were retired; Data Activator's trajectory remains unclear
- If you adopt early, you influence the roadmap. If you wait, the feature may not survive.
Caveat #3: Set a 3-Month Horizon
- GPT-4o → GPT-4.1 → GPT-5 GA in Copilot Studio — three model generations in under two months
- Schedule a formal re-evaluation every 3 months
Caveat #4: This Is One Piece of the Toolkit
- Azure AI Foundry Agents — custom agents, code-first control
- Semantic Kernel — orchestration framework for complex AI workflows
- Microsoft 365 Copilot — end-user surface
- Copilot Studio — low-code agent builder/orchestrator
- MCP (Model Context Protocol) — emerging interop standard
AI Solutions Landscape
What Is a Fabric Data Agent?
- AI-powered assistants for natural language conversations about enterprise data
- Understands schema across lakehouses, warehouses, semantic models, KQL databases, ontologies
- Enforces governance — RLS, CLS, user permissions flow through automatically
- Stores conversation history across sessions
- Not just NL-to-SQL — reasons across multiple sources, maintains context
What Is Copilot Studio?
- Low-code platform for building custom AI agents with multi-agent orchestration
- Connected Agents — link Fabric Data Agents as specialized "experts"
- Multi-channel: Teams, web, M365 Copilot, custom apps
- Currently on GPT-5 GA with versioning controls
- Pattern: User asks in Teams → Copilot routes → Data Agent queries → grounded answer returns
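The routing pattern above can be sketched in miniature. In the real product, Copilot Studio's LLM-based orchestration selects the connected agent; the keyword router and the two stand-in agents below are purely illustrative, not any actual API.

```python
# Conceptual sketch of the orchestrator pattern: a router picks the
# domain expert, and that agent answers from its own grounded data.
# In production, Copilot Studio performs this routing; the keyword
# match here is just a stand-in for its agent selection.

AGENTS = {
    "capacity": lambda q: f"[capacity agent] grounded answer to: {q}",
    "sales":    lambda q: f"[sales agent] grounded answer to: {q}",
}

def route(question: str) -> str:
    # Trivial heuristic standing in for LLM-based agent selection.
    if "CU" in question or "capacity" in question.lower():
        return AGENTS["capacity"](question)
    return AGENTS["sales"](question)

print(route("What was peak CU usage last week?"))
```

The shape is the point: the orchestrator owns channel and routing concerns, while each connected Data Agent stays a narrow domain expert.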
Where Do These Fit?
| Option | When to Use |
|---|---|
| Native Copilot in Power BI | User is in a report, needs contextual Q&A |
| Fabric Data Agent (standalone) | Domain expert on specific dataset, analysts in Fabric |
| Data Agent + Copilot Studio | Multi-agent, mixed knowledge, deploy to Teams/web |
| Azure AI Foundry / Semantic Kernel | Full code-first control, custom RAG, complex workflows |
Start with the simplest option. Escalate complexity only when needed.
Setup, Prerequisites & Costs
Prerequisites Checklist
- F2+ capacity (or P1+ with Fabric enabled) — Copilot/AI included on all paid SKUs since April 2025
- Tenant settings: Fabric Data Agent, Cross-geo AI processing, XMLA endpoints, Standalone Copilot — all enabled
- At least one data source with data (Warehouse, Lakehouse, Semantic Model, KQL DB, or Ontology)
- Copilot Studio: same tenant, same account, M365 Copilot license
Authentication Mode (Critical Decision)
- User Authentication — queries run as end user (RLS enforced per user) ← right choice for enterprise
- Agent Author Authentication — queries run as author (simpler, but shared access)
Costs
| Item | Cost |
|---|---|
| F2 capacity | ~$262/month (Copilot included) |
| Copilot Studio PAYG | $0.01/credit |
| Copilot Studio prepaid | $200/tenant/month (25,000 credits) |
| M365 Copilot (authoring) | $30/user/month |
💡 F2 Copilot inclusion was a game-changer — many still think F64 is required.
Solution Walkthroughs
Solution A: Technical / Operational Data (Start here)
- Scenario: Fabric Capacity Metrics / FUAM as data source
- Well-structured, narrow domain, numeric-heavy, low ambiguity
- Example queries: peak CU usage, workspace consumption, failed refreshes
- Why this works: Schema is self-descriptive, questions map to single-table aggregations
- MVP path — use your own capacity data, no setup needed, immediate relevance
Solution B: Business Data — Complex Schema (What goes wrong)
- Scenario: Wide World Importers (many-to-many, SCD patterns, multi-granularity facts, self-referencing hierarchies)
- Common failures:
- Ambiguous joins — agent picks wrong path through M:M
- Temporal confusion — doesn't know which date = "current"
- Granularity mismatch — aggregates at wrong level
- Hallucinated columns — invents column names that sound right
- Over-joining — joins 6 tables when answer was in one
- The lesson: Raw complex schemas are hostile to AI agents. The agent isn't broken — the schema was never designed for this consumer.
Solution C: Business Data — Simplified Schema (Same data, modeled right)
- Same questions as Solution B — now they work
- Changes made: star schema, descriptive column names, column descriptions populated, bridge tables hidden, SCD abstracted into "current" views
- The punchline: The agent didn't get smarter. The data got clearer.
Modeling is the prerequisite for production-quality agent responses.
Deep Dives
Data Agent Modeling
- Agent sees: table names, column names, data types, relationships, descriptions
- Agent does NOT see data values unless it queries them
- Best practices:
- Descriptive, unambiguous names — avoid abbreviations
- Define explicit foreign keys and cardinality
- Write rich column descriptions — single highest-ROI grounding mechanism
- Hide internal/technical columns
- Use meta-prompting: ask the agent to generate its own instructions from the schema
- Prefer calculated columns over measures for values agents need to filter/group on
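The meta-prompting idea can be made concrete. A minimal sketch, assuming you can export table and column metadata from your model: build a prompt that hands the schema (with its column descriptions) to the model and asks it to draft its own agent instructions. The schema dict and column names below are hypothetical examples, not from the session.

```python
# Sketch: meta-prompting. Assemble a prompt that asks the model to
# draft agent instructions from the schema it will be grounded on.
# The schema content here is illustrative only.

def build_meta_prompt(tables: dict[str, dict[str, str]]) -> str:
    """tables maps table name -> {column name: column description}."""
    lines = [
        "You are a data agent grounded on the schema below.",
        "Draft concise agent instructions covering:",
        "- which questions you can answer,",
        "- which columns to prefer for filters and grouping,",
        "- when to refuse a question as out of scope.",
        "",
        "Schema:",
    ]
    for table, columns in tables.items():
        lines.append(f"Table {table}:")
        for col, desc in columns.items():
            lines.append(f"  - {col}: {desc}")
    return "\n".join(lines)

prompt = build_meta_prompt({
    "FactSales": {
        "OrderDate": "Date the order was placed (use for time filters)",
        "NetAmount": "Revenue after discounts, in USD",
    }
})
print(prompt)
```

Note how the rich column descriptions do double duty: they ground the agent directly and they feed the generated instructions.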
Schema Design
- Denormalize strategically — flatten M:M into bridge-free views
- Resolve SCD ambiguity — create "current" views alongside history
- Eliminate field name collisions ("date" in 12 tables — which one?)
- Separate concerns — one semantic model per bounded domain
- Agent instructions capped at 15,000 characters — be concise
- Define what the agent SHOULD and SHOULD NOT answer
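The SCD point deserves a worked example. A minimal sketch with hypothetical column names (CustomerKey, Segment, IsCurrent): collapse a Type 2 history table into a "current" view so the agent never has to reason about validity ranges. In the warehouse this would be a SQL view filtered on the current-row flag; plain Python stands in here.

```python
# Sketch: abstract an SCD Type 2 history table into a "current" view.
# One unambiguous row per business key, with the SCD bookkeeping
# column dropped so the agent cannot pick a stale version.
# All names and values are illustrative.

history = [
    {"CustomerKey": 1, "Segment": "SMB",        "IsCurrent": False},
    {"CustomerKey": 1, "Segment": "Enterprise", "IsCurrent": True},
    {"CustomerKey": 2, "Segment": "SMB",        "IsCurrent": True},
]

current = [
    {k: v for k, v in row.items() if k != "IsCurrent"}
    for row in history
    if row["IsCurrent"]
]
print(current)
```

Point the agent at the "current" view for present-state questions and keep the history table for explicit time-travel questions, stated clearly in the agent instructions.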
Getting Grounded Responses
- Grounding = responses tied to actual data, not hallucinated
- Key mechanisms: agent instructions, agent descriptions, column descriptions, semantic model layer
- Goal: an agent that answers correctly or says it can't, not one that always produces an answer
Testing Patterns
- Build golden question set (20–50 questions with known correct answers) BEFORE production
- Run after every model update, schema change, or instruction edit
- Test for: correct answers, graceful refusal on out-of-scope, consistency across phrasings, concurrent load
- Copilot Studio supports side-by-side agent version comparison — use it
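The golden-set idea can be sketched as a small harness. Everything here is a hypothetical placeholder: `ask_agent` stands in for whatever client calls your deployed agent, and the questions, expected answers, and refusal check are examples, not a prescribed format.

```python
# Sketch of a golden-question harness, run after every model update,
# schema change, or instruction edit. `ask_agent` is a stub; replace
# it with a real call to your agent's endpoint.

GOLDEN_SET = [
    {"question": "What was total net revenue in 2024?",
     "expect": "4.2M", "in_scope": True},
    {"question": "Write me a poem about dashboards.",
     "expect": None, "in_scope": False},
]

def ask_agent(question: str) -> str:
    # Placeholder for the real agent client.
    raise NotImplementedError

def run_golden_set(ask=ask_agent):
    results = []
    for case in GOLDEN_SET:
        answer = ask(case["question"])
        if case["in_scope"]:
            # Correct-answer check: expected value appears in the response.
            passed = case["expect"] in answer
        else:
            # Out-of-scope questions should be refused, not answered.
            passed = "can't" in answer.lower() or "cannot" in answer.lower()
        results.append((case["question"], passed))
    return results
```

Extend the same loop to cover the other test dimensions from above: run each question under several phrasings for consistency, and run the set concurrently for load.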
Decision Guides
Semantic Model vs. Direct SQL/Lakehouse
| Use Semantic Model when... | Go Direct to SQL/Lakehouse when... |
|---|---|
| Business logic in DAX measures | Exploratory / ad-hoc (data science) |
| Need consistent calculations | Schema is simple + self-descriptive |
| RLS/CLS already defined | Data not yet modeled (raw ingestion) |
| Well-bounded domain | Performance requires engine pushdown |
Copilot Studio vs. Native Fabric Copilot
- Native: users already in reports, contextual Q&A, no custom orchestration
- Studio: multi-agent, custom topics/triggers, Teams/web deployment, mixed knowledge sources
Signs You're NOT Ready to Deploy
- ✗ No golden question test set
- ✗ No column descriptions in semantic model or schema
- ✗ No clear domain boundary ("it should answer everything")
- ✗ No executive sponsor who understands "this is preview"
- ✗ No plan for monitoring responses in production
- ✗ No defined escalation path for wrong answers
If more than two apply, invest in readiness before deployment.
Five Things to Do Monday Morning
- Verify Fabric tenant settings — enable Data Agents, Copilot, XMLA endpoints
- Build one data agent on capacity metrics — prove the platform works
- Audit one production semantic model — add column descriptions, check naming clarity
- Write 20 golden test questions for your most likely agent domain
- Schedule a 3-month re-evaluation checkpoint (next: June 2026)