Linking enterprise data estates to Fabric for OneLake & AI: Real-World Case Studies
Description
A key challenge in preparing your data estate for Fabric's unified capabilities is eliminating data silos and complexity. That requires new design patterns for connecting to, capturing, transforming, and delivering data from existing applications and legacy platforms into Fabric.
In this session, Codin Pora, VP of Partner Technology at Striim, presents a set of real-world, technical use cases drawn from enterprise environments across retail, aviation, and financial services that show how teams embed mirroring, continuous streaming, CDC, transformations, data cleansing, in-line AI, and governance to deliver data pipelines into Microsoft Fabric for building operational, analytical, and AI workloads.
Key Takeaways
- How we redesigned CITGO’s data platform on Microsoft Fabric, balancing control and autonomy.
- How domain-based workspaces, OneLake zero-copy access, CI/CD automation, and layered security eliminated duplication and scaled governed self-service.
- Key business and technical requirements
- Avoid business units duplicating enterprise data
- One domain per business unit
- Dev: experimentation sandbox
- Multiple workspaces per domain
My Notes
Action Items
- [ ]
Resources & Links
Slides
Building a Zero-Copy Enterprise Data Platform
Built on Microsoft Fabric. Architected by LevelShift.
Session Details
Title
Building a Zero-Copy Enterprise Data Platform for CITGO
Time and Venue
19 March | 1:00PM - 1:20PM
Expo Theater
Speakers
Hemanth Gaddale, Program Manager, LevelShift
Siva Pamarti, Technology Strategist, CITGO
Session highlights:
- How we redesigned CITGO’s data platform on Microsoft Fabric, balancing control and autonomy.
- How domain-based workspaces, OneLake zero-copy access, CI/CD automation, and layered security eliminated duplication and scaled governed self-service.
CITGO’s needs for Advanced Analytics
Key business and technical requirements
Data Management
• No copies of enterprise Synapse data
• IT to control enterprise data distribution
• Avoid business units duplicating enterprise data
• Flexibility for business units to bring data from non-enterprise sources
• Have centralized enterprise data distribution model
• Implement governance
Platform Capabilities
• Self-service artifact management in Git repos
• Multiple environments: Dev, Test, Prod
• Self-service promotion between environments
• Analytics and reporting on enterprise data
• ML experimentation and model comparison
Solution Architecture
Four-pillar architecture for enterprise self-service
Pillar 1: Segmentation Principles
Enterprise Domain • Business Unit Domains • Environment Workspaces • Capacity Pools • Access Boundaries
Enterprise Domain (IT Team Controlled)
• Single connection to source enterprise data
• Synapse shortcuts configured by IT
• Gold-tier curated data
• No access for business units
• No workspace sprawl
Business Unit Domains (Light Oils, Refining, Lubes)
• One domain per business unit
• Multiple workspaces per domain
• Environment-based separation
• Self-service within guardrails
• Isolated capacity pools
Environment Workspaces (Dev, Test, Prod per Domain)
• Dev: experimentation sandbox
• Test: validation environment
• Prod: production workloads
• Git branch alignment
• Automated promotion
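The segmentation rules above (one domain per business unit, Dev/Test/Prod workspaces per domain, an isolated capacity pool, Git branch alignment) can be sketched as a small data model. This is a minimal illustration, not CITGO's actual tooling: the workspace naming pattern, the capacity pool names, and the branch names are all assumptions.

```python
from dataclasses import dataclass

ENVIRONMENTS = ("Dev", "Test", "Prod")

# Hypothetical branch names; the slides only say branches align with environments.
BRANCH_FOR_ENV = {"Dev": "dev", "Test": "test", "Prod": "main"}


@dataclass(frozen=True)
class Workspace:
    domain: str
    environment: str
    capacity_pool: str
    git_branch: str

    @property
    def name(self) -> str:
        # Assumed naming pattern: "<domain> - <environment>".
        return f"{self.domain} - {self.environment}"


def build_layout(business_units):
    """Return one domain per business unit, each with Dev/Test/Prod workspaces."""
    layout = {}
    for unit in business_units:
        # One isolated capacity pool per domain (pool naming is an assumption).
        pool = "cap-" + unit.lower().replace(" ", "-")
        layout[unit] = [
            Workspace(unit, env, pool, BRANCH_FOR_ENV[env]) for env in ENVIRONMENTS
        ]
    return layout


layout = build_layout(["Light Oils", "Refining", "Lubes"])
for domain, workspaces in layout.items():
    print(domain, [ws.name for ws in workspaces])
```

Generating the layout from a single list of business units is one way to keep the workspace count fixed per domain, which is what "no workspace sprawl" asks for.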
Workspace Architecture Diagram
[Diagram] Azure Synapse Enterprise Data Warehouse (SQL Pools, Serverless) → Shortcuts → Enterprise Domain Workspace (IT Controlled): Gold Lakehouse • Shortcuts to Synapse • Curated Enterprise Data → Shortcuts → Light Oils, Refining, and Lubes domains, each with a Dev WS, Test WS, and Prod WS
Pillar 2: Data Distribution Strategy
Zero-copy architecture with centralized control
Core Principles
Single Source of Truth • No Data Duplication • IT-Controlled Distribution • OneLake Shortcuts • Business Unit Flexibility
Enterprise Data (IT Controlled)
• Synapse → Fabric shortcuts (no copy)
• IT provisions in Enterprise Domain
• Read-only access to business units
• Gold-tier curated datasets
• Single connection to Synapse
Shared Data Products (Cross-Domain Access)
• Business unit creates data product
• Registered in enterprise catalog
• Shared via shortcuts to other domains
• No physical copy created
• OneLake native sharing
Non-Enterprise Data (Business Unit Managed)
• Business unit brings own sources
• Stored in domain workspace
• Independent of enterprise data
• Self-service ingestion
• Domain-specific datasets
Data Distribution Flow
How data flows from Synapse to business units without duplication
[Diagram] Azure Synapse → OneLake Shortcut → Enterprise Domain (Gold Lakehouse, Curated Data) → Light Oils Domain, Refining Domain, Lubes Domain
✓ IT controls provisioning
✓ Single Synapse connection
✓ Zero copy - shortcuts only
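To make the last hop of this flow concrete, the sketch below builds the request for a OneLake shortcut that exposes a table from the Enterprise Domain's Gold Lakehouse inside a business unit lakehouse. The URL and body shape follow the Create Shortcut operation of the Fabric REST API as I understand it, so check them against the current API reference before relying on them. All IDs and the table name are placeholders, and the function only builds the request; it does not send it or handle authentication.

```python
import json

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def shortcut_request(consumer_ws, consumer_lakehouse, name,
                     source_ws, source_item, source_path):
    """Build the URL and JSON body for a OneLake-to-OneLake shortcut.

    The shortcut is a pointer to the source path; no data is copied.
    """
    url = f"{FABRIC_API}/workspaces/{consumer_ws}/items/{consumer_lakehouse}/shortcuts"
    body = {
        "path": "Tables",   # where the shortcut appears in the consumer lakehouse
        "name": name,
        "target": {
            "oneLake": {
                "workspaceId": source_ws,
                "itemId": source_item,
                "path": source_path,
            }
        },
    }
    return url, body


# Placeholder IDs, for illustration only.
url, body = shortcut_request(
    consumer_ws="<light-oils-dev-workspace-id>",
    consumer_lakehouse="<light-oils-lakehouse-id>",
    name="sales_gold",
    source_ws="<enterprise-workspace-id>",
    source_item="<gold-lakehouse-id>",
    source_path="Tables/sales_gold",
)
print(url)
print(json.dumps(body, indent=2))
```

In the model described in the slides, IT would be the only party issuing such requests against the Enterprise Domain, which keeps provisioning centralized.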
Pillar 3: Automation & CI/CD
Templatized provisioning and automated deployments
Template-Based Provisioning
CI/CD Pipeline
Domain Template:
• Workspace structure
• Capacity assignment
• Default permissions
• Folder organization
Environment Template:
• Dev/Test/Prod
• Git integration setup
• Service principals
• Connection strings
Component Templates:
• Lakehouse scaffolding
• Notebook templates
• Pipeline patterns
• Semantic model starters
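One way to picture how the domain, environment, and component templates combine is as a function that expands a template into an ordered provisioning plan. The sketch below is a hypothetical illustration: the field names, the step vocabulary (`create_workspace`, `connect_git`, and so on), the folder names, and the branch names are assumptions, and the plan is plain data rather than calls to any real Fabric API.

```python
from dataclasses import dataclass, field

# Component templates named on the slide.
COMPONENTS = ("lakehouse", "notebook", "pipeline", "semantic_model")


@dataclass
class DomainTemplate:
    capacity: str
    default_permissions: dict = field(default_factory=dict)  # group -> workspace role
    folders: tuple = ("Project1", "Project2")                # assumed folder names
    git_branches: dict = field(                              # assumed branch names
        default_factory=lambda: {"Dev": "dev", "Test": "test", "Prod": "main"}
    )


def render_plan(domain, template):
    """Expand a template into an ordered list of provisioning steps."""
    plan = [{"action": "create_domain", "domain": domain}]
    for env, branch in template.git_branches.items():
        ws = f"{domain}-{env}"
        plan.append({"action": "create_workspace", "workspace": ws,
                     "capacity": template.capacity})
        plan.append({"action": "connect_git", "workspace": ws, "branch": branch})
        for group, role in template.default_permissions.items():
            plan.append({"action": "grant", "workspace": ws,
                         "group": group, "role": role})
        for folder in template.folders:
            plan.append({"action": "create_folder", "workspace": ws, "folder": folder})
        for component in COMPONENTS:
            plan.append({"action": "scaffold", "workspace": ws, "component": component})
    return plan


template = DomainTemplate(
    capacity="cap-lubes",
    default_permissions={"IT Team": "Admin", "Dev Team": "Contributor"},
)
for step in render_plan("Lubes", template)[:4]:
    print(step)
```

Keeping the plan as data means it can be reviewed or diffed before anything is provisioned, which suits an IT-controlled onboarding step.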
Deployment Pipeline Architecture
Self-service promotion with No Code Approach
[Diagram] Developer (Code in Dev WS, Commit to Git) → Git Repo (Azure DevOps or GitHub) → Dev Workspace (Deploy: Auto) → Deployment Pipeline → Test Workspace (Deploy: Deployment Admin) → Deployment Pipeline → Prod Workspace (Deploy: Deployment Admin)
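The promotion rule in this flow can be sketched as a small gate function. It reads the slide's "Deploy:" labels as follows: Dev is updated automatically from Git, while promotion into Test and Prod must be triggered by a Deployment Admin. That reading, and the role string itself, are assumptions; real deployments would go through Fabric deployment pipelines rather than this code.

```python
STAGES = ["Dev", "Test", "Prod"]

# Role required to deploy INTO each stage (None means automatic).
REQUIRED_ROLE = {"Dev": None, "Test": "Deployment Admin", "Prod": "Deployment Admin"}


def next_stage(current):
    """Return the stage after `current`, or None if `current` is the last one."""
    i = STAGES.index(current)
    return STAGES[i + 1] if i + 1 < len(STAGES) else None


def promote(current, actor_roles):
    """Return the target stage if the actor may promote, else raise."""
    target = next_stage(current)
    if target is None:
        raise ValueError(f"{current} is the final stage")
    required = REQUIRED_ROLE[target]
    if required is not None and required not in actor_roles:
        raise PermissionError(f"{required} role required to deploy to {target}")
    return target


print(promote("Dev", {"Deployment Admin"}))   # Dev content moves to Test
print(promote("Test", {"Deployment Admin"}))  # Test content moves to Prod
```

A developer without the Deployment Admin role can still commit to Git and see changes in Dev, but `promote` raises `PermissionError` for the later stages.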
Pillar 4: Governance & Security
Microsoft Purview integration with layered access controls
IT Control Layer
• Domain & workspace provisioning
• Enterprise data distribution
• Capacity management
• Purview policy enforcement
• Master data governance
Workspace Access Layer
• Role-based permissions (Admin, Member, Contributor, Viewer)
• Service principal authentication
• Row-level security (RLS)
• Workspace isolation
Data Classification Layer
• Sensitivity labels (Public, Internal, Confidential, Restricted)
• Promoted, Certified artifacts
• Data lineage tracking
• Audit logging
Microsoft Purview Unified Governance
Data Discovery • Classification • Lineage • Catalog • Compliance Dashboard • Policy Management • Access Reviews
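Two of the access layers listed for this pillar, workspace isolation and row-level security, compose naturally: a user first needs a role in the workspace, and only then does a row filter decide what they see. The sketch below illustrates that ordering in plain Python. In Fabric, RLS would be defined on the warehouse or semantic model rather than in application code, and the users, workspace name, and `business_unit` column here are hypothetical sample data.

```python
def visible_rows(user, workspace, rows, workspace_roles, user_units):
    """Return the rows that `user` may see in `workspace`.

    Layer 1 (workspace isolation): the user needs some role in the workspace.
    Layer 2 (row-level security): only rows for the user's business units pass.
    """
    role = workspace_roles.get(workspace, {}).get(user)
    if role is None:
        raise PermissionError(f"{user} has no role in {workspace}")
    allowed_units = user_units.get(user, set())
    return [row for row in rows if row["business_unit"] in allowed_units]


# Hypothetical sample data, for illustration only.
workspace_roles = {"Lubes-Prod": {"ana": "Viewer", "raj": "Contributor"}}
user_units = {"ana": {"Lubes"}, "raj": {"Lubes", "Refining"}}
rows = [
    {"business_unit": "Lubes", "volume": 120},
    {"business_unit": "Refining", "volume": 340},
]

print(visible_rows("ana", "Lubes-Prod", rows, workspace_roles, user_units))
```

Checking the workspace role before filtering rows means a user outside the workspace is rejected outright instead of receiving an empty result.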
Access Control Matrix
Role-based permissions across domains and workspaces
Zero copy, Governed, Self-Service Advanced Analytics Architecture
[Diagram] Azure Synapse Enterprise Data Warehouse → Shortcut → Microsoft Fabric
Enterprise Domain
• ELT WS: Data Pipelines + Data Flows + Notebooks
• Storage WS: Bronze LH, Silver LH, Gold WH (via SQL Endpoint)
Business Unit Domain (connected to the Enterprise Domain via Shortcut)
• ELT WS and Analytical Data WS: Bronze LH, Silver LH, Gold WH
• Data Science Work Loads: ML Data WS (Models, Environments, Experiments & Notebooks)
• Analytical Work Loads: Analytics WS (Certified Reports, Semantic Models)
• Existing Power BI Landscape: Semantic Models, Reports
Domain Roles Mapping
• Admin: IT Team/COE
• Contributor: Dev Team
• Viewer: Business User
• Other groups shown: Data Owners (LO Data Owner), Dev Analytics, Dev Data Science
Domain Driven Workspace Structure
• Project1 Folder, Project 2 Folder
• Data Pipelines + Data Flows + Notebooks
Implementation Roadmap
Phased approach for CITGO Fabric deployment
Phase 1: Foundation
• Fabric capacity provisioning
• Enterprise Domain workspace structure creation
• Synapse to OneLake shortcuts configuration
• Purview catalog setup and integration
• Template development for domains
Phase 2: Pilot Domain
• Select pilot business unit (Light Oils)
• Create Dev/Test/Prod workspaces
• Configure Git integration
• Deploy CI/CD pipelines
• Train pilot team on Fabric
Phase 3: Scale Out
• Automate deployment
• Deploy to Refining domain
• Additional business unit onboarding
• Optimize capacity allocation
• Knowledge transfer sessions
Phase 4: Optimize
• Performance tuning
• Cost optimization
• Advanced features enablement
• User feedback incorporation
• Documentation completion
Business Value
What business units can do in their domain workspaces
Analytics & Reporting
• Create Power BI reports from enterprise data
• Direct Lake mode for optimal performance
• Custom semantic models
• Self-service dashboards
ML Experimentation
• Spark notebooks
• MLflow for experiment tracking
• Model training and comparison
• AutoML capabilities
Data Integration
• Connect non-enterprise data sources
• Data Factory pipelines
• Dataflows for transformation
• Custom data ingestion
Code Management
• Git repository integration
• Collaborative development
• Version control for all artifacts
• Self-service deployment
Restrictions: Cannot copy enterprise data • Cannot create duplicate Synapse connections • Cannot provision new domains (IT only)
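The three restrictions above amount to a deny-list applied to business unit requests, with everything else allowed inside the domain workspace. A minimal sketch of that rule follows. The `Action` names and the reason strings are hypothetical, and in practice the restrictions would be enforced through Fabric and Purview permissions rather than a function like this.

```python
from enum import Enum, auto


class Action(Enum):
    CREATE_REPORT = auto()
    INGEST_NON_ENTERPRISE_DATA = auto()
    COPY_ENTERPRISE_DATA = auto()
    CREATE_SYNAPSE_CONNECTION = auto()
    PROVISION_DOMAIN = auto()


# The three restrictions from the slide, each with a short reason.
DENIED_TO_BUSINESS_UNITS = {
    Action.COPY_ENTERPRISE_DATA: "enterprise data is reached through shortcuts, not copies",
    Action.CREATE_SYNAPSE_CONNECTION: "IT owns the single Synapse connection",
    Action.PROVISION_DOMAIN: "only IT provisions domains",
}


def review(action):
    """Return (allowed, reason) for a business unit's requested action."""
    reason = DENIED_TO_BUSINESS_UNITS.get(action)
    if reason is None:
        return True, "allowed within the domain workspace"
    return False, reason


for action in Action:
    print(action.name, review(action))
```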
Executive Summary
Vision
Implement a governed, self-service Microsoft Fabric platform that enables CITGO business units to leverage enterprise data from Synapse without duplication, while maintaining centralized IT control over data distribution and providing business teams with flexibility to integrate their own data sources and manage their analytics lifecycle.
Workspace Segmentation
Enterprise Domain + Business Unit Domains with controlled access and capacity optimization
Data Distribution
Zero-copy shortcuts to Synapse, centralized enterprise data, no duplication across workspaces
Automation & CI/CD
Templatized provisioning, Git integration, automated promotion across Dev/Test/Prod
Governance & Security
Microsoft Purview integration, IT-controlled provisioning, sensitivity labels, service principals
Sound off. The mic is all yours.
Influence the product roadmap.
Join the Fabric User Panel
Share your feedback directly with our Fabric product group and researchers.
https://aka.ms/JoinFabricUserPanel
Join the SQL User Panel
Influence our SQL roadmap and ensure it meets your real-life needs.
https://aka.ms/JoinSQLUserPanel
THANK YOU