Auth Model

The system uses a layered authentication model where user identity flows through every layer:

Auth Flow
User → VS Code (Azure AD SSO)
    → Bearer token sent to Orchestration API
    → API validates token, extracts user identity
    → API uses service principal for downstream calls
    → Power BI / Snowflake enforce RLS / role-based access
  • User authenticates via Azure AD / Entra ID SSO
  • Short-lived bearer tokens — no stored passwords
  • Service principal with least-privilege scopes for PBI/Snowflake
  • User identity preserved for audit and RLS enforcement

Query Validation

Every generated query is validated before execution. The validator checks for:

CheckDescriptionAction on Fail
schema_whitelistQuery only references allowed tables/schemasBlock + explain
no_mutationsNo INSERT, UPDATE, DELETE, DROP, TRUNCATEBlock + explain
row_limitLIMIT/TOP clause enforced (max 10,000)Auto-inject limit
no_cross_joinPrevent accidental cartesian productsBlock + explain
cost_estimateEstimated scan size below thresholdWarn + require approval

Policy Engine

Policies are defined as JSON rules and evaluated at runtime:

Policy ExampleJSON
{
  "name": "block-pii-tables",
  "description": "Prevent access to PII-containing tables",
  "condition": {
    "tables_referenced": {
      "contains_any": ["dim_patient", "dim_member_ssn"]
    }
  },
  "action": "block",
  "message": "Direct access to PII tables is not permitted. Use the anonymized views instead."
}

Role-Based Policies

Different users see different data. Analysts get aggregated views, admins get detail-level access.

Cost Guardrails

Queries exceeding estimated cost thresholds require explicit approval before execution.

Time-Based Restrictions

Restrict heavy queries to off-peak hours. Lightweight queries run anytime.

Data Classification

Tables tagged as PHI, PII, or Confidential have additional access checks.

Human-in-the-Loop

Certain operations require explicit user approval before proceeding:

Approval Flow
// System detects a high-cost query
⚠️ "This query will scan ~2.4 TB. Estimated cost: $1.20."
   "Do you want to proceed?"

   [✅ Execute]  [❌ Cancel]  [📝 Modify]

// User clicks Execute → query runs
// User clicks Modify → prompt for refinement

Approval is required for:

  • Queries exceeding cost threshold
  • Full data exports (no row limit)
  • Queries with low confidence scores (< 0.7)
  • First-time access to a new data source

Audit Trail

Every interaction is logged with full context for compliance and debugging:

Audit RecordJSON
{
  "timestamp": "2026-04-07T14:32:01Z",
  "user": "jsmith@contoso.com",
  "session_id": "sess_abc123",
  "question": "Show PMPM by region for Q1",
  "intent": "kpi_trend",
  "source": "power_bi",
  "query": "EVALUATE SUMMARIZECOLUMNS(...)",
  "validation": "passed",
  "rows_returned": 12,
  "duration_ms": 340,
  "status": "success"
}

Row-Level Security

Row-level security is enforced at the data layer, not in the application:

Power BI RLS

RLS roles defined in the semantic model. The service principal impersonates the user's identity so PBI enforces the correct filters automatically.

Snowflake Roles

User identity mapped to Snowflake roles via the orchestration layer. Each role has specific warehouse, schema, and table grants.