§ 01 — Product

What it is, and what it isn't.

Agentic IT collapses the cost, security, compliance, and reliability work of running Azure and Microsoft 365 into a single chat. The agent reads your inventory, knows your conventions, formulates a plan, and offers to run the fix once you approve. This page is the pitch — and the boundary lines.

A typical Monday · 09:47

Your phone buzzes. Azure spend jumped 14% over the weekend. The dashboard is open. So is Defender. So is the cost analyser. So is the architect's Notion page about which subscriptions are even yours.

You can spend ninety minutes piecing together what changed across seven tools — or you can open one chat and say: "What spiked our Azure spend over the weekend, and what should I do about it?"

The agent reads the inventory, runs the numbers, identifies the three new VMs in rg-prod-eu with no owner tag, drafts a remediation plan, and shows you the approval card before it touches a thing. Total time: under two minutes.

That's the platform. The rest of this page explains what makes that interaction possible — and where it stops.

The pain we solve

Mid-size IT teams running on Azure / Microsoft 365 / vBox-cloud carry permanent operational debt. Three shapes of pain, by name:

PAIN · 01

The orphans nobody owns

Disks attached to deleted VMs. Public IPs from a six-month-old test. Storage accounts older than the engineer who created them. The bill keeps charging; nobody knows what's safe to remove.

Quiet 5–18% of your monthly cloud bill

PAIN · 02

The wiki doesn't match prod

The naming convention says rg-{env}-{region}-{app}. Reality says half of it. The tagging policy says owner is required. Reality says it's empty on 1,400 resources. Drift compounds; nobody has a free afternoon to crawl it.

Eats compliance reviews and cost-attribution accuracy

PAIN · 03

The senior just left

The runbook for "what to do when Defender flags a public storage account" lived in their head. The OneNote page has the first three steps. Step four is a question mark. Onboarding a replacement takes weeks of re-discovery.

Weeks of capacity, repeated per role change
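
The naming and tagging drift described under PAIN · 02 can be checked mechanically. The following is a minimal sketch, not the platform's implementation. It assumes the convention means four lowercase, hyphen-separated segments and that resources are plain dicts with a `tags` mapping. The regex, the helper names, and the sample inventory are all illustrative assumptions.

```python
import re

# Assumed reading of the rg-{env}-{region}-{app} convention:
# lowercase alphanumeric segments separated by single hyphens.
RG_PATTERN = re.compile(r"rg-[a-z0-9]+-[a-z0-9]+-[a-z0-9]+")


def follows_convention(resource_group: str) -> bool:
    """Return True when the name matches the assumed convention."""
    return RG_PATTERN.fullmatch(resource_group) is not None


def missing_owner(resource: dict) -> bool:
    """Return True when the owner tag is absent, None, or blank."""
    owner = (resource.get("tags") or {}).get("owner") or ""
    return not owner.strip()


# Hypothetical inventory rows, for illustration only.
inventory = [
    {"name": "vm-a", "resource_group": "rg-prod-eu-billing", "tags": {"owner": "ops"}},
    {"name": "vm-b", "resource_group": "rg-prod-eu", "tags": {}},
    {"name": "vm-c", "resource_group": "rg-test-us-web", "tags": {"owner": " "}},
]

naming_drift = [r["name"] for r in inventory if not follows_convention(r["resource_group"])]
tag_drift = [r["name"] for r in inventory if missing_owner(r)]
```

Under this reading, a three-segment name counts as naming drift because the app segment is missing, and a whitespace-only owner tag counts as empty.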

Today, vs. with Agentic IT

The same workflow — investigate a cost spike — measured in tools, copy-pastes, and minutes:

Investigate a 14% Azure cost spike

  1. Open Azure Cost Management. Filter to last 7 days. Spot a peak in subscription prod-eu.
  2. Open Azure Resource Graph. Query for resources created last weekend. Copy-paste 12 IDs into a notepad.
  3. Open the architect's Notion page to remember which RG should own those resources.
  4. Switch to Defender for Cloud. Cross-check whether any of these 12 resources have alerts.
  5. Open az CLI. Run az resource show on each ID to find the owner tag — half are blank.
  6. Open Slack. Ask the team channel: "did anyone provision X over the weekend?"
  7. Wait. Decide. Document. Maybe write a follow-up ticket. Hope you don't forget.
~75 min elapsed · 7 tools opened · 4 copy-pastes · 1 follow-up ticket

The same investigation with Agentic IT

  1. Open the chat. Type: "What spiked our Azure spend over the weekend, and what should I do?"
  2. The orchestrator routes to the cost specialist. It runs get_cost_breakdown, list_orphaned_disks, and resource_inventory in parallel, all under one approval-gated tool surface.
  3. Within 30 seconds: a streamed answer with three new VMs in rg-prod-eu, no owner tags, $42/day each. A drafted remediation: tag, downsize, or delete.
  4. Click ✓ on the approval card to apply tags. Click ✗ on the delete suggestion until you've checked with the team.
  5. The whole turn is logged in Langfuse with cost, tokens, and tools used. The chat lives in the project. Tomorrow's standup has a link.
~90 sec elapsed · 1 chat · 0 copy-pastes · Full audit trail
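
The parallel tool run in step 2 can be pictured as a concurrent fan-out. The following is a rough sketch under stated assumptions: the three function names come from the text, but their signatures, return shapes, and canned data are hypothetical, and the real orchestrator may work quite differently.

```python
import asyncio


# Hypothetical stand-ins for the three tools named in step 2; real tools
# would query Azure instead of returning canned data.
async def get_cost_breakdown() -> dict:
    await asyncio.sleep(0)
    return {"rg-prod-eu": 126.0}  # USD per day: 3 VMs at $42/day


async def list_orphaned_disks() -> list:
    await asyncio.sleep(0)
    return []


async def resource_inventory() -> list:
    await asyncio.sleep(0)
    return ["vm-1", "vm-2", "vm-3"]


async def investigate() -> dict:
    # gather() runs the three coroutines concurrently and returns their
    # results in the order the coroutines were passed in.
    costs, disks, inventory = await asyncio.gather(
        get_cost_breakdown(), list_orphaned_disks(), resource_inventory()
    )
    return {"costs": costs, "orphaned_disks": disks, "inventory": inventory}


findings = asyncio.run(investigate())
```

Because the calls are read-only lookups, nothing in this sketch needs an approval. Only the remediation that follows would.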

What you can actually do

Five outcomes the platform delivers today, mapped to the tools and agents behind them:

Find the money

Identify orphaned resources, idle VMs, mis-sized disks, and over-provisioned everything. Estimate savings before you act.

  • list_orphaned_disks · list_orphaned_nics
  • list_idle_vms · get_cost_breakdown
  • estimate_savings · price-calculator MCP
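
As a worked example of estimating savings before acting, the arithmetic below uses the figures from the cost-spike walkthrough (three VMs at $42/day each). It is an illustrative calculation only and does not describe how estimate_savings is implemented. The 30-day month and the helper name are assumptions.

```python
DAILY_COST_PER_VM = 42.0  # USD, figure from the cost-spike example
VM_COUNT = 3
DAYS_PER_MONTH = 30  # assumption, chosen for a round monthly figure


def monthly_savings(daily_cost: float, count: int, days: int = DAYS_PER_MONTH) -> float:
    """Projected monthly savings if `count` resources are removed."""
    return daily_cost * count * days


daily_total = DAILY_COST_PER_VM * VM_COUNT  # 126.0 USD per day
monthly_total = monthly_savings(DAILY_COST_PER_VM, VM_COUNT)  # 3780.0 USD
```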

Tighten the perimeter

Find open ports, public storage, expired Key Vault items, weak SQL configurations, missing NSG associations. Prioritise by Defender score impact.

  • list_open_ports · list_public_storage
  • list_keyvault_issues · list_defender_alerts
  • get_secure_score · list_unassociated_nsgs

Stay in compliance

Tag drift, RBAC drift, missing resource locks, policy violations across management groups. Surface them before audit week.

  • list_missing_tags · list_policy_violations
  • get_rbac_issues · list_resource_locks
  • list_management_groups

Sleep at night

Backup coverage, site-recovery health, monitoring gaps, alert quality. Catch the broken backup before the disaster, not after.

  • get_backup_coverage · list_unprotected_vms
  • get_site_recovery_status · get_alert_coverage
  • list_missing_diagnostic_settings

Go faster, every day

Schedule recurring scans, save invocable skills (slash commands), share projects, sync to Git or SharePoint, hand off via persistent project memory.

  • cron + one-shot scheduled jobs
  • invocable skills · slash commands
  • GitHub / SharePoint two-way sync

The platform, in numbers

11 MCP integrations
8 Specialist agents
38 Domain entities
100% Destructive ops gated
5s Stop-button SLA
0 Vendor lock-in

The numbers are the design's load-bearing claims. Eleven MCP integrations means the agent can talk to systems including vBox, Jira, Outlook, Teams, SharePoint, Azure CLI, web search, document tools, and the schedule store through one uniform protocol. Eight specialists fan out in parallel when a question warrants depth. Thirty-eight entities cover the whole domain: chats, projects, artifacts, memory, schedules, integrations, audit. One hundred percent is the only acceptable number for "destructive operations gated": every write tool routes through an approval card. Five seconds is the watchdog SLA on the Stop button. Zero vendor lock-in: the LLM transport is OpenRouter; switch any model per tenant, per user, per turn.
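
One way to picture the Stop-button watchdog is to cancel the running turn and check that it actually stops inside the SLA window. The following is a minimal sketch under that assumption. The function names are hypothetical and the platform's real mechanism is not described here.

```python
import asyncio

STOP_SLA_SECONDS = 5.0  # the Stop-button figure quoted above


async def stop_with_watchdog(task: asyncio.Task, sla: float = STOP_SLA_SECONDS) -> bool:
    """Cancel `task` and report whether it finished within `sla` seconds."""
    task.cancel()
    # asyncio.wait does not raise on timeout; it returns (done, pending).
    done, _pending = await asyncio.wait({task}, timeout=sla)
    return task in done


async def demo() -> bool:
    async def long_running_turn() -> None:
        await asyncio.sleep(3600)  # stands in for an agent turn

    task = asyncio.create_task(long_running_turn())
    await asyncio.sleep(0)  # let the task start before stopping it
    return await stop_with_watchdog(task, sla=1.0)


stopped_in_time = asyncio.run(demo())
```

A `False` result would mean the turn ignored cancellation past the deadline, which is the case a watchdog would need to escalate.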

Who it's for

Persona · 01

IT operator

Daily ops, troubleshooting, hygiene. The chat replaces the "open six tools, copy IDs around, decide" loop with a conversation that knows your inventory.

Persona · 02

Cloud architect

Cost / security / reliability reviews. The suggestion engine pre-stages findings; the project memory tracks decisions across reviews.

Persona · 03

Tenant admin

Manages agents, tools, model modes, credit caps, custom MCP connections, sharing. Configurable everything; observable everything via Langfuse.

Persona · 04

Developer

Uses Claude Code / OpenCode integrations and the terminal for IDE-style work. Artifacts sync to Git automatically.

What it is not

Knowing the boundaries is half the pitch. The platform is deliberate about what it doesn't try to be:

The three principles that shape everything

If you remember nothing else, remember these:

1

The chat is the IDE

Artifacts, terminal sessions, schedules, approvals — every workflow can be initiated, observed, and resolved inside the chat thread. The WebSocket carries token streams, tool calls, approval gates, artifact saves, and routing events as separate event types so the UI can render them inline. → §02·Request lifecycle
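
To illustrate why separate event types help the UI, here is a small sketch of a typed event envelope and a renderer that branches on it. The five kinds come from the text. The wire names, the JSON frame shape, and the rendering choices are assumptions made for illustration.

```python
import json
from enum import Enum


class EventType(str, Enum):
    # Hypothetical wire names for the five event kinds listed above.
    TOKEN = "token"
    TOOL_CALL = "tool_call"
    APPROVAL_GATE = "approval_gate"
    ARTIFACT_SAVE = "artifact_save"
    ROUTING = "routing"


def parse_event(raw: str) -> tuple:
    """Decode one JSON frame into (EventType, payload)."""
    frame = json.loads(raw)
    return EventType(frame["type"]), frame.get("payload", {})


def render(raw: str) -> str:
    """Pick an inline rendering per event type."""
    kind, payload = parse_event(raw)
    if kind is EventType.TOKEN:
        return payload["text"]
    if kind is EventType.APPROVAL_GATE:
        return f"[approval needed: {payload['tool']}]"
    return f"[{kind.value}]"
```

An unknown `type` raises `ValueError` from the enum lookup, so a client built this way fails loudly on frames it does not understand.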

2

Approvals before destruction

A platform that can delete cloud resources must never do so without an explicit user click. Every write tool routes through IApprovalManager and lands in the audit trail (tool_input JSON, tool_result JSON, approval status). The headless path auto-approves only because there is no user — and tenant admins can deny destructive tools entirely. → §07·Approvals
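
The approval principle can be sketched as a gate that every write tool passes through, with the outcome recorded either way. The name IApprovalManager and the three audit fields come from the text. The `request` method, the `HeadlessApprovals` class, and the `run_write_tool` helper are hypothetical shapes, not the platform's actual interface.

```python
import json
from typing import Callable, Optional, Protocol


class IApprovalManager(Protocol):
    # Interface name is from the text; this method is a hypothetical shape.
    def request(self, tool_name: str, tool_input: dict) -> bool: ...


class HeadlessApprovals:
    """Auto-approves unless the tenant admin has denied the tool."""

    def __init__(self, denied_tools: set):
        self.denied_tools = denied_tools

    def request(self, tool_name: str, tool_input: dict) -> bool:
        return tool_name not in self.denied_tools


audit_trail: list = []


def run_write_tool(approvals: IApprovalManager, tool_name: str,
                   tool_input: dict, tool: Callable[[dict], dict]) -> Optional[dict]:
    """Run `tool` only after approval; record the outcome either way."""
    approved = approvals.request(tool_name, tool_input)
    result = tool(tool_input) if approved else None
    audit_trail.append({
        "tool": tool_name,
        "tool_input": json.dumps(tool_input),
        "tool_result": json.dumps(result),
        "approval_status": "approved" if approved else "denied",
    })
    return result
```

An interactive implementation of the same interface would block on the user's click instead of consulting a deny list. The gate and the audit record stay the same.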

3

The user owns the model and the integrations

Tenant admins choose which LLM modes appear; users override the model per mode; users connect their own MCP servers. The platform never assumes a single global LLM or a fixed integration list. The auto-router transparently downgrades to the cheapest capable model when a budget is hit — never blocks. → §07·Credits & auto-router
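
The downgrade behaviour can be sketched as a selection rule: keep the preferred model until the budget is hit, then pick the cheapest model still capable of the task. This is an illustrative sketch only. The `Model` fields, the `pick_model` function, and the fallback when no capable model exists are all assumptions, since the text does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # USD, illustrative
    capable: bool  # whether the model can handle the current task


def pick_model(preferred: Model, candidates: list, budget_hit: bool) -> Model:
    """Keep the preferred model until the budget is hit, then downgrade."""
    if not budget_hit:
        return preferred
    capable = [m for m in candidates if m.capable]
    if not capable:
        # Never block: fall back to the preferred model. This branch is an
        # assumption; the text does not say what happens in this case.
        return preferred
    return min(capable, key=lambda m: m.cost_per_1k_tokens)
```

The function always returns a model, which mirrors the "never blocks" promise: a budget limit changes which model answers, not whether one does.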

The promise

Open the chat. Ask the question. Approve the action. Never lose the audit trail. Never get blocked by cost. Never wait for a senior engineer's runbook page to load. That's the entire pitch.

See also

For six concrete end-to-end examples — cost spike triage, security review, subscription onboarding, backup audit, a scheduled scan, and a live incident — read §01.2 · Real use cases →.