Problem Statement

IAM Permissions Explorer

A design brief for identifying and remediating excessive or unused cloud IAM permissions — bridging the gap between what identities can access and what they actually use.

The Crisis

The Entitlement Problem

Cloud environments grant permissions far faster than they revoke them. The result: identities accumulate massive, mostly-unused access over time — creating an attack surface that grows invisibly.

Industry analysis projected that, by the end of 2023, 75% of security failures would result from inadequate management of identities and privileges. Research from Microsoft's Entra team confirms the majority of granted cloud permissions go entirely unused.

"Dormant entitlements constitute a vast, unnecessary attack surface that threat actors actively exploit to escalate privileges and facilitate lateral movement."

Yet enforcing least privilege at scale is notoriously difficult — not because engineers lack intent, but because the tooling fails them.

Why Current Tools Fall Short

The Visibility Gap

The core engineering challenge is a Gap Analysis: identifying the delta between what IAM policies say an identity can do versus what telemetry logs record it actually doing.

This sounds simple. In practice it is plagued by:

Taxonomical misalignment

IAM service prefixes differ from CloudTrail API prefixes: ses in IAM → email in logs. Action names diverge too: the IAM action s3:ListAllMyBuckets authorises the ListBuckets API call that the CLI command aws s3 ls issues.

Telemetry blind spots

CloudTrail omits high-volume data events (S3 GetObject, Lambda invocations) by default. Some actions, like cloudwatch:PutMetricData, are never recorded at all.

API versioning mutations

Legacy SDK calls append version strings: ListDistributions2017_03_25. A robust tool needs regex or ML classifiers to normalise these back to root IAM actions.

Design Goal

What This Tool Must Do

① Visualise

Translate the raw log-policy delta into intuitive executive-level metrics, such as the Permissions Creep Index (PCI) and Risk Margin, and into interactive inheritance graphs, not raw JSON arrays.

② Remediate

Automatically generate least-privilege replacement policies. Show a side-by-side diff so engineers can validate changes before deploying. One-click export to IaC.

③ Empower

Give developers self-service policy generation. Break the centralised bottleneck where a single security team reviews every policy request — a known velocity killer.

User Persona

Primary Operator

Sarah Chen

Senior Cloud Security Engineer · AWS + Azure · 6 yrs exp

Background

Sarah sits at the intersection of security governance and developer velocity. She owns IAM policy reviews, compliance audits, and incident response — while simultaneously being the person developers blame when deployments are slow.

She manages permissions for 3,000+ identities across two cloud environments. She uses the AWS console daily, has scripted her own ad-hoc gap analyses in Python, and is deeply familiar with CloudTrail. She distrusts "magic" automation because she has seen it break production.

Goals

Pass the next SOC2 audit

Needs evidence that orphaned accounts have been cleaned up and least privilege is enforced across all environments.

Reduce her policy review backlog

Currently 40+ open IAM policy requests from developers. Each one requires manual context-gathering that takes 30–90 minutes.

Not be the reason a service went down

Her deepest fear: an automated policy revocation breaks a critical production job that only runs once a quarter.

Three Core Pain Points

01

Fear of breaking production

A role that has only used read permissions for 89 days might need write access on day 90 for a quarterly batch job. Traditional gap analysis tools have no predictive context. When the stakes of an outage far outweigh the abstract risk of an over-permissioned account, the default posture degrades to "allow all."

Design implication: Every remediation suggestion must show confidence signals — last-used timestamp, usage frequency heatmap, inferred job schedule — not just "this was unused."

02

Hidden dependencies and inheritance complexity

A developer's effective permissions derive from a labyrinthine chain: Active Directory groups → assumed cross-account IAM roles → nested Resource-Based Policies → Permission Boundaries → Service Control Policies. When the source of a privilege is obfuscated, engineers hesitate to alter anything.

Machine identities make this worse — CI pipelines, Lambda functions, and containers rotate rapidly and have wildly different scopes than humans.

Design implication: The tool must visualise the full inheritance chain for any selected identity — not just the attached policies.

03

Centralisation bottleneck and developer friction

When a central security team must manually review all IAM policies, they become the primary bottleneck. Developers — lacking context on what permissions they actually need — request wildcard policies (s3:*) to guarantee nothing breaks. The security team, lacking application context, cannot meaningfully push back.

This creates an adversarial dynamic characterised by delayed deployments and mutual distrust.

Design implication: Developers must be able to self-serve scoped policy generation using their activity logs, without going through the security team for initial drafts.

Secondary Personas

Application Developer

Needs to understand why their deployment failed (AccessDenied) and self-generate the minimum policy required. Values speed, not security depth.

CISO / Security Director

Needs a single-number risk posture score to report to the board. Does not want to read policy JSON — wants trend lines and compliance percentages.

Wireframes

3-Screen Design

Three interconnected screens covering discovery → investigation → remediation. Annotations explain key design decisions.

Screen 1 — Risk Dashboard

Screen 2 — Identity Deep Dive

Screen 3 — Remediation Diff

IAM Permissions Explorer · Dashboard

Permissions Overview

Tracking window: 90 days · Last scan: 2 min ago

Export Report

▶ Run Scan

PERMISSIONS CREEP INDEX

68

↑ +4 from last week

ORPHANED ACCOUNTS

127

Zero activity, 90 days

LEAST PRIV. ADHERENCE

41%

Target: >80%

UNUSED PERMISSIONS

2,841

Across 3,214 identities

Risk Composition

68

PCI Score

Critical identities (28%)

High risk (27%)

Compliant (45%)

Remediation Tradeoff (Interactive)

Slide to see impact of deploying N new scoped policies

Policies

50

PCI Reduction: −18 · New score: 50

Est. remediation effort: ~4 hrs

Identities by Risk 127 showing · Sort: Risk ↓

Identity / Role	Details	Type	Risk	Permissions Used	Last Active	Action
arn:aws:iam::prod/DataPipelineRole	Attached: AdministratorAccess	ROLE	CRITICAL	8%	6 days ago	Inspect →
svc-account@project.iam	Azure · Subscription: prod-001	SVC ACC	HIGH	22%	14 days ago	Inspect →
john.doe@company.com	IAM User · Groups: dev-team, data-eng	USER	MEDIUM	44%	Yesterday	Inspect →
LambdaExecutionRole-ETL	Attached: S3FullAccess, DynamoDBFull	ROLE	LOW	78%	1 hr ago	Inspect →

Screen 1 Annotations

1

Permissions Creep Index (PCI) — Single top-line metric quantifying the divergence between granted and used permissions. Gives CISOs a boardroom-ready number without needing to interpret raw data.

2

Remediation Tradeoff Slider — Interactive element showing how deploying N new scoped policies reduces PCI. Turns risk reduction into a game with a visible, quantified payoff. Directly addresses Sarah's "is it worth it?" hesitation.

3

Usage bar per identity — Ratio of used/granted permissions shown as a progress bar. Colour-coded red/amber/green by severity. Scannable at a glance — no number reading required for triage.

4

Multi-cloud unified view — AWS and Azure identities shown in one table, distinguished by the identity type badge. Reduces context-switching between console UIs.

IAM Permissions Explorer · Identity Deep Dive — DataPipelineRole

← Back to All Identities

DataPipelineRole

arn:aws:iam::123456789:role/DataPipelineRole

CRITICAL RISK

Generate Fix →

Permission Usage (90 days)

s3:GetObject

92%

s3:PutObject

67%

dynamodb:GetItem

34%

s3:DeleteBucket

0%

iam:PassRole

0%

ec2:TerminateInstances

0%

+ 2,834 more unused

never called

⚠ Source: AdministratorAccess · Grants ALL AWS permissions

Access Inheritance Chain

SCP: DenyLeave

Org root

AdministratorAccess

Attached directly

DataPipelineRole

Lambda fn: etl-daily

CI: github-actions

Legend: SCP = Service Control Policy · POL = Managed Policy · ROLE = IAM Role

⚠ Critical Finding: Wildcard Admin Attached

This role has AdministratorAccess attached but has only called 3 distinct AWS services in the last 90 days out of 300+ available. The role is assumed by a Lambda function and a CI pipeline — neither of which requires administrative access. This constitutes a critical attack path: a compromised CI token could pivot to full account takeover.

Generate Least-Privilege Policy →

Detach Admin Policy

JIT Access Instead

Screen 2 Annotations

1

Per-action usage frequency bars — Each permission listed with its 90-day invocation frequency. Engineers see exactly which actions are genuinely used vs never called, with dangerous unused permissions (DeleteBucket, iam:PassRole) surfaced in red to demand attention.

2

Visual inheritance chain — Indented tree shows how permissions flow: SCP → Managed Policy → Role → Assumers. The engineer can see at a glance that two machine identities (Lambda + GitHub Actions) are assuming this overprivileged role, making it a critical lateral movement risk.

3

Context-aware remediation options: the role has never exercised its admin-level permissions, so three options are offered: generate a scoped replacement policy, directly detach the admin policy (hard revocation), or convert to JIT access. The choice respects the engineer's risk tolerance.

4

Plain-English finding summary — A narrative explanation of WHY this is a problem, not just WHAT it is. Addresses Sarah's need to justify remediation decisions to developers without writing her own impact analysis.

IAM Permissions Explorer · Remediation — Policy Diff Review

← Back to DataPipelineRole

Review Proposed Policy Changes

Auto-generated based on 90-day usage analysis · Verify before deploying

↓ Export JSON

Open PR in GitHub

✓ Deploy Now

⚡ Confidence: High. None of the permissions being removed were used in the last 90 days. No scheduled jobs detected. Last invocation: 6 days ago, using only S3 and DynamoDB actions. Review quarterly batch jobs before deploying.

Current Policy (AdministratorAccess)

2,841 excess permissions

 {
  "Version": "2012-10-17",
− "Statement": [{
−  "Effect": "Allow",
−  "Action": "*",
−  "Resource": "*"
− }]
 }

Proposed Policy (Least Privilege)

+6 scoped actions

 {
  "Version": "2012-10-17",
+ "Statement": [{
+  "Effect": "Allow",
+  "Action": [
+   "s3:GetObject",
+   "s3:PutObject",
+   "dynamodb:GetItem",
+   "dynamodb:PutItem",
+   "dynamodb:Query",
+   "logs:CreateLogGroup"
+  ],
+  "Resource": [
+   "arn:aws:s3:::etl-*",
+   "arn:aws:dynamodb:us-east-1:..."
+  ]
+ }]
 }

Test in Sandbox

Deploy to staging, run integration tests, then promote

Open GitHub PR

Add to IaC Terraform/CDK repo for review

Deploy Directly

Apply immediately. Roll back in 1 click if issues arise

Screen 3 Annotations

1

Confidence Signal Banner — Before seeing the diff, the engineer sees a plain-English confidence statement about WHY this recommendation is safe (or what to check). This directly addresses the #1 pain point: fear of breaking production. A quarterly-batch job warning appears proactively.

2

Synchronized side-by-side JSON diff — Left panel: current wildcard policy. Right panel: generated least-privilege policy. Colour-coded: red lines removed, green lines added. Panels scroll in sync. Key design choice: wildcards (*) are highlighted in red/green to make the "wildcard → specific action" replacement visually obvious.

3

Resource scoping visible in the diff — The new policy not only scopes actions but also scopes resources (e.g., arn:aws:s3:::etl-*). Engineers can verify the blast radius is correctly contained before approving.

4

Three deployment paths — Sandbox test, IaC PR, or direct deploy. This respects different org maturity levels. Every path includes rollback affordance. The IaC path is recommended (accent colour) to encourage GitOps practices.

Feature Write-up

Proposed Features, Prioritisation & Success Metrics

A write-up of at most one page synthesising the design rationale, feature tiers, and the KPI matrix used to measure success.

Feature Prioritisation

Features are ordered by the intersection of user impact (addresses a core pain point) and feasibility (achievable in an MVP timeframe without requiring deep ML infrastructure).

P1

CRITICAL

Unified Permission Gap Dashboard with PCI Score

A single-screen view showing the Permissions Creep Index, Least Privilege Adherence Score, and orphaned account count — all derived from a backend gap analysis. Identities ranked by risk with usage bars. This is the entry point for every user session and must communicate "what's on fire" within 5 seconds of loading.

Addresses Pain #1 · CISO-facing · MVP core · Multi-cloud

P1

CRITICAL

Per-Identity Inheritance Graph + Usage Breakdown

Deep dive view for any identity: visualises the full access chain (SCP → Policy → Role → Principal), lists every granted action with its 90-day invocation frequency, and highlights dangerous unused permissions in red. Eliminates the need for Sarah to manually trace inheritance across 5 different console screens.

Addresses Pain #2 · Security engineer-facing · MVP core
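
As a rough illustration of the access chain described above (SCP → Policy → Role → Principal), the sketch below stores the chain from Screen 2 as a plain adjacency map and walks it from a principal up to the organisation root. The node names mirror the wireframe. The `PARENTS` structure and both helper functions are hypothetical, and the sketch assumes the graph has no cycles; a real graph would be built from IAM and Organizations data.

```python
# Hypothetical in-memory graph: each node maps to the nodes it inherits from.
PARENTS = {
    "Lambda fn: etl-daily": ["DataPipelineRole"],
    "CI: github-actions": ["DataPipelineRole"],
    "DataPipelineRole": ["AdministratorAccess"],
    "AdministratorAccess": ["SCP: DenyLeave"],
    "SCP: DenyLeave": [],
}


def inheritance_chain(node, parents=PARENTS):
    """Return every path from `node` up to a root, most specific node first."""
    ups = parents.get(node, [])
    if not ups:
        return [[node]]
    return [[node] + path for up in ups for path in inheritance_chain(up, parents)]


def assumers(role, parents=PARENTS):
    """List the principals that inherit directly from `role`."""
    return sorted(name for name, ups in parents.items() if role in ups)


for path in inheritance_chain("CI: github-actions"):
    print(" -> ".join(path))
print(assumers("DataPipelineRole"))
```

Returning a list of paths rather than a single path keeps the helper usable when an identity inherits through more than one route, which is the case the "hidden dependencies" pain point worries about.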

P1

CRITICAL

AI-Generated Policy + Synchronized JSON Diff Viewer

Auto-generates a least-privilege replacement policy using 90-day usage data. Presents changes in a side-by-side diff with colour-coded additions/removals, scoped resource ARNs, and a confidence signal banner (flags potential quarterly jobs). One-click deployment with three paths: sandbox, IaC PR, or direct. This directly neutralises the fear of breakage by making changes transparent and reversible.

Addresses Pain #1 · Addresses Pain #3 · Core differentiator · MVP core

P2

HIGH

Remediation Tradeoff Slider

Interactive slider on the dashboard: "If I fix N identities, my PCI drops by X." Gamifies risk reduction and helps security teams prioritise remediation sprints. Gives the CISO an "effort vs impact" view for resource allocation. Technically straightforward once the gap analysis engine is running.

CISO-facing · Engagement driver · Post-MVP v1

P2

HIGH

Developer Self-Service Policy Generator

A simplified interface where developers can paste their Lambda/EC2 role ARN, select a date window, and receive a least-privilege policy draft without involving the security team. Addresses the centralisation bottleneck. Includes a "request review" button that creates a ticket for the security team — not a full bypass, but a massive acceleration.

Addresses Pain #3 · Developer-facing · Post-MVP v1

P3

MEDIUM

Just-in-Time (JIT) Temporary Elevation Workflow

Replace standing privileged roles with on-demand temporary access: developer requests elevated permissions for N hours, approved by a security engineer, auto-expires. Prevents permanent privilege accumulation at its source. Requires deeper integration with identity providers (Okta, Azure AD) and is architecturally complex — ideal for a v2 milestone.

Zero Trust aligned · v2 roadmap · PAM maturity

Original Design Decisions

Confidence Signal over raw confidence scores: Rather than showing a numeric ML confidence score (opaque, distrusted), the diff view shows a plain-English explanation of the evidence: "None of the removed permissions were used in 90 days. No scheduled jobs detected." This is more actionable and builds trust faster.

Three deployment paths, not one: Most tools offer "deploy" or "export." Our tool offers sandbox → IaC PR → direct deploy as distinct options, meeting teams where they are in GitOps maturity without forcing a single workflow.

Success Metrics (KPIs)

KPI	Definition	Target	Business Impact
Least Privilege Adherence Score	% of identities operating at or below minimum required access (granted vs used comparison)	> 80%	Quantifies Zero Trust enforcement; reduces theoretical blast radius of credential compromise
Orphaned Account Ratio	% of active identities with zero telemetry activity over 90-day tracking period	< 2%	Eliminates dormant backdoors; directly supports SOC2 and GDPR compliance audits
Authorization Failure Rate	% of API requests resulting in AccessDenied (total failed ÷ total requests)	< 0.5%	Confirms policies are accurate — high rate indicates over-restriction breaking production workflows
Remediation Adoption Rate	% of generated policy recommendations that are deployed (sandbox, IaC, or direct) within 30 days	> 60%	Measures tool trust and UX effectiveness; low rate indicates engineers fear the suggestions
Lateral Movement Risk Score	Composite: count of highly-privileged accounts × rate of standing privileges × presence of static credentials	Continuous ↓	Quantifies attacker's escalation potential following an initial breach
Policy Review Cycle Time	Average hours from developer policy request to approved deployment (measures bottleneck reduction)	< 4 hrs	Directly measures developer friction reduction; baseline of 30–90 hrs manual review
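
The first three KPIs in the table reduce to simple ratios, sketched below. The `Identity` record and the function names are hypothetical. The sketch also assumes one particular reading of "at or below minimum required access": an identity counts as adherent only when every granted action was also used in the tracking window.

```python
from dataclasses import dataclass, field


@dataclass
class Identity:
    name: str
    granted: set = field(default_factory=set)  # actions the attached policies allow
    used: set = field(default_factory=set)     # actions observed in the tracking window


def least_privilege_adherence(identities):
    """% of identities with no granted-but-unused actions (assumed reading of the KPI)."""
    if not identities:
        return 0.0
    adherent = sum(1 for i in identities if i.granted <= i.used)
    return 100.0 * adherent / len(identities)


def orphaned_account_ratio(identities):
    """% of identities with zero recorded activity in the tracking window."""
    if not identities:
        return 0.0
    orphaned = sum(1 for i in identities if not i.used)
    return 100.0 * orphaned / len(identities)


def authorization_failure_rate(failed_requests, total_requests):
    """% of API requests that ended in AccessDenied."""
    return 100.0 * failed_requests / total_requests if total_requests else 0.0
```

A stricter or looser adherence rule (for example, tolerating a small number of unused actions) would change only the comparison inside `least_privilege_adherence`.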

Bonus · Development Action Items

MVP Blueprint — Dev Handoff

Five concrete engineering tasks scoped for a Minimum Viable Product. Backend tasks are labelled B1–B3 and frontend tasks F1–F2. Each item includes discussion points for the engineering team.

B1

Scalable Log Ingestion Pipeline

BACKEND · SPRINT 1 · Node.js / Java · SQS · Lambda

Build a resilient data pipeline to ingest CloudTrail logs (AWS) and Activity Logs (Azure) at scale. Architecture: stream logs into Amazon SQS → serverless function polls queue, parses nested JSON, handles network timeouts, deserialises events into queryable data objects, filters noise (read-only describe/list events). Must handle CloudTrail's payload truncation limit (102,401–131,072 character requests).

Configure CloudTrail S3 delivery + SQS integration with dead-letter queues for parsing failures
Handle Azure Activity Log schema inconsistencies — some resource providers omit evidence blocks entirely
Normalise API version strings: ListDistributions2017_03_25 → cloudfront:ListDistributions using regex classifiers
Discussion: Should data events (S3 GetObject, Lambda invoke) be opt-in or always-on? Cost and noise tradeoff.
Discussion: How do we handle sts:GetCallerIdentity (always allowed, skews usage analysis)?
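
The version-string normalisation item above can be sketched with a single regular expression. This is a minimal illustration, assuming the suffix always has the `YYYY_MM_DD` shape seen in `ListDistributions2017_03_25`; the function names are hypothetical, and the IAM prefix is passed in by the caller rather than inferred.

```python
import re

# Trailing API version suffix such as "2017_03_25" (assumed YYYY_MM_DD form).
_VERSION_SUFFIX = re.compile(r"\d{4}_\d{2}_\d{2}$")


def normalise_event_name(event_name):
    """Strip a trailing version suffix, leaving other event names untouched."""
    return _VERSION_SUFFIX.sub("", event_name)


def to_iam_action(iam_prefix, event_name):
    """Join an IAM service prefix with the normalised event name."""
    return f"{iam_prefix}:{normalise_event_name(event_name)}"


print(to_iam_action("cloudfront", "ListDistributions2017_03_25"))
```

Anchoring the pattern at the end of the string avoids touching event names that merely contain digits elsewhere.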

B2

Gap Analysis & Heuristic Mapping Engine

BACKEND · SPRINT 1–2 · Python · botocore mapping · GraphDB

Core algorithmic engine that maps deserialized CloudTrail events to IAM privilege names using a maintained translation dictionary (SDK endpoints ↔ IAM prefixes). Executes temporal gap analysis: compare 90-day usage array against policies currently attached to the evaluated role. Output: explicit list of unused actions per identity, plus a permissions graph for inheritance chain traversal.

Build and maintain IAM ↔ API mapping dictionary (critical: ses→email, cloudwatch→monitoring, etc.)
Handle IAM services mapping to multiple API endpoints (e.g. lex → models.lex + runtime.lex)
Store identity inheritance graphs in a graph database (Neptune / Neo4j) for efficient traversal queries
Discussion: How do we handle Azure RBAC's authorization.evidence.roleAssignmentId when multiple concurrent role paths exist?
Discussion: Configurable tracking window (30d / 90d / 180d) — how do we surface "last used" for infrequent batch jobs?
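
The core of the engine described above is a prefix translation followed by a set difference. The sketch below assumes events arrive as dictionaries carrying CloudTrail's `eventSource` and `eventName` fields, and that the granted actions have already been expanded from wildcards into concrete action names. The dictionary holds only the example mappings named in this brief, and the function names are hypothetical.

```python
# Illustrative subset of the API-prefix -> IAM-prefix dictionary (entries from this brief).
API_TO_IAM_PREFIX = {
    "email": "ses",
    "monitoring": "cloudwatch",
    "models.lex": "lex",
    "runtime.lex": "lex",
}

_SUFFIX = ".amazonaws.com"


def iam_action(event):
    """Map a CloudTrail-style event dict to an IAM action name."""
    source = event["eventSource"]
    if source.endswith(_SUFFIX):
        source = source[: -len(_SUFFIX)]
    # Fall back to the API prefix when no translation is known.
    prefix = API_TO_IAM_PREFIX.get(source, source)
    return f"{prefix}:{event['eventName']}"


def unused_actions(granted, events):
    """Return granted actions that never appear in the observed events."""
    used = {iam_action(e) for e in events}
    return sorted(set(granted) - used)
```

Because two API prefixes can map to one IAM prefix (the lex case), the translation is deliberately many-to-one; the reverse lookup would need a list per IAM prefix.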

B3

Automated Least-Privilege Policy Generation Service

BACKEND · SPRINT 2 · JSON synthesis · IAM API · Terraform

Synthesises gap analysis output into a new least-privilege JSON policy document. Generation logic must safely scope down wildcards (e.g. s3:* → specific used actions), preserve necessary resource constraints, tags, and conditional context keys from the original policy. Must produce valid IAM JSON that passes AWS policy simulator validation before being surfaced to the UI.

Wildcard expansion: when a role has dynamodb:* but only calls 3 actions, emit only those 3 actions
Resource scoping: where CloudTrail captures the specific resource ARN, include it in the generated policy
Preserve Condition blocks from original policy (IP restrictions, MFA requirements, time windows)
Generate Terraform / CDK equivalents alongside JSON for teams using IaC
Discussion: Should we always require a human approval step, or allow auto-remediation for zero-usage orphaned accounts?
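
The wildcard expansion and Condition-preservation items above can be sketched as a per-statement transform. This is a simplified illustration, not a full policy generator: `scope_statement` is a hypothetical helper, it only rewrites `Allow` statements that carry an `Action` key, and it uses Python's `fnmatch` as a stand-in for IAM's `*` and `?` wildcard matching.

```python
import fnmatch


def scope_statement(statement, used_actions):
    """Replace wildcard Action patterns with the used actions they match.

    Returns None when nothing the statement allows was used, and returns
    Deny / NotAction statements unchanged.
    """
    if statement.get("Effect") != "Allow" or "Action" not in statement:
        return statement
    patterns = statement["Action"]
    if isinstance(patterns, str):
        patterns = [patterns]
    # IAM action names are case-insensitive, so compare in lower case.
    kept = sorted(
        action
        for action in used_actions
        if any(fnmatch.fnmatchcase(action.lower(), p.lower()) for p in patterns)
    )
    if not kept:
        return None
    scoped = dict(statement)  # shallow copy keeps Resource and Condition as-is
    scoped["Action"] = kept
    return scoped
```

Copying the statement and replacing only `Action` is what keeps IP restrictions, MFA requirements, and other Condition blocks intact. Resource scoping from observed ARNs would be a separate step.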

F1

CIEM Dashboard + Metrics Visualisation

FRONTEND · SPRINT 2 · React / Next.js · D3.js · WebSocket

Build the primary dashboard (Screen 1 of wireframes) rendering the Permissions Creep Index, Least Privilege Adherence Score, orphaned account count, and sortable identity table with usage bars. Must handle real-time scan results via WebSocket or SSE. Graph visualisation library (D3 or Recharts) to render the PCI donut and remediation tradeoff slider. Dark-mode native, responsive down to 1280px.

PCI donut chart with animated transition on score change — provides immediate visual feedback on remediation impact
Identity table: virtual scrolling for 10,000+ identities; filter by environment (AWS/Azure), type (user/role/svc), risk level
Remediation tradeoff slider: debounced API call returns projected PCI reduction for N remediated identities
Discussion: Should the scan be on-demand (button) or continuous background polling? Implications for API rate limits and cost.
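
The projection behind the tradeoff slider could work along the lines below. This brief does not define how the PCI is computed, so the formula here is purely an assumption for illustration: unused permissions as a percentage of granted permissions, summed over all identities. Each identity is a hypothetical `(granted_count, used_count)` pair, and "remediating" an identity is modelled as shrinking its grant to what it used.

```python
def pci(identities):
    """Assumed PCI: unused permissions as a rounded % of granted permissions."""
    granted = sum(g for g, _ in identities)
    used = sum(u for _, u in identities)
    return round(100 * (granted - used) / granted) if granted else 0


def projected_pci(identities, n):
    """PCI after scoping the n identities with the most unused permissions."""
    ranked = sorted(identities, key=lambda gu: gu[0] - gu[1], reverse=True)
    remediated = [(used, used) for _, used in ranked[:n]] + ranked[n:]
    return pci(remediated)


fleet = [(100, 10), (50, 25), (20, 20)]
print(pci(fleet), projected_pci(fleet, 1))
```

Ranking by unused count means the first few slider steps produce the largest drops, which matches the "effort vs impact" framing of the feature.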

F2

Side-by-Side JSON Diff Viewer + Deployment UI

FRONTEND · SPRINT 3 · React · Monaco Editor · GitHub API

The deeply nested JSON diff viewer (Screen 3 of wireframes). Dual-pane layout with synchronized scrolling, syntax highlighting, and colour-coded diff indicators (red removals, green additions) that bubble up to parent nodes. JSON normalisation to prevent false positives from key ordering differences. Deployment affordances: sandbox test, GitHub PR creation (requires OAuth), and direct AWS/Azure API deployment with rollback button.

Use Monaco Editor (VS Code's engine) or react-diff-viewer as base — do NOT build a diff engine from scratch
JSON normalise before diff: sort keys alphabetically, strip whitespace variation — eliminates false positives
Collapsible node trees (both panels sync collapse state) — critical for large policies with 50+ statement blocks
GitHub PR creation: pre-fills PR title, description, and reviewer assignment from the security team roster
Confidence signal banner: surface "last used" timestamp, detected cronjob patterns, quarterly batch job heuristics
Discussion: Rollback mechanism — should we snapshot the previous policy version before applying changes?
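
The normalise-before-diff step above is language-agnostic; the sketch below shows the idea in Python using only the standard library (`json` and `difflib`), even though the viewer itself is planned in React. The function names are hypothetical.

```python
import difflib
import json


def normalise(policy):
    """Serialise with sorted keys and fixed indentation so formatting never differs."""
    return json.dumps(policy, sort_keys=True, indent=2).splitlines()


def policy_diff(current, proposed):
    """Unified diff of two policy documents after normalisation."""
    return list(
        difflib.unified_diff(
            normalise(current), normalise(proposed), "current", "proposed", lineterm=""
        )
    )


current = {"Version": "2012-10-17",
           "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]}
reordered = {"Statement": [{"Resource": "*", "Action": "*", "Effect": "Allow"}],
             "Version": "2012-10-17"}
print(policy_diff(current, reordered))  # key order alone produces no diff lines
```

Note that `sort_keys` only orders object keys. Arrays such as `Action` lists keep their original order, so a viewer that wants to ignore array ordering would need to sort those explicitly.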

Technical Risk Register

High: CloudTrail data event cost

Enabling S3 GetObject logging on large buckets can cost hundreds of dollars/month. Mitigation: offer opt-in per bucket with cost estimator shown before enabling.

High: IAM ↔ API mapping drift

AWS adds ~200 new IAM actions per year. Mapping dictionary needs automated testing against live AWS documentation on every release.

Medium: Azure RBAC opacity

Azure's authorization.evidence field is inconsistently emitted across resource providers. Need provider-specific parsing fallbacks.

Medium: False positive suppression

Actions that run quarterly (batch jobs, compliance reports) will appear as "unused" in a 90-day window. Need job schedule inference to prevent false positive remediation suggestions.
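
One simple form of job schedule inference is to look at the spacing between an action's historical calls. The sketch below is a heuristic under stated assumptions: it needs call history longer than the tracking window, the function name is hypothetical, and the 0.9 tolerance factor is an arbitrary illustrative choice.

```python
from datetime import datetime
from statistics import median


def likely_periodic(timestamps, window_days=90, tolerance=0.9):
    """True if past calls are spaced widely enough that a window_days scan could miss them."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return False  # not enough history to infer a schedule
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) >= window_days * tolerance


quarterly = [datetime(2023, m, 1) for m in (1, 4, 7, 10)]
print(likely_periodic(quarterly))
```

An action flagged this way would be surfaced with a warning in the confidence banner rather than silently included in a removal recommendation.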

cloud-sheriff
