Short notes on cloud security, identity, AI tool risk, vulnerability workflows, and practical security engineering. Written from real engineering work.
A CVSS 9.8 score on an internal service behind compensating controls is usually less urgent than a CVSS 6.5 on an internet-facing service with a public exploit and no authentication required. The score tells you severity in isolation, not whether it matters in your environment.
The signals I weight beyond CVSS: Is the service internet-facing? Is the CVE in the CISA KEV catalog? What is the EPSS exploitation probability? What compensating controls exist? How critical is the underlying asset? How hard is the fix? Working through those questions consistently is what turns a 200-item scanner output into a five-item action list.
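One way to make those questions repeatable is a small scoring function. This is a minimal sketch, not a standard: the `Finding` fields, the weights, and the 1-to-3 criticality scale are all assumptions chosen for illustration, and fix difficulty is left out because it affects scheduling more than urgency.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    cve_id: str
    cvss: float                  # CVSS base score, 0.0 to 10.0
    internet_facing: bool
    in_kev: bool                 # listed in the CISA KEV catalog
    epss: float                  # EPSS probability, 0.0 to 1.0
    compensating_controls: bool
    asset_criticality: int       # hypothetical scale: 1 (low) to 3 (high)


def priority(f: Finding) -> float:
    """Combine the signals into one number. Weights are illustrative only."""
    score = f.cvss / 10.0
    score += 2.0 if f.in_kev else 0.0
    score += 1.5 if f.internet_facing else 0.0
    score += 2.0 * f.epss
    score += 0.5 * (f.asset_criticality - 1)
    if f.compensating_controls:
        score *= 0.5             # controls reduce urgency, they do not remove it
    return round(score, 2)


def action_list(findings, limit=5):
    """Return the highest-priority findings, most urgent first."""
    return sorted(findings, key=priority, reverse=True)[:limit]


# Placeholder identifiers, not real CVEs.
internal = Finding("CVE-EXAMPLE-A", 9.8, False, False, 0.02, True, 2)
external = Finding("CVE-EXAMPLE-B", 6.5, True, True, 0.90, False, 2)
top = action_list([internal, external])
```

With these assumed weights, the internet-facing 6.5 sorts ahead of the internal 9.8, which matches the reasoning above. The exact numbers matter less than applying the same questions to every finding.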
Before approving a new SaaS app, I want to understand: what data the app needs and why, how authentication works and whether it supports SSO, what admin roles exist and who will hold them, what logging the vendor provides, what the vendor's data retention policy says, and what the offboarding process looks like when we stop using it.
Most SaaS approvals skip the offboarding question. That is where the access management debt starts. An app approved without a defined offboarding process will still have active OAuth permissions and user accounts six months after the last person stopped using it.
The most common cause of a bad Conditional Access deployment is moving too fast from policy design to enforcement. The fix: use report-only mode, review the data, pilot with a small group, then enforce broadly.
Report-only mode shows what the policy would have done without blocking anyone. Review sign-in logs for at least a week. Look for unexpected blocks, edge cases, and service accounts that authenticate non-interactively. Before enforcing, verify your break-glass accounts are excluded. A Conditional Access change that locks out the break-glass account is a very bad day.
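The pre-enforcement check can be scripted against an exported policy. The sketch below assumes the policy is a dictionary shaped like the Microsoft Graph `conditionalAccessPolicy` resource, where report-only is the `enabledForReportingButNotEnforced` state. The object IDs and the `ready_to_enforce` helper are hypothetical, and the check only looks at `excludeUsers`, so a tenant that excludes break-glass accounts through a group would need to check `excludeGroups` as well.

```python
REPORT_ONLY = "enabledForReportingButNotEnforced"


def ready_to_enforce(policy: dict, break_glass_ids: set) -> list:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    users = policy.get("conditions", {}).get("users", {})
    excluded = set(users.get("excludeUsers", []))

    missing = break_glass_ids - excluded
    if missing:
        problems.append(f"break-glass accounts not excluded: {sorted(missing)}")

    # A policy should be observed in report-only mode before it is switched on.
    if policy.get("state") != REPORT_ONLY:
        problems.append("policy is not currently staged in report-only mode")
    return problems


# Hypothetical policy and object IDs for illustration.
policy = {
    "displayName": "Require MFA for all users",
    "state": REPORT_ONLY,
    "conditions": {
        "users": {
            "includeUsers": ["All"],
            "excludeUsers": ["break-glass-1-object-id"],
        }
    },
}
issues = ready_to_enforce(
    policy, {"break-glass-1-object-id", "break-glass-2-object-id"}
)
```

Here `issues` reports the second break-glass account as missing from the exclusion list, which is the kind of gap worth catching before the state changes to `enabled`.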
Prompt injection is when malicious content in the input to an AI model causes it to follow instructions it should not. In a consumer chat context, the impact is usually contained to that conversation. In an enterprise workflow where the model can read emails, write tickets, summarize documents, or trigger actions, the trust boundary is much harder to define.
The practical risk in most enterprise AI deployments is not sophisticated injection attacks. It is that the AI tool has access to sensitive data and produces output that drives decisions, with no human review in the loop. The fix is primarily about defining what data the tool can access, what actions it can take, and whether humans remain in the decision loop for outputs that affect real systems.
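A minimal sketch of what "defining what actions it can take" might look like in code, assuming the tool's actions pass through a gate you control. The action names and the `authorize` function are hypothetical; the point is the shape: read-only actions are allowed, actions that touch real systems wait for a human, and everything else is denied by default.

```python
# Hypothetical action names for illustration.
READ_ONLY_ACTIONS = {"summarize_document", "search_tickets"}
REVIEW_REQUIRED_ACTIONS = {"send_email", "create_ticket"}


def authorize(action: str, human_approved: bool = False) -> str:
    """Decide whether a model-requested action may run."""
    if action in READ_ONLY_ACTIONS:
        return "allow"
    if action in REVIEW_REQUIRED_ACTIONS:
        # Actions that affect real systems need a person in the loop.
        return "allow" if human_approved else "needs_review"
    # Anything not explicitly listed is denied by default.
    return "deny"
```

A call such as `authorize("send_email")` returns `"needs_review"` until a person approves it, and an unlisted action such as `authorize("delete_mailbox")` returns `"deny"` regardless of what the model was told to do.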
Most security findings that go unfixed do not fail because the risk was unclear. They fail because the finding did not give the owner enough to act on. A useful remediation ticket includes: what is wrong in plain language, why it matters in this specific environment, how it was validated, what the specific change is, who the single owner is, how to verify the fix worked, and what the expected completion date is.
The retest criteria, meaning how to verify the fix worked, are especially important. Without them, closure is a matter of opinion. With them, closure is a verifiable fact.
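The ticket fields above can be enforced rather than remembered. This is a sketch under the assumption that tickets are created through a script or form you control; the `RemediationTicket` class and its field names are hypothetical and do not correspond to any particular ticketing system.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class RemediationTicket:
    problem: str = ""               # what is wrong, in plain language
    why_it_matters: str = ""        # impact in this specific environment
    validation_evidence: str = ""   # how the finding was confirmed
    required_change: str = ""       # the specific change to make
    owner: str = ""                 # a single named owner
    retest_criteria: str = ""       # how to verify the fix worked
    due_date: Optional[date] = None


def missing_fields(ticket: RemediationTicket) -> list:
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(ticket) if not getattr(ticket, f.name)]


# Example values are invented for illustration.
draft = RemediationTicket(
    problem="Legacy authentication is still allowed for one mailbox",
    why_it_matters="The mailbox receives vendor invoices",
    validation_evidence="Sign-in log shows legacy protocol sign-ins",
    required_change="Block legacy authentication for the mailbox",
    owner="mail-platform-lead",
    due_date=date(2025, 1, 31),
)
gaps = missing_fields(draft)
```

In this example `gaps` contains only `retest_criteria`, which is exactly the field that tends to be skipped.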
Every alert is a response to something that already happened. A secure default prevents the condition from existing. Blocking legacy authentication is a secure default. Adding an alert for legacy authentication use is a response to the same problem.
The highest-value security work is often reducing the number of risky paths that exist, not adding more visibility into them. Enforcing MFA everywhere, blocking legacy auth, tightening admin role scope, and removing unnecessary app access all reduce the attack surface before anyone attempts anything.
A scanner finding is an observation. A security risk is a finding that has been validated against the actual environment. The difference matters because scanner output has false positives, stale findings, and findings that are technically correct but do not apply to the specific configuration.
Validation asks: Is this issue real in this environment? Is the affected service actually running and accessible? Is the specific configuration that creates the vulnerability present? Are there compensating controls? Skipping validation wastes engineering time on things that either are not real or cannot be exploited in practice.
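Those validation questions can be written down as a small decision function so that every finding gets the same treatment. This is a sketch only: the `ValidationResult` fields and the outcome labels are assumptions, and the answers still have to come from someone checking the actual environment.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    service_running: bool
    reachable: bool                   # accessible to a plausible attacker
    vulnerable_config_present: bool
    compensating_controls: bool


def classify(v: ValidationResult) -> str:
    """Map validation answers to a hypothetical outcome label."""
    if not v.service_running or not v.vulnerable_config_present:
        return "not_applicable"       # the scanner observation does not apply here
    if not v.reachable:
        return "not_reachable"        # real, but no practical path to it today
    if v.compensating_controls:
        return "mitigated"            # real and reachable, with reduced urgency
    return "validated_risk"           # real, reachable, and unmitigated
```

Only findings that come back as `validated_risk` or `mitigated` need engineering time right away; the other labels are still worth recording so the same finding is not re-triaged on every scan.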
Guest access in Microsoft 365 is necessary for legitimate collaboration. The problem is not that guests exist. It is that guest accounts accumulate without lifecycle management, and the default settings in many tenants are more permissive than the organization realizes.
What I look for: How many active guest accounts exist? What resources can they access? When were they last active? Is there an expiration date? Who is the business owner? Is external sharing scoped to specific domains or open? Most guest access problems are process problems, not technical ones. The fix is usually a review process and an offboarding step.
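Several of those questions can be checked mechanically once the guest list is exported. The sketch below assumes each guest has already been loaded into a simple record; the `GuestAccount` fields, the flag names, and the 90-day threshold are hypothetical and are not Microsoft Graph property names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class GuestAccount:
    upn: str
    last_sign_in: Optional[date]     # None if the guest never signed in
    expires_on: Optional[date]       # None if no expiration is set
    business_owner: Optional[str]    # internal sponsor, if recorded


def review_flags(guest: GuestAccount, today: date, stale_after_days: int = 90) -> list:
    """Return the lifecycle problems found for one guest account."""
    flags = []
    never_used = guest.last_sign_in is None
    if never_used or today - guest.last_sign_in > timedelta(days=stale_after_days):
        flags.append("stale")
    if guest.expires_on is None:
        flags.append("no_expiration")
    if not guest.business_owner:
        flags.append("no_owner")
    return flags


# Invented example data.
today = date(2024, 6, 1)
old_guest = GuestAccount("partner@example.com", date(2024, 1, 1), None, "alice")
flags = review_flags(old_guest, today)
```

Here `flags` is `["stale", "no_expiration"]`. The output is a starting point for the review conversation with the business owner, not a deletion list.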
When evaluating a request to use a new AI tool, I want to understand: What data will enter the tool? What types of users will use it? What will the output be used for? Has the tool gone through vendor risk review? What does the vendor's data retention policy say? Is there logging of inputs and outputs? Are humans reviewing sensitive outputs before they drive decisions?
The most common gap is that tools are approved based on vendor marketing rather than a review of actual data handling terms. Vendor documentation on data retention, training data opt-out, and enterprise data isolation tells you far more about the real risk than a SOC 2 summary.
Translating security risk for non-technical audiences usually fails in one of two directions: oversimplification that loses the nuance needed for a good decision, or so much hedging that the risk does not land at all.
The model I use: Risk (what the issue is in plain language). Evidence (what confirms it). Impact (what happens if left open, in terms the stakeholder cares about). Fix (what specifically needs to change). Owner (the single person responsible). Validation (how we confirm the change worked). This structure works for technical owners and executives because it gives both groups enough to make their role-appropriate decision.
I started by taking apart the family computer because I wanted to know what was inside. I wrote batch scripts to automate things I did not fully understand. I worked around admin controls because figuring out how to bypass them was more interesting than asking for permission.
None of that was malicious. It was curiosity about how systems behaved under pressure. And it turned out that understanding how something breaks is the most reliable way to understand how it works. Every assumption I tested, every control I bypassed, and every configuration I broke and had to fix taught me something about the gap between what a system was supposed to do and what it actually did. That gap is where security engineering lives.
A penetration test that produces a list of vulnerabilities without context for how to fix them has done half the job. The value of offensive testing is not just finding weak points. It is understanding the attack path well enough to design a fix that actually closes it.
The best offensive work does three things: it shows a realistic attack path from an initial position to a meaningful impact, it explains what made each step possible, and it connects each finding to a specific control change that would have stopped the attack. That last part is what turns a finding into engineering.