Nanitor is the internal auditor for patching

At Nanitor we run a fleet of Debian Linux machines. The patching itself is delegated to unattended-upgrades, the standard Debian mechanism. The interesting question is not whether it ran, but whether anyone is watching. Because our product is a CTEM platform (Continuous Threat Exposure Management), Nanitor itself plays that role: it cross-checks every host against the upstream CVE feed and would surface any host where the patches did not actually land.

Overview

Last week, the Nanitor HealthScore in our daily SecOps Slack report dipped from 99.23% on Wednesday to 85.87% on Thursday, then climbed back above 99% by Monday. The dip itself was the entire incident. A new bind9 security release landed across our Debian fleet, unattended-upgrades rolled it out within 24 hours, and the HealthScore tracked the recovery from one daily report to the next. No human had to touch a host.

This post walks through the timeline using one of those hosts as a concrete example, the ISO 27001 controls behind the cycle, and the escalation path when 24 hours is too slow.

Why this matters

unattended-upgrades does the patching. That part is meant to be uneventful: a standard Debian mechanism on a systemd timer, applying packages on its own. The interesting question is whether anyone is watching, because without an independent watcher the only signal is silence, and silence is the same shape whether the patcher is working or has quietly stopped.

Nanitor plays the watcher. CTEM named the vulnerability the evening before any package was touched. The audit log recorded every package upgrade with its old and new version. Six CVEs flipped from "vulnerable" to "no longer vulnerable" in the same scan. The Nanitor HealthScore in our daily SecOps Slack report showed the dip and the recovery day by day, without anyone writing it down.

For the ISMS, that is the evidence chain behind "we manage technical vulnerabilities". For a manager, it is the difference between "yes, we patch automatically" and "here is patching, audited independently, working on this day, in 24 hours, with the trail to show it".

Why we show the dip

Vulnerability discovery has accelerated, and AI-assisted tooling is part of why. New CVEs surface every day at a pace that did not exist a few years ago, and on a Debian-majority estate a single upstream release lights up most of the fleet at once. A dip of more than 13 points in the morning HealthScore is what that cadence looks like in practice. It is the cadence to expect, not the exception.

The discipline is to make the dip visible. The old instinct was to wait until everything was clean before reporting, so the screen stayed quiet while the patcher caught up. We do the opposite: the drop has to show on the morning report, the clock starts the moment Nanitor flags it, and the recovery has to show in the next report. A practice that hides the dip cannot prove it patches anything.

Background

The daily SecOps Slack report is the morning glance: one Nanitor HealthScore, one P0 count, one set of "operational" green ticks. Most days it is uneventful, which is exactly what we want from a well-run estate. The boring days are the win. When something moves, the dip itself is the story.

The signal

Four morning reports from this past week:

2026-05-20  Health 99.23%   P0 0
2026-05-21  Health 85.87%   P0 0    <-- 13.36-point drop
2026-05-22  Health 94.09%   P0 0
2026-05-25  Health 99.21%   P0 0

Everything else was green. Network, SaaS servers, system performance, security posture: operational. The only signal was the Thursday HealthScore sliding from above 99% to just under 86%, then slowly climbing back above 99% by Monday.

A drop of more than 13 points on a small estate is significant. It points somewhere specific. Time to look.
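The check a reader makes by eye on those four report lines can be sketched in a few lines of Python. This is only an illustration, not how Nanitor or the Slack report computes anything: the scores are copied from the report lines above, while the function name `find_drops` and the 5-point threshold are assumptions made for the example.

```python
from datetime import date

# Daily HealthScore values copied from the report lines above.
REPORTS = [
    (date(2026, 5, 20), 99.23),
    (date(2026, 5, 21), 85.87),
    (date(2026, 5, 22), 94.09),
    (date(2026, 5, 25), 99.21),
]


def find_drops(reports, threshold=5.0):
    """Return (day, drop) pairs where the score fell by more than
    `threshold` points against the previous report.
    The threshold is an illustrative assumption, not a Nanitor setting."""
    drops = []
    for (_, prev_score), (day, score) in zip(reports, reports[1:]):
        drop = round(prev_score - score, 2)
        if drop > threshold:
            drops.append((day, drop))
    return drops


if __name__ == "__main__":
    for day, drop in find_drops(REPORTS):
        print(f"{day}: HealthScore fell {drop} points")
```

Run against these four reports, the sketch flags only May 21st, with a fall of 13.36 points.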

The investigation

Inside Nanitor, the drop traced to the same source on every affected host: a new bind9 security release was missing across most of our Debian fleet. To make the forensics concrete, the rest of this section follows one of those hosts: backup-host, a backup server running Debian 13. At 20:40 on May 21st, three "Patch status changed" events fired against it:

backup-host (Debian 13)
  bind9-libs       1:9.20.23-1~deb13u1   missing   (High)
  bind9-dnsutils   1:9.20.23-1~deb13u1   missing   (High)
  bind9-host       1:9.20.23-1~deb13u1   missing   (High)

The host was on 1:9.20.21. Nanitor was telling us that the 9.20.23 security release for bind9 was out and that our backup host had not received it yet. These packages are the bind9 client tools (dig, host, nslookup) and shared libraries, not the bind9 server itself; none of our hosts run a bind9 server.
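Deciding that 1:9.20.21-1~deb13u1 is older than 1:9.20.23-1~deb13u1 relies on Debian's version ordering (epoch, upstream version, revision, with `~` sorting before everything). The sketch below is a simplified Python rendering of those rules for ASCII version strings. It is not Nanitor's implementation, and the function names are hypothetical. On a Debian host, `dpkg --compare-versions` is the authoritative tool.

```python
def _is_digit(ch):
    return "0" <= ch <= "9"


def _order(ch):
    """Sort weight of one non-digit character; end of string counts as 0."""
    if ch == "" or _is_digit(ch):
        return 0
    if ch == "~":
        return -1          # tilde sorts before everything, even end of string
    if ch.isalpha():
        return ord(ch)
    return ord(ch) + 256   # other punctuation sorts after letters


def _compare_part(a, b):
    """Compare two upstream-version or revision strings; return -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit prefix character by character.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # Compare the following digit run numerically (empty run counts as 0).
        si, sj = i, j
        while i < len(a) and _is_digit(a[i]):
            i += 1
        while j < len(b) and _is_digit(b[j]):
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _parse(version):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1 if a is older than b, 0 if equal, 1 if newer."""
    ea, ua, ra = _parse(a)
    eb, ub, rb = _parse(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)


if __name__ == "__main__":
    print(compare_versions("1:9.20.21-1~deb13u1", "1:9.20.23-1~deb13u1"))
```

A result of -1 for the installed version against the fixed version is what a "missing" patch flag amounts to in this simplified model.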

That release closed six CVEs, so the dip was real, traceable, and pointed at a known fix.

The fix

Debian's unattended-upgrades runs on a systemd timer that fires around 06:20 each morning. At 06:40 on May 22nd, Nanitor recorded the audit events:

2026-05-22 06:40   bind9-libs       1:9.20.21-1~deb13u1 -> 1:9.20.23-1~deb13u1
2026-05-22 06:40   bind9-host       1:9.20.21-1~deb13u1 -> 1:9.20.23-1~deb13u1
2026-05-22 06:40   bind9-dnsutils   1:9.20.21-1~deb13u1 -> 1:9.20.23-1~deb13u1

The same scan flipped all six CVEs to no longer vulnerable for that asset, and the same happened on every other affected host across the estate as their respective timers ran. The patch-missing flags persisted in Nanitor a little longer because the patch-state scan runs less often than the audit log; by 20:40 on May 22nd they were cleared too. The Friday report still showed 94% because the score lags slightly; by Monday the rest of the estate had caught up and the number was back at 99%.

Timeline of the incident:

2026-05-20          Health 99.23%, normal
2026-05-21 20:40    Nanitor detects 3 bind9 patches missing on the affected host; score drops to 85.87%
2026-05-22 06:40    unattended-upgrades applies the fix; 6 CVEs cleared; score recovers to 94.09%
2026-05-25          Estate fully recovered at 99.21%

The shape of it

Two systems handled the entire incident without intervention. They are deliberately separate:

  1. Nanitor (CTEM) is the auditor. It had the new patch identified the evening before any package manager touched the host, and would have raised the same flag indefinitely if no patcher had ever run.
  2. unattended-upgrades is the patcher. It closed the gap automatically the next morning on its predictable schedule, and has no opinion on whether the patch was needed.

Neither half can certify itself. The daily Slack report exposed the gap as a number the whole team could see, which is what made it visible to humans rather than just to the dashboard. From detection to applied fix: about ten hours. From detection to Nanitor confirming the host clear: roughly twenty-four hours. Fully automated end to end.
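The two intervals quoted above follow directly from the timestamps in this post. A minimal sketch of the arithmetic, with variable names chosen for the example (the "confirmed" time is the point by which the patch-missing flags had cleared, so the second figure is an upper bound):

```python
from datetime import datetime

# Timestamps taken from the Nanitor events quoted in this post.
detected = datetime(2026, 5, 21, 20, 40)   # patch-missing events fire
patched = datetime(2026, 5, 22, 6, 40)     # audit log records the upgrade
confirmed = datetime(2026, 5, 22, 20, 40)  # patch-missing flags cleared by this time


def hours_between(start, end):
    """Elapsed time between two datetimes, in hours."""
    return (end - start).total_seconds() / 3600


time_to_fix = hours_between(detected, patched)
time_to_confirm = hours_between(detected, confirmed)

print(f"detection -> fix applied:     {time_to_fix:.0f} h")
print(f"detection -> confirmed clear: {time_to_confirm:.0f} h")
```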

Operational takeaway: When detection (Nanitor) and remediation (unattended-upgrades) are independent and both audited, the dip in the HealthScore is the entire incident report. Nothing more is needed for routine patches.

When the auditor disagrees with the patcher

The pattern Nanitor catches most often at customer sites is the inverse of the bind9 story above. The patcher is configured. The timer is firing. The operator believes the estate is current. Nanitor reports otherwise. The natural first instinct is to question the auditor's signal. The forensics resolve it: a misconfigured apt sources list, a held package, a stale apt cache, an unattended-upgrades unit that fails silently on a specific repository. The patcher was running. It was simply not applying what the operator thought it was applying.

That is the case an internal auditor is actually for. When detection and remediation share a process, both report success together. When they are separate, the auditor can disagree with the patcher, and a disagreement is information. Nanitor's findings name the exact package, version, and host where the gap sits. The bind9 incident above is the easy case, where the patcher and the auditor agreed twenty-four hours apart. The harder, more useful cases are the ones where they do not.
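The shape of such a disagreement can be sketched as a comparison between what an advisory says should be installed and what each host actually has. Everything here is an assumption for illustration: the inventory is invented, `held-host` is a hypothetical machine with a held package, and `find_gaps` is not a Nanitor API. The check uses plain string inequality, so it would also flag a host that is ahead of the expected version; a real auditor would use Debian version ordering.

```python
# Fixed versions named in the bind9 example in this post.
EXPECTED = {
    "bind9-libs": "1:9.20.23-1~deb13u1",
    "bind9-host": "1:9.20.23-1~deb13u1",
    "bind9-dnsutils": "1:9.20.23-1~deb13u1",
}

# Hypothetical inventory of installed versions per host.
# "held-host" is invented: the patcher ran, but one package was held back.
INSTALLED = {
    "backup-host": {
        "bind9-libs": "1:9.20.23-1~deb13u1",
        "bind9-host": "1:9.20.23-1~deb13u1",
        "bind9-dnsutils": "1:9.20.23-1~deb13u1",
    },
    "held-host": {
        "bind9-libs": "1:9.20.21-1~deb13u1",
        "bind9-host": "1:9.20.23-1~deb13u1",
        "bind9-dnsutils": "1:9.20.23-1~deb13u1",
    },
}


def find_gaps(expected, installed):
    """Return (host, package, installed_version, expected_version) for every
    installed package whose version differs from the expected one."""
    gaps = []
    for host in sorted(installed):
        packages = installed[host]
        for package in sorted(expected):
            have = packages.get(package)
            if have is None:
                continue  # package not installed on this host: nothing to patch
            if have != expected[package]:
                gaps.append((host, package, have, expected[package]))
    return gaps


if __name__ == "__main__":
    for host, package, have, want in find_gaps(EXPECTED, INSTALLED):
        print(f"{host}: {package} is {have}, expected {want}")
```

Each gap names the host, the package, and both versions, which is the level of detail that lets an operator verify the finding independently of the patcher's own logs.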

Auditor's philosophy: The truth is always best told. That is the auditor's job: name what is true on the host, even when the patcher and the operator both believe everything is fine.

Risk, the ISMS view

The HealthScore dip is the visible side of an ISO 27001 risk loop. Two controls cover this case directly:

  • A.5.7 · Threat intelligence. Information about information security threats is collected and analysed. Our CTEM platform is the operational form of this control: a daily-refreshed feed that names which packages on which hosts are exposed to which CVEs.
  • A.8.8 · Management of technical vulnerabilities. Information about technical vulnerabilities is obtained, exposure evaluated, and appropriate measures taken. unattended-upgrades is the measure; the daily Nanitor HealthScore is the evaluation.

In risk-assessment terms the same incident reads:

  • Likelihood before patch: a new bind9 security release with multiple high-severity CVEs against the bind9 codebase, landing on hosts that carry the client tools and shared libraries.
  • Impact if exploited: low in this case. The hosts carry the bind9 client tools and shared libraries, not a listening DNS service, so exploitation would need an attacker to push malicious input through the local tools or library callers, a much narrower path than a public bind9 server.
  • Treatment: automated patching on a 24-hour timer; CTEM as the detection layer; Ansible for any case where 24 hours is too slow.
  • Residual risk: accepted, with the daily metric as the ongoing check.

Nanitor flags this at the package level, which is the right layer for fleet-scale vulnerability management. Severity belongs to the package; impact belongs to the host. The auditor names the gap in terms anyone can verify (package, version, host); the ISMS decides what that gap means given the role the host plays. The same workflow has to remain valid when the package set or the host role makes the impact higher, which is the point of keeping the two assessments separate.

The HealthScore moving from 99.23% to 85.87% and back is the risk register, in motion. Not a paragraph in a binder. A number on the morning report.

Operational takeaway: Package-level detection is the right layer for fleet-scale vulnerability management. It scales, it is deterministic, and it produces an audit trail that survives external review. Runtime detection is a different question, answered by other tools. The two layers are complementary, not substitutes.

What if it had been worse

The case above was low-impact in itself, so the timer was the right floor for it. The point of writing it up is that the same playbook has to hold up when it is not. unattended-upgrades is the floor. Twenty-four hours is fine for the vast majority of Debian security updates. It is not fine if a CVE is actively exploited, has a public proof-of-concept, or sits on an exposed network path.

In that case the timeline above is too slow. Detection stays the same (CTEM is already finding it), but the fix needs to come from us, not the timer. The escalation path is short:

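# Escalation play: upgrade the bind9 client packages now instead of waiting for the timer.
- name: Force-upgrade bind9 on the affected host
  hosts: backup-host
  become: true
  tasks:
    - name: Upgrade bind9 packages to latest
      ansible.builtin.apt:
        name: [bind9-libs, bind9-host, bind9-dnsutils]
        state: latest
        update_cache: true   # refresh the package index before upgrading
        only_upgrade: true   # upgrade only where already installed; never install anew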

A single playbook against the asset (or the affected group) closes the gap in minutes instead of hours, and the next Nanitor scan confirms it. The same play scales: if a critical patch lands on a quiet Tuesday afternoon, we do not wait for the next morning's timer.
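The three conditions named in this section for bypassing the timer (active exploitation, a public proof-of-concept, an exposed network path) can be written down as an explicit check. This is a sketch under assumptions: `CveContext` and `needs_manual_push` are hypothetical names, and in practice the inputs would come from threat intelligence and asset inventory, which the example does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CveContext:
    """Facts about one CVE on one host (hypothetical structure)."""
    actively_exploited: bool
    public_poc: bool
    exposed_network_path: bool


def needs_manual_push(ctx):
    """True when waiting for the next timer run is too slow, using the
    three conditions named in this section. Any one of them is enough."""
    return ctx.actively_exploited or ctx.public_poc or ctx.exposed_network_path


if __name__ == "__main__":
    # The bind9 case from this post: client tools only, no listening service.
    routine = CveContext(actively_exploited=False, public_poc=False, exposed_network_path=False)
    print("escalate" if needs_manual_push(routine) else "leave to the timer")
```

Keeping the rule this explicit makes the escalation decision reviewable: the timer remains the default, and the playbook is reserved for cases where one of the three conditions holds.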

Operational takeaway: Automation is the floor. The ability to step in is the ceiling. Build the floor first; keep the ceiling reachable.

Closing

The interesting part of this incident is not the bind9 patch. It is that nobody had to touch a host, and that Nanitor would have flagged the same gap indefinitely if the timer had quietly stopped running. Nanitor saw the gap, unattended-upgrades closed it, Nanitor confirmed the close, the Slack report told the story in a single line, and the HealthScore climbed back to 99% on its own.

Routine, audited, and predictable. That is the goal.