You know that feeling when you sit down on a Friday to "fix one small bug" and wake up on Tuesday with two entire feature epics shipped, a redesigned navigation system, and your agents auto-updating themselves from GitHub releases?
Yeah. That happened.
Buckle up: this is a big one.
🎨 The Great Dark Mode Migration
Let's start with the one that annoyed me every single day: half the app was still bright white. Login page? Dark. Dashboard? Blinding. It was like walking from a movie theater into a parking lot at noon.
The diagnosis was brutal: 418 hardcoded colors scattered across the entire frontend. Every `bg-white`, every `text-gray-800`, every `border-gray-200`: all of them completely ignoring the dark mode class on the root element.
The fix? A surgical strike across 23 files, 149 replacements:
| Before | After |
|---|---|
| `bg-white` | `bg-white dark:bg-zinc-900` |
| `text-gray-800` | `text-gray-800 dark:text-zinc-100` |
| `border-gray-200` | `border-gray-200 dark:border-zinc-800` |
| `bg-blue-100 text-blue-800` | `bg-blue-500/20 text-blue-400` |
That last one is my favorite trick. Instead of having separate light/dark badge colors, translucent backgrounds (`bg-*-500/20`) with bright text (`text-*-400`) look great in both modes. Stolen from Discord's playbook, honestly.
Result: Every. Single. Page. Now dark. Login ✅ Dashboard ✅ Nodes ✅ Jobs ✅ Vulnerabilities ✅ Security Center ✅ Services ✅ Reports ✅ Settings ✅
No more retina burns at 2 AM. You're welcome. 😎
🚀 E30: Patch & Update Orchestration
This is the big kahuna. The one feature every IT admin asks about first: "Can it patch my machines?"
Before v0.6.0, Octofleet could tell you about vulnerabilities. It could even remediate some of them through the remediation engine. But it didn't have a proper patch management pipeline: you know, the whole ring-based, staged-rollout, compliance-tracking thing that makes SCCM admins feel warm and fuzzy.
Now it does.
What We Built
🏗️ Server Side (Phase 1)
Five new database tables form the backbone:
- `patch_catalog` – Your master list of patches with severity, KB numbers, and affected products
- `patch_rings` – Deployment rings (Dev → Pilot → Broad → Critical) with configurable delays
- `patch_deployments` – Scheduled rollouts targeting specific rings
- `patch_deployment_results` – Per-node results tracking
- `patch_catalog_nodes` – Which nodes need which patches
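The ring delays are the interesting part: approve a patch once, and each ring inherits its own start date. Here's a minimal sketch of that scheduling idea, with made-up delay values rather than the actual `patch_rings` configuration:

```python
from datetime import date, timedelta

# Illustrative ring delays in days after approval; the real values are
# configurable per ring in patch_rings, these numbers are invented.
RING_DELAYS = {"Dev": 0, "Pilot": 3, "Broad": 7, "Critical": 14}

def rollout_schedule(approved_on: date) -> dict:
    """Map each deployment ring to the date its staged rollout begins."""
    return {ring: approved_on + timedelta(days=delay)
            for ring, delay in RING_DELAYS.items()}

schedule = rollout_schedule(date(2025, 1, 6))
# Dev starts on approval day; Broad follows a week later.
```

One approval date fans out into four staged start dates, which is exactly what makes ring-based rollout safer than "patch everything now."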
Twenty API endpoints under /api/v1/patches/ handle everything from catalog management to deployment approval workflows.
The frontend got a proper /patches page with four tabs: Catalog, Rings, Deployments, and Compliance. There's a deployment wizard that walks you through selecting patches, picking a ring, setting a schedule, and kicking it off.
🖥️ Agent Side (Phase 2)
Here's where it gets spicy. PatchScanner.cs talks directly to the Windows Update Agent COM API:
```csharp
var updateSession = new UpdateSession();
var updateSearcher = updateSession.CreateUpdateSearcher();
var searchResult = updateSearcher.Search("IsInstalled=0");
```
Every 6 hours, the agent scans for missing updates and reports back to the server. No WSUS required. No SCCM. Just the agent talking to Windows Update and telling headquarters what's missing.
The scanner categorizes patches by severity (Critical, Important, Moderate) and maps them to KB numbers so the server-side catalog stays in sync.
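In spirit, that categorization step looks something like the sketch below. The actual logic lives in C# inside PatchScanner.cs; this is a hedged Python rendition, and the helper names are made up:

```python
import re

# Collapse raw MSRC severity strings into the catalog's three buckets.
def categorize(msrc_severity):
    if msrc_severity in ("Critical", "Important"):
        return msrc_severity
    return "Moderate"  # Moderate, Low, and unrated all land in one bucket

# Update titles usually embed KB ids like "(KB5034123)"; pull them out
# so the server-side catalog can key on KB numbers.
def extract_kbs(title: str):
    return re.findall(r"KB\d{6,7}", title)

print(categorize(None), extract_kbs("2024-01 Cumulative Update (KB5034123)"))
# Moderate ['KB5034123']
```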
Is it SCCM? No. Is it enough for 90% of environments under 500 nodes? Absolutely. 💪
📋 E31: Configuration Baselines & Drift Management
If E30 is about "are my machines patched?", E31 is about "are my machines configured correctly?"
Think Group Policy auditing, but vendor-agnostic and with actual drift detection.
The Concept
You define a baseline: a collection of rules that describe how a machine should be configured. Things like:
- 🔐 Password policy: minimum 12 characters
- 🛡️ Windows Firewall: enabled on all profiles
- 🚫 Guest account: disabled
- 💻 RDP: Network Level Authentication required
Then you assign that baseline to nodes or groups. Octofleet evaluates the rules against inventory data and tells you who's compliant and who's drifting.
Phase 1: The Engine
- `config_baselines` – Named baselines (e.g., "Windows Server 2022 Hardening")
- `config_baseline_rules` – Individual rules with expected values and evaluation logic
- `config_baseline_assignments` – Which nodes/groups get which baselines
- `config_baseline_evaluations` – Results from the last evaluation run
- `config_drift_events` – Timeline of when drift was detected (and resolved)
The evaluation engine runs server-side against inventory data that agents already collect. No extra agent configuration needed.
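Conceptually, the server-side evaluation is a loop of "expected vs. actual" checks. A minimal sketch, assuming a simplified rule shape (the real `config_baseline_rules` schema is richer than this):

```python
from dataclasses import dataclass

@dataclass
class Rule:
    key: str          # inventory field, e.g. "password_min_length"
    op: str           # comparison: "eq" or "gte"
    expected: object  # the baseline's expected value

def evaluate(rules, inventory):
    """Return per-rule compliance; a missing inventory key counts as drift."""
    results = {}
    for r in rules:
        actual = inventory.get(r.key)
        if actual is None:
            results[r.key] = False
        elif r.op == "eq":
            results[r.key] = actual == r.expected
        elif r.op == "gte":
            results[r.key] = actual >= r.expected
        else:
            results[r.key] = False
    return results

rules = [Rule("password_min_length", "gte", 12),
         Rule("guest_account_enabled", "eq", False)]
print(evaluate(rules, {"password_min_length": 8, "guest_account_enabled": False}))
# {'password_min_length': False, 'guest_account_enabled': True}
```

Treating a missing key as non-compliant is the conservative choice: "we don't know" should never read as "compliant."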
Phase 2: CIS Benchmarks & Auto-Remediation
This is where it gets really cool. We built two CIS benchmark templates out of the box:
Windows Server 2022/2025 L1 (9 rules):
- Password history, max age, min length, complexity
- Account lockout threshold and duration
- Windows Firewall profiles
- Remote Desktop NLA
Windows 11 Enterprise L1 (6 rules):
- UAC enforcement
- Windows Defender real-time protection
- BitLocker drive encryption
- Audit policy configuration
Each rule has an evaluation type (registry, policy, service, feature) and many come with auto-remediation scripts. Click "Remediate" and Octofleet generates the appropriate PowerShell command:
```powershell
# Try the Server cmdlet first; fall back to the capability for client SKUs (Win10/11)
try {
    Install-WindowsFeature Windows-Defender -IncludeManagementTools -ErrorAction Stop
} catch {
    Add-WindowsCapability -Online -Name 'Microsoft.Windows.Defender~~~~' -ErrorAction Stop
}
```
That try/catch fallback? Yeah, that's because Install-WindowsFeature only exists on Server editions. Spent a fun hour debugging why remediation worked on HYPERV02 but crashed on desktop machines. The joys of cross-SKU PowerShell. 😅
🧭 The Mega Dropdown Navigation
The old navbar had seven top-level items and a secondary tab bar on the Security page. It was fine when we had 15 pages. We now have 40+. Something had to give.
Enter the mega dropdown:
```
┌──────────────────────────────────────────────┐
│ 🔒 Security                                  │
├─────────────────┬────────────────────────────┤
│ Monitoring      │ Compliance                 │
│ ─────────────   │ ──────────────────────     │
│ 📊 Dashboard    │ 📋 Baselines               │
│ 🔍 Findings     │ 📏 CIS Benchmarks          │
│ 📡 Events       │ 📈 Drift Events            │
│ 📁 File Audit   │ 🛡️ Posture                 │
│ 🧠 Behavior     │ 🔄 Remediation             │
│ 🗓️ Activity     │ ⚙️ Retention               │
│ 📦 Evidence     │ 🎯 Policies                │
└─────────────────┴────────────────────────────┘
```
Two columns. Section headers. 14 items organized by function. Persistent context: the dropdown highlights which section you're currently in.
The navbar now has six dropdowns: Fleet (emerald), Software (blue), Infra (amber), Security (red, 2-column mega), Ops (cyan), and Admin (purple). Each with its own color identity so you always know where you are.
Goodbye, tab bar. You served us well. 👋
🔧 The Job System Bug That Wasn't a Bug
This one's a good story.
Users reported that jobs showed "device" instead of the actual hostname in the jobs list. Also, some jobs were stuck at 0/0 instances. Classic.
Round 1: I added a LEFT JOIN nodes to resolve hostnames. Committed. Deployed. Still showing "device."
Round 2: Turns out that when two FastAPI routes register the same path, only one of them silently wins. The routers/jobs.py file was overriding the endpoint in main.py. My fix was in main.py. The wrong file. 🤦
Round 3: Fixed routers/jobs.py with the proper JOIN. Now hostnames show up. But wait: why are jobs stuck at 0/0 instances?
Round 4: The create_job() function in routers/jobs.py was inserting into the jobs table… but never creating job_instances. Jobs existed in the database, but agents had nothing to pick up. It's like placing an order at a restaurant where nobody sends the ticket to the kitchen.
Round 5: Fixed instance creation. Jobs now show instances: 1. But agents still aren't executing them?!
Round 6: The agent polls as win-baltasa, which resolves to BALTASA. But job_instances.node_id stores UUIDs like d7cc6f42-735a-409c-.... The query was comparing apples to UUIDs. Added a node lookup step that resolves hostname → UUID before querying.
Six rounds. One "simple" feature. Welcome to distributed systems.
Jobs now show actual hostnames, create proper instances, and agents actually pick them up. What a concept. 💪
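The Round 6 fix boils down to one extra lookup before the instance query. Here's a rough, self-contained sketch of that shape (SQLite stand-in; the real schema, column names, and hostname normalization are assumptions):

```python
import sqlite3
import uuid

# Toy stand-ins for the nodes and job_instances tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, hostname TEXT)")
db.execute("CREATE TABLE job_instances (id INTEGER, node_id TEXT, status TEXT)")
node_id = str(uuid.uuid4())
db.execute("INSERT INTO nodes VALUES (?, ?)", (node_id, "BALTASA"))
db.execute("INSERT INTO job_instances VALUES (1, ?, 'pending')", (node_id,))

def pending_instances(agent_hostname: str):
    # Agents may poll as "win-baltasa"; normalize to the stored hostname first.
    host = agent_hostname.removeprefix("win-").upper()
    row = db.execute("SELECT id FROM nodes WHERE hostname = ?", (host,)).fetchone()
    if row is None:
        return []  # unknown host, nothing to hand out
    # Only now do we compare UUIDs to UUIDs.
    return db.execute(
        "SELECT id FROM job_instances WHERE node_id = ? AND status = 'pending'",
        (row[0],)).fetchall()

print(pending_instances("win-baltasa"))  # [(1,)]
```

The lesson generalizes: never compare an agent-supplied identifier against a database key without resolving it through the canonical table first.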
🤖 Self-Updating Agents
Speaking of agents: v0.6.0 includes the PatchScanner and PostureCollector, so all nodes needed an update. The auto-updater checks GitHub releases on startup, downloads the ZIP, extracts, and restarts itself.
The catch: When running as a Windows Service, the agent can't cleanly restart itself (you can't kill the process that's killing the process). In interactive/PowerShell mode? Works beautifully. As a service? The restart script spawns, but the timing is… unpredictable.
Current rollout status:
- ✅ DESKTOP-B4GCTCV – self-updated to v0.6.0
- ✅ BALTASA – manual PowerShell restart → auto-updated
- ✅ HYPERV02 – updated via restart job
- 🔄 SCVMM, SQLSERVER1 – in progress
For the remaining two, a quick Restart-Service OctofleetNodeAgent on the box does the trick. Not elegant, but it works.
🖥️ Hardware Fleet Dashboard
New page alert! /fleet/hardware now shows a fleet-wide hardware overview:
- 📦 Total Storage: 12.44 TB across all nodes
- 💽 Disk Health: 5 healthy, 4 unknown (virtual disks don't report SMART)
- 🧮 CPU Breakdown: AMD Ryzen 9800X3D, Intel i7-13700, i9-13900K, i7-8550U
- ⚠️ Issues: Auto-detects drives above 90% capacity
All data pulled from the hardware_current table that agents populate automatically. Zero configuration required.
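The capacity check is simple enough to sketch. The field names below are guesses at the `hardware_current` shape, not the real schema:

```python
# Flag drives whose used fraction exceeds a threshold (default 90%).
def drives_over_threshold(drives, threshold=0.9):
    flagged = []
    for d in drives:
        # Guard against zero/None sizes before dividing.
        if d["size_bytes"] and d["used_bytes"] / d["size_bytes"] > threshold:
            flagged.append(d["label"])
    return flagged

drives = [{"label": "C:", "size_bytes": 500, "used_bytes": 470},
          {"label": "D:", "size_bytes": 1000, "used_bytes": 200}]
print(drives_over_threshold(drives))  # ['C:']
```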
🩹 The Remediation Engine Saga
The vulnerability remediation pipeline got a lot of love this release. Hereโs a highlight reel of bugs found and squashed:
| Bug | Root Cause | Fix |
|---|---|---|
| Remediation jobs fail to create | `node_id` inserted as text, column is `uuid` | Cast to `::uuid` |
| Agent never picks up jobs | Status filter only matched `approved`, engine creates `pending` | `IN ('pending', 'approved')` |
| Dashboard shows wrong counts | API returned `completed`, frontend expected `success` | Return both key formats |
| SSE stream crashes | `json.dumps()` on raw UUID objects | `default=str` serializer |
| 500 duplicate jobs | Double-click on "Remediate" button | Cleaned up + added debounce |
| `winget` returns exit code 1 | "No upgrade available" is technically exit 1 | Treat as success |
After remediation succeeds, vulnerabilities are now automatically marked as fixed in node_vulnerabilities. The vulnerability dashboard filters these out, so your numbers actually go down when you fix things. Revolutionary, I know. 😏
📊 By The Numbers
Since v0.5.6 (5 days ago):
| Metric | Count |
|---|---|
| Commits | 50+ |
| New API endpoints | 40+ |
| New DB tables | 10 |
| Files changed | 100+ |
| New frontend pages | 8 |
| E2E tests passing | 41/44 |
| Dark mode pages | 9/9 ✅ |
| Bugs squashed | ~20 |
| Coffee consumed | ☕☕☕☕☕ |
What's Next?
The roadmap still has plenty of meat on it:
- E33 โ Software Metering & License Tracking (P2)
- E34 โ Network Discovery & Topology (P2)
- E35 โ Enterprise Reporting Suite (P3)
- Agent registry inventory for CIS registry rule evaluation
- Build agent v0.6.0 binary via CI (currently requires Windows + dotnet publish)
Oh, and main.py is still 11,500 lines. That number only goes in one direction around here, and it's not the direction I'd like. But that's a problem for future me.
The full release is tagged as v0.6.0 on GitHub. Star the repo if you're into this kind of thing. Or don't. I'm not your dad. 😄
– Benedikt