From e9e09e8821f79049f5cc8b826ec936431f8f0de5 Mon Sep 17 00:00:00 2001 From: jessikitty Date: Wed, 27 May 2026 10:56:00 +1000 Subject: [PATCH] docs: comprehensive README with diagnostics, fix plan, and migration guide --- README.md | 214 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 212 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 1e3742c..71c9394 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,213 @@ -# ha-performance-fix +# Home Assistant Performance Fix & Infrastructure Migration -Home Assistant performance diagnostics, recorder exclude config, and LXC migration scripts for AdGuard and Guacamole \ No newline at end of file +## Problem Summary + +Home Assistant at `ha.hideawaygaming.com.au` (HAOS 2026.5.3) periodically becomes unresponsive. Because critical infrastructure services (AdGuard DNS, Tailscale VPN, Guacamole RDP) all run as HA add-ons inside the same VM, any HA freeze causes house-wide network and access failures. + +### Root Causes Identified + +| Issue | Impact | +|-------|--------| +| Memory at 87% (4.45 GB) | VM swaps under load → unresponsive | +| 2,330 entities, 775 unavailable (33%) | Wasted memory and CPU tracking stale entities | +| ~1,007 state changes/hour (16.8/min) | Recorder DB I/O bottleneck | +| browser_mod: 228 entities (200 stale) | Biggest source of entity bloat | +| iCloud3: 1,000+ state changes/4hr | Aggressive polling floods state machine | +| Frigate occupancy flapping ~97x/hr | Detection zones too sensitive | +| 3 time sensors × 60 changes/hr = 720/hr | Pointless recorder writes | +| Guacamole using 25% CPU / 9% RAM | Heavy add-on consuming HA resources | +| AdGuard (network DNS) inside HA | Single point of failure | + +--- + +## Fix Plan (Priority Order) + +### Phase 1: Immediate — Recorder Exclude (10 minutes) + +Apply `recorder_exclude.yaml` to stop recording high-churn, low-value entities. + +**Steps:** + +1. SSH into HAOS or use the File Editor add-on +2. Open `/config/configuration.yaml` +3. If you already have a `recorder:` section, merge the excludes from `recorder_exclude.yaml` into it +4. If you don't have one, copy the entire contents of `recorder_exclude.yaml` into `configuration.yaml` +5. Restart HA: Settings → System → Restart + +**Expected impact:** ~2,500 fewer state changes recorded per hour, significant reduction in disk I/O and memory usage. + +### Phase 2: Immediate — Entity Cleanup (20 minutes) + +**browser_mod stale sessions:** +1. Go to Developer Tools → Services +2. For each stale browser_mod entity, call the service to unregister it +3. Alternatively: Settings → Devices & Services → browser_mod → Remove stale device entries +4. Target: reduce from 228 to ~20-30 active entities + +**Plex media player cleanup:** +1. Settings → Devices & Services → Plex +2. Click through each device — delete any showing as "Unavailable" +3. Target: reduce from 59 to ~5-10 active clients + +**Pioneer VSX-832 duplicates:** +1. Settings → Devices & Services → Onkyo +2. You should see multiple "Pioneer VSX-832" devices +3. Keep only the working one (likely the one showing state "off" or "on") +4. Delete the rest (showing "unavailable") +5. Target: reduce from 7 to 1-2 entities + +**F1 Sensor (off-season):** +1. Settings → Devices & Services → F1 Sensor +2. Consider disabling the integration during off-season +3. Or leave it — the recorder exclude will prevent it writing history +4. 76 entities, 42 currently unavailable + +### Phase 3: Tune Noisy Integrations (15 minutes) + +**iCloud3 — reduce polling frequency:** +1. iCloud3 config (via HA integrations or config file) +2. Increase `inzone_interval` from default to 30-60 minutes +3. Increase general polling interval +4. This alone cuts ~1,000 state changes per 4 hours + +**Frigate — fix driveway zone flapping:** +1. In your Frigate config, for the driveway camera zones: + - Increase `min_area` on car detection (currently triggering on shadows/reflections) + - Add `inactivity_timeout: 30` to prevent rapid on/off toggling + - Consider disabling `cat` and `dog` detection on the driveway if not needed +2. The driveway_car_occupancy and driveway_pavement_car_occupancy are toggling ~97x/hour each + +**Time sensors — remove duplicates:** +1. If `sensor.time`, `sensor.time_2`, and `sensor.date_time` are defined in `configuration.yaml` under `sensor:` → `platform: time_date`, remove the duplicates +2. Keep only one if needed for automations, or rely on HA's built-in `now()` in templates instead + +**UpdatePowerUsageFast automation:** +1. Settings → Automations → UpdatePowerUsageFast +2. Change the time pattern trigger from every 1 minute to every 5 minutes +3. Cuts 192 automation runs per hour + +### Phase 4: Increase VM Memory (5 minutes) + +On the Proxmox host: +1. Shut down the HAOS VM (or hot-plug if supported) +2. Increase RAM from current allocation to **8 GB** +3. HAL-HOST has 134 GB total with 78% used — there's headroom +4. Start the VM + +### Phase 5: Migrate AdGuard to LXC (30 minutes) + +**This is the most important architectural change.** Network DNS must not depend on HA stability. + +See `setup-adguard-lxc.sh` — run on the Proxmox host. + +```bash +# Copy to Proxmox host +scp setup-adguard-lxc.sh root@10.0.0.x:/root/ + +# Run it (default CT ID 120, or pass custom) +ssh root@10.0.0.x +chmod +x /root/setup-adguard-lxc.sh +/root/setup-adguard-lxc.sh 120 +``` + +**Post-setup migration:** +1. Access new AdGuard at `http://10.0.0.53:80` +2. Complete the setup wizard +3. Export config from HA's AdGuard add-on web UI and import to new instance +4. Migrate filter lists, client settings, parental controls, DNS rewrites +5. Test: `nslookup google.com 10.0.0.53` +6. Update OPNsense DHCP: change DNS from `10.0.0.55` to `10.0.0.53` +7. Wait 24 hours, confirm stability +8. Stop HA AdGuard add-on +9. Optionally re-add HA AdGuard integration pointing to `10.0.0.53` for dashboard stats + +**NPM reverse proxy (optional):** +- Add proxy host in NPM (10.0.0.54): + - Domain: `adguard.hideawaygaming.com.au` + - Forward: `http://10.0.0.53:80` + - SSL via Let's Encrypt + +### Phase 6: Migrate Guacamole to LXC (30 minutes) + +See `setup-guacamole-lxc.sh` — run on the Proxmox host. + +```bash +scp setup-guacamole-lxc.sh root@10.0.0.x:/root/ +ssh root@10.0.0.x +chmod +x /root/setup-guacamole-lxc.sh +/root/setup-guacamole-lxc.sh 121 +``` + +**Post-setup migration:** +1. Access new Guacamole at `http://10.0.0.52:8080/guacamole/` +2. Login with `guacadmin` / `guacadmin` — **change password immediately** +3. Re-create your RDP connections (hostname, port 3389, credentials) +4. Re-create any user accounts +5. Set up NPM reverse proxy with WebSocket support +6. Test all RDP connections +7. Stop HA Guacamole add-on + +**NPM reverse proxy:** +- Domain: `guac.hideawaygaming.com.au` +- Forward: `http://10.0.0.52:8080` +- Custom location: `/guacamole/` +- **Enable WebSocket support** (critical for RDP streaming) + +--- + +## Network Architecture After Migration + +``` + Internet + │ + ┌────┴────┐ + │ OPNsense │ 10.0.0.254 + │ Gateway │ + └────┬────┘ + │ + ┌──────────────┼──────────────┐ + │ │ │ + ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ + │ AdGuard │ │ NPM │ │ HAOS │ + │ (LXC) │ │ (LXC) │ │ (VM) │ + │ .0.53 │ │ .0.54 │ │ .0.55 │ + │ DNS 53 │ │ HTTP/S │ │ HA only │ + └─────────┘ └─────────┘ └────┬────┘ + │ + ┌──────────────┬─────────────┘ + │ │ + ┌────┴────┐ ┌────┴────┐ + │ Guac │ │Tailscale│ + │ (LXC) │ │(remains │ + │ .0.52 │ │ in HA) │ + │ RDP GW │ └─────────┘ + └─────────┘ +``` + +Tailscale stays in HA since it's lightweight and tightly integrated with HA's remote access. AdGuard and Guacamole are now independent — HA can restart without taking down DNS or RDP access. + +--- + +## Expected Results + +| Metric | Before | After | +|--------|--------|-------| +| HA Memory | 87% (4.45 GB) | ~50-60% (with 8 GB allocated) | +| Entities | 2,330 (775 unavailable) | ~1,800 (fewer stale) | +| State changes/hr | ~1,007 | ~300-400 | +| Recorder writes/hr | ~1,007 | ~200-300 (excludes applied) | +| DNS failure on HA crash | Yes | No (independent LXC) | +| RDP failure on HA crash | Yes | No (independent LXC) | +| Guacamole CPU in HA | 25% | 0% (moved out) | +| Guacamole RAM in HA | 9% | 0% (moved out) | + +--- + +## Files in This Repository + +| File | Purpose | +|------|---------| +| `recorder_exclude.yaml` | Recorder exclude config — merge into `configuration.yaml` | +| `setup-adguard-lxc.sh` | Proxmox script to create AdGuard Home LXC | +| `setup-guacamole-lxc.sh` | Proxmox script to create Guacamole LXC | +| `README.md` | This file |