diff --git a/README.md b/README.md
index 71c9394..984d055 100644
--- a/README.md
+++ b/README.md
@@ -2,160 +2,58 @@
 
 ## Problem Summary
 
-Home Assistant at `ha.hideawaygaming.com.au` (HAOS 2026.5.3) periodically becomes unresponsive. Because critical infrastructure services (AdGuard DNS, Tailscale VPN, Guacamole RDP) all run as HA add-ons inside the same VM, any HA freeze causes house-wide network and access failures.
+Home Assistant at `ha.hideawaygaming.com.au` (HAOS 2026.5.3) periodically becomes unresponsive. Because critical infrastructure services (AdGuard DNS, Tailscale VPN, Guacamole RDP, Nginx Proxy Manager) all run as HA add-ons inside the same VM, any HA freeze causes house-wide network and access failures.
 
-### Root Causes Identified
+## Network Plan
 
-| Issue | Impact |
-|-------|--------|
-| Memory at 87% (4.45 GB) | VM swaps under load → unresponsive |
-| 2,330 entities, 775 unavailable (33%) | Wasted memory and CPU tracking stale entities |
-| ~1,007 state changes/hour (16.8/min) | Recorder DB I/O bottleneck |
-| browser_mod: 228 entities (200 stale) | Biggest source of entity bloat |
-| iCloud3: 1,000+ state changes/4hr | Aggressive polling floods state machine |
-| Frigate occupancy flapping ~97x/hr | Detection zones too sensitive |
-| 3 time sensors × 60 changes/hr = 720/hr | Pointless recorder writes |
-| Guacamole using 25% CPU / 9% RAM | Heavy add-on consuming HA resources |
-| AdGuard (network DNS) inside HA | Single point of failure |
+| Service | CT ID | IP | Port(s) |
+|---------|-------|----|---------|
+| OPNsense (gateway) | — | 10.0.0.254 | — |
+| Proxmox (HAL-HOST) | — | 10.0.0.240 | 8006 |
+| HAOS VM | — | 10.0.0.55 | 8123 |
+| AdGuard Home (LXC) | 120 | 10.0.0.224 | 53, 80 |
+| Guacamole (LXC) | 121 | 10.0.0.225 | 8080 |
+| NPM (LXC) | 122 | 10.0.0.226 | 80, 443, 81 |
 
----
+## Execution Order
 
-## Fix Plan (Priority Order)
+Run the scripts on the Proxmox host (10.0.0.240) as root.
 
-### Phase 1: Immediate — Recorder Exclude (10 minutes)
-
-Apply `recorder_exclude.yaml` to stop recording high-churn, low-value entities.
-
-**Steps:**
-
-1. SSH into HAOS or use the File Editor add-on
-2. Open `/config/configuration.yaml`
-3. If you already have a `recorder:` section, merge the excludes from `recorder_exclude.yaml` into it
-4. If you don't have one, copy the entire contents of `recorder_exclude.yaml` into `configuration.yaml`
-5. Restart HA: Settings → System → Restart
-
-**Expected impact:** ~2,500 fewer state changes recorded per hour, significant reduction in disk I/O and memory usage.
-
-### Phase 2: Immediate — Entity Cleanup (20 minutes)
-
-**browser_mod stale sessions:**
-1. Go to Developer Tools → Services
-2. For each stale browser_mod entity, call the service to unregister it
-3. Alternatively: Settings → Devices & Services → browser_mod → Remove stale device entries
-4. Target: reduce from 228 to ~20-30 active entities
-
-**Plex media player cleanup:**
-1. Settings → Devices & Services → Plex
-2. Click through each device — delete any showing as "Unavailable"
-3. Target: reduce from 59 to ~5-10 active clients
-
-**Pioneer VSX-832 duplicates:**
-1. Settings → Devices & Services → Onkyo
-2. You should see multiple "Pioneer VSX-832" devices
-3. Keep only the working one (likely the one showing state "off" or "on")
-4. Delete the rest (showing "unavailable")
-5. Target: reduce from 7 to 1-2 entities
-
-**F1 Sensor (off-season):**
-1. Settings → Devices & Services → F1 Sensor
-2. Consider disabling the integration during off-season
-3. Or leave it — the recorder exclude will prevent it writing history
-4. 76 entities, 42 currently unavailable
-
-### Phase 3: Tune Noisy Integrations (15 minutes)
-
-**iCloud3 — reduce polling frequency:**
-1. iCloud3 config (via HA integrations or config file)
-2. Increase `inzone_interval` from default to 30-60 minutes
-3. Increase general polling interval
-4. This alone cuts ~1,000 state changes per 4 hours
-
-**Frigate — fix driveway zone flapping:**
-1. In your Frigate config, for the driveway camera zones:
-   - Increase `min_area` on car detection (currently triggering on shadows/reflections)
-   - Add `inactivity_timeout: 30` to prevent rapid on/off toggling
-   - Consider disabling `cat` and `dog` detection on the driveway if not needed
-2. The driveway_car_occupancy and driveway_pavement_car_occupancy are toggling ~97x/hour each
-
-**Time sensors — remove duplicates:**
-1. If `sensor.time`, `sensor.time_2`, and `sensor.date_time` are defined in `configuration.yaml` under `sensor:` → `platform: time_date`, remove the duplicates
-2. Keep only one if needed for automations, or rely on HA's built-in `now()` in templates instead
-
-**UpdatePowerUsageFast automation:**
-1. Settings → Automations → UpdatePowerUsageFast
-2. Change the time pattern trigger from every 1 minute to every 5 minutes
-3. Cuts 192 automation runs per hour
-
-### Phase 4: Increase VM Memory (5 minutes)
-
-On the Proxmox host:
-1. Shut down the HAOS VM (or hot-plug if supported)
-2. Increase RAM from current allocation to **8 GB**
-3. HAL-HOST has 134 GB total with 78% used — there's headroom
-4. Start the VM
-
-### Phase 5: Migrate AdGuard to LXC (30 minutes)
-
-**This is the most important architectural change.** Network DNS must not depend on HA stability.
-
-See `setup-adguard-lxc.sh` — run on the Proxmox host.
+### 1. Apply recorder exclude (HA side)
+Merge `recorder_exclude.yaml` into `/config/configuration.yaml`, restart HA.
 
+### 2. Deploy AdGuard LXC
 ```bash
-# Copy to Proxmox host
-scp setup-adguard-lxc.sh root@10.0.0.x:/root/
-
-# Run it (default CT ID 120, or pass custom)
-ssh root@10.0.0.x
-chmod +x /root/setup-adguard-lxc.sh
-/root/setup-adguard-lxc.sh 120
+chmod +x setup-adguard-lxc.sh
+./setup-adguard-lxc.sh
 ```
+- The script attempts SSH config migration from HAOS (no GUI export exists)
+- If SSH fails, follow the manual migration steps printed at the end
+- After setup: update OPNsense DHCP DNS from 10.0.0.55 → 10.0.0.224
 
-**Post-setup migration:**
-1. Access new AdGuard at `http://10.0.0.53:80`
-2. Complete the setup wizard
-3. Export config from HA's AdGuard add-on web UI and import to new instance
-4. Migrate filter lists, client settings, parental controls, DNS rewrites
-5. Test: `nslookup google.com 10.0.0.53`
-6. Update OPNsense DHCP: change DNS from `10.0.0.55` to `10.0.0.53`
-7. Wait 24 hours, confirm stability
-8. Stop HA AdGuard add-on
-9. Optionally re-add HA AdGuard integration pointing to `10.0.0.53` for dashboard stats
-
-**NPM reverse proxy (optional):**
-- Add proxy host in NPM (10.0.0.54):
-  - Domain: `adguard.hideawaygaming.com.au`
-  - Forward: `http://10.0.0.53:80`
-  - SSL via Let's Encrypt
-
-### Phase 6: Migrate Guacamole to LXC (30 minutes)
-
-See `setup-guacamole-lxc.sh` — run on the Proxmox host.
-
+### 3. Deploy NPM LXC
 ```bash
-scp setup-guacamole-lxc.sh root@10.0.0.x:/root/
-ssh root@10.0.0.x
-chmod +x /root/setup-guacamole-lxc.sh
-/root/setup-guacamole-lxc.sh 121
+chmod +x setup-npm-lxc.sh
+./setup-npm-lxc.sh
 ```
+- Migrates SQLite DB, Let's Encrypt certs, and custom configs from HA addon
+- After setup: update OPNsense port forwards (80/443) from 10.0.0.55 → 10.0.0.226
 
-**Post-setup migration:**
-1. Access new Guacamole at `http://10.0.0.52:8080/guacamole/`
-2. Login with `guacadmin` / `guacadmin` — **change password immediately**
-3. Re-create your RDP connections (hostname, port 3389, credentials)
-4. Re-create any user accounts
-5. Set up NPM reverse proxy with WebSocket support
-6. Test all RDP connections
-7. Stop HA Guacamole add-on
+### 4. Deploy Guacamole LXC
+```bash
+chmod +x setup-guacamole-lxc.sh
+./setup-guacamole-lxc.sh
+```
+- Re-create RDP connections in the web UI
+- Set up NPM proxy with WebSocket support
 
-**NPM reverse proxy:**
-- Domain: `guac.hideawaygaming.com.au`
-- Forward: `http://10.0.0.52:8080`
-- Custom location: `/guacamole/`
-- **Enable WebSocket support** (critical for RDP streaming)
+### 5. Cleanup HA
+- Stop AdGuard, NPM, and Guacamole add-ons in HA
+- Clean up browser_mod, Plex, Pioneer VSX-832 stale entities
+- Increase HAOS VM memory to 8 GB
+- Optionally re-add AdGuard as HA integration pointing to 10.0.0.224
 
----
-
-## Network Architecture After Migration
+## Architecture After Migration
 
 ```
                    Internet
@@ -165,49 +63,27 @@ chmod +x /root/setup-guacamole-lxc.sh
                   │ Gateway │
                   └────┬────┘
                        │
-        ┌──────────────┼──────────────┐
-        │              │              │
-   ┌────┴────┐    ┌────┴────┐    ┌────┴────┐
-   │ AdGuard │    │   NPM   │    │  HAOS   │
-   │  (LXC)  │    │  (LXC)  │    │  (VM)   │
-   │  .0.53  │    │  .0.54  │    │  .0.55  │
-   │ DNS 53  │    │ HTTP/S  │    │ HA only │
-   └─────────┘    └─────────┘    └────┬────┘
-                                      │
-         ┌──────────────┬─────────────┘
-         │              │
-    ┌────┴────┐    ┌────┴────┐
-    │  Guac   │    │Tailscale│
-    │  (LXC)  │    │(remains │
-    │  .0.52  │    │ in HA)  │
-    │ RDP GW  │    └─────────┘
-    └─────────┘
+   ┌───────────┬───────┴───────┬───────────┐
+   │           │               │           │
+┌──┴──┐    ┌───┴───┐       ┌───┴───┐   ┌───┴───┐
+│ AGH │    │  NPM  │       │ HAOS  │   │ Guac  │
+│ LXC │    │  LXC  │       │  VM   │   │  LXC  │
+│.224 │    │ .226  │       │  .55  │   │ .225  │
+│DNS  │    │ HTTP/S│       │HA only│   │  RDP  │
+└─────┘    └───────┘       └───┬───┘   └───────┘
+                               │
+                          ┌────┴────┐
+                          │Tailscale│
+                          │(in HA)  │
+                          └─────────┘
 ```
 
-Tailscale stays in HA since it's lightweight and tightly integrated with HA's remote access. AdGuard and Guacamole are now independent — HA can restart without taking down DNS or RDP access.
-
----
-
-## Expected Results
-
-| Metric | Before | After |
-|--------|--------|-------|
-| HA Memory | 87% (4.45 GB) | ~50-60% (with 8 GB allocated) |
-| Entities | 2,330 (775 unavailable) | ~1,800 (fewer stale) |
-| State changes/hr | ~1,007 | ~300-400 |
-| Recorder writes/hr | ~1,007 | ~200-300 (excludes applied) |
-| DNS failure on HA crash | Yes | No (independent LXC) |
-| RDP failure on HA crash | Yes | No (independent LXC) |
-| Guacamole CPU in HA | 25% | 0% (moved out) |
-| Guacamole RAM in HA | 9% | 0% (moved out) |
-
----
-
-## Files in This Repository
+## Files
 
 | File | Purpose |
 |------|---------|
 | `recorder_exclude.yaml` | Recorder exclude config — merge into `configuration.yaml` |
-| `setup-adguard-lxc.sh` | Proxmox script to create AdGuard Home LXC |
-| `setup-guacamole-lxc.sh` | Proxmox script to create Guacamole LXC |
+| `setup-adguard-lxc.sh` | CT 120 — AdGuard Home with SSH config migration |
+| `setup-guacamole-lxc.sh` | CT 121 — Guacamole via Docker Compose |
+| `setup-npm-lxc.sh` | CT 122 — NPM with DB/cert migration from HA addon |
 | `README.md` | This file |
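
A possible pre-cutover check, not part of this change: the new README moves the OPNsense DHCP DNS setting and the 80/443 port forwards off 10.0.0.55 once each container is deployed, so it may help to confirm first that every new container answers on the ports listed in the Network Plan table. The sketch below assumes bash with `/dev/tcp` support and coreutils `timeout` on the machine running it. The function names `service_endpoints`, `port_open`, and `check_services`, and the `--run` flag, are hypothetical and do not come from the repository scripts.

```bash
#!/usr/bin/env bash
# Hypothetical pre-cutover reachability check (illustrative sketch only).
# IPs and ports are copied from the Network Plan table in the README.

# Print "name host port" triples, one per TCP port to probe.
service_endpoints() {
  cat <<'EOF'
adguard-dns 10.0.0.224 53
adguard-ui 10.0.0.224 80
guacamole 10.0.0.225 8080
npm-http 10.0.0.226 80
npm-https 10.0.0.226 443
npm-admin 10.0.0.226 81
EOF
}

# Return 0 if a TCP connection to host ($1) on port ($2) opens within 3 seconds.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Probe every endpoint; return non-zero if any probe fails.
check_services() {
  failed=0
  while read -r name host port; do
    if port_open "$host" "$port"; then
      echo "OK   $name ($host:$port)"
    else
      echo "FAIL $name ($host:$port)"
      failed=1
    fi
  done <<EOF
$(service_endpoints)
EOF
  return "$failed"
}

# Only touch the network when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
  check_services
fi
```

Saved under a name such as `precheck.sh` (hypothetical), `bash precheck.sh --run` prints one OK or FAIL line per endpoint and exits non-zero if any probe fails. It only probes TCP, so a passing port 53 check does not prove that UDP DNS resolution works; a real lookup against 10.0.0.224 is still worth doing before changing DHCP.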