docs: update all IPs, add NPM, simplify README
@@ -2,160 +2,58 @@

## Problem Summary

Home Assistant at `ha.hideawaygaming.com.au` (HAOS 2026.5.3) periodically becomes unresponsive. Because critical infrastructure services (AdGuard DNS, Tailscale VPN, Guacamole RDP) all run as HA add-ons inside the same VM, any HA freeze causes house-wide network and access failures.
Home Assistant at `ha.hideawaygaming.com.au` (HAOS 2026.5.3) periodically becomes unresponsive. Because critical infrastructure services (AdGuard DNS, Tailscale VPN, Guacamole RDP, Nginx Proxy Manager) all run as HA add-ons inside the same VM, any HA freeze causes house-wide network and access failures.

### Root Causes Identified
## Network Plan

| Issue | Impact |
|-------|--------|
| Memory at 87% (4.45 GB) | VM swaps under load → unresponsive |
| 2,330 entities, 775 unavailable (33%) | Wasted memory and CPU tracking stale entities |
| ~1,007 state changes/hour (16.8/min) | Recorder DB I/O bottleneck |
| browser_mod: 228 entities (200 stale) | Biggest source of entity bloat |
| iCloud3: 1,000+ state changes/4hr | Aggressive polling floods state machine |
| Frigate occupancy flapping ~97x/hr | Detection zones too sensitive |
| 3 time sensors × 60 changes/hr = 180/hr (720 per 4 hr) | Pointless recorder writes |
| Guacamole using 25% CPU / 9% RAM | Heavy add-on consuming HA resources |
| AdGuard (network DNS) inside HA | Single point of failure |

| Service | CT ID | IP | Port(s) |
|---------|-------|----|---------|
| OPNsense (gateway) | - | 10.0.0.254 | - |
| Proxmox (HAL-HOST) | - | 10.0.0.240 | 8006 |
| HAOS VM | - | 10.0.0.55 | 8123 |
| AdGuard Home (LXC) | 120 | 10.0.0.224 | 53, 80 |
| Guacamole (LXC) | 121 | 10.0.0.225 | 8080 |
| NPM (LXC) | 122 | 10.0.0.226 | 80, 443, 81 |
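
One way to confirm that each service answers on its planned address is to probe a TCP port per row of the table. The sketch below is only an illustration: the addresses and ports are copied from the table, while the names `SERVICES`, `parse_target`, `check_port` and `check_all` and the two-second timeout are assumptions. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, and it probes DNS over TCP only.

```bash
# Hypothetical helper: probe one TCP port per service from the table.
# name:ip:port triples copied from the table (first listed port only).
SERVICES="proxmox:10.0.0.240:8006
haos:10.0.0.55:8123
adguard:10.0.0.224:53
guacamole:10.0.0.225:8080
npm:10.0.0.226:80"

# Turn "name:ip:port" into "name ip port".
parse_target() {
  echo "$1" | tr ':' ' '
}

# Return 0 if a TCP connection to ip:port opens within 2 seconds.
check_port() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print OPEN or CLOSED for every entry in SERVICES.
check_all() {
  for entry in $SERVICES; do
    set -- $(parse_target "$entry")
    if check_port "$2" "$3"; then
      echo "OPEN   $1 $2:$3"
    else
      echo "CLOSED $1 $2:$3"
    fi
  done
}

# Usage (run manually once the containers are up): check_all
```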

---
## Execution Order

## Fix Plan (Priority Order)
Run the scripts on the Proxmox host (10.0.0.240) as root.

### Phase 1: Immediate - Recorder Exclude (10 minutes)

Apply `recorder_exclude.yaml` to stop recording high-churn, low-value entities.

**Steps:**

1. SSH into HAOS or use the File Editor add-on
2. Open `/config/configuration.yaml`
3. If you already have a `recorder:` section, merge the excludes from `recorder_exclude.yaml` into it
4. If you don't have one, copy the entire contents of `recorder_exclude.yaml` into `configuration.yaml`
5. Restart HA: Settings → System → Restart

**Expected impact:** ~2,500 fewer state changes recorded per hour, significant reduction in disk I/O and memory usage.
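
Steps 3 and 4 depend on whether `configuration.yaml` already has a top-level `recorder:` key. A minimal sketch of that check follows; the function names `has_recorder_section` and `recorder_merge_hint` are hypothetical, and the test only looks for an unindented `recorder:` line, so it will not notice a recorder block pulled in from a package or split-config file.

```bash
# Return 0 if the file has a top-level "recorder:" key (no leading indentation).
has_recorder_section() {
  grep -Eq '^recorder:([[:space:]]|$)' "$1"
}

# Say which step applies: "merge" (step 3) or "append" (step 4).
recorder_merge_hint() {
  if has_recorder_section "$1"; then
    echo "merge"
  else
    echo "append"
  fi
}

# Usage on HAOS: recorder_merge_hint /config/configuration.yaml
```

Either way, back up `configuration.yaml` before editing it.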

### Phase 2: Immediate - Entity Cleanup (20 minutes)

**browser_mod stale sessions:**
1. Go to Developer Tools → Services
2. For each stale browser_mod entity, call the service to unregister it
3. Alternatively: Settings → Devices & Services → browser_mod → Remove stale device entries
4. Target: reduce from 228 to ~20-30 active entities

**Plex media player cleanup:**
1. Settings → Devices & Services → Plex
2. Click through each device and delete any showing as "Unavailable"
3. Target: reduce from 59 to ~5-10 active clients

**Pioneer VSX-832 duplicates:**
1. Settings → Devices & Services → Onkyo
2. You should see multiple "Pioneer VSX-832" devices
3. Keep only the working one (likely the one showing state "off" or "on")
4. Delete the rest (showing "unavailable")
5. Target: reduce from 7 to 1-2 entities

**F1 Sensor (off-season):**
1. Settings → Devices & Services → F1 Sensor
2. Consider disabling the integration during off-season
3. Or leave it, since the recorder exclude will prevent it writing history
4. 76 entities, 42 currently unavailable
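
To track progress during the cleanup, the number of unavailable entities can be read from Home Assistant's REST API (`/api/states`). The sketch below is a rough counter, not a full JSON parser: it greps the response for `"state":"unavailable"` pairs. The function names and the `HA_TOKEN` variable (a long-lived access token you create yourself) are assumptions, and the address is the HAOS VM at 10.0.0.55 on port 8123.

```bash
# Count entities whose state is "unavailable" in /api/states JSON read from stdin.
count_unavailable() {
  grep -oE '"state": ?"unavailable"' | wc -l | tr -d ' '
}

# Fetch all states from Home Assistant (expects a token in HA_TOKEN).
fetch_states() {
  curl -fsS -H "Authorization: Bearer $HA_TOKEN" \
    "http://10.0.0.55:8123/api/states"
}

# Usage: fetch_states | count_unavailable
```

Running it before and after each cleanup step shows how far the count has dropped from the 775 reported above.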

### Phase 3: Tune Noisy Integrations (15 minutes)

**iCloud3 - reduce polling frequency:**
1. Open the iCloud3 config (via HA integrations or config file)
2. Increase `inzone_interval` from default to 30-60 minutes
3. Increase general polling interval
4. This alone cuts ~1,000 state changes per 4 hours

**Frigate - fix driveway zone flapping:**
1. In your Frigate config, for the driveway camera zones:
   - Increase `min_area` on car detection (currently triggering on shadows/reflections)
   - Add `inactivity_timeout: 30` to prevent rapid on/off toggling
   - Consider disabling `cat` and `dog` detection on the driveway if not needed
2. The `driveway_car_occupancy` and `driveway_pavement_car_occupancy` sensors are toggling ~97x/hour each

**Time sensors - remove duplicates:**
1. If `sensor.time`, `sensor.time_2`, and `sensor.date_time` are defined in `configuration.yaml` under `sensor:` → `platform: time_date`, remove the duplicates
2. Keep only one if needed for automations, or rely on HA's built-in `now()` in templates instead

**UpdatePowerUsageFast automation:**
1. Settings → Automations → UpdatePowerUsageFast
2. Change the time pattern trigger from every 1 minute to every 5 minutes
3. Cuts 48 automation runs per hour (192 per 4 hours)
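
The saving from lengthening a time-pattern trigger is simple arithmetic, shown here as a small sketch. The function names are hypothetical, and integer division assumes the interval divides 60 evenly.

```bash
# Runs per hour for a trigger that fires every N minutes.
runs_per_hour() {
  echo $(( 60 / $1 ))
}

# Runs saved per hour when the interval changes from $1 to $2 minutes.
runs_saved_per_hour() {
  echo $(( $(runs_per_hour "$1") - $(runs_per_hour "$2") ))
}

runs_saved_per_hour 1 5   # 60 - 12 = 48
```

Over a four-hour window that is 4 × 48 = 192 fewer runs.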

### Phase 4: Increase VM Memory (5 minutes)

On the Proxmox host:
1. Shut down the HAOS VM (or hot-plug if supported)
2. Increase RAM from current allocation to **8 GB**
3. HAL-HOST has 134 GB total with 78% used, so there is headroom
4. Start the VM
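
On the command line, the steps above map onto Proxmox's `qm` tool (`qm shutdown`, `qm set --memory`, `qm start`). The wrapper below is a sketch under stated assumptions: the function name `resize_vm_memory` and the `DRY_RUN` switch are made up for illustration, and the VM ID is a placeholder because the HAOS VM's ID is not given here. By default it only prints the commands.

```bash
# Print (dry run) or execute the Proxmox commands that resize a VM's memory.
# Set DRY_RUN=0 to actually run them; the default only prints.
resize_vm_memory() {
  vmid="$1"
  mem_mib="$2"
  for cmd in "qm shutdown $vmid" "qm set $vmid --memory $mem_mib" "qm start $vmid"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"
    else
      $cmd || return 1
    fi
  done
}

# 8 GB = 8 * 1024 MiB. VM ID 100 is a placeholder; look yours up with `qm list`.
# resize_vm_memory 100 $(( 8 * 1024 ))
```

Reviewing the dry-run output first makes it easy to confirm the VM ID before anything is shut down.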

### Phase 5: Migrate AdGuard to LXC (30 minutes)

**This is the most important architectural change.** Network DNS must not depend on HA stability.

See `setup-adguard-lxc.sh`; run it on the Proxmox host.
### 1. Apply recorder exclude (HA side)
Merge `recorder_exclude.yaml` into `/config/configuration.yaml`, then restart HA.

### 2. Deploy AdGuard LXC
```bash
# Copy to Proxmox host
scp setup-adguard-lxc.sh root@10.0.0.x:/root/

# Run it (default CT ID 120, or pass custom)
ssh root@10.0.0.x
chmod +x /root/setup-adguard-lxc.sh
/root/setup-adguard-lxc.sh 120
chmod +x setup-adguard-lxc.sh
./setup-adguard-lxc.sh
```
- The script attempts SSH config migration from HAOS (no GUI export exists)
- If SSH fails, follow the manual migration steps printed at the end
- After setup: update OPNsense DHCP DNS from 10.0.0.55 → 10.0.0.224

**Post-setup migration:**
1. Access new AdGuard at `http://10.0.0.53:80`
2. Complete the setup wizard
3. Export config from HA's AdGuard add-on web UI and import to new instance
4. Migrate filter lists, client settings, parental controls, DNS rewrites
5. Test: `nslookup google.com 10.0.0.53`
6. Update OPNsense DHCP: change DNS from `10.0.0.55` to `10.0.0.53`
7. Wait 24 hours, confirm stability
8. Stop HA AdGuard add-on
9. Optionally re-add HA AdGuard integration pointing to `10.0.0.53` for dashboard stats

**NPM reverse proxy (optional):**
- Add proxy host in NPM (10.0.0.54):
  - Domain: `adguard.hideawaygaming.com.au`
  - Forward: `http://10.0.0.53:80`
  - SSL via Let's Encrypt
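
The `nslookup` test in the post-setup steps is worth running as a gate before DHCP is repointed. A minimal sketch, assuming hypothetical function names `check_dns` and `dns_cutover_hint`; pass whichever address the AdGuard container actually has.

```bash
# Return 0 if SERVER answers a lookup for NAME (default google.com).
check_dns() {
  server="$1"
  name="${2:-google.com}"
  nslookup "$name" "$server" >/dev/null 2>&1
}

# Only suggest the DHCP change once the new resolver answers.
dns_cutover_hint() {
  if check_dns "$1"; then
    echo "DNS at $1 answers; safe to update OPNsense DHCP"
  else
    echo "DNS at $1 did not answer; keep using the HA add-on"
  fi
}

# Usage: dns_cutover_hint 10.0.0.224
```

A single successful lookup only shows the resolver is reachable, so the 24-hour stability wait still applies.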

### Phase 6: Migrate Guacamole to LXC (30 minutes)

See `setup-guacamole-lxc.sh`; run it on the Proxmox host.

### 3. Deploy NPM LXC
```bash
scp setup-guacamole-lxc.sh root@10.0.0.x:/root/
ssh root@10.0.0.x
chmod +x /root/setup-guacamole-lxc.sh
/root/setup-guacamole-lxc.sh 121
chmod +x setup-npm-lxc.sh
./setup-npm-lxc.sh
```
- Migrates SQLite DB, Let's Encrypt certs, and custom configs from the HA add-on
- After setup: update OPNsense port forwards (80/443) from 10.0.0.55 → 10.0.0.226

**Post-setup migration:**
1. Access new Guacamole at `http://10.0.0.52:8080/guacamole/`
2. Log in with `guacadmin` / `guacadmin`, then **change the password immediately**
3. Re-create your RDP connections (hostname, port 3389, credentials)
4. Re-create any user accounts
5. Set up NPM reverse proxy with WebSocket support
6. Test all RDP connections
7. Stop HA Guacamole add-on
### 4. Deploy Guacamole LXC
```bash
chmod +x setup-guacamole-lxc.sh
./setup-guacamole-lxc.sh
```
- Re-create RDP connections in the web UI
- Set up NPM proxy with WebSocket support

**NPM reverse proxy:**
- Domain: `guac.hideawaygaming.com.au`
- Forward: `http://10.0.0.52:8080`
- Custom location: `/guacamole/`
- **Enable WebSocket support** (critical for RDP streaming)
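
Before pointing NPM at the container, it can help to confirm that the Guacamole login page responds at all. The sketch below uses `curl` to read the HTTP status code; the function names `http_status` and `guacamole_up` and the five-second timeout are assumptions, and it does not exercise the WebSocket tunnel, so a real RDP session is still the final test.

```bash
# Print the HTTP status code returned for a URL (000 if the connection fails).
http_status() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "$1"
}

# Return 0 when the page answers with 200 or a redirect.
guacamole_up() {
  case "$(http_status "$1")" in
    200|301|302) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: guacamole_up http://10.0.0.225:8080/guacamole/ && echo "Guacamole is reachable"
```

The same check can be repeated against the proxied domain once the NPM host is in place.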
### 5. Clean up HA
- Stop AdGuard, NPM, and Guacamole add-ons in HA
- Clean up stale browser_mod, Plex, and Pioneer VSX-832 entities
- Increase HAOS VM memory to 8 GB
- Optionally re-add AdGuard as an HA integration pointing to 10.0.0.224

---

## Network Architecture After Migration
## Architecture After Migration

```
Internet
@@ -165,49 +63,27 @@ chmod +x /root/setup-guacamole-lxc.sh
                  │ Gateway │
                  └────┬────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
   ┌────┴────┐    ┌────┴────┐    ┌────┴────┐
   │ AdGuard │    │   NPM   │    │  HAOS   │
   │  (LXC)  │    │  (LXC)  │    │  (VM)   │
   │  .0.53  │    │  .0.54  │    │  .0.55  │
   │ DNS 53  │    │ HTTP/S  │    │ HA only │
   └─────────┘    └─────────┘    └────┬────┘
                                      │
         ┌──────────────┬─────────────┘
         │              │
    ┌────┴────┐    ┌────┴────┐
    │  Guac   │    │Tailscale│
    │  (LXC)  │    │(remains │
    │  .0.52  │    │ in HA)  │
    │ RDP GW  │    └─────────┘
    └─────────┘
   ┌───────────┬───────┴───────┬───────────┐
   │           │               │           │
┌──┴──┐    ┌───┴───┐       ┌───┴───┐   ┌───┴───┐
│ AGH │    │  NPM  │       │ HAOS  │   │ Guac  │
│ LXC │    │  LXC  │       │  VM   │   │  LXC  │
│.224 │    │ .226  │       │  .55  │   │ .225  │
│DNS  │    │ HTTP/S│       │HA only│   │  RDP  │
└─────┘    └───────┘       └───┬───┘   └───────┘
                               │
                          ┌────┴────┐
                          │Tailscale│
                          │(in HA)  │
                          └─────────┘
```

Tailscale stays in HA since it's lightweight and tightly integrated with HA's remote access. AdGuard and Guacamole are now independent, so HA can restart without taking down DNS or RDP access.

---

## Expected Results

| Metric | Before | After |
|--------|--------|-------|
| HA Memory | 87% (4.45 GB) | ~50-60% (with 8 GB allocated) |
| Entities | 2,330 (775 unavailable) | ~1,800 (fewer stale) |
| State changes/hr | ~1,007 | ~300-400 |
| Recorder writes/hr | ~1,007 | ~200-300 (excludes applied) |
| DNS failure on HA crash | Yes | No (independent LXC) |
| RDP failure on HA crash | Yes | No (independent LXC) |
| Guacamole CPU in HA | 25% | 0% (moved out) |
| Guacamole RAM in HA | 9% | 0% (moved out) |

---

## Files in This Repository
## Files

| File | Purpose |
|------|---------|
| `recorder_exclude.yaml` | Recorder exclude config - merge into `configuration.yaml` |
| `setup-adguard-lxc.sh` | Proxmox script to create AdGuard Home LXC |
| `setup-guacamole-lxc.sh` | Proxmox script to create Guacamole LXC |
| `setup-adguard-lxc.sh` | CT 120 - AdGuard Home with SSH config migration |
| `setup-guacamole-lxc.sh` | CT 121 - Guacamole via Docker Compose |
| `setup-npm-lxc.sh` | CT 122 - NPM with DB/cert migration from HA add-on |
| `README.md` | This file |