Home Assistant Performance Fix & Infrastructure Migration

Problem Summary

Home Assistant at ha.hideawaygaming.com.au (HAOS 2026.5.3) periodically becomes unresponsive. Because critical infrastructure services (AdGuard DNS, Tailscale VPN, Guacamole RDP) all run as HA add-ons inside the same VM, any HA freeze causes house-wide network and access failures.

Root Causes Identified

| Issue | Impact |
|---|---|
| Memory at 87% (4.45 GB) | VM swaps under load → unresponsive |
| 2,330 entities, 775 unavailable (33%) | Wasted memory and CPU tracking stale entities |
| ~1,007 state changes/hour (16.8/min) | Recorder DB I/O bottleneck |
| browser_mod: 228 entities (200 stale) | Biggest source of entity bloat |
| iCloud3: 1,000+ state changes/4hr | Aggressive polling floods state machine |
| Frigate occupancy flapping ~97x/hr | Detection zones too sensitive |
| 3 time sensors × 60 changes/hr = 180/hr (720 per 4 hr) | Pointless recorder writes |
| Guacamole using 25% CPU / 9% RAM | Heavy add-on consuming HA resources |
| AdGuard (network DNS) inside HA | Single point of failure |
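
The derived figures in the table can be re-checked with a few lines of shell arithmetic. This is only a sanity check on the numbers quoted above, typed in by hand; nothing here queries Home Assistant.

```shell
# Sanity-check the headline figures from the Root Causes table.
total_entities=2330
unavailable=775
changes_per_hour=1007

# Share of entities that are unavailable, rounded to a whole percentage.
unavailable_pct=$(awk -v u="$unavailable" -v t="$total_entities" \
  'BEGIN { printf "%.0f", u / t * 100 }')

# Average state changes per minute, one decimal place.
changes_per_min=$(awk -v c="$changes_per_hour" \
  'BEGIN { printf "%.1f", c / 60 }')

# Three time sensors that each update once a minute.
time_sensor_changes_per_hour=$(( 3 * 60 ))

echo "unavailable: ${unavailable_pct}%"
echo "state changes/min: ${changes_per_min}"
echo "time sensor changes/hr: ${time_sensor_changes_per_hour}"
```

Three sensors updating once a minute come to 180 changes per hour, which is 720 over a four-hour window.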

Fix Plan (Priority Order)

Phase 1: Immediate — Recorder Exclude (10 minutes)

Apply recorder_exclude.yaml to stop recording high-churn, low-value entities.

Steps:

  1. SSH into HAOS or use the File Editor add-on
  2. Open /config/configuration.yaml
  3. If you already have a recorder: section, merge the excludes from recorder_exclude.yaml into it
  4. If you don't have one, copy the entire contents of recorder_exclude.yaml into configuration.yaml
  5. Restart HA: Settings → System → Restart

Expected impact: recorder writes drop from ~1,007 to roughly 200-300 per hour (see Expected Results), with a corresponding reduction in disk I/O and memory usage.
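
For orientation, a recorder exclude block generally has the shape below. This is a sketch, not the contents of recorder_exclude.yaml: the entity IDs are guesses based on names mentioned in this README (the sensor. and binary_sensor. prefixes are assumptions, and the browser_mod glob is hypothetical). The script writes to a scratch file so nothing under /config is touched.

```shell
# Write an illustrative recorder exclude block to a scratch file.
# Entity IDs are assumptions; confirm them in Developer Tools → States.
RECORDER_EXAMPLE="${TMPDIR:-/tmp}/recorder_exclude.example.yaml"

cat > "$RECORDER_EXAMPLE" <<'EOF'
recorder:
  exclude:
    entities:
      # Time sensors that change every minute
      - sensor.time
      - sensor.time_2
      - sensor.date_time
      # Flapping Frigate occupancy sensors (domain prefix assumed)
      - binary_sensor.driveway_car_occupancy
      - binary_sensor.driveway_pavement_car_occupancy
    entity_globs:
      # Hypothetical pattern; adjust to how your browser_mod entities are named
      - sensor.browser_mod_*
EOF

echo "Wrote $RECORDER_EXAMPLE"
```

After merging the real excludes into configuration.yaml, it is worth running the configuration check (Developer Tools → YAML → Check configuration, or `ha core check` from the HAOS CLI) before restarting.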

Phase 2: Immediate — Entity Cleanup (20 minutes)

browser_mod stale sessions:

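  1. Go to Developer Tools → Actions (labelled Services in older HA releases)
  2. For each stale browser_mod browser, call browser_mod's deregister action to unregister it (browser_mod.deregister_browser in browser_mod 2.x; confirm the exact name in the action picker)
  3. Alternatively: Settings → Devices & Services → browser_mod → Remove stale device entries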
  4. Target: reduce from 228 to ~20-30 active entities

Plex media player cleanup:

  1. Settings → Devices & Services → Plex
  2. Click through each device — delete any showing as "Unavailable"
  3. Target: reduce from 59 to ~5-10 active clients

Pioneer VSX-832 duplicates:

  1. Settings → Devices & Services → Onkyo
  2. You should see multiple "Pioneer VSX-832" devices
  3. Keep only the working one (likely the one showing state "off" or "on")
  4. Delete the rest (showing "unavailable")
  5. Target: reduce from 7 to 1-2 entities

F1 Sensor (off-season):

  1. Settings → Devices & Services → F1 Sensor
  2. Consider disabling the integration during off-season
  3. Or leave it — the recorder exclude will prevent it writing history
  4. 76 entities, 42 currently unavailable
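
To see where the unavailable entities are concentrated before and after the cleanup, the state list from Home Assistant's REST API can be grouped by domain. The following is a minimal sketch that assumes `curl` and `jq` are available on the machine you run it from, and that `HA_URL` and `HA_TOKEN` (a long-lived access token) are set by you; both variable names and both function names are illustrative. It groups by entity domain (media_player, sensor, ...), not by integration, because the states endpoint does not report the owning integration.

```shell
# Fetch all entity states from Home Assistant.
# HA_URL and HA_TOKEN are assumptions: set them in your own environment.
fetch_states() {
  curl -fsS -H "Authorization: Bearer ${HA_TOKEN}" "${HA_URL}/api/states"
}

# Read /api/states JSON on stdin and print "<count><TAB><domain>" for
# unavailable entities, largest count first.
count_unavailable_by_domain() {
  jq -r '
    [ .[] | select(.state == "unavailable") | .entity_id | split(".")[0] ]
    | group_by(.)
    | map({domain: .[0], count: length})
    | sort_by(-.count)
    | .[] | "\(.count)\t\(.domain)"
  '
}
```

Usage, once the two variables are exported: `fetch_states | count_unavailable_by_domain`. Running it again after the cleanup gives a quick before/after comparison against the targets above.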

Phase 3: Tune Noisy Integrations (15 minutes)

iCloud3 — reduce polling frequency:

  1. Open the iCloud3 configuration (via HA integrations or the config file)
  2. Increase inzone_interval from default to 30-60 minutes
  3. Increase general polling interval
  4. This alone cuts ~1,000 state changes per 4 hours

Frigate — fix driveway zone flapping:

  1. In your Frigate config, for the driveway camera zones:
    • Increase min_area in the car object filter (car detection is currently triggering on shadows/reflections)
    • Add inactivity_timeout: 30 to prevent rapid on/off toggling (verify the option name against the Frigate reference config for your version; the zone-level inertia setting serves a similar purpose)
    • Consider disabling cat and dog detection on the driveway if not needed
  2. The driveway_car_occupancy and driveway_pavement_car_occupancy sensors are each toggling ~97x/hour
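
As a rough illustration of where these settings live in a Frigate config, the script below writes an example fragment to a scratch file. Everything specific in it is an assumption: the camera name (driveway) and zone name (driveway_pavement) are inferred from the entity names above, and the numeric values are placeholders to tune. It uses the zone `inertia` setting for debouncing rather than inactivity_timeout; confirm either option against the reference config for your Frigate version before applying it.

```shell
# Write an illustrative Frigate fragment to a scratch file (not your live config).
FRIGATE_EXAMPLE="${TMPDIR:-/tmp}/frigate_driveway.example.yaml"

cat > "$FRIGATE_EXAMPLE" <<'EOF'
cameras:
  driveway:                  # camera name assumed from the entity IDs
    objects:
      track:
        - person
        - car                # cat and dog deliberately left out
      filters:
        car:
          min_area: 5000     # placeholder; raise until shadows stop matching
    zones:
      driveway_pavement:     # zone name assumed from the entity IDs
        # coordinates: keep your existing zone polygon here
        inertia: 5           # placeholder; frames an object must be seen in the zone before it counts
EOF

echo "Wrote $FRIGATE_EXAMPLE"
```

Raising `min_area` filters out small false detections, while a higher `inertia` makes the zone slower to register an object, which is the trade-off to watch if real arrivals start being missed.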

Time sensors — remove duplicates:

  1. If sensor.time, sensor.time_2, and sensor.date_time are defined in configuration.yaml under sensor: with platform: time_date, remove the duplicates
  2. Keep only one if needed for automations, or rely on HA's built-in now() in templates instead

UpdatePowerUsageFast automation:

  1. Settings → Automations → UpdatePowerUsageFast
  2. Change the time pattern trigger from every 1 minute to every 5 minutes
  3. Cuts 48 automation runs per hour (60 → 12), or 192 over 4 hours
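
Moving from a 1-minute to a 5-minute pattern takes the automation from 60 to 12 runs per hour. The sketch below works through that arithmetic and writes an example time pattern trigger to a scratch file; the file name is illustrative, and the trigger is shown in the long-standing `platform:` form (recent HA releases also accept the newer `triggers:` syntax in the automation editor).

```shell
# Runs per hour at each interval, and the difference.
runs_before=$(( 60 / 1 ))
runs_after=$(( 60 / 5 ))
runs_saved=$(( runs_before - runs_after ))
echo "runs/hr: ${runs_before} -> ${runs_after} (saves ${runs_saved})"

# Example trigger that fires at minutes 0, 5, 10, ... of every hour.
TRIGGER_EXAMPLE="${TMPDIR:-/tmp}/update_power_usage_trigger.example.yaml"

cat > "$TRIGGER_EXAMPLE" <<'EOF'
trigger:
  - platform: time_pattern
    minutes: "/5"
EOF

echo "Wrote $TRIGGER_EXAMPLE"
```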

Phase 4: Increase VM Memory (5 minutes)

On the Proxmox host:

  1. Shut down the HAOS VM (or hot-plug if supported)
  2. Increase RAM from current allocation to 8 GB
  3. HAL-HOST has 134 GB total with 78% used — there's headroom
  4. Start the VM
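
From the Proxmox shell, the same steps can be done with `qm`. This is a sketch under stated assumptions: the VM ID 100 is hypothetical (look up the real one with `qm list`), and the script defaults to a dry run that only prints the commands. Set `DRY_RUN=0` to actually execute them.

```shell
# Resize the HAOS VM's memory on the Proxmox host.
# By default the commands are only printed; export DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

resize_vm_memory() {
  vmid="$1"       # Proxmox VM ID of the HAOS guest (hypothetical here)
  memory_mb="$2"  # New allocation in MiB, e.g. 8192 for 8 GB

  run qm shutdown "$vmid"
  run qm set "$vmid" --memory "$memory_mb"
  run qm start "$vmid"
}

# 100 is a placeholder VM ID; replace it with the ID shown by `qm list`.
resize_vm_memory 100 8192
```

A clean shutdown before the change avoids relying on memory hot-plug, which needs extra guest and VM configuration.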

Phase 5: Migrate AdGuard to LXC (30 minutes)

This is the most important architectural change. Network DNS must not depend on HA stability.

See setup-adguard-lxc.sh — run on the Proxmox host.

# Copy to Proxmox host
scp setup-adguard-lxc.sh root@10.0.0.x:/root/

# Run it (default CT ID 120, or pass custom)
ssh root@10.0.0.x
chmod +x /root/setup-adguard-lxc.sh
/root/setup-adguard-lxc.sh 120

Post-setup migration:

  1. Access new AdGuard at http://10.0.0.53:80
  2. Complete the setup wizard
  3. Export config from HA's AdGuard add-on web UI and import to new instance
  4. Migrate filter lists, client settings, parental controls, DNS rewrites
  5. Test: nslookup google.com 10.0.0.53
  6. Update OPNsense DHCP: change DNS from 10.0.0.55 to 10.0.0.53
  7. Wait 24 hours, confirm stability
  8. Stop HA AdGuard add-on
  9. Optionally re-add HA AdGuard integration pointing to 10.0.0.53 for dashboard stats
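
The DNS test in step 5 can be extended to several names and repeated during the 24-hour soak. A minimal sketch, assuming `nslookup` is installed on the client and exits non-zero when a lookup fails (true of common implementations, but worth confirming on yours); the function name `check_dns` is illustrative.

```shell
# Resolve each name against one DNS server and report per-name results.
# Returns 0 if every lookup succeeded, 1 otherwise.
check_dns() {
  server="$1"
  shift
  failed=0
  for name in "$@"; do
    if nslookup "$name" "$server" >/dev/null 2>&1; then
      echo "ok   $name via $server"
    else
      echo "FAIL $name via $server"
      failed=1
    fi
  done
  return "$failed"
}
```

Usage: `check_dns 10.0.0.53 google.com ha.hideawaygaming.com.au`. Running the same names against 10.0.0.55 before switching DHCP over gives a baseline to compare with, including any DNS rewrites you migrated.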

NPM reverse proxy (optional):

  • Add proxy host in NPM (10.0.0.54):
    • Domain: adguard.hideawaygaming.com.au
    • Forward: http://10.0.0.53:80
    • SSL via Let's Encrypt

Phase 6: Migrate Guacamole to LXC (30 minutes)

See setup-guacamole-lxc.sh — run on the Proxmox host.

scp setup-guacamole-lxc.sh root@10.0.0.x:/root/
ssh root@10.0.0.x
chmod +x /root/setup-guacamole-lxc.sh
/root/setup-guacamole-lxc.sh 121

Post-setup migration:

  1. Access new Guacamole at http://10.0.0.52:8080/guacamole/
  2. Log in with guacadmin / guacadmin, then change the password immediately
  3. Re-create your RDP connections (hostname, port 3389, credentials)
  4. Re-create any user accounts
  5. Set up NPM reverse proxy with WebSocket support
  6. Test all RDP connections
  7. Stop HA Guacamole add-on
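
Before stopping the add-on, a quick HTTP check confirms the new instance is answering. This is a sketch that assumes `curl` is available; `check_http` is an illustrative helper name. It only verifies that the web UI responds, not that the WebSocket tunnel works, so an actual RDP session is still needed for step 6.

```shell
# Report whether a URL answers with an HTTP 2xx or 3xx status.
check_http() {
  url="$1"
  # On connection failure curl exits non-zero; fall back to status 000.
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || status="000"
  case "$status" in
    2??|3??)
      echo "ok   $url ($status)"
      return 0
      ;;
    *)
      echo "FAIL $url ($status)"
      return 1
      ;;
  esac
}
```

Usage: `check_http http://10.0.0.52:8080/guacamole/`, and the same call against the proxied domain once the NPM host is in place.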

NPM reverse proxy:

  • Domain: guac.hideawaygaming.com.au
  • Forward: http://10.0.0.52:8080
  • Custom location: /guacamole/
  • Enable WebSocket support (critical for RDP streaming)

Network Architecture After Migration

                    Internet
                       │
                  ┌────┴─────┐
                  │ OPNsense │  10.0.0.254
                  │ Gateway  │
                  └────┬─────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
   ┌────┴────┐    ┌────┴────┐    ┌────┴────┐
   │ AdGuard │    │   NPM   │    │  HAOS   │
   │  (LXC)  │    │  (LXC)  │    │  (VM)   │
   │ .0.53   │    │ .0.54   │    │ .0.55   │
   │ DNS 53  │    │ HTTP/S  │    │ HA only │
   └─────────┘    └─────────┘    └────┬────┘
                                      │
        ┌──────────────┬──────────────┘
        │              │
   ┌────┴────┐    ┌────┴────┐
   │  Guac   │    │Tailscale│
   │  (LXC)  │    │(remains │
   │ .0.52   │    │ in HA)  │
   │ RDP GW  │    └─────────┘
   └─────────┘

Tailscale stays in HA since it's lightweight and tightly integrated with HA's remote access. AdGuard and Guacamole are now independent — HA can restart without taking down DNS or RDP access.


Expected Results

| Metric | Before | After |
|---|---|---|
| HA Memory | 87% (4.45 GB) | ~50-60% (with 8 GB allocated) |
| Entities | 2,330 (775 unavailable) | ~1,800 (fewer stale) |
| State changes/hr | ~1,007 | ~300-400 |
| Recorder writes/hr | ~1,007 | ~200-300 (excludes applied) |
| DNS failure on HA crash | Yes | No (independent LXC) |
| RDP failure on HA crash | Yes | No (independent LXC) |
| Guacamole CPU in HA | 25% | 0% (moved out) |
| Guacamole RAM in HA | 9% | 0% (moved out) |

Files in This Repository

| File | Purpose |
|---|---|
| recorder_exclude.yaml | Recorder exclude config; merge into configuration.yaml |
| setup-adguard-lxc.sh | Proxmox script to create AdGuard Home LXC |
| setup-guacamole-lxc.sh | Proxmox script to create Guacamole LXC |
| README.md | This file |