Proxmox Cluster Node Offline (Red): How to Fix It
A Proxmox node showing red usually isn't down — the cluster has lost communication with it. The cause is almost always corosync (the cluster messaging layer) or the network between nodes.
Step 1 — Check quorum and the cluster services
pvecm status
systemctl status corosync pve-cluster
pvecm status shows whether the cluster has quorum and which nodes are visible. If corosync or pve-cluster (pmxcfs) isn't running, that node drops out of the cluster.
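For scripted checks, the quorum state can be read from the pvecm status output. The following is a minimal sketch, assuming the output contains a "Quorate:" line with Yes or No, as it does on recent Proxmox VE releases. The function name quorate_state is hypothetical, and it reads from stdin so it can also be tried on saved output.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Quorate:" line from
# `pvecm status` output supplied on stdin (for example "Yes" or "No").
# Prints "unknown" if no such line is found.
quorate_state() {
    awk -F': *' '
        /^Quorate:/ { print $2; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}

# Usage on a cluster node (not run here):
#   pvecm status | quorate_state
```

The helper matches only lines that start with "Quorate:", so the "Flags: Quorate" line in the votequorum section is ignored.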
Step 2 — The usual cause: corosync network
Corosync needs low-latency reachability between nodes on the cluster ring. Check:
- Ring connectivity: can the nodes reach each other on the corosync link(s)? A flapping or saturated link drops nodes.
- Time sync: large clock skew (no NTP/chrony running) can destabilize the cluster, so confirm time is in sync across all nodes.
- MTU / switch changes: a recent network change on the cluster VLAN is a common trigger.
journalctl -u corosync -u pve-cluster -n 100 --no-pager
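The checks above can be sketched as a few commands. This is an illustration under assumptions, not a definitive procedure: it assumes IPv4 on the cluster link, the Linux iputils ping (for the -M do "don't fragment" option), and chrony for time sync. The function names and the peer address 10.0.0.2 are placeholders.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Hypothetical wrapper: ping a peer with "don't fragment" set
# (Linux iputils ping). Usage: check_mtu PEER_IP [MTU]
check_mtu() {
    peer=$1
    mtu=${2:-1500}
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$peer"
}

# Usage on a node (not run here; 10.0.0.2 is a placeholder address):
#   corosync-cfgtool -s        # link status as corosync sees it
#   check_mtu 10.0.0.2 1500    # fails if the path MTU is smaller
#   chronyc tracking           # current clock offset, if chrony is in use
```

If check_mtu fails at the configured MTU but succeeds at a smaller value, a switch or interface on the path is likely using a lower MTU than the nodes expect.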
Step 3 — Restart the cluster stack on the affected node
systemctl restart corosync
systemctl restart pve-cluster
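This normally re-joins the node without affecting running VMs. If HA is active on the node, keep the restart brief: a node that stays without quorum for too long is fenced by its watchdog and reboots. If /etc/pve was read-only (pmxcfs lost quorum), it becomes writable again once quorum is restored.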
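To confirm that /etc/pve is writable again after the restart, one option is to create and remove a scratch file. This is a minimal sketch; the function name dir_is_writable and the probe file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a scratch file can be created in the
# given directory (default /etc/pve). The file is removed again at once.
dir_is_writable() {
    dir=${1:-/etc/pve}
    probe="$dir/.write-probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Usage on a node (not run here):
#   dir_is_writable && echo "writable" || echo "read-only"
```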
Step 4 — Restore quorum if a node is gone
If you've permanently lost nodes and the survivors can't reach quorum, temporarily lower the expected votes so the cluster is usable, then fix the underlying network:
pvecm expected 1
Use this carefully — it's a recovery measure, not a fix. Repair the corosync network so real quorum returns.
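The arithmetic behind this step is worth spelling out. Quorum requires a strict majority of the expected votes. The sketch below assumes the default of one vote per node and no QDevice; the function names are illustrative only.

```shell
#!/bin/sh
# Votes needed for quorum with N expected votes: a strict majority.
votes_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum PRESENT EXPECTED -> prints "yes" or "no"
has_quorum() {
    if [ "$1" -ge "$(votes_needed "$2")" ]; then
        echo yes
    else
        echo no
    fi
}

votes_needed 3    # 3-node cluster needs 2 votes
has_quorum 1 3    # one survivor of three: no
has_quorum 1 1    # same survivor after "pvecm expected 1": yes
```

This is why a lone survivor of a three-node cluster stays read-only until the expected votes are lowered or the other nodes return.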
How Tech Matrix solves this in ~60 seconds
A red node could be the network, corosync, time sync, or quorum — and they look the same in the GUI. Tech Matrix reads pvecm status, the corosync logs and the node's network together, tells you which it is, and gives the safe restart/recovery steps for your Proxmox version, with your approval.
Frequently asked questions
Why is my Proxmox node red when it is still running?
The node is up but the cluster lost communication with it, usually because of corosync or the cluster network. The VMs keep running; it's the cluster membership that's broken.
How do I check the state of the cluster?
Run 'pvecm status'. It shows quorum state and visible nodes. 'systemctl status corosync pve-cluster' shows whether the cluster services are healthy.
Why is /etc/pve read-only?
pmxcfs makes /etc/pve read-only when the node loses quorum. Restore corosync communication/quorum and it becomes writable again.