Ran into an interesting datacenter operations problem: most of our high-availability stuff uses Raft, which requires 2 out of 3 (or 3 out of 5, etc.) nodes to survive a power failure in order to keep going.

However, power is A/B, and if the feed carrying the larger number of nodes fails, it takes the cluster down.
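To make the arithmetic concrete, here is a rough Python sketch, assuming each node hangs off exactly one of the two feeds and that quorum is the usual Raft majority of n // 2 + 1. The function names (`quorum`, `survives_feed_loss`, `any_safe_split`) are made up for illustration and are not from any Raft library.

```python
def quorum(n: int) -> int:
    """Votes needed for a majority in an n-node cluster."""
    return n // 2 + 1


def survives_feed_loss(nodes_on_a: int, nodes_on_b: int) -> dict:
    """For each feed, report whether the cluster keeps quorum if that feed dies.

    Assumption: every node is powered by exactly one feed (no dual supplies).
    """
    need = quorum(nodes_on_a + nodes_on_b)
    return {
        "A": nodes_on_b >= need,  # feed A lost, only nodes on B remain
        "B": nodes_on_a >= need,  # feed B lost, only nodes on A remain
    }


def any_safe_split(n: int) -> bool:
    """True if some placement of n nodes across two feeds survives either loss."""
    return any(
        all(survives_feed_loss(a, n - a).values())
        for a in range(n + 1)
    )


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, quorum(n), any_safe_split(n))
```

With 3 nodes split 2 + 1, losing the feed with 2 nodes leaves 1 survivor against a quorum of 2. Both sides would need at least n // 2 + 1 nodes to survive either loss, which adds up to more than n, so no placement across just two feeds works.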

How would you solve this?

@aep forgive my ignorance, but I thought most (throughput-oriented) server power supplies were 2N redundant, so you'd put each node on both incoming lines?

(I understand HPC is normally N+1, as the requirements are different when compute, not movement, is the biggest cost)

@coral yeah, we just don't do that, for multiple efficiency-related reasons. The main one is that our hardware is DC, not AC.

Probably need a DC failover. Annoying, since I was hoping to solve most problems with software.

@aep suggests yes, but at the cost of every write you do taking n/2+1 communications to commit… probably not a good tradeoff vs keeping a Raft primary on a huge UPS

@aep oh, DC as in direct current not datacenter oops

@coral oh yeah lol. Sorry for the ambiguity. It gets worse when you talk about DC-DC converters for datacenters

