
Ran into an interesting datacenter operations problem: most of our high-availability stuff uses Raft, which requires 2 out of 3 (or 3 out of 5, etc.) nodes to survive a power failure to continue.

However, power is A/B, and if the power line with the larger number of nodes fails, it takes down the cluster.
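
To make the arithmetic concrete, here is a minimal sketch of the quorum check, assuming a plain majority quorum of n // 2 + 1 and that every node hangs off exactly one feed. The function names and the feed labels are made up for illustration, and the third feed "C" is purely hypothetical:

```python
def quorum(n: int) -> int:
    """Majority quorum size for a cluster of n voting nodes."""
    return n // 2 + 1


def survives_feed_loss(nodes_per_feed: dict[str, int], failed_feed: str) -> bool:
    """True if the nodes on the remaining feeds still form a majority."""
    total = sum(nodes_per_feed.values())
    alive = total - nodes_per_feed[failed_feed]
    return alive >= quorum(total)


def survives_any_single_feed_loss(nodes_per_feed: dict[str, int]) -> bool:
    """True only if the cluster keeps quorum no matter which one feed fails."""
    return all(survives_feed_loss(nodes_per_feed, f) for f in nodes_per_feed)


# Two feeds: losing the feed with the majority of nodes always kills quorum.
two_feeds = {"A": 2, "B": 1}
print(survives_feed_loss(two_feeds, "B"))        # minority feed fails
print(survives_feed_loss(two_feeds, "A"))        # majority feed fails
print(survives_any_single_feed_loss(two_feeds))

# Hypothetical third independent power domain, one node on each.
three_feeds = {"A": 1, "B": 1, "C": 1}
print(survives_any_single_feed_loss(three_feeds))
```

With only two feeds, no way of splitting the nodes passes the last check, because one feed always carries at least half of them. That is why the options come down to a third independent power domain or some form of failover at the power level.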

How would you solve this?

@aep forgive my ignorance, but I thought most (throughput-oriented) server power supplies were 2N redundant, so you’d put each node on both incoming lines?

(I understand HPC is normally N+1, as the requirements are different when compute, not movement, is the biggest cost)

@coral yeah, we just don't do that, for multiple efficiency-related reasons. The main one is that our hardware is DC, not AC.

Probably need a DC failover. Annoying, since I was hoping to solve most problems with software.

@aep epubs.siam.org/doi/10.1137/021 suggests yes, but at the cost of every write you do taking n/2+1 communications to commit… probably not a good tradeoff vs. keeping a Raft primary on a huge UPS

@aep oh, DC as in direct current, not datacenter, oops

@coral oh yeah lol. Sorry for the ambiguity. It gets worse when you talk about DC-DC converters for datacenters
