Anyone got experience with DRBD?
I'm starting to feel like Ceph just has too many bugs once you start pushing it.
@phenlix Ceph issues are obscure, and whatever issue you have just results in I/O hanging.
What is debugging DRBD failures like?
@aep In my experience DRBD was very reliable, except for "split brain" condition sometimes, that required manual intervention. Nothing like I/O hanging.
@phenlix How did you find out about that situation, how clear was it that this was the issue, and how easy was it to resolve?
@aep We had monitoring with Nagios, and Pacemaker can detect it and be instructed to shut down all instances of Postgres to prevent writes on both sides. In practice it is well documented, it did not happen often, and recovery was safe, so we went into production with it (it was an ISP's billing system).
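For anyone wondering what "monitoring with Nagios" could look like here, below is a hypothetical check script, not the one used in that setup. It assumes the DRBD 8.x `/proc/drbd` format (DRBD 9 reports status through `drbdadm status` instead), and it assumes the split-brain policy disconnects the nodes, which leaves the resource in the `StandAlone` connection state. It follows the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import re
import sys

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches per-resource lines in the DRBD 8.x /proc/drbd format, e.g.
#  " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----"
# Assumption: DRBD 9 no longer prints these lines in /proc/drbd.
RESOURCE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)", re.MULTILINE)


def check_drbd(proc_text: str) -> tuple[int, str]:
    """Return (nagios_status, message) for the given /proc/drbd contents."""
    # Map device minor number -> connection state (cs:)
    states = dict(RESOURCE_LINE.findall(proc_text))
    if not states:
        return UNKNOWN, "UNKNOWN - no DRBD resources found"

    # StandAlone usually means the nodes refused to reconnect,
    # which is what a detected split brain with a "disconnect" policy leaves behind.
    standalone = [minor for minor, cs in states.items() if cs == "StandAlone"]
    if standalone:
        return CRITICAL, (
            "CRITICAL - StandAlone (possible split brain) on minor(s) "
            + ", ".join(standalone)
        )

    # Anything else that is not Connected (WFConnection, SyncSource, ...) is degraded.
    degraded = [f"{minor}={cs}" for minor, cs in states.items() if cs != "Connected"]
    if degraded:
        return WARNING, "WARNING - not connected: " + ", ".join(degraded)

    return OK, f"OK - {len(states)} resource(s) Connected"


def main() -> int:
    try:
        with open("/proc/drbd") as f:
            text = f.read()
    except OSError as exc:
        print(f"UNKNOWN - cannot read /proc/drbd: {exc}")
        return UNKNOWN
    status, message = check_drbd(text)
    print(message)
    return status


if __name__ == "__main__":
    sys.exit(main())
```

The idea is to alert loudly on `StandAlone` and only warn on transient states like a resync, leaving the actual fencing (stopping Postgres on one side) to Pacemaker as described above.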
@aep I used it to back Postgres data in a Pacemaker HA cluster and it was solid.