[DRBD-user] fence-peer helper broken, returned 0

From: Mikkel Raakilde Jakobsen <MRJ_at_nospam>
Date: Thu Mar 11 2010 - 13:34:27 GMT
To: <drbd-user@lists.linbit.com>

Hi,

We have the following setup:

Two physical servers running DRBD 8.3.2 and Heartbeat 2.1.3 on CentOS 5.4,
everything installed from the official RPM packages in the CentOS
repositories. The servers have two bonded direct links between them for
DRBD replication, and two other bonded links for all remaining traffic
(management, iSCSI, etc.).
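
For reference, each bond is a standard CentOS 5 bonding setup along these
lines (interface names and the bonding mode here are only illustrative,
not necessarily relevant to the problem):

# /etc/modprobe.conf
alias bond1 bonding
options bond1 miimon=100 mode=1

# /etc/sysconfig/network-scripts/ifcfg-eth2 (one of the two DRBD-link slaves)
DEVICE=eth2
MASTER=bond1
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none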

A manual hb_takeover from host to host works without any issues.
However, when we power off the primary host, the surviving host tries to
take over but never succeeds.
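
By hb_takeover we mean the helper script shipped with Heartbeat; this is
what we run, with the paths as installed by the CentOS packages:

/usr/lib/heartbeat/hb_takeover all    # on the node that should take over
/usr/lib/heartbeat/hb_standby all     # counterpart, to hand resources back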

We see the following lines in the log, repeated several times until
Heartbeat gives up and goes back to standby:

block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 0 (0x0)
block drbd0: fence-peer helper broken, returned 0
block drbd0: helper command: /sbin/drbdadm fence-peer minor-0

After the "failed" node is powered on again, the two nodes end up in a
split-brain condition.
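
When this happens we recover by hand, throwing away the changes on the
node that was powered off (roughly like this, assuming resource r0 and
that the surviving node holds the good data):

# on the node whose changes we discard (the one that was powered off):
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# on the surviving node, if it has dropped to StandAlone:
drbdadm connect r0
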
We have also tried building the latest DRBD and Heartbeat from source
and using those, but the error is the same.

Here is our drbd.conf:
resource r0 {
        protocol C;

        startup { wfc-timeout 0; }

        disk {
                on-io-error detach;
                no-disk-barrier;
                no-disk-flushes;
                no-md-flushes;
                fencing resource-only;
        }

        net {
                max-buffers 20000;
                max-epoch-size 20000;
                sndbuf-size 1M;
        }

        syncer {
                rate 2000M;
                al-extents 1201;
        }

        on server1 {
                device /dev/drbd0;
                disk /dev/dm-1;
                address 172.16.0.127:7788;
                meta-disk internal;
        }

        on server2 {
                device /dev/drbd0;
                disk /dev/dm-1;
                address 172.16.0.227:7788;
                meta-disk internal;
        }
}

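Looking at the config now, we have "fencing resource-only" set but no
handlers section at all. As far as we can tell from the DRBD source, the
kernel expects the fence-peer helper to exit with a code between 3 and 7,
and with no handler configured drbdadm apparently exits 0, which would
match the "fence-peer helper broken, returned 0" message. A sketch of the
handlers section the DRBD 8.3 user guide suggests for use with Heartbeat's
dopd (path as in our packages; on x86_64 it may be /usr/lib64/heartbeat):

        handlers {
                # asks dopd on the peer to outdate the peer's disk and
                # reports the result back to the kernel via its exit code
                fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        }
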
Here is our ha.cf:
use_logd yes
keepalive 1
deadtime 10
warntime 10
initdead 20
udpport 694
ucast bond0.20 10.0.0.127
auto_failback off
node server1 server2

uuidfrom nodename
respawn hacluster /usr/lib/heartbeat/ipfail
ping 10.0.0.1
deadping 20
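
If the dopd route is the right one, we assume ha.cf would also need these
lines (again taken from the DRBD user guide, not yet tested here):

apiauth dopd gid=haclient uid=hacluster
respawn hacluster /usr/lib/heartbeat/dopd

The guide also says to make /sbin/drbdsetup and /sbin/drbdmeta group
haclient, non-executable for others and setuid root (chmod u+s), since
the outdater runs as the unprivileged hacluster user.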

How can we solve this problem?

Best Regards,

Mikkel R. Jakobsen
Systems Consultant
DANSUPPORT A/S
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user