Re: [DRBD-user] DRBD crash with bad network

From: Lars Ellenberg <lars.ellenberg_at_nospam>
Date: Wed Mar 31 2010 - 19:21:45 GMT
To: drbd-user@lists.linbit.com

On Tue, Mar 30, 2010 at 10:34:06AM +0200, Maxence DUNNEWIND wrote:
> Hi,
>
> I have a cluster of 10 servers with many drbd devices. The drbd version is
> 8.3.7, module loaded with :
> drbd minor_count=128 usermode_helper=/bin/true
> (because I use it with ganeti).
>
> I have about 40 drbd devices per node (primary and secondaries). Our provider
> has a lot of network issues, which at times cause drbd to disconnect/reconnect
> very often: about 500 NetworkFailure events in the hour before the last crash:
> # grep "Connected -> NetworkFailure" /var/log/messages|grep -c "Mar 30 00"
> 483
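
The grep above counts a single hour. To see how the disconnects are spread over a whole log, the same count can be grouped by hour. A minimal Python sketch, assuming syslog-style timestamps like the ones in the quoted log; the function name `failures_per_hour` and the log path are illustrative, not anything DRBD ships:

```python
import re
from collections import Counter

# Syslog prefix such as "Mar 30 00:52:48 z2-6 kernel: ..."; capture "Mar 30 00".
STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d{2}):\d{2}:\d{2}\s")


def failures_per_hour(lines, needle="Connected -> NetworkFailure"):
    """Count lines containing `needle`, keyed by the month/day/hour prefix."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = STAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Assumed log location, taken from the grep command in the quoted mail.
    with open("/var/log/messages", errors="replace") as log:
        for hour, count in sorted(failures_per_hour(log).items()):
            print(hour, count)
```

Keys sort as plain strings, so the output order is only meaningful within one month.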

So you are using DRBD with ganeti in a cloud?
Which cloud?

> Then the crash log :

The most interesting line is the one before that.

> Mar 30 00:52:48 z2-6 kernel: [1685605.588315] CPU 2

> Mar 30 00:52:48 z2-6 kernel: [1685605.589086] Pid: 21781, comm: drbd0_worker Tainted: G W 2.6.30-2-amd64 #1 X8STi
> Mar 30 00:52:48 z2-6 kernel: [1685605.594280] RIP: 0010:[<ffffffff802bbc80>] [<ffffffff802bbc80>] cache_alloc_refill+0xf6/0x1f9

Hard out of memory?
Did you google for "2.6.30 cache_alloc_refill"
and check that you are not affected by any of those?
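
One way to check the out-of-memory theory is to look for memory-pressure messages in the log lines leading up to the crash. A rough Python sketch, assuming the kernel logged one of the usual markers ("page allocation failure", "invoked oom-killer", "Out of memory"); the function name, the 200-line window and the marker list are illustrative choices, and a clean result does not rule out memory exhaustion:

```python
# Common kernel memory-pressure messages; this list is not exhaustive.
MARKERS = ("page allocation failure", "invoked oom-killer", "Out of memory")


def memory_warnings_before(lines, crash_marker="cache_alloc_refill", context=200):
    """Return memory-pressure lines among the `context` lines that precede
    the first line containing `crash_marker`. `lines` must be a list."""
    for index, line in enumerate(lines):
        if crash_marker in line:
            window = lines[max(0, index - context):index]
            return [l for l in window if any(m in l for m in MARKERS)]
    return []  # crash marker not found


if __name__ == "__main__":
    # Assumed log location, taken from the grep command in the quoted mail.
    with open("/var/log/messages", errors="replace") as log:
        for hit in memory_warnings_before(log.readlines()):
            print(hit.rstrip())
```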

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user