On Tue, Mar 30, 2010 at 10:34:06AM +0200, Maxence DUNNEWIND wrote:
> I have a cluster of 10 servers with many drbd devices. The drbd version is
> 8.3.7, module loaded with :
> drbd minor_count=128 usermode_helper=/bin/true
> (because I use it with ganeti).
> I have about 40 drbd devices per node (primaries and secondaries). Our provider
> has a lot of network issues, which sometimes cause drbd to disconnect/reconnect
> very often: about 500 NetworkFailure events in 1 hour before the last crash:
> # grep "Connected -> NetworkFailure" /var/log/messages|grep -c "Mar 30 00"
So you are using DRBD with ganeti in a cloud?
> Then the crash log :
The most interesting line is before that.
> Mar 30 00:52:48 z2-6 kernel: [1685605.588315] CPU 2
> Mar 30 00:52:48 z2-6 kernel: [1685605.589086] Pid: 21781, comm: drbd0_worker Tainted: G W 2.6.30-2-amd64 #1 X8STi
> Mar 30 00:52:48 z2-6 kernel: [1685605.594280] RIP: 0010:[<ffffffff802bbc80>] [<ffffffff802bbc80>] cache_alloc_refill+0xf6/0x1f9
Hard out of memory?
Did you google for "2.6.30 cache_alloc_refill"
and check that you are not affected by any of those?
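
To see whether the disconnects cluster in the hour before the crash, the grep count quoted above can be extended to a per-hour tally. This is only a rough sketch, not something from the original report: the function name `failures_per_hour` is made up for illustration, and it assumes classic syslog timestamps ("Mar 30 00:52:48") at the start of each line, as in the crash log quoted above.

```python
import re
from collections import Counter

# Assumption: every line starts with a classic syslog timestamp such as
# "Mar 30 00:52:48". The group captures "Mar 30 00" (month, day, hour).
HOUR_PREFIX = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}):\d{2}:\d{2}\s")

# The same connection-state transition the grep in the quoted mail looks for.
TRANSITION = "Connected -> NetworkFailure"


def failures_per_hour(lines):
    """Count 'Connected -> NetworkFailure' lines per (month, day, hour).

    `lines` is any iterable of log lines, for example an open file.
    Lines without a recognizable timestamp are skipped.
    """
    counts = Counter()
    for line in lines:
        if TRANSITION not in line:
            continue
        match = HOUR_PREFIX.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It could be run as `failures_per_hour(open("/var/log/messages", errors="replace"))`; the resulting counter keeps the hours in the order they first appear in the file, so a steady rise towards the crash would be easy to spot.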
-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed