[DRBD-user] Problems with oos Sectors after verify

From: Henning Bitsch <newsletter_at_nospam>
Date: Wed Mar 17 2010 - 07:08:20 GMT
To: drbd-user@lists.linbit.com

Hi,

I have a problem running drbd 8.3.7-1 on Debian Lenny (2.6.26-AMD64-Xen).
I have six drbd devices with a total of 3 TB. Both nodes are Supermicro AMD
Opteron boxes (one 12 core, one 4 core) with a dedicated 1 GBit connection for
DRBD and Adaptec 5800 RAID controllers. One side has an NVIDIA forcedeth NIC,
the other side an Intel e1000. The protocol is C. The dom0 has 2 GByte of RAM.

Basically two symptoms can be observed but I am not sure if they are related:

1. Data Integrity errors
I get occasional data integrity errors (checksummed with crc32c) on both nodes
in the cluster.

[ 8961.266879] block drbd3: Digest integrity check FAILED.
[22846.253694] block drbd3: Digest integrity check FAILED.
[23557.272471] block drbd3: Digest integrity check FAILED.
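
To see whether the failures cluster on one device or in time, the kernel log can be tallied with a small script. The following is only a sketch: it assumes the log lines look exactly like the three above, and the names `count_digest_failures` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel log lines of the form shown above, e.g.
# [ 8961.266879] block drbd3: Digest integrity check FAILED.
PATTERN = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+block\s+(drbd\d+):\s+Digest integrity check FAILED"
)


def count_digest_failures(log_text):
    """Return (failures per device, list of (timestamp, device)) for a log."""
    events = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    counts = Counter(device for _, device in events)
    return counts, events


# The three log lines quoted in this mail, used as sample input.
SAMPLE = """\
[ 8961.266879] block drbd3: Digest integrity check FAILED.
[22846.253694] block drbd3: Digest integrity check FAILED.
[23557.272471] block drbd3: Digest integrity check FAILED.
"""

if __name__ == "__main__":
    counts, events = count_digest_failures(SAMPLE)
    for device, number in sorted(counts.items()):
        print(device, number)
```

In practice the input would be the saved output of dmesg from each node rather than the sample string.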

As recommended on this list before, I went through the standard procedures
(disabling offloading, memtest, replacing cables, replacing one of the boxes),
but without success. The errors are only reported for devices for which the
respective node is secondary.

2. Out-of-sync (oos) sectors after verify
I always get a few oos sectors after verifying any device which has been used
previously. These are not false positives; the sectors do in fact differ:

2,5c2,5
< 0000010: 0000 0000 0800 0000 0000 00ff 0000 0000 ................
< 0000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
< 0000030: 0000 0000 ffff ffff ffff ffff 0000 0000 ................
< 0000040: 0000 0400 0000 0000 0000 0000 0000 0000 ................
---
> 0000010: 0000 0000 0800 0000 0000 19ff 0000 0000 ................
> 0000020: 0000 002b 0000 0000 0000 0000 0000 0000 ...+............
> 0000030: 0000 002b ffff ffff ffff ffff 0000 0000 ...+............
> 0000040: 0000 0400 0000 0000 0002 8668 0000 0000 ...........h....
8c8
< 0000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
---
> 0000070: 0000 0f03 0000 0000 0000 0001 0000 0000 ................

After disconnecting, reconnecting, and resyncing the device, they are
identical again. This happens with random sectors and on basically every
verify.

Here is my relevant global config for drbd:

startup {
    wfc-timeout 60;
    degr-wfc-timeout 300;
}

disk {
    on-io-error detach;
}

net {
    cram-hmac-alg sha1;
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    data-integrity-alg crc32c;
    max-buffers 3000;
    max-epoch-size 8000;
}

syncer {
    rate 25M;
    verify-alg crc32c;
    csums-alg crc32c;
    al-extents 257;
}

I tweaked the TCP settings using sysctl:

net.ipv4.tcp_rmem = 131072 131072 16777216
net.ipv4.tcp_wmem = 131072 131072 16777216
net.core.rmem_max = 10485760
net.core.wmem_max = 10485760
net.ipv4.tcp_mem = 96000 128000 256000

I am not sure in which direction to search next and would be happy about any
suggestions. Thanks.

Regards,
Henning

COM+ IT Consulting
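
P.S. For anyone who wants to repeat the sector comparison from symptom 2, the check can be sketched roughly as below. This is not the exact procedure used here; it assumes the same range has been dumped from both nodes into two byte strings of equal length, and that sectors are 512 bytes. The function names are made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors


def differing_sectors(data_a, data_b, sector_size=SECTOR_SIZE):
    """Return the indices of sectors whose contents differ between two dumps."""
    if len(data_a) != len(data_b):
        raise ValueError("dumps must have the same length")
    diffs = []
    for offset in range(0, len(data_a), sector_size):
        chunk_a = data_a[offset:offset + sector_size]
        chunk_b = data_b[offset:offset + sector_size]
        if chunk_a != chunk_b:
            diffs.append(offset // sector_size)
    return diffs


def differing_bytes(data_a, data_b):
    """Return the byte offsets at which two equally long dumps differ."""
    return [i for i, (x, y) in enumerate(zip(data_a, data_b)) if x != y]


if __name__ == "__main__":
    # Row 0000010 of the hexdump diff in this mail, one copy per node.
    row_a = bytes.fromhex("0000000008000000000000ff00000000")
    row_b = bytes.fromhex("0000000008000000000019ff00000000")
    print(differing_bytes(row_a, row_b))

    # Two synthetic 1024-byte dumps that differ only inside the second sector.
    dump_a = bytes(1024)
    dump_b = bytearray(dump_a)
    dump_b[600] = 1
    print(differing_sectors(dump_a, bytes(dump_b)))
```

The byte offsets returned by `differing_bytes` are relative to the start of the data passed in, so for a single hexdump row they need to be added to the row's address.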