drbd-user March 2013 archive

Re: [DRBD-user] Primary fully unavailable with "time expired" errors

From: AZ 9901 <az9901_at_nospam>
Date: Sun Mar 10 2013 - 15:21:34 GMT
To: David Coulson <david@davidcoulson.net>

David,

Thank you for your answer!

This log entry appeared just after I cut network communication between srv2-1 and srv2-2, and was almost certainly caused by it:
I connected to the secondary server and used iptables to block traffic between the two servers.
Right after that, the primary server was reachable again!
But according to the logs, the issue had started two days earlier.
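To pin down when the trouble actually began, a minimal Python sketch like the one below could scan syslog-style lines for the DRBD "PingAck did not arrive in time" message quoted in this thread and count occurrences per day. It assumes the exact line format shown in that quoted entry; other DRBD messages (such as the "time expired" errors) are not matched, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: lines follow the format of the entry quoted in this thread, e.g.
# "Feb 19 19:20:56 srv2-2 kernel: block drbd1: PingAck did not arrive in time."
PINGACK_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+block\s+(?P<dev>drbd\d+):\s+"
    r"PingAck did not arrive in time"
)


def pingack_timeouts(lines):
    """Return (timestamp, host, device) tuples for each PingAck timeout line."""
    events = []
    for line in lines:
        m = PINGACK_RE.search(line)
        if m:
            stamp = f"{m['month']} {m['day']} {m['time']}"
            events.append((stamp, m["host"], m["dev"]))
    return events


def per_day(events):
    """Count timeouts per 'Mon DD' day, to see when they started clustering."""
    return Counter(" ".join(stamp.split()[:2]) for stamp, _, _ in events)
```

For example, on Debian one might pass the lines of /var/log/kern.log (an assumed location, depending on the syslog setup) to `pingack_timeouts` and then `per_day`, and compare the first day with a non-zero count against the sar iowait data.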

However, to answer your question, the network between the two servers is the private dedicated network that OVH runs between its two data centers, RBX and SBG:
http://www.ovh.co.uk/dedicated_servers/data_centre_selection.xml
I have a 100 Mbps connection between the two servers.

Best regards,

Ben

On 10 Mar 2013, at 16:01, David Coulson wrote:

> What is your network between the two systems?
>
> Feb 19 19:20:56 srv2-2 kernel: block drbd1: PingAck did not arrive in time.
>
> That means DRBD couldn't communicate between the nodes.
>
> David
>
> On 3/10/13 10:59 AM, AZ 9901 wrote:
>> On 5 Mar 2013, at 07:21, AZ 9901 wrote:
>>
>>> // I made some errors in my previous mail; here is the corrected version
>>>
>>> Hello,
>>>
>>> I ran into a serious issue with DRBD.
>>>
>>> OS: Linux Debian 6
>>> Kernel: 2.6.32-46
>>> DRBD: 8.3.14
>>>
>>> My primary server (srv2-2) was totally unreachable; it only replied to ping.
>>> Apache, SSH, etc. were not responding anymore.
>>>
>>> So I connected to my secondary server (srv2-1) and cut network communication between the two.
>>> This made srv2-2 available again!
>>> However, I decided to switch srv2-1 from Secondary to Primary and to reboot srv2-2.
>>>
>>> Below are the logs from srv2-2 and srv2-1, with some comments:
>>> srv2-2 : http://pastebin.com/raw.php?i=zkHV5Tr9
>>> srv2-1 : http://pastebin.com/raw.php?i=WX4vNR6d
>>>
>>> On srv2-2, sar shows that some of my CPU cores were at 100% (100% iowait) during the entire period in which I had "time expired" errors.
>>>
>>> Could you please help me?
>>>
>>> Thank you very much,
>>>
>>> Ben
>>>
>>
>> Hello,
>>
>> Any help with this problem?
>>
>> To help further, here is my configuration: http://pastebin.com/raw.php?i=UJ7npfBD
>>
>> Thank you very much,
>>
>> Best regards,
>>
>> Ben
>>
>>
>>
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user@lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>
