samba-users August 2010 archive
Main Archive Page > Month Archives  > samba-users archives
samba-users: Re: [Samba] Tracking down random server drops...

Re: [Samba] Tracking down random server drops...

From: Josh Coombs <jcoombs_at_nospam>
Date: Thu Aug 26 2010 - 14:56:33 GMT
To: samba@lists.samba.org

I'm turning my logging up to lvl 3, it looks like I'm seeing the symptom,
not the cause the more I look at this. Based on last night's failures I
have the following pattern:

At 9:24pm ntbackup on ravenclaw begins writing to the existing file
RAVECLAW.bkf, so far nothing logged at lvl 2 yet...

at 7:04am log.ravenclaw shows:
[2010/08/26 07:04:35.538218, 2] smbd/sesssetup.c:1390(setup_new_vc_session)
  setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all
old resources.
[2010/08/26 07:04:35.576723, 2] smbd/sesssetup.c:1390(setup_new_vc_session)
  setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all
old resources.
[2010/08/26 07:04:35.697538, 2] auth/auth.c:304(check_ntlm_password)
  check_ntlm_password: authentication for user [backup] -> [backup] ->
[backup] succeeded
[2010/08/26 07:04:35.700101, 1] smbd/service.c:1070(make_connection_snum)
  ravenclaw (192.168.1.8) connect to service backups initially as user
backup (uid=34, gid=34) (pid 30350)
[2010/08/26 07:04:35.701039, 2] smbd/open.c:631(open_file)
  backup opened file A7B4G5J1.BJ3 read=Yes write=Yes (numopen=1)
[2010/08/26 07:04:35.702556, 2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK

The open and closed chatter for that file repeats until 7:06:47am...
[2010/08/26 07:06:36.254759, 2] smbd/close.c:656(close_normal_file)
  backup closed file A7B4G5J1.BJ3 (numopen=0) NT_STATUS_OK
[2010/08/26 07:06:47.640491, 1] smbd/service.c:1251(close_cnum)
  ravenclaw (192.168.1.8) closed connection to service backups
[2010/08/26 07:08:12.795026, 0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.795070, 0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Connection
reset by peer
[2010/08/26 07:08:12.814075, 0] smbd/process.c:79(srv_send_smb)
  Error writing 51 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/26 07:08:12.816315, 0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.816325, 0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/26 07:08:12.816333, 0] smbd/process.c:79(srv_send_smb)
  Error writing 55 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/26 07:08:12.817211, 0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.817219, 0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/26 07:08:12.817226, 2] smbd/process.c:2202(keepalive_fn)
  Keepalive failed - exiting.
[2010/08/26 07:08:12.817339, 0] lib/util_sock.c:675(write_data)
[2010/08/26 07:08:12.817348, 0]
lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  write_data: write failure in writing to client 0.0.0.0. Error Broken pipe
[2010/08/26 07:08:12.817355, 0] smbd/process.c:79(srv_send_smb)
  Error writing 51 bytes to client. -1. (Transport endpoint is not
connected)
[2010/08/26 07:08:12.941965, 2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.941991, 2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942010, 2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.942020, 2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942037, 2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.942046, 2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942065, 2] lib/messages_local.c:289(message_notify)
  message to process 30350 failed - No such process
[2010/08/26 07:08:12.942075, 2]
lib/messages_local.c:379(messaging_tdb_send)
  pid 30350 doesn't exist - deleting messages record
[2010/08/26 07:08:12.942136, 2] smbd/close.c:656(close_normal_file)
  backup closed file Windows Servers/Ravenclaw/RAVENCLAW.bkf (numopen=0)
NT_STATUS_OK
[2010/08/26 07:08:12.942159, 1] smbd/service.c:1251(close_cnum)
  ravenclaw (192.168.1.8) closed connection to service backups

This is followed by the cleanup messages in log.smbd:
[2010/08/26 07:08:13.139151, 1] smbd/server.c:267(remove_child_pid)
  Scheduled cleanup of brl and lock database after unclean shutdown
[2010/08/26 07:08:33.152490, 1] smbd/server.c:240(cleanup_timeout_fn)
  Cleaning up brl and lock database after unclean shutdown

Ravenclaw reports the backup failed at 7:06am, error writing. So something
happened between Samba and the 2k3 box at 7:04am that triggered the new
session, but I don't have anything to indicate a reason. I have eventlog
entries on Ravenclaw from after the fact noting Delayed Write Failed, but
nothing leading up to the problem.

Josh C
-- To unsubscribe from this list go to the following URL and read the instructions: https://lists.samba.org/mailman/options/samba