Thank you for taking the time to respond! Further comments below:
On Tue, May 31, 2011 at 8:25 AM, Gert Doering <email@example.com> wrote:
> On Mon, May 30, 2011 at 10:32:24PM +0100, Cal Leeming [Simplicity Media Ltd] wrote:
>> So, it turns out that it is actually OpenSSH which is broken, after
> I would not second this. To me, this very much looks like:
>> On 30/05/2011 21:56, Cal Leeming [Simplicity Media Ltd] wrote:
>> > Just did some testing..
>> >root@vicky:~# cat /var/log/auth.log | grep "Set"
>> >May 30 21:41:05 vicky sshd: Set /proc/self/oom_adj from -17 to -17
>> >May 30 21:41:07 vicky sshd: Set /proc/self/oom_adj to -17
> ... it's reading out the old value, saving it, setting it to "-17" (for
> the sshd listener process, that one is not to be killed), and later on
> *restoring* the old value (for all child processes). See the comments
> in platform.c
> The log messages look weird because the value is -17 already when sshd
> starts - so it's adjusting "-17 to -17" and later on "restoring -17" -
> looks stupid, but that's computers for you. But what it tells you is
> that the value isn't set by sshd to "-17" but that sshd inherited that
> from whoever started it.
Could you point out the line of code where oom_adj_save is set to the
original value? I've looked everywhere, and from what I can tell it's
only ever set to INT_MIN and is never changed anywhere else. (C is not
my strongest language though, so I have most likely overlooked
something.) This is where I got thrown off.
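
For reference, a saved value like this is not always set with a plain assignment. It can also be filled in through a pointer handed to a read call, which a search for "oom_adj_save =" would miss. Below is a minimal sketch of the read/save/set/restore pattern Gert describes, under stated assumptions: it is not the OpenSSH source, the names saved_adj, adj_write, adj_save_and_set and adj_restore are hypothetical, and the path is a parameter so the code can be tried on an ordinary file instead of /proc/self/oom_adj.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical names throughout; an illustration, not the OpenSSH code. */
static int saved_adj = INT_MIN;   /* INT_MIN means "nothing saved yet" */

/* Overwrite the file at `path` with `value`. Returns 0 on success. */
static int adj_write(const char *path, int value)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    int ok = fprintf(fp, "%d\n", value) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Remember the current value found at `path`, then write `new_value`. */
static int adj_save_and_set(const char *path, int new_value)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    /* The saved value is stored here, via &saved_adj, not via "=". */
    int got = fscanf(fp, "%d", &saved_adj);
    fclose(fp);
    if (got != 1) {
        saved_adj = INT_MIN;      /* treat a failed read as "nothing saved" */
        return -1;
    }
    return adj_write(path, new_value);
}

/* Write the remembered value back; do nothing if none was saved. */
static int adj_restore(const char *path)
{
    if (saved_adj == INT_MIN)
        return 0;
    return adj_write(path, saved_adj);
}
```

With this shape, a process that starts with -17 already in the file would save -17, set -17, and later restore -17, which is consistent with the "from -17 to -17" log lines quoted above. A process that starts with 0 would restore 0.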
> The question here is why sshd is sometimes started with -17 and sometimes
> with 0 - that's the bug, not that sshd keeps what it's given.
> (Ask yourself: if sshd had no idea about oom_adj at all, would this make
> it buggy by not changing the value?)
This is what I was trying to pin down before. I had come to this
conclusion myself and sent it to the Debian bug list, and they dismissed
it on the grounds that it was an openssh problem...
So far, the buck has been passed from kernel-mm to debian-kernel, to
openssh, and now back to debian-kernel lol. The most annoying thing
is that you can't get this bug to happen unless you physically test on
a machine which requires the bnx2 firmware, so I get the feeling this
won't get resolved :/
> Anyway, as a workaround for your system, you can certainly set
> oom_adj_save = 0;
> in the beginning of port-linux.c / oom_adjust_restore(), to claim that
> "hey, this was the saved value to start with" and "restore" oom_adj to 0
> then - but that's just hiding the bug, not fixing it.
I'm disappointed this wasn't the correct fix; I honestly thought I had
patched it right :(
But, on the other hand, ssh users should really never have a default
oom_adj of -17, so maybe 0 should be set as the default anyway? If not,
could you explain why?
> USENET is *not* the non-clickable part of WWW!
> Gert Doering - Munich, Germany firstname.lastname@example.org
> fax: +49-89-35655025 email@example.com
openssh-unix-dev mailing list