openssh-unix-dev May 2011 archive
Main Archive Page > Month Archives  > openssh-unix-dev archives
openssh-unix-dev: Re: port-linux.c bug with oom_adjust_restore()

Re: port-linux.c bug with oom_adjust_restore() - causes real bad oom_adj - which can cause DoS conditions.

From: Cal Leeming [Simplicity Media Ltd] <cal.leeming_at_nospam>
Date: Tue May 31 2011 - 12:42:26 GMT
To: Gert Doering <gert@greenie.muc.de>

On 31/05/2011 13:25, Gert Doering wrote:
> Hi,
>
> On Tue, May 31, 2011 at 12:11:13PM +0100, Cal Leeming [Simplicity Media Ltd] wrote:
>> Could you point out the line of code where oom_adj_save is set to the
>> original value, because I've looked everywhere, and from what I can
>> tell, it's only ever set to INT_MIN, and no where else is it changed.
>> (C is not my strongest language tho, so I most likely have overlooked
>> something). This is where I got thrown off.
> oom_adjust_setup() does this:
>
> if ((fp = fopen(oom_adj_path, "r+")) != NULL) {
> if (fscanf(fp, "%d",&oom_adj_save) != 1)
> verbose("error reading %s: %s", oom_adj_path,
> strerror(errno));
>
> the "fscanf()" call will read an integer ("%d") from the file named,
> and write that number into the variable being pointed to (&oom_adj_save).
>
> The loop is a bit tricky to read as it takes different paths into
> account, and will exit after the first successful update.
>
> fscanf() will return the number of successful conversions, so if it
> was able to read "one number", the return value is "1", and it will
> jump to the else block
>
> else {
> rewind(fp);
> if (fprintf(fp, "%d\n", value)<= 0)
> verbose("error writing %s: %s",
> oom_adj_path, strerror(errno));
> else
> verbose("Set %s from %d to %d",
> oom_adj_path, oom_adj_save, value);
> }
>
> where it will overwrite what is in that file with the new value
> ("value"), and then print the "Set ... from -17 to -17" message that
> you saw.

Ah, thank you for explaining this. Makes a lot more sense now :)

>
>>> The question here is why sshd is sometimes started with -17 and sometimes
>>> with 0 - that's the bug, not that sshd keeps what it's given.
>>>
>>> (Ask yourself: if sshd had no idea about oom_adj at all, would this make
>>> it buggy by not changing the value?)
>> This was what I was trying to pinpoint down before. I had came to this
>> conclusion myself, sent it to the Debian bug list, and they dismissed
>> on the grounds that it was an openssh problem...
> I must admit that I have no idea what is causing it, but from the logs,
> it very much looks like sshd is started with "-17" in there - but only
> in the problem case.
>
>
>> So far, the buck has been passed from kernel-mm to debian-kernel, to
>> openssh, and now back to debian-kernel lol. The most annoying thing,
>> is that you can't get this bug to happen unless you physically test on
>> a machine which requires the bnx2 firmwire, so I get the feeling this
>> won't get resolved :/
> And *that* strongly points to a bug in the bnx2 stuff - if other programs
> change their behaviour based on the existance of a given driver, that
> does not smell very healthy.

Agreed.. I was thinking of adding some debug into the fs/proc/ code
which does a kprint on every oom_adj read/write, but I couldn't figure
out how to extract the pid from the task (pointer?).

> [..]
>>> Anyway, as a workaround for your system, you can certainly set
>>>
>>> oom_adj_save = 0;
>>>
>>> in the beginning of port-linux.c / oom_adjust_restore(), to claim that
>>> "hey, this was the saved value to start with" and "restore" oom_adj to 0
>>> then - but that's just hiding the bug, not fixing it.
>> I'm disappointed this wasn't the correct fix, I honestly thought I had
>> patched it right :(
> Well, that's the short hand - "just ignore everything that happened at
> init / save time, and forcibly write back '0', no matter what was there
> before".
>
>> But, on the other hand, ssh users should really never have a default
>> oom_adj of -17, so maybe 0 should be set as default anyway? If this is
>> not the case, could you give reasons why??
> Well, I would say "the default value in there is a matter of local policy",
> so what if someone wants to make sure that whatever is run from sshd is
> always privileged regarding oom, even if a local firefox etc. is running
> amock and you need to ssh-in and kill the GUI stuff...
>
> One might opt to run sshd (and all its children) at "-5" (slightly special
> treatment), or "0" (no special treatment), or even at "-17" - but that's
> local policy.

Ah, okay that's make sense.

>
> Mmmh.
>
> Since this seems to be inherited, it might even work if you just change
> the sshd startup script, and insert
>
> echo 0>/proc/self/oom_adj
>
> in there, right before it starts the sshd... "local policy at work".

Yeah I was going to do this, but then I thought "well if this problem is
occurring for openssh, then what else could it be affecting?". As you
pointed out above, having the oom_adj changed based on the existence of
a driver is really not good.

I will paste this convo trail into the debian ticket, and hopefully
it'll help convince them this issue needs fixing.

> gert

Thanks again for taking the time to reply!

_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev