full-disclosure-uk January 2010 archive

Re: [Full-disclosure] Two MSIE 6.0/7.0 NULL pointer crashes

From: Dan Kaminsky <dan_at_nospam>
Date: Thu Jan 21 2010 - 16:12:17 GMT
To: Michal Zalewski <lcamtuf@coredump.cx>

On Thu, Jan 21, 2010 at 1:53 AM, Michal Zalewski <lcamtuf@coredump.cx> wrote:
>> Testing takes time. That's why both Microsoft and Mozilla test.
> Testing almost never legitimately takes months or years, unless the
> process is severely broken; contrary to the popular claims,
> personally, I have serious doubts that QA is a major bottleneck when
> it comes to security response - certainly not as often as portrayed.

There are a lot of factors that go into how long it takes to run QA. Here are a few (I'll leave out the joys of multivendor for now):

  1. How widespread is the deployment? A little while ago, Google had an XSS on Google Maps. An hour later, they didn't. About a decade ago, AOL Instant Messenger had a remote code execution vulnerability. Eight hours later, they didn't. Say what you will about centralization, but it *really* makes it easier and safer to deploy a fix, because the environment is totally known, and you have only one environment. There are a couple of dimensions at play here:
     a. How many versions do you need to patch?
     b. How many different deployment classes are there? If your developers are making a bunch of enterprise assumptions (there's a domain, there's group policy, there's an IT guy, etc) and the fix is going to Grandma, let me tell you, something's not going to work.
  2. What's at stake? Your D-Link router has very different deployment characteristics than your Juniper router.
  3. How complicated is the fix? Throwing in a one-liner to make sure an integer doesn't overflow is indeed relatively straightforward. But imagine an old-school application drenched in strcpy, where you've lost context of the length of that buffer five functions ago. Or imagine the modern browser bug, where you're going up against an attacker who *by design* has a Turing-complete capability to manipulate your object tree, complete with control over time. Or, worst of all, take a design flaw like Marsh Ray's TLS renegotiation bug. People are still fiddling around with figuring out how to fix that bug right, months later. Complexity introduces three issues:
     a. You have to fix the entire flaw, and related flaws. We've all seen companies who deploy fixes like "if this argument contains alert(1), abort". Yeah, that's not enough.
     b. You have to not introduce new flaws with your fix -- complexity doesn't stop increasing vulnerability just because you're doing a security fix.
     c. The system needs to work entirely the same after. That means you don't get to significantly degrade performance, usability, reliability, or especially compatibility. Particularly with design bugs, other systems grow to depend on their presence. No software lives in a vacuum, so you have to actually _find_ these other pieces of code, and make sure things still work.
  4. How many people do you actually expect to deploy your patch? There's this entire continuum between "only the other developers on SVN", through "the people who call to complain", to "everybody who clicks 'I accept the risk of patching'", to "my entire customer base with zero user interaction whatsoever". A patch with problems 0.005% of the time is acceptable if 1000 people are deploying, but not if 1,000,000 people are deploying. Note that security research is very strongly correlated with deployment numbers, to the point that vulnerability count is much more correlated with popularity than code quality. So you have this interesting situation where the more your fix is pushed, the more scrutiny there will be upon it.
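
To put rough numbers on the 0.005% figure above, here is a minimal sketch in Python. It assumes each install hits the problem independently and with the same probability, which real deployments won't quite satisfy; the function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic for a patch that misbehaves 0.005% of the time.
# Assumption (not from the original post): every install fails independently
# with the same probability.

FAILURE_RATE = 0.005 / 100  # 0.005% expressed as a probability


def expected_failures(failure_rate: float, installs: int) -> float:
    """Expected number of installs that hit a problem."""
    return failure_rate * installs


def prob_at_least_one_failure(failure_rate: float, installs: int) -> float:
    """Chance that at least one install hits a problem."""
    return 1.0 - (1.0 - failure_rate) ** installs


if __name__ == "__main__":
    for installs in (1_000, 1_000_000):
        print(
            f"{installs:>9} installs: "
            f"expected failures = {expected_failures(FAILURE_RATE, installs):.2f}, "
            f"P(at least one) = {prob_at_least_one_failure(FAILURE_RATE, installs):.4f}"
        )
```

Under those assumptions, 1000 installs give an expected 0.05 broken systems (most of the time nobody notices), while 1,000,000 installs give an expected 50, and at least one failure is all but certain.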

Now, you can consider these all excuses. Believe me, QA people have no shortage of guys who look down on them and their problems. But certainly different bugs have different characteristics, and assuming that all things can be fixed in the same time window with the same output quality is just factually untrue. You might as well be claiming the next version of HTML5 will include an <antigravity> tag that will make your laptop float in the air and spin around.

There is a balancing act. Years is, of course, ridiculous. In many situations, so too are a couple of weeks. If the goal is to achieve the best quality patches, then you want the issue _prioritized heavily_, but not _on public fire_. The latter encourages people to skip testing, and you know of course what happens when you skip testing?

You end up with Gigabit Ethernet drivers that can't actually handle all frame lengths. (Epic Intel find from Fabian Yamaguchi. Wow.)

Responsible disclosure has its risks. You really can be jerked around by a company, *especially* one that hasn't experienced the weirdness of an external security disclosure before. But engineering realities don't go away just because the suits finally got out of the way. Testing matters.

Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html Hosted and sponsored by Secunia - http://secunia.com/