Re: [IPsec] #22 Simultaneous IKE SA rekey text

From: Tero Kivinen <kivinen_at_nospam>
Date: Wed Oct 21 2009 - 12:37:38 GMT
To: David Wierbowski <wierbows@us.ibm.com>


David Wierbowski writes:
> I'm not sure this makes RFC4718 incorrect. It just makes it incomplete.

Ok, but that still means we need to find a way to fix that problem before we can use that solution in IKEv2bis.

> > This solution might cause peers to stay in live lock state, causing
> > the whole IKE SA to be unusable. I.e. host A starts IKE SA rekey and
> > host B starts Create Child SA. Host B replies NO_PROPOSAL_CHOSEN to
> > host A's IKE SA rekey, and Host A replies NO_ADDITIONAL_SAS to Host
> > B's Create Child SA request. Both ends process replies, and notices
> > they failed, thus both start again, causing both ends to be trying
> > these operations as fast as they can. This situation will stay as it
> > is unless something kicks hosts out of sync.
> >
> > Or returning NO_ADDITIONAL_SAS might cause other end to delete the
> > whole IKE SA and start from scratch.
>
> I do not like that RFC 4718 used NO_PROPOSAL_CHOSEN as the indicator that
> a rekey is being rejected because there are outstanding requests. To me
> a new notify would have made sense.

True, but as RFC 4718 tried not to modify IKEv2, it could not define a new error code. In IKEv2bis we can do this, so I think we should define a new error code, something like "TEMPORAL_FAILURE", which means there was some kind of temporary error (i.e. the problem will disappear without anybody changing policy) and the other end should try again after a short timeout.

This error code could be useful in other places too.

NO_PROPOSAL_CHOSEN indicates that the problem will NOT disappear unless someone changes something (i.e. proposals, policy, traffic selectors, etc.). So some implementations might (and should) use a much larger timeout before trying again with exactly the same parameters.
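The difference between the two error codes can be sketched as a retry policy. This is only an illustration: TEMPORAL_FAILURE is the name proposed above and is not defined in RFC 4306, and the function name and both delay values are hypothetical.

```python
from enum import Enum


class Notify(Enum):
    NO_PROPOSAL_CHOSEN = "NO_PROPOSAL_CHOSEN"  # defined in RFC 4306
    TEMPORAL_FAILURE = "TEMPORAL_FAILURE"      # proposed in this mail only


# Hypothetical delays; no specification gives these numbers.
SHORT_RETRY_SECONDS = 5.0
LONG_RETRY_SECONDS = 600.0


def retry_delay(error: Notify) -> float:
    """Return how long to wait before repeating the same request."""
    if error is Notify.TEMPORAL_FAILURE:
        # The problem should go away on its own, so retry soon.
        return SHORT_RETRY_SECONDS
    if error is Notify.NO_PROPOSAL_CHOSEN:
        # Nothing changes until proposals or policy change, so back off.
        return LONG_RETRY_SECONDS
    raise ValueError(f"no retry policy for {error}")
```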

> Given that RFC 4718 did use NO_PROPOSAL_CHOSEN it seems to me that
> when HOST A is rekeying the IKE_SA it should assume the peer is busy
> when it receives NO_PROPOSAL_CHOSEN and should continue to attempt
> to periodically rekey the IKE SA again.

Yes.

> I do not agree that when Host B receives NO_ADDITIONAL_SAS that it
> should retry the operation using the same IKE SA.

True, if it follows RFC 4306, it should tear down the whole IKE SA and start from the beginning:



4. Conformance Requirements
...

   If the responder rejects the CREATE_CHILD_SA request with a
   NO_ADDITIONAL_SAS notification, the implementation MUST be capable
   of instead closing the old SA and creating a new one. ...


This would be a very unfortunate operation in this case, as it would tear down the whole IKE SA and all the IPsec SAs along with it. I do not think we can use NO_ADDITIONAL_SAS with its current definition anywhere else because of this.

If, on the other hand, host B, which receives NO_ADDITIONAL_SAS, does not tear down the whole IKE SA but decides to keep the existing IKE SA up and running, there is no text anywhere saying it cannot start a create child exchange again in the future. Most likely it will do that whenever the next packet requiring an IPsec SA to be created is received. Thus, if there is a constant stream of packets which require protection, it will trigger a new create child exchange immediately.

If we want host B, when it receives NO_ADDITIONAL_SAS or when it rejects the IKE SA rekey with NO_PROPOSAL_CHOSEN (or with the new TEMPORAL_FAILURE), to mark the IKE SA as being in some kind of on-hold state, meaning no new exchanges can be started on it, then that needs to be explicitly mentioned.
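A toy model of the collision may make the difference concrete. It assumes both hosts send in lockstep rounds, host A keeps retrying its IKE SA rekey, and host B has a constant stream of packets that triggers create child exchanges. The function name, the round model and the `hold_after_reject` flag are all hypothetical.

```python
from typing import Optional


def simulate(hold_after_reject: bool, max_rounds: int = 10) -> Optional[int]:
    """Return the round in which host A's IKE SA rekey succeeds, or None.

    In every round A sends an IKE SA rekey request. B sends a create child
    request unless it has put the IKE SA on hold.
    """
    b_on_hold = False
    for round_no in range(1, max_rounds + 1):
        b_sends_create_child = not b_on_hold
        if not b_sends_create_child:
            # B has nothing outstanding, so it can accept A's rekey.
            return round_no
        # Collision: B rejects the rekey with NO_PROPOSAL_CHOSEN and
        # A rejects the create child request with NO_ADDITIONAL_SAS.
        if hold_after_reject:
            b_on_hold = True  # B starts no new exchanges on this IKE SA
    return None  # still colliding after max_rounds: the live lock


print(simulate(hold_after_reject=False))  # None
print(simulate(hold_after_reject=True))   # 2
```

Without the on-hold state the two requests collide in every round of the model. With it, the rekey goes through on the second attempt.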

> As such I do not think there is a live lock state. What should be
> done is up to the implementation. An implementation could assume the
> other end is in the process of rekeying or deleting the IKE SA and
> delay taking any action or it could take immediate action. If it
> takes immediate action it would need to do so on a new IKE SA.

How long should it delay those operations? Forever? Does that include DPD? If so, how is the other end going to get rid of the IKE SA if Host A crashes and forgets everything about the IKE SA, as there will not be any more exchanges from Host A from then on?

As the behavior of the nodes affects interoperability we should define what to do in this case.

> > This is not in RFC4306, this is just one proposal given in RFC4718
> > which might be used, but as I noted above, it can cause live lock
> > loop, thus it is not really acceptable.
>
> I think it is appropriate to add this to the new draft. If you are
> concerned about the lock state then a warning should be added stating
> that when you receive NO_ADDITIONAL_SAS that you should not attempt to
> retry that operation on the same IKE SA, although that seems
> self-evident.

Yes, I would want to have some kind of text describing that, and also describing how long this limit on retries stays in effect. I assume that if the other end does not rekey or delete the IKE SA within a certain timeout, then the node which received NO_ADDITIONAL_SAS should delete the IKE SA and start over.
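One possible shape for such a rule, as a sketch only: the class, its method names and the 30 second timeout are hypothetical, and whether DPD is exempt from the hold is left open, as it is in this thread.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_TIMEOUT_SECONDS = 30.0  # hypothetical; the thread does not fix a value


@dataclass
class HeldIkeSa:
    """Tracks the hold that starts when NO_ADDITIONAL_SAS is received."""

    held_since: Optional[float] = None

    def on_no_additional_sas(self, now: float) -> None:
        # Stop starting new exchanges on this IKE SA.
        self.held_since = now

    def on_peer_rekey_or_delete(self) -> None:
        # The peer finished what it was doing; the hold is over.
        self.held_since = None

    def may_start_exchange(self) -> bool:
        return self.held_since is None

    def must_start_over(self, now: float) -> bool:
        # The peer did not rekey or delete in time: delete the IKE SA
        # and create a new one from scratch.
        if self.held_since is None:
            return False
        return now - self.held_since >= HOLD_TIMEOUT_SECONDS
```

Passing `now` in explicitly keeps the sketch independent of any particular clock or event loop.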

> I'm not convinced it is broken, I'm just convinced that if you
> attempt to retry an operation on the same IKE SA that you received
> NO_ADDITIONAL_SAS on that you can get into a lock state. To reduce that
> concern we can come up with a new REKEYING_IKE_SA notification, but that's
> likely to cause problems with old implementations, so better to stick with
> what RFC 4718 proposed.

Adding new error notifications cannot be a problem for compliant implementations, as RFC 4306 says:



3.10.1. Notify Message Types
...

   Types in the range 0 - 16383 are intended for reporting errors. An
   implementation receiving a Notify payload with one of these types
   that it does not recognize in a response MUST assume that the
   corresponding request has failed entirely. ...


Thus every compliant implementation MUST assume that the corresponding request has failed if it receives an unrecognized error notify in a response. Thus every implementation should handle new error notifications just as we want, i.e. assume the IKE SA rekey failed.
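The rule quoted above can be sketched as follows. The two numeric values are the RFC 4306 assignments for NO_PROPOSAL_CHOSEN (14) and NO_ADDITIONAL_SAS (35). The function name, the returned strings and the value 16000 in the usage lines are illustrative assumptions.

```python
ERROR_TYPE_MAX = 16383  # RFC 4306: types 0 - 16383 report errors

# Error types this hypothetical implementation recognizes.
KNOWN_ERRORS = {
    14: "NO_PROPOSAL_CHOSEN",
    35: "NO_ADDITIONAL_SAS",
}


def classify_response_notify(notify_type: int) -> str:
    """Decide what a Notify type seen in a response means for the request."""
    if notify_type in KNOWN_ERRORS:
        return "failed: " + KNOWN_ERRORS[notify_type]
    if 0 <= notify_type <= ERROR_TYPE_MAX:
        # Unrecognized error type: the request failed entirely.
        return "failed: unrecognized error"
    # Types above the error range carry status, not failure.
    return "status"


print(classify_response_notify(14))     # failed: NO_PROPOSAL_CHOSEN
print(classify_response_notify(16000))  # failed: unrecognized error
```

An old implementation that has never heard of a new error code would take the second branch, which is exactly the "rekey failed" outcome wanted here.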

>
> > The text above implies that regardless what you do you should be able
> > to allow other end to start exchanges and process them. I.e. IKEv2
> > protocol tries to be specified in such way that both ends can start
> > exchanges at any times and expect them to either fail or succeed and
> > get reply back, but not stay in situation where you do not know,
> > whether other end processed your request or not.
> >
> > If you delete the IKE SA immediately that will happen.
>
> You can never guarantee you are going to get a response back to a
> request. I do not see what makes this situation any different.

If I do not get a response back (after a dozen retransmissions over a period of at least several minutes) to a request in the IKEv2 protocol, that means the IKE SA is dead, and I silently discard it (i.e. assume the other end is dead).

The only cases where you might not get a reply back are if the other end has deleted the IKE SA or if the network is broken. In both cases the correct fix is to remove the IKE SA and start over, and it does not matter whether your request reached the other end or not.

But that is not true in the rekey case, as some of the operations you do on the old IKE SA do affect the state of the new IKE SA, thus you need to know whether the other end processed your request or not.
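The "dozen retransmissions over several minutes" rule for an ordinary request could look like the sketch below. The exponential backoff, the function names and all numbers are assumptions, since IKEv2 leaves retransmission timers to the implementation.

```python
from typing import List

MAX_ATTEMPTS = 12  # "a dozen retransmissions"


def retransmit_schedule(first: float = 0.5, factor: float = 1.8,
                        attempts: int = MAX_ATTEMPTS) -> List[float]:
    """Return the wait in seconds before each retransmission."""
    return [first * factor ** i for i in range(attempts)]


def ike_sa_is_dead(unanswered_attempts: int) -> bool:
    """True once every retransmission has gone unanswered."""
    return unanswered_attempts >= MAX_ATTEMPTS


schedule = retransmit_schedule()
print(f"give up after about {sum(schedule) / 60:.0f} minutes")
```

With these hypothetical values the total wait comes to several minutes, after which the IKE SA is silently discarded.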

> I understand that RFC 4718 is just one proposal, but it's one that I
> expect some vendors tried to implement. I doubt there are many that are
> currently delaying the deletion of the IKE SA.

As our implementation does that, I guess all our customers' implementations do it... :-)

But I do not think that is an issue here.

Simultaneous rekeys are not things that happen that often (if ever outside laboratory tests :-), so even if there are old implementations which do not do what the IKEv2bis document will say, that shouldn't really matter.

So I think we need to write some text that will work, and not be too concerned about what current implementations are doing now (I am sure all implementations out there are going to need minor modifications anyway when IKEv2bis comes out).

> I'm not convinced yet that RFC 4718 is broken or at least that it cannot
> be made to work.

I think it can be made to work, but I do consider it broken, or at least underspecified, as it is now, since it might lead to live locks. Adding text to it might solve the problem (until I see the text and solution I cannot say for sure that it will).

> > Implementation needs to still have the code that detects the
> > simultaneous rekey, and other end might still use this delay, thus you
> > need to be able to cope with the case where this happens.
> > Implementations need to be able to handle both cases regardless
> > whether we use SHOULD or MAY, only thing that is different is whether
> > they allow other end finish exchanges or not.
>
> Agreed, but I still think delaying the deletion is at most a MAY.

Ok.
-- 
kivinen@iki.fi