ipsec September 2009 archive
ipsec: Re: [IPsec] Ikev2 HA message Id Issue

From: Yoav Nir <ynir_at_nospam>
Date: Thu Sep 03 2009 - 15:06:52 GMT
To: "Kalyani Garigipati (kagarigi)" <kagarigi@cisco.com>

Hi Kalyani

Of the two, I prefer the 2nd solution, as it is simpler. Reusing message IDs is not that bad, and you can decrease the chance of reuse by including (in the RESET_MESSAGE_ID notification) a random number as the starting message ID.
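A minimal sketch of what such a notification could look like on the wire, assuming the standard IKEv2 Notify payload layout (generic payload header, protocol ID, SPI size, notify message type, notification data). The notify type value, the 4-byte data format, and the function names are assumptions made for illustration; no RESET_MESSAGE_ID type has been assigned, so the value below is simply taken from the private-use status range.

```python
import secrets
import struct

# Hypothetical value from the private-use status range (40960-65535).
# No RESET_MESSAGE_ID notify type is actually assigned.
RESET_MESSAGE_ID = 40960


def build_reset_notify(start_id=None):
    """Return (payload_bytes, start_id) for a hypothetical RESET_MESSAGE_ID notify."""
    if start_id is None:
        # Random starting message ID; the upper half of the 32-bit space is
        # left unused so there is headroom before the counter wraps.
        start_id = secrets.randbelow(2**31)
    data = struct.pack("!I", start_id)
    # Notify body: protocol ID (0, no SPI), SPI size (0), notify message type.
    body = struct.pack("!BBH", 0, 0, RESET_MESSAGE_ID) + data
    # Generic payload header: next payload (0), flags (0), total payload length.
    header = struct.pack("!BBH", 0, 0, 4 + len(body))
    return header + body, start_id


def parse_reset_notify(payload):
    """Extract the starting message ID, or raise ValueError on a mismatch."""
    if len(payload) != 12:
        raise ValueError("unexpected payload size")
    _next_payload, _flags, length = struct.unpack("!BBH", payload[:4])
    _proto, spi_size, notify_type = struct.unpack("!BBH", payload[4:8])
    if length != 12 or spi_size != 0 or notify_type != RESET_MESSAGE_ID:
        raise ValueError("not a RESET_MESSAGE_ID notify")
    (start_id,) = struct.unpack("!I", payload[8:12])
    return start_id
```

Both sides would then continue the IKE SA from `start_id` instead of from a fixed value, which makes it unlikely (though not impossible) that an ID already used on this SA is used again.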

What I'm not so sure of is that there is a real problem here that needs to be solved now.

The IKEv2 documents totally ignore both high-availability solutions and load-sharing solutions. They are just out of scope. So the documents don't specify what data needs to be synced, or how failover is detected and accomplished on the LAN or the WAN.

To get there, you'd need to address issues of routing, signaling (between the peers) and AAA for both IKE and IPsec traffic. That's a tall order that you probably don't want to tackle just now (maybe the WG would want this as the next "big" document)

So to have a high-availability or load-sharing solution that participates in IKE/IPsec, an implementation needs to somehow pretend that both cluster members are actually one gateway. This can be done with smart switches, multicast addresses and synchronization links, which are out of scope for the WG (for now).

I can think of three levels of synchronizing IKE data between the peers.

  1. Synchronize just the IKE SA at creation and deletion
  2. Synchronize the IKE SA whenever the counters are updated
  3. Synchronize both IKE and Child SAs whenever a packet is sent.

#3 is obviously impractical. #1 doesn't work, because after fail-over, the redundant gateway cannot take over the IKE SA, because it doesn't know the message ID counters. #2 can be made to work, although you need to either immediately rekey all the child SAs, or else skip ahead in the IPsec retrans counters. This solution is practical enough, because most gateways have very few Child SAs per IKE SA. With IKEv2 you can have complex traffic selectors that allow one SA to cover all the domain protected by a gateway. So you might have a few SAs because of different QoS classes, and maybe a few if your implementation prefers to deal with whole ranges or subnets, but really, a single IKE SA with thousands of concurrent Child SAs is more of a lab thing than a practical thing.
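The difference between levels #1 and #2 can be sketched with a toy model. All class and field names here are hypothetical, counters start at 0 for simplicity, and a window size of 1 is assumed; a real cluster would also carry keys, traffic selectors and Child SA state over the synchronization link.

```python
from dataclasses import dataclass


@dataclass
class IkeSa:
    spi: bytes
    next_request_id: int = 0   # next message ID for our own requests
    expected_peer_id: int = 0  # next message ID expected in a peer request


class StandbyGateway:
    """Holds whatever view of each IKE SA the active member last pushed."""

    def __init__(self):
        self.table = {}

    def sync(self, spi, next_request_id, expected_peer_id):
        self.table[spi] = IkeSa(spi, next_request_id, expected_peer_id)

    def remove(self, spi):
        self.table.pop(spi, None)


class ActiveGateway:
    def __init__(self, standby, sync_counters):
        self.standby = standby
        self.sync_counters = sync_counters  # False = level 1, True = level 2
        self.table = {}

    def _push(self, sa):
        self.standby.sync(sa.spi, sa.next_request_id, sa.expected_peer_id)

    def create_sa(self, spi):
        sa = IkeSa(spi)
        self.table[spi] = sa
        self._push(sa)  # both levels synchronize at creation

    def delete_sa(self, spi):
        del self.table[spi]
        self.standby.remove(spi)  # both levels synchronize at deletion

    def send_request(self, spi):
        sa = self.table[spi]
        msg_id = sa.next_request_id
        sa.next_request_id += 1
        if self.sync_counters:
            self._push(sa)  # level 2 only: counters follow every update
        return msg_id

    def receive_request(self, spi, msg_id):
        sa = self.table[spi]
        if msg_id != sa.expected_peer_id:
            raise ValueError("message ID outside window")
        sa.expected_peer_id += 1
        if self.sync_counters:
            self._push(sa)
```

With `sync_counters=False` the standby still holds the counters from SA creation at fail-over, which is why level #1 cannot take over the IKE SA; with `sync_counters=True` its copy matches the active member after every exchange.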

Both your solutions ask peer gateways to assist the HA pair with their fail-over process. To the other gateway it seems as if the (single) peer is requesting information about current message ID numbers (and windows) or else for a reset. It seems strange that the first thing we would do for HA support is to help a private extension to the architecture work better, when that private extension is not really documented.

What do you plan to do in your cluster, if the peer does not support this extension?

You might also want to ask Paul and Yaron to present this at the Interim meeting on 22-Sep.


On Sep 3, 2009, at 4:06 PM, Kalyani Garigipati (kagarigi) wrote:

Hi ,

In Ikev2 HA, there is an issue with the message Id and window size.

Standby device ----------------------- Active device ----------------------- Peer device

The active device participating in the exchange with the peer will update its message id counters as per the exchanges done. This info cannot be synced to the stand-by device for every exchange done since that would take up all the bandwidth and is not an efficient way.

The stand-by device when it becomes active will start with the message Id as 1 and this will not be accepted by the peer, since its message Id counters are different. Hence a solution is required to sync the message Id counters to the standby device.
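The peer's side of this can be sketched as a simplified receive window. The class name and structure are hypothetical, and a real implementation would treat an old message ID as a possible retransmission rather than just rejecting it; the point is only that a request restarting at a low message ID falls outside the window the peer currently expects.

```python
class PeerWindow:
    """Simplified model of the peer's check on incoming request message IDs."""

    def __init__(self, window_size=1):
        self.window_size = window_size
        self.lowest_pending = 0  # smallest message ID not yet received
        self.seen = set()        # IDs received ahead of lowest_pending

    def accept(self, msg_id):
        """Return True if msg_id is new and inside the current window."""
        upper = self.lowest_pending + self.window_size
        if msg_id < self.lowest_pending or msg_id >= upper:
            return False
        if msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        # Slide the window past every ID received in order.
        while self.lowest_pending in self.seen:
            self.seen.remove(self.lowest_pending)
            self.lowest_pending += 1
        return True


peer = PeerWindow(window_size=1)
for msg_id in range(40):   # exchanges handled by the active device
    peer.accept(msg_id)

# After fail-over, the standby restarts at message ID 1 and is rejected,
# while the ID the peer actually expects would have been accepted.
restart_ok = peer.accept(1)
in_sync_ok = peer.accept(40)
```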

  1. A solution for this is to get the required info from the peer device since it maintains all these counters. The abstract details of how this can be done are given in the attached document.
  2. An alternative solution for this could be to send a new notify called (RESET_MESSAGE_ID) to the peer device as soon as the standby comes up. But this may lead to reuse of message Ids within the same SA which is not desirable.

I think solution 1 should be implemented with Ikev2. Please give your comments

