Of the two, I prefer the second solution, as it is simpler. Reusing message IDs is not that bad, and you can reduce the chance of reuse by including a random number in the RESET_MESSAGE_ID notification as the starting message ID.
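A minimal sketch of picking such a random starting message ID follows. It assumes the 32-bit Message ID field of the IKEv2 header; the function name and the `headroom` parameter are hypothetical, and RESET_MESSAGE_ID is only the notification proposed in this thread, not an existing standard payload.

```python
import secrets

MESSAGE_ID_BITS = 32  # the IKEv2 header carries a 32-bit Message ID


def random_start_message_id(headroom: int = 2**31) -> int:
    """Pick a random starting message ID for a (hypothetical) reset.

    `headroom` is the number of message IDs that must remain available
    before the counter would wrap, so the SA is not forced into an
    early rekey right after the reset.
    """
    if not 0 < headroom <= 2**MESSAGE_ID_BITS:
        raise ValueError("headroom must be in 1..2**32")
    # randbelow(n) returns a value in [0, n), so start + headroom <= 2**32.
    return secrets.randbelow(2**MESSAGE_ID_BITS - headroom + 1)


if __name__ == "__main__":
    print(hex(random_start_message_id()))
```

A larger `headroom` narrows the range the start value is drawn from, so it trades unpredictability for a longer SA lifetime before the counter runs out.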
What I'm not so sure of is whether there is a real problem here that needs to be solved now.
The IKEv2 documents totally ignore both high-availability solutions and load-sharing solutions. They are just out of scope. So the documents don't specify what data needs to be synched, or how failover is detected and accomplished on the LAN or the WAN.
To get there, you'd need to address issues of routing, signaling (between the peers) and AAA for both IKE and IPsec traffic. That's a tall order that you probably don't want to tackle just now (maybe the WG would want this as the next "big" document).
So to have a high-availability or load-sharing solution that participates in IKE/IPsec, an implementation needs to somehow pretend that the cluster members are actually one gateway. This can be done with smart switches, multicast addresses and synchronization links, which are out of scope for the WG (for now).
I can think of three levels of synchronizing IKE data between the peers.
#3 is obviously impractical. #1 doesn't work, because after fail-over the redundant gateway cannot take over the IKE SA, since it doesn't know the message ID counters. #2 can be made to work, although you need to either immediately rekey all the Child SAs, or else skip ahead in the IPsec retrans counters. This solution is practical enough, because most gateways have very few Child SAs per IKE SA. With IKEv2 you can have complex traffic selectors that allow one SA to cover the entire domain protected by a gateway. So you might have a few SAs because of different QoS classes, and maybe a few more if your implementation prefers to deal with whole ranges or subnets, but really, a single IKE SA with thousands of concurrent Child SAs is more of a lab thing than a practical thing.
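The "skip ahead" option can be sketched as follows, assuming the counters in question are the outbound ESP sequence numbers (32-bit, no extended sequence numbers) and that the standby only knows the last value that was synced. The function name and the `max_packets_since_sync` bound are hypothetical; a real cluster would have to derive that bound from its sync interval and line rate.

```python
MAX_SEQ = 2**32 - 1  # 32-bit ESP sequence number, extended sequence numbers not assumed


def resume_sequence_number(last_synced_seq, max_packets_since_sync):
    """Return the first outbound sequence number to use after fail-over.

    The new active member jumps past every value the failed member could
    have used since the last sync, so the peer's anti-replay check does
    not see a reused number. Returns None when the jump would exceed the
    sequence space, in which case the Child SA has to be rekeyed instead.
    """
    next_seq = last_synced_seq + max_packets_since_sync + 1
    if next_seq > MAX_SEQ:
        return None
    return next_seq


if __name__ == "__main__":
    print(resume_sequence_number(1000, 5000))
    print(resume_sequence_number(MAX_SEQ - 10, 100))
```

The `None` case is where the two options in the paragraph above meet: if there is no room to skip ahead, an immediate rekey is the only choice left.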
Both your solutions ask peer gateways to assist the HA pair with their fail-over process. To the other gateway it seems as if the (single) peer is requesting information about current message ID numbers (and windows) or else for a reset. It seems strange that the first thing we would do for HA support is to help a private extension to the architecture work better, when that private extension is not really documented.
What do you plan to do in your cluster if the peer does not support this extension?
You might also want to ask Paul and Yaron to present this at the interim meeting on 22-Sep.
On Sep 3, 2009, at 4:06 PM, Kalyani Garigipati (kagarigi) wrote:
In IKEv2 HA, there is an issue with the message ID and window size.
Standby device ----------- Active device ----------- Peer device
The active device participating in the exchange with the peer updates its message ID counters with each exchange. This information cannot be synced to the standby device after every exchange, since that would consume excessive bandwidth and is inefficient.
When the standby device becomes active, it will start with message ID 1, and the peer will not accept this, since its own message ID counters hold different values. Hence a solution is required to sync the message ID counters to the standby device.
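The failure described above can be illustrated with a simplified model of the peer's request window. This is only a sketch: the `PeerWindow` class is hypothetical, and it ignores the retransmission handling a real IKEv2 responder performs, keeping only the rule that a request is accepted when its message ID falls inside the current window and has not been seen before.

```python
class PeerWindow:
    """Simplified model of the peer's view of incoming request message IDs."""

    def __init__(self, window_size=1):
        self.window_size = window_size
        self.lowest_unseen = 0  # IKEv2 message IDs start at 0
        self.seen = set()       # IDs inside the window already processed

    def accept(self, msg_id):
        """Return True if a request with this message ID would be processed."""
        upper = self.lowest_unseen + self.window_size
        if not (self.lowest_unseen <= msg_id < upper) or msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        # Slide the window past every consecutively processed ID.
        while self.lowest_unseen in self.seen:
            self.seen.remove(self.lowest_unseen)
            self.lowest_unseen += 1
        return True


if __name__ == "__main__":
    peer = PeerWindow(window_size=1)
    for msg_id in range(5):      # the active device completes five exchanges
        peer.accept(msg_id)
    print(peer.accept(1))        # standby takes over with a stale ID
    print(peer.accept(5))        # the ID the peer actually expects
```

In this model the standby's request is rejected until it uses the ID the peer expects, which is why the standby needs either synced counters or some way to learn or reset them.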
I think solution 1 should be implemented with IKEv2. Please give your comments.