Internet-Draft | IKEv2 Key Distribution | October 2025 |
Antony & Klassert | Expires 17 April 2026 | [Page] |
This document specifies a method for IPsec key derivation and distribution within large groups using an extended Internet Key Exchange Protocol Version 2 (IKEv2). A Group Key Management Server (GKMS) provisions a shared group key and proxies messages between nodes so that each pair of IPsec peers can derive a pairwise key unique to that pair, enabling use of the IPsec Encapsulating Security Payload (ESP) among multiple nodes. Each communicating pair derives session-specific encryption keys from two inputs: the group key provided by the GKMS and an individual host-pair key derived via an IKEv2 exchange. The final key is derived by combining these components with XOR, which yields a unique key per pair that is known only to the two IPsec peers, while nodes maintain fewer IKEv2 states and state management remains scalable for dynamic group membership. Some control functions, such as node authentication and deletion of a single SA, are performed by the GKMS.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 17 April 2026.¶
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Modern distributed environments operating under a single administrative domain, such as AI or machine learning (ML) clusters, often comprise a large number of worker nodes that exchange sensitive data over IP networks. To ensure confidentiality and integrity of these communications, the IPsec Encapsulating Security Payload (ESP), [RFC4303], can be employed.¶
In this architecture, a Group Key Management Server (GKMS) functions as a trusted authority responsible for authenticating participating nodes and facilitating the derivation of cryptographic material within the group. Each node enrolls with the GKMS to obtain credentials and authorization for secure participation in the group communication framework.¶
Each pair of nodes within the group MUST derive and use a unique pairwise encryption key for ESP sessions. These keys are generated via an asymmetric derivation process coordinated by the GKMS, enabling controlled key distribution without requiring direct peer-to-peer IKE exchanges among all nodes. This method provides a scalable and efficient key management model that supports large, dynamic AI/ML clusters, while only the communicating pair holds the encryption keys.¶
The final ESP session key is derived from two inputs: the current Group Key, which is generated by the GKMS and shared with the current group members, and a pairwise key that the two peers derive between themselves.¶
An IPsec peer can delegate a majority of the IKEv2 control plane functions, except key derivation, to the Group Key Management Server (GKMS). For example, deletion of a Security Association (SA) from all group members can be centrally managed by the GKMS. When a node reboots, it is expected to delete its existing SAs with peers. Performing an IKEv2 Delete exchange individually with a large number of nodes can introduce significant overhead. Instead, the node can delegate the SA deletion responsibility to the GKMS, reducing control-plane complexity and improving operational efficiency.¶
One proposal is to relax the requirement for unique keys and permit all nodes to use a common key for ESP sessions. This can be achieved by setting the peer-to-peer key component to zero, resulting in the final ESP key being derived solely from the Group Key provided by the GKMS. An additional alternative is to allow for a mixed model, where some nodes insist on unique key generation for their communications, while other nodes default to the group key. Although this approach is not currently favored, it is described here to support further discussion and exploration of the related security and operational trade-offs in [I-D.xia-ipsecme-eesp-stateless-encryption].¶
This document uses the following terms defined in [RFC4301]: Encapsulating Security Payload (ESP), Security Association (SA), Security Policy Database (SPD).¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
A single administrative domain may consist of multiple nodes that need to communicate securely among themselves. The method described in [I-D.xia-ipsecme-eesp-stateless-encryption] proposes using one shared encryption key across all nodes. In contrast, this document defines a model in which each pair of nodes independently derives and uses a unique encryption key. This approach enhances confidentiality, limits the scope of key compromise, and allows scalable group operation.¶
Similar use cases can be found in large-scale distributed systems such as Google [PSP] and [Falcon], as well as [UEC-TSS], which employ similar concepts in the context of key derivation and key distribution without replicating full IKE exchanges between every pair of nodes. Additionally, [I-D.ietf-ipsecme-eesp] has highlighted scenarios where IKE is used solely for key derivation within data center environments, further motivating scalable approaches to key management in such settings.¶
We considered adapting concepts from Messaging Layer Security (MLS) [RFC9420] and Double Ratchet key update mechanisms, which ensure that only the communicating endpoints possess the necessary key material. However, these approaches introduce continuous operational overhead due to their frequent key updates and state evolution requirements. For the IPsec use case envisioned here, key derivation overhead must remain minimal, and once the keys are derived, there should be no operational need to periodically refresh the derivation material, except in cases of explicit rekeying or membership change events.¶
[I-D.ietf-ipsecme-g-ikev2] provides similar functionality, but is designed for multicast environments and requires each IPsec peer to exchange identity information over IKEv2. While there are potential conceptual overlaps with the approach described in this document, one primary drawback noted by the authors is that in G-IKEv2 the Group Controller/Key Server (GCKS) retains all cryptographic keys in a single location. This concentration of key material introduces a significant vulnerability, as compromise of the GCKS could expose the entire group’s secure communications. G-IKEv2 uses symmetric key wrapping; whether more than one peer consequently holds the same keys is an open question for the authors.¶
If each node were to establish IKE sessions directly with every other node in the group, it would require maintaining N−1 control-plane associations per participant, along with the full state required for IKE negotiation and cryptographic identity management, including certificate handling. This approach imposes substantial operational and computational overhead, particularly as N grows large. It also increases the complexity of the IKE software stack and does not scale well to large deployments, making it impractical for dynamic or high-density network environments. In addition, when a node is removed from the group, some entity has to announce the removal and update the group.¶
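The scaling difference can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the GKMS figure assumes that each node keeps a single control-plane association with one GKMS, which this document does not mandate.

```python
from typing import Tuple


def full_mesh_associations(n: int) -> Tuple[int, int]:
    """Return (per-node, group-wide) IKE associations for a full mesh of n nodes."""
    if n < 1:
        raise ValueError("group must contain at least one node")
    # Each node pairs with every other node; each pair is counted once group-wide.
    return n - 1, n * (n - 1) // 2


def gkms_associations(n: int) -> Tuple[int, int]:
    """Return (per-node, group-wide) associations when nodes talk only to the GKMS.

    Assumption (not from this document): one association per node to a single GKMS.
    """
    if n < 1:
        raise ValueError("group must contain at least one node")
    return 1, n


if __name__ == "__main__":
    for size in (10, 1000, 100000):
        print(size, full_mesh_associations(size), gkms_associations(size))
```

The full-mesh total grows quadratically with N, while the assumed GKMS model grows linearly.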
When one node, for example Alice, wishes to communicate securely with another node, Bob, using IPsec and ESP over IPv4 or IPv6, Alice sends a request to the Group Key Management Server (GKMS). The GKMS proxies the message to Bob and relays Bob’s response back to Alice.¶
Nodes Alice and Bob perform a Diffie-Hellman (DH) exchange, also known as a Key Exchange (KE), as defined in [RFC7296], deriving symmetric keying material through a suitable Key Derivation Function (KDF). A portion of this KE material is used to ensure that the GKMS cannot act as a man-in-the-middle. The resulting peer-to-peer key is long-lived and is combined, using an XOR operation, with the group key provided by the GKMS. The resulting composite key is used for ESP encryption, providing both scalability and strong key separation properties.¶
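The combination step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the derivation defined by this document: the KDF is a stand-in (single-block HKDF-SHA256 as in RFC 5869), the 32-byte key length and the context string are hypothetical, and the KE shared secret is represented by random bytes rather than a real IKEv2 exchange.

```python
import hashlib
import hmac
import secrets

KEY_LEN = 32  # assumed ESP key length in bytes; not specified by this document


def derive_pair_key(shared_secret: bytes, context: bytes) -> bytes:
    """Stand-in KDF: HKDF-SHA256 extract, then a single expand block."""
    prk = hmac.new(b"\x00" * 32, shared_secret, hashlib.sha256).digest()
    return hmac.new(prk, context + b"\x01", hashlib.sha256).digest()[:KEY_LEN]


def combine_keys(group_key: bytes, pair_key: bytes) -> bytes:
    """XOR the GKMS group key with the peer-to-peer key to form the ESP key."""
    if len(group_key) != len(pair_key):
        raise ValueError("key components must have equal length")
    return bytes(g ^ p for g, p in zip(group_key, pair_key))


if __name__ == "__main__":
    group_key = secrets.token_bytes(KEY_LEN)   # distributed by the GKMS
    ke_secret = secrets.token_bytes(32)        # placeholder for the KE result
    pair_key = derive_pair_key(ke_secret, b"alice|bob")  # hypothetical context
    esp_key = combine_keys(group_key, pair_key)
    print(esp_key.hex())
```

With an all-zero pairwise component, `combine_keys` returns the group key unchanged, which corresponds to the common-key variant mentioned earlier in this document.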
An early consideration was to use a key wrapping mechanism, such as public key wrapping or symmetric key wrapping, as described in [I-D.ietf-ipsecme-g-ikev2]. At present, the preferred method is to employ Diffie-Hellman (DH) key exchange to derive keys securely, avoiding the need to transmit raw key material. A signed public key could also be feasible, provided the scheme includes strong authentication and verification steps. Another candidate mechanism for evaluation is [RSA-KEY-WRAPING], which may offer different operational trade-offs.¶
A node would use the same key for IPsec with every other node. The GKMS would cache the node key and hand it out. One disadvantage is that the ESP key is per node and not per node pair.¶
This is essential in this configuration. This server, or set of servers, can authenticate each node on¶
It is a random key generated by the GKMS and distributed to all current members of the group. When the GKMS removes any member, the Group Key must be regenerated and distributed.¶
When one or more IPsec peers are removed from the group by the Group Key Management Server (GKMS), the GKMS generates and distributes a new Group Key to all remaining authorized members. Upon receiving the new Group Key, each node MUST promptly derive updated ESP keys following the specified key derivation procedure. This process effectively revokes access for the removed peers, as they do not possess the new keying material. The previous keys MAY remain valid for a brief transition period to allow for synchronized rekeying.¶
[AA Note: An atomic rekey mechanism may be required to prevent race conditions during key transitions. In such a model, the GKMS would first distribute the new Group Key to all, or most, members and then send an atomic “activation” message indicating the time or event when the new key becomes active. This prevents scenarios where one node (e.g. Alice) begins transmitting with the new key before another peer (e.g., Bob) has received it, thereby maintaining synchronization across all group members.]¶
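A node-side view of such a staged rekey could look like the Python sketch below. It is only an illustration of the idea in the note above: the class, its fields, and the use of an integer key identifier in the activation message are assumptions, since the message formats are not yet defined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupKeyState:
    """Group Key material held by one node (hypothetical structure)."""
    current: bytes
    current_id: int
    pending: Optional[bytes] = None
    pending_id: Optional[int] = None
    previous: Optional[bytes] = None  # kept for the brief transition period

    def stage(self, key_id: int, key: bytes) -> None:
        """Store a new Group Key from the GKMS without using it yet."""
        self.pending_id, self.pending = key_id, key

    def activate(self, key_id: int) -> bool:
        """Switch to the staged key only if the activation message names it."""
        if self.pending is None or self.pending_id != key_id:
            return False
        self.previous = self.current
        self.current, self.current_id = self.pending, key_id
        self.pending = self.pending_id = None
        return True

    def retire_previous(self) -> None:
        """Drop the old key once the transition period has ended."""
        self.previous = None


if __name__ == "__main__":
    state = GroupKeyState(current=b"old-group-key", current_id=1)
    state.stage(2, b"new-group-key")
    print(state.activate(2), state.current_id)
```

Keeping the previous key until it is explicitly retired lets a node decrypt traffic from peers that have not yet processed the activation.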
Time-based rekeying operates in a manner similar to member removal. At predefined intervals, the GKMS generates a new group key and securely distributes it to all active nodes. Upon receipt of the new keying material, each node derives the updated pairwise keys as specified by the group key derivation procedure. This mechanism ensures forward secrecy over time and limits the cryptoperiod of any given key without requiring disruption of existing group associations. The GKMS may keep track of the SA establishment time.¶
In the packet-based rekey model, the GKMS does not maintain counters for cryptographic operations, such as the number of packets processed or bytes protected by a given Security Association (SA). Instead, this mechanism is triggered locally by IPsec peers when a usage threshold is reached. The peer detecting the threshold condition initiates the rekey process, typically corresponding to the most active SA. This approach allows rekeying to occur dynamically based on traffic volume, providing an adaptive balance between security requirements and operational efficiency.¶
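The local trigger can be sketched in Python as a pair of per-SA counters. The class name, field names, and threshold values are hypothetical; this document does not define the limits or how the rekey request is conveyed.

```python
from dataclasses import dataclass


@dataclass
class SaUsage:
    """Per-SA usage counters kept by an IPsec peer, not by the GKMS."""
    max_packets: int
    max_bytes: int
    packets: int = 0
    octets: int = 0

    def record(self, packet_len: int) -> bool:
        """Count one protected packet; return True once a rekey is due."""
        self.packets += 1
        self.octets += packet_len
        return self.packets >= self.max_packets or self.octets >= self.max_bytes


if __name__ == "__main__":
    usage = SaUsage(max_packets=3, max_bytes=10_000)  # illustrative limits
    for length in (1400, 1400, 1400):
        if usage.record(length):
            print("threshold reached, initiate rekey")
```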
When adding a new member, the GKMS shares the current Group Key with the new group member. It also shares policies such as the DH/KE parameters and the ESP crypto suite. The KE groups are an ordered list, with the first element used to initiate the Key Exchange (KE); see [RFC9370] and [RFC7296].¶
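The policy handed to a new member could be modeled as in the Python sketch below. The structure and field names are assumptions made for illustration; the KE group numbers are IKEv2 Transform Type 4 identifiers (19 for the 256-bit random ECP group, 31 for Curve25519, 14 for the 2048-bit MODP group), and the ESP suite label is only an example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GroupPolicy:
    """Policy the GKMS shares with a joining node (hypothetical layout)."""
    group_key: bytes
    ke_groups: List[int]   # ordered by preference
    esp_suite: str

    def initial_ke_group(self) -> int:
        """Return the KE group used to initiate the Key Exchange."""
        if not self.ke_groups:
            raise ValueError("policy must list at least one KE group")
        return self.ke_groups[0]


if __name__ == "__main__":
    policy = GroupPolicy(
        group_key=bytes(32),            # placeholder, not a real key
        ke_groups=[31, 19, 14],         # illustrative preference order
        esp_suite="ENCR_AES_GCM_16",    # illustrative suite label
    )
    print(policy.initial_ke_group())
```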
The GKMS service within an administrative domain may be deployed in a redundant configuration to enhance both scalability and availability. Multiple GKMS instances can operate in parallel, enabling load distribution and fault tolerance. To maintain consistent cryptographic state across the domain, all GKMS instances MUST share the same set of Group Keys and associated management information. This ensures that any node can securely obtain or refresh keying material from any available GKMS without disruption to established security associations.¶
Each IPsec peer that wants to send and receive ESP has to store a minimum of 2 * N keys. For uninterrupted traffic during rekey and group member removal, the node has to store 4 * N keys.¶
[PSP] | NIC Master Key + N send keys | unspecified |
IKEv2 [RFC7296] | 2 x N or 4 x N (during rekey) | IKEv2 |
GKMS, this draft | ?? | ?? |
[UEC-TSS] | ?? | ?? |
[I-D.xia-ipsecme-eesp-stateless-encryption] | 2 keys shared with group server | ?? |
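The 2 x N and 4 x N figures can be reproduced with the small Python helper below. It assumes one inbound and one outbound key per peer, with old and new keys coexisting during a rekey; the function name is hypothetical, and the entries still marked "??" in the table are not covered.

```python
def esp_keys_per_node(peers: int, rekeying: bool = False) -> int:
    """Return the number of ESP keys a node stores for the given peer count."""
    if peers < 0:
        raise ValueError("peer count cannot be negative")
    keys = 2 * peers  # assumed: one inbound and one outbound key per peer
    return 2 * keys if rekeying else keys


if __name__ == "__main__":
    print(esp_keys_per_node(1000), esp_keys_per_node(1000, rekeying=True))
```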
Each pair of nodes needs one Round Trip Time (RTT) to derive keys, while the IKEv2 and ESP parameters are chosen by the GKMS.¶
[AA NOTE we still have to work out the details] An IKEv2 IPsec peer is typically required to send Delete messages to each IKEv2 peer to cleanly terminate Security Associations (SAs), waiting for responses and retransmitting if necessary in case of message loss. This process, mandated by [RFC7296] for protocol robustness, can introduce significant operational complexity and delay, particularly in large group environments.¶
As an alternative, the node may delegate SA deletion to the GKMS. In this model, the node transmits a list of IPsec peers and their corresponding SPIs to the GKMS, which then coordinates sending Delete messages to each peer. This centralization streamlines control plane operations and reduces overhead for individual nodes.¶
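The delegated deletion could be sketched in Python as follows. The request layout, the names, and the grouping step at the GKMS are assumptions for illustration; the actual message format is still to be worked out, as noted above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SaRef:
    """One SA to delete: the peer it is shared with and its SPI."""
    peer: str
    spi: int


@dataclass(frozen=True)
class DeleteRequest:
    """What a node sends to the GKMS (hypothetical layout)."""
    node_id: str
    sas: List[SaRef]


def fan_out_deletes(request: DeleteRequest) -> Dict[str, List[int]]:
    """GKMS side: group SPIs by peer so each peer gets one Delete message."""
    per_peer: Dict[str, List[int]] = {}
    for sa in request.sas:
        if not 0 < sa.spi <= 0xFFFFFFFF:
            raise ValueError("SPI must be a non-zero 32-bit value")
        per_peer.setdefault(sa.peer, []).append(sa.spi)
    return per_peer


if __name__ == "__main__":
    req = DeleteRequest("alice", [SaRef("bob", 0x1001), SaRef("carol", 0x2001)])
    for peer, spis in fan_out_deletes(req).items():
        print(peer, [hex(s) for s in spis])
```

The node sends a single request regardless of how many peers are involved; retransmission and acknowledgment handling then fall to the GKMS.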
Message scalability for a large number of nodes in the group: think of thousands to possibly millions of nodes. Number of message exchanges needed to derive a unique key.¶
ACKs TBD¶
TBD¶
TBD¶