Internet-Draft IKEv2 Key Distribution October 2025
Antony & Klassert Expires 17 April 2026
Workgroup: IP Security Maintenance and Extensions
Published: October 2025
Intended Status: Standards Track
Expires: 17 April 2026
Authors: A. Antony (secunet), S. Klassert (secunet)

IKEv2 Scalable Group Key Distribution

Abstract

This document specifies a method for IPsec key derivation and distribution within large groups using an extended Internet Key Exchange Protocol Version 2 (IKEv2). A Group Key Management Server (GKMS) provisions a shared group key and proxies messages between nodes so that each pair of IPsec peers can derive a pairwise key unique to that pair, enabling the use of IPsec Encapsulating Security Payload (ESP) among many nodes. Each communicating pair derives its session-specific encryption keys from two inputs: the group key provided by the GKMS and an individual host-pair key derived via an IKEv2 exchange. The final key is derived by combining these components with XOR. This yields per-pair keys known only to the two IPsec peers, while nodes maintain fewer IKEv2 states and state management scales with dynamic group membership. Some control functions, such as node authentication and deletion of a single SA, are performed by the GKMS.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 17 April 2026.

1. Introduction

Modern distributed environments operating under a single administrative domain, such as AI or machine learning (ML) clusters, often comprise a large number of worker nodes that exchange sensitive data over IP networks. To ensure the confidentiality and integrity of these communications, the IPsec Encapsulating Security Payload (ESP) [RFC4303] can be employed.

In this architecture, a Group Key Management Server (GKMS) functions as a trusted authority responsible for authenticating participating nodes and facilitating the derivation of cryptographic material within the group. Each node enrolls with the GKMS to obtain credentials and authorization for secure participation in the group communication framework.

Each pair of nodes within the group MUST derive and use a unique pairwise encryption key for ESP sessions. These keys are generated via an asymmetric derivation process coordinated by the GKMS, enabling controlled key distribution without requiring direct peer-to-peer IKE exchanges among all nodes. This method provides a scalable and efficient key management model that supports large, dynamic AI/ML clusters, while only the communicating pair holds the encryption keys.

The final ESP session key is derived from two inputs: the current Group Key, which the GKMS generates and shares with the current group members, and a per-session key derived by the communicating peers.

An IPsec peer can delegate most of the IKEv2 control-plane functions, except key derivation, to the Group Key Management Server (GKMS). For example, deletion of a Security Association (SA) from all group members can be centrally managed by the GKMS. When a node reboots, it is expected to delete its existing SAs with peers. Performing an IKEv2 Delete exchange individually with a large number of nodes can introduce significant overhead. Instead, the node can delegate the SA deletion responsibility to the GKMS, reducing control-plane complexity and improving operational efficiency.

1.1. Using Group Keys for ESP Sessions

One proposal is to relax the requirement for unique keys and permit all nodes to use a common key for ESP sessions. This can be achieved by setting the peer-to-peer key component to zero, so that the final ESP key is derived solely from the Group Key provided by the GKMS. A further alternative is a mixed model, where some nodes insist on unique key generation for their communications while other nodes default to the group key. Although this approach is not currently favored, it is described here to support further discussion and exploration of the related security and operational trade-offs in [I-D.xia-ipsecme-eesp-stateless-encryption].

1.2. Terminology

This document uses the following terms defined in [RFC4301]: Encapsulating Security Payload (ESP), Security Association (SA), Security Policy Database (SPD).

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Use Case

A single administrative domain may consist of multiple nodes that need to communicate securely among themselves. The method described in [I-D.xia-ipsecme-eesp-stateless-encryption] proposes using one shared encryption key across all nodes. In contrast, this document defines a model in which each pair of nodes independently derives and uses a unique encryption key. This approach enhances confidentiality, limits the scope of key compromise, and allows scalable group operation.

Similar use cases can be found in large-scale distributed systems such as Google [PSP] and [Falcon], as well as [UEC-TSS], which employ similar concepts in the context of key derivation and key distribution without replicating full IKE exchanges between every pair of nodes. Additionally, [I-D.ietf-ipsecme-eesp] has highlighted scenarios where IKE is used solely for key derivation within data center environments, further motivating scalable approaches to key management in such settings.

We considered adapting concepts from Messaging Layer Security (MLS) [RFC9420] and Double Ratchet key update mechanisms, which ensure that only the communicating endpoints possess the necessary key material. However, these approaches introduce continuous operational overhead due to their frequent key updates and state evolution requirements. For the IPsec use case envisioned here, key derivation overhead must remain minimal, and once the keys are derived, there should be no operational need to periodically refresh the derivation material, except in cases of explicit rekeying or membership change events.

3.1. Notes

[I-D.ietf-ipsecme-g-ikev2] provides similar functionality, but is designed for multicast environments and requires each IPsec peer to exchange identity information over IKEv2. While there are potential conceptual overlaps with the approach described in this document, one primary drawback noted by the authors is that in G-IKEv2 the Group Controller/Key Server (GCKS) retains all cryptographic keys in a single location. This concentration of key material introduces a significant vulnerability, as compromise of the GCKS could expose the entire group’s secure communications.

[Open question: G-IKEv2 uses symmetric key wrapping. Does this mean that more than one peer holds the same keys?]

3.2. Limitations of Full Mesh IKE Keying

If each node were to establish IKE sessions directly with every other node in the group, it would require maintaining N−1 control-plane associations per participant, along with the full state required for IKE negotiation and cryptographic identity management, including certificate handling. This approach imposes substantial operational and computational overhead, particularly as N grows large. It also increases the complexity of the IKE software stack and does not scale well to large deployments, making it impractical for dynamic or high-density network environments. In addition, when a node is removed from the group, some entity has to announce the removal and update the remaining members.
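The scaling argument can be made concrete with a small calculation. The sketch below counts control-plane associations for a full mesh. The GKMS-side comparison assumes a single association between each node and the GKMS, which is an illustrative assumption and not something this document specifies. Function names are hypothetical.

```python
def full_mesh_associations(n: int) -> tuple[int, int]:
    """Return (per-node, group-wide) IKE association counts for a full mesh."""
    per_node = n - 1
    total = n * (n - 1) // 2  # each pair is counted once
    return per_node, total


def gkms_associations(n: int) -> tuple[int, int]:
    """Assumed model: each node keeps one control-plane association with the GKMS."""
    return 1, n


for n in (10, 1_000, 100_000):
    print(n, full_mesh_associations(n), gkms_associations(n))
```

The group-wide count grows quadratically for the full mesh and linearly under the assumed GKMS model.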

4. DH or Key Exchange to Derive Keys

When one node, for example Alice, wishes to communicate securely with another node, Bob, using IPsec and ESP over IPv4 or IPv6, Alice sends a request to the Group Key Management Server (GKMS). The GKMS proxies the message to Bob and relays Bob’s response back to Alice.

Alice and Bob perform a Diffie-Hellman (DH) exchange, also known as a Key Exchange (KE), as defined in [RFC7296], and derive symmetric keying material through a suitable Key Derivation Function (KDF). A portion of this KE material is used to ensure that the GKMS cannot act as a man-in-the-middle. The resulting peer-to-peer key is long-lived and is combined, using an XOR operation, with the group key provided by the GKMS. The resulting composite key is used for ESP encryption, providing both scalability and strong key separation properties.
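A minimal sketch of the combination step follows. It assumes the KE has already produced a shared secret; a random value stands in for it here. HMAC-SHA256 is used as a stand-in KDF, and the context string, key length, and function names are hypothetical. This document does not yet fix the KDF or its inputs.

```python
import hashlib
import hmac
import secrets

KEY_LEN = 32  # bytes; assumed ESP key length for this sketch


def derive_peer_key(shared_secret: bytes, id_a: bytes, id_b: bytes) -> bytes:
    """Derive the long-lived peer-to-peer key from the KE shared secret.

    HMAC-SHA256 stands in for the KDF; the context string is hypothetical.
    """
    context = b"pairwise-esp|" + id_a + b"|" + id_b
    return hmac.new(shared_secret, context, hashlib.sha256).digest()[:KEY_LEN]


def combine(group_key: bytes, peer_key: bytes) -> bytes:
    """XOR the group key with the peer-to-peer key to obtain the ESP key."""
    if len(group_key) != len(peer_key):
        raise ValueError("keys must have equal length")
    return bytes(g ^ p for g, p in zip(group_key, peer_key))


group_key = secrets.token_bytes(KEY_LEN)      # distributed by the GKMS
shared_secret = secrets.token_bytes(KEY_LEN)  # placeholder for the KE output
peer_key = derive_peer_key(shared_secret, b"alice", b"bob")
esp_key = combine(group_key, peer_key)
```

In this sketch a new group key changes the ESP key without a new KE, because only the first input to combine() changes. Calling combine(group_key, bytes(KEY_LEN)) returns the group key unchanged, which corresponds to the relaxed model of Section 1.1 where the peer-to-peer component is set to zero.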

5. Symmetric Key Wrapped in Node Public Key

An early consideration was to use a key wrapping mechanism, such as public key wrapping or symmetric key wrapping, as described in [I-D.ietf-ipsecme-g-ikev2]. At present, the preferred method is to employ a Diffie-Hellman (DH) key exchange to derive keys securely, avoiding the need to transmit raw key material. A signed public key could also be feasible, provided it includes strong authentication and verification steps. Another candidate mechanism for evaluation is [RSA-KEY-WRAPING], which may offer different operational trade-offs.

With key wrapping, a node would use the same key for IPsec with every other node. The GKMS would cache the node's key and hand it out to peers. One disadvantage is that the ESP key is per node rather than per node pair.

6. Group Key Management Server

The GKMS is an essential component of this configuration. This server or servers can authenticate each node on

6.1. Group Key

The Group Key is a random key generated by the GKMS and distributed to all current members of the group. When the GKMS removes any member, the Group Key must be regenerated and distributed to the remaining members.

6.2. Removing Group Members

When one or more IPsec peers are removed from the group by the Group Key Management Server (GKMS), the GKMS generates and distributes a new Group Key to all remaining authorized members. Upon receiving the new Group Key, each node MUST promptly derive updated ESP keys following the specified key derivation procedure. This process effectively revokes access for the removed peers, as they do not possess the new keying material. The previous keys MAY remain valid for a brief transition period to allow for synchronized rekeying.

[AA Note: An atomic rekey mechanism may be required to prevent race conditions during key transitions. In such a model, the GKMS would first distribute the new Group Key to all, or most, members and then send an atomic “activation” message indicating the time or event when the new key becomes active. This prevents scenarios where one node (e.g. Alice) begins transmitting with the new key before another peer (e.g., Bob) has received it, thereby maintaining synchronization across all group members.]
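The staging and activation idea in the note above could look like this on the node side. This is a sketch only: the epoch numbering, the class name, and the rule of keeping exactly one predecessor key during the transition are assumptions, not part of this document.

```python
from dataclasses import dataclass, field


@dataclass
class GroupKeyStore:
    """Node-side store for group keys, indexed by a hypothetical epoch number."""
    active_epoch: int
    keys: dict[int, bytes] = field(default_factory=dict)

    def install(self, epoch: int, key: bytes) -> None:
        """Stage a new group key without using it for outbound traffic yet."""
        self.keys[epoch] = key

    def activate(self, epoch: int) -> None:
        """Switch to a staged key when the GKMS sends its activation message."""
        if epoch not in self.keys:
            raise KeyError(f"group key for epoch {epoch} not installed")
        previous = self.active_epoch
        self.active_epoch = epoch
        # Keep only the new key and its predecessor for the transition period.
        self.keys = {e: k for e, k in self.keys.items() if e in (previous, epoch)}

    def outbound_key(self) -> bytes:
        """Return the group key currently used for sending."""
        return self.keys[self.active_epoch]


store = GroupKeyStore(active_epoch=1, keys={1: b"\x01" * 32})
store.install(2, b"\x02" * 32)   # new key received, old key still in use
before = store.outbound_key()
store.activate(2)                # activation message from the GKMS
after = store.outbound_key()
```

Separating install() from activate() is what lets a node receive the new key early while continuing to send with the old one until the GKMS signals the switch.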

6.3. Time-Based Rekey

Time-based rekeying operates in a manner similar to member removal. At predefined intervals, the GKMS generates a new group key and securely distributes it to all active nodes. Upon receipt of the new keying material, each node derives the updated pairwise keys as specified by the group key derivation procedure. This mechanism ensures forward secrecy over time and limits the cryptoperiod of any given key without requiring disruption of existing group associations. The GKMS may keep track of the time at which each SA was established.

6.4. Packet-Based Rekey

In the packet-based rekey model, the GKMS does not maintain counters for cryptographic operations, such as the number of packets processed or bytes protected by a given Security Association (SA). Instead, this mechanism is triggered locally by IPsec peers when a usage threshold is reached. The peer detecting the threshold condition initiates the rekey process, typically corresponding to the most active SA. This approach allows rekeying to occur dynamically based on traffic volume, providing an adaptive balance between security requirements and operational efficiency.
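As an illustration of the local trigger, a peer could keep per-SA counters and compare them against limits. The class name and the default limits below are illustrative assumptions and carry no normative weight.

```python
from dataclasses import dataclass


@dataclass
class SaUsage:
    """Local usage counters for one SA; limits are illustrative, not normative."""
    packets: int = 0
    octets: int = 0
    packet_limit: int = 2**32
    octet_limit: int = 2**40

    def record(self, packet_len: int) -> bool:
        """Count one protected packet and report whether a rekey is due."""
        self.packets += 1
        self.octets += packet_len
        return self.packets >= self.packet_limit or self.octets >= self.octet_limit


usage = SaUsage(packet_limit=3, octet_limit=10_000)
due = [usage.record(1500) for _ in range(3)]
print(due)  # [False, False, True]
```

The counters live entirely on the peer, which matches the model above in which the GKMS keeps no per-SA usage state.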

6.5. Adding a New Member

When adding a new member, the GKMS shares the current Group Key with the new group member. It also shares policies such as the DH/KE parameters and the ESP crypto suite. The KE groups are an ordered list, and the first element is used to initiate the Key Exchange (KE); see [RFC9370] and [RFC7296].
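The ordered KE list can be pictured as part of a small policy record. The record layout, field names, and the suite label below are hypothetical. The numeric values are IKEv2 key exchange method identifiers chosen only as examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupPolicy:
    """Hypothetical policy bundle handed to a new member by the GKMS."""
    ke_groups: tuple[int, ...]  # ordered by preference
    esp_suite: str              # illustrative label, not a registry name

    def initial_ke_group(self) -> int:
        """The first list element is the one used to initiate the KE."""
        if not self.ke_groups:
            raise ValueError("policy must list at least one KE group")
        return self.ke_groups[0]


# 31 = curve25519, 19 = 256-bit random ECP group, 14 = 2048-bit MODP group
policy = GroupPolicy(ke_groups=(31, 19, 14), esp_suite="AES-GCM-256")
print(policy.initial_ke_group())  # 31
```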

6.6. GKMS Redundancy

The GKMS service within an administrative domain may be deployed in a redundant configuration to enhance both scalability and availability. Multiple GKMS instances can operate in parallel, enabling load distribution and fault tolerance. To maintain consistent cryptographic state across the domain, all GKMS instances MUST share the same set of Group Keys and associated management information. This ensures that any node can securely obtain or refresh keying material from any available GKMS without disruption to established security associations.

7. Node

7.1. Memory estimations

Each IPsec peer that sends and receives ESP has to store a minimum of 2 * N keys. For uninterrupted traffic during a rekey or the removal of a group member, the node has to store 4 * N keys.
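The estimate above translates into bytes as follows. The 32-byte key length and the function name are assumptions made for illustration. Per-SA state other than the keys themselves is not counted.

```python
def key_storage_bytes(n_peers: int, key_len: int = 32, rekeying: bool = False) -> int:
    """Estimate key storage on one node.

    Steady state holds 2 keys per peer; 4 per peer while old and new keys overlap.
    """
    keys_per_peer = 4 if rekeying else 2
    return keys_per_peer * n_peers * key_len


for n in (1_000, 100_000):
    print(n, key_storage_bytes(n), key_storage_bytes(n, rekeying=True))
```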

Table 1: Memory used for key storage

[PSP] | NIC Master Key + N send keys | unspecified
IKEv2 [RFC7296] | 2 x N, or 4 x N during rekey | IKEv2
GKMS, this draft | ?? | ??
[UEC-TSS] | ?? | ??
[I-D.xia-ipsecme-eesp-stateless-encryption] | 2 keys shared with group server | ??

7.2. Scaling of messages

Each pair of nodes needs one round-trip time (RTT) to derive keys, while the IKEv2 and ESP parameters are chosen by the GKMS.

8. Node deleting SA

[AA Note: The details still have to be worked out.] An IKEv2 IPsec peer is typically required to send Delete messages to each IKEv2 peer to cleanly terminate Security Associations (SAs), waiting for responses and retransmitting if necessary in case of message loss. This process, mandated by [RFC7296] for protocol robustness, can introduce significant operational complexity and delay, particularly in large group environments.

As an alternative, the node may delegate SA deletion to the GKMS. In this model, the node transmits a list of IPsec peers and their corresponding SPIs to the GKMS, which then coordinates sending Delete messages to each peer. This centralization streamlines control plane operations and reduces overhead for individual nodes.
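One way to picture the delegated deletion is a request that carries (peer, SPI) pairs, which the GKMS groups per peer before sending Delete messages. The request type, its fields, and the grouping step are hypothetical. No wire format is defined in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeleteRequest:
    """Hypothetical node-to-GKMS request; the wire format is not yet defined."""
    requester: str
    sa_list: tuple[tuple[str, int], ...]  # (peer address, SPI) pairs


def fan_out(request: DeleteRequest) -> dict[str, list[int]]:
    """Group SPIs by peer so the GKMS can send one Delete per peer."""
    per_peer: dict[str, list[int]] = {}
    for peer, spi in request.sa_list:
        per_peer.setdefault(peer, []).append(spi)
    return per_peer


request = DeleteRequest(
    requester="192.0.2.1",
    sa_list=(("192.0.2.2", 0x1001), ("192.0.2.3", 0x1002), ("192.0.2.2", 0x1003)),
)
plan = fan_out(request)
```

Here the rebooting node sends a single request, and the retransmission burden for each Delete moves to the GKMS.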

9. Operational Considerations

Message scalability has to be considered for groups with a large number of nodes, from thousands to possibly millions. A further consideration is the number of message exchanges needed to derive each unique key.

10. Acknowledgments

ACKs TBD

11. Security Considerations

TBD

12. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC4301]
Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, <https://www.rfc-editor.org/info/rfc4301>.
[RFC4303]
Kent, S., "IP Encapsulating Security Payload (ESP)", RFC 4303, DOI 10.17487/RFC4303, <https://www.rfc-editor.org/info/rfc4303>.
[RFC7296]
Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T. Kivinen, "Internet Key Exchange Protocol Version 2 (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, <https://www.rfc-editor.org/info/rfc7296>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/info/rfc8174>.
[RFC9370]
Tjhai, CJ., Tomlinson, M., Bartlett, G., Fluhrer, S., Van Geest, D., Garcia-Morchon, O., and V. Smyslov, "Multiple Key Exchanges in the Internet Key Exchange Protocol Version 2 (IKEv2)", RFC 9370, DOI 10.17487/RFC9370, <https://www.rfc-editor.org/info/rfc9370>.

13. Informative References

[Falcon]
Google, "Google Falcon", <https://netdevconf.info/0x18/docs/netdev-0x18-paper43-talk-slides/Introduction%20to%20Falcon%20Reliable%20Transport.pdf>.
[I-D.ietf-ipsecme-eesp]
Klassert, S., Antony, A., and C. Hopps, "Enhanced Encapsulating Security Payload (EESP)", Work in Progress, Internet-Draft, draft-ietf-ipsecme-eesp-01, <https://datatracker.ietf.org/doc/html/draft-ietf-ipsecme-eesp-01>.
[I-D.ietf-ipsecme-g-ikev2]
Smyslov, V. and B. Weis, "Group Key Management using IKEv2", Work in Progress, Internet-Draft, draft-ietf-ipsecme-g-ikev2-23, <https://datatracker.ietf.org/doc/html/draft-ietf-ipsecme-g-ikev2-23>.
[I-D.xia-ipsecme-eesp-stateless-encryption]
Xia, L. and W. Jiang, "Stateless Encryption Scheme of Enhanced Encapsulating Security Payload (EESP)", Work in Progress, Internet-Draft, draft-xia-ipsecme-eesp-stateless-encryption-01, <https://datatracker.ietf.org/doc/html/draft-xia-ipsecme-eesp-stateless-encryption-01>.
[PSP]
Google, "PSP Architecture Specification", <https://github.com/google/psp/blob/main/doc/PSP_Arch_Spec.pdf>.
[RFC9420]
Barnes, R., Beurdouche, B., Robert, R., Millican, J., Omara, E., and K. Cohn-Gordon, "The Messaging Layer Security (MLS) Protocol", RFC 9420, DOI 10.17487/RFC9420, <https://www.rfc-editor.org/info/rfc9420>.
[RSA-KEY-WRAPING]
Google, "Key wrapping", <https://cloud.google.com/kms/docs/key-wrapping#rsaes_oaep_sha_1_2_aes_kwp>.
[UEC-TSS]
Ultra Ethernet Consortium, "Ultra Ethernet Specification v1.0.1", <https://ultraethernet.org/wp-content/uploads/sites/20/2025/10/UE-Specification-1.0.1.pdf>.

Appendix A. Additional Stuff

TBD

Authors' Addresses

Antony Antony
secunet Security Networks AG
Steffen Klassert
secunet Security Networks AG