rfc9681.original | rfc9681.txt | |||
---|---|---|---|---|
Network Working Group B. Decraene | Internet Engineering Task Force (IETF) B. Decraene | |||
Internet-Draft Orange | Request for Comments: 9681 Orange | |||
Intended status: Experimental L. Ginsberg | Category: Experimental L. Ginsberg | |||
Expires: 14 November 2024 Cisco Systems | ISSN: 2070-1721 Cisco Systems | |||
T. Li | T. Li | |||
Juniper Networks, Inc. | Juniper Networks, Inc. | |||
G. Solignac | G. Solignac | |||
M. Karasek | M. Karasek | |||
Cisco Systems | Cisco Systems | |||
G. Van de Velde | G. Van de Velde | |||
Nokia | Nokia | |||
T. Przygienda | T. Przygienda | |||
Juniper | Juniper | |||
13 May 2024 | October 2024 | |||
IS-IS Fast Flooding | IS-IS Fast Flooding | |||
draft-ietf-lsr-isis-fast-flooding-11 | ||||
Abstract | Abstract | |||
Current Link State Protocol Data Unit (PDU) flooding rates are much | Current Link State PDU flooding rates are much slower than what | |||
slower than what modern networks can support. The use of IS-IS at | modern networks can support. The use of IS-IS at larger scale | |||
larger scale requires faster flooding rates to achieve desired | requires faster flooding rates to achieve desired convergence goals. | |||
convergence goals. This document discusses the need for faster | This document discusses the need for faster flooding, the issues | |||
flooding, the issues around faster flooding, and some example | around faster flooding, and some example approaches to achieve faster | |||
approaches to achieve faster flooding. It also defines protocol | flooding. It also defines protocol extensions relevant to faster | |||
extensions relevant to faster flooding. | flooding. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
provisions of BCP 78 and BCP 79. | published for examination, experimental implementation, and | |||
evaluation. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document defines an Experimental Protocol for the Internet | |||
and may be updated, replaced, or obsoleted by other documents at any | community. This document is a product of the Internet Engineering | |||
time. It is inappropriate to use Internet-Drafts as reference | Task Force (IETF). It represents the consensus of the IETF | |||
material or to cite them other than as "work in progress." | community. It has received public review and has been approved for | |||
publication by the Internet Engineering Steering Group (IESG). Not | ||||
all documents approved by the IESG are candidates for any level of | ||||
Internet Standard; see Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 14 November 2024. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9681. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2024 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction | |||
2. Requirements Language . . . . . . . . . . . . . . . . . . . . 4 | 2. Requirements Language | |||
3. Historical Behavior . . . . . . . . . . . . . . . . . . . . . 4 | 3. Historical Behavior | |||
4. Flooding Parameters TLV . . . . . . . . . . . . . . . . . . . 5 | 4. Flooding Parameters TLV | |||
4.1. LSP Burst Size sub-TLV . . . . . . . . . . . . . . . . . 6 | 4.1. LSP Burst Size Sub-TLV | |||
4.2. LSP Transmission Interval sub-TLV . . . . . . . . . . . . 6 | 4.2. LSP Transmission Interval Sub-TLV | |||
4.3. LSPs Per PSNP sub-TLV . . . . . . . . . . . . . . . . . . 6 | 4.3. LSPs per PSNP Sub-TLV | |||
4.4. Flags sub-TLV . . . . . . . . . . . . . . . . . . . . . . 7 | 4.4. Flags Sub-TLV | |||
4.5. Partial SNP Interval sub-TLV . . . . . . . . . . . . . . 7 | 4.5. PSNP Interval Sub-TLV | |||
4.6. Receive Window sub-TLV . . . . . . . . . . . . . . . . . 8 | 4.6. Receive Window Sub-TLV | |||
4.7. Operation on a LAN interface . . . . . . . . . . . . . . 8 | 4.7. Operation on a LAN Interface | |||
5. Performance improvement on the receiver . . . . . . . . . . . 9 | 5. Performance Improvement on the Receiver | |||
5.1. Rate of LSP Acknowledgments . . . . . . . . . . . . . . . 9 | 5.1. Rate of LSP Acknowledgments | |||
5.2. Packet Prioritization on Receive . . . . . . . . . . . . 10 | 5.2. Packet Prioritization on Receive | |||
6. Congestion and Flow Control . . . . . . . . . . . . . . . . . 11 | 6. Congestion and Flow Control | |||
6.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 11 | 6.1. Overview | |||
6.2. Congestion and Flow Control algorithm . . . . . . . . . . 11 | 6.2. Congestion and Flow Control Algorithm | |||
6.3. Transmitter Based Congestion Control Approach . . . . . . 19 | 6.3. Transmitter-Based Congestion Control Approach | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 | 7. IANA Considerations | |||
7.1. Flooding Parameters TLV . . . . . . . . . . . . . . . . . 21 | 7.1. Flooding Parameters TLV | |||
7.2. Registry: IS-IS Sub-TLV for Flooding Parameters TLV . . . 21 | 7.2. Registry: IS-IS Sub-TLV for Flooding Parameters TLV | |||
7.3. Registry: IS-IS Bit Values for Flooding Parameters Flags | 7.3. Registry: IS-IS Bit Values for Flooding Parameters Flags | |||
Sub-TLV . . . . . . . . . . . . . . . . . . . . . . . . . 22 | Sub-TLV | |||
8. Security Considerations . . . . . . . . . . . . . . . . . . . 23 | 8. Security Considerations | |||
9. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 24 | 9. References | |||
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24 | 9.1. Normative References | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 24 | 9.2. Informative References | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . 24 | Acknowledgments | |||
11.2. Informative References . . . . . . . . . . . . . . . . . 25 | Contributors | |||
Appendix A. Changes / Author Notes . . . . . . . . . . . . . . . 25 | Authors' Addresses | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 26 | ||||
1. Introduction | 1. Introduction | |||
Link state IGPs such as Intermediate-System-to-Intermediate-System | Link state IGPs such as Intermediate System to Intermediate System | |||
(IS-IS) depend upon having consistent Link State Databases (LSDB) on | (IS-IS) depend upon having consistent Link State Databases (LSDBs) on | |||
all Intermediate Systems (ISs) in the network in order to provide | all Intermediate Systems (ISs) in the network in order to provide | |||
correct forwarding of data packets. When topology changes occur, | correct forwarding of data packets. When topology changes occur, | |||
new/updated Link State PDUs (LSPs) are propagated network-wide. The | new/updated Link State PDUs (LSPs) are propagated network-wide. The | |||
speed of propagation is a key contributor to convergence time. | speed of propagation is a key contributor to convergence time. | |||
IS-IS base specification [ISO10589] does not use flow or congestion | IS-IS base specification [ISO10589] does not use flow or congestion | |||
control but static flooding rates. Historically, flooding rates have | control but static flooding rates. Historically, flooding rates have | |||
been conservative - on the order of 10s of LSPs/second. This is the | been conservative -- on the order of tens of LSPs per second. This | |||
result of guidance in the base specification and early deployments | is the result of guidance in the base specification and early | |||
when the CPU and interface speeds were much slower and the area scale | deployments when the CPU and interface speeds were much slower and | |||
much smaller than they are today. | the area scale was much smaller than they are today. | |||
As IS-IS is deployed in greater scale both in the number of nodes in | As IS-IS is deployed in greater scale both in the number of nodes in | |||
an area and in the number of neighbors per node, the impact of the | an area and in the number of neighbors per node, the impact of the | |||
historic flooding rates becomes more significant. Consider the | historic flooding rates becomes more significant. Consider the | |||
bringup or failure of a node with 1000 neighbors. This will result | bring-up or failure of a node with 1000 neighbors. This will result | |||
in a minimum of 1000 LSP updates. At typical LSP flooding rates used | in a minimum of 1000 LSP updates. At typical LSP flooding rates used | |||
today (33 LSPs/second), it would take more than 30 seconds simply to | today (33 LSPs per second), it would take more than 30 seconds simply | |||
send the updated LSPs to a given neighbor. Depending on the diameter | to send the updated LSPs to a given neighbor. Depending on the | |||
of the network, achieving a consistent LSDB on all nodes in the | diameter of the network, achieving a consistent LSDB on all nodes in | |||
network could easily take a minute or more. | the network could easily take a minute or more. | |||
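The arithmetic behind the 30-second figure can be checked with a short sketch. The function name is illustrative and not part of the document; the 1000 LSPs per second comparison rate is a hypothetical value chosen only to show the scale of the difference.

```python
def flood_time_seconds(lsp_count: int, lsps_per_second: float) -> float:
    """Time needed to send lsp_count LSPs to one neighbor at a fixed rate."""
    return lsp_count / lsps_per_second


# Figures from the text: 1000 LSP updates at 33 LSPs per second.
historic = flood_time_seconds(1000, 33)

# Hypothetical faster rate (one LSP per millisecond), for comparison only.
faster = flood_time_seconds(1000, 1000)

print(f"historic: {historic:.1f} s, faster: {faster:.1f} s")
```

This covers a single hop only; the network-wide figure also depends on the diameter of the network, as the paragraph above notes.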
Increasing the LSP flooding rate therefore becomes an essential | Therefore, increasing the LSP flooding rate becomes an essential | |||
element of supporting greater network scale. | element of supporting greater network scale. | |||
Improving the LSP flooding rate is complementary to protocol | Improving the LSP flooding rate is complementary to protocol | |||
extensions that reduce LSP flooding traffic by reducing the flooding | extensions that reduce LSP flooding traffic by reducing the flooding | |||
topology such as Mesh Groups [RFC2973] or Dynamic Flooding | topology such as Mesh Groups [RFC2973] or Dynamic Flooding [RFC9667]. | |||
[I-D.ietf-lsr-dynamic-flooding] . Reduction of the flooding topology | Reduction of the flooding topology does not alter the number of LSPs | |||
does not alter the number of LSPs required to be exchanged between | required to be exchanged between two nodes, so increasing the overall | |||
two nodes, so increasing the overall flooding speed is still | flooding speed is still beneficial when such extensions are in use. | |||
beneficial when such extensions are in use. It is also possible that | It is also possible that the flooding topology can be reduced in ways | |||
the flooding topology can be reduced in ways that prefer the use of | that prefer the use of neighbors that support improved flooding | |||
neighbors that support improved flooding performance. | performance. | |||
With the goal of supporting faster flooding, this document introduces | With the goal of supporting faster flooding, this document introduces | |||
the signaling of additional flooding related parameters Section 4, | the signaling of additional flooding related parameters (Section 4), | |||
specifies some performance improvements on the receiver Section 5 and | specifies some performance improvements on the receiver (Section 5) | |||
introduces the use of flow and/or congestion control Section 6. | and introduces the use of flow and/or congestion control (Section 6). | |||
2. Requirements Language | 2. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
3. Historical Behavior | 3. Historical Behavior | |||
The base specification for IS-IS [ISO10589] was first published in | The base specification for IS-IS [ISO10589] was first published in | |||
1992 and updated in 2002. The update made no changes in regards to | 1992 and updated in 2002. The update made no changes in regards to | |||
suggested timer values. Convergence targets at the time were on the | suggested timer values. Convergence targets at the time were on the | |||
order of seconds and the specified timer values reflect that. Here | order of seconds, and the specified timer values reflect that. Here | |||
are some examples: | are some examples: | |||
minimumLSPGenerationInterval - This is the minimum time interval | | minimumLSPGenerationInterval - This is the minimum time interval | |||
between generation of Link State PDUs. A source Intermediate | | between generation of Link State PDUs. A source Intermediate | |||
system shall wait at least this long before re-generating one | | system shall wait at least this long before regenerating one of | |||
of its own Link State PDUs. | | its own Link State PDUs. [...] | |||
| | ||||
The recommended value is 30 seconds. | | A reasonable value is 30 s. | |||
| | ||||
minimumLSPTransmissionInterval - This is the amount of time an | | minimumLSPTransmissionInterval - This is the amount of time an | |||
Intermediate system shall wait before further propagating | | Intermediate system shall wait before further propagating | |||
another Link State PDU from the same source system. | | another Link State PDU from the same source system. [...] | |||
| | ||||
The recommended value is 5 seconds. | | A reasonable value is 5 s. | |||
| | ||||
partialSNPInterval - This is the amount of time between periodic | | partialSNPInterval - This is the amount of time between periodic | |||
action for transmission of Partial Sequence Number PDUs. | | action for transmission of Partial Sequence Number PDUs. It | |||
It shall be less than minimumLSPTransmissionInterval. | | shall be less than minimumLSPTransmissionInterval. [...] | |||
| | ||||
The recommended value is 2 seconds. | | A reasonable value is 2 s. | |||
Most relevant to a discussion of the LSP flooding rate is the | Most relevant to a discussion of the LSP flooding rate is the | |||
recommended interval between the transmission of two different LSPs | recommended interval between the transmission of two different LSPs | |||
on a given interface. | on a given interface. | |||
For broadcast interfaces, [ISO10589] defined: | For broadcast interfaces, [ISO10589] states: | |||
minimumBroadcastLSPTransmissionInterval - the minimum interval | | minimumBroadcastLSPTransmissionInterval indicates the minimum | |||
between PDU arrivals which can be processed by the slowest | | interval between PDU arrivals which can be processed by the | |||
Intermediate System on the LAN. | | slowest Intermediate System on the LAN. | |||
The default value was defined as 33 milliseconds. It is permitted to | The default value was defined as 33 milliseconds. It is permitted to | |||
send multiple LSPs "back-to-back" as a burst, but this was limited to | send multiple LSPs back to back as a burst, but this was limited to | |||
10 LSPs in a one second period. | 10 LSPs in a one-second period. | |||
Although this value was specific to LAN interfaces, this has commonly | Although this value was specific to LAN interfaces, this has commonly | |||
been applied by implementations to all interfaces though that was not | been applied by implementations to all interfaces though that was not | |||
the original intent of the base specification. In fact | the original intent of the base specification. In fact, | |||
Section 12.1.2.4.3 states: | Section 12.1.2.4.3 of [ISO10589] states: | |||
On point-to-point links the peak rate of arrival is limited only | | On point-to-point links the peak rate of arrival is limited only | |||
by the speed of the data link and the other traffic flowing on | | by the speed of the data link and the other traffic flowing on | |||
that link. | | that link. | |||
Although modern implementations have not strictly adhered to the 33 | Although modern implementations have not strictly adhered to the | |||
millisecond interval, it is commonplace for implementations to limit | 33-millisecond interval, it is commonplace for implementations to | |||
the flooding rate to the same order of magnitude: tens of | limit the flooding rate to the same order of magnitude: tens of | |||
milliseconds, and not the single digits or fractions of milliseconds | milliseconds, and not the single digits or fractions of milliseconds | |||
that are needed today. | that are needed today. | |||
In the past 20 years, significant work on achieving faster | In the past 20 years, significant work on achieving faster | |||
convergence, more specifically sub-second convergence, has resulted | convergence, more specifically sub-second convergence, has resulted | |||
in implementations modifying a number of the above timers in order to | in implementations modifying a number of the above timers in order to | |||
support faster signaling of topology changes. For example, | support faster signaling of topology changes. For example, | |||
minimumLSPGenerationInterval has been modified to support millisecond | minimumLSPGenerationInterval has been modified to support millisecond | |||
intervals, often with a backoff algorithm applied to prevent LSP | intervals, often with a backoff algorithm applied to prevent LSP | |||
generation storms in the event of rapid successive oscillations. | generation storms in the event of rapid successive oscillations. | |||
However, the flooding rate has not been fundamentally altered. | However, the flooding rate has not been fundamentally altered. | |||
4. Flooding Parameters TLV | 4. Flooding Parameters TLV | |||
This document defines a new Type-Length-Value tuple (TLV) called the | This document defines a new Type-Length-Value (TLV) tuple called the | |||
"Flooding Parameters TLV" that may be included in IS to IS Hellos | "Flooding Parameters TLV" that may be included in IS-IS Hellos (IIHs) | |||
(IIH) or Partial Sequence Number PDUs (PSNPs). It allows IS-IS | or Partial Sequence Number PDUs (PSNPs). It allows IS-IS | |||
implementations to advertise flooding-related parameters and | implementations to advertise flooding-related parameters and | |||
capabilities which may be used by the peer to support faster | capabilities that may be used by the peer to support faster flooding. | |||
flooding. | ||||
Type: 21 | ||||
Length: variable, the size in octets of the Value field | ||||
Value: One or more sub-TLVs | Type: 21 | |||
Length: variable; the size in octets of the Value field | ||||
Value: one or more sub-TLVs | ||||
Several sub-TLVs are defined in this document. The support of any | Several sub-TLVs are defined in this document. The support of any | |||
sub-TLV is OPTIONAL. | sub-TLV is OPTIONAL. | |||
For a given IS-IS adjacency, the Flooding Parameters TLV does not | For a given IS-IS adjacency, the Flooding Parameters TLV does not | |||
need to be advertised in each IIH or PSNP. An IS uses the latest | need to be advertised in each IIH or PSNP. An IS uses the latest | |||
received value for each parameter until a new value is advertised by | received value for each parameter until a new value is advertised by | |||
the peer. However, as IIHs and PSNPs are not reliably exchanged, and | the peer. However, as IIHs and PSNPs are not reliably exchanged and | |||
may never be received, parameters SHOULD be sent even if there is no | may never be received, parameters SHOULD be sent even if there is no | |||
change in value since the last transmission. For a parameter that | change in value since the last transmission. For a parameter that | |||
has never been advertised, an IS uses its local default value. That | has never been advertised, an IS uses its local default value. That | |||
value SHOULD be configurable on a per-node basis and MAY be | value SHOULD be configurable on a per-node basis and MAY be | |||
configurable on a per-interface basis. | configurable on a per-interface basis. | |||
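The "latest received value, otherwise local default" rule above could be tracked per adjacency roughly as follows. This is a minimal sketch: the class name, the dictionary representation, and the default values are assumptions made for illustration and are not defined by the document.

```python
from dataclasses import dataclass, field
from typing import Dict

# Sub-TLV type codes as listed in Sections 4.1 and 4.2.
LSP_BURST_SIZE = 1
LSP_TRANSMISSION_INTERVAL = 2


@dataclass
class NeighborFloodingParams:
    """Flooding parameters learned from one neighbor (hypothetical helper)."""

    local_defaults: Dict[int, int]
    received: Dict[int, int] = field(default_factory=dict)

    def update(self, advertised: Dict[int, int]) -> None:
        # A TLV need not carry every parameter; only advertised ones change.
        self.received.update(advertised)

    def get(self, sub_tlv_type: int) -> int:
        # Latest received value if any, else the locally configured default.
        return self.received.get(sub_tlv_type, self.local_defaults[sub_tlv_type])


# Hypothetical local defaults: burst of 10 LSPs, 33000 microseconds interval.
params = NeighborFloodingParams({LSP_BURST_SIZE: 10, LSP_TRANSMISSION_INTERVAL: 33000})
params.update({LSP_BURST_SIZE: 50})
print(params.get(LSP_BURST_SIZE), params.get(LSP_TRANSMISSION_INTERVAL))
```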
4.1. LSP Burst Size sub-TLV | 4.1. LSP Burst Size Sub-TLV | |||
The LSP Burst Size sub-TLV advertises the maximum number of LSPs that | The LSP Burst Size sub-TLV advertises the maximum number of LSPs that | |||
the node can receive without an intervening delay between LSP | the node can receive without an intervening delay between LSP | |||
transmissions. | transmissions. | |||
Type: 1 | Type: 1 | |||
Length: 4 octets | ||||
Length: 4 octets | Value: number of LSPs that can be received back to back | |||
Value: number of LSPs that can be received back-to-back. | ||||
4.2. LSP Transmission Interval sub-TLV | 4.2. LSP Transmission Interval Sub-TLV | |||
The LSP Transmission Interval sub-TLV advertises the minimum | The LSP Transmission Interval sub-TLV advertises the minimum | |||
interval, in micro-seconds, between LSPs arrivals which can be | interval, in microseconds, between LSPs arrivals that can be | |||
sustained on this receiving interface. | sustained on this receiving interface. | |||
Type: 2 | Type: 2 | |||
Length: 4 octets | ||||
Length: 4 octets | Value: minimum interval, in microseconds, between two consecutive | |||
LSPs received after LSP Burst Size LSPs have been received | ||||
Value: minimum interval, in micro-seconds, between two consecutive | ||||
LSPs received after LSP Burst Size LSPs have been received | ||||
The LSP Transmission Interval is an advertisement of the receiver's | The LSP Transmission Interval is an advertisement of the receiver's | |||
sustainable LSP reception rate. This rate may be safely used by a | sustainable LSP reception rate. This rate may be safely used by a | |||
sender which do not support the flow control or congestion algorithm. | sender that does not support the flow control or congestion | |||
It may also be used as the minimal safe rate by flow control or | algorithm. It may also be used as the minimal safe rate by flow | |||
congestion algorithms in unexpected cases, e.g., when the receiver is | control or congestion algorithms in unexpected cases, e.g., when the | |||
not acknowledging LSPs anymore. | receiver is not acknowledging LSPs anymore. | |||
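For a sender that does not implement flow or congestion control, the advertised LSP Burst Size and LSP Transmission Interval could be applied to one batch of LSPs as in the sketch below. The function name is illustrative, and the model is a simplification: it treats a single batch and does not address when a new burst becomes permissible, which this section does not specify.

```python
from typing import List


def send_offsets_us(lsp_count: int, burst_size: int, interval_us: int) -> List[int]:
    """Send time of each LSP, in microseconds from the start of the batch.

    The first burst_size LSPs go back to back; every later LSP is spaced
    interval_us after the previous one.
    """
    offsets = []
    for i in range(lsp_count):
        if i < burst_size:
            offsets.append(0)
        else:
            offsets.append((i - burst_size + 1) * interval_us)
    return offsets


# Hypothetical advertisement: burst of 2 LSPs, 1000 microseconds interval.
print(send_offsets_us(5, 2, 1000))
```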
4.3. LSPs Per PSNP sub-TLV | 4.3. LSPs per PSNP Sub-TLV | |||
The LSP per PSNP (LPP) sub-TLV advertises the number of received LSPs | The LSP per PSNP (LPP) sub-TLV advertises the number of received LSPs | |||
that triggers the immediate sending of a PSNP to acknowledge them. | that triggers the immediate sending of a PSNP to acknowledge them. | |||
Type: 3 | Type: 3 | |||
Length: 2 octets | ||||
Length: 2 octets | Value: number of LSPs acknowledged per PSNP | |||
Value: number of LSPs acknowledged per PSNP | ||||
A node advertising this sub-TLV with a value for LPP MUST send a PSNP | A node advertising this sub-TLV with a value for LPP MUST send a PSNP | |||
once LPP LSPs have been received and need to be acknowledged. | once LPP LSPs have been received and need to be acknowledged. | |||
4.4. Flags sub-TLV | 4.4. Flags Sub-TLV | |||
The sub-TLV Flags advertises a set of flags. | The sub-TLV Flags advertises a set of flags. | |||
Type: 4 | Type: 4 | |||
Length: Indicates the length in octets (1-8) of the Value field. | ||||
Length: Indicates the length in octets (1-8) of the Value field. The | The length SHOULD be the minimum required to send all bits | |||
length SHOULD be the minimum required to send all bits that are set. | that are set. | |||
Value: list of flags | ||||
Value: List of flags. | ||||
0 1 2 3 4 5 6 7 ... | 0 1 2 3 4 5 6 7 ... | |||
+-+-+-+-+-+-+-+-+... | +-+-+-+-+-+-+-+-+... | |||
|O| ... | |O| ... | |||
+-+-+-+-+-+-+-+-+... | +-+-+-+-+-+-+-+-+... | |||
An LSP receiver sets the O-flag to indicate to the LSP sender that it | An LSP receiver sets the O-flag (Ordered acknowledgment) to indicate | |||
will acknowledge the LSPs in the order as received. A PSNP | to the LSP sender that it will acknowledge the LSPs in the order as | |||
acknowledging N LSPs is acknowledging the N oldest LSPs received. | received. A PSNP acknowledging N LSPs is acknowledging the N oldest | |||
The order inside the PSNP is meaningless. If the sender keeps track | LSPs received. The order inside the PSNP is meaningless. If the | |||
of the order of LSPs sent, this indication allows a fast detection of | sender keeps track of the order of LSPs sent, this indication allows | |||
the loss of an LSP. This MUST NOT be used to alter the | for fast detection of the loss of an LSP. This MUST NOT be used to | |||
retransmission timer for any LSP. This MAY be used to trigger a | alter the retransmission timer for any LSP. This MAY be used to | |||
congestion signal. | trigger a congestion signal. | |||
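One way a sender might use the O-flag for loss detection is sketched below. It assumes the sender keeps its unacknowledged LSPs in transmission order and that LSPs are not reordered on the link; the function name and the use of plain strings as LSP identifiers are illustrative only.

```python
from typing import List, Set


def presumed_lost(sent_order: List[str], acked: Set[str]) -> List[str]:
    """Return unacknowledged LSPs sent before the newest acknowledged one.

    sent_order lists outstanding LSP IDs, oldest first. With the O-flag set,
    a PSNP acknowledges the oldest received LSPs, so a gap before the newest
    acknowledged LSP suggests that the skipped LSPs were not received.
    """
    acked_positions = [i for i, lsp in enumerate(sent_order) if lsp in acked]
    if not acked_positions:
        return []
    newest_acked = max(acked_positions)
    return [lsp for lsp in sent_order[:newest_acked] if lsp not in acked]


# "B" was sent before "C" but only "A" and "C" are acknowledged.
print(presumed_lost(["A", "B", "C", "D"], {"A", "C"}))
```

In line with the text above, the result would only feed a congestion signal; the retransmission timer of the affected LSPs is left unchanged.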
4.5. Partial SNP Interval sub-TLV | ||||
The Partial SNP Interval sub-TLV advertises the amount of time in | ||||
milliseconds between periodic action for transmission of Partial | ||||
Sequence Number PDUs. This time will trigger the sending of a PSNP | ||||
even if the number of unacknowledged LSPs received on a given | ||||
interface does not exceed LPP (Section 4.3). The time is measured | ||||
from the reception of the first unacknowledged LSP. | ||||
Type: 5 | 4.5. PSNP Interval Sub-TLV | |||
Length: 2 octets | The PSNP Interval sub-TLV advertises the amount of time in | |||
milliseconds between periodic action for transmission of PSNPs. This | ||||
time will trigger the sending of a PSNP even if the number of | ||||
unacknowledged LSPs received on a given interface does not exceed LPP | ||||
(Section 4.3). The time is measured from the reception of the first | ||||
unacknowledged LSP. | ||||
Value: partialSNPInterval in milliseconds | Type: 5 | |||
Length: 2 octets | ||||
Value: partialSNPInterval in milliseconds | ||||
A node advertising this sub-TLV SHOULD send a PSNP at least once per | A node advertising this sub-TLV SHOULD send a PSNP at least once per | |||
Partial SNP Interval if one or more unacknowledged LSPs have been | PSNP Interval if one or more unacknowledged LSPs have been received | |||
received on a given interface. | on a given interface. | |||
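The two acknowledgment triggers described so far, the LPP count (Section 4.3) and the PSNP interval measured from the first unacknowledged LSP, could be combined on the receiver as in this sketch. The class and method names are hypothetical, time is passed in explicitly to keep the example deterministic, and the values 3 and 100 are illustrative rather than recommended.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AckScheduler:
    """Decides when a PSNP is due on one interface (hypothetical helper)."""

    lpp: int                 # LSPs per PSNP advertised by this node
    psnp_interval_ms: int    # PSNP interval advertised by this node
    pending: int = 0
    first_pending_at_ms: Optional[int] = None

    def on_lsp_received(self, now_ms: int) -> bool:
        if self.pending == 0:
            # The interval runs from the first unacknowledged LSP.
            self.first_pending_at_ms = now_ms
        self.pending += 1
        return self._psnp_due(now_ms)

    def on_timer(self, now_ms: int) -> bool:
        return self._psnp_due(now_ms)

    def _psnp_due(self, now_ms: int) -> bool:
        if self.pending == 0:
            return False
        due = (self.pending >= self.lpp
               or now_ms - self.first_pending_at_ms >= self.psnp_interval_ms)
        if due:
            # The caller sends the PSNP; all pending LSPs count as acknowledged.
            self.pending = 0
            self.first_pending_at_ms = None
        return due


sched = AckScheduler(lpp=3, psnp_interval_ms=100)
print([sched.on_lsp_received(t) for t in (0, 10, 20)])
```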
4.6. Receive Window sub-TLV | 4.6. Receive Window Sub-TLV | |||
The Receive Window (RWIN) sub-TLV advertises the maximum number of | The Receive Window (RWIN) sub-TLV advertises the maximum number of | |||
unacknowledged LSPs that the node can receive for a given adjacency. | unacknowledged LSPs that the node can receive for a given adjacency. | |||
Type: 6 | Type: 6 | |||
Length: 2 octets | ||||
Length: 2 octets | Value: maximum number of unacknowledged LSPs | |||
Value: maximum number of unacknowledged LSPs | ||||
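The sub-TLV layouts in Sections 4.1 to 4.6 can be illustrated with a small encoder. This is a sketch under stated assumptions: it uses the usual IS-IS one-octet Type and one-octet Length fields, assumes network byte order for the integer values, and assumes the O-flag (bit 0 in the diagram of Section 4.4) is the most significant bit of the first octet. The function and constant names are illustrative.

```python
import struct
from typing import List

FLOODING_PARAMETERS_TLV = 21

LSP_BURST_SIZE = 1             # 4 octets
LSP_TRANSMISSION_INTERVAL = 2  # 4 octets, microseconds
LSPS_PER_PSNP = 3              # 2 octets
FLAGS = 4                      # 1 to 8 octets
PSNP_INTERVAL = 5              # 2 octets, milliseconds
RECEIVE_WINDOW = 6             # 2 octets

# Assumed big-endian (network order) unsigned integers.
_FORMATS = {
    LSP_BURST_SIZE: ">I",
    LSP_TRANSMISSION_INTERVAL: ">I",
    LSPS_PER_PSNP: ">H",
    PSNP_INTERVAL: ">H",
    RECEIVE_WINDOW: ">H",
}


def encode_sub_tlv(sub_type: int, value: int) -> bytes:
    body = struct.pack(_FORMATS[sub_type], value)
    return bytes([sub_type, len(body)]) + body


def encode_flags_sub_tlv(ordered_ack: bool) -> bytes:
    # One octet suffices while only the O-flag is defined (assumed bit 0x80).
    body = bytes([0x80 if ordered_ack else 0x00])
    return bytes([FLAGS, len(body)]) + body


def encode_flooding_parameters_tlv(sub_tlvs: List[bytes]) -> bytes:
    value = b"".join(sub_tlvs)
    if not value:
        raise ValueError("Value carries one or more sub-TLVs")
    if len(value) > 255:
        raise ValueError("Value does not fit a one-octet Length field")
    return bytes([FLOODING_PARAMETERS_TLV, len(value)]) + value


tlv = encode_flooding_parameters_tlv([
    encode_sub_tlv(LSP_BURST_SIZE, 20),   # hypothetical values
    encode_sub_tlv(RECEIVE_WINDOW, 60),
])
print(tlv.hex())
```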
4.7. Operation on a LAN interface | 4.7. Operation on a LAN Interface | |||
On a LAN interface, all LSPs are link-level multicasts. Each LSP | On a LAN interface, all LSPs are link-level multicasts. Each LSP | |||
sent will be received by all ISs on the LAN and each IS will receive | sent will be received by all ISs on the LAN, and each IS will receive | |||
LSPs from all transmitters. In this section, we clarify how the | LSPs from all transmitters. In this section, we clarify how the | |||
flooding parameters should be interpreted in the context of a LAN. | flooding parameters should be interpreted in the context of a LAN. | |||
An LSP receiver on a LAN will communicate its desired flooding | An LSP receiver on a LAN will communicate its desired flooding | |||
parameters using a single Flooding Parameters TLV, which will be | parameters using a single Flooding Parameters TLV, which will be | |||
received by all LSP transmitters. The flooding parameters sent by | received by all LSP transmitters. The flooding parameters sent by | |||
the LSP receiver MUST be understood as instructions from the LSP | the LSP receiver MUST be understood as instructions from the LSP | |||
receiver to each LSP transmitter about the desired maximum transmit | receiver to each LSP transmitter about the desired maximum transmit | |||
characteristics of each transmitter. The receiver is aware that | characteristics of each transmitter. The receiver is aware that | |||
there are multiple transmitters that can send LSPs to the receiver | there are multiple transmitters that can send LSPs to the receiver | |||
skipping to change at page 8, line 41 | skipping to change at line 353 | |||
advertising more conservative values, e.g., a higher LSP Transmission | advertising more conservative values, e.g., a higher LSP Transmission | |||
Interval. When the transmitters receive the LSP Transmission | Interval. When the transmitters receive the LSP Transmission | |||
Interval value advertised by an LSP receiver, the transmitters should | Interval value advertised by an LSP receiver, the transmitters should | |||
rate-limit LSPs according to the advertised flooding parameters. | rate-limit LSPs according to the advertised flooding parameters. | |||
They should not apply any further interpretation to the flooding | They should not apply any further interpretation to the flooding | |||
parameters advertised by the receiver. | parameters advertised by the receiver. | |||
A given LSP transmitter will receive multiple flooding parameter | A given LSP transmitter will receive multiple flooding parameter | |||
advertisements from different receivers that may include different | advertisements from different receivers that may include different | |||
flooding parameter values. A given transmitter SHOULD use the most | flooding parameter values. A given transmitter SHOULD use the most | |||
convervative value on a per-parameter basis. For example, if the | conservative value on a per-parameter basis. For example, if the | |||
transmitter receives multiple LSP Burst Size values, it should use | transmitter receives multiple LSP Burst Size values, it should use | |||
the smallest value. | the smallest value. | |||
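The per-parameter selection on a LAN could look like the sketch below. The text gives the smallest LSP Burst Size and a higher LSP Transmission Interval as the conservative choices; treating the smallest Receive Window as the conservative one is an assumption of this example. Parameters are keyed by sub-TLV type, and the function name is illustrative.

```python
from typing import Dict, List

LSP_BURST_SIZE = 1
LSP_TRANSMISSION_INTERVAL = 2
RECEIVE_WINDOW = 6

# Which direction counts as "most conservative" for each parameter.
_CONSERVATIVE = {
    LSP_BURST_SIZE: min,             # stated in the text
    LSP_TRANSMISSION_INTERVAL: max,  # a higher interval is more conservative
    RECEIVE_WINDOW: min,             # assumption made for this sketch
}


def most_conservative(advertisements: List[Dict[int, int]]) -> Dict[int, int]:
    """Combine the values advertised by all receivers on the LAN."""
    result = {}
    for sub_type, choose in _CONSERVATIVE.items():
        values = [adv[sub_type] for adv in advertisements if sub_type in adv]
        if values:
            result[sub_type] = choose(values)
    return result


# Three hypothetical receivers, each advertising a different subset.
print(most_conservative([
    {LSP_BURST_SIZE: 20, LSP_TRANSMISSION_INTERVAL: 1000},
    {LSP_BURST_SIZE: 10, LSP_TRANSMISSION_INTERVAL: 5000, RECEIVE_WINDOW: 60},
    {RECEIVE_WINDOW: 40},
]))
```

A parameter that no receiver has advertised is absent from the result, in which case the transmitter would fall back to its local default as described in Section 4.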
The Designated Intermediate System (DIS) plays a special role in the | The Designated Intermediate System (DIS) plays a special role in the | |||
operation of flooding on the LAN as it is responsible for responding | operation of flooding on the LAN as it is responsible for responding | |||
to PSNPs sent on the LAN circuit which are used to request LSPs that | to PSNPs sent on the LAN circuit that are used to request LSPs that | |||
the sender of the PSNP does not have. If the DIS does not support | the sender of the PSNP does not have. If the DIS does not support | |||
faster flooding, this will impact the maximum flooding speed which | faster flooding, this will impact the maximum flooding speed that | |||
could occur on a LAN. Use of LAN priority to prefer a node which | could occur on a LAN. Use of LAN priority to prefer a node that | |||
supports faster flooding in the DIS election may be useful. | supports faster flooding in the DIS election may be useful. | |||
NOTE: The focus of work used to develop the example algorithms | Note: The focus of work used to develop the example algorithms | |||
discussed later in this document focused on operation over point-to- | discussed later in this document focused on operation over point-to- | |||
point interfaces. A full discussion of how best to do faster | point interfaces. A full discussion of how best to do faster | |||
flooding on a LAN interface is therefore out of scope for this | flooding on a LAN interface is therefore out of scope for this | |||
document. | document. | |||
5. Performance improvement on the receiver | 5. Performance Improvement on the Receiver | |||
This section defines two behaviors that SHOULD be implemented on the | This section defines two behaviors that SHOULD be implemented on the | |||
receiver. | receiver. | |||
5.1. Rate of LSP Acknowledgments | 5.1. Rate of LSP Acknowledgments | |||
On point-to-point networks, PSNPs provide acknowledgments for | On point-to-point networks, PSNPs provide acknowledgments for | |||
received LSPs. [ISO10589] suggests that some delay be used when | received LSPs. [ISO10589] suggests using some delay when sending | |||
sending PSNPs. This provides some optimization as multiple LSPs can | PSNPs. This provides some optimization as multiple LSPs can be | |||
be acknowledged by a single PSNP. | acknowledged by a single PSNP. | |||
Faster LSP flooding benefits from a faster feedback loop. This | Faster LSP flooding benefits from a faster feedback loop. This | |||
requires a reduction in the delay in sending PSNPs. | requires a reduction in the delay in sending PSNPs. | |||
For the generation of PSNPs, the receiver SHOULD use a | For the generation of PSNPs, the receiver SHOULD use a | |||
partialSNPInterval smaller than the one defined in [ISO10589]. The | partialSNPInterval smaller than the one defined in [ISO10589]. The | |||
choice of this lower value is a local choice. It may depend on the | choice of this lower value is a local choice. It may depend on the | |||
available processing power of the node, the number of adjacencies, | available processing power of the node, the number of adjacencies, | |||
and the requirement to synchronize the LSDB more quickly. 200 ms | and the requirement to synchronize the LSDB more quickly. 200 ms | |||
seems to be a reasonable value. | seems to be a reasonable value. | |||
In addition to the timer-based partialSNPInterval, the receiver | In addition to the timer-based partialSNPInterval, the receiver | |||
SHOULD keep track of the number of unacknowledged LSPs per circuit | SHOULD keep track of the number of unacknowledged LSPs per circuit | |||
and level. When this number exceeds a preset threshold of LSPs Per | and level. When this number exceeds a preset threshold of LSPs per | |||
PSNP (LPP), the receiver SHOULD immediately send a PSNP without | PSNP (LPP), the receiver SHOULD immediately send a PSNP without | |||
waiting for the PSNP timer to expire. In the case of a burst of | waiting for the PSNP timer to expire. In the case of a burst of | |||
LSPs, this allows for more frequent PSNPs, giving faster feedback to | LSPs, this allows more frequent PSNPs, giving faster feedback to the | |||
the sender. Outside of the burst case, the usual time-based PSNP | sender. Outside of the burst case, the usual time-based PSNP | |||
approach comes into effect. | approach comes into effect. | |||
The smaller the LPP, the faster the feedback to the sender and | The smaller the LPP is, the faster the feedback to the sender and | |||
possibly the higher the rate if the rate is limited by the end to end | possibly the higher the rate if the rate is limited by the end-to-end | |||
RTT (link RTT + time to acknowledge). This may result in an increase | RTT (link RTT + time to acknowledge). This may result in an increase | |||
in the number of PSNPs sent which may increase CPU and IO load on | in the number of PSNPs sent, which may increase CPU and IO load on | |||
both the sender and receiver. The LPP should be less than or equal | both the sender and receiver. The LPP should be less than or equal | |||
to 90 as this is the maximum number of LSPs that can be acknowledged | to 90 as this is the maximum number of LSPs that can be acknowledged | |||
in a PSNP at common MTU sizes, hence waiting longer would not reduce | in a PSNP at common MTU sizes; hence, waiting longer would not reduce | |||
the number of PSNPs sent but would delay the acknowledgements. LPP | the number of PSNPs sent but would delay the acknowledgments. LPP | |||
should not be chosen too high as the congestion control starts with a | should not be chosen too high as the congestion control starts with a | |||
congestion window of LPP+1. Based on experimental evidence, 15 | congestion window of LPP + 1. Based on experimental evidence, 15 | |||
unacknowledged LSPs is a good value assuming that the Receive Window | unacknowledged LSPs is a good value, assuming that the Receive Window | |||
is at least 30. More frequent PSNPs gives the transmitter more | is at least 30. More frequent PSNPs give the transmitter more | |||
feedback on receiver progress, allowing the transmitter to continue | feedback on receiver progress, allowing the transmitter to continue | |||
transmitting while not burdening the receiver with undue overhead. | transmitting while not burdening the receiver with undue overhead. | |||
By deploying both the time-based and the threshold-based PSNP | By deploying both the time-based and the threshold-based PSNP | |||
approaches, the receiver can be adaptive to both LSP bursts and | approaches, the receiver can be adaptive to both LSP bursts and | |||
infrequent LSP updates. | infrequent LSP updates. | |||
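The combination of the two triggers can be sketched as follows. This is a minimal Python illustration, not an implementation taken from the document: the class and method names are hypothetical, time is passed in explicitly as seconds, and the sketch sends a PSNP as soon as the number of unacknowledged LSPs reaches LPP, which is one possible reading of the threshold rule. One instance would be kept per circuit and level.

```python
class PsnpScheduler:
    """Decides when to send a PSNP: on a timer or on an LPP threshold."""

    def __init__(self, lpp=15, partial_snp_interval=0.2):
        self.lpp = lpp                        # LSPs per PSNP threshold
        self.interval = partial_snp_interval  # seconds (200 ms suggested above)
        self.pending = []                     # LSP IDs received but not yet acked
        self.deadline = None                  # time at which the timer fires

    def on_lsp(self, lsp_id, now):
        """Record a received LSP; return the LSP IDs to acknowledge now."""
        self.pending.append(lsp_id)
        if self.deadline is None:
            self.deadline = now + self.interval   # start the PSNP timer
        if len(self.pending) >= self.lpp:
            return self._flush()                  # threshold-based PSNP
        return []

    def on_tick(self, now):
        """Called periodically; return the LSP IDs to acknowledge now."""
        if self.deadline is not None and now >= self.deadline:
            return self._flush()                  # time-based PSNP
        return []

    def _flush(self):
        acked, self.pending, self.deadline = self.pending, [], None
        return acked
```

During a burst the threshold fires well before the timer, while an isolated LSP is still acknowledged after at most one interval.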
As PSNPs also consume link bandwidth, packet-queue space, and | As PSNPs also consume link bandwidth, packet-queue space, and | |||
protocol-processing time on receipt, the increased sending of PSNPs | protocol-processing time on receipt, the increased sending of PSNPs | |||
should be taken into account when considering the rate at which LSPs | should be taken into account when considering the rate at which LSPs | |||
skipping to change at page 10, line 29 ¶ | skipping to change at line 438 ¶ | |||
There are three classes of PDUs sent by IS-IS: | There are three classes of PDUs sent by IS-IS: | |||
* Hellos | * Hellos | |||
* LSPs | * LSPs | |||
* Complete Sequence Number PDUs (CSNPs) and PSNPs | * Complete Sequence Number PDUs (CSNPs) and PSNPs | |||
Implementations today may prioritize the reception of Hellos over | Implementations today may prioritize the reception of Hellos over | |||
LSPs and Sequence Number PDUs (SNPs) in order to prevent a burst of | LSPs and Sequence Number PDUs (SNPs) in order to prevent a burst of | |||
LSP updates from triggering an adjacency timeout which in turn would | LSP updates from triggering an adjacency timeout, which in turn would | |||
require additional LSPs to be updated. | require additional LSPs to be updated. | |||
CSNPs and PSNPs serve to trigger or acknowledge the transmission of | CSNPs and PSNPs serve to trigger or acknowledge the transmission of | |||
specified LSPs. On a point-to-point link, PSNPs acknowledge the | specified LSPs. On a point-to-point link, PSNPs acknowledge the | |||
receipt of one or more LSPs. For this reason, [ISO10589] specifies a | receipt of one or more LSPs. For this reason, [ISO10589] specifies a | |||
delay (partialSNPInterval) before sending a PSNP so that the number | delay (partialSNPInterval) before sending a PSNP so that the number | |||
of PSNPs required to be sent is reduced. On receipt of a PSNP, the | of PSNPs required to be sent is reduced. On receipt of a PSNP, the | |||
set of LSPs acknowledged by that PSNP can be marked so that they do | set of LSPs acknowledged by that PSNP can be marked so that they do | |||
not need to be retransmitted. | not need to be retransmitted. | |||
If a PSNP is dropped on reception, the set of LSPs advertised in the | If a PSNP is dropped on reception, the set of LSPs advertised in the | |||
PSNP cannot be marked as acknowledged and this results in needless | PSNP cannot be marked as acknowledged, and this results in needless | |||
retransmissions that will further delay transmission of other LSPs | retransmissions that further delay transmission of other LSPs that | |||
that are yet to be transmitted. It may also make it more likely that | are yet to be transmitted. It may also make it more likely that a | |||
a receiver becomes overwhelmed by LSP transmissions. | receiver becomes overwhelmed by LSP transmissions. | |||
Therefore implementations SHOULD prioritize IS-IS PDUs on the way | Therefore, implementations SHOULD prioritize IS-IS PDUs on the way | |||
from the incoming interface to the IS-IS process. The relative | from the incoming interface to the IS-IS process. The relative | |||
priority of packets in decreasing order SHOULD be: Hellos, SNPs, | priority of packets in decreasing order SHOULD be: Hellos, SNPs, and | |||
LSPs. Implementations MAY also prioritize IS-IS packets over other | LSPs. Implementations MAY also prioritize IS-IS packets over other | |||
protocols which are less critical for the router or network, less | protocols, which are less critical for the router or network, less | |||
sensitive to delay or more bursty (e.g., BGP). | sensitive to delay, or more bursty (e.g., BGP). | |||
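One way to picture the suggested ordering is a priority queue between the incoming interface and the IS-IS process. The Python sketch below is only illustrative: the queue class, the PDU kind labels, and the numeric priorities are assumptions, and real implementations may do this classification in hardware or in the kernel.

```python
import heapq
from itertools import count

# Lower number means served first: Hellos, then SNPs (CSNPs/PSNPs), then LSPs.
PRIORITY = {"hello": 0, "snp": 1, "lsp": 2}


class PduQueue:
    """Input queue that serves IS-IS PDUs by class, FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._seq = count()   # tie-breaker that preserves arrival order

    def push(self, kind, pdu):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, pdu))

    def pop(self):
        _, _, kind, pdu = heapq.heappop(self._heap)
        return kind, pdu

    def __len__(self):
        return len(self._heap)
```

The sequence counter keeps PDUs of the same class in arrival order, so LSPs are not reordered relative to each other.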
6. Congestion and Flow Control | 6. Congestion and Flow Control | |||
6.1. Overview | 6.1. Overview | |||
Ensuring the goodput between two entities is a layer-4 responsibility | Ensuring the goodput between two entities is a Layer 4 responsibility | |||
as per the OSI model. A typical example is the TCP protocol defined | as per the OSI model. A typical example is the TCP protocol defined | |||
in [RFC9293] that provides flow control, congestion control, and | in [RFC9293] that provides flow control, congestion control, and | |||
reliability. | reliability. | |||
Flow control creates a control loop between a transmitter and a | Flow control creates a control loop between a transmitter and a | |||
receiver so that the transmitter does not overwhelm the receiver. | receiver so that the transmitter does not overwhelm the receiver. | |||
TCP provides a means for the receiver to govern the amount of data | TCP provides a means for the receiver to govern the amount of data | |||
sent by the sender through the use of a sliding window. | sent by the sender through the use of a sliding window. | |||
Congestion control prevents the set of transmitters from overwhelming | Congestion control prevents the set of transmitters from overwhelming | |||
the path of the packets between two IS-IS implementations. This path | the path of the packets between two IS-IS implementations. This path | |||
typically includes a point-to-point link between two IS-IS neighbors | typically includes a point-to-point link between two IS-IS neighbors, | |||
which is usually over-sized compared to the capability of the IS-IS | which is usually oversized compared to the capability of the IS-IS | |||
speakers, but potentially some internal elements inside each neighbor | speakers, but potentially also includes some internal elements inside | |||
such as switching fabric, line card CPU, and forwarding plane buffers | each neighbor such as switching fabric, line card CPU, and forwarding | |||
that may experience congestion. These resources may be shared across | plane buffers that may experience congestion. These resources may be | |||
multiple IS-IS adjacencies for the system and it is the | shared across multiple IS-IS adjacencies for the system, and it is | |||
responsibility of congestion control to ensure that these are shared | the responsibility of congestion control to ensure that these are | |||
reasonably. | shared reasonably. | |||
Reliability provides loss detection and recovery. IS-IS already has | Reliability provides loss detection and recovery. IS-IS already has | |||
mechanisms to ensure the reliable transmission of LSPs. This is not | mechanisms to ensure the reliable transmission of LSPs. This is not | |||
changed by this document. | changed by this document. | |||
The following two sections provide two Flow and/or Congestion control | Sections 6.2 and 6.3 provide two flow and/or congestion control | |||
algorithms that may be implemented by taking advantage of the | algorithms that may be implemented by taking advantage of the | |||
extensions defined in this document. The signal that these IS-IS | extensions defined in this document. The signal that these IS-IS | |||
extensions defined in Section 4 and Section 5 provide are generic and | extensions (defined in Sections 4 and 5) provide is generic and is | |||
are designed to support different sender-side algorithms. A sender | designed to support different sender-side algorithms. A sender can | |||
can unilaterally choose a different algorithm to use. | unilaterally choose a different algorithm to use. | |||
6.2. Congestion and Flow Control algorithm | 6.2. Congestion and Flow Control Algorithm | |||
6.2.1. Flow control | 6.2.1. Flow Control | |||
A flow control mechanism creates a control loop between a single | A flow control mechanism creates a control loop between a single | |||
instance of a transmitter and a single receiver. This section uses a | transmitter and a single receiver. This section uses a mechanism | |||
mechanism similar to the TCP receive window to allow the receiver to | similar to the TCP receive window to allow the receiver to govern the | |||
govern the amount of data sent by the sender. This receive window | amount of data sent by the sender. This receive window (RWIN) | |||
('rwin') indicates an allowed number of LSPs that the sender may | indicates an allowed number of LSPs that the sender may transmit | |||
transmit before waiting for an acknowledgment. The size of the | before waiting for an acknowledgment. The size of the receive | |||
receive window, in units of LSPs, is initialized with the value | window, in units of LSPs, is initialized with the value advertised by | |||
advertised by the receiver in the Receive Window sub-TLV. If no | the receiver in the Receive Window sub-TLV. If no value is | |||
value is advertised, the transmitter should initialize rwin with its | advertised, the transmitter should initialize RWIN with its locally | |||
locally configured value for this neighbor. | configured value for the associated neighbor. | |||
When the transmitter sends a set of LSPs to the receiver, it | When the transmitter sends a set of LSPs to the receiver, it | |||
subtracts the number of LSPs sent from rwin. If the transmitter | subtracts the number of LSPs sent from RWIN. If the transmitter | |||
receives a PSNP, then rwin is incremented for each acknowledged LSP. | receives a PSNP, then RWIN is incremented for each acknowledged LSP. | |||
The transmitter must ensure that the value of rwin never goes | The transmitter must ensure that the value of RWIN never goes | |||
negative. | negative. | |||
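The accounting described above can be sketched in a few lines of Python. The class name, the method names, and the default of 50 are illustrative assumptions. Capping the window at its initial size when acknowledgments arrive is also an assumption of this sketch (it guards against counting duplicate acknowledgments) and is not stated in the text.

```python
class ReceiveWindow:
    """Sender-side bookkeeping of the receiver's window, in units of LSPs."""

    def __init__(self, advertised=None, local_default=50):
        # Use the advertised Receive Window, else the locally configured value.
        self.size = advertised if advertised is not None else local_default
        self.rwin = self.size

    def on_send(self, n_lsps):
        """Account for n_lsps transmitted; RWIN must never go negative."""
        if n_lsps > self.rwin:
            raise ValueError("would exceed the receive window")
        self.rwin -= n_lsps

    def on_psnp(self, n_acked):
        """Account for LSPs acknowledged by a received PSNP."""
        self.rwin = min(self.size, self.rwin + n_acked)
```

A transmitter would check `rwin` before each batch and send at most that many LSPs.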
The RWIN value is of importance when the RTT is the limiting factor | The RWIN value is of importance when the RTT is the limiting factor | |||
for the throughput. In this case the optimal size is the desired LSP | for the throughput. In this case, the optimal size is the desired | |||
rate multiplied by the RTT. The RTT being the addition of the link | LSP rate multiplied by the RTT. The RTT is the addition of the link | |||
RTT plus the time taken by the receiver to acknowledge the first | RTT plus the time taken by the receiver to acknowledge the first | |||
received LSP in its PSNP. 50 or 100 may be reasonable default | received LSP in its PSNP. The values 50 or 100 may be reasonable | |||
numbers. As an example, a RWIN of 100 requires a control plane input | default numbers for RWIN. As an example, an RWIN of 100 requires a | |||
buffer of 150 kbytes per neighbor assuming an IS-IS MTU of 1500 | control plane input buffer of 150 kbytes per neighbor (assuming an | |||
octets and limits the throughput to 10000 LSPs per second and per | IS-IS MTU of 1500 octets) and limits the throughput to 10000 LSPs per | |||
neighbor for a link RTT of 10 ms. With the same RWIN, the throughput | second and per neighbor for a link RTT of 10 ms. With the same RWIN, | |||
limitation is 2000 LSP per second when the RTT is 50ms. That's the | the throughput limitation is 2000 LSPs per second when the RTT is 50 | |||
maximum throughput assuming no other limitations such as CPU | ms. That's the maximum throughput assuming no other limitations such | |||
limitations. | as CPU limitations. | |||
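The figures in the example above follow from two simple relations: the input buffer is RWIN times the MTU, and the window-limited rate is RWIN divided by the RTT. The short Python sketch below reproduces them. The function names are hypothetical, and the RTT is taken in milliseconds purely to keep the arithmetic exact.

```python
def input_buffer_bytes(rwin, mtu_octets):
    """Control plane input buffer needed to hold a full window of LSPs."""
    return rwin * mtu_octets


def max_lsp_rate(rwin, rtt_ms):
    """Upper bound on LSPs per second when the window is the only limit."""
    return rwin * 1000 / rtt_ms


print(input_buffer_bytes(100, 1500))  # 150000 octets, i.e., 150 kbytes
print(max_lsp_rate(100, 10))          # 10000.0 LSPs per second
print(max_lsp_rate(100, 50))          # 2000.0 LSPs per second
```

Read the other way, the same relation gives the window needed for a target rate: RWIN = rate x RTT.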
Equally RTT is of importance for the performance. That is why the | Equally, RTT is of importance for the performance. That is why the | |||
performance improvements on the receiver specified in section | performance improvements on the receiver specified in Section 5 are | |||
Section 5 are important to achieve good throughput. If the receiver | important to achieve good throughput. If the receiver does not | |||
does not support those performance improvements, in the worst case | support those performance improvements, in the worst case (small RWIN | |||
(small RWIN and high RTT) the throughput will be limited by the LSP | and high RTT) the throughput will be limited by the LSP Transmission | |||
Transmission Interval as defined in section Section 4.2. | Interval as defined in Section 4.2. | |||
6.2.1.1. Operation on a point to point interface | 6.2.1.1. Operation on a Point-to-Point Interface | |||
By sending the Receive Window sub-TLV, a node advertises to its | By sending the Receive Window sub-TLV, a node advertises to its | |||
neighbor its ability to receive that many un-acknowledged LSPs from | neighbor its ability to receive that many unacknowledged LSPs from | |||
the neighbor. This is akin to a receive window or sliding window in | the neighbor. This is akin to a receive window or sliding window in | |||
flow control. In some implementations, this value should reflect the | flow control. In some implementations, this value should reflect the | |||
IS-IS socket buffer size. Special care must be taken to leave space | IS-IS socket buffer size. Special care must be taken to leave space | |||
for CSNPs and PSNPs and IIHs if they share the same input queue. In | for CSNPs, PSNPs, and IIHs if they share the same input queue. In | |||
this case, this document suggests advertising an LSP Receive Window | this case, this document suggests advertising an LSP Receive Window | |||
corresponding to half the size of the IS-IS input queue. | corresponding to half the size of the IS-IS input queue. | |||
By advertising an LSP Transmission Interval sub-TLV, a node | By advertising an LSP Transmission Interval sub-TLV, a node | |||
advertises its ability to receive LSPs separated by at least the | advertises its ability to receive LSPs separated by at least the | |||
advertised value, outside of LSP bursts. | advertised value, outside of LSP bursts. | |||
By advertising an LSP Burst Size sub-TLV, a node advertises its | By advertising an LSP Burst Size sub-TLV, a node advertises its | |||
ability to receive that number of LSPs back-to-back. | ability to receive that number of LSPs back to back. | |||
The LSP transmitter MUST NOT exceed these parameters. After having | The LSP transmitter MUST NOT exceed these parameters. After having | |||
sent a full burst of LSPs, it MUST send the subsequent LSPs with a | sent a full burst of LSPs, it MUST send the subsequent LSPs with a | |||
minimum of LSP Transmission Interval between LSP transmissions. For | minimum of LSP Transmission Interval between LSP transmissions. For | |||
CPU scheduling reasons, this rate MAY be averaged over a small | CPU scheduling reasons, this rate MAY be averaged over a small | |||
period, e.g., 10-30ms. | period, e.g., 10-30 ms. | |||
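As a rough illustration of these limits, the Python sketch below computes the earliest transmit time of each LSP in a backlog: the first LSP Burst Size LSPs go out back to back, and each later one waits one LSP Transmission Interval after the previous one. The function name and the use of an abstract integer time unit are assumptions. The sketch covers a single burst only and does not model how burst credit is regained after an idle period.

```python
def earliest_send_times(n_lsps, burst_size, tx_interval, start=0):
    """Return the earliest allowed transmit time for each of n_lsps LSPs.

    Times use the same unit as tx_interval (the caller's choice).
    """
    times = []
    for i in range(n_lsps):
        if i < burst_size:
            times.append(start)                   # part of the initial burst
        else:
            # Paced: one interval after the previous transmission.
            times.append(start + (i - burst_size + 1) * tx_interval)
    return times
```

For example, `earliest_send_times(5, 3, 100)` allows three LSPs at time 0 and the remaining two at times 100 and 200.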
If either the LSP transmitter or receiver does not adhere to these | If either the LSP transmitter or receiver does not adhere to these | |||
parameters, for example because of transient conditions, this doesn't | parameters, for example, because of transient conditions, this | |||
result in a fatal condition for IS-IS operation. In the worst case, | doesn't result in a fatal condition for IS-IS operation. In the | |||
an LSP is lost at the receiver and this situation is already remedied | worst case, an LSP is lost at the receiver, and this situation is | |||
by mechanisms in [ISO10589]. After a few seconds, neighbors will | already remedied by mechanisms in [ISO10589]. After a few seconds, | |||
exchange PSNPs (for point-to-point interfaces) or CSNPs (for | neighbors will exchange PSNPs (for point-to-point interfaces) or | |||
broadcast interfaces) and recover from the lost LSPs. This worst | CSNPs (for broadcast interfaces) and recover from the lost LSPs. | |||
case should be avoided as those additional seconds impact convergence | This worst case should be avoided as those additional seconds impact | |||
time since the LSDB is not fully synchronized. Hence it is better to | convergence time since the LSDB is not fully synchronized. Hence, it | |||
err on the conservative side and to under-run the receiver rather | is better to err on the conservative side and to under-run the | |||
than over-run it. | receiver rather than over-run it. | |||
6.2.1.2. Operation on a broadcast LAN interface | 6.2.1.2. Operation on a Broadcast LAN Interface | |||
Flow and congestion control on a LAN interface is out of scope for | Flow and congestion control on a LAN interface is out of scope for | |||
this document. | this document. | |||
6.2.2. Congestion Control | 6.2.2. Congestion Control | |||
Whereas flow control prevents the sender from overwhelming the | Whereas flow control prevents the sender from overwhelming the | |||
receiver, congestion control prevents senders from overwhelming the | receiver, congestion control prevents senders from overwhelming the | |||
network. For an IS-IS adjacency, the network between two IS-IS | network. For an IS-IS adjacency, the network between two IS-IS | |||
neighbors is relatively limited in scope and includes a single link | neighbors is relatively limited in scope and includes a single link | |||
which is typically over-sized compared to the capability of the IS-IS | that is typically oversized compared to the capability of the IS-IS | |||
speakers. In situations where the probability of LSP drop is low, | speakers. In situations where the probability of LSP drop is low, | |||
flow control Section 6.2.1 is expected to give good results, without | flow control (Section 6.2.1) is expected to give good results, | |||
the need to implement congestion control. Otherwise, adding | without the need to implement congestion control. Otherwise, adding | |||
congestion control will help handling congestion of LSPs in the | congestion control will help handling congestion of LSPs in the | |||
receiver. | receiver. | |||
This section describes one sender-side congestion control algorithm | This section describes one sender-side congestion control algorithm | |||
largely inspired by the TCP congestion control algorithm [RFC5681]. | largely inspired by the TCP congestion control algorithm [RFC5681]. | |||
The proposed algorithm uses a variable congestion window 'cwin'. It | The proposed algorithm uses a variable congestion window 'cwin'. It | |||
plays a role similar to the receive window described above. The main | plays a role similar to the receive window described above. The main | |||
difference is that cwin is adjusted dynamically according to various | difference is that cwin is adjusted dynamically according to various | |||
events described below. | events described below. | |||
6.2.2.1. Core algorithm | 6.2.2.1. Core Algorithm | |||
In its simplest form, the congestion control algorithm looks like the | In its simplest form, the congestion control algorithm looks like the | |||
following: | following: | |||
+---------------+ | +---------------+ | |||
| | | | | | |||
| v | | v | |||
| +----------------------+ | | +----------------------+ | |||
| | Congestion avoidance | | | | Congestion avoidance | | |||
| + ---------------------+ | | + ---------------------+ | |||
| | | | | | |||
| | Congestion signal | | | Congestion signal | |||
----------------+ | ----------------+ | |||
Figure 1 | Figure 1 | |||
The algorithm starts with cwin = cwin0 = LPP + 1. In the congestion | The algorithm starts with cwin = cwin0 = LPP + 1. In the congestion | |||
avoidance phase, cwin increases as LSPs are acked: for every acked | avoidance phase, cwin increases as LSPs are acked: for every acked | |||
LSP, cwin += 1 / cwin without exceeding RWIN. When LSPs are | LSP, cwin += 1 / cwin without exceeding RWIN. When LSPs are | |||
exchanged, cwin LSPs will be acknowledged in 1 RTT, meaning cwin(t) = | exchanged, cwin LSPs will be acknowledged in 1 RTT, meaning cwin(t) = | |||
t/RTT + cwin0. Since the RTT is low in many IS-IS deployments, the | t/RTT + cwin0. Since the RTT is low in many IS-IS deployments, the | |||
sending rate can reach fast rates in short periods of time. | sending rate can reach fast rates in short periods of time. | |||
When updating cwin, it must not become higher than the number of LSPs | When updating cwin, it must not become higher than the number of LSPs | |||
waiting to be sent, otherwise the sending will not be paced by the | waiting to be sent, otherwise the sending will not be paced by the | |||
receiving of acks. Said differently, tx pressure is needed to | receiving of acks. Said differently, transmission pressure is needed | |||
maintain and increase cwin. | to maintain and increase cwin. | |||
When the congestion signal is triggered, cwin is set back to its | When the congestion signal is triggered, cwin is set back to its | |||
initial value and the congestion avoidance phase starts again. | initial value, and the congestion avoidance phase starts again. | |||
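The core algorithm can be sketched as follows in Python. This is a hedged illustration rather than a reference implementation: the class and parameter names are hypothetical, and the `backlog` argument (the number of LSPs waiting to be sent) is this sketch's way of expressing the rule that cwin must not grow beyond what the sender actually has to send.

```python
class CongestionWindow:
    """Congestion avoidance only: additive increase, reset on congestion."""

    def __init__(self, lpp, rwin):
        self.cwin0 = lpp + 1          # initial window, as stated above
        self.rwin = rwin              # upper bound from flow control
        self.cwin = float(self.cwin0)

    def on_ack(self, n_acked, backlog):
        """Grow cwin by 1/cwin per acknowledged LSP."""
        for _ in range(n_acked):
            # Never exceed RWIN; never grow past the number of LSPs waiting
            # to be sent (but do not shrink cwin because the backlog is small).
            ceiling = min(self.rwin, max(backlog, self.cwin))
            self.cwin = min(self.cwin + 1.0 / self.cwin, ceiling)

    def on_congestion(self):
        """Congestion signal: restart from the initial window."""
        self.cwin = float(self.cwin0)
```

With LPP = 15, acknowledging one full window of 16 LSPs raises cwin from 16 to just under 17, which matches the growth of roughly one LSP per RTT described above.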
6.2.2.2. Congestion signals | 6.2.2.2. Congestion Signals | |||
The congestion signal can take various forms. The more reactive the | The congestion signal can take various forms. The more reactive the | |||
congestion signals, the fewer LSPs will be lost due to congestion. | congestion signals, the fewer LSPs will be lost due to congestion. | |||
However, overly aggressive congestion signals will cause a sender to | However, overly aggressive congestion signals will cause a sender to | |||
keep a very low sending rate even without actual congestion on the | keep a very low sending rate even without actual congestion on the | |||
path. | path. | |||
Two practical signals are given below. | Two practical signals are given below. | |||
Delay: When receiving acknowledgements, a sender estimates the | 1. Delay: When receiving acknowledgments, a sender estimates the | |||
acknowledgement time of the receiver. Based on this estimation, it | acknowledgment time of the receiver. Based on this estimation, | |||
can infer that a packet was lost, and infer congestion on the path. | it can infer that a packet was lost and that the path is | |||
congested. | ||||
There can be a timer per LSP, but this can become costly for | There can be a timer per LSP, but this can become costly for | |||
implementations. It is possible to use only a single timer t1 for | implementations. It is possible to use only a single timer t1 | |||
all LSPs: during t1, sent LSPs are recorded in a list list_1. Once | for all LSPs: during t1, sent LSPs are recorded in a list list_1. | |||
the RTT is over, list_1 is kept and another list list_2 is used to | Once the RTT is over, list_1 is kept and another list, list_2, is | |||
store the next LSPs. LSPs are removed from the lists when acked. At | used to store the next LSPs. LSPs are removed from the lists | |||
the end of the second t1 period, every LSP in list_1 should have been | when acked. At the end of the second t1 period, every LSP in | |||
acked, so list_1 is checked to be empty. list_1 can then be reused | list_1 should have been acked, so list_1 is checked to be empty. | |||
for the next RTT. | list_1 can then be reused for the next RTT. | |||
There are multiple strategies to set the timeout value t1. It should | There are multiple strategies to set the timeout value t1. It | |||
be based on measurements of the maximum acknowledgement time (MAT) of | should be based on measurements of the maximum acknowledgment | |||
each PSNP. The simplest one is to use three times the RTT. | time (MAT) of each PSNP. Using three times the RTT is the | |||
Alternatively an exponential moving average of the MATs, like | simplest strategy; alternatively, an exponential moving average | |||
[RFC6298]. A more elaborate one is to take a running maximum of the | of the MATs, as described in [RFC6298], can be used. A more | |||
MATs over a period of a few seconds. This value should include a | elaborate one is to take a running maximum of the MATs over a | |||
margin of error to avoid false positives (e.g., estimated MAT measure | period of a few seconds. This value should include a margin of | |||
variance) which would have a significant impact on performance. | error to avoid false positives (e.g., estimated MAT measure | |||
variance), which would have a significant impact on performance. | ||||
Loss: if the receiver has signaled the O-flag (Ordered | 2. Loss: if the receiver has signaled the O-flag (see Section 4.4), | |||
acknowledgement) Section 4.4, a sender MAY record its sending order | a sender MAY record its sending order and check that | |||
and check that acknowledgements arrive in the same order. If not, | acknowledgments arrive in the same order. If not, some LSPs are | |||
some LSPs are missing and this MAY be used to trigger a congestion | missing, and this MAY be used to trigger a congestion signal. | |||
signal. | ||||
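Both signals can be sketched with small bookkeeping structures. The Python code below is illustrative only; the class names are hypothetical, LSPs are identified by opaque IDs, and retransmission of unacknowledged LSPs is assumed to be handled elsewhere by the existing IS-IS mechanisms. The first class follows the single-timer, two-list scheme for the delay signal. The second checks acknowledgment order and applies only when the receiver has signaled the O-flag.

```python
from collections import deque


class TwoListDelayDetector:
    """Delay signal using one timer t1 and two lists of sent LSPs."""

    def __init__(self):
        self.current = set()    # LSPs sent during the current t1 period
        self.previous = set()   # LSPs sent during the previous t1 period

    def on_send(self, lsp_id):
        self.current.add(lsp_id)

    def on_ack(self, lsp_id):
        self.current.discard(lsp_id)
        self.previous.discard(lsp_id)

    def on_t1_expiry(self):
        """Return True if an LSP from the older list is still unacked."""
        congested = bool(self.previous)
        # Rotate: the current list becomes the older one; reuse a fresh list.
        self.previous, self.current = self.current, set()
        return congested


class OrderedAckChecker:
    """Loss signal: acknowledgments should arrive in sending order."""

    def __init__(self):
        self.sent = deque()     # LSP IDs in the order they were sent

    def on_send(self, lsp_id):
        self.sent.append(lsp_id)

    def on_ack(self, lsp_id):
        """Return how many earlier LSPs were skipped by this ack."""
        if lsp_id not in self.sent:
            return 0            # unknown or already handled
        skipped = 0
        while self.sent[0] != lsp_id:
            self.sent.popleft() # sent earlier but not acked: presumed lost
            skipped += 1
        self.sent.popleft()
        return skipped
```

A sender could trigger the congestion signal when `on_t1_expiry()` returns True or when `on_ack()` returns a non-zero count.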
6.2.2.3. Refinement | 6.2.2.3. Refinement | |||
With the algorithm presented above, if congestion is detected, cwin | With the algorithm presented above, if congestion is detected, cwin | |||
goes back to its initial value, and does not use the information | goes back to its initial value and does not use the information | |||
gathered in previous congestion avoidance phases. | gathered in previous congestion avoidance phases. | |||
It is possible to use a fast recovery phase once congestion is | It is possible to use a fast recovery phase once congestion is | |||
detected, to avoid going through this linear rate of growth from | detected and to avoid going through this linear rate of growth from | |||
scratch. When congestion is detected, a fast recovery threshold | scratch. When congestion is detected, a fast recovery threshold | |||
frthresh is set to frthresh = cwin / 2. In this fast recovery phase, | frthresh is set to frthresh = cwin / 2. In this fast recovery phase, | |||
for every acked LSP, cwin += 1. Once cwin reaches frthresh, the | for every acked LSP, cwin += 1. Once cwin reaches frthresh, the | |||
algorithm goes back to the congestion avoidance phase. | algorithm goes back to the congestion avoidance phase. | |||
+---------------+ | +---------------+ | |||
| | | | | | |||
| v | | v | |||
| +----------------------+ | | +----------------------+ | |||
| | Congestion avoidance | | | | Congestion avoidance | | |||
| + ---------------------+ | | + ---------------------+ | |||
| | | | | | |||
| | Congestion signal | | | Congestion signal | |||
| | | | | | |||
| +----------------------+ | | +----------------------+ | |||
| | Fast recovery | | | | Fast recovery | | |||
| +----------------------+ | | +----------------------+ | |||
| | | | | | |||
| | frthresh reached | | | frthresh reached | |||
----------------+ | ----------------+ | |||
Figure 2 | Figure 2 | |||
6.2.2.4. Remarks | 6.2.2.4. Remarks | |||
This algorithm's performance is dependent on the LPP value. Indeed, | This algorithm's performance is dependent on the LPP value. Indeed, | |||
the smaller LPP is, the more information is available for the | the smaller the LPP is, the more information is available for the | |||
congestion control algorithm to perform well. However, it also | congestion control algorithm to perform well. However, it also | |||
increases the resources spent on sending PSNPs, so a trade-off must | increases the resources spent on sending PSNPs, so a trade-off must | |||
be made. This document recommends to use an LPP of 15 or less. If a | be made. This document recommends using an LPP of 15 or less. If a | |||
Receive Window is advertised, LPP SHOULD be lower and the best | Receive Window is advertised, LPP SHOULD be lower, and the best | |||
performance is achieved when LPP is an integer fraction of the | performance is achieved when LPP is an integer fraction of the | |||
Receive Window. | Receive Window. | |||
Note that this congestion control algorithm benefits from the | Note that this congestion control algorithm benefits from the | |||
extensions proposed in this document. The advertisement of a receive | extensions proposed in this document. The advertisement of a receive | |||
window from the receiver (Section 6.2.1) avoids the use of an | window from the receiver (Section 6.2.1) avoids the use of an | |||
arbitrary maximum value by the sender. The faster acknowledgment of | arbitrary maximum value by the sender. The faster acknowledgment of | |||
LSPs (Section 5.1) allows for a faster control loop and hence a | LSPs (Section 5.1) allows for a faster control loop and hence a | |||
faster increase of the congestion window in the absence of | faster increase of the congestion window in the absence of | |||
congestion. | congestion. | |||
6.2.3. Pacing | 6.2.3. Pacing | |||
As discussed in [RFC9002], Section 7.7 a sender SHOULD pace sending | As discussed in [RFC9002], Section 7.7, a sender SHOULD pace sending | |||
of all in-flight LSPs based on input from the congestion controller. | of all in-flight LSPs based on input from the congestion controller. | |||
Sending multiple packets without any delay between them creates a | Sending multiple packets without any delay between them creates a | |||
packet burst that might cause short-term congestion and losses. | packet burst that might cause short-term congestion and losses. | |||
Senders MUST either use pacing or limit such bursts. Senders SHOULD | Senders MUST either use pacing or limit such bursts. Senders SHOULD | |||
limit bursts to LSP Burst Size. | limit bursts to LSP Burst Size. | |||
Senders can implement pacing as they choose. A perfectly paced | Senders can implement pacing as they choose. A perfectly paced | |||
sender spreads packets evenly over time. For a window-based | sender spreads packets evenly over time. For a window-based | |||
congestion controller, such as the one in this section, that rate can | congestion controller, such as the one in this section, that rate can | |||
be computed by averaging the congestion window over the RTT. | be computed by averaging the congestion window over the RTT. | |||
Expressed as an inter-packet interval in units of time: | Expressed as an inter-packet interval in units of time: | |||
interval = (SRTT / cwin) / N | interval = (SRTT / cwin) / N | |||
SRTT is the smoothed round-trip time [RFC6298] | SRTT is the Smoothed Round-Trip Time [RFC6298]. | |||
Using a value for N that is small, but at least 1 (for example, 1.25) | Using a value for N that is small, but at least 1 (for example, | |||
ensures that variations in RTT do not result in underutilization of | 1.25), ensures that variations in RTT do not result in | |||
the congestion window. | underutilization of the congestion window. | |||
Practical considerations, such as scheduling delays and computational | Practical considerations, such as scheduling delays and computational | |||
efficiency, can cause a sender to deviate from this rate over time | efficiency, can cause a sender to deviate from this rate over time | |||
periods that are much shorter than an RTT. | periods that are much shorter than an RTT. | |||
One possible implementation strategy for pacing uses a leaky bucket | One possible implementation strategy for pacing uses a leaky bucket | |||
algorithm, where the capacity of the "bucket" is limited to the | algorithm, where the capacity of the "bucket" is limited to the | |||
maximum burst size and the rate that the "bucket" fills is determined | maximum burst size, and the rate that the "bucket" fills is | |||
by the above function. | determined by the above function. | |||
6.2.4. Determining values to be advertised in the Flooding Parameters | 6.2.4. Determining Values to be Advertised in the Flooding Parameters | |||
TLV | TLV | |||
The values that a receiver advertises do not need to be perfect. If | The values that a receiver advertises do not need to be perfect. If | |||
the values are too low then the transmitter will not use the full | the values are too low, then the transmitter will not use the full | |||
bandwidth or available CPU resources. If the values are too high | bandwidth or available CPU resources. If the values are too high, | |||
then the receiver may drop some LSPs during the first RTT and this | then the receiver may drop some LSPs during the first RTT, and this | |||
loss will reduce the usable receive window and the protocol | loss will reduce the usable receive window, and the protocol | |||
mechanisms will allow the adjacency to recover. Flooding slower than | mechanisms will allow the adjacency to recover. Flooding slower than | |||
both nodes can support will hurt performance, as will consistently | both nodes can support will hurt performance as will consistently | |||
overloading the receiver. | overloading the receiver. | |||
6.2.4.1. Static values | 6.2.4.1. Static Values | |||
The values advertised need not be dynamic as feedback is provided by | The values advertised need not be dynamic, as feedback is provided by | |||
the acknowledgment of LSPs in SNP messages. Acknowledgments provide | the acknowledgment of LSPs in SNP messages. Acknowledgments provide | |||
a feedback loop on how fast the LSPs are processed by the receiver. | a feedback loop on how fast the LSPs are processed by the receiver. | |||
They also signal that the LSPs can be removed from receive window, | They also signal that the LSPs can be removed from the receive | |||
explicitly signaling to the sender that more LSPs may be sent. By | window, explicitly signaling to the sender that more LSPs may be | |||
advertising relatively static parameters, we expect to produce | sent. By advertising relatively static parameters, we expect to | |||
overall flooding behavior similar to what might be achieved by | produce overall flooding behavior similar to what might be achieved | |||
manually configuring per-interface LSP rate-limiting on all | by manually configuring per-interface LSP rate-limiting on all | |||
interfaces in the network. The advertised values could be based, for | interfaces in the network. The advertised values could be based, for | |||
example, on offline tests of the overall LSP-processing speed for a | example, on offline tests of the overall LSP-processing speed for a | |||
particular set of hardware and the number of interfaces configured | particular set of hardware and the number of interfaces configured | |||
for IS-IS. With such a formula, the values advertised in the | for IS-IS. With such a formula, the values advertised in the | |||
Flooding Parameters TLV would only change when additional IS-IS | Flooding Parameters TLV would only change when additional IS-IS | |||
interfaces are configured. | interfaces are configured. | |||
Static values are dependent on the CPU generation, class of router | Static values are dependent on the CPU generation, class of router, | |||
and network scaling, typically the number of adjacent neighbors. | and network scaling, typically the number of adjacent neighbors. | |||
Examples at the time of publication are provided below. LSP Burst | Examples at the time of publication are provided below. The LSP | |||
Size could be in the range 5 to 20. From a router perspective, this | Burst Size could be in the range 5 to 20. From a router perspective, | |||
value typically depends on the queue(s) size(s) on the I/O path from | this value typically depends on the queue(s) size(s) on the I/O path | |||
the packet forwarding engine to the control plane which is very | from the packet forwarding engine to the control plane, which is very | |||
platform dependent. It also depends upon how many IS-IS neighbors | platform-dependent. It also depends upon how many IS-IS neighbors | |||
share this I/O path as typically all neighbors will send the same | share this I/O path, as typically all neighbors will send the same | |||
LSPs at the same time. It may also depend on other incoming control | LSPs at the same time. It may also depend on other incoming control | |||
plane traffic sharing that I/O path, how bursty they are, and how | plane traffic that is sharing that I/O path, how bursty they are, and | |||
many incoming IS-IS packets are prioritized over other incoming | how many incoming IS-IS packets are prioritized over other incoming | |||
control plane traffic. As indicated in Section 3, the historical | control plane traffic. As indicated in Section 3, the historical | |||
behavior from [ISO10589] allows a value of 10 hence 10 seems | behavior from [ISO10589] allows a value of 10; hence, 10 seems | |||
conservative. From a network operation perspective, it would be | conservative. From a network operation perspective, it would be | |||
beneficial for the burst size to be equal to or higher than the | beneficial for the burst size to be equal to or higher than the | |||
number of LSPs which may be originated by a single failure. For a | number of LSPs that may be originated by a single failure. For a | |||
node failure, this is equal to the number of IS-IS neighbors of the | node failure, this is equal to the number of IS-IS neighbors of the | |||
failed node. LSP Transmission Interval could be in the range of 1 ms | failed node. The LSP Transmission Interval could be in the range of | |||
to 33 ms. As indicated in Section 3, the historical behavior from | 1 ms to 33 ms. As indicated in Section 3, the historical behavior | |||
[ISO10589] is 33ms hence is conservative. The LSP Transmission | from [ISO10589] is 33 ms; hence, 33 ms is conservative. The LSP | |||
Interval is an advertisement of the receiver's sustainable LSP | Transmission Interval is an advertisement of the receiver's | |||
reception rate taking into account all aspects and in particular the | sustainable LSP reception rate taking into account all aspects and | |||
control plane CPU and the I/O bandwidth. It's expected to improve | particularly the control plane CPU and the I/O bandwidth. It's | |||
(hence decrease) as hardware and software naturally improve over | expected to improve (hence, decrease) as hardware and software | |||
time. It should be chosen conservatively as this rate may be used by | naturally improve over time. It should be chosen conservatively, as | |||
the sender in all conditions including the worst conditions. It's | this rate may be used by the sender in all conditions -- including | |||
also not a bottleneck as the flow control algorithm may use a higher | the worst conditions. It's also not a bottleneck as the flow control | |||
rate in good conditions, in particular when the receiver acknowledges | algorithm may use a higher rate in good conditions, particularly when | |||
quickly and the receive window is large enough compared to the RTT. | the receiver acknowledges quickly, and the receive window is large | |||
LPP could be in the range of 5 to 90 with a proposed 15. A smaller | enough compared to the RTT. LPP could be in the range of 5 to 90 | |||
value provides faster feedback at the cost of the small overhead of | with a proposed 15. A smaller value provides faster feedback at the | |||
more PSNP messages. PartialSNPInterval could be in the range 50ms to | cost of the small overhead of more PSNP messages. PartialSNPInterval | |||
500ms with a proposed 200ms. One may distinguish the value used | could be in the range 50 to 500 ms with a proposed value of 200 ms. | |||
locally from the value signaled to the sender. The value used | One may distinguish the value used locally from the value signaled to | |||
locally benefits from being small but is not expected to be the main | the sender. The value used locally benefits from being small but is | |||
parameter to improve performance. It depends on how fast the IS-IS | not expected to be the main parameter to improve performance. It | |||
flooding process may be scheduled by the CPU. It's safe as, even | depends on how fast the IS-IS flooding process may be scheduled by | |||
when the receiver CPU is busy, it will naturally delay its | the CPU. Even when the receiver CPU is busy, it's safe because it | |||
acknowledgments which provides a negative feedback loop. The value | will naturally delay its acknowledgments, which provides a negative | |||
advertised to the sender should be conservative (high enough) as this | feedback loop. The value advertised to the sender should be | |||
value could be used by the sender to send some LSPs rather than keep | conservative (high enough) as this value could be used by the sender | |||
waiting for acknowledgments. Receive Window in the range of 30 to | to send some LSPs rather than keep waiting for acknowledgments. | |||
200 with a proposed 60. In general, the larger the better the | Receive Window could be in the range of 30 to 200 with a proposed | |||
performance on links with high RTT. The higher the number and the | value of 60. In general, the larger the better the performance on | |||
higher the number of IS-IS neighbors, the higher the use of control | links with high RTT. The higher that number and the higher the | |||
plane memory so it's mostly dependent on the amount of memory which | number of IS-IS neighbors, the higher the use of control plane | |||
may be dedicated to IS-IS flooding and the number of IS-IS neighbors. | memory, so it's mostly dependent on the amount of memory, which may | |||
From a memory usage perspective, a priori, one could use the same | be dedicated to IS-IS flooding and the number of IS-IS neighbors. | |||
From a memory usage perspective (a priori), one could use the same | ||||
value as the TCP receive window, but the value advertised should not | value as the TCP receive window, but the value advertised should not | |||
be higher than the buffer of the "socket" used. | be higher than the buffer of the "socket" used. | |||
6.2.4.2. Dynamic values | 6.2.4.2. Dynamic Values | |||
The values may be updated dynamically, to reflect the relative change | To reflect the relative change of load on the receiver, the values | |||
of load on the receiver, by improving the values when the receiver | may be updated dynamically by improving the values when the receiver | |||
load is getting lower and degrading the values when the receiver load | load is getting lower and by degrading the values when the receiver | |||
is getting higher. For example, if LSPs are regularly dropped, or if | load is getting higher. For example, if LSPs are regularly dropped, | |||
the queue regularly comes close to being filled, then the values may | or if the queue regularly comes close to being filled, then the | |||
be too high. On the other hand, if the queue is barely used (by IS- | values may be too high. On the other hand, if the queue is barely | |||
IS), then the values may be too low. | used (by IS-IS), then the values may be too low. | |||
The values may also be absolute value reflecting relevant average | Alternatively, the values may be computed to reflect the relevant | |||
hardware resources that are monitored, typically the amount of buffer | average hardware resources, e.g., the amount of buffer space used by | |||
space used by incoming LSPs. In this case, care must be taken when | incoming LSPs. In this case, care must be taken when choosing the | |||
choosing the parameters influencing the values in order to avoid | parameters influencing the values in order to avoid undesirable or | |||
undesirable or unstable feedback loops. It would be undesirable to | unstable feedback loops. For example, it would be undesirable to use | |||
use a formula that depends, for example, on an active measurement of | a formula that depends on an active measurement of the instantaneous | |||
the instantaneous CPU load to modify the values advertised in the | CPU load to modify the values advertised in the Flooding Parameters | |||
Flooding Parameters TLV. This could introduce feedback into the IGP | TLV. This could introduce feedback into the IGP flooding process | |||
flooding process that could produce unexpected behavior. | that could produce unexpected behavior. | |||
6.2.5. Operation considerations | 6.2.5. Operational Considerations | |||
As discussed in Section 4.7, the solution is more effective on point- | As discussed in Section 4.7, the solution is more effective on point- | |||
to-point adjacencies. Hence a broadcast interface (e.g., Ethernet) | to-point adjacencies. Hence, a broadcast interface (e.g., Ethernet) | |||
only shared by two IS-IS neighbors should be configured as point-to- | only shared by two IS-IS neighbors should be configured as point-to- | |||
point in order to have more effective flooding. | point in order to have more effective flooding. | |||
6.3. Transmitter Based Congestion Control Approach | 6.3. Transmitter-Based Congestion Control Approach | |||
This section describes an approach to congestion control algorithm | This section describes an approach to the congestion control | |||
based on performance measured by the transmitter without dependance | algorithm based on performance measured by the transmitter without | |||
on signaling from the receiver. | dependence on signaling from the receiver. | |||
6.3.1. Router Architecture Discussion | 6.3.1. Router Architecture Discussion | |||
(The following description is an abstraction - implementation details | Note that the following description is an abstraction; implementation | |||
vary.) | details vary. | |||
Existing router architectures may utilize multiple input queues. On | Existing router architectures may utilize multiple input queues. On | |||
a given line card, IS-IS PDUs from multiple interfaces may be placed | a given line card, IS-IS PDUs from multiple interfaces may be placed | |||
in a rate-limited input queue. This queue may be dedicated to IS-IS | in a rate-limited input queue. This queue may be dedicated to IS-IS | |||
PDUs or may be shared with other routing related packets. | PDUs or may be shared with other routing related packets. | |||
The input queue may then pass IS-IS PDUs to a "punt queue" which is | The input queue may then pass IS-IS PDUs to a "punt queue," which is | |||
used to pass PDUs from the data plane to the control plane. The punt | used to pass PDUs from the data plane to the control plane. The punt | |||
queue typically also has controls on its size and the rate at which | queue typically also has controls on its size and the rate at which | |||
packets will be punted. | packets will be punted. | |||
An input queue in the control plane may then be used to assemble PDUs | An input queue in the control plane may then be used to assemble PDUs | |||
from multiple linecards, separate the IS-IS PDUs from other types of | from multiple line cards, separate the IS-IS PDUs from other types of | |||
packets, and place the IS-IS PDUs on an input queue dedicated to the | packets, and place the IS-IS PDUs in an input queue dedicated to the | |||
IS-IS protocol. | IS-IS protocol. | |||
The IS-IS input queue then separates the IS-IS PDUs and directs them | The IS-IS input queue then separates the IS-IS PDUs and directs them | |||
to an instance-specific processing queue. The instance-specific | to an instance-specific processing queue. The instance-specific | |||
processing queue may then further separate the IS-IS PDUs by type | processing queue may then further separate the IS-IS PDUs by type | |||
(IIHs, SNPs, and LSPs) so that separate processing threads with | (IIHs, SNPs, and LSPs) so that separate processing threads with | |||
varying priorities may be employed to process the incoming PDUs. | varying priorities may be employed to process the incoming PDUs. | |||
In such an architecture, it may be difficult for IS-IS in the control | In such an architecture, it may be difficult for IS-IS in the control | |||
plane to determine what value should be advertised as a receive | plane to determine what value should be advertised as a receive | |||
window. | window. | |||
The following section describes an approach to congestion control | The following section describes an approach to congestion control | |||
based on performance measured by the transmitter without dependance | based on performance measured by the transmitter without dependence | |||
on signaling from the receiver. | on signaling from the receiver. | |||
6.3.2. Guidelines for transmitter side congestion controls | 6.3.2. Guidelines for Transmitter-Side Congestion Controls | |||
The approach described in this section does not depend upon direct | The approach described in this section does not depend upon direct | |||
signaling from the receiver. Instead it adapts the transmission rate | signaling from the receiver. Instead, it adapts the transmission | |||
based on measurement of the actual rate of acknowledgments received. | rate based on measurement of the actual rate of acknowledgments | |||
received. | ||||
Flow control is not used by this approach. When congestion control | Flow control is not used by this approach. When congestion control | |||
is necessary, it can be implemented based on knowledge of the current | is necessary, it can be implemented based on knowledge of the current | |||
flooding rate and the current acknowledgement rate. The algorithm | flooding rate and the current acknowledgment rate. The algorithm | |||
used is a local matter. There is no requirement to standardize it | used is a local matter. There is no requirement to standardize it, | |||
but there are a number of aspects which serve as guidelines which can | but there are a number of aspects that serve as guidelines that can | |||
be described. Algorithms based on this approach should follow the | be described. Algorithms based on this approach should follow the | |||
recommendations described below. | recommendations described below. | |||
A maximum LSP transmission rate (LSPTxMax) should be configurable. | A maximum LSP transmission rate (LSPTxMax) should be configurable. | |||
This represents the fastest LSP transmission rate which will be | This represents the fastest LSP transmission rate that will be | |||
attempted. This value should be applicable to all interfaces and | attempted. This value should be applicable to all interfaces and | |||
should be consistent network wide. | should be consistent network wide. | |||
When the current rate of LSP transmission (LSPTxRate) exceeds the | When the current rate of LSP transmission (LSPTxRate) exceeds the | |||
capabilities of the receiver, the congestion control algorithm needs | capabilities of the receiver, the congestion control algorithm needs | |||
to quickly and aggressively reduce the LSPTxRate. Slower | to quickly and aggressively reduce the LSPTxRate. Slower | |||
responsiveness is likely to result in a larger number of | responsiveness is likely to result in a larger number of | |||
retransmissions which can introduce much longer delays in | retransmissions, which can introduce much longer delays in | |||
convergence. | convergence. | |||
Dynamic increase of the rate of LSP transmission (LSPTxRate) (i.e., | Dynamic increase of the rate of LSP transmission (LSPTxRate), i.e., | |||
faster) should be done less aggressively and only be done when the | making the rate faster, should be done less aggressively and only be | |||
neighbor has demonstrated its ability to sustain the current | done when the neighbor has demonstrated its ability to sustain the | |||
LSPTxRate. | current LSPTxRate. | |||
The congestion control algorithm should not assume the receive | The congestion control algorithm should not assume that the receive | |||
performance of a neighbor is static, i.e., it should handle transient | performance of a neighbor is static, i.e., it should handle transient | |||
conditions which result in a slower or faster receive rate on the | conditions that result in a slower or faster receive rate on the part | |||
part of a neighbor. | of a neighbor. | |||
The congestion control algorithm should consider the expected delay | The congestion control algorithm should consider the expected delay | |||
time in receiving an acknowledgment. It therefore incorporates the | time in receiving an acknowledgment. Therefore, it incorporates the | |||
neighbor partialSNPInterval (Section 4.5) to help determine whether | neighbor partialSNPInterval (Section 4.5) to help determine whether | |||
acknowlegments are keeping pace with the rate of LSPs transmitted. | acknowledgments are keeping pace with the rate of LSPs transmitted. | |||
In the absence of an advertisement of partialSNPInterval, a locally | In the absence of an advertisement of partialSNPInterval, a locally | |||
configured value can be used. | configured value can be used. | |||
7. IANA Considerations | 7. IANA Considerations | |||
7.1. Flooding Parameters TLV | 7.1. Flooding Parameters TLV | |||
IANA has made the following temporary allocation from the IS-IS TLV | IANA has made the following allocation in the "IS-IS Top-Level TLV | |||
codepoint registry. This document requests the allocation be made | Codepoints" registry. | |||
permanent. | ||||
Type Description IIH LSP SNP Purge | +=======+=========================+=====+=====+=====+=======+ | |||
---- --------------------------- --- --- --- --- | | Value | Name | IIH | LSP | SNP | Purge | | |||
21 Flooding Parameters TLV y n y n | +=======+=========================+=====+=====+=====+=======+ | |||
| 21 | Flooding Parameters TLV | y | n | y | n | | ||||
+-------+-------------------------+-----+-----+-----+-------+ | ||||
Figure 3 | Table 1 | |||
7.2. Registry: IS-IS Sub-TLV for Flooding Parameters TLV | 7.2. Registry: IS-IS Sub-TLV for Flooding Parameters TLV | |||
This document creates the following sub-TLV Registry under the "IS-IS | IANA has created the following sub-TLV registry in the "IS-IS TLV | |||
TLV Codepoints" grouping: | Codepoints" registry group. | |||
Name: IS-IS Sub-TLVs for Flooding Parameters TLV. | ||||
Registration Procedure(s): Expert Review | ||||
Expert(s): TBD | ||||
Description: This registry defines sub-TLVs for the Flooding | ||||
Parameters TLV(21). | ||||
Reference: This document. | Name: IS-IS Sub-TLVs for Flooding Parameters TLV | |||
Registration Procedure(s): Expert Review | ||||
Description: This registry defines sub-TLVs for the Flooding | ||||
Parameters TLV (21). | ||||
Reference: RFC 9681 | ||||
+=======+===========================+ | +=======+===========================+ | |||
| Type | Description | | | Type | Description | | |||
+=======+===========================+ | +=======+===========================+ | |||
| 0 | Reserved | | | 0 | Reserved | | |||
+-------+---------------------------+ | +-------+---------------------------+ | |||
| 1 | LSP Burst Size | | | 1 | LSP Burst Size | | |||
+-------+---------------------------+ | +-------+---------------------------+ | |||
| 2 | LSP Transmission Interval | | | 2 | LSP Transmission Interval | | |||
+-------+---------------------------+ | +-------+---------------------------+ | |||
| 3 | LSPs Per PSNP | | | 3 | LSPs per PSNP | | |||
+-------+---------------------------+ | +-------+---------------------------+ | |||
| 4 | Flags | | | 4 | Flags | | |||
+-------+---------------------------+ | +-------+---------------------------+ | |||
| 5 | Partial SNP Interval | | | 5 | PSNP Interval | | |||
+-------+---------------------------+ | +-------+---------------------------+ | |||
| 6 | Receive Window | | | 6 | Receive Window | | |||
+-------+---------------------------+ | +-------+---------------------------+ | |||
| 7-255 | Unassigned | | | 7-255 | Unassigned | | |||
+-------+---------------------------+ | +-------+---------------------------+ | |||
Table 1: Initial Sub-TLV | Table 2: Initial Sub-TLV | |||
allocations for Flooding | Allocations for Flooding | |||
Parameters TLV | Parameters TLV | |||
7.3. Registry: IS-IS Bit Values for Flooding Parameters Flags Sub-TLV | 7.3. Registry: IS-IS Bit Values for Flooding Parameters Flags Sub-TLV | |||
This document requests IANA to create a new registry, under the "IS- | IANA has created a new registry, in the "IS-IS TLV Codepoints" | |||
IS TLV Codepoints" grouping, for assigning Flag bits advertised in | registry group, for assigning Flag bits advertised in the Flags sub- | |||
the Flags sub- TLV. | TLV. | |||
Name: IS-IS Bit Values for Flooding Parameters Flags Sub-TLV. | ||||
Registration Procedure: Expert Review | ||||
Expert Review Expert(s): TBD | ||||
Description: This registry defines bit values for the Flags sub- | ||||
TLV(4) advertised in the Flooding Parameters TLV(21). | ||||
Note: In order to minimize encoding space, a new allocation should | ||||
pick the smallest available value. | ||||
Reference: This document. | Name: IS-IS Bit Values for Flooding Parameters Flags Sub-TLV | |||
Registration Procedure: Expert Review | ||||
Description: This registry defines bit values for the Flags sub-TLV | ||||
(4) advertised in the Flooding Parameters TLV (21). | ||||
Note: In order to minimize encoding space, a new allocation should | ||||
pick the smallest available value. | ||||
Reference: RFC 9681 | ||||
+=======+==================================+ | +=======+=================================+ | |||
| Bit # | Description | | | Bit # | Description | | |||
+=======+==================================+ | +=======+=================================+ | |||
| 0 | Ordered acknowledgement (O-flag) | | | 0 | Ordered acknowledgment (O-flag) | | |||
+-------+----------------------------------+ | +-------+---------------------------------+ | |||
| 1-63 | Unassigned | | | 1-63 | Unassigned | | |||
+-------+----------------------------------+ | +-------+---------------------------------+ | |||
Table 2: Initial bit allocations for | Table 3: Initial Bit Allocations for | |||
Flags Sub-TLV | Flags Sub-TLV | |||
8. Security Considerations | 8. Security Considerations | |||
Security concerns for IS-IS are addressed in [ISO10589] , [RFC5304] , | Security concerns for IS-IS are addressed in [ISO10589], [RFC5304], | |||
and [RFC5310] . These documents describe mechanisms that provide for | and [RFC5310]. These documents describe mechanisms that provide for | |||
the authentication and integrity of IS-IS PDUs, including SNPs and | the authentication and integrity of IS-IS PDUs, including SNPs and | |||
IIHs. These authentication mechanisms are not altered by this | IIHs. These authentication mechanisms are not altered by this | |||
document. | document. | |||
With the cryptographic mechanisms described in [RFC5304] and | With the cryptographic mechanisms described in [RFC5304] and | |||
[RFC5310], an attacker wanting to advertise an incorrect Flooding | [RFC5310], an attacker wanting to advertise an incorrect Flooding | |||
Parameters TLV would have to first defeat these mechanisms. | Parameters TLV would have to first defeat these mechanisms. | |||
In the absence of cryptographic authentication, as IS-IS does not run | In the absence of cryptographic authentication, as IS-IS does not run | |||
over IP but directly over the link layer, it's considered difficult | over IP but directly over the link layer, it's considered difficult | |||
to inject false SNP/IIH without having access to the link layer. | to inject a false SNP or IIH without having access to the link layer. | |||
If a false SNP/IIH is sent with a Flooding Parameters TLV set to | If a false SNP or IIH is sent with a Flooding Parameters TLV set to | |||
conservative values, the attacker can reduce the flooding speed | conservative values, the attacker can reduce the flooding speed | |||
between the two adjacent neighbors which can result in LSDB | between the two adjacent neighbors, which can result in LSDB | |||
inconsistencies and transient forwarding loops. However, it is not | inconsistencies and transient forwarding loops. However, it is not | |||
significantly different than filtering or altering LSPs which would | significantly different than filtering or altering LSPs, which would | |||
also be possible with access to the link layer. In addition, if the | also be possible with access to the link layer. In addition, if the | |||
downstream flooding neighbor has multiple IGP neighbors, which is | downstream flooding neighbor has multiple IGP neighbors (which is | |||
typically the case for reliability or topological reasons, it would | typically the case for reliability or topological reasons), it would | |||
receive LSPs at a regular speed from its other neighbors and hence | receive LSPs at a regular speed from its other neighbors and hence | |||
would maintain LSDB consistency. | would maintain LSDB consistency. | |||
If a false SNP/IIH is sent with a Flooding Parameters TLV set to | If a false SNP or IIH is sent with a Flooding Parameters TLV set to | |||
aggressive values, the attacker can increase the flooding speed which | aggressive values, the attacker can increase the flooding speed, | |||
can either overload a node or more likely generate loss of LSPs. | which can either overload a node or more likely cause loss of LSPs. | |||
However, it is not significantly different than sending many LSPs | However, it is not significantly different than sending many LSPs, | |||
which would also be possible with access to the link layer, even with | which would also be possible with access to the link layer, even with | |||
cryptographic authentication enabled. In addition, IS-IS has | cryptographic authentication enabled. In addition, IS-IS has | |||
procedures to detect the loss of LSPs and recover. | procedures to detect the loss of LSPs and recover. | |||
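As an illustration of the two cases above (overly conservative and overly aggressive advertised values), an implementation could bound what it accepts from a neighbor by local policy. This is a sketch of one possible safeguard, not a procedure taken from this document; the parameter name `burst_size`, the `Bounds` type, and the policy values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Locally configured acceptable range for one advertised parameter."""
    low: int
    high: int


def clamp(advertised, bounds):
    """Limit an advertised value to the locally acceptable range."""
    return max(bounds.low, min(advertised, bounds.high))


def sanitize(advertised, policy):
    """Clamp each advertised parameter; drop those with no local policy."""
    result = {}
    for name, value in advertised.items():
        bounds = policy.get(name)
        if bounds is None:
            continue  # hypothetical choice: ignore parameters we have no policy for
        result[name] = clamp(value, bounds)
    return result


# Hypothetical local policy: never send fewer than 5 or more than 100 LSPs per burst.
POLICY = {"burst_size": Bounds(low=5, high=100)}
```

With such a policy, a forged advertisement can shift the flooding rate only within the locally configured range, which narrows both effects described above.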
This TLV advertisement is not flooded across the network but only | This TLV advertisement is not flooded across the network but only | |||
sent between adjacent IS-IS neighbors. This would limit the | sent between adjacent IS-IS neighbors. This would limit the | |||
consequences in case of forged messages, and also limits the | consequences in case of forged messages and also limit the | |||
dissemination of such information. | dissemination of such information. | |||
9. Contributors | 9. References | |||
The following people gave a substantial contribution to the content | ||||
of this document and should be considered as coauthors: | ||||
* Jayesh J, Ciena, jayesh.ietf@gmail.com | ||||
* Chris Bowers, Juniper Networks, cbowers@juniper.net | ||||
* Peter Psenak, Cisco Systems, ppsenak@cisco.com | ||||
10. Acknowledgments | ||||
The authors would like to thank Henk Smit, Sarah Chen, Xuesong Geng, | ||||
Pierre Francois, Hannes Gredler, Acee Lindem, Mirja Kuhlewind, | ||||
Zaheduzzaman Sarker and John Scudder for their reviews, comments and | ||||
suggestions. | ||||
The authors would like to thank David Jacquet, Sarah Chen, and | ||||
Qiangzhou Gao for the tests performed on commercial implementations | ||||
and their identification of some limiting factors. | ||||
11. References | ||||
11.1. Normative References | 9.1. Normative References | |||
[ISO10589] ISO, "Intermediate system to Intermediate system intra- | [ISO10589] ISO/IEC, "Information technology - Telecommunications and | |||
domain routeing information exchange protocol for use in | information exchange between systems - Intermediate system | |||
conjunction with the protocol for providing the | to Intermediate system intra-domain routeing information | |||
connectionless-mode Network Service (ISO 8473)", ISO/ | exchange protocol for use in conjunction with the protocol | |||
IEC 10589:2002, Second Edition, November 2002. | for providing the connectionless-mode network service (ISO | |||
8473)", Second Edition, ISO/IEC 10589:2002, November 2002, | ||||
<https://www.iso.org/standard/30932.html>. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
[RFC5304] Li, T. and R. Atkinson, "IS-IS Cryptographic | [RFC5304] Li, T. and R. Atkinson, "IS-IS Cryptographic | |||
Authentication", RFC 5304, DOI 10.17487/RFC5304, October | Authentication", RFC 5304, DOI 10.17487/RFC5304, October | |||
2008, <https://www.rfc-editor.org/info/rfc5304>. | 2008, <https://www.rfc-editor.org/info/rfc5304>. | |||
skipping to change at page 25, line 19 ¶ | skipping to change at line 1097 ¶ | |||
[RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, | [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, | |||
"Computing TCP's Retransmission Timer", RFC 6298, | "Computing TCP's Retransmission Timer", RFC 6298, | |||
DOI 10.17487/RFC6298, June 2011, | DOI 10.17487/RFC6298, June 2011, | |||
<https://www.rfc-editor.org/info/rfc6298>. | <https://www.rfc-editor.org/info/rfc6298>. | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
11.2. Informative References | 9.2. Informative References | |||
[I-D.ietf-lsr-dynamic-flooding] | ||||
Li, T., Psenak, P., Chen, H., Jalil, L., and S. Dontula, | ||||
"Dynamic Flooding on Dense Graphs", Work in Progress, | ||||
Internet-Draft, draft-ietf-lsr-dynamic-flooding-18, 5 | ||||
April 2024, <https://datatracker.ietf.org/doc/html/draft- | ||||
ietf-lsr-dynamic-flooding-18>. | ||||
[RFC2973] Balay, R., Katz, D., and J. Parker, "IS-IS Mesh Groups", | [RFC2973] Balay, R., Katz, D., and J. Parker, "IS-IS Mesh Groups", | |||
RFC 2973, DOI 10.17487/RFC2973, October 2000, | RFC 2973, DOI 10.17487/RFC2973, October 2000, | |||
<https://www.rfc-editor.org/info/rfc2973>. | <https://www.rfc-editor.org/info/rfc2973>. | |||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, | |||
<https://www.rfc-editor.org/info/rfc5681>. | <https://www.rfc-editor.org/info/rfc5681>. | |||
[RFC9002] Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection | [RFC9002] Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection | |||
and Congestion Control", RFC 9002, DOI 10.17487/RFC9002, | and Congestion Control", RFC 9002, DOI 10.17487/RFC9002, | |||
May 2021, <https://www.rfc-editor.org/info/rfc9002>. | May 2021, <https://www.rfc-editor.org/info/rfc9002>. | |||
[RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | |||
STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | |||
<https://www.rfc-editor.org/info/rfc9293>. | <https://www.rfc-editor.org/info/rfc9293>. | |||
Appendix A. Changes / Author Notes | [RFC9667] Li, T., Ed., Psenak, P., Ed., Chen, H., Jalil, L., and S. | |||
Dontula, "Dynamic Flooding on Dense Graphs", RFC 9667, | ||||
[RFC Editor: Please remove this section before publication] | DOI 10.17487/RFC9667, October 2024, | |||
<https://www.rfc-editor.org/info/rfc9667>. | ||||
IND 00: Initial version. | ||||
WG 00: No change. | ||||
WG 01: IANA allocated code point. | ||||
WG 02: No change. | Acknowledgments | |||
WG 03: | The authors would like to thank Henk Smit, Sarah Chen, Xuesong Geng, | |||
Pierre Francois, Hannes Gredler, Acee Lindem, Mirja Kühlewind, | ||||
Zaheduzzaman Sarker, and John Scudder for their reviews, comments, | ||||
and suggestions. | ||||
* Pacing section added (taken from RFC 9002). | The authors would like to thank David Jacquet, Sarah Chen, and | |||
Qiangzhou Gao for the tests performed on commercial implementations | ||||
and for their identification of some limiting factors. | ||||
* Some text borrowed from RFC 9002 (QUIC Loss Detection and | Contributors | |||
Congestion Control). | ||||
* Considerations on the special role of the DIS. | The following people gave substantial contributions to the content of | |||
this document and should be considered as coauthors: | ||||
* Editorial changes. | Jayesh J | |||
Ciena | ||||
Email: jayesh.ietf@gmail.com | ||||
WG 04: Update IANA section as per IANA editor comments (2023-03-23). | Chris Bowers | |||
Juniper Networks | ||||
Email: cbowers@juniper.net | ||||
WG 06: AD review. | Peter Psenak | |||
Cisco Systems | ||||
Email: ppsenak@cisco.com | ||||
Authors' Addresses | Authors' Addresses | |||
Bruno Decraene | Bruno Decraene | |||
Orange | Orange | |||
Email: bruno.decraene@orange.com | Email: bruno.decraene@orange.com | |||
Les Ginsberg | Les Ginsberg | |||
Cisco Systems | Cisco Systems | |||
821 Alder Drive | 821 Alder Drive | |||
skipping to change at page 27, line 4 ¶ | skipping to change at line 1174 ¶ | |||
Guillaume Solignac | Guillaume Solignac | |||
Email: gsoligna@protonmail.com | Email: gsoligna@protonmail.com | |||
Marek Karasek | Marek Karasek | |||
Cisco Systems | Cisco Systems | |||
Pujmanove 1753/10a, Prague 4 - Nusle | Pujmanove 1753/10a, Prague 4 - Nusle | |||
10 14000 Prague | 10 14000 Prague | |||
Czech Republic | Czech Republic | |||
Email: mkarasek@cisco.com | Email: mkarasek@cisco.com | |||
Gunter Van de Velde | Gunter Van de Velde | |||
Nokia | Nokia | |||
Copernicuslaan 50 | Copernicuslaan 50 | |||
2018 Antwerp | 2018 Antwerp | |||
Belgium | Belgium | |||
Email: gunter.van_de_velde@nokia.com | Email: gunter.van_de_velde@nokia.com | |||
Tony Przygienda | Tony Przygienda | |||
Juniper | Juniper | |||
1137 Innovation Way | 1133 Innovation Way | |||
Sunnyvale, Ca | Sunnyvale, CA 94089 | |||
United States of America | United States of America | |||
Email: prz@juniper.net | Email: prz@juniper.net | |||
End of changes. 176 change blocks. | ||||
568 lines changed or deleted | 529 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |