Recursively Cautious Congestion Control

The paper connects two observations:
First, ISPs keep spare bandwidth so that performance holds up when parts of their network fail (and most of the time nothing has failed, so that capacity sits idle).
Second, TCP uses a cautious slow start: the congestion window starts small to ensure fairness and avoid oversaturating the network (a slow start is preferable to near-instant oversaturation with heavy packet drops). The congestion window does grow as ACKs come back, but this means TCP spends its first round trips cautiously growing that window while the network is generally underutilized.
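
To get a feel for how much time the cautious start costs, here is a minimal sketch that counts round trips under an idealised slow start. It assumes the window doubles every round trip, that nothing is lost, and that the initial window is 10 packets (a common default, but an assumption here, not a number from the paper). The function name is made up for illustration.

```python
def slow_start_rtts(flow_packets: int, initial_window: int = 10) -> int:
    """Count round trips needed to send flow_packets packets when the
    window starts at initial_window and doubles every round trip.

    Idealised model: no losses, no receiver limit, no link capacity cap.
    """
    if flow_packets <= 0:
        return 0
    window = initial_window
    sent = 0
    rtts = 0
    while sent < flow_packets:
        sent += window   # send one full window in this round trip
        window *= 2      # slow start doubles the window per round trip
        rtts += 1
    return rtts


if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(size, "packets:", slow_start_rtts(size), "round trips")
```

Even with a link that could carry the whole flow at once, this model needs several round trips for a flow of a thousand packets, and that idle time is what RC3 targets.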

Recursively Cautious Congestion Control (RC3)

The authors propose RC3, which does not replace TCP but supplements it, running alongside it.

RC3 is designed to:

  1. take advantage of the (usually) unused network bandwidth
  2. not add traffic (and congestion) to the network in the (rarer) case that the network is actually seeing heavy use
  3. not interfere with the performance and functionality of normal TCP traffic (i.e. even in the worst case, RC3 will not degrade TCP’s performance)

Try to Fill the Pipe

How does RC3 try to use the underutilized network? Normally, when bytes are queued for sending over TCP, TCP starts with the first bytes, at the head of the buffer, and works toward the tail until everything has been sent and acknowledged.

RC3 lets TCP continue that behavior. In addition, after TCP has queued its first bytes, RC3 takes all the remaining bytes (those TCP hasn’t gotten to yet) and asks the NIC to send them all. RC3’s bytes are marked as low priority for the NIC and for the network.

Worse Quality of Service (WQoS) When Network is Congested

For RC3 to not negatively affect the network when aggressively sending packets, the authors propose that network operators could offer a worse quality of service to packets with certain bits set in the IP header. Essentially, any traffic generated by RC3 would be marked as requesting WQoS, or in other words, low priority.

If the network is busy, routers can decide to drop RC3 traffic.

This is not implemented yet and the authors discuss its feasibility and the problems of partial implementation (what to do if some routers support WQoS but others do not).
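
The drop-under-load behaviour can be modelled with a toy strict-priority buffer. This is only a sketch: the class name, the buffer capacity, and the policy of evicting the lowest-priority queued packet when a more important one arrives are all assumptions made for illustration, not something the paper or any particular router specifies. Priority 0 is the most important level.

```python
from collections import deque


class PriorityLink:
    """Toy output buffer with strict priorities (0 = most important)."""

    def __init__(self, levels: int = 4, capacity: int = 8):
        self.queues = [deque() for _ in range(levels)]
        self.capacity = capacity

    def __len__(self) -> int:
        return sum(len(q) for q in self.queues)

    def enqueue(self, packet, priority: int) -> bool:
        """Queue a packet; return False if it had to be dropped."""
        if len(self) < self.capacity:
            self.queues[priority].append(packet)
            return True
        # Buffer is full: make room by evicting a packet that is strictly
        # less important than the arriving one, least important first.
        for level in range(len(self.queues) - 1, priority, -1):
            if self.queues[level]:
                self.queues[level].pop()
                self.queues[priority].append(packet)
                return True
        return False  # nothing less important to evict, so drop the arrival

    def dequeue(self):
        """Send the most important waiting packet, or None if idle."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


if __name__ == "__main__":
    link = PriorityLink(levels=3, capacity=2)
    print(link.enqueue("rc3-a", 1))  # room available
    print(link.enqueue("rc3-b", 2))  # room available, buffer now full
    print(link.enqueue("tcp-1", 0))  # evicts rc3-b
    print(link.enqueue("rc3-c", 2))  # dropped: buffer full of more important traffic
    print(link.dequeue(), link.dequeue())
```

In this model normal TCP traffic at priority 0 is never displaced by RC3 traffic; the low-priority packets only get through when there is spare room.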

Host and NIC Prioritization

RC3 does not want to impact existing TCP traffic, so (assuming the NIC supports traffic priority) RC3 categorises its traffic as lowest priority.

If the network congestion is between the host’s NIC and the network, the NIC’s prioritization will ensure normal TCP is undisturbed.

Levels of WQoS

As we know, RC3 tries to send the bytes in the buffer that TCP hasn’t touched yet, but not all the RC3-sent bytes have the same (lowered) priority.

Starting from the tail end of the buffer, the last 40 packets are sent at a low priority, the 400 packets before those at an even lower priority, the 4000 packets before those at a lower priority still, and so on. Each batch is ten times larger than the previous one and has a lower priority.
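
The batching can be written down as a short sketch. The batch sizes 40, 400, 4000 come from the description above; the function name, the parameter names, and the convention that level 1 is the first RC3 level (with larger numbers meaning lower priority) are assumptions of this sketch. Positions are counted in whatever unit the sender uses, with 0 at the head of the buffer.

```python
def rc3_priority_levels(total_units: int, tcp_units: int,
                        first_batch: int = 40, growth: int = 10):
    """Split the part of the buffer TCP has not touched into RC3 batches.

    total_units: size of the whole buffer.
    tcp_units:   how much TCP has already taken from the head.
    Returns (start, end, level) tuples, end exclusive, ordered from the
    tail toward the head. Larger level numbers mean lower priority.
    """
    ranges = []
    end = total_units
    batch = first_batch
    level = 1
    while end > tcp_units:
        start = max(tcp_units, end - batch)  # never reach into TCP's part
        ranges.append((start, end, level))
        end = start
        batch *= growth   # each batch is ten times larger by default
        level += 1        # and one priority level lower
    return ranges


if __name__ == "__main__":
    for start, end, level in rc3_priority_levels(1000, 10):
        print(f"units [{start}, {end}) at level {level}")
```

For a buffer of 1000 units where TCP has taken the first 10, the sketch yields a 40-unit batch at the tail, a 400-unit batch before it, and a final batch covering whatever is left.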

In the paper’s Figure 1, the highest priority is 0.

The example in section 2.2 of the paper paints the whole process nicely.

Dropped Packets

RC3 transmits each packet only once. The receiver does not ACK low-priority RC3 packets the way it ACKs normal traffic; it ACKs each one at the same priority at which it was received.

The receiver uses these low-priority ACKs to tell the transmitter which packets are missing from RC3’s attempt. So, despite RC3’s fire-and-forget approach, TCP’s diligent nature fills in the holes RC3 left behind (data it would have had to send anyway had RC3 been absent).
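
As a rough illustration of the hole-filling, the sender’s remaining work can be computed from the set of packets the receiver has acknowledged. This is a hypothetical helper, not the kernel logic from the paper: it assumes packets are numbered from 0 and that the sender knows exactly which ones arrived.

```python
def missing_ranges(total: int, received: set) -> list:
    """Return the (start, end) ranges, end exclusive, of packet numbers
    in 0..total-1 that are not in received and so are left for TCP."""
    gaps = []
    start = None
    for i in range(total):
        if i not in received:
            if start is None:
                start = i              # a new hole begins here
        elif start is not None:
            gaps.append((start, i))    # the hole ended just before i
            start = None
    if start is not None:
        gaps.append((start, total))    # hole runs to the end of the flow
    return gaps


if __name__ == "__main__":
    # RC3 delivered packets 4, 5, 8 and 9 of a 10-packet flow.
    print(missing_ranges(10, {4, 5, 8, 9}))
```

If RC3 delivered nothing, the result is a single range covering the whole flow, which matches the point that TCP is never worse off than it would be without RC3.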

Implementation

The authors report that very few changes are needed in the Linux kernel network stack, and that they are fairly nonintrusive: the core TCP machinery is untouched, and most of the work is adding some wedges to separate RC3 traffic from normal TCP traffic.

Other changes they made:

  • Priority bits initially set in TCP headers need to be mirrored in IP headers (because of IP routing)
  • Increase the socket buffer sizes, so that more data is handed to TCP (and RC3) in one shot and RC3 can put as much of it on the wire as possible
  • Disable the Forward ACK and Duplicate ACK mechanisms, as RC3’s behavior might confuse them

Performance

The cost of TCP’s slow start is amortized over very big flows (a flow being the total number of bytes to be sent, e.g. a file). Once TCP has ‘detected’ the network capacity, it sends approximately as many bytes as the network can carry, so the spare capacity RC3 can exploit lies mostly in the early round trips.

RC3 also assumes the network is underutilized, so if this is not the case, there isn’t any headroom for RC3 to take advantage of while TCP is being cautious.

The paper goes into detail on many scenarios, with both simulated and experimental results across varied parameters (flow size, link state, link heterogeneity, link bandwidth, network utilization levels, network losses, and more). The gains the authors predict are fairly close to those observed experimentally.

They also found that bucketing RC3-sent bytes into progressively lower priority levels improves performance over using a single low-priority level for all bytes.