Cell Relay Archive

Re: TCP window size

  • From: vjs@calcite.rhyolite.com (Vernon Schryver)
  • Date: Tue, 25 Apr 1995 04:53:27 -0500, Mon, 24 Apr 1995 23:49:21 GMT

In article <D7JsLK.78y@world.std.com> craigp@world.std.com (Craig Partridge) writes:

> ...
>As several folks have noted the SOL_SNDBUF and SOL_RCVBUF options set
>the window size options.  Critical here is that TCP negotiates the
>window size at setup time and picks the MINIMUM of the sender's offered
>window size (SNDBUF) and the receiver's max window size (RCVBUF).
>So both sender and receiver must request a large enough window size.

A technical detail is that the TCP window size is not negotiated.  The
MSS is negotiated and the minimum used, but there is no way in the TCP
protocol for the receiver (the machine that advertises the window, set
in BSD code according to tcp_recvspace and the RCVBUF setsockopt) to
know how much buffering the sender has (set in BSD code according to
tcp_sendspace and the SNDBUF setsockopt).
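As a rough illustration of the point that each end sets its own buffering independently, here is a minimal Python sketch. It assumes the standard sockets option names SO_RCVBUF and SO_SNDBUF at level SOL_SOCKET; the helper name make_socket_with_buffers and the 60K figure are only for illustration. The kernel may round, double, or clamp the requested values, so the sketch reads them back instead of assuming they took effect.

```python
import socket

def make_socket_with_buffers(rcvbuf, sndbuf):
    """Create a TCP socket and request receive/send buffer sizes.

    Hypothetical helper for illustration. The requests are made before
    connect() or listen() so they can influence the advertised window.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Receive buffer: bounds the window this end advertises to its peer.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    # Send buffer: bounds how much unacknowledged data this end can hold.
    # The peer is never told this value by the TCP protocol.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    return s

sock = make_socket_with_buffers(60 * 1024, 60 * 1024)
# Read back what the kernel actually granted; it may differ from the request.
effective_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
effective_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print("receive buffer:", effective_rcv, "send buffer:", effective_snd)
sock.close()
```

The effective window on a connection ends up limited by the smaller of the receiver's advertised window and the sender's buffering, but only because each side is limited by its own setting, not because the two values are exchanged.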


>In general, if you care about throughput, you should always ask for
>the largest possible window size, since TCP will dynamically adapt if
>the link cannot handle that much data.  (Using the Jacobson work mentioned
>in other postings).

That is true only to a point.  You should not advertise a window
that is grossly large unless your network topology is trivial.
Consider the following scenario:
    - 2 Ethernets, each with its own IP network (or subnet) number
    - one pair of 56Kbit/sec leased line, frame relay, or other wide
	area routers connecting the Ethernets.  Let the routers have
	the ~50KByte of buffering that is (or was) the default for
	one major vendor of such routers.
    - one pair of UNIX workstations, one on each Ethernet, running
	4.3BSD or newer TCP (fast retransmission, etc), with a default
	TCP/IP window of 60K (or larger)
    - start a file transfer from one workstation to the other
    - simultaneously try to do some interactive work over the link
	simulated with the `ping` command.

Modify that scenario to taste, replacing the Ethernets with faster links
(e.g. FDDI or 155Mbps ATM) and the 56Kbps link with a faster (e.g. T1)
or slower (e.g. PPP/v.32bis) link, but keeping the speed of the LANs far
faster than the WAN link.

You will see the round trip delays reported by `ping` gradually increase
to 10 seconds (or the buffer size in the routers divided by the wide
area link data rate), drop abruptly to nearly 0, and then repeat the
sawtooth.
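The "buffer size divided by link data rate" figure can be worked out directly. The sketch below is only arithmetic; the function name queue_drain_seconds is made up for illustration, and it assumes "50KByte" means 50 * 1024 bytes and that a T1 carries 1.544 Mbit/sec. It ignores framing overhead and propagation delay.

```python
def queue_drain_seconds(buffer_bytes, link_bits_per_sec):
    """Time to drain a full router queue onto the wide area link."""
    return buffer_bytes * 8 / link_bits_per_sec

# Assumed figures: ~50 KByte of router buffering, as in the scenario.
router_buffer = 50 * 1024

delay_56k = queue_drain_seconds(router_buffer, 56_000)     # 56 Kbit/sec line
delay_t1 = queue_drain_seconds(router_buffer, 1_544_000)   # T1 line

print(f"56 Kbit/sec: {delay_56k:.1f} s, T1: {delay_t1:.2f} s")
```

With these assumptions a full queue adds roughly 7 seconds on the 56 Kbit/sec line, the same order as the delays described above, and about a quarter of a second on a T1.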

What is happening is that slow-start is gradually increasing the congestion
window.  Eventually the router's queue overflows and a packet is dropped.
By that time, the measured round-trip-time is large (e.g. 50KByte/56kbps
save the day, keeping the pipe too full and the delays enormous.  If
other traffic is present, such as `vi` or `ping`, more than one packet
will be lost, the transmitter will see too many duplicate ACKs and give up
for a long time (remember the RTT).

None of this is theoretical.  The best solution is probably to
automagically reduce the workstation's windows, but that is non-trivial.
The 2nd best solution is to manually reconfigure all workstations so that
either the sender or the receiver has a small default window (e.g. change
tcp_sendspace and tcp_recvspace at the "remote office" to 4K), but people
resist and refuse that for non-technical reasons.  The only solution I
have been able to get installed is to reduce the buffering in the routers
to less than 1 second.  That increases the likelihood of packet losses
and TCP timeouts, but at least prevents those 10 second waits for an
`rlogin` echo.
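To put rough numbers on those two workarounds, here is a small arithmetic sketch. The function names are hypothetical, the 56 Kbit/sec rate is taken from the scenario above, and "4K" and "60K" are assumed to mean multiples of 1024 bytes. Overheads are ignored.

```python
def max_buffer_bytes(link_bits_per_sec, max_delay_sec):
    """Largest router queue that drains within max_delay_sec."""
    return int(link_bits_per_sec * max_delay_sec / 8)

def window_delay_seconds(window_bytes, link_bits_per_sec):
    """Worst-case queueing delay if one full window sits in the queue."""
    return window_bytes * 8 / link_bits_per_sec

LINK_BPS = 56_000  # wide area link rate assumed from the scenario

# Router buffering that keeps the queue under 1 second.
router_buffer = max_buffer_bytes(LINK_BPS, 1.0)

# Delay a single connection can add with a small versus a large window.
small_window_delay = window_delay_seconds(4 * 1024, LINK_BPS)
large_window_delay = window_delay_seconds(60 * 1024, LINK_BPS)

print(router_buffer, "bytes of router buffering for a 1 second queue")
print(f"4K window: {small_window_delay:.2f} s, 60K window: {large_window_delay:.2f} s")
```

Under these assumptions a 1 second limit corresponds to about 7000 bytes of router buffering, and a 4K window can queue a bit over half a second of data where a 60K window can queue nearly 9 seconds.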


>The TCP_NODELAY option does not help, and in most situations will hurt,
>since it allows TCP to send runt sized segments (rather than always
>trying to send max sized segments, which is more efficient).

Agreed, except
    - when doing bulk transfers, and when systems are careful to pick
	window sizes that are a multiple of the segment size used by
	the sender, the Nagle algorithm has no chance of coming into play

    - the original questioner said that TCP_NODELAY helped whatever was
	being done, which suggests the involvement of an application
	that has not been designed with an eye to on-the-wire traffic,
	something that the Nagle algorithm hurts.
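For reference, a minimal Python sketch of the two options open to such an application: turn the Nagle algorithm off with TCP_NODELAY, or hand TCP the whole message in one write so that no runt segments are produced in the first place. The header and body contents are invented for illustration; only the setsockopt call is the real sockets interface.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Option 1: disable the Nagle algorithm on this socket.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay_on = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
print("TCP_NODELAY enabled:", nodelay_on)

# Option 2: leave Nagle alone and avoid many small writes. Instead of
# calling send() once for the header and again for the body, build the
# whole message first and pass it to a single sendall() once connected.
header = b"LEN 5\n"   # hypothetical application header
body = b"hello"       # hypothetical payload
message = header + body

s.close()
```

The second approach is usually the kinder one to the network, since it keeps segments full without giving up the protection the Nagle algorithm provides.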


Vernon Schryver    vjs@rhyolite.com