Cell Relay Archive
Re: TCP window size
In article <D7JsLK.78y@world.std.com> craigp@world.std.com (Craig Partridge) writes:
> ...
>As several folks have noted the SOL_SNDBUF and SOL_RCVBUF options set
>the window size options. Critical here is that TCP negotiates the
>window size at setup time and picks the MINIMUM of the sender's offered
>window size (SNDBUF) and the receiver's max window size (RCVBUF).
>So both sender and receiver must request a large enough window size.

A technical detail is that the TCP window size is not negotiated. The MSS is negotiated and the minimum used, but there is no way in the TCP protocol for the receiver (the machine that advertises the window, set in BSD code according to tcp_recvspace and the RCVBUF setsockopt) to know how much buffering the sender has (set in BSD code according to tcp_sendspace and the SNDBUF setsockopt).

>In general, if you care about throughput, you should always ask for
>the largest possible window size, since TCP will dynamically adapt if
>the link cannot handle that much data. (Using the Jacobson work mentioned
>in other postings).

That is true only to a point. You should not advertise a window that is grossly large unless your network topology is trivial. Consider the following scenario:

  - 2 Ethernets, each with its own IP network (or subnet) number
  - one pair of 56Kbit/sec leased line, frame relay, or other wide area routers connecting the Ethernets. Let the routers have the ~50KByte of buffering that is (or was) the default for one major vendor of such routers.
  - one pair of UNIX workstations, one on each Ethernet, running 4.3BSD or newer TCP (fast retransmission, etc), with a default TCP/IP window of 60K (or larger)
  - start a file transfer from one workstation to the other
  - simultaneously try to do some interactive work over the link, simulated with the `ping` command.

Modify that scenario to taste, replacing the Ethernets with faster links (e.g. FDDI or 155Mbps ATM) and the 56Kbps link with a faster (e.g. T1) or slower (e.g. PPP/v.32bis) link, but keeping the speed of the LANs far faster than the WAN link.

You will see the round trip delays reported by `ping` gradually increase to 10 seconds (or the buffer size in the routers divided by the wide area link data rate), drop abruptly to nearly 0, and then repeat the sawtooth.

What is happening is that slow-start is gradually increasing the congestion window. Eventually the router's queue overflows and a packet is dropped. By that time, the measured round-trip-time is large (e.g. 50KByte/56kbps [...] save the day, keeping the pipe too full and the delays enormous. If other traffic is present, such as `vi` or `ping`, more than one packet will be lost, and the transmitter will see too many duplicate ACKs and give up for a long time (remember the RTT).

None of this is theoretical.

The best solution is probably to automagically reduce the workstations' windows, but that is non-trivial. The 2nd best solution is to manually reconfigure all workstations so that either the sender or the receiver has a small default window (e.g. change tcp_sendspace and tcp_recvspace at the "remote office" to 4K), but people resist and refuse that for non-technical reasons. The only solution I have been able to get installed is to reduce the buffering in the routers to less than 1 second. That increases the likelihood of packet losses and TCP timeouts, but at least prevents those 10 second waits for an `rlogin` echo.

>The TCP_NODELAY option does not help, and in most situations will hurt,
>since it allows TCP to send runt sized segments (rather than always
>trying to send max sized segments, which is more efficient).
Agreed, except

  - when doing bulk transfers, and when systems are careful to pick window sizes that are a multiple of the segment size used by the sender, the Nagle algorithm has no chance of coming into play
  - the original questioner said that TCP_NODELAY helped whatever was being done, which suggests the involvement of an application that has not been designed with an eye to on-the-wire traffic, something that the Nagle algorithm hurts.

Vernon Schryver
vjs@rhyolite.com
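
For readers who would rather shrink the window for a single application than change the system-wide tcp_sendspace and tcp_recvspace defaults, the per-socket route through the SNDBUF and RCVBUF options mentioned above might look roughly like the sketch below. The helper name make_small_window_socket and the 4K default are assumptions chosen for illustration, not anything from the BSD sources. The kernel is free to adjust the requested size (Linux, for one, doubles it), so the code reads the value back rather than trusting the request.

```python
import socket


def make_small_window_socket(bufsize: int = 4096) -> socket.socket:
    """Create a TCP socket whose send and receive buffers are requested
    to be `bufsize` bytes (hypothetical helper, for illustration only).

    The options are set before connect()/listen() so the request is in
    place before the connection is established.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Limits how much unacknowledged data the sender will hold.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    # Bounds the window this end will advertise as a receiver.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    return s


if __name__ == "__main__":
    sock = make_small_window_socket(4096)
    # The kernel may round or scale the request; report what it chose.
    print("SO_SNDBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    print("SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    sock.close()
```

Since the window is not negotiated, setting a small buffer on either end is enough to limit the data in flight: a small RCVBUF caps what the receiver advertises, and a small SNDBUF caps what the sender can have outstanding.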
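
As a rough check on the scenario above, the delay a full router queue adds is the buffer size divided by the wide area link data rate. The sketch below works that out. The function names are made up for illustration, 50KByte is taken as 50*1024 bytes, and framing and header overhead are ignored, so the figures are only approximate.

```python
def queue_delay_seconds(buffer_bytes: int, link_bits_per_sec: float) -> float:
    """Time for a completely full router queue to drain onto the WAN link."""
    return buffer_bytes * 8 / link_bits_per_sec


def max_buffer_bytes(link_bits_per_sec: float, max_delay_sec: float) -> int:
    """Largest router buffer that keeps the queueing delay under max_delay_sec."""
    return int(link_bits_per_sec * max_delay_sec / 8)


if __name__ == "__main__":
    # ~50 KByte of router buffering in front of a 56 kbit/s line.
    delay = queue_delay_seconds(50 * 1024, 56_000)
    print(f"full-queue delay: {delay:.1f} s")  # about 7.3 s

    # Buffering that corresponds to 1 second on the same line.
    print("1-second buffer:", max_buffer_bytes(56_000, 1.0), "bytes")  # 7000 bytes
```

Under these assumptions a 60K window is more than enough to fill the whole 50KByte queue, while keeping the router buffering below 1 second on a 56Kbit/sec line means roughly 7000 bytes or less.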