2008-11-07 05:45:48 克理斯 (Chris), posted in Internet!

Tuning TCP/IP to Speed Up Transfer Throughput, and Settings for Program Development (Part 1)

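I happened to pick up a book lying on my desk, O'Reilly's TCP Tuning and Network Troubleshooting!
In it I found a passage I had marked in an earlier article. It seemed very useful for tuning TCP/IP transfers, so I'm recording it here for any System Administrators and Engineers who can understand and use it!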
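The formula for network throughput is: Throughput = buffer size / latency (round-trip time)
The book states that the default TCP buffer size on Windows XP is 17,520 bytes,
so with a 40 ms (0.04 sec) latency the maximum possible throughput on WinXP is:
17520 / 0.04 sec ≈ 0.44 MB/sec ≈ 3.5 Mbits/sec
On Mac OS X the TCP buffer size is 64K (65,536 bytes),
so the maximum possible throughput on Mac OS X is:
65536 / 0.04 sec ≈ 1.64 MB/sec ≈ 13.1 Mbits/sec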
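
Here is a minimal Python sketch of the formula above that reproduces the two examples. It assumes a 40 ms round-trip time, as the figures imply, and takes 64 KB as 65,536 bytes for Mac OS X. The second helper turns the same relation around to give the bandwidth*delay product (BDP) that the PSC notes below refer to. The 100 Mbit/s path in the last line is a hypothetical example.

```python
def max_throughput(buffer_bytes, rtt_seconds):
    """Upper bound on one TCP connection's throughput, in bytes/second.

    At most one buffer's worth of unacknowledged data can be in flight
    per round trip, so throughput <= buffer size / RTT.
    """
    return buffer_bytes / rtt_seconds


def bdp_bytes(bandwidth_bits_per_s, rtt_seconds):
    """Bandwidth*delay product: the buffer needed to keep the path full."""
    return bandwidth_bits_per_s * rtt_seconds / 8


RTT = 0.04  # 40 ms round trip, as assumed in the examples above

for name, buf in [("Windows XP", 17520), ("Mac OS X", 65536)]:
    bps = max_throughput(buf, RTT)
    print(f"{name}: {bps / 1e6:.2f} MB/s = {bps * 8 / 1e6:.2f} Mbit/s")

# Hypothetical 100 Mbit/s path with the same RTT: buffer needed to fill it
print(f"BDP: {bdp_bytes(100e6, RTT):,.0f} bytes")
```

A 100 Mbit/s path with a 40 ms RTT needs about 500,000 bytes of buffer. That is far more than either default.
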
The five factors for tuning TCP performance, following http://www.psc.edu/networking/projects/tcptune/#options:

1. Maximum TCP Buffer (Memory) space: All operating systems have some global mechanism to limit the amount of system memory that can be used by any one TCP connection.

On some systems, each connection is subject to a memory limit that is applied to the total memory used for input data, output data and control structures. On other systems, there are separate limits for input and output buffer space for each connection.

Today almost all systems are shipped with Maximum Buffer Space limits that are far too small for nearly all of today's Internet. Furthermore the procedures for adjusting the memory limits are different on every operating system. You must follow the appropriate detailed procedures below, which generally require privileges on multi-user systems.

2. Socket Buffer Sizes: Most operating systems also support separate per connection send and receive buffer limits that can be adjusted by the user, application or other mechanism as long as they stay within the maximum memory limits above. These buffer sizes correspond to the SO_SNDBUF and SO_RCVBUF options of the BSD setsockopt() call.

The socket buffers must be large enough to hold a full BDP (bandwidth*delay product) of TCP data plus some operating system specific overhead. They also determine the Receiver Window (rwnd), used to implement flow control between the two ends of the TCP connection. There are several methods that can be used to adjust socket buffer sizes:

  • TCP Autotuning automatically adjusts socket buffer sizes as needed to optimally balance TCP performance and memory usage. Autotuning is based on an experimental implementation for NetBSD by Jeff Semke, and further developed by Wu Feng's DRS and the Web100 Project. Autotuning is now enabled by default in current Linux releases (after 2.6.6 and 2.4.16). It has also been announced for Windows Vista and Longhorn. In the future, we hope to see all TCP implementations support autotuning with appropriate defaults for other options, making this website largely obsolete.

  • The default socket buffer sizes can generally be set with global controls. These default sizes are used for all socket buffer sizes that are not set in some other way. For single user systems, manually adjusting the default buffer sizes is the easiest way to tune arbitrary applications. Again, there is no standard method to do this; you must refer to the detailed procedures below.

    Since overbuffering can cause some applications to behave poorly (typically causing sluggish interactive response) and risks running the system out of memory, large default socket buffers have to be considered carefully on multi-user systems. We generally recommend default socket buffer sizes that are slightly larger than 64 kBytes, which is still too small for optimal bulk transfer performance in most environments. It has the advantage of easing some of the difficulties of debugging the TCP Window scale option (see below), without causing problems due to overbuffering interactive applications.

  • For custom applications, the programmer can choose the socket buffer sizes using a setsockopt() system call. A Detailed Programmers Guide by Von Welch at NCSA describes how to set socket buffer sizes within network applications.

  • Some common applications include built-in switches or commands to permit the user to manually set socket buffer sizes. The most common examples include iperf (a network diagnostic), many ftp variants (including gridftp) and other bulk data copy tools. Check the documentation on your system to see what is available.

    This approach forces the user to manually compute the BDP for the path and supply the proper command or option to the application.

  • There has been some work on autotuning within the applications themselves. This approach is easier to deploy than kernel modifications and frees the user from having to compute the BDP, but the application is hampered by having limited access to the kernel resources it needs to monitor and tune.

    • NLANR/DAST has an FTP client which automatically sets the socket buffer size to the measured bandwidth*delay product for the path. This client can be found at http://dast.nlanr.net/Projects/Autobuf/
    • NLANR/NCNE maintains a tool repository which includes application enhancements for several versions of FTP and rsh. Also included on this site is the nettune library for performing such enhancements yourself.
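
The setsockopt() approach for custom applications can be sketched in Python, whose socket module wraps the same BSD call. The 100 Mbit/s, 40 ms path used to size the buffer is a hypothetical example. make_tuned_socket is an illustrative name, not an existing API. The buffers are set before the socket connects, so the window advertised in the opening handshake can reflect them.

```python
import socket


def make_tuned_socket(bufsize):
    """Create a TCP socket with enlarged send and receive buffers.

    Call this before connect()/listen(): the window (and window scale)
    offered in the SYN is based on the buffer size at that moment.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    return s


# Hypothetical path: 100 Mbit/s with a 40 ms RTT -> BDP of 500,000 bytes
bdp = int(100e6 * 0.04 / 8)
sock = make_tuned_socket(bdp)
# Read back what the kernel actually granted; it may differ from bdp.
print("SO_SNDBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("SO_RCVBUF:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```

The value read back is often not the value requested. Linux doubles it for bookkeeping overhead and caps it at its global maximum. FreeBSD may refuse a value above kern.ipc.maxsockbuf. This is why the global limit from factor 1 usually has to be raised as well.
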

3. TCP Large Window Extensions (RFC1323): These enable optional TCP protocol features (window scale and time stamps) which are required to support large BDP paths.

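      The window scale option (WSCALE) is required for TCP windows larger than 64 kBytes, and it can only be negotiated in the SYN segments at the very start of a connection; a server may only reply with WSCALE if the client's SYN carried it. Note that under these constraints (which are common to many platforms), a client application wishing to send data at high rates may need to set its own receive buffer to something larger than 64 kBytes before it opens the connection to ensure that the server properly negotiates WSCALE.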
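
The sketch below shows why the buffer size at connection setup matters. It computes the smallest window-scale shift that can express a given window. The 16-bit window field tops out at 65,535 bytes, and RFC 1323 allows shifts of 0 to 14. wscale_shift is a hypothetical helper. Real stacks pick the shift by their own rules, often from the maximum permitted buffer, so this only illustrates the arithmetic.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field
MAX_SHIFT = 14               # largest window-scale shift allowed by RFC 1323


def wscale_shift(window_bytes):
    """Smallest shift s with 65535 << s >= window_bytes (capped at 14)."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift


for buf in (65535, 65536, 500_000, 4_000_000):
    print(buf, "->", wscale_shift(buf))
```

A buffer of 65,535 bytes needs no scaling. Anything larger needs a non-zero shift, and that shift must be fixed when the SYN is sent.
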

     Another RFC1323 feature is the TCP Timestamp option which provides better measurement of the Round Trip Time and protects TCP from data corruption that might occur if packets are delivered so late that the sequence numbers wrap before they are delivered. Wrapped sequence numbers do not pose a serious risk below 100 Mb/s, but the risk becomes progressively larger as the data rates get higher.

     Due to the improved RTT estimation, many systems use timestamps even at low rates.
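
The 100 Mb/s threshold can be checked with a quick calculation. TCP sequence numbers count bytes in a 32-bit space, so the time to wrap is 2^32 × 8 bits divided by the data rate. Comparing that with the 2-minute maximum segment lifetime (MSL) from RFC 793 shows when a very late packet could be mistaken for new data. The helper name is illustrative.

```python
SEQ_SPACE_BYTES = 2 ** 32  # TCP sequence numbers are 32-bit byte counters


def seconds_to_wrap(rate_bits_per_s):
    """Time to cycle through the whole TCP sequence space at a given rate."""
    return SEQ_SPACE_BYTES * 8 / rate_bits_per_s


for label, rate in [("10 Mb/s", 10e6), ("100 Mb/s", 100e6),
                    ("1 Gb/s", 1e9), ("10 Gb/s", 10e9)]:
    print(f"{label:>8}: {seconds_to_wrap(rate):8.1f} s")
```

At 100 Mb/s the sequence space lasts about 344 s, well beyond a 120 s MSL. At 1 Gb/s it wraps in about 34 s, so the timestamp-based PAWS check from RFC 1323 becomes necessary.
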

4. TCP Selective Acknowledgments Option (SACK, RFC2018) allows a TCP receiver to inform the sender exactly which data is missing and needs to be retransmitted.

Without SACK, TCP has to estimate which data is missing, which works just fine if all losses are isolated (only one loss in any given round trip). Without SACK, TCP often takes a very long time to recover following a cluster of losses, which is the normal case for a large BDP path with even minor congestion. SACK is now supported by most operating systems, but it may have to be explicitly turned on by the system administrator.

If you have a system that does not support SACK, you can often raise TCP performance by slightly starving it for socket buffer space. The buffer starvation prevents TCP from being able to drive the path into congestion and minimizes the chances of causing clustered losses.

Additional information on commercial and experimental implementations of SACK is available at http://www.psc.edu/networking/projects/sack/.
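
Here is a simplified Python sketch of the information a SACK receiver reports. Out-of-order byte ranges above the cumulative ACK are merged into at most four (left, right) blocks. The gaps between the blocks are exactly what the sender needs to retransmit. The function names are hypothetical. The sketch lists blocks in sequence order and ignores RFC 2018's rule that the first block must describe the most recently received segment.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent half-open byte ranges [start, end)."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def sack_option(cum_ack, out_of_order, max_blocks=4):
    """Receiver side: SACK blocks for data held beyond the cumulative ACK."""
    blocks = merge_ranges(r for r in out_of_order if r[0] > cum_ack)
    return blocks[:max_blocks]


def holes(cum_ack, blocks):
    """Sender side: byte ranges that the SACK blocks show are missing."""
    gaps, edge = [], cum_ack
    for left, right in blocks:
        if left > edge:
            gaps.append((edge, left))
        edge = max(edge, right)
    return gaps


acked = 1000  # receiver has everything below byte 1000
ooo = [(5000, 6000), (2000, 3000), (3000, 4000)]
blocks = sack_option(acked, ooo)
print("SACK blocks:", blocks)
print("retransmit:", holes(acked, blocks))
```

Without these blocks, the sender would only see "everything below 1000 arrived". It would have to guess that 4000 to 5000 is also missing.
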

5. Path MTU: The host system must use the largest possible MTU for the path. This may require enabling Path MTU Discovery (RFC1191, RFC1981, RFC4821).

Since RFC1191 is flawed it is sometimes not enabled by default and may need to be explicitly enabled by the system administrator. RFC4821 describes a new, more robust algorithm for MTU discovery and ICMP black hole recovery. See our page on jumbo MTUs for more information.

The Path MTU Discovery server (described in more detail on the PSC page above) may be useful for checking the largest MTU supported by some paths.
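
Comparing header overhead and packet rates shows why the largest possible MTU helps. The sketch assumes IPv4 and TCP headers without options (40 bytes in total) and ignores link-layer framing. The example MTUs are 576 (the traditional IPv4 default), 1500 (standard Ethernet) and 9000 (a common jumbo frame size).

```python
IP_TCP_HEADERS = 40  # IPv4 (20) + TCP (20) bytes, assuming no options


def mss(mtu):
    """Maximum TCP payload per packet for a given MTU."""
    return mtu - IP_TCP_HEADERS


def packets_per_second(rate_bits_per_s, mtu):
    """Full-sized packets per second needed to carry rate of TCP payload."""
    return rate_bits_per_s / 8 / mss(mtu)


for mtu in (576, 1500, 9000):
    print(f"MTU {mtu:5d}: MSS {mss(mtu):5d}, "
          f"payload share {mss(mtu) / mtu:6.1%}, "
          f"{packets_per_second(1e9, mtu):9,.0f} pkt/s at 1 Gb/s")
```

At 1 Gb/s, a 1500-byte MTU needs roughly 85,600 full-sized packets per second, while a 9000-byte MTU needs about 14,000. The fixed per-packet costs are paid far less often.
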


Now that we know the principles, how do we tune Linux, FreeBSD and WinXP systems?

Today let's try FreeBSD first; Linux, WinXP and the program development part will follow tomorrow!

Tuning on FreeBSD

Raise the maximum socket buffer size:

 > sysctl -w kern.ipc.maxsockbuf=4000000

There are also three TCP/UDP parameters that can be set. Before using them, set 'tcp_extensions="YES"' in /etc/rc.conf and enable the sysctl parameter net.inet.tcp.rfc1323:

net.inet.tcp.sendspace

net.inet.tcp.recvspace

net.inet.udp.sendspace

FreeBSD enables "inflight limiting" by default, and this feature also affects TCP throughput!

It can be turned off manually:

 > sysctl -w net.inet.tcp.inflight_enable=0

MTU discovery is also enabled by default on FreeBSD; it is controlled by the sysctl parameter net.inet.tcp.path_mtu_discovery.
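
The FreeBSD settings above can be collected into /etc/sysctl.conf so they survive a reboot. The sketch below renders such a fragment from a chosen buffer size. freebsd_sysctl_conf is a hypothetical helper. The 500,000-byte value is just the BDP of an example 100 Mbit/s, 40 ms path. net.inet.tcp.inflight_enable may not exist on newer FreeBSD releases, so check each name with sysctl first. net.inet.udp.sendspace is left out because its right value depends on the UDP application.

```python
def freebsd_sysctl_conf(buffer_bytes, maxsockbuf=4_000_000):
    """Render name=value lines for /etc/sysctl.conf (hypothetical helper).

    buffer_bytes: default TCP send/receive space, e.g. the path's BDP.
    maxsockbuf: global socket buffer limit; must not be below buffer_bytes.
    """
    if buffer_bytes > maxsockbuf:
        raise ValueError("buffer_bytes exceeds kern.ipc.maxsockbuf")
    settings = {
        "kern.ipc.maxsockbuf": maxsockbuf,
        "net.inet.tcp.rfc1323": 1,               # window scale + timestamps
        "net.inet.tcp.sendspace": buffer_bytes,
        "net.inet.tcp.recvspace": buffer_bytes,
        "net.inet.tcp.inflight_enable": 0,       # turn off inflight limiting
        "net.inet.tcp.path_mtu_discovery": 1,    # keep PMTU discovery on
    }
    return "\n".join(f"{name}={value}" for name, value in settings.items()) + "\n"


print(freebsd_sysctl_conf(500_000), end="")
```

The tcp_extensions="YES" line still belongs in /etc/rc.conf, not in sysctl.conf.
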