The IP over ATM Mailing List Archive by date


AAL5 CRC and TCP.

  • From: Craig Partridge <craig@aland.bbn.com>
  • Date: Mon, 23 Oct 95 12:35:19 -0700
  • CC: ip-atm@matmos.hpl.hp.com


    The AAL5 specification calls for a CRC-32 over each AAL5 frame.
    With this being done, and given that lines today
    are better than when TCP was first implemented, is there
    still a need to do a checksum in TCP when running over ATM?

    I was wondering if most people think not doing the checksum
    would be safe.

Pat:

    Jim Hughes, Jonathan Stone, and I published a study of the AAL5 CRC and
the TCP checksum at SIGCOMM '95.

    We didn't look at your question directly, but my view is that taking out
the checksum would be a bad idea.  Here's the basic logic.  The study showed
that under a particular class of error scenarios, using real data (i.e.,
this was an empirical study), the CRC failed to detect about one in 2^32
corrupted packets and the TCP checksum failed to detect about one in 2^10
corrupted packets.  Furthermore, while there's nowhere near enough data to be
sure, the two failure rates seem to be independent, so combined we might see
one undetected error in 2^42 corrupted packets (2^32 x 2^10).  These error
rates were specific to the error scenarios we studied, but let's assume for a
moment they are generic (since, to my knowledge, no one else has done any
empirical studies at all).
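As a rough illustration of why the two checks behave so differently (not taken from the study itself), here is a minimal sketch of a 16-bit ones' complement checksum in the style of RFC 1071, compared against Python's `zlib.crc32`. Note that `zlib.crc32` uses the same CRC-32 generator polynomial as AAL5 but, I believe, a different bit ordering, so treat its values as a stand-in rather than real AAL5 CRCs. The helper name `internet_checksum` and the 64-byte payload are hypothetical. Swapping two 16-bit words leaves a ones' complement sum unchanged, while a 32-bit CRC is guaranteed to catch any error confined to 32 consecutive bits.

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """Hypothetical helper: 16-bit ones' complement checksum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold the end-around carry
    return ~total & 0xFFFF

original = bytes(range(1, 65))  # 64 bytes of arbitrary example payload
# Corrupt the payload by swapping its first two 16-bit words.
corrupted = original[2:4] + original[0:2] + original[4:]

print("checksum:", hex(internet_checksum(original)), hex(internet_checksum(corrupted)))
print("crc32:   ", hex(zlib.crc32(original)), hex(zlib.crc32(corrupted)))
```

The checksums match, so the corruption goes unnoticed by the sum, while the CRC values differ. Real errors are rarely this tidy; the study's point is that on real data the checksum's misses are far more frequent than the CRC's.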

    So the question is: is 2^32 good enough?  Let's suppose we've got a
gigabit ATM link (which, for easy math, we'll say is 2^30 bits per second).
The average TCP/IP packet size is about 1000 bits (2^10 bits).  So 2^32
packets is 2^42 bits, which our gigabit link transmits in 2^12 seconds.
That's about 70 minutes.
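The arithmetic in the paragraph above can be checked with a few lines of Python. The function name `seconds_until_undetected` is hypothetical; the sketch assumes, as the post does, a link that corrupts every packet, a line rate of 2^30 bits per second, 2^10-bit packets, and a CRC miss rate of one in 2^32 corrupted packets.

```python
def seconds_until_undetected(link_bps: float, packet_bits: float, miss_log2: int) -> float:
    """Hypothetical helper: expected seconds before one corrupted packet slips
    past a check that misses one in 2**miss_log2 corrupted packets, on a link
    that corrupts every packet it carries."""
    packets_per_miss = 2 ** miss_log2
    return packets_per_miss * packet_bits / link_bps

crc_only = seconds_until_undetected(link_bps=2**30, packet_bits=2**10, miss_log2=32)
print(crc_only, "seconds =", round(crc_only / 60, 1), "minutes")
```

This gives 2^12 = 4096 seconds, a little over 68 minutes, matching the "about 70 minutes" estimate.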

    So, if you have a link that is really sick and trashing every packet, a
bad packet would slip past the CRC in a bit over an hour.  With the CRC plus
the TCP checksum, it would be closer to two months (2^22 seconds, about 48
days) before a packet slipped past.

    Now, one will immediately say that no one is going to let a link run sick
for an hour and ten minutes.  That's probably true, but imagine a link that
is slightly sick and has brief errored periods over the course of a week:
what's the chance the total errored time exceeds an hour?  The point is that
the safety margin with just the CRC is too low.  (And it shrinks as bandwidth
grows; folks are fast approaching 100 Gb/s SONET links in the lab.)
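To see how the margin shrinks with bandwidth, the same back-of-the-envelope model can be run at several line rates. This is a sketch under the post's assumptions (every packet corrupted, 2^10-bit packets, miss rates of one in 2^32 for the CRC alone and one in 2^42 for CRC plus checksum); the constant names and the rounding of "N Gb/s" to N x 2^30 bits per second are illustrative choices, not figures from the study.

```python
GIGABIT = 2**30       # the post's rounded "gigabit": 2^30 bits per second
PACKET_BITS = 2**10   # the post's average TCP/IP packet size, ~1000 bits

def seconds_until_undetected(link_bps: float, miss_log2: int) -> float:
    """Expected seconds of fully errored traffic before one bad packet slips
    past a check that misses one in 2**miss_log2 corrupted packets."""
    return (2 ** miss_log2) * PACKET_BITS / link_bps

for multiple in (1, 10, 100):
    rate = multiple * GIGABIT
    crc = seconds_until_undetected(rate, 32)    # AAL5 CRC alone
    both = seconds_until_undetected(rate, 42)   # CRC plus TCP checksum
    print(f"{multiple:>3} Gb/s: CRC alone {crc / 60:7.1f} min, "
          f"CRC + checksum {both / 86400:5.1f} days")
```

Because the interval is inversely proportional to the line rate, at 100 Gb/s the CRC-only margin drops to roughly 41 seconds of errored traffic.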

Craig

PS: Oh yes, you may say, well, what's one bad packet?  There are legions of
NFS users who have suffered bit rot on Ethernets when an occasional bad packet
slipped past the 32-bit CRC and, because the packets weren't checksummed, lost
entire filesystems as a result.