The MPLS-OPS Archive

RE: Doubt in MPLS fault-tolerance

  • From: nbwaite@attglobal.net
  • Date: Fri, 22 Feb 2002 21:44:34 -0500 (EST)
  • Resent-Date: Fri, 22 Feb 2002 22:13:19 -0500
  • To: mpls-ops@mplsrc.com

Ramachandra Bachala:

   > I got some doubts.  Regarding my thesis, I would
   > like to do something that is new.  Then, I got an
   > idea.
   >
   > Considering fault-tolerance in MPLS, We will always
   > have back-up path for the current path.  If current
   > path fails, the traffic is shifted onto the backup
   > one.  Once restored the traffic is shifted back to
   > the current one.  Now, this way the load
   > oscillations will occur which is undesirable if the
   > failure occurs frequently on the current one, since
   > the probability will be higher to failure.  So, what
   > if we can use the backup path as the main path,
   > though previous path is restored.  Now, the load
   > oscillations will be less.  What is your idea on
   > this.  Is this worth enough? or is this ridiculous?

Sure, systems with automatic controls can oscillate.
But, it is premature to consider oscillations.

You might consider:

Identify and describe, at least roughly, the collection
of networks you are considering.

Your efforts in fault tolerance will likely not cover
everything that is conceivable.  So, get a description of
the kinds of failures you will be 'tolerant' to.

Stand back a little, then identify and describe in
sufficient detail the 'uncontrolled exogenous' influences
that are causing the failures.

Since it appears that you want to do this work in the
context of probability, construct an appropriate
probabilistic model of these influences and failures:

     The key content of such a model includes (1) the
     distributions of various relevant random variables
     and (2) the conditional independence relationships
     among these random variables.  In simpler terms, for
     (1) you need random variables whose values will
     describe, in sufficient detail for your purposes,
     the system as it operates and fails; for (2) you
     need to know what random variables are independent
     of what other random variables.  This statement of
     (2) is simplified:  you will likely need conditional
     distributions and conditional independence.  E.g.,
     your model may include a Markov assumption, under
     which the past and future are conditionally
     independent given the present.  E.g., you may want
     an arrival process with stationary and independent
     increments; among counting processes whose jumps are
     all of size one, there is only one such process, the
     Poisson process.  E.g., you seem to have in mind
     that the conditional probability of a failure is
     higher given a recent failure; you will need to make
     this precise and convert it into justifiable
     mathematics you can manipulate for your main goals.
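As one hypothetical way to make "failure is more likely given a recent failure" concrete, the Python sketch below works in discrete time slots: a path fails in a slot with probability p_after if it failed within the last `window` slots, and with probability p_base otherwise. The function names, parameters, and numbers are illustrative assumptions, not from the question or any standard. With p_base equal to p_after the slots are independent Bernoulli trials, the discrete-time analogue of a Poisson process; the estimator shows how such a dependence might be checked against a failure log.

```python
import random

def simulate_failures(n_slots, p_base, p_after, window, seed=0):
    """Return one 0/1 failure indicator per time slot.

    Hypothetical model: in each slot the path fails with probability
    p_after if it failed within the previous `window` slots, and with
    probability p_base otherwise.
    """
    rng = random.Random(seed)
    trace = []
    last_failure = None  # slot index of the most recent failure
    for t in range(n_slots):
        recent = last_failure is not None and t - last_failure <= window
        failed = rng.random() < (p_after if recent else p_base)
        trace.append(int(failed))
        if failed:
            last_failure = t
    return trace

def conditional_failure_rates(trace, window):
    """Estimate P(fail | failure in last `window` slots) and P(fail | none)."""
    tallies = {True: [0, 0], False: [0, 0]}  # recent? -> [failures, slots]
    last_failure = None
    for t, failed in enumerate(trace):
        recent = last_failure is not None and t - last_failure <= window
        tallies[recent][0] += failed
        tallies[recent][1] += 1
        if failed:
            last_failure = t
    return tuple(f / n if n else float("nan")
                 for f, n in (tallies[True], tallies[False]))

if __name__ == "__main__":
    trace = simulate_failures(200_000, p_base=0.001, p_after=0.05, window=20)
    after_recent, otherwise = conditional_failure_rates(trace, window=20)
    print(f"P(fail | recent failure)    ~ {after_recent:.4f}")
    print(f"P(fail | no recent failure) ~ {otherwise:.4f}")
```

The two estimates should come out close to the 0.05 and 0.001 put into the simulation; on real data, a large gap between them would be evidence against a simple Poisson model of failures.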

So, now you have some descriptions of your system and the
causes of failure.

Now you need to move on to what you are going to do about
the failures.

     Broadly, one issue is to detect the failures and a
     second issue is to diagnose them, that is, find the
     causes.  Neither task is always trivial.  Some work in
     detection is in

          N. B. Waite, "A Real-Time System-Adapted
          Anomaly Detector", 'Information Sciences',
          volume 115, April, 1999, pages 221-259.

     But, for this discussion, suppose we just leave
     detection and diagnosis up to your network
     operations center (NOC) and propose that they report
     failures and their causes.  We are not saying that
     the NOC's work is easy or that no problems remain;
     we are just assuming for our purposes they are able
     to do the work.

So, you have a failure and its cause:  Now what?

Well, what do you want?  Yes, you want to get the network
fully operating again.  So, broadly you will want to
exploit some spare capacity, reroute traffic, get repairs
to correct the specific failure, and possibly to consider
what you might do if there were a second failure while
the repairs for the first were still in progress.  More
generally still, at least in theory, there is the problem
of several failures arriving while several repairs are
still in progress.
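To put a rough number on the second-failure case, here is a Python sketch under assumptions chosen for illustration only: the other k links fail independently, each as a Poisson process with rate lambda, and the repair in progress takes either a fixed time r or an exponentially distributed time with rate mu. For a fixed repair time the probability of at least one further failure is 1 - exp(-k lambda r); for an exponential repair time it is the chance that the combined failure clock beats the repair clock, k lambda / (k lambda + mu). The function names are hypothetical.

```python
import math

def p_overlap_fixed(other_links, fail_rate, repair_time):
    """P(at least one other link fails during a repair of fixed length).

    Assumes each other link fails as an independent Poisson process with
    rate `fail_rate` (failures per hour), so their combined rate is
    other_links * fail_rate.
    """
    return 1.0 - math.exp(-other_links * fail_rate * repair_time)

def p_overlap_exponential(other_links, fail_rate, repair_rate):
    """Same question, but the repair time is exponential with rate `repair_rate`.

    The combined failure clock (rate k*lambda) races the repair clock (rate mu);
    the failure wins with probability k*lambda / (k*lambda + mu).
    """
    total = other_links * fail_rate
    return total / (total + repair_rate)

if __name__ == "__main__":
    # Hypothetical numbers: 49 other links, each failing about once a year,
    # and a repair taking 4 hours (fixed) or 4 hours on average.
    per_hour = 1 / 8760
    print(p_overlap_fixed(49, per_hour, 4.0))
    print(p_overlap_exponential(49, per_hour, 1 / 4.0))
```

With these made-up numbers both versions give a little over 2%, which suggests how one might decide whether planning for overlapping failures is worth the effort in a given network.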

You have mentioned having backups for current paths.
Well, in terminology you may consider that two nodes are
directly connected with a 'link' and that a 'path'
consists of one or more such links (joined in the
appropriate ways at nodes, with no loops, honoring
one-way considerations, etc.).  With this terminology,
you may want some 'backup' links instead of whole backup
paths, and use the backup links to construct new paths
replacing those broken by the failure.  There are
standard approaches for such rerouting, and you may wish
to assume these are what you will consider.
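The idea of building a replacement path from surviving and spare links can be illustrated with a fewest-hop search over the links that are still up. This Python sketch rests on assumptions of its own: links are one-way (list both directions for a two-way link), every link counts as one hop, and breadth-first search stands in for whatever path selection a real network would use. BFS visits each node at most once, so the path it returns has no loops. The function `reroute` and the node names are hypothetical.

```python
from collections import deque

def reroute(links, src, dst, failed):
    """Return a fewest-hop path from src to dst that avoids failed links, or None.

    links:  iterable of one-way (a, b) links, working and spare alike.
    failed: set of (a, b) links currently down.
    """
    adj = {}
    for a, b in links:
        if (a, b) not in failed:
            adj.setdefault(a, []).append(b)
    parent = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk parent pointers back to src, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

if __name__ == "__main__":
    # Working path A->B->C->D plus spare links B->E and E->C.
    links = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("E", "C")]
    print(reroute(links, "A", "D", failed=set()))
    print(reroute(links, "A", "D", failed={("B", "C")}))
```

When link B->C fails, the new path reuses A->B and C->D and only splices in the spare links around the break, which is the appeal of backup links over whole backup paths.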

Your probabilistic model for your uncontrolled exogenous
influences that are causing the failures will be a
'stochastic process'.  When you learn of an unpredictable
failure and its cause and execute a correction, that
correction is essentially a 'control'.  So, broadly your
work will be in the area of 'stochastic control'.
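To show what 'stochastic control' can mean for the original revert-or-stay question, here is a small Markov decision process solved by value iteration in Python. Everything in it is a hypothetical assumption for illustration: the backup path never fails; the primary is 'stable', 'fragile' (recently repaired and failing more often), or 'down'; each time slot we choose which path carries the traffic; and the costs are a one-time switch cost, a per-slot penalty for using the (say, longer) backup, and a 'hit' when the primary fails while carrying traffic. The objective is expected discounted cost; all names and numbers are invented.

```python
# Hypothetical model: the backup path never fails; the primary path is in
# one of three states, and a recently repaired primary fails more often.
PRIMARY_STATES = ("stable", "fragile", "down")

def primary_transitions(state, p):
    """One-slot transition probabilities of the primary (independent of routing)."""
    if state == "stable":
        return {"down": p["fail_stable"], "stable": 1 - p["fail_stable"]}
    if state == "fragile":
        up = 1 - p["fail_fragile"]
        return {"down": p["fail_fragile"],
                "stable": up * p["settle"],
                "fragile": up * (1 - p["settle"])}
    return {"fragile": p["repair"], "down": 1 - p["repair"]}  # state == "down"

def solve(p, c, gamma=0.99, tol=1e-9):
    """Value iteration for the expected discounted cost of routing decisions.

    A state is (primary_state, traffic_on_primary). Each slot we choose a path;
    costs are c["switch"] per change of path, c["backup"] per slot on the
    backup, and c["hit"] if the primary fails while carrying traffic.
    """
    states = [(ps, on) for ps in PRIMARY_STATES for on in (True, False)]
    V = {s: 0.0 for s in states}
    while True:
        new_V, policy = {}, {}
        for ps, on in states:
            options = []
            for use_primary in (True, False):
                if use_primary and ps == "down":
                    continue  # traffic cannot ride a failed path
                cost = c["switch"] if use_primary != on else 0.0
                if not use_primary:
                    cost += c["backup"]
                future = 0.0
                for nxt, prob in primary_transitions(ps, p).items():
                    if use_primary and nxt == "down":
                        cost += prob * c["hit"]
                    future += prob * V[(nxt, use_primary)]
                options.append((cost + gamma * future,
                                "primary" if use_primary else "backup"))
            new_V[(ps, on)], policy[(ps, on)] = min(options)
        if max(abs(new_V[s] - V[s]) for s in states) < tol:
            return new_V, policy
        V = new_V

if __name__ == "__main__":
    p = {"fail_stable": 0.001, "fail_fragile": 0.2, "settle": 0.1, "repair": 0.2}
    c = {"switch": 1.0, "backup": 0.2, "hit": 5.0}
    _, policy = solve(p, c)
    for state, action in sorted(policy.items()):
        print(state, "->", action)
```

With these sample numbers the computed policy keeps traffic on the backup while the primary is fragile and reverts once it is stable; with a free backup it never reverts. The right answer depends on the costs and the failure model, which is why the model comes first.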

As you determine your 'control', you may also be
concerned with effects on capacity, performance, and
reliability.  Yet, you may not be so concerned:  It may
be that time to repair is so much shorter than time to
failure that what happens as the repairs are being
implemented is of no real concern.
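One quick way to judge whether behavior during repairs matters is long-run availability. Assuming, for illustration, that each link alternates between up periods with mean MTTF and repair periods with mean MTTR, the long-run fraction of time it is up is A = MTTF / (MTTF + MTTR); if n links fail and are repaired independently, the fraction of time at least one of them is under repair is 1 - A^n. The Python sketch and the numbers in it are assumptions, not measurements.

```python
def availability(mttf, mttr):
    """Long-run fraction of time a single link is up (alternating up/repair periods)."""
    return mttf / (mttf + mttr)

def fraction_any_down(n_links, mttf, mttr):
    """Long-run fraction of time at least one of n independent links is in repair."""
    return 1.0 - availability(mttf, mttr) ** n_links

if __name__ == "__main__":
    # Hypothetical numbers: 50 links, each up about a year (8760 h) between
    # failures, with a 4-hour mean repair.
    print(availability(8760.0, 4.0))
    print(fraction_any_down(50, 8760.0, 4.0))
```

Here a single link is down roughly 0.05% of the time, while at least one of the 50 links is under repair a little over 2% of the time; whether that matters depends on what the network is exposed to during those intervals.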

And, oscillation per se may not be much of a concern.

What might be a concern?  Well, usually people are
concerned with cost.  So, you might focus on reducing
expected cost.

But, quite broadly you should want your work to be "new,
correct, and significant".  For the third, aim for a
good solution to an important problem.  For the
first, it is important to see what is already in the
literature.


Norman B. Waite, Ph.D.
Network Architectonics
9 Fox Run
Wappingers Falls, NY 12590
nbwaite@attglobal.net
845-227-7821

-------
The MPLS-OPS Mailing List
Archive: http://www.mplsrc.com/mpls-ops_archive.shtml