Capillary multi-path routing with
Forward Error Correction
Switzernet – EPFL
1015
emin.gabrielyan@{switzernet.com,epfl.ch}
Table of contents:
II. Path Diversity Spread
Routing
IV. Adaptive Redundancy Overall
Need (ARON)
V. Real-Time streaming and
choice of an encoder
VI. Choice of the MDS FEC
block size
VIII. Building Capillary Routing
Media streaming is becoming increasingly important in Internet. The major cause for multimedia quality degradation in IP-networks is packet loss. Packet loss occurs in network due to bursts and link failures or packets may be dropped in the application, if they are received too late.
Forward Error Correction is a mechanism that may allow the streaming application to overcome losses without utilizing retransmissions. Real-time media or files sent over the internet are chopped into packets, and each packet is either received without error or not received. Packetized communications over internet thus behave like erasure channels and the erasure resilient FEC codes, such as Reed Solomon or another Maximum Distance Separable (MDS) code can be applied on the packet level to a stream of UDP (User Datagram Protocol) data packets.
With the proper amount of redundant data included in the stream of transmitted packets, erasure resilient FEC can mitigate the impact of packet loss on the data. For numerous packetized applications, employment of erasure resilient FEC codes offered spectacular results: in satellite feedback-less broadcasts of recurrent voluminous updates of GPS (Global Positioning System) maps to millions of motor vehicles under the condition of arbitrary fragmental visibility due to their locations and trajectories: tunnels, underground parking, buildings and garages [honda04]; in a film industry for a fast delivery over Internet of the day’s film footage from the location it has been shot to the studio that is many thousand miles away [hollywood03] using LT codes [Luby02]. 3GPP, 3rd Generation Partnership Project, recently adopted Raptor [Shokrollahi04] as a mandatory code in Multimedia Broadcast/Multicast Service (MBMS), a part of 3G standard, for its low CPU usage and for the significant performance in media broadcasting and file transfer [MBMS05a], [MBMS05b], [MBMS05c].
The above references are rather examples of off-line applications, where there are no constrains on the buffering time because the sender is not obliged to deliver in time the “fresh” packets of a very short life time.
Regarding from the stand point of a real-time application, in the film transfer or in the GPS map update, the buffering time is practically unlimited. Several publications studied application of FEC in real-time media streaming. [Johansson02] reported improvements from the application of FEC if the packet losses range is between 1% and 5%. [Huang05] proposed to combine retransmissions, if the round trip time (RTT) is low, with FEC, and reported improved speech quality perception in a practical experiment. [Padhye00] reported high overheads from the use of FEC during bursts but proposed to survey the history of losses (experienced by the application) after reconstruction to choose a good FEC. Other publications have shown that for two-way, delay-sensitive real-time communications, the application of FEC on the packet level can not give any valuable results at all [Altman01].
In the off-line streaming success examples (MBMS or GPS map update) it was the long (or almost unlimited) buffering time that let FEC be an efficient solution of the fault tolerance. However according to [Altman01] in real-time applications with limited buffering time, only negligible improvement from application of FEC can be obtained at the cost of very high overheads. Voice over IP, a widely demanded interactive application, is an example of a poor FEC efficiency: the ratio of the lossy network path’s failure time over the permitted limit on the buffering time is too high.
If it is still believed that FEC can however significantly improve the tolerance of real-time streaming, we must exploit other dimensions which can “replace” the long buffering time and open to FEC the true perspectives in the domain of real-time streaming.
All previous studies stressing on the poor FEC efficiency assumed that the media stream always follows a single path. Short granular failures can be still mitigated at a certain cost of the FEC overhead, but obviously, there is no FEC which can solve a problem of a complete link failure on the single route path if the failure duration is long enough to overpass the permitted maximum of the buffering time (long bursts or congestions, hardware or software circuit outages until the recovery triggering, deny of service – DOS attacks).
Application of FEC to the real-time streaming however is not hopeless, because, there exists the other unexploited dimension: the underlying network routing, an alternative orthogonal method toward the buffering. The previous studies ignored the effort that could have been provided by the underlying routing.
The advantage of the flow following multiple paths versus a single path routing is obvious: a long link failure on the single path route leaves the application completely helpless, but the application can apply FEC to recover a single link failure of a spread multi-path route.
There is an emerging body of a literature addressing the advantages of multi-path routing in media streaming. The author of [Qu04] studied video communication following alternate paths and has shown that in disjoint paths the advantages of strong FEC are significant; in the correlated paths weak FEC can be still advantageous. In [Tawan04] the author proposed path diversity in Ad-Hoc network motivated by the load balance and capacity increase concerns but mostly by the choice of the adaptive routing protocol; the achieved efficiency in use of FEC was also mentioned. In [Ma03] and [Ma04] it was suggested to replace in MANET the link level Automatic Repeat Query (ARQ) by a link level FEC assuming intelligent regenerating nodes. The routing topology is simplified into a path with serial or parallel links between the nodes and it was suggested that the end-to-end packet loss rate as a function from hops will improve. However in such a hop by hop based FEC, since the retransmission of a lost media packet may be delayed until its recovery, for the stability of the real-time media the node needs a jitter buffer (of the size of the transmission block) and the jitter buffer’s constant delay will be propagated therefore with each hop, weakening this approach for the real-time streaming. In [Nguyen02] it was proposed the server diversity in one-way video, demonstrating that streaming to a single receiver from multiple servers improves the tolerance and increases the capacity. The author is focusing on the receiver driven protocol for allocating to sending servers the rates (based on RTT values measured from the receiver) and for partitioning non-overlappingly the packets to be transmitted. But for the buffered multi-point video delivery the author should have chosen instead of Reed-Solomon (RS) a rateless uniform fountain code (random linear fountain, LT or Raptor [MacKay05]) and would therefore skip from the protocol the packet partitioning part [Byers99]. The author also ignores the network topology, assuming that the routes from the senders to the client do not share links. The same author lately studied the path diversity effect, this time, on the protection of point to point real-time video streaming using a permanent RS(30,23) encoding (23 media packets and 7 redundant packets) at the sender [Nguyen03]. The author, similarly to [Qu04] compared two-path scenarios (shortest path together with an alternate completely disjoint or with an alternate but correlated path) with the single OSPF routing strategy and has shown clear advantages of the double path routing. The choice of RS versus fountain code is justified in [Nguyen03] by the limits of the real-time application on the buffering.
If one of the axes amplifying the effect of FEC was the buffering time, then the second orthogonal ax therefore is clearly the routing. While the first ax, the buffering [Weatherspoon02], [Krunz04], gave excellent results in the off-line streaming, the second, the routing ax, is hopefully suitable for the protection of real-time two-way streaming, insurmountable by the buffering ax alone, due to interactiveness and delay sensitivity.
All above referred publications suggested the important and promising impact the routing can have to the efficiency of FEC. However the considered topologies were simple and limited to only two, possibly correlated, paths [Qu04] and [Nguyen03], or in the best case to a sequence of parallel and serial links [Ma03]. We have not found a study of an impact of a wider range of routing patterns to the FEC efficiency. There is lack of a work in the literature studying the routing as a space of possibilities or as a dimension, and seeking in there the optimal point of FEC efficiency – the friendliest routing pattern. From this point of view we can say that the previous studies only proved the advantageousness of path diversity, most of the time of double-paths, toward only single path routing. The author of [Nguyen03] concludes that deployment of the second redundant path along with the default path effectively results in a factor of 3 reduction in irrecoverable packet loss as compared to uni-path scheme. In this paper we try to present a global comparative study across much larger number of friendly routing patterns, all multi-path, virtually erected along a routing ax. Single path routing, being considered as too hostile, will be even excluded from our comparison system. We begin from the strategies similar to multi-path approaches of [Qu04], [Nguyen03] and [Ma03] as our starting point, but we will develop them step by step into more elaborated, dispersed and advantageous routing samples. Addressing the parallelism and spreading aspects in the routing patters we assume several abstractions and simplifications for other network model parameters (e.g. uniform links and lack of delay prorogation). To compare one spread routing with another one, we will introduce a measure of the routing’s advantageousness. We will demonstrate that the first level of diversity obtained by converting a single path routing to the basic multi-path routing is not the terminal achievement of the path diversity approach. The routing can be still sensibly improved (toward FEC), layer by layer. Routing ax consisting of only spread routing patterns; can further burst the efficiency of FEC. We hope that our investigations will provide guidelines for future design and development of multi-path routing strategies in multimedia transmission.
Addressing the problem of fault-tolerance from the routing point of view, we propose to route a media stream in a way which minimizes the impact of individual link failures to the quality of the media. Even if failure recovery mechanisms are assumed in the network, the way we route the end to end transmission over the network must maximally protect the media before the network detects and recovers the failure. Suggested routing does not require additional capacities from the network; it simply spreads the packetized transmission over as many independent paths as possible.
If a network can sustain a given number of independent transmissions each following a single path, then in principle, the same transmissions can traverse the network in a spread fashion, without requiring additional capacity.

Transmissions following a single path

Transmissions spread over multiple paths do not
require extra network capacity
Spread routing alone does not solve the problem of tolerance. If media stream is not capable to sustain any losses at all, by spreading the transmission the media becomes more vulnerable to link failures, since there are more links whose failure will damage the stream. However if media is equipped with a certain level of tolerance toward losses, the spreading of flow may protect the transmission. With sufficient redundancy and spreading factor the transmissions can be protected completely against any single link failure.
A tolerance of media to losses can be the part of the nature of the media itself. If the media is destined for a human perception (VOIP, image or video), it can still conserve its content with a quite high level of stochastic losses. With g729r8, g723r53, g723r63 and gsmfr VOIP compression codecs the speech remains comprehensible with 8% to 11% losses. Similarly, a video compression codec (e.g. MPEG) also ensure a certain tolerance to losses, e.g. due to passive error concealment (level of tolerance of video streams is lower compared with voice).
Natural tolerance of media may be not sufficient to make a real benefice from the spread routing. For any type of real-time media transmission, tolerance to losses can be obtained by injection at the source of some redundancy provided by erasure resilient forward error correction codes (e.g. Reed Solomon codes).
For the two figures above, assuming a tolerance of 1/3 for each media stream, in the single path scenario, any single link failure would kill one of the media streams. At the same time all multiple path streams will sustain a complete failure of any of three links.
Most internet routers employ First-In-First-Out (FIFO) policy in which, the successive arrived packets are dropped if the network buffer becomes full. Hence a model of fragmental link failures can be adopted to approximate link outages, bursts and congestions. Link is either in the operational or in a faulty state and is characterized by its average failure and operation time.
It is not however very efficient to permanently transmit with the media stream a large amount of constant redundancy. It is much better to increase the redundancy on demand. The receiving node can detect the deficit of delivered packets and demand an appropriate increase of the source coding. In this paper we propose a combination of the following three factors: (1) an optimal friendly spread routing, (2) a little permanent constant tolerance to combat bursts and (3) an added layer of a dynamic adaptive end-to-end coding to combat outages, congestions and long failures.
As a reference for the real-time streaming application we take the most widely used example: voice over IP. The driving motivation for transporting real-time voice over IP networks (apart the uniformity, fluidity and manageability aspects) is the significant cost savings achievable by eliminating or bypassing the circuit-switched TDM telephony infrastructure. Although use of VOIP is currently weak, the wide penetration of the packetized voice with possibly complete replacement of TDM technologies is practically inevitable.
Most frequent failures, congestions caused by the bursty traffic, are not supposed to be considered as link outages, since most of routers maintain QoS and the tiny VOIP stream must not suffer from the HTTP bursts, which are often less important from the user point of view, but may consume 25 to 50 times more bandwidth. The IP telephony service provider (ITSP) however often is different from the internet service provider (ISP, usually the former monopolist of the ex-ante regime providing internet via ADSL employing the last kilometer infrastructures) and the packets of the ITSP most of the time do not have particular priority in the network of the ISP. The bandwidth consumed by IP Telephony VOIP stream (using the high complexity codecs) with all IP, UDP and RTP header overheads is as low as 16 to 20 Kbps (the pure voice payload bandwidth is as low as 5.3, 6.3 or 8 kbps, the recently standardized AMR has 4.75 kbps mode). The average ADSL capacity is of 600 Kbps to 1 Mbps for the downstream and the forth of it for the upstream. The end user will have therefore absolutely no objection if, due to a short failure somewhere in internet, the IP telephony voice call application will be capable to temporarily burst its transmission rate several times saving thus the voice call from dropping. [Huang05] presented an implementation doubling the media packets at high losses and reported good results in practice. With a spread multi-path routing the need of a bit rate increase would be much fewer. As a reaction to failures arbitrarily occurring at different places in the network, the ability of a voice software or hardware (e.g. of a SIP phone) to periodically increase its transmission rate, maintaining the speech quality constantly at the high level during the phone conversation, will be highly appreciated and the bandwidth is not a concern. In the present study we therefore assume that all links have a sufficiently large capacity compared with the bandwidth of a media stream.
The cooperativeness or friendliness of a routing toward FEC
is simpler of all to measure based on the satisfaction level of a realistic
application employing end-to-end adaptive FEC. In the spread routing, at the
node which has multiple outgoing links, the router randomly allocates the
packets of the incoming queue to the outgoing links (according the load
distribution pattern of the routing). Losses at the receiver, occurring due to
temporary congestions or failures of network links have therefore a random
pattern. All links of our model have an equal probability, frequency and
duration of failure. Before the triggering of any failure detection and
recovery mechanism, a failure of a link laying on the communication path,
produces, according the portion of the traffic being still routed toward the
faulty route, random losses, observed finally at the receiver. The media stream
of the application is equipped with a constant tolerance t to certain level of losses (
). If the losses measured at the receiver are exceeding the
critical level the receiver demands the sender via a feed-back channel
injecting into network an extra redundancy. Either the feed-back loop is
assumed negligibly short compared with the duration of a failure of a link or
the usual constant tolerance of the media to random losses is high enough to
leave a sufficient time to the feedback loop, without a risk of too many
damages to the usability of the media stream. The sender must add to its
further transmission a sufficient quantity of redundancy to compensate the new
losses signaled by the receiver, thus continuing to maintain the communication
constantly at the required level of quality. The sender must compute the
redundancy need as a function from the percentage of losses p observed by the receiver in the stream
of incoming packets. Such an end-to-end feedback based adaptive FEC mechanism
is not aware of the routing and is implemented entirely on the end nodes in the
application level.
Implementation of this method in practice was proposed by several authors: [Kang05] proposed to adjust the amount of FEC based on packet loss information under dynamically changing network environment, [Xu00] proposed to dynamically combine FEC with automatic repeat request. [Padhye00] proposed to slowly tune the amount of FEC considering the history of losses measured after decoding. [Johansson02] and [Huang05] proposed to apply to packet switched network the mode adaptation scheme of Adaptive Multi-Rate (AMR) codec, adopted as the standard speech codec by 3GPP. If the network is lossy, source coding is reduced and channel coding is increased and, inversely, if the streaming is lossless the source coding is increased and a weak FEC is used.
FEC can be added to the media stream in one of the following three ways. In media specific FEC methods, the redundant information is generated using a different lower bit rate encoding algorithm than the algorithm used to encode the original packet (e.g. using the eight modes of AMR). Media specific schemes piggyback information about the present period with later packets [Padhye00], [Altman01], [Johansson02]. In a similar method, but without multimode media source encoder, i.e. abstracting from the media, packets can be extended to carry one or more duplicates of previously produced samples or their XOR products (or another form of their redundancy). While the two above methods are extending the packets of the stream in size, according the third method the redundant packets (representing a number of original packets) are extending the stream in quantity, inserting the redundant packets in the stream [Huang05], [Nguyen03]. In our model we use the third method. The following arguments are in favor of injecting the redundant packets into the stream. By adding packets of identical size we do not (or almost do not) influence on the packet loss rate (which can be a function from the link BER and the packet size or from the queue policy of routers), and thus we do not need to model individual links and routers to adjust this change. A practical advantage of adding the redundant packets without modifying the stream of original packets (using a systematic erasure resilient FEC code) is the possibility of a full transparency on the media application layer. The redundant packets can be generated and decoded back into media packets (or dropped if no media packet of the block is lost) on the edge routers (usually the NAT router of the user), leaving this process completely transparent for the media application.
Although the proposed adaptive FEC application is not aware of the underlying network topology and the routing, we define the aggregate effort of adaptive redundancy demanded from the sender by the receiver during the whole communication time as an evaluation measure of the friendliness and the advantageousness of the underlying network routing toward the tolerance. Fewer is the overall amount of the encoder’s redundancy effort demanded by the receiver to the sender, higher is the rating of the routing toward the tolerance of the communication (assuming that the routing suggestions are measured under the identical pattern of network link failures). The so defined encoder’s aggregate effort, called henceforth Adaptive Redundancy Overall Need (ARON), is the easily measurable scalar value characterizing the friendliness of the entire underlying routing (toward FEC). Any collection of routing suggestions can be sorted by their advantageousness toward tolerance, simply by measuring and comparing their ARON values.
Strictly defined ARON is the sum of all FEC overheads demanded from the sender for the compensation of sequential losses of all sub-flows carried by each individual link in the footprint of the communication path. The critical links of the path carrying the entire traffic are ignored, since the FEC required for the compensation of a failure of such links would be infinite. We assume that all routing schemes that are subject of comparisons are smart enough to delegate the load from a critical route over other links, if of course the network topology makes it possible. Thus our comparisons cannot contain a really bad routing sample which is following a single path route on a link when alternate multiple path routes are possible, and if the link is really critical by the network topology, then without a risk of affecting the comparison results such a link can be removed from all suggested routings in order to compare their remaining portions. We do not therefore study the advantage of the multiple path strategies versus single path routing, as in [Qu04], [Ma04], [Nguyen03], but we rather measure the advantageousness of routing within the scope of multi-path approaches.
If the failure periodicity is the same for all links such
that any single link fails approximately every s seconds, if the observation duration (or the duration of the
communication) is T and the duration
of a single failure is d, then
is the total failure
time of any single link during the observation period.
Let M be the
initial number of original media packets to be delivered to the receiver by the
transmission blocks. Let
be the needed size
(number of packets) of the transmission block, under
percent of random
losses in the network (observed at the receiver) for maintaining the required
QoS. Let L be the set of links laying
on the communication path, and let
be the load of a link
under the present
routing scheme. Let the percentage
of recoverable losses,
be the level of the tolerance permanently maintained in the media stream (even
if there are no observed losses, good for bursty loss environment). The
equation for ARON then looks as follows:
If we know the ARON of the given routing, we can deduce the number of redundant packets that the sender will be obliged to transmit in order to compensate all arbitrary network failures during the call duration T. We need also to know the periodicity and the duration of an individual link failure and the block transmission rate B (the block transmission rate is supposed to stay constant, while the packet transmission rate may periodically rise due to temporary increase of the number of redundant packets in the block). The number of redundant packets (the formula below) gives a practical meaning to the routing parameter ARON, but we assumed here a single link failure at a time [] (e.g. short link down-time and long link up-time).
![]()
as a function from the
network’s random loss probability p
may vary depending on the type of the encoder, the number of the media packets
in the block (the buffering time) and the requirement for QoS.
We can imagine a pseudo “real-time” file transmission application in which the reception must be maintained at a constant rate and the file transfer must be accomplished within the time of the file size over the reception rate. Thus to ensure the reception at the maximal downlink rate through a lossy network the sender must transmit at a variable rate compensating thus the arbitrary losses arising in different network locations. This could be needed for synchronous updates of multiple stations of limited downlink bandwidth or a common problem of a last kilometer bottleneck for the internet user watching a one way video [Nguyen02].
Files are chopped into packets and transmitted by a spread
route over the network possibly experiencing a faulty link, which is the same
as the communication via an erasure channel with a random loss pattern, since
the routers allocate the packets to the outgoing links randomly (according to
the routing pattern). The theoretically rateless random linear fountain code [MacKay05], or the practical LT code [Luby02], or the improved Raptor code [Shokrollahi04], all reach in the file transfer
application the Shannon capacity universally for any loss pattern (LT and
Raptor codes with the decoding failure probability of
, but lower in the recent versions of Raptor). We can
maintain the constant receive rate
by maintaining the
packet transmission variable rate at
, where p is the
current random loss probability. At the same time the above cited fountain
codes ensure the successful decoding (with a very little overhead) because they
are simultaneously near-optimal for any erasure channel. Thus the routing
parameter ARON for a file transfers application using at the sender a rateless
universal fountain code (video chopped into files or very large blocks) and
maintaining at the receiver constant reception rate (the bottleneck of the last
kilometer), is computed by the following equation:
![]()
Let us compute the value of ARON for two routing examples over the same network shown in the figure below, where a is the sending node and b is the receiving node. Let the constant part of the tolerance of the media stream in this example be 0.1, i.e. the media stream (without additional adaptive tolerance demanded from the receiver) permanently survives the loss of up to 10%.

According to the first routing, shown in the figure below,
the flow is evenly split over two paths
and
. The ARON of this routing is equal to 3.2.
|
|
link |
load |
redundancy need |
|
|
a |
b |
0.5 |
0.8 |
|
|
a |
c |
0.5 |
0.8 |
|
|
d |
b |
0.5 |
0.8 |
|
|
c |
d |
0.5 |
0.8 |
|
|
c |
e |
0 |
0 |
|
|
e |
d |
0 |
0 |
|
The second routing is similar to the first one except that
it is slightly refined by breaking the load on the link
across itself and the
two-link alternate path
. The ARON of this new routing is 3, which is lower the ARON
of the previous routing and therefore the new routing is FEC friendlier and is
more advantageous.
|
|
link |
load |
redundancy need |
|
|
a |
b |
0.5 |
0.8 |
|
|
a |
c |
|||