PML sends UDP packets to a multicast group to distribute resources
from a single sender to a (potentially very large) group of receivers. The
receivers provide no feedback to the sender whether they have received all the
packets or not. Because packet-loss is always to be expected, Forward Error
Correction (FEC) is required to prevent certain levels of packet-loss from
impacting the correct reception of the original data.
FEC algoritms encode a checksum for a set of data. Both the
original set of data and the checksum data is then transmitted. A receiver can
then use the FEC algoritm on the set of data and the checksum data
to decode the original set of data even when part of the
original set of data is lost or corrupted during transport. To give an example,
suppose the original set of data is k = 100 bytes and we select an n = 150.
With the FEC algoritm we compute n - k = 50 checksum bytes for a total
amount of k + n = 150 bytes. The receiver uses the data it receives to
re-construct the original k = 100 bytes.
There are currently two non-patented FEC algoritms for which open-source
implementations exist:
The Reed-Solomon codec
is a so-called small-block FEC algorithm. It can only calculate
checksums for small sets of data. For practical purposes the limit for the
set of original data plus the checksum data currently is 255 bytes. For
larger sets of data the encoding and decoding times become prohibitive. To work
around this limitation a complex scheme for
encoding the data into packets is required but, even with this scheme, files
will need to be broken up into multiple parts before they can be transmitted.
The downside of breaking up a file in different parts is that
checksum data is only useful for recovering errors in the transmission of the
part to which they belong to. So, if the transmission of 9 out of 10 parts of a
file experiences no packet-loss at all but the transmission of the 10th part
experiences just 1% more packet-loss then the FEC algoritm can correct then the
transmission of the complete file has failed. An advantage of Reed-Solomon
codes is that a receiver only needs to receive an amount of data that equals
the size of the original set of data to be able to restore the original set of
data. This property of a FEC codec is called MDS (for Minimum Distance
Separation). In the example above, the receiver has to receive any k = 100 of
the k + n = 150 bytes to reconstruct the original k = 100 bytes. The advantage
of MDS is that you only need to take into account the expected packet-loss in
the transmission path to calculate the amount of checksum data. If you expect
10% packet loss for k = 100 bytes then you specify n = 10 checksum data.
The Low
Density Generator Matrix (LDGM) codec is a large block FEC codec that
supports large sets of data to be coded. These sets of data can be tens of
megabytes. The advantage of a large block codec is that any checksum packet can
be used for the complete data transmission, which is not the case with small
block codecs. Another advantage of the LDGM codec is that it is much
faster to encode and decode data then with the Reed-Solomon codes. The
disadvantage of the LDGM codec is that is is not MDS. This means that using the
example above it needs to receive k + I * k (for inefficiency) packets to
be able to compute the original data set of k packets. Research
has shown that inefficiency I is on average 12.84% for a dataset
between 2 and 10Megabytes. The median inefficiency is 11.20% and
the measured inefficiency ranges from 10.1% to 27.0%. This means that
the number of checksum packets that should be included in the transmission
not only depends on the expected packet-loss but also should take into account
an estimate on the inefficiency of the codec.
Version 0.1 of the PML software only supports the Reed-Solomon codec. The
upcoming 0.2 version will support the LDGM codec as well.