PML: Encoding and decoding resources

FEC codecs

RS coding scheme

Menu

PML sends UDP packets to a multicast group to distribute resources from a single sender to a (potentially very large) group of receivers. The receivers provide no feedback to the sender whether they have received all the packets or not. Because packet-loss is always to be expected, Forward Error Correction (FEC) is required to prevent certain levels of packet-loss from impacting the correct reception of the original data.

FEC algoritms encode a checksum for a set of data. Both the original set of data and the checksum data is then transmitted. A receiver can then use the FEC algoritm on the set of data and the checksum data to decode the original set of data even when part of the original set of data is lost or corrupted during transport. To give an example, suppose the original set of data is k = 100 bytes and we select an n = 150. With the FEC algoritm we compute n - k = 50 checksum bytes for a total amount of k + n = 150 bytes. The receiver uses the data it receives to re-construct the original k = 100 bytes.

There are currently two non-patented FEC algoritms for which open-source implementations exist:

The Reed-Solomon codec is a so-called small-block FEC algorithm. It can only calculate checksums for small sets of data. For practical purposes the limit for the set of original data plus the checksum data currently is 255 bytes. For larger sets of data the encoding and decoding times become prohibitive. To work around this limitation a complex scheme for encoding the data into packets is required but, even with this scheme, files will need to be broken up into multiple parts before they can be transmitted. The downside of breaking up a file in different parts is that checksum data is only useful for recovering errors in the transmission of the part to which they belong to. So, if the transmission of 9 out of 10 parts of a file experiences no packet-loss at all but the transmission of the 10th part experiences just 1% more packet-loss then the FEC algoritm can correct then the transmission of the complete file has failed. An advantage of Reed-Solomon codes is that a receiver only needs to receive an amount of data that equals the size of the original set of data to be able to restore the original set of data. This property of a FEC codec is called MDS (for Minimum Distance Separation). In the example above, the receiver has to receive any k = 100 of the k + n = 150 bytes to reconstruct the original k = 100 bytes. The advantage of MDS is that you only need to take into account the expected packet-loss in the transmission path to calculate the amount of checksum data. If you expect 10% packet loss for k = 100 bytes then you specify n = 10 checksum data.

The Low Density Generator Matrix (LDGM) codec is a large block FEC codec that supports large sets of data to be coded. These sets of data can be tens of megabytes. The advantage of a large block codec is that any checksum packet can be used for the complete data transmission, which is not the case with small block codecs. Another advantage of the LDGM codec is that it is much faster to encode and decode data then with the Reed-Solomon codes. The disadvantage of the LDGM codec is that is is not MDS. This means that using the example above it needs to receive k + I * k (for inefficiency) packets to be able to compute the original data set of k packets. Research has shown that inefficiency I is on average 12.84% for a dataset between 2 and 10Megabytes. The median inefficiency is 11.20% and the measured inefficiency ranges from 10.1% to 27.0%. This means that the number of checksum packets that should be included in the transmission not only depends on the expected packet-loss but also should take into account an estimate on the inefficiency of the codec.

Version 0.1 of the PML software only supports the Reed-Solomon codec. The upcoming 0.2 version will support the LDGM codec as well.