Open Source Fast File Transfers

June 18, 2012

A number of open source projects attempt to tackle accelerated file transfer over UDP. Some are more mature than others, and they use different technologies to solve the same problem of moving large amounts of data over the WAN. This article should give the reader enough information to compare the different solutions and gauge whether an open source project could be used instead of purchasing a commercial solution like FileCatalyst.

Some commercial managed file transfer solutions claiming UDP acceleration have integrated one of these open source projects into their core file transfer technology; such solutions inherit the strengths but also the weaknesses of the underlying project. FileCatalyst has developed its own UDP-based protocol written from the ground up, and does not include any code from open source UDP technology.

We will review four open source projects: three that use UDP, and one that only optimizes TCP parameters to provide fast file transfer.

A problem common to all four solutions is the lack of a graphical user interface. Two provide bare-bones sender and receiver APIs (meaning the end user has to compile the classes), while the other two come only with a command-line interface (CLI). Another common weakness is poor support for firewall traversal. While this is not an issue for internal transfers, most organizations want to send files over the WAN, which will almost certainly have at least one firewall somewhere on the route. None of these solutions fares well in the worst network conditions, where packet loss and latency are very high or where very high bandwidth must be filled. Finally, the congestion control in the UDP projects lacks the flexibility to adapt to ever-changing network conditions during a transfer.
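To illustrate what flexible congestion control means in practice, here is a toy AIMD-style (additive increase, multiplicative decrease) rate adjuster in Python. It is not taken from any of these projects; the function name and constants are illustrative only:

```python
def adjust_rate(rate_mbps, loss_fraction,
                increase_mbps=1.0, decrease_factor=0.5,
                loss_threshold=0.01, floor_mbps=0.5):
    """AIMD-style update: additive increase while loss stays below the
    threshold, multiplicative decrease once loss crosses it."""
    if loss_fraction > loss_threshold:
        return max(rate_mbps * decrease_factor, floor_mbps)
    return rate_mbps + increase_mbps

# A sender would call this once per feedback interval:
rate = 80.0
rate = adjust_rate(rate, loss_fraction=0.0)   # clean interval -> 81.0
rate = adjust_rate(rate, loss_fraction=0.05)  # 5% loss -> back off to 40.5
```

A sender that re-evaluates its rate like this every round trip can follow changing conditions; a fixed, pre-configured rate (as in some of the projects below) cannot.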

Below is a quick reference table comparing the four products. Following the table are reviews of each of the products tested.

|                                                          | UDT                          | Tsunami               | UFTP                                      | GridFTP                                |
|----------------------------------------------------------|------------------------------|-----------------------|-------------------------------------------|----------------------------------------|
| Multi-threaded                                           | no                           | no                    | no                                        | yes                                    |
| Protocol overhead                                        | 10%                          | 20%                   | ~10%                                      | 6-8% (same as TCP)                     |
| Encryption                                               | no                           | no                    | yes                                       | yes                                    |
| C++ source code                                          | yes                          | yes                   | yes                                       | yes                                    |
| Java source code                                         | partial                      | no                    | no                                        | no                                     |
| Command line                                             | no                           | no                    | yes                                       | yes                                    |
| Binaries                                                 | no (source code only)        | no (source code only) | yes (CLI only)                            | yes (CLI only)                         |
| UDP-based point-to-point                                 | yes                          | yes                   | yes                                       | no                                     |
| Firewall friendly                                        | partial (no auto-detection)  | no                    | partial (no auto-detection)               | no                                     |
| GUI client                                               | no                           | no                    | no                                        | no                                     |
| Server with secure user accounts                         | no                           | no                    | no                                        | yes                                    |
| Congestion control                                       | yes (UDP blast mode preferred) | yes (limited)       | yes (config file must be specified before the transfer starts) | yes (via TCP)     |
| Automatic retry and resume                               | no                           | no                    | no (manual resume: yes)                   | yes                                    |
| Jumbo packets                                            | yes                          | no                    | yes (up to 8800 bytes)                    | yes                                    |
| IPv6 support                                             | yes                          | no                    | no                                        | yes                                    |
| Handles any packet loss                                  | no                           | no                    | no                                        | yes                                    |
| Low bandwidth, high packet loss (e.g. satellite)         | no                           | no                    | no                                        | no                                     |
| Optimized for medium bandwidth (<155 Mbps), high latency | yes                          | yes                   | yes                                       | yes                                    |
| Optimized for high bandwidth (500 Mbps+), high latency   | no                           | no                    | no                                        | no                                     |
| Memory footprint                                         | medium                       | medium                | medium                                    | high (grows with each concurrent stream) |
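To put the protocol-overhead row in concrete terms: usable throughput is roughly the link rate minus the overhead. A quick sketch (Python, figures taken from the table above):

```python
def goodput_mbps(link_mbps, overhead_fraction):
    """Usable file-transfer rate once protocol overhead is subtracted."""
    return link_mbps * (1.0 - overhead_fraction)

link = 100.0  # Mbps
print(goodput_mbps(link, 0.20))  # Tsunami (20% overhead): 80 Mbps
print(goodput_mbps(link, 0.10))  # UDT / UFTP (~10%): 90 Mbps
print(goodput_mbps(link, 0.07))  # GridFTP (6-8%, same as TCP): ~93 Mbps
```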


1. UDT UDP-based Data Transfer

Link: http://udt.sourceforge.net/

Functionality Issues:

  • No installer and no binaries are available, both client and server have to be built from source
  • This is only a bare bones source code implementation of the sender and receiver, all the functionality around user authentication, reporting, monitoring and file management have to be implemented by the programmer

This project could only be used where two back-office servers exchange files with no firewalls in between and without any user interaction.

Core:

  • no multi-threading: a single CPU core does all the work of receiving, processing, decrypting, decompressing, and writing to disk; this may also limit the number of concurrent connections that can be serviced at once
  • poor performance on high packet loss, low bandwidth links; the default configuration is very sensitive to packet loss, and a single dropped packet can force a failed transfer
  • inflexible congestion control that adapts poorly to quickly changing network metrics; CUDPBlast is the workaround, but it does not actually provide much congestion control
  • high CPU and memory usage on very fast links (300 Mbps or higher)
  • the C++ library is relatively mature, while the Java port is still in its infancy with many reported bugs
  • no graphical client interface for point-to-point transfers
  • limited support for firewall traversal; no auto-detection of UDP connectivity is possible
  • no built-in automatic retry/resume (although it could be built by the programmer)
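On the last point, resume support could indeed be layered on top by the programmer. A minimal illustration of the idea in Python, using plain file I/O in place of the network (the helper names are hypothetical, not part of UDT):

```python
import os

CHUNK = 64 * 1024  # bytes per read/write

def resume_offset(dest_path):
    """Resume point = number of bytes already present on the receiving side."""
    return os.path.getsize(dest_path) if os.path.exists(dest_path) else 0

def transfer(src_path, dest_path):
    """Copy src to dest starting from dest's current size, so an interrupted
    transfer picks up where it left off instead of starting over."""
    offset = resume_offset(dest_path)
    with open(src_path, "rb") as src, open(dest_path, "ab") as dest:
        src.seek(offset)
        while chunk := src.read(CHUNK):
            dest.write(chunk)
    return offset  # bytes skipped thanks to the resume
```

A production version would also checksum the already-received prefix before trusting the offset, since a partially written last block could be corrupt.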

2. Tsunami UDP Protocol

Link: http://tsunami-udp.sourceforge.net/

This open source project has not been under active development for two years (unchanged since May 2010).

Functionality:

  • must be built from source (no binaries)
  • This is only a source code implementation of the sender and receiver; all the functionality around user authentication, reporting, monitoring and file management must be implemented by the programmer.

Core:

  • Only C++ source code
  • 20% protocol overhead; for example, a 100 Mbps link will only be able to send at 80 Mbps
  • no jumbo packet support
  • no multi-threading, meaning that only a single CPU core does the work of receiving, processing, decrypting, decompressing, and writing to disk. This may also limit the number of concurrent connections that can be serviced at once
  • Not optimized for very high bandwidth (100 Mbps or more)
  • Not optimized for low bandwidth, high packet loss links (e.g. satellite)
  • no graphical client interface for point-to-point transfers
  • no support for firewall traversal
  • no resume and retry (although it could be built by the programmer)

3. UFTP

Link: http://www.tcnj.edu/~bush/uftp.html

UFTP is a UDP-based file transfer protocol, and the name of a tool that implements it. UFTP is designed for particularly efficient file transfers in scenarios where the file is to be broadcast/multicast, or where the transfer occurs over a wireless link (such as satellite). However, in low-error, high-bandwidth, or high-latency scenarios it can also outperform TCP-based protocols such as FTP by 100% or more. (source: Wikipedia)

The UFTP protocol was based on the Starburst MFTP protocol (source: Wikipedia).

Functionality:

  • Comes with command line tools only
  • No firewall auto-detection, meaning UDP is always forced; there is no fallback to TCP/HTTP
  • Congestion control can only be enabled ahead of the transfer, via a pre-populated config file
  • no user account management on the server
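On the firewall point: "auto-detection" essentially means probing whether UDP traffic actually gets through before committing to it, and falling back to TCP when it does not. A rough sketch of such a probe in Python (hypothetical; not part of UFTP), using a simple echo handshake:

```python
import socket

def udp_reachable(host, port, timeout=2.0, payload=b"probe"):
    """True only if a datagram sent to (host, port) is echoed back.
    A firewall that silently drops UDP shows up here as a timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(payload, (host, port))
            data, _ = s.recvfrom(1024)
            return data == payload
        except OSError:  # timeout, ICMP port-unreachable, etc.
            return False

def choose_transport(host, udp_port):
    """Prefer UDP acceleration; fall back to plain TCP when UDP is blocked."""
    return "udp" if udp_reachable(host, udp_port) else "tcp"
```

This assumes the server side cooperates by echoing the probe; commercial tools typically build such a handshake into their own protocol.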

Core:

  • Protocol designed predominantly for multicast, point-to-point file transfer is not the core of the technology
  • Poor performance in high packet loss environment (satellite or wireless)
  • no multi-threading: a single CPU core does all the work of receiving, processing, decrypting, decompressing, and writing to disk; this may also limit the number of concurrent connections that can be serviced at once
  • not optimized for high bandwidth (500 Mbps or more)
  • no graphical client interface for point-to-point transfers

4. GridFTP

Link: http://www.globus.org/toolkit/data/gridftp/

GridFTP is an extension of FTP designed for use with grid computing.

The underlying TCP connection in FTP has numerous settings, such as window size and buffer size. GridFTP allows automatic (or manual) negotiation of these settings to provide optimal transfer speeds and reliability; the best settings are likely to differ between large files and large groups of files. (source: Wikipedia)

Although GridFTP is not UDP based, it can be used to solve the problem of poor TCP performance with FTP.

Functionality:

  • Complicated installation of the framework is required to allow multiple streams; does not directly address point-to-point file transfers
  • No firewall traversal

Core:

  • GridFTP requires a much larger framework called Globus, whose development is overseen by the Global Grid Forum.
  • For optimized transfers, multiple nodes or TCP streams must be used
  • Optimized transfer of a single large file with a single stream between two nodes is not possible
  • Command line client interface only (no GUI)
  • the TCP buffer size and block size must be known ahead of time, before the transfer begins (e.g. the -tcp-bs / -tcp-buffer-size options)
  • The server and client must be part of a much larger network of Globus nodes
  • Not optimized for very high bandwidth 500 Mbps or more
  • Not optimized for low bandwidth high packet loss (ie. satellite)
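On the buffer-size point above: the value passed via -tcp-bs / -tcp-buffer-size is typically sized to the path's bandwidth-delay product, since the TCP window must cover all bytes in flight. A quick sketch of that calculation (Python, illustrative figures):

```python
def tcp_buffer_bytes(bandwidth_mbps, rtt_ms):
    """Bandwidth-delay product: the bytes in flight the TCP window must cover."""
    bits_in_flight = bandwidth_mbps * 1e6 * (rtt_ms / 1000.0)
    return int(bits_in_flight / 8)

# e.g. a 155 Mbps link with a 100 ms round-trip time
print(tcp_buffer_bytes(155, 100))  # 1937500 bytes, roughly 1.9 MB
```

The catch the bullet points at: bandwidth and RTT must be measured (or guessed) before the transfer starts, and the setting cannot adapt if conditions change mid-transfer.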

Conclusions

So... which of these solutions is the most viable?

UDT seems to be pulling ahead for now, but none of these projects is a viable replacement for the enterprise: they lack the functionality and ease of use of commercial applications. GridFTP could be used if the organization already plans to adopt Globus and develop a file transfer workflow around the CLI. A commercial solution such as FileCatalyst addresses each of the weak points, including flexible congestion control, firewall friendliness, GUI client applications, and automatic resume/retry, which provides real cost savings and an efficiency boost compared to piecing together a custom solution from a bare-bones API.


1 Comment

  • On September 13, 2012, John Tkaczewski said:

    I forgot to mention that UDT fails if any packet loss is present on the line
