Analysis On Improving Throughput Part 1: Disk IO

January 17, 2012

This post is the first of a series and continues in Part 2: Memory

In light of the release of FileCatalyst Direct v3.0, I thought I'd write a few articles about the road to achieving 10Gbps speeds. It seems to me the best place to start is with the endpoint of the transfer: the storage media. Why? Before we even think about the file transfer protocol, we have to be sure that our disks can keep up. It's very easy for people to forget how much of a bottleneck disk read & write IO can be.

Disk IO Tests

10Gbps (1250MB/s) is fast, especially for a single file stream. By comparison:

  • Your average enterprise SATA HDD will read/write between 50 and 100MB/s (400-800Mbps)
  • Your average SATA SSD will read at around 300MB/s and write at around 200MB/s (1.6-2.4Gbps)
  • Fibre Channel SAN arrays normally connect at 500MB/s to 1000MB/s (4Gbps or 8Gbps)
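
A quick conversion to keep handy when comparing these figures: disk rates are quoted in bytes and network rates in bits, so multiply MB/s by 8 to get Mbps. For example:

    1250 MB/s x 8 bits/byte = 10,000 Mbps = 10 Gbps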

Getting FileCatalyst to run at 10Gbps (1250MB/s) requires testing your hardware infrastructure. Having a fiber SAN may not be enough.

For our internal test environment, this meant adding a few RAID 0 arrays (8 SSD drives or 16 HDD drives) in order to achieve the desired speeds. This is certainly not a recommendation for a production solution, but it was the simplest way to achieve the speeds we needed for testing.

To determine what your system is capable of providing, FileCatalyst includes tools within FileCatalyst Server, HotFolder, and Command Line Client. The tools are command-line scripts designed to emulate how the FileCatalyst software manages Disk IO, and can give you a good approximation of your system's potential before a single byte has transferred over the network.

Running Enhanced Disk IO Tests

The read IO and write IO tests share a similar syntax. The read IO test is executed with the application switch "-testReadIO", while the write IO test uses the switch "-testIO". Since most issues are found with write speeds (which tend to be slower than read), let's focus on write IO.

The Write IO test creates a file on your file system and reports back how quickly the application was able to create it. How it creates that file is determined by the parameters you fill in before it runs.

Things to note:

  • It is better to create a file large enough to represent the total data set you expect to transfer (e.g. a 9GB DVD ISO), or larger than the amount of memory the OS can use for file system buffers (see the results below).
  • The timeout should be a length of time you expect can comfortably be met when writing a single copy of the file.
  • Specify the number of runs; the tool reports the average.
  • Buffer size corresponds to an optional switch on the Server & Clients which dictates how large each read/write IO is from the Java code down to the file system. You should experiment with a few values here, as different disk configurations sometimes have vastly different optimal values (see the sketch after this list).
  • The Server and Clients support multiple read and write threads per file.
  • Keep the number of files to create concurrently at 1 if you are testing only a single-file transfer speed (single client to server endpoint). This is actually one of the hardest test cases to manage, as there are OS-level locks which often task the CPU, limiting the throughput you can get when writing to a single file. If you are looking to test 10 clients each utilizing a 1Gbps connection, select multiple files (much higher file IO is possible when multiple files are being saved at a time).
  • Always select Direct IO, since this is what the FileCatalyst application uses.
  • Select the default "rw" mode, which takes advantage of OS-level memory buffers if available.
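
To make the knobs above concrete, here is a minimal sketch of the kind of timed write the "-testIO" switch performs: write a file of a chosen size in blocks of a chosen size, flush, and report MB/s. This is not the FileCatalyst tool itself, just an illustration in plain Java; the path, file size, and buffer size are placeholder values, it does not attempt the Direct IO path the real tool uses, and multiple writer threads or concurrent files would be variations on the same loop.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Minimal write IO sketch: time how fast the disk accepts data in fixed-size blocks.
    public class WriteIoSketch {
        public static void main(String[] args) throws IOException {
            String path = "testfile.bin";   // placeholder output path
            long fileSizeMB = 10L * 1024;   // 10GB -- ideally larger than what the OS can cache in RAM
            int bufferKB = 256;             // block size to experiment with
            byte[] block = new byte[bufferKB * 1024];
            long totalBytes = fileSizeMB * 1024 * 1024;

            long start = System.nanoTime();
            // "rw" lets the OS buffer writes; "rwd"/"rws" force each write down to the device
            try (RandomAccessFile out = new RandomAccessFile(path, "rw")) {
                long written = 0;
                while (written < totalBytes) {
                    out.write(block);
                    written += block.length;
                }
                out.getFD().sync();         // flush so the timing reflects the disk, not the page cache
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            double mbPerSec = fileSizeMB / seconds;
            System.out.printf("Wrote %d MB in %.1f s -> %.1f MB/s (%.2f Gbps)%n",
                    fileSizeMB, seconds, mbPerSec, mbPerSec * 8 / 1000.0);
        }
    }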

Results

Machine 1a: Windows 7, single SSD drive
Machine 1b: Windows 7, single HDD drive
Machine 2a: Ubuntu RAID 0, 8 x SSD, 10GB file
Machine 2b: Ubuntu RAID 0, 8 x SSD, 60GB file

Observations: Neither machine benefitted from multiple writer threads; performance was higher with a single writer.

Machine 1a: When writing to SSD, we can get 230+ MB/s (>1.8Gbps) of write speed when using 1 thread. Block sizes do not affect throughput.

Machine 1b: Same machine, but utilizing the slower secondary HDD drive. When using slower disks, the software can only get a fraction of the bandwidth (in this case < 1Gbps). We do see marginal improvements as the block size grows, so limiting the block size is not a good idea. By default, the FileCatalyst application will use the largest block size it can (determined by the UDP block size).

Machine 2a: Can read/write at 2000+MB/s (>16Gbps) for 10GB files. We can also see a sweet spot at a ~256KB write block, where smaller writes adversely affect performance (as do larger blocks). This system, however, has 48GB of RAM, so the numbers it reports are higher than what the disks can really sustain.

Machine 2b: Same test, but with a 60GB file. Now we have realistic numbers which match the disk IO, giving us a system capable of sustaining 1350MB/s (10.8Gbps) of write speed.

Configuration Values

On the Server & Client side, the following configuration options are therefore available to set (CLI arguments shown):

  • numBlockWriters [#]
  • writeBufferSizeKB [# KB]
  • writeFileMode [rw/rwd/rws]
  • numBlockReaders [#]
  • readBufferSizeKB [# KB]

These are machine-specific settings. To maximize performance, you need to run the tests on both endpoints (client and server). On the server (if client connections are going to do both uploads and downloads), you should run both the read and write tests.

For both the FileCatalyst Server and HotFolder, these settings are configuration file changes that must be set manually (fcconf.conf for the Server, fchf.conf for the HotFolder). For the CLI, these may be passed in as run-time arguments.
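
Purely as an illustration of the kind of tuning this enables (the exact switch syntax is an assumption on my part, borrowing the leading-dash convention of the "-testIO" switch above, so check your own version's documentation), a CLI client tuned from the results above might be launched with run-time arguments along the lines of:

    -numBlockWriters 1 -writeBufferSizeKB 256 -writeFileMode rw

The values themselves come straight from the observations: a single writer outperformed multiple writers, and ~256KB blocks were the sweet spot on the RAID 0 array.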

Conclusion

Knowing the limits of your system IO is the first required step in achieving high-speed transfers. FileCatalyst v3.0 provides several tools to help you both benchmark those limits and tune the application to best take advantage of your system.

Want to learn more? Continue to the follow-up article, Part 2: Memory
