
OCFS2 vs. NFS: Benchmarks and the reality

by daimon on March 11th, 2011

A virtual infrastructure needs shared storage for the virtual machine images if you want to do live migration and keep storage requirements low. Of course, this shared storage must support concurrent file access and deliver the best possible performance.

So I set out to find the ideal solution for my employer's virtual infrastructure, which in an earlier step had already been decided to be based on Debian Squeeze, KVM and OpenNebula.

The number of candidates seems overwhelming at first; here is a short list of the strongest contenders and of what to look into for your own research.

Distributed Filesystems (File-oriented): NFSv4, AFS
Clustered Filesystems (Block-oriented): GFS2, OCFS2
Distributed Parallel Filesystems: PVFS2, Lustre

The big overview can be found on Wikipedia.

In our case, the filesystem had to serve a small infrastructure (3-5 virtualization nodes), so I ruled out the rather complex group of distributed parallel filesystems.
Online research also led me to pass on AFS. AFS makes heavy use of caching on the client side, copying each file to the client first and copying it back to the server when requested by another client. This seems rather counterproductive for live migration.
Also, the internet taught me that GFS is inferior to OCFS2. In retrospect, I am sorry that I didn't look into it myself.

So I found myself confronted with a choice between two very different beasts: NFS and OCFS2 (+iSCSI).

First, you have to consider that setting up a clustered filesystem on top of iSCSI or FibreChannel is much more tedious and error-prone than simply mounting a remote NFS export. There also has to be a regular heartbeat to check the nodes, a journal for every node accessing the filesystem, and a fencing mechanism that drops unreachable nodes out of the cluster and replays their journals to ensure filesystem consistency.
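To give an idea of the extra moving parts, here is a minimal sketch of the cluster definition that has to be identical on every node in /etc/ocfs2/cluster.conf. The cluster name, node names and IP addresses are made-up placeholders:

    cluster:
            node_count = 2
            name = ocfs2test

    node:
            ip_port = 7777
            ip_address = 192.168.0.10
            number = 0
            name = storage1
            cluster = ocfs2test

    node:
            ip_port = 7777
            ip_address = 192.168.0.11
            number = 1
            name = kvmnode1
            cluster = ocfs2test

On top of that, every node needs the iSCSI initiator logged in to the target (iscsiadm from open-iscsi), the o2cb cluster stack brought online (dpkg-reconfigure ocfs2-tools on Debian), and the filesystem created once with mkfs.ocfs2 with enough node slots for all cluster members. None of this ceremony exists with NFS.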

Concerning backups: you can only access the clustered filesystem remotely; in other words, there is no access from within the storage node itself. It is also very difficult (though possible) to take an LVM snapshot of the filesystem and mount the snapshot alongside the productive filesystem. I did get it to work, but with that backup strategy my nodes started to fall out of the cluster roughly once a week (ugly!!), for no apparent reason. So you're doomed to do your backups inside the virtual machines, which is far too much work for me ;)
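For the record, my backup attempt looked roughly like the sketch below. All volume group, volume and mount point names are placeholders, and the exact tunefs.ocfs2 invocation needed to give the snapshot a fresh UUID may differ between ocfs2-tools versions:

    # snapshot the logical volume that backs the iSCSI target
    lvcreate --snapshot --size 10G --name vmstore_snap /dev/vg0/vmstore

    # give the clone a new UUID so it cannot collide with the
    # productive OCFS2 volume (flag may vary with ocfs2-tools version)
    tunefs.ocfs2 -U /dev/vg0/vmstore_snap

    # mount the snapshot read-only next to the productive filesystem
    mount -t ocfs2 -o ro /dev/vg0/vmstore_snap /mnt/backup

    # ... copy the images away, then clean up
    umount /mnt/backup
    lvremove -f /dev/vg0/vmstore_snap

Note that the node doing that mount still needs the full o2cb cluster stack running, so this is anything but a lightweight backup path.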

But in theory, block-based access should be much more efficient than file-based access. Let's see whether the benchmarks bear that out:

 


Test (KB/s)        guest ocfs2    guest nfs   host ocfs2     host nfs
Initial write         42724.05     81850.65      3705.01     58040.79
Rewrite              943784.23     85101.03      4761.68     75402.40
Read                3462423.25    694803.10   2647475.31    112451.10
Re-read             3784108.94    685130.02   2709897.81   2400733.31
Reverse read        3075087.41    649971.82   2748257.66   3013057.24
Stride read         2947290.06    597918.41   2715029.06   2798984.50
Random read         2698422.62    587306.41   2694619.41   2977062.91
Mixed workload      2038916.34    513450.92   2281787.20   2471154.78
Random write           7099.09      5698.52      3704.64     11511.51
Pwrite               620963.48    361500.38      3973.19     86925.73
Pread               2657647.47    628858.30   2566150.03    112739.96

These numbers were obtained by running "iozone -R -l 5 -u 5 -r 4k -s 100m -F /f1 /f2 /f3 /f4 /f5".

I ran the test once for each filesystem, once on the host and once inside a virtual machine. At the time of the tests, only one node was using the shared filesystem.

The stats show clearly: inside the virtual machine, the read throughput with OCFS2/iSCSI is approximately five times as high as with NFS. The write statistics are a bit inconsistent, but for the most part they also show OCFS2 at least twice as fast as NFS.

Wow, that is the kind of margin that makes you forget about the disadvantages mentioned earlier! So I boldly started to set up the OCFS2 storage backend and four cluster nodes.

That was about five months ago. The beast has been in production for the last four months, serving the company's entire development department. Most of the time it runs fine, but the increasing number of node dropouts was really getting on my wick. Increasing the timeouts in the o2cb configuration didn't help a bit, and other avenues also proved to be dead ends.
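For reference, the knobs I kept turning live in /etc/default/o2cb on Debian; the values below are merely examples of the kind of increases I tried, not a recommendation:

    O2CB_ENABLED=true
    O2CB_BOOTCLUSTER=ocfs2test
    # number of 2-second disk heartbeat intervals before a node is declared dead
    O2CB_HEARTBEAT_THRESHOLD=61
    # network idle timeout and keepalive/reconnect delays, in milliseconds
    O2CB_IDLE_TIMEOUT_MS=60000
    O2CB_KEEPALIVE_DELAY_MS=4000
    O2CB_RECONNECT_DELAY_MS=4000

The changes only take effect after restarting the o2cb service, and the network timeouts have to match on all nodes, which makes experimenting on a productive cluster rather unpleasant.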
But when my boss started to preach "that wouldn't have happened with Windows", I had really had it. OCFS2 had to disappear.

So I installed another storage node with NFS and migrated the machines to the new storage one after another. This one works like a charm now: it is totally easy to snapshot and back up (it's just an ordinary ext4 filesystem) from within the storage node, and there are no dropouts and no inconsistencies whatsoever.
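By comparison, the whole NFS setup boils down to an export on the storage node and a mount on each KVM node. A sketch, with hostnames, paths and options as placeholders to adapt (the datastore path in particular depends on your OpenNebula configuration):

    # /etc/exports on the storage node (activate with: exportfs -ra)
    /srv/vmstore  192.168.0.0/24(rw,sync,no_subtree_check,no_root_squash)

    # /etc/fstab entry on each KVM node
    storage1:/srv/vmstore  /var/lib/one  nfs  rw,hard,intr,noatime  0  0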

Also, quite surprisingly, the virtual machines feel very responsive. And guess what?

The load average on the KVM nodes dropped from about 2.5 to 0.5.
The storage backend seems quite bored as well: its load average dropped from about 3.0 to just below 1.
The backup now finishes in about 3 hours; with OCFS2 and local snapshot mounting it took nearly 30 hours! (The new procedure is sketched below.)
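The backup itself is now nothing more than an LVM snapshot of the ext4 volume, taken on the storage node and copied off. Roughly like this (volume names, the target path and the use of rsync are my placeholders; no cluster stack, no UUID games needed):

    lvcreate --snapshot --size 20G --name vmstore_snap /dev/vg0/vmstore
    mount -o ro /dev/vg0/vmstore_snap /mnt/backup
    rsync -a --delete /mnt/backup/ /backup/vmstore/
    umount /mnt/backup
    lvremove -f /dev/vg0/vmstore_snap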

Note to self: never trust benchmarks you didn't forge yourself, even if you forged them yourself.

6 Comments
  1. art permalink

    thanks for the experience. looks like nfs is the way to go.

  2. Very good post! I’m also going to create a blog post relating to this… thanks

  3. mculibrk permalink

    Very nice article…
    I’m in the middle of building a very similar setup using ocfs2+fc although not as the main virtual storage but just for some specific clustered nodes.

    I would like to discuss about the problems you noticed/experienced if you have the time. If so, please, contact me by mail.

    Thanks!

    regards

  4. daimon permalink

    Hi Mauricio,

    thanks for your comment!

    Of course we can converse about the specific problems with an ocfs2 setup. This will certainly be of mutual benefit.
    I just sent you an email with some details and further questions.

    Stefan

  5. gene permalink

    well, one can't put the CPU use down to (solely) OCFS2; that is a problem caused by iSCSI, unless you actually implemented iSCSI over HBAs. But yes, there is some overhead in OCFS2, and the same is true for NFS. What you haven't made any comments on is the management of filesystem security issues.

    All these things are actually well managed using either technology, but indeed my preference/recommendation would be NFS. (High concurrent writes into the same directory and files is where clustering wins over NFS, but I'd choose another solution over OCFS2, as was hinted in this article too.)

  6. Richard permalink

    Your online research on AFS is wrong. An AFS client read is cached, so subsequent reads cause no network traffic or server load. But the file is written back to the server when the application closes it, not "when requested by another client". Multi-user, multi-client use of AFS generates almost an order of magnitude less network traffic and server load than NFS in most usage scenarios.
