The Kosmos distributed FS

Kosmos Files

Article from Issue 90/2008
Author(s):

Distributed filesystems effortlessly juggle enormous files in the gigabyte and terabyte ranges. The Kosmos filesystem plans to impress its competitors.

Modern computer programs handle increasingly large volumes of data. Whereas data-mining applications are content to sift through mountains of existing data, Internet search engines constantly horde new information. Users who access this data regularly encounter files of several gigabytes or more.

Legacy filesystems soon reach their limits with this kind of data and throughput. Consequently, organizations that manage huge volumes of data need an alternative solution for fast and safe access. Having redundant data storage is useful; after all, who wants to lose the valuable data gained by several days of number crunching because of a banal disk error?

Distributed filesystems fulfill these requirements. A distributed filesystem splits the data into manageable chunks and stores the chunks on a scalable cluster of computers. By virtualizing storage on the cluster, the filesystem then tricks applications into believing that they are talking to an enormous hard disk.

Into Space

The Kosmos filesystem (KFS) [1] is a promising new entry into this field. Kosmix Corporation developed KFS and released the source code under the Apache license. The first alpha version 0.1 appeared in September 2007. KFS's relative youth shows when setting up the filesystem: KFS requires 64-bit Linux. If possible, the Linux version and distribution should be identical on all the computers involved in data storage.

KFS is up against a number of renowned competitors, including Google filesystem (GFS), which Google uses as the underpinnings for its search engine, and Hadoop project's HDFS [2]. The KFS developers lifted much of the structure and functionality from Google, but they have removed a number of limitations. KFS – like GFS – is optimized for scenarios in which many large files are created once but read many times [3].

Job Descriptions

The Kosmos filesystem consists of three components:

  • one or multiple chunk servers that store the data on their own hard disks,
  • a metaserver that keeps an eye on the chunk servers, and
  • an application that quickly gets rid of a single large file.

KFS thus works much like a database that resides between a computer program and the traditional filesystem (see Figure 1).

Figure 1: The Kosmos filesystem resides between the existing hardware and the application, just like a legacy database. A client library handles access to the virtual filesystem.

Chunkwise

KFS first splits a file into handy 64MB blocks. The filesystem distributes these chunks evenly over all attached servers, aptly referred to as block or chunk servers. The servers store the blocks on normal filesystems that belong to the host operating systems.

If the chunk servers start to run out of storage capacity, the administrator can simply add a new computer to the cluster. KFS automatically adapts the new storage node, which keeps the whole system scalable and helps it keep pace with increasing storage demands.

KFS mitigates hardware errors by storing the blocks from every single file redundantly on multiple chunk servers; typically, three instances of each file placed in storage exist.

This safety net allows administrators to deploy standard PCs as cheap, but reliable, data repositories. Google FS proves that this works day after day. If a disk or server fails, you just replace it with a new one. KFS detects the replacement and automatically integrates the newcomer into the cluster.

As another preventive measure against data loss, each block has both a version number and a checksum. KFS evaluates the checksum on each read operation. In case of irregularity, the distributed filesystem deletes the defective chunk and replaces it immediately with an intact copy (re-replication).

Version numbers help to identify obsolete chunks: If a poor Internet connection temporarily separates one server from the cluster, it can identify obsolete chunks quickly when the connection is reestablished and retrieve the more recent variant from the other servers in the cluster.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • RAID Performance

    You can improve performance up to 20% by using the right parameters when you configure the filesystems on your RAID devices.

  • Progress by Installments

    Desktop applications, websites, and even command-line tools routinely display progress bars to keep impatient users patient during time-consuming actions. Mike Schilli shows several programming approaches for handwritten tools.

  • Partition Backup

    A partition backup offers several advantages over legacy, file-based backup alternatives, and using a backup server adds even more convenience. We’ll show you some free tools for partition backup over the network.

  • Ready to Rumble

    A Go program writes a downloaded ISO file to a bootable USB stick. To prevent it from accidentally overwriting the hard disk, Mike Schilli provides it with a user interface and security checks.

  • Ask Klaus!
comments powered by Disqus
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

Learn More

News