Big Data = big money

The term Big Data describes the processing and analysis of massive data sets in areas ranging from retail and consumer goods to health information to government and nonprofit work. Storing data at this scale requires petabyte-scale storage solutions.

A recent story by VentureBeat lists Big Data startups that have attracted significant funding:

“Here are the top five:

  • Palantir, which most recently was reported to be raising $100 million with a $9 billion valuation.
  • MongoDB, which announced a $150 million round in October.
  • Mu Sigma, which raised $108 million two years ago and now has customers including Microsoft and other big companies for its analytics tools.
  • Cloudera, which took on a $65 million round last December and is presumably leading the pack of Hadoop distribution vendors.
  • Predictive analytics provider Opera Solutions, which grabbed $84 million in 2011.”

Petabyte storage solutions: Internet Archive

An interesting post from the Internet Archive blog highlighted the large-scale petabyte storage solutions they use:

“Did you wonder where the Internet Archive stores millions of books, movies, recordings, and 150 billion web pages? Not in some conceptual cloud, but on our custom-designed Petabox servers, that’s where. This week, we’re installing another petabyte of storage; that’s a thousand terabytes or a million gigabytes.

Each Petabox is comprised of ten racks; each rack holds thirty-eight three-terabyte hard drives, two of which are used for the operating systems with the remainder used for data.”
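The figures in the quote can be checked with a little arithmetic. The following is a minimal sketch, assuming decimal units (1 TB = 1,000 GB) and taking the drive and rack counts exactly as quoted; the function names are illustrative and do not come from the Internet Archive.

```python
# Decimal (SI) storage units, matching the quote's
# "a thousand terabytes or a million gigabytes".
GB_PER_TB = 1_000
TB_PER_PB = 1_000


def rack_data_capacity_tb(drives_per_rack=38, os_drives=2, tb_per_drive=3):
    """Raw data capacity of one rack in terabytes, excluding OS drives."""
    return (drives_per_rack - os_drives) * tb_per_drive


def petabox_data_capacity_tb(racks=10):
    """Raw data capacity of a ten-rack Petabox in terabytes."""
    return racks * rack_data_capacity_tb()


print(TB_PER_PB * GB_PER_TB)       # gigabytes in one petabyte: 1000000
print(rack_data_capacity_tb())     # 108
print(petabox_data_capacity_tb())  # 1080
```

Under these assumptions, ten racks of 36 data drives each come to 1,080 TB, slightly more than the one petabyte mentioned in the post. This is raw capacity only; any redundancy or filesystem overhead is not described in the quote and is ignored here.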



Case Study: Facebook

A new article from Bloomberg Businessweek highlights some of Facebook’s hardware capabilities. As one of the most heavily trafficked websites, Facebook runs an extremely powerful hardware setup. From the article:

As of its latest count, Facebook claimed 955 million users. What makes Facebook so unique in the annals of Web history is the amount of time all those people spend on the site (at least a few hours per month) and the frequency with which they return to it (just about every day). To cope with the update-hungry masses, Facebook has pushed the limits of computer science in a few areas. In particular, the company is good at sucking in, analyzing, and sharing huge volumes of data at record speed so that users get new, up-to-date pages every time they visit Facebook.

To deal with this much data, Facebook has had to invest in some serious high-performance hardware:

Facebook’s largest cluster—or collection of data-center computers—can handle more than 100 petabytes of information. One petabyte is the equivalent of about 250 billion pages of text. Facebook has a homegrown system called Hive that it uses to collect and keep track of all its data. Every 30 minutes, the Hive system combs through 105 terabytes of data. More than 500 terabytes are sucked into the database each day.
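To put the quoted rates in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units, constant rates around the clock, and treats "more than 100 petabytes" as exactly 100 PB of capacity; it ignores compression, replication, and deletion, none of which the article describes. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_TB = 1_000
TB_PER_PB = 1_000


def scanned_tb_per_day(tb_per_scan=105, minutes_between_scans=30):
    """Data combed through per day if every scan covers the quoted volume."""
    scans_per_day = (24 * 60) // minutes_between_scans
    return tb_per_scan * scans_per_day


def ingest_gb_per_second(tb_per_day=500):
    """Average ingest rate implied by a constant daily intake."""
    return tb_per_day * GB_PER_TB / SECONDS_PER_DAY


def days_to_fill(cluster_pb=100, tb_per_day=500):
    """Days of constant intake needed to fill the assumed capacity."""
    return cluster_pb * TB_PER_PB / tb_per_day


print(scanned_tb_per_day())              # 5040
print(round(ingest_gb_per_second(), 2))  # 5.79
print(days_to_fill())                    # 200.0
```

On these assumptions, the system scans about 5 petabytes a day, takes in nearly 6 GB every second, and would fill a 100-petabyte cluster in roughly 200 days.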


How to store petabytes of data

Global data is projected to grow exponentially over the next ten years, but the capacity to store it does not yet exist. A recent Forbes article discusses new scientific breakthroughs that are part of the search for large-scale storage solutions. Some of these breakthroughs come from surprising fields: scientists are finding that bacteria and even inanimate materials provide models that may one day be used to build large-scale petabyte storage. From the Forbes article:

Students at the Chinese University of Hong Kong figured out how to store encrypted data in the DNA of E. coli bacteria. Such “biostorage” could be used for text, music, video—you name it. A single gram of the bacteria could hold as much as 450 conventional 2-terabyte hard drives. And since the bacteria keep replicating, the data could last for thousands of years.
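The density claim in the quote converts directly into petabyte terms. A small sketch, assuming decimal units and taking the quoted figure of 450 two-terabyte drives per gram at face value:

```python
TB_PER_PB = 1_000


def tb_per_gram(drives_per_gram=450, tb_per_drive=2):
    """Storage per gram of bacteria implied by the quoted comparison."""
    return drives_per_gram * tb_per_drive


def grams_per_petabyte():
    """Grams of bacteria needed to hold one petabyte at that density."""
    return TB_PER_PB / tb_per_gram()


print(tb_per_gram())                   # 900
print(round(grams_per_petabyte(), 2))  # 1.11
```

In other words, the quoted density works out to about 900 TB per gram, so a little over one gram would hold a full petabyte.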

Science is searching through inorganic material as well:

Researchers at Harvard, the Max Planck Institute of Quantum Optics and Caltech were able to store a quantum state in a diamond crystal for more than a second, at room temperature. Doesn’t sound like much, but in quantum physics, that’s a lifetime, and a big step toward building a quantum computer.

As breakthroughs like these come to light, and possibly to fruition, we may see more of this “out of the box” thinking in the coming years.



A petabyte case study: Avatar

The 2009-2010 blockbuster Avatar, which won acclaim for its special effects and had a meteoric run at the box office, was, and still is, the largest movie ever rendered in terms of computing power. The processing power needed for such an undertaking is worth a closer look. According to one account:

“The entire rendering of Avatar reportedly requires over 1 Petabyte of storage space according to BBC’s Clickbits, which is the equivalent of 500 harddrives of 2TB each. That’s equal to a 32 year long MP3 file(!).

The movie was shot and rendered in Miramar, New Zealand, (population 8,334) in one of the most powerful purpose-built data centers in the world (same place they created King Kong and Lord of the Rings). According to Information Management, the computing core – 34 racks, each with four chassis of 32 machines each – adds up to some 40,000 processors and 104 terabytes of RAM. The blades read and write against 3 petabytes of fast fiber channel disk network area storage from BluArc and NetApp.”
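The rack and drive figures in the quote can be cross-checked with simple arithmetic. This is a rough sketch, assuming decimal units and taking every number as quoted. "Processors" here most likely means processor cores, but that is an assumption, and the function name is illustrative.

```python
GB_PER_TB = 1_000
TB_PER_PB = 1_000


def machine_count(racks=34, chassis_per_rack=4, machines_per_chassis=32):
    """Total machines in the computing core described in the quote."""
    return racks * chassis_per_rack * machines_per_chassis


machines = machine_count()
processors_per_machine = 40_000 / machines        # "some 40,000 processors"
ram_gb_per_machine = 104 * GB_PER_TB / machines   # "104 terabytes of RAM"
drives_for_one_pb = 1 * TB_PER_PB / 2             # 2 TB drives

print(machines)                          # 4352
print(round(processors_per_machine, 1))  # 9.2
print(round(ram_gb_per_machine, 1))      # 23.9
print(drives_for_one_pb)                 # 500.0
```

The totals are consistent with each other: 4,352 machines with roughly nine processors and 24 GB of RAM apiece, and 500 two-terabyte drives for one petabyte, as the quote states.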

As petabyte-scale processing becomes more common, will we see more movies like Avatar? Will rendering become fast enough that complicated CGI effects such as those in Avatar can be created in hours and days, not weeks and months? Only time will tell, but given Avatar’s success with both film critics and box office customers, it appears to be a safe bet that we’ve only just begun.

A world of Petabytes

A hard drive able to hold a petabyte of data, long an exercise in imagination for science fiction writers and computer enthusiasts, has actually been a reality for some time. Though still largely unavailable to consumers on the mass market, and currently in use by only a select few corporations, petabyte hard drives are rapidly coming down in price, and experts predict that in only a few short years they will be standard fare for computer users.

What can I do with a petabyte?

Many of us will immediately think of recreational uses for petabytes of storage: storing massive amounts of movies and music, rendering graphics in seconds, and running realistic video games. Beyond leisure pursuits, however, a petabyte of storage and the processing power to match will allow for an amazing number of new scientific breakthroughs. For example, as a petabyte hard drive approaches the theoretical limit of human memory, computer applications should be able to shed more light on neurological functions. Being able to replicate a human mind would help the field of artificial intelligence and would also help perfect the long-term project of mapping the brain.

Another example involves large-scale text mining and natural language processing. With the sum total of human knowledge estimated to be a few thousand petabytes (as a reference point, the entire collection of the Library of Congress is estimated to be only three petabytes), many interesting textual analyses could be quickly undertaken. For example, one could instantly compare every best-seller throughout the history of literature to see where themes and influences overlap.

Finally, scientific endeavors that rely on distributed computing, which shares processing work across many computers, should be able to work faster and accomplish more. For example, SETI (the Search for Extra-Terrestrial Intelligence) has famously enlisted untold numbers of computer users to run its search programs during their machines' downtime, in a quest to document any evidence of alien life. The availability of computers with petabytes of storage and the processing power to match could have an astounding effect on that endeavor.

When will we see exabyte hard drives?

It could be a long time, experts predict. Though the technology is theoretically capable of producing such a storage device, it would need enormous processing speed, take up an enormous amount of space, and consume a dangerous amount of environmental resources. For these reasons the concept remains largely theoretical, and it will be some time before exabyte and yottabyte hard drives are available. Still, if capacity keeps growing at the pace suggested by Moore’s Law, such technological marvels should make an appearance within the average computer user’s lifetime.
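The Moore’s Law argument can be made concrete with a rough estimate. The sketch below assumes that storage capacity doubles every two years. That doubling period is an assumption borrowed from the usual statement of Moore’s Law, which concerns transistor counts rather than disk capacity, so the result is illustrative only. The function name is hypothetical.

```python
import math


def years_to_scale(factor, doubling_period_years=2.0):
    """Years needed to grow capacity by `factor` at a fixed doubling period."""
    return math.log2(factor) * doubling_period_years


# Petabyte -> exabyte is a factor of 1,000 in decimal units.
print(round(years_to_scale(1_000), 1))          # 19.9
# Petabyte -> yottabyte is a factor of 1,000,000,000.
print(round(years_to_scale(1_000_000_000), 1))  # 59.8
```

Under that assumption, a thousandfold increase takes about ten doublings, or roughly twenty years, which is consistent with the claim that exabyte drives could arrive within a lifetime. A slower doubling period stretches the estimate proportionally.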