Facebook’s focus on efficiency

August 2013 was the month in which Facebook, Ericsson, MediaTek, Nokia, Opera, Qualcomm and Samsung partnered up to launch the Internet.org initiative, a global effort to make affordable Internet access available to the next five billion people. The current global cost of delivering data is roughly 100 times too high for this plan to be economically feasible, but Facebook believes it is reasonable to expect the overall efficiency of delivering data to increase 100-fold in the next 5 to 10 years.

Two key factors must come together for this strategy to be possible:

– Bringing down the underlying costs of delivering data

– Using less data by building more efficient apps

In April 2011 Facebook opened its data center in Prineville, Oregon, and showed the world it could achieve an overall PUE of just 1.07. In 2013 its newest data center was brought online in Luleå, Sweden, with measurements showing a PUE of 1.04, among the best in the world. It is cooled by 100 percent outdoor air and runs on 100 percent renewable hydroelectric power. The interior of the server hall, from the servers to the racks, is provisioned with 100% Open Compute Project designs.
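For readers unfamiliar with the metric: PUE (Power Usage Effectiveness) is simply total facility energy divided by the energy delivered to the IT equipment, so a value of 1.0 would mean zero overhead for cooling and power distribution. A small sketch with made-up numbers shows what a PUE of 1.07 means in practice:

```python
# Power Usage Effectiveness (PUE) = total facility energy / IT equipment energy.
# A quick sketch with hypothetical numbers to show what a PUE of 1.07 means.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Return the Power Usage Effectiveness ratio."""
    return total_facility_kw / it_equipment_kw

it_load = 10_000.0   # hypothetical IT load in kW
overhead = 700.0     # hypothetical cooling/power-distribution overhead in kW
print(pue(it_load + overhead, it_load))   # -> 1.07: only 7% of the energy is overhead
```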

By focusing on efficiency, Facebook is constantly setting an example of how to get the best performance out of less energy, without gambling with user data. Below I present several concepts that the social networking giant already has in production or in the research lab.

But first, for those who aren't familiar with the project, here is a link where you can read about the Open Compute Project.

Outdoor Air Cooling: Traditional data centers that rely on chillers or cooling towers are both inefficient and detrimental to the environment. Facebook instead uses an evaporative cooling system that brings in outside air and then lowers its temperature by adding humidity. Outside air serves as the first stage of cooling, an approach also known as outside air economization: the air enters the data center, is filtered and directed down to the servers, and is then either re-circulated or exhausted back outside. When the outside air needs extra cooling, Facebook data centers use either a direct ECH misting system or wetted media. In the Prineville data center, direct evaporative cooling operates about 6% of the year on average, and it will run less than 3% of the year in Luleå because of its cooler climate.

Cooling (source: Internet.org)

Power Management: Facebook rethought every part of its power management system to optimize for efficient power consumption, for example by cutting out a stage of power transformers, using a higher voltage throughout the facility, and removing everything that doesn't directly benefit efficiency. The Luleå data center uses a new, patent-pending UPS system that reduces electricity usage by up to 12%. And given the robustness of the utility grids in Luleå, Facebook has been able to reduce the number of backup generators by 70%.
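To see why running a higher voltage through the facility saves energy, remember that for the same delivered power the current drops as the voltage rises, and resistive losses in the distribution path scale with the square of the current. Here is a rough sketch of that relationship; the load, cable resistance, and voltages are my own illustrative assumptions, not Facebook's figures:

```python
# Why higher distribution voltage helps: for the same delivered power P, the current
# is I = P / V, and resistive losses in the cabling scale as I^2 * R.
# All numbers below are purely illustrative.

def line_loss_watts(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    current = power_w / voltage_v          # current drawn at this voltage
    return current ** 2 * resistance_ohm   # I^2 * R loss dissipated in the wiring

p, r = 100_000.0, 0.05                     # assumed 100 kW load, 0.05 ohm of cable resistance
print(line_loss_watts(p, 208.0, r))        # ~11.6 kW lost at 208 V
print(line_loss_watts(p, 480.0, r))        # ~2.2 kW lost at 480 V
```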

Cooling (source: http://www.readwrite.com)

Vanity-Free Server Design: The Open Compute Project design philosophy, which Facebook calls "vanity-free," eliminates anything from the designs that isn't necessary to the function of the device. A good example of this philosophy in practice is the removal of the plastic bezel from the front of the first web server designs. Not only was the bezel unnecessary from a materials standpoint (a non-functional piece of plastic on every server that had to be commissioned and then recycled at end of life), it was also impeding air flow through the server, meaning the server fans needed to consume more energy to cool the device. Removing these bezels from the front of the servers reduced fan power consumption by 25 watts per server compared with other web server models at the time.
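To put the 25-watt figure into perspective, here is a back-of-envelope calculation of what that saving adds up to across a large fleet; the server count and electricity price are assumptions for illustration only:

```python
# Back-of-envelope savings from removing the bezels: 25 W less fan power per server
# (figure from the post), multiplied across a hypothetical fleet. Server count and
# electricity price are assumptions, not Facebook's numbers.

SAVING_PER_SERVER_W = 25
servers = 50_000                         # hypothetical number of web servers
hours_per_year = 24 * 365

kwh_saved = SAVING_PER_SERVER_W * servers * hours_per_year / 1000
print(f"{kwh_saved:,.0f} kWh/year")      # 10,950,000 kWh/year
print(f"${kwh_saved * 0.07:,.0f}/year")  # at an assumed $0.07/kWh -> ~$766,500/year
```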

Compute (source: http://www.readwrite.com)

In terms of future work, Facebook is already working on the following:

Network switch: Even though the computing is done on OCP-design servers, the switching is still done with black-box proprietary switches. OCP recently expanded its focus to include networking, collaborating on the development of an OS-agnostic switch that is designed for deployment at scale and will allow consumers to modify or replace the software that runs on it.

Cold Storage: Cold data storage is increasingly in demand as more people share more content that needs to be stored, like old photos that are no longer accessed regularly but still need to be available. However, there’s a lot of progress to be made in developing a system with high capacity at low cost.

The Open Compute specification for cold storage is designed as a fast bulk-load archive. The typical use case is a series of sequential writes followed by random reads.
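As a toy illustration of that access pattern (and nothing more; this is not the Open Compute implementation), the sketch below appends blobs sequentially to an archive file and then reads one back at a random offset:

```python
# A minimal illustration of the access pattern the cold-storage spec targets:
# data is appended sequentially (bulk load), then individual items are read back
# at random offsets.
import os, random

ARCHIVE = "cold_archive.bin"
index = []                                   # (offset, length) for each stored blob

with open(ARCHIVE, "wb") as f:               # bulk load: strictly sequential writes
    for i in range(1000):
        blob = f"photo-{i}".encode() * 100
        index.append((f.tell(), len(blob)))
        f.write(blob)

with open(ARCHIVE, "rb") as f:               # retrieval: random reads by offset
    offset, length = random.choice(index)
    f.seek(offset)
    data = f.read(length)
    print(len(data), "bytes read from offset", offset)

os.remove(ARCHIVE)                           # clean up the toy archive
```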

Cold Storage (source: http://www.readwrite.com)

Disaggregated Rack: Much of the hardware the industry builds and consumes is highly monolithic: the processors are inextricably linked to the motherboards, which are in turn linked to specific networking technology, and so on. This leads to inefficient system configurations that can't keep up with evolving software and in turn waste energy and material. The first steps toward this kind of rack disaggregation are:

  • Silicon photonics: Intel is contributing designs for its upcoming silicon photonics technology, which will enable 100 Gbps interconnects, enough bandwidth to serve multiple processor generations. The technology's low latency allows components that previously needed to be bound to the same motherboard to be spread out within a rack.
  • "Group Hug": Facebook is contributing a specification for a new common slot architecture for motherboards that can be used to produce boards that are completely vendor-neutral and will last through multiple processor generations.
  • New SoCs: AMD, Applied Micro, Calxeda, and Intel have all announced support for the Group Hug board, and Applied Micro and Intel have already built mechanical demos of their new designs.

To conclude, I can't be so naive as to think that Facebook does all this out of the kindness of its heart; it also serves its own goal of evolving and building a better infrastructure. On the other hand, by open-sourcing its designs, the initiatives it fights for benefit the whole community worldwide, and that is something that deserves our admiration.

Big Data Trainings

Some time has passed since I started working on Big Data topics, and although the Internet is a great source of information, I often feel the need to get some formal training and even a certification. Big Data is still emerging, and there aren't too many choices if you are looking for training, especially in Europe.

However, I managed to find both vendor-related and independent trainings at different levels of knowledge and specialization. Maybe the next person who searches the web for training will benefit from this and will not lose endless hours looking for a good course.

The course specializations I found are Administration, Developer, and Data Science. Here are the trainings I found, along with my personal opinion of them:

Hortonworks (100% open-source Hadoop distribution):

Developing Solutions for Data Analysts – US & UK locations, 4 days, approx. $3,000

Hadoop Essentials – live online only, 1 day, approx. $700

Administering Apache Hadoop – US & Germany (only attend if you understand German), 3 days, approx. $2,000

Applying Data Science with Hadoop – US only, 2 days, approx. $2,300

Developing Hadoop Apps with Java – US, UK, online, approx. $3,000

Cloudera University (sponsored by Cloudera, a proprietary Hadoop distribution): multiple US and European locations; ask in advance about the language of the course. The price also varies, even for the same course in different locations.

Developer Training – 4 days

Administrator Training – 4 days

Data Analyst Training – 3 days

HBase Training – 4 days

Hadoop Essentials – 1 day

Put the pieces together

MindShare is a company that offers a training in Austin, TX. It's completely vendor-independent, but I don't have any feedback on the quality of their training. The price is $3,000 and the training takes 4 days.

New Circle Academy offers training for technical and non-technical attendees. The most interesting to me seemed to be the Hadoop Ecosystem Training Course and Hadoop Online Admin+Develop. The courses are offered both in the classroom and online, so if getting to the US is a problem for you, you can join the online classroom.

BigData Training.in offers some interesting courses, of which BigData Hadoop Architect FastTrack seems particularly interesting to me. The catch is that the training takes place in India, which I'm not a big fan of, and the price isn't displayed anywhere.

Third Eye classes take place in the US, in New York City or Irvine, CA, but the course only takes two days, so if you are not from around those places it might be too much of a burden to travel the distance for just two days of training. The price is $1,350 as an early bird and $1,400 at the door.

There are also a lot of training providers that you must have heard of if you work in the IT world, but their trainings are strictly related to the products they offer, so they are not a very good match for me. However, these are some of them and their Big Data products: IBM – BigInsights; HP – Vertica, HAVEn; EMC – Greenplum, Pivotal HD; Oracle – Exalytics; MapR; Intel; etc.

If you have attended any of the courses I've mentioned, tell me your feedback, or if you have discovered any other courses, let's complete the list together.

Multipath TCP

There was a lot of fuss around the Apple launch a few weeks back; a lot of innovation, they say, while the competitors, as always, claim that Apple was not the first and point to devices that implemented something similar at some point. This is usually a never-ending story from which I, as a customer, can only win. Usually in high-tech history so far, when a company didn't have important competitors, the level of innovation and the money invested in research and development were very low. That's why I hope that Apple and Android, Sony's PlayStation and Microsoft's Xbox, and so on, will keep fighting for supremacy for a long, long time.

However, an important feature of the new iOS 7, from my point of view, and one that wasn't explicitly presented, is its Multipath TCP capability. If this is the first time you hear about it, here is the RFC on the topic. Unfortunately, since I live 6,000 miles from the US, it was hard to get my hands on an iPhone 5S, so the information I have comes from other articles I came across.

So what is the benefit? Improved network utilization, higher throughput, and greater resiliency, because the network can automatically and smoothly react to path failures. As the name states, a Multipath TCP device is capable of transmitting data over multiple paths simultaneously. In plain English, it means the device can use 3G mobile data and Wi-Fi to transmit data at the same time.
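iOS 7 keeps MPTCP invisible to applications, but the underlying idea can be illustrated at the socket level. On Linux (kernel 5.6 and later), an application can ask for a Multipath TCP socket explicitly; the sketch below only illustrates that idea under those assumptions, is not Apple's API, and falls back to plain TCP when MPTCP is unavailable:

```python
# A minimal sketch of opening a Multipath TCP connection on Linux (kernel 5.6+),
# which exposes MPTCP to applications explicitly. This is an illustration only.
import socket

IPPROTO_MPTCP = 262   # protocol number used by the Linux MPTCP implementation

def connect(host: str, port: int) -> socket.socket:
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        # Kernel without MPTCP support: fall back to a regular TCP socket.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    return s

if __name__ == "__main__":
    conn = connect("example.com", 80)
    conn.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    print(conn.recv(200).decode(errors="replace"))
    conn.close()
```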

Multipath TCP (source: Olivier Bonaventure)

The capabilities discovered in the iPhone are limited to just a few applications, so don't expect it to be used by Safari, for example; at the moment it is limited to interaction with Apple servers, and the best example identified of that is Siri.

Siri is an intelligent personal assistant and knowledge navigator which works as an application for Apple iOS. The application uses a natural language user interface to answer questions, make recommendations, and perform actions by delegating requests to a set of Web services. Apple claims that the software adapts to the user’s individual preferences over time and personalizes results.

It is a mystery how Apple plans to deploy the protocol beyond its own servers, but it is speculated that the company is looking for ways to make iCloud services more reliable, and this could become an important step. There is also no word on whether Samsung or another competitor plans to implement the same protocol, but in the process of improving the quality of services and making access to information more affordable all around the world, this could be an important step.

I go back to the motto of this article and state once more that, for the end user, competition between producers is the best thing that could happen.