Ibuildings blog

[Image: patrickvandevelde]

Distributed Systems Tutorial

For the morning of tutorial day, I chose to attend “Think like an ant, distribute the workload”, given by Helgi Þormar Þorbjörnsson. Helgi is a former Ibuildings colleague and now a bigshot at Orchestra.io; I'm happy to see he's doing well. He started off by explaining why distributing the workload can be a good thing, pointing out three significant aspects: budget, efficiency and perception.

Budget-wise, a distributed application does not need a big, expensive and hard-to-maintain server that runs the entire application by itself. A company can save a significant amount of money by investing in a collection of smaller or virtual servers, or even by using “the cloud”. These are often easier to maintain, and combined they can outperform a single big server.

This brings us to efficiency. The efficiency argument was illustrated with a well-known safety principle: 1,000 people can leave a room more quickly through 10 small exits than through 1 big one. The perception gain lies in letting other machines handle resource-consuming processes, so the originating machine can keep the user informed, or even let the user do other tasks while the deferred processes are running. A fitting quote from Helgi, “Make our fish look like a shark”, was illustrated by a photo of a fish in a bowl with a shark fin strapped to its back.

Indeed, nature already seems to have solved many of our problems for us. Take an ant colony: ants are well organised and very efficient at solving the problems they encounter. They benefit from their strength in numbers, their ability to work together and the fact that they have specialised types of ants for different tasks. Translated to a distributed application, the application as a whole is the colony and its components are the ants, each doing its specific task to keep the colony running.

A distributed application has four key characteristics: decoupling, elasticity, high availability and concurrency. Decoupling means splitting your application into functional pieces, such as a database, a frontend and a cache. Elasticity means that if usage of your application grows beyond initial expectations, more components can be added to increase its capacity. Making such dynamic changes requires a certain level of monitoring.
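The session did not prescribe a specific scaling rule, but the idea of combining elasticity with monitoring can be sketched as a small decision function. This is a minimal sketch under assumptions: monitoring reports how many servers are running and their average CPU utilisation in percent, and the goal is to keep that average near a target. `Metrics`, `desired_servers` and the default thresholds are hypothetical, not part of any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical snapshot reported by a monitoring tool (an assumption)."""
    servers: int       # servers currently running in the pool
    avg_cpu_pct: int   # average CPU utilisation across them, 0-100


def desired_servers(m: Metrics, target_pct: int = 60,
                    min_servers: int = 2, max_servers: int = 20) -> int:
    """Number of servers needed to bring average CPU near target_pct,
    clamped to the range [min_servers, max_servers]."""
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be between 1 and 100")
    # Total load, expressed in "percent of one server" units.
    load = m.servers * m.avg_cpu_pct
    # Integer ceiling division: the smallest server count whose
    # average utilisation stays at or below the target.
    wanted = -(-load // target_pct)
    return max(min_servers, min(max_servers, wanted))


print(desired_servers(Metrics(servers=4, avg_cpu_pct=90)))   # 6: scale out
print(desired_servers(Metrics(servers=10, avg_cpu_pct=12)))  # 2: scale in to the minimum
```

A scheduler could call this periodically and start or stop servers to reach the returned count. The provisioning call itself depends on your hosting or cloud provider and is not shown; real setups also tend to add a cool-down period so the pool does not flap on short load spikes.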
The initial design of your distributed application should account for elasticity and monitoring from the start, because it is very hard to implement an expansion strategy when your application is just about to reach its limits. Some monitoring tools are commercial products, others are open source. Bringing monitoring and the elastic behaviour of your application together creates a beautiful solution: an application that knows when to add or release extra components (servers) to keep things running smoothly, and actually does so!

The session also covered several common pitfalls and how to solve them:

  • local sessions: store sessions in a database or memcache instead
  • local memory: use a networked memcache instead
  • local files (uploads, writes to /tmp): store them on S3 or a networked filesystem, and serve static files from CDNs
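The first pitfall is easy to reproduce. The sketch below is a minimal Python illustration, assuming a load balancer may send consecutive requests from the same user to different servers. `SessionStore`, `DictSessionStore` and `WebServer` are hypothetical names; a single shared `DictSessionStore` instance merely stands in for memcache or a database, which in a real deployment would sit behind the same two methods.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    """The only interface the web tier depends on (hypothetical)."""
    def get(self, session_id: str) -> Optional[dict]: ...
    def set(self, session_id: str, data: dict) -> None: ...


class DictSessionStore:
    """In-memory store. One per server behaves like 'local sessions';
    one shared by all servers stands in for memcache or a database."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, session_id: str) -> Optional[dict]:
        return self._data.get(session_id)

    def set(self, session_id: str, data: dict) -> None:
        self._data[session_id] = dict(data)


class WebServer:
    """Hypothetical web server that only talks to a SessionStore."""

    def __init__(self, name: str, sessions: SessionStore) -> None:
        self.name = name
        self.sessions = sessions

    def login(self, session_id: str, user: str) -> None:
        self.sessions.set(session_id, {"user": user})

    def current_user(self, session_id: str) -> Optional[str]:
        data = self.sessions.get(session_id)
        return data.get("user") if data else None


# Local sessions: every server keeps its own store.
web1 = WebServer("web-1", DictSessionStore())
web2 = WebServer("web-2", DictSessionStore())
web1.login("abc123", "alice")
print(web2.current_user("abc123"))  # None: web-2 never saw the login

# Shared sessions: both servers use the same store.
shared = DictSessionStore()
web1 = WebServer("web-1", shared)
web2 = WebServer("web-2", shared)
web1.login("abc123", "alice")
print(web2.current_user("abc123"))  # alice
```

Because `WebServer` depends only on the `get`/`set` interface, swapping the stand-in for a real memcache or database client would not require changes to it.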

Internal APIs between the components of your application create a layer of abstraction that makes it possible to swap components (servers) without the other components noticing.

Next, Helgi illustrated a number of tools by telling real-life stories about how they were used to build distributed solutions for specific problems. Tools like Gearman, CloudSplit, syslogd, internal APIs, CouchDB, supervisord, the MapReduce technique, Hadoop, ZeroMQ and others are all key items in the toolbox for creating scalable systems. Each story was an insightful look at real-world problems and their distributed solutions. The combination of a distributed application, extensive monitoring and the ability to adapt automatically to certain events was an especially appealing concept to me. I wish I could recite all his anecdotes here, but remembering every detail from a 3-hour tutorial is nearly impossible. If you have the opportunity to see Helgi speak at a conference, or even to talk to him in person, don't hesitate to do so!
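Gearman, one of the tools mentioned in the session, lets clients hand jobs to a job server that dispatches them to worker processes, possibly on other machines. As a rough illustration of that deferred-work idea (and of the perception gain: the front end answers the user immediately while workers do the heavy lifting), here is a minimal in-process sketch using Python's standard `queue` and `threading` modules as a stand-in. It is not Gearman's API; `resize_image`, `submit`, `worker` and `run_demo` are hypothetical names, and in a real distributed setup the workers would be separate processes on separate servers.

```python
import queue
import threading
import time
from typing import Dict, List, Optional, Tuple

# A job is (job_id, image_path); None tells a worker to stop.
Job = Optional[Tuple[str, str]]


def resize_image(path: str) -> str:
    """Hypothetical slow task standing in for any resource-heavy job."""
    time.sleep(0.05)
    return path + ".thumb"


def worker(jobs: "queue.Queue[Job]", results: Dict[str, str],
           lock: threading.Lock) -> None:
    """Pull jobs from the queue until the None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            return
        job_id, path = job
        thumb = resize_image(path)
        with lock:
            results[job_id] = thumb
        jobs.task_done()


def submit(jobs: "queue.Queue[Job]", job_id: str, path: str) -> str:
    """Queue a job and return at once, so the front end can answer the user."""
    jobs.put((job_id, path))
    return f"{job_id} queued"


def run_demo(paths: List[str], n_workers: int = 3) -> Dict[str, str]:
    jobs: "queue.Queue[Job]" = queue.Queue()
    results: Dict[str, str] = {}
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i, path in enumerate(paths):
        print(submit(jobs, f"job-{i}", path))  # returns immediately
    jobs.join()              # demo only: wait until every job is processed
    for _ in threads:
        jobs.put(None)       # one stop sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_demo(["a.jpg", "b.jpg", "c.jpg", "d.jpg"]))
```

Adding workers here is the in-process analogue of the elasticity discussed earlier: more consumers on the same queue increase throughput without changing the code that submits jobs.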