Posted on April 17, 2012

Let’s review a few networking terms and concepts.

IP Address
Imagine you are in a computer laboratory or an Internet cafe. Each computer in the laboratory or cafe is part of a network, that is, a group of computers. Each computer that wants to be part of the network is given a unique identifier called an IP address. This address is used when communicating with a specific computer within the network. These addresses are numeric and take the form W.X.Y.Z, where W, X, Y, and Z are each a number from 0 to 255.
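
To make the W.X.Y.Z format concrete, here is a minimal sketch in Python that checks whether a string follows it. The function name `is_valid_ipv4` is just an illustration, not something from a particular library.

```python
def is_valid_ipv4(address: str) -> bool:
    """Return True if address looks like W.X.Y.Z with each part in 0-255."""
    parts = address.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each part must be plain ASCII digits (no signs, spaces, or letters).
        if not (part.isascii() and part.isdigit()):
            return False
        if int(part) > 255:
            return False
    return True


print(is_valid_ipv4("192.168.1.10"))   # True
print(is_valid_ipv4("192.168.1.256"))  # False, 256 is out of range
print(is_valid_ipv4("192.168.1"))      # False, only three parts
```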

An IP address can be either public or private. When a machine is given a private IP address, it can only communicate directly with the other machines within its network. When a machine is given a public IP address, machines anywhere on the Internet can communicate with it.
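
As a rough illustration, Python’s standard `ipaddress` module can tell you whether an address falls in one of the reserved private ranges. The helper name `describe` and the sample addresses are only examples, and treating everything that is not private as "public" is a simplification.

```python
import ipaddress


def describe(address: str) -> str:
    """Label an address as 'private' or 'public' (simplified)."""
    ip = ipaddress.ip_address(address)
    # is_private is True for ranges such as 10.0.0.0/8 and 192.168.0.0/16.
    return "private" if ip.is_private else "public"


print(describe("192.168.1.10"))  # private
print(describe("10.0.0.5"))      # private
print(describe("8.8.8.8"))       # public
```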

When a machine wants to communicate with another machine on the network, it does so by specifying that machine’s IP address and a port. Think of a port as a private line for communicating with a machine. A port number can range from 0 to 65535. Every time somebody on the network wants to communicate with a certain machine, that machine uses an unused port for the communication, along with some of its processing power. Not all ports from 0 to 65535 are usually available: some are reserved by the operating system for later use, and some are in use by applications.
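
Here is a small sketch of how a program can ask the operating system for an unused port. It assumes a machine where binding to the loopback address 127.0.0.1 is allowed; the function name `open_listener` is made up for this example.

```python
import socket


def open_listener(host: str = "127.0.0.1") -> socket.socket:
    """Open a TCP listener on whatever free port the OS hands out."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Binding to port 0 asks the operating system to choose an unused port.
    sock.bind((host, 0))
    sock.listen()
    return sock


listener = open_listener()
ip, port = listener.getsockname()
print(f"Listening on {ip}, port {port}")
listener.close()
```

The port number printed will differ from run to run, since the operating system picks whichever port is free at that moment.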

Scenario 1: Suppose, for the purpose of discussion, that all ports from 0 to 65535 are usable by the machine. This would mean that at any given time only 65,536 connections are allowed per IP address. This is your limitation. Suppose you want to exceed it. What will you do?
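
The arithmetic behind this scenario can be sketched as follows. It uses the scenario’s simplified model of one connection per port, and counts both 0 and 65535, which gives 65,536 port numbers per IP address. The function name `max_connections` is only illustrative.

```python
PORT_MIN = 0
PORT_MAX = 65535


def max_connections(ip_count: int) -> int:
    """Connections available under the one-connection-per-port assumption."""
    ports_per_ip = PORT_MAX - PORT_MIN + 1  # 0 through 65535 inclusive
    return ip_count * ports_per_ip


print(max_connections(1))  # 65536
print(max_connections(2))  # 131072
```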

Solution 1: The logical answer would be to add another IP address to the machine, but doing so will also eat up some of the machine’s processing power.
Solution 2: Add another machine, a clone of the first one. OK, sounds good. How will this fare? Cloning a machine takes time, and data may be lost during cloning and deployment.

Solution 3: How about cloning the machines first and then deploying them?

Machine 1 ------------ network
Machine 2 ------------ network

The machines are not in sync.

Solution 4: How about this solution?
The machines are cloned first, then run behind a load balancer.

Machine 1 \ -----------  Load
Machine 2 / -----------  Balancer

Load Balancer
Load balancing is a method of distributing workload across multiple machines to optimize resource utilization, improve response time, and avoid overload.

Pretty good, huh? Since each machine has its own IP address, it’s the load balancer’s job to check parameters and determine which machine can best handle the task. This way ports are freed up on the first machine and the required processing is distributed as well.
1) Suppose we want our configuration to always stay in this state, not only because it maximizes the use of resources, but also because it gives us the maximum output.
2) Suppose a sudden influx of communication requests needs to be handled by the load balancer and it runs out of ports. It’s scenario 1 all over again.
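
To show how a load balancer might decide which machine handles a task, here is a minimal sketch. It assumes the only parameter checked is the number of active connections (a "least connections" rule); real load balancers may look at other parameters. The `Machine` class, the `pick_machine` function, and the IP addresses are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    ip: str
    active_connections: int = 0


def pick_machine(machines):
    """Pick the machine with the fewest active connections (ties go to the first)."""
    return min(machines, key=lambda m: m.active_connections)


machines = [Machine("Machine 1", "10.0.0.1"), Machine("Machine 2", "10.0.0.2")]

for request_id in range(4):
    target = pick_machine(machines)
    target.active_connections += 1
    print(request_id, target.name)
```

With two idle machines, the four requests alternate between Machine 1 and Machine 2, so each ends up with two connections.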

Enter NAS
A NAS (Network-Attached Storage) is a data storage device connected to the network. It is a special kind of computer whose sole purpose is to store files. A NAS is a convenient way of sharing files among multiple computers.

Network   ------- Machine1  -------------\
Attached  ------- Machine2  ------------- >  Load Balancer
Storage   ------- Machine3  -------------/

This is a better configuration. All files are shared, so a newly added machine will have the current files. We can now scale out to more machines without worrying about synchronizing data.

What if one of the machines fails? No problem: since we are in a load-balanced configuration, the load balancer will handle things for us.
We have just achieved a fault-tolerant, redundant system that scales out.
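
The way a load balancer can route around a failed machine can be sketched like this. It assumes a simple round-robin rotation and a manual `mark_down` call standing in for a real health check; the class name `FailoverBalancer` and its methods are hypothetical.

```python
class FailoverBalancer:
    """Round-robin over machines, skipping any that are marked as down."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.healthy = {m: True for m in self.machines}
        self._next = 0

    def mark_down(self, machine):
        self.healthy[machine] = False

    def mark_up(self, machine):
        self.healthy[machine] = True

    def pick(self):
        # Try each machine at most once, starting from the rotation point.
        for _ in range(len(self.machines)):
            machine = self.machines[self._next]
            self._next = (self._next + 1) % len(self.machines)
            if self.healthy[machine]:
                return machine
        raise RuntimeError("no healthy machines available")


balancer = FailoverBalancer(["Machine1", "Machine2", "Machine3"])
balancer.mark_down("Machine2")  # simulate a failure
print([balancer.pick() for _ in range(4)])
# ['Machine1', 'Machine3', 'Machine1', 'Machine3']
```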

GlusterFS is a distributed NAS solution. It can scale out to grow the amount of storage as needed.

Storage Pool 1  ---Gluster---  Gluster  ------\
Storage Pool 2  ---Gluster---    FS     -------\  M4   ---   Load
Storage Pool 3  ---Gluster---    SAN    -------/  M3   ---   Balancer
Storage Pool 4  ---Gluster---  Solution ------/

Features and benefits of GlusterFS include:

  • Real time self-healing (availability)
  • Volume failover (availability)
  • Automatic load balancing (performance)
  • Stripe files across storage blocks (performance)
  • Scales to hundreds of petabytes (scalability)
  • Data mirroring & replication (availability)
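
To give a feel for what mirroring across storage pools means, here is a toy sketch. This is not how GlusterFS itself places data; it is only an illustration that assumes each file is hashed to a starting pool and copied to the next pools in order. The function name `pick_pools` and the pool names are made up.

```python
import hashlib


def pick_pools(filename: str, pools: list, replicas: int = 2) -> list:
    """Choose `replicas` distinct pools for a file, based on a hash of its name."""
    if replicas > len(pools):
        raise ValueError("not enough pools for the requested number of replicas")
    digest = hashlib.sha256(filename.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(pools)
    # Take consecutive pools, wrapping around, so the copies land on different pools.
    return [pools[(start + i) % len(pools)] for i in range(replicas)]


pools = ["Storage Pool 1", "Storage Pool 2", "Storage Pool 3", "Storage Pool 4"]
print(pick_pools("report.txt", pools))
```

Because every file is kept on two different pools in this sketch, losing one pool still leaves a copy available, which is the idea behind mirroring for availability.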

By using GlusterFS we can create highly available, highly scalable, high-performing servers.
