I've broken this guide into bite-sized chunks by topic, so I recommend you bookmark it. I've found spaced learning and repetition to be incredibly valuable tools for learning and retaining information, and I've designed this guide in pieces that are easy to revisit with spaced repetition.
TCP – Transmission Control Protocol
TCP is a protocol built on top of IP. As you may know from reading my posts, I firmly believe you need to understand why something was invented in order to truly understand what it does.
TCP was created to solve a problem with IP. Data over IP is typically sent in multiple packets because each packet is fairly small (at most 2^16 - 1 = 65,535 bytes, headers included). Sending multiple packets can result in (A) lost or dropped packets and (B) packets arriving out of order, either of which corrupts the transmitted data. TCP solves both problems by retransmitting packets that go missing and delivering the data to the receiver in order.
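To make the two problems concrete, here is a minimal sketch of how a receiver could put packets back in order and notice a gap. It is an illustration only, not how a real TCP stack is written: the function names `split_into_packets` and `deliver_in_order` are made up for this example, and the "sequence number" is assumed to be simply the byte offset of each payload.

```python
import random


def split_into_packets(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut data into (sequence_number, payload) pairs; the sequence
    number here is simply the byte offset of the payload (an assumption)."""
    return [(off, data[off:off + size]) for off in range(0, len(data), size)]


def deliver_in_order(packets: list[tuple[int, bytes]]) -> tuple[bytes, int]:
    """Return the contiguous data received so far and the next byte
    offset the receiver still needs (what it would acknowledge)."""
    data = bytearray()
    expected = 0
    for seq, payload in sorted(set(packets)):  # reorder, drop duplicates
        if seq != expected:
            break  # gap: an earlier packet is missing, so stop here
        data += payload
        expected += len(payload)
    return bytes(data), expected


if __name__ == "__main__":
    message = b"hello, distributed world"
    packets = split_into_packets(message, 5)
    random.shuffle(packets)                    # simulate disorder
    print(deliver_in_order(packets))           # full message recovered
    lossy = [p for p in packets if p[0] != 5]  # simulate a dropped packet
    print(deliver_in_order(lossy))             # only the prefix before the gap
```

In the lossy case the receiver hands over only the data before the gap and reports the offset it is still waiting for, which is the cue for the sender to retransmit.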
Because TCP is built on top of IP, each packet carries a TCP header in addition to the IP header. The TCP header contains, among other fields, sequence and acknowledgment numbers that record the ordering of the data and how much of it has been received so far. This is what allows the data to be reliably received at the other end. The combination is generally referred to as TCP/IP because TCP is built on top of IP.
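As a rough illustration of what lives in that header, the sketch below unpacks the fixed 20-byte part of a TCP header with Python's standard `struct` module. The field layout follows the TCP specification; the names `TcpHeader` and `parse_tcp_header` are invented for this example, and header options are ignored.

```python
import struct
from typing import NamedTuple

# Fixed part of the TCP header: 20 bytes, network (big-endian) byte order.
TCP_HEADER = struct.Struct("!HHIIHHHH")

SYN = 0x02  # control bit set on the first packet of a handshake
ACK = 0x10  # control bit set when the acknowledgment number is valid


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int          # sequence number: position of this data in the stream
    ack: int          # acknowledgment number: next byte the sender expects
    data_offset: int  # header length in 32-bit words (5 means no options)
    flags: int        # low 8 control bits (CWR, ECE, URG, ACK, PSH, RST, SYN, FIN)
    window: int
    checksum: int
    urgent: int


def parse_tcp_header(raw: bytes) -> TcpHeader:
    """Parse the first 20 bytes of a TCP segment; options are not decoded."""
    src, dst, seq, ack, off_flags, window, checksum, urgent = TCP_HEADER.unpack(raw[:20])
    return TcpHeader(src, dst, seq, ack, off_flags >> 12, off_flags & 0xFF,
                     window, checksum, urgent)


if __name__ == "__main__":
    # A hand-built header for illustration: port 12345 -> 80, SYN set.
    raw = TCP_HEADER.pack(12345, 80, 1000, 0, (5 << 12) | SYN, 65535, 0, 0)
    header = parse_tcp_header(raw)
    print(header, bool(header.flags & SYN))
```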
TCP needs to establish a connection between source and destination before it transmits any data, and it does this via a “handshake”. The connection itself is established using packets: the source tells the destination that it wants to open a connection (SYN), the destination says OK (SYN-ACK), the source confirms (ACK), and then the connection is open.
This, in effect, is what happens when a client connects to a server that “listens” at a port: the server starts listening first, a handshake takes place when a client asks to connect, and then the connection is opened. Similarly, to close the connection, one side sends the other a message that it is about to close, and that ends the connection.
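The listen, connect, and close sequence can be sketched with Python's standard `socket` module. This is a toy under stated assumptions: the function names are made up, everything runs on the local machine, and the operating system performs the actual handshake inside `connect` and `accept`. A server normally listens before any client connects, and the handshake takes place when the client calls `connect`.

```python
import socket
import threading


def serve_one_client(server_sock: socket.socket) -> None:
    """Accept a single connection and send the received bytes back uppercased."""
    conn, _addr = server_sock.accept()  # returns once a handshake has completed
    with conn:                          # leaving the block closes the connection
        data = conn.recv(1024)
        conn.sendall(data.upper())


def tcp_round_trip(payload: bytes) -> bytes:
    """Start a local server, connect to it, send payload, return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
    server.listen()                     # the server is now "listening"
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_one_client, args=(server,))
    worker.start()

    # create_connection triggers the handshake with the listening server.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(payload)
        chunks = []
        while chunk := client.recv(1024):  # read until the server closes
            chunks.append(chunk)

    worker.join()
    server.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(tcp_round_trip(b"hello"))
```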
A popular solution – consistent hashing
Unfortunately this is the part where I feel word descriptions will not be enough. Consistent hashing is best understood visually. But the purpose of this post so far is to give you an intuition around the problem, what it is, why it arises, and what the shortcomings in a basic solution might be. Keep that firmly in mind.
The key problem with naive hashing, as we discussed, is that (A) when a server fails, traffic still gets routed to it, and (B) when you add a new server, the allocations can change substantially, so you lose the benefit of what each server had previously cached.
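Problem (B) is easy to see with a few lines of code. The sketch below assumes "naive hashing" means hash-modulo-number-of-servers, and uses MD5 only as a convenient stable hash; the function names are hypothetical.

```python
import hashlib


def stable_hash(key: str) -> int:
    """A hash that stays the same across runs (unlike Python's built-in hash)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def naive_server(key: str, num_servers: int) -> int:
    """Naive allocation: hash the key, then take it modulo the server count."""
    return stable_hash(key) % num_servers


def fraction_moved(keys: list[str], before: int, after: int) -> float:
    """Share of keys whose server changes when the server count changes."""
    moved = sum(naive_server(k, before) != naive_server(k, after) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]
    # Going from 4 to 5 servers reassigns most keys, not just a fifth of them.
    print(f"{fraction_moved(keys, 4, 5):.0%} of keys moved")
```

With a uniform hash, a key keeps its server when going from 4 to 5 servers only if its hash gives the same remainder under both, which is the case for about one key in five. The other four in five land on a different server and miss its cache.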
There are two very important things to keep in mind when digging into consistent hashing:
Please keep these in mind as you watch the recommended video below that explains consistent hashing, as otherwise its benefits may not be obvious.
I strongly recommend this video as it embeds these principles without burdening you with too much detail: “A brief intro to consistent hashing” by Hannah Barton.
If you're having a little trouble really understanding why this strategy is important in load balancing, I suggest you take a break, then return to the load balancing section and re-read this one. It's not uncommon for all this to feel very abstract unless you've directly encountered the problem in your work!
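If code helps you more than pictures, here is a minimal sketch of one common way to build a consistent hash ring. It is an illustration under assumptions, not the method from the video: the class name `HashRing`, the MD5-based `ring_hash`, and the use of 100 virtual points per server (`replicas`) are all choices made for this example.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Place a string at a stable position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers=(), replicas: int = 100):
        self.replicas = replicas  # virtual points per server (an assumption)
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = ring_hash(f"{server}#{i}")
            if point not in self._owner:  # ignore the rare hash collision
                self._owner[point] = server
                bisect.insort(self._points, point)

    def remove(self, server: str) -> None:
        for i in range(self.replicas):
            point = ring_hash(f"{server}#{i}")
            if self._owner.get(point) == server:
                del self._owner[point]
                self._points.remove(point)

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the next server point."""
        if not self._points:
            raise LookupError("no servers on the ring")
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]
    ring = HashRing(["server-a", "server-b", "server-c", "server-d"])
    before = {k: ring.lookup(k) for k in keys}
    ring.add("server-e")
    moved = sum(before[k] != ring.lookup(k) for k in keys)
    print(f"{moved / len(keys):.0%} of keys moved")
```

The property to notice is that adding a server only moves keys onto the new server, and removing one only moves the keys it owned. Every other key stays where it was, so most of the existing caches remain useful.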