System Design: How to design Twitter? Interview question at Amazon, Google, Microsoft, Apple


Hello friends, welcome to reach goals. Today we are going to talk about the
system design of Twitter. Let's jump into the topic.

These are the things we are going to discuss today: what Twitter is, the key
things to know before we go into the system design, what you should not answer
in the interview, the Twitter system design in detail, and some of the
operational challenges faced during the design or implementation of the
architecture.

Let's understand what Twitter is. Twitter is a microblogging site where you
post messages and interact with them. Let's say you log in to the system; you
can add your friends as followers. Now if you post a message, it goes out to
all your followers within a few seconds. That's what Twitter is. Currently
there are close to 300 million monthly active users on Twitter; it is one of
the largest sources of breaking news, and it's popular with celebrities,
politicians and newsmakers.

Before going to the Twitter design, we have to know certain key things. They
are fan-out, social graph, Redis cache, load balancers, MySQL database and
Solr farm.

What is fan-out? In the computer industry, if you want to pass information from
one machine to multiple machines, that is called fan-out. Basically you have
data on one machine and you spread it across multiple machines in parallel,
without any interruption.

What is a social graph? A social graph is a data structure which is
predominantly used by products like LinkedIn, Facebook, Google search and what
not. It creates relations between different nodes. For example, when you log in
to the system you have multiple friends, and there are certain items, say
certain articles, that you might be interested in. You can store this
information in multiple nodes with relations between them, and you can query
the social graph if you want to understand which articles you are likely to
like, or which articles you have already liked. You can get all this
information with the help of the social graph data structure.

What is the Redis cache? A Redis cache is a key-value store: you store
information under a key, and you can have multiple values for that key. You
give a key, query it, and get a list of values. For example, in this case, if I
want to get into the home screen and see the home timeline: you log in to the
system, and the system queries by your user ID for the tweet messages that are
there for you. You can store that kind of information in the Redis cache.

What is a load balancer? We talked about load balancers in one of the earlier
videos; I would recommend you go and have a look at that for more information.
I think most of you are familiar with the MySQL database; it's nothing but a
relational database to store the information.
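
To make two of the key terms above a little more concrete, here is a minimal
Python sketch of a social graph and of a Redis-style key-to-list cache. It is
only an in-memory illustration: the class name `SocialGraph`, the relation
labels `follows` and `likes`, and the sample users and articles are all
assumptions made up for this example, not anything Twitter is known to use.

```python
from collections import defaultdict


class SocialGraph:
    """Toy social graph: nodes are strings, edges carry a relation label."""

    def __init__(self):
        # (relation, source) -> set of targets, e.g. ("follows", "B") -> {"A"}
        self._out = defaultdict(set)
        # (relation, target) -> set of sources, used for reverse lookups
        self._in = defaultdict(set)

    def add_edge(self, source, relation, target):
        self._out[(relation, source)].add(target)
        self._in[(relation, target)].add(source)

    def targets(self, source, relation):
        """Everything `source` points at, e.g. the articles a user liked."""
        return sorted(self._out.get((relation, source), ()))

    def sources(self, target, relation):
        """Everything pointing at `target`, e.g. the followers of a user."""
        return sorted(self._in.get((relation, target), ()))


graph = SocialGraph()
graph.add_edge("B", "follows", "A")
graph.add_edge("C", "follows", "A")
graph.add_edge("B", "likes", "article-1")

print(graph.sources("A", "follows"))  # the followers of A
print(graph.targets("B", "likes"))    # the articles B liked

# A Redis-style cache reduced to a plain dict: one key, a list of values.
home_cache = {"B": ["tweet-1", "tweet-2"]}
print(home_cache["B"])
```

Keeping both a forward and a reverse map is a design choice of this sketch: it
makes "who follows A" and "whom does B follow" equally cheap to answer.
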
Solr farm: basically a Solr farm is used in Twitter, or in any application, to
provide search. It creates an index for search, and you query the index to get
the search results. So these are some of the software and key terminologies
used in designing Twitter.

Okay, this slide talks about what you should not answer in an interview. If I
ask a question like "how do you design Twitter?", most candidates say that they
have a database where all the information, like tweet IDs and tweet messages,
is stored, and when we want certain information we can query the database with
a select statement. This is not going to work if you want to build a scalable
system. So keep in mind that when you are asked how to design a robust system,
you have to think beyond databases; you have to think beyond what you did in
high school or as an undergraduate.

Now let's jump into certain use cases within the Twitter system design. We are
going to talk primarily about four different use cases: one is the home screen
timeline, the second is the user screen timeline, then we will talk about
search, and we will talk about follow. If you need more use cases, please
comment, and don't forget to subscribe to see future videos.

What you see on the screen is an architectural diagram, or you can call it a
system design diagram, for Twitter. If you look at this system, here we have
the computers or phones, which access the load balancers through the CDN, and
you have multiple services like create tweet, view home, view user tweet,
search tweet and follow users, and you can add more services based on your
scalability needs or new requirements. Then you have create account. We can
have all the services set up in Docker and deploy them in multiple pods, and
you can configure auto scaling, or pre-scale and keep it that way, based on
your needs.

With respect to the data, the first of the primary tables is the user table,
which has the information about the user and the account details. We use this
table when a user creates an account: we store that information in the user
table. The second service I want to talk about is follow users.
Follow users is very simple. All of that information is stored in the social
graph. As we discussed, a social graph is nothing but the relations between you
and the objects or entities you are related to. For example, if you are
following a certain user or a certain article, that relation is maintained
here, and that's what we have in the social graph data structure. Let's say I
use the service and ask a question like "give me the list of users I am
following": you go into the social graph, query it, get that information and
display it in the UI. The other one is search tweet.
Whenever we create a tweet, the tweet information is stored in the tweet table.
Now there is a relation between the tweet table and the Solr farm. The tweet
table has all the text which you have tweeted or which the people you follow
have tweeted. There is a Solr farm which runs in the back end; it gets all the
tweet messages and the related IDs, indexes all of that information, and keeps
it in the Solr farm. Now when you use a service like search tweet, it queries
the Solr farm, gets all the information and shows it in the UI. In between
there is a search filter. Let's say you want to remove certain policy-violating
words, or remove certain information from the search results: you can apply the
search filter on the text, remove that information, and show the rest in the
UI. That's why we have a search filter. The next service is view user tweet.
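
Before going further, the search path just described (a tweet table, an index
built in the background, and a filter applied before results reach the UI) can
be sketched roughly as follows. This is not Solr's API; it is a hand-rolled
inverted index in plain Python, and the names `build_index`, `search` and
`blocked_words`, along with the sample tweets, are assumptions for illustration
only.

```python
from collections import defaultdict

# Stand-in for the tweet table: tweet ID -> text.
tweet_table = {
    1: "system design interview tips",
    2: "design a scalable system",
    3: "some badword in a design tweet",
}


def build_index(tweets):
    """Map each lowercase word to the set of tweet IDs that contain it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for word in text.lower().split():
            index[word].add(tweet_id)
    return index


def search(index, tweets, query, blocked_words=frozenset()):
    """Return (tweet_id, text) pairs matching every query word, after filtering."""
    words = query.lower().split()
    if not words:
        return []
    # A tweet matches only if it contains all of the query words.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    results = []
    for tweet_id in sorted(matches):
        text = tweets[tweet_id]
        # Search filter: drop any tweet that contains a blocked word.
        if blocked_words & set(text.lower().split()):
            continue
        results.append((tweet_id, text))
    return results


index = build_index(tweet_table)
print(search(index, tweet_table, "design"))
print(search(index, tweet_table, "design", blocked_words={"badword"}))
```

The filter runs at query time here, so changing the blocked words does not
require re-indexing the tweets.
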
Let's say somebody has tweeted, or you have logged in to the system and
tweeted: all of that information is stored in the tweet table. Now if I want to
query that information, I can give a user ID and, based on that, get all of
that user's tweet IDs. All the tweet information can be gathered from there,
brought back as a JSON object and shown in the UI. That is view user tweet.

I want to discuss in detail these two services, create tweet and view home.
What is view home? This service is basically the timeline on the home screen:
as soon as you log in, you see the tweets which have been tweeted by the people
you follow. So how do we do this? If you look closely here, what happens is
that when a user creates a tweet, the message goes over here and it is fanned
out. How is it fanned out? It goes to the Redis farm, figures out who your
followers are, and based on that it sends your tweet message and stores it here
as a value under each follower's key.
So how is it done? Three different activities are happening here. As soon as
you create a tweet, it goes to the social graph and figures out who your
followers are. Let's say you are user A and you have followers B, C and D. Now
you have B, C and D as keys in the Redis farm, and under B, C and D it puts the
tweet message you have just posted. If you look at the data structure in the
Redis farm, you will see something like B, and the tweet messages which have to
be shown to B when he logs in. If C logs in, it will show the tweet messages
which have to be shown to C.

That's where this service comes into the picture. When you go into view home,
it has a lookup service. What the lookup service does is figure out, for this
user, which Redis farm or which Redis cache holds your tweet IDs or tweet
information. Let's say it figures out that Redis 1 has the tweet information;
it goes and queries it, saying "I am user B, who has logged in to the system;
what is the tweet information for B?" It then creates a JSON structure and
takes it to the UI, where we can display it on the screen. That's how the
create tweet and view home services are used.

If you have any questions, please make sure you comment so that I can answer in
the comment section, and I can also add more use cases if you are really
interested. Let's talk about operational challenges.
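
The create tweet and view home flow described above can be sketched in a few
lines of Python. This is a minimal in-memory sketch, not Twitter's code: plain
dicts stand in for the Redis instances, and the function names `lookup`,
`create_tweet` and `view_home`, the CRC-based choice of instance, and the
newest-first ordering are all assumptions made for illustration.

```python
import json
import zlib
from collections import defaultdict

# Social graph reduced to: author -> set of followers.
followers = {"A": {"B", "C", "D"}}

# Two in-memory dicts standing in for two Redis instances in the farm.
redis_farm = [defaultdict(list), defaultdict(list)]

# Tweet table stand-in: tweet ID -> (author, text).
tweet_table = {}


def lookup(user_id):
    """Lookup service: pick the cache instance that holds this user's timeline."""
    # zlib.crc32 is stable across runs, unlike the built-in hash() for strings.
    return redis_farm[zlib.crc32(user_id.encode()) % len(redis_farm)]


def create_tweet(author, text):
    """Store the tweet, then fan it out to every follower's cached timeline."""
    tweet_id = len(tweet_table) + 1
    tweet_table[tweet_id] = (author, text)
    for follower in followers.get(author, ()):
        lookup(follower)[follower].insert(0, tweet_id)  # newest first
    return tweet_id


def view_home(user_id):
    """Read the precomputed timeline and return it as a JSON string for the UI."""
    tweet_ids = lookup(user_id).get(user_id, [])
    timeline = [
        {"id": t, "author": tweet_table[t][0], "text": tweet_table[t][1]}
        for t in tweet_ids
    ]
    return json.dumps(timeline)


create_tweet("A", "hello followers")
print(view_home("B"))
```

The work happens at write time: `view_home` never touches the social graph, it
only reads one key from one cache instance.
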
The first thing is users with the most followers. See, you would have noticed
that celebrities and presidents can have a hundred million followers or more.
Now if they have that many followers and they tweet, those messages have to go
out to a hundred million followers, and that is not possible in real time. For
example, if they tweet, it is very difficult to fan out to 100 million
followers and add the tweet to each of their values in the Redis cache. It is
going to take a lot of time, so it leads to an inconsistency: when some user
comes into the UI, they may not yet see the tweet messages meant for them.

So how do we handle this scenario? What Twitter actually does is this: whenever
somebody wants to view the home screen, they come to the Redis cache and get
all the tweet messages of the people they follow, and then at run time they go
and get the tweet messages of the celebrities, that is, of the users with a
high number of followers, merge the two, and show the result in the UI. In that
way they can reduce the latency and also make sure that the data reaches the UI
seamlessly.
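
The merge-at-read-time idea for celebrity accounts can be sketched like this. It
is a minimal sketch under stated assumptions: tweets are `(timestamp, author,
text)` tuples, the set `celebrities` marks accounts that are skipped during
fan-out, and all names and sample data are hypothetical.

```python
# Precomputed home timelines from the cache: user -> list of tweets.
# Each tweet is a (timestamp, author, text) tuple.
home_cache = {
    "B": [(105, "A", "lunch time"), (101, "A", "good morning")],
}

# Accounts with too many followers to fan out on write.
celebrities = {"celeb"}

# Each celebrity's own recent tweets, read at view time instead.
celebrity_tweets = {
    "celeb": [(110, "celeb", "new album out"), (100, "celeb", "hello world")],
}

# Who each user follows.
following = {"B": {"A", "celeb"}}


def view_home(user_id, limit=10):
    """Merge the cached timeline with followed celebrities' tweets, newest first."""
    merged = list(home_cache.get(user_id, []))
    for followee in following.get(user_id, ()):
        if followee in celebrities:
            merged.extend(celebrity_tweets.get(followee, []))
    merged.sort(key=lambda tweet: tweet[0], reverse=True)
    return merged[:limit]


for tweet in view_home("B"):
    print(tweet)
```

The trade-off is one extra read per followed celebrity at view time, in
exchange for not writing one tweet into millions of cached timelines.
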
The second thing I want to talk about is the Redis cache and how to rebuild it.
Let's say you have different Redis caches, and one of the Redis instances is
down or has crashed. All the information which was stored in that Redis
instance is lost, and now we have to rebuild it. Rebuilding is nothing but
doing the same process again: for each specific user, figure out who they
follow, get all the tweet information from the tweet table, reconstruct the
Redis cache, and add it back to the Redis farm. That is called rebuilding the
Redis cache.
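
A rebuild along those lines might look like the sketch below. It assumes the
tweet table and the social graph survive the crash and replays the tweets into
fresh timelines; the function name `rebuild_cache`, the `max_len` cap of 100
and the sample rows are assumptions chosen for illustration.

```python
from collections import defaultdict

# Social graph stand-in: author -> followers.
followers = {"A": {"B", "C"}, "B": {"C"}}

# Tweet table stand-in (the durable store): rows of (tweet_id, author, text).
tweet_table = [
    (1, "A", "first"),
    (2, "B", "second"),
    (3, "A", "third"),
]


def rebuild_cache(users_on_lost_node, max_len=100):
    """Recreate home timelines for the users whose cache instance was lost."""
    cache = defaultdict(list)
    # Replay tweets oldest to newest so each list ends up newest first.
    for tweet_id, author, _text in sorted(tweet_table):
        for follower in followers.get(author, ()):
            if follower in users_on_lost_node:
                cache[follower].insert(0, tweet_id)
    # Keep only the most recent entries per user.
    return {user: ids[:max_len] for user, ids in cache.items()}


print(rebuild_cache({"C"}))
```

Only the users that lived on the lost instance are rebuilt, so the rest of the
farm keeps serving while the replay runs.
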
The third operational issue we see is search filters. Nowadays a lot of legal
restrictions have been added: certain countries say that certain information or
certain messages should not be shown there. So what should you do in this case?
It is dynamic in nature; it keeps changing based on the current laws in each
country. So in order to make sure the right thing gets to the UI, we have to
keep updating the search filter. If you don't have this filter, whatever is in
the Solr farm and matches your search is going to come up in the UI. If you
have a search filter and keep maintaining it, the information which has to be
removed is taken out before anything is displayed in the UI. If the search
filter is not up to date, content which should not be shown appears in the UI,
and that leads to legal issues. That is why we need a search filter.

That's all I have about the Twitter system design. If you have any questions
you can comment, and make sure to subscribe, like and share so that you get
future videos quickly. Thank you.



