I swear Kate Upton and Megan Fox wrote I was handsome and sexy, you guys just didn’t use two phase commit for your document snapshots and version vectors so you never received those writes on your local copy (since your version vector was more up to date than the document snapshot)!

Hey guys and welcome back to the channel uh today has been a good day for some of my viewers at least two of them have reached out to me saying they got meta offers and one of them even called me handsome which is how you know he was a

Dude so let’s go ahead and get into it we’ve got another video for you all it is a Friday I’m going to go out after this so I’m excited today we’re going to be talking about Google Docs okay so like I mentioned today we are going to

Be talking about Google Docs now I assume this is a product that most of you are familiar with but in the event that you are not let me show off an example use case so we’ve got some sort of text document you know we’ve got one line over here that’s already been

Written Jordan is dot dot dot and then you’ve got three people over here all actively editing it at the same time and you know if you’re editing this document or you have it open you can actually see other people making changes as they make them so as you can see we’ve got Kate

Upton writing that Jordan is handsome Megan Fox writing that he’s sexy Tech lead writing that he’s dumb perhaps because Jordan called him out in one of his previous videos over all his crypto scamming uh but that is another topic for another time so let’s talk about some problem requirements uh I kind of

Made three of these even though this could probably be summarized in like three words just to fill up a slide’s worth of data but the point is that more or less if I write you should be able to see it in real time if you write I

Should be able to see it in real time and basically if the document has a ton of users on it at once all of those rights should be propagated to all of the other readers in real time hopefully that makes sense cool and of course another important thing is that you know

We don’t want people to have documents that are out of sync if I make a right I don’t want to be able to see my right and someone else can see my right but then a third user cannot it’s important that they all eventually converge to the

Same state even if it’s not entirely instant okay so let’s go through some capacity estimates because these are going to be pretty important for what we ultimately end up discussing so the first thing is that uh I’m going to keep this one abstract um you know I don’t

Need to go into any specific numbers because the whole point of this problem in particular or the reason I want to talk about it is because I want to basically be able to design it for arbitrary scale so with that being said let’s imagine we have like billions of

Users billions of different documents you can have you know thousands of users editing a document at once and attempting to read it which of course is going to be a lot of load each document can have hundreds of pages and uh the sizes can be in the kilobytes but

Frankly if you have hundreds of pages you could even be looking at documents in the sizes of megabytes which would be pretty bad so of course if we want to be able to be processing these rights as fast as possible possible both the number of writers right I said you know

Literally thousands so that’s going to be really tough to actually transmit those changes from writer to writer or the writer to reader and also the document sizes if there are lots of kilobytes can both be problematic okay so I’m basically going to go over this problem by starting out

With a really terrible solution and then we’ll gradually make them better and better as we break down what is wrong with each solution so the first one is going to be very naive where basically we’ve got client one over here and client 2 over here they’re both connected to some server via

Websockets uh the reason we want websockets is so that you know if I make a change in real time the server can actually push it to the other client and the general idea is you know I make a change right so I send my version of the

Document over to the server I grab a lock in order to do so so that this guy can’t send his document over at the same time because otherwise you know who wins we have no clue so I send over my document uh it goes over to S3 or

Wherever the server sends it to some sort of persistent data store and then basically it says okay let me look at all the websockets that I have for this document send the document back to them so it’s now in step two going to send my document over to the other client and

The reason it needs to send it first before this client can actually go ahead and grab the lock again is so that this client can incorporate my edits into their document so you know they’re going to do some sort of merging and then in step three they’re

Going to grab that server lock and then they’re going to in turn make the update and then that is going to be sent to S3 so as a as a result of this lock and as a result of sending out my edits to everyone else uh and as a result of the

Websockets we are actually able to achieve real-time editing here however the issue is it’s just going to be really freaking slow why is that well like I mentioned the document is large and so you don’t really want to be sending it over the network every single

Time you make a write when all you did was type let’s say one or two characters additionally we are using a lock and when there are locks you can have contention especially when multiple different people are attempting to grab that lock at the same time and so

Ideally we would like to be able to avoid both of these things it’s not so easy but we’ll talk about that later so let’s make this thing a little bit better this is what I’d like to call a bad solution so in our bad solution we

Still have a lock but now instead of doing anything like that what we can do is basically say Hey you know in order to make a right grab the lock first and then insert you know let’s say character a at position three so what’s going to

Happen is that the server is going to lock no one else can write it’s going to send that character to our database you know maybe it’s more practical to use a database here instead of S3 because we’re doing one character at a time and S3 stores the whole document but uh the

Point is we send that character there we then go ahead and say okay we now have to send that right to all of our clients so again we’re doing this all over websocket and then now finally this guy incorporates this right and then they can send a new right which is going to

Be valid over to the server and grab the lock once again the process repeats so this is going to be a lot better because now we don’t have to send full documents however in order to avoid some sort of right conflict or merge conflict or whatever you really want to call it uh

We need to again use a lock right so that my changes in Step One are integrated into this guy’s document before he makes his right in step three and so again there’s going to be lots of contention on the lock because in reality there’s not just going to be two

Users or well most of the time there will just be two users or even one user but in the really bad Edge case which is what we want mostly want to focus on here because it it makes it an interesting problem uh you can have thousands of users writing at the same

Time and you don’t want their rights to take you know 5 10 seconds due to trying to grab a lock over and over again so basically the idea again is that uh the issue with the lock is we have to process all of the previous rights in

Order to make our own and to process all of those previous rights we need to make sure that there’s basically some concurrency control via the lock so what would be a better solution so this one is going to be pretty interesting because it allows us to actually avoid

Using the lock so the question is you know if I write and you write at the same time right we’re both doing something like hey insert character o at position one that’s what client one’s doing and then client two is going to do insert character G at position five now

If you think about it right if I insert character o at position one I go from coding and now all of a sudden I have cing kudin I don’t know why I made up this example it’s bad but uh yeah and so so if this guy originally wanted to insert

Character G at position five right because we had 0 1 2 3 4 5 it would have been coding now after he does that what’s going to happen is 0 1 2 3 4 5 now all of a sudden if this guy’s right gets processed second we have

Cigan whatever that is and so as you can see uh the order in which we process the rights actually affects where we want to insert those character characters and this is kind of the premise behind something known as operational transform right so operational transform is saying

Hey if I see this right and then I see this other right I have to actually transform this guy so that now we’re no longer inserting that character G at position five but we’re going to do it at position six now you might say to

Yourself wow there must be a ton of edge cases here and to be honest there probably are uh but fortunately there are Engineers with no Lives who have actually managed to tackle those for us so operational transform is something that has actually been utilized by something like Google docs for many many

Years in real life so for example you know let’s uh let’s try and demonstrate it via a diagram we have some sort of centralized server right and the idea here is that if I’m sending over right number one and this guy is sending over right number two but on his client it’s

Saying hey I’ve I’ve not seen right number one yet you know I’ve only I’ve only seen up to write zero so then uh the server is able to say oh you know what if this guy hasn’t seen right one he needs to incorporate right one and that means we need to transform right

Number two you know as if right number one had come first and that is going to give us two Prime and so what’s now going to happen is that we’ll say you know we’re going to send one back to this guy and two prime back to this guy

And one back to this guy and two prime back to this guy and then you know we can have some sort of client side optimization such that this guy sees his right until he receives the actual one from the server and you know we don’t

Need to talk about that too much but the point is we do have to transform rights depending on the order that they come in and each client is going to have to keep state of how many rights it’s actually seen so that we know whether or not to

Transform that right on the server right so hopefully that makes sense however even this while it is a completely feasible solution that has worked for many years on the Google Docs team uh it does have some issues so for starters we basically need a server that has to see

All of the rights so what does that mean this is kind of the same concept as something like single leader replication right where as long as all of the rights go to that same single leader everything is in good shape however as a result of that this single server can become a

Bottleneck and so you know if we now have this guy over here and this guy over here and this guy over here and they’re all submitting rights over to our server our server has to order them all which means it’s going to have to perform some amount of locking right

It’s uh it’s not going to have to force everyone to wait for an incoming right to come in but you know it is going to be bounded by both uh its its like literal networking capacity and also just using something like a blocking que and so as a result uh This Server might

Not be able to hold up properly if we genuinely have thousands of people writing to it at the same time so now this is where we’re going to start getting real complicated right because I made this video once already it was complicated enough and then I decided to

Make it even more complicated so let’s try and walk through it and we’ll do the best that we can so if you recall I love talking about crdts on this channel the reason for that is because Martin kman loves talking about crdts and I love Martin

Kman uh that’s the guy who wrote ddia for uh for reference but generally the idea is that um as opposed to having to send every single right through a server it would be great if we could send data from place to place in a manner such that you know not every single server

Needs to actually see every single right obviously every client has to see every single right but if we can you know have three servers instead and you randomly send a right to one of those three servers and then one of those three sends them to all the clients uh that

Would mean that we could then support a lot more rights at the same time so that is the idea behind using a crdt here so we want to use something called a text editing crdt but before I go into that let me remind everyone what a crdt is so

A crdt is basically saying hey I’ve got some data living on all these different servers and even though right now all of those servers may seem uh different states when they eventually communicate with one another they’ll all converge to the same state right so I’ve got a node

One node 2 node 3 I’ve got state state Prime State double Prime and when these guys all eventually end up you know sending their data to one another they’ll all converge to State triple prime or something like that so hopefully that makes sense from a very abstract point of view but we’ll

Start talking about that a lot more concretely in the coming slides cool so a high level overview of what we want to be doing with crdts is like I mentioned it is going to allow us to have more right servers so you know imagine we’ve got these two guys writing

At the same time they don’t actually both have to go to this server a over here in fact one of them can go to server B one of them can go to server a we can then basically just multicast or you know use websockets or anything like

That the point is we can send the right out to every other one of our clients and as long as they both get as long as all of those clients get the right it doesn’t actually matter what order they come in we can just merge them together

Using our crdt merge function and then we’ll all eventually converge to the same state now again I’ve said there’s some sort of merge function obviously I’m going to have to Define what that merge function is for a text document but the idea is you know if this guy is

Saying O at position 2 and this guy is saying a at position one there is some sort of merge function that it actually won’t matter what order these rights come to you in and you can all basically end up seeing the same document regardless of their order

Cool so now in order to start defining what our crdt is going to look like for these text documents I’m going to cover the concept of a state-based crdt versus an operational crdt so a state crdt means we’re actually sending the entire state of what it is that we’re trying to

Maintain and in the case of Google docs we’re trying to maintain a text document so what that would mean was you know if I’m a client and I make a write I take my updated text document and I send it you know to either the server or to

Every other client it doesn’t really matter it just has to eventually get to every single other client so I would send that full text document and then we would have a merge function that takes the two versions of text documents right so if I have my own local version and I

Get an incoming version I have to have some sort of merge function you can almost think of this like uh the git algorithm to literally merge two files in code and the idea there is that that merge function has to be commutative associative and item potent

So what that means is you know if I receive your file or you receive my file we have the same result and if I receive your file and then I receive your file again I get the same exact result as if I had just received it once that means

It’s item potent so the one issue here is uh again like we can do this right if we use something like a git merge function with our text documents we would be able to resolve those changes and I could just send my file to you and

Then you would merge it in and I could send my file to you again and you would merge it in and it would be the same however this is going to be tough because the document is big if we have an 100 page document I don’t want to

Have to send that to every single other reader of the document that’s going to be really rough uh and it’s going to make for a bad day so what could we actually do instead well we’ve already talked about this quite a bit this video we’re better off just sending the

Operations that we do on the state so in this case the operation is going to be insert X at position y now again we still need to Define ourselves a merge function that is going to be commutative right so it doesn’t matter the order that you receive these operations in and

Also associative which is similar uh you can go look up the math function because I’m blanking on it at the moment anyways the point is with these operational crdts one problem with them is that they aren’t item potent right if I receive the operation insert X at position Y and

Then I receive that operation again how do I know if it was the same operation as before or someone’s actually just inserting the same character at the same position again it’s not exactly easy to tell so we have to do a little bit of extra work to actually make this thing

Item potent so if that I’m a client and I receive this insert xit position y I don’t apply it for a second time so we’re going to need some additional metadata with each one of these operations and then additionally the question is how can we actually make these character insertions commutative

So the problem with that is if you call all the way back to our operational transform thing you know it actually depends what order these guys are processed in so if I said insert G at position 5 and then I said insert uh o at position one then what would actually

Happen is I would go from codin to coding to cing and that’s actually what we want so that’s a different result than if I process them in a different order so keep in mind that order does actually matter for these insertions how can we make it so that order doesn’t

Matter and that is where our merge function is going to come in so actually we’re going to change the way that we Define our document a little bit so the first thing is that rather than uh giving every single character in our document an index from you know like an

Integer index so 0 1 2 3 4 5 actually what we’ll do instead is we’ll take the first character in our document assign it to position zero the last character in our document assign it to position one and then from there we take every single other character and basically

Evenly distribute them out along that zero to one timeline so you know for example if we have four characters we would have c d n g d is going to be at 33 n is going to be at 66 so now what’s going to happen is now someone else

Comes along they want to insert a character so if I want to insert o between character C and between character D I actually take these two numbers and then I just divide them by two right so that’s actually going to put me at let’s see1

165 and so this is where I’m going to be inserting position o so I would say o at65 and that’s what I’m going to write over the network insert o at165 cool and then the same thing over here if I wanted to insert I between d

And n I would take 33. 66 split them in two and that’s that’s going to obviously be 0.5 so now all of a sudden I have i at position .5 now uh one thing to note is that you can actually add a small random Jitter so maybe instead of doing 0.5 I do

05012 or something like that because that way if someone else wants to write a character you know coding like that their random Jitter puts them at 0524 and that way it’s actually deterministic whether e is going to be in front of I or not right so if I say

Write at 0 5024 now I know that every single client is going to deterministically know that e is in front of I if I had them both at 0. five then it’s possible that some clients would have e after I it’s possible that some clients would have E before I but

If you think about this what’s actually happening here is when we do this bsection if you think about it it doesn’t actually matter the order in which the characters are inserted right because regardless of whether this guy goes first or this guy goes first o is

Still going to be at uh position 16 and I is still going to be at position .5 and so this is the basis behind how we would Model A text document in order to make a crdt of it however this is not entirely easy right there’s still going

To be some edge cases here so let’s imagine one that’s called interleaving so this is one that Martin kman talks a little bit about on YouTube but the idea is let’s imagine I want to write the word cat right so assuming there’s nothing in the text field so far we just

Have some start and end character at zero and one the characters for cat would be at 0.25 50 and .75 because we’re Distributing them evenly and if someone else is writing bird we would have the characters .20 4060 and 80 and keep in mind that basically the idea is

We want to organize these characters over the number line so that would mean you know we’ve got B at0 20 c at0 25 I at 0 4 and on and on and on missing this pattern and so what’s basically happened is because we’ve tried to insert two words that are slightly different

Lengths at the same place in our document the characters actually get interleafed what we would want is to just have you know cat bird or something like that or bird cat now fortunately uh there are a lot of researchers who have devoted a good amount of time to working through all of

These edge cases and ultimately Tech crdts have now gotten in a pretty good spot such that you can effectively you know follow some sort of function that when you type of character uh there’s some client library on your machine that converts it to the appropriate operation

For the text crdt and then as soon as they’ve converted that to the appropriate operation you can send that in any indiscriminate order to all other clients and they will be able to reconstruct the state of the document that looks exactly the same as yours which is really really

Cool now the last thing that I wanted to talk about is going to be item potent because I mentioned that uh you know assuming that I send you an operation insert X at position y if you receive it multiple times are they unique rights or

Are they the same rights so how can we actually ensure that they are different and this is going to spiral into a whole deep discussion uh so let’s see if we can go ahead and follow it okay so again I receive a right as a client have I

Seen it before have I not how do I know well for starters one approach that we can take is that each write could have an Associated uu ID with it and then I locally just keep some sort of set of uu IDs that I’ve seen and assuming that I

Have not seen it before I apply it and assuming that I have seen it before I don’t and that’s going to work uh that’s totally fine but now for every single character typed or every single write done I have to store an additional uu ID now uu IDs aren’t that small you know

That could be another uh 16 characters or possibly even more uh per right so it could actually be significant overhead another possibility would actually be to use something like version vectors so version vectors I’ve discussed pretty extensiv ly on this channel more so in the concepts videos because they aren’t

Too relevant for most problems but the idea of a version Vector is that you know if I see um a right with a smaller version Vector than uh the version Vector that I’m currently holding on my client I know that I’ve already processed that right if the right has a

Bigger version Vector than the one that I have stored uh for my local document then I have not processed it so we can talk about this in a second how does it actually look in practice let’s go ahead and make things concrete okay so I am going to now Define some

Weird slight sort of kind of version Vector uh just to reduce a little bit of the space overhead when sending these rights over the network but uh I think it does still work so let’s imagine we’ve got two clients number one number two and two servers A and B and actually

We’ve got a third client over here so let’s imagine that this guy is going to make a right to server a the right is going to be basically given uh the sequence number A1 because it’s the first write that’s been processed by server a we also have this guy writing

To server a and since it gets there after A1 it is going to be labeled A2 now keep in mind that uh in our crdt system we can have rights being processed by multiple different servers that’s the whole appeal it’s going to allow us to ensure that we don’t have

Any bottlenecks when making rights so that means this guy over here can now make a right to server B and because it’s the first one processed on server B that is going to have the index B1 or the sequence number B1 okay so like I mentioned we’re just

Going to assign every single write an ID based on the server that processed it and how many rights that server has processed so A1 A6 A10 B1 B20 doesn’t matter and the reason we’re doing this is because for example a typical version Vector would look something like this so

A1 would look like 1 comma 0 because there’s one right to the first server zero rights to B uh A2 would look like you know 2 comma 0 but we don’t really need all the information of other servers right because the point is this version Vector actually gets bigger as

There are more number of servers that can process our rights and we really only care about actually what number right this is on one particular server so it’s going to save us a little bit of space to represent rights using this kind of um server sequence number it’s

Almost like a Lamport clock really so the idea is we’re going to save space compared to the typical version Vector which is a vector whereas this is more of a Lamport clock but uh eventually we are still going to have to use actual version vectors so I’ll talk about that

In a second but the problem with version vectors is they have a length equal to the number of servers so if we had to actually send that over the network with every single right you know let’s say we had 20 different servers processing rights for this given document that

Could get pretty hefty cool so now how are we actually going to use this on a client in order to make sure that we haven’t process to given right before so let’s say this client is me and the first thing that’s going to happen is I

See WR A1 I process it I haven’t seen any rights from server a before so I’m good to go then I see WR A2 now all of a sudden I know that um you know I’ve seen two wres from server a and then finally if I see A1 again I will know that

That’s a duplicate because right now I’m actually going to keep a local version Vector which looks something like this 2 comma 0 meaning I’ve seen two rights from server a I’ve seen zero rights from server B so if something comes in like A1 I can look at the A and X over here

Say you know what I must have already seen this already and now I’m good to go cool however we are going to have to continue this a little bit so the issue is that we’re not actually guaranteeing the order in which rights are getting to the client so even though they’re all

Being processed by a uh you know the first time around I’m not actually saying how this right is going from a to say this guy over here it could go through b it could go over here it could go go over multicast it could go over UDP it doesn’t really matter the point

Is I’m trying to make a system that is completely arbitrary in terms of the number of servers that accept document rights and also how those servers communicate with one another now if I made some sort of guarantee where every single time a server received a right it would just synchronously go and transmit

That right to every other client over TCP this type of thing couldn’t happen but if I don’t what could happen is that you know I could write A2 over here and then what could actually happen now is this guy sees A2 before it sees A1 and that would be a problem because you

Know now I would increment my version Vector to 2 comma 0 and if I were to see A1 I would think it was a duplicate even though it isn’t so how can we avoid actually receiving and processing those rights out of order right so this is what we’re going to be

Talking about over here so let’s imagine I am over here this is me I’m maintaining some sort of document and in order to get the state of this document I’ve seen six rights that originated from server a I’ve seen nine rights that have originated from server B and I’ve seen four rights that

Have originated from server C now let’s imagine just randomly I happen to get a notification that hey right C6 has just come in well the issue now is wait a second I can’t process C6 I haven’t seen C5 and I need it so what can we actually

Do in order to get it well that’s the question but again the idea is um you know it is possible depending on my network topology of document writing servers that I could receive a right like C6 before I receive a right like C5 there are ways to avoid this happening

By actually just using guaranteed ordered messaging through TCP and having every single right server instantly send that message out to every single client uh but again I want to make this as flexible as possible because I don’t want these right servers to become a bottle NE I want to make rights as

Quickly and seamlessly as possible so again you know we have choices to do TCP versus UDP uh we have choices on you know which right server is going to you know write to another right server which is then going to forward those messages out to clients these are all things that

Have trade-offs and uh honestly we could probably discuss them for an hour you know you could do like a a round topology where you pass the messages from one server to the next to the next to the next and then to clients there are many different things that we can do

Here and the point is I want to come up with a solution that is robust regardless of what order the messages are delivered to the client in cool so let’s actually go ahead and keep talking about that so like I mentioned you can receive messages out of order and when

We do what we can do is actually go ahead and when we first make a right we can persist it to a database right so we want to be sure that uh none of our rights are actually getting lost the reason being that that ultimately if I

Need to uh get a right because I haven’t seen C5 over here then I want to be able to actually just request it from some server so we’ll talk about that in a second but the idea is you know here’s me I’m writing I write to server C apologies

For the ambulance I write to server C and right before server C actually sends this right out to the clients it is going to put this right C5 over in a database which I’ll just call the rights database and then after that it can actually send to everyone else as

Necessary so we now know that C5 is guaranteed to be in a database before a single client even sees it and so now what’s going to happen is let’s say I’m this guy over here and I’ve seen C4 and all of a sudden for whatever reason I

See message C8 I can’t process this thing yet because I’m missing C5 C6 and C7 and I know that if I’m seeing C8 it means that this server over here has already put it in a database and gotten an acknowledgement back because of the fact that I’m waiting on that

Acknowledgement to send this right to everyone else so if I see a right I know it’s already in the database and now if I really need to because C5 C6 and C7 haven’t come in yet I can just request them from the database so hopefully that makes sense ideally you should just be

Receiving most of these rights from the server but occasionally one might get dropped and you might not receive it and then you can just pull it from the database cool so another small thing to note is uh you know if this database is actually indexed by both the server and

The number of the right it could make your reads a little bit faster so if I have C1 C2 C3 and they’re all ordered and sorted that is going to make fetching C5 C6 and C7 a lot faster than having to do a linear scan through that

Table and just looking for all of them individually cool so the reason I also wanted to talk about that was because it is relevant for new clients right so it’s great when you’re connected to all of the right servers the whole time and you can just hear wrs as they come in

However if I’m a new person and I haven’t been writing to the document and now I need to load a snapshot of the document so that I can start getting the active changes to it I have to be able to fetch it from somewhere cool so how

Is this actually going to work well the first question is can we read it from our rights database right can I from that rights database that we discussed in the last slide recreate our document and the answer is yes right because I know that all of the rights from server

C are going to be in here there might be a server a database just like that and I can read those there might be a server B database and so what would have to happen is my client would have to read from here here and here in order to get

All of those rights assuming there are three different right servers and the problem with that is that even though it works it is going to be slow because as We Know when we’re trying to make a quick write having to do some sort of aggregation over multiple different uh

Distributed nodes is always going to be pretty tough so the idea is uh that in addition to that uh you know besides having to do like an aggregation we would also have to actually sort these rights by character because right now they’re sorted by the order that they

Came in but you know this could have like character .56 this could have character. 21 this could be99 and I’ll make that more Concrete in a second but ideally instead what we want to do in order to simplify this problem a little bit is use derived data because

Reading from those writs tables is probably just going to be too slow when we’re first starting up so what would be a little bit better well let’s say I’m a writer I want to write to the server so I make a write saying insert X at y okay the rights table is

Now going to say hey this was right C5 insert ERT X at Y and then from there what we can use is change data capture to take this right and actually transform it into a better format for us to then read it down the line so now

Rather than saying hey C5 insert X at Y what we can do is put it into this document snapshots table right which we’re going to be deriving via change data capture so it stays consistent and then from there once we actually do that now as opposed to

Having C1 C2 C3 like that order we would actually have you know character Zero character1 character Point 2 and so it’s going to be a lot faster to read that database we don’t have to resort the characters now I know that’s all a bit abstract which is why I decided to write

It out in the next uh slide over here so now we’re going to actually visualize those two database schemas so we have the wrs table which is what we synchronously write to every single time that we want to make a new right to our document and then we have the document

Snapshot table which is is going to be derived from the rights table via change data capture CDC so let’s imagine that we’ve written the word Jordan right so let’s say R is the first character that’s getting getting written that gets put at 0. five then you know we put in O

At 0.25 and at 75 oh you know J gets put between uh the starting character and O so it’s at 0.125 then we’ve got a at 625 and D at 56125 now again if I wanted to create the whole document I would have to take all of this and I would have to

Sort it by this column right here which is again going to be o of n log n to the number of characters in our document and if you have a 100 page document that is not trivial additionally if our document is partitioned then it’s really not

Trivial because now we have to go to a bunch of different partitions and it’s probably partitioned like this right where each server is writing to the same database and so now we have to fetch these guys then we have to fetch these guys and then we have to actually sort

Them and let’s say we only wanted to fetch you know out of an 100 page document we wanted to fetch the first page of our document now I have to actually fetch all of these guys do a full sort on this column and then only

Take the first page of it which is maybe these two values right or not even those two values it would be uh the J and the O so I’d have to do all this sorting uh of a huge document only to then uh you know have to only take a small chunk of

It and do that over multiple different noes on the other hand if I wanted to do this in our document snapshot table it’s going to be a lot easier why is that well as you can see all of our characters are actually ordered by their

Indexs as you can see it’s clearly a lot easier to just read Jordan here as opposed to having to have it all jumbled up on the writer’s table side and so now let’s imagine you know instead of having just six characters here I had a th000

Characters and I only wanted to load the first third of them so I want the first page well what I could do is partition my document actually by the the range of the characters right so the first third of them are going to go on one partition the second third are on another

Partition the last third are on another partition hey now I only want the first third of my document I just go read this partition and that’s super easy to do as opposed to the other solution where I have to load everything and then sort it so hopefully that makes a little bit of

Sense for the logic behind why we want a document snapshot table so what does this show General process look like for new clients that are starting up well the first thing that’s going to happen is we’re going to subscribe to changes right because we want to make sure that when we subscribe

Uh we aren’t going to be missing anything so now what might happen is you know we might get something like a C5 right before we’ve even actually fetched the document from our document snapshot database but because we don’t have um you know the the precursor rights you

Know we aren’t having C4 yet uh we have to go fetch that so now we’re fetching that from the database and what’s going to happen is we now get a snapshot of the database with our version Vector associated with that snapshot which is hey we’ve seen 10 rights from uh server

A 12 rights from server B three rights from server C and now remember I just received C5 so I’m saying shoot I need C4 where can I go to get it I can go to the rights database give me C4 C4 comes back now all of a sudden I’m completely

Caught up and all subsequent mess mesages will come from the right servers so this is a way to guarantee that we will never drop an individual right once we get our actual document state there’s no race condition in which we can you know somehow miss one of these rights

And then never get it back if we do miss a right we’ll know because we’ll see a higher one coming in and then we can request the misr from the rights database so think of this sort of like a Delta protocol where this is like your state and this is your

Operation Operation poll and this is your Operation Push is this a bit over engineered for Google Docs maybe so if we want to make an arbitrary type of real-time editor where you can have as many right servers as you want communicating with each other in whatever method that they want

To communicate in whether that’s multicast or you know a star method or any sort of just around the world method it doesn’t really matter the point is that using a solution like this will ensure that we all are going to converge to the same state and we’re also not

Going to be doubly applying certain rights cool so the last thing I wanted to touch upon quickly was you know Jordan you’ve made this whole over engineered solution couldn’t we just do this with just a document snapshot database and the answer is actually yes you could right so you know if this is

Server a server B and server C and these guys are set up so that any of our clients can make rights to them you know insert X at y what we could actually do is just you know store our document state in here you know so again it would be like

First you write to the document database and then second you can propagate that right out to the world and that would work it would work just fine however if you think about it guess what’s a bottleneck now a database so it used to be the case that you know just having

One server was a bottleneck now we have many servers but just one database and so making sure that uh we actually have this kind of three-prong system of as many servers as we want as many write databases as we want and then a document snapshot database keep in mind the

Document snapshot database is updated via change data capture which means it can go as slowly as we need it to so it’s not going to become a bottleneck we actually have a pretty robust system here cool so the last question is going to be you know we have this document snapshot

Database down here and the question now becomes where is this version Vector coming from right we need to actually be able to store this with the data well there’s a couple things that we can do for starters we can basically reconstruct the uh version Vector from

Our right IDs how would we do that well if I go back up here to the schema I could do a linear scan down this table and say okay well the highest right from B that I’ve seen is three the highest right from a that I’ve seen is three so

Our version Vector is actually going to be 3 comma 3 now that’s feasible except it’s pretty darn stupid if our data is on multiple different partitions because then all of a sudden we have to make some sort of uh distributed transaction because one of the partitions could change after we’ve

Already queried it uh it actually gets pretty complicated so maybe that’s not actually going to be the best thing to do what we could do instead is actually keep a separate version Vector table for our document snapshots right so now we’re getting even more complicated here instead of of just having a document

Snapshot table we’re also going to have a version Vector table and so what this is going to allow us to do is uh as long as it stays in sync with our document snapshot uh we are in good shape otherwise the client gets messed up

Right so if I say you know I just loaded a document and snapshot uh with a version Vector that’s less recent than it now all of a sudden I might end up applying double rights because you know my document uh I’m going to basically see a right coming in uh I already

Incorporated that right into my document snapshot however the version Vector is telling me that I haven’t Incorporated that right so now I’m going to double apply it or another bad thing would be okay so I have a really recent version Vector but a not recent document

Snapshot now all of a sudden I’m going to see a right that comes in it’s not reflected in my document snapshot but my version Vector is telling me that I’ve already seen it now I never apply that right so again it’s really important that we keep these two in sync and uh

You guys aren’t going to like this one but we’re actually going to use a two-phase commit the reason being that again like I mentioned when we have something like derived data that is going over change data capture I don’t really actually care about keeping this thing that fast because it’s not in the

Critical path right the critical path of me being able to submit a right as a client is just these two guys right here anything happening over here can be nice and slow and that includes the two-phase commit so what’s actually going to happen now is we’ve got our version Vector store

We do some you know we have some intermediary server that’s reading these events and then two-phase committing over to these two cool so I think that covers everything that I wanted to cover just about I’ve over engineered the crap out of this thing as I love to do and as a result

Now we can just have as many right servers as we could possibly want in our life submit one right to literally any of them send it around to our heart’s content uh and then everyone should be in good shape so we’ve got our right cluster servers because this guy is

Going to be the writer this guy is going to be the reader on the right and so we can submit a right to any of these uh servers in a right cluster server the first thing that it actually has to do is go submit to our right

Databases and the idea here is uh it doesn’t really matter um you know how we partition these things but it’s going to be super useful if uh we make sure that basically you know if I have a all of the rights from a should go onto the

Same server because this guy at some point might request a bunch of different rights from server a to catch up and so as a result it would be great if they’re on the same location uh it would also be nice if you know these guys were writing

For the same document and were on the same server but uh they don’t have to be again I’m typically just just going to be requesting for one server at a time and so as a result we can Partition by this server ID it would also be nice if

We index on the right order because let’s say you know C8 comes in and I’ve only seen C4 now I want C5 C6 and C7 it would be a lot faster for me to know exactly where those are if those numbers are sorted in our database cool so the

Next aspect of this is this guy can now go sending out rights you know in step two to their heart’s content they can send them to all the other clients as they please and the clients can go merge them in they’re a crdt so it doesn’t

Really matter what the order is going to be when they get there okay so this is going to be our CR path right here everything on the bottom left and now what’s going to happen is we also perform change data capture where we take all of the rights from our right

Database and we actually are going to convert them into document snapshots and the document snapshot has two components it’s got the version Vector associated with that document snapshot and is also going to have the document snapshot itself where a document snapshot is just going to be all of the characters in the

Document but just ordered uh in terms of their index in that document so that way we can actually partition them by the range of the index of that character and uh it’s going to allow us to you know just say load one page of the text

Document at a time which is going to be really nice cool we can also of course get that version vector and that’s going to allow us to know the version Vector corresponding to one particular document over here um the one thing I should note is that I thought it would be kind of

Useful to use hbase for our document snapshot database the reason being that there’s probably going to be a decent amount of metadata associated with each write and to keep reads as possible it would be great if we could just grab the columns that we care about which is just

Going to be the character and the position that the character is at so H Bas I think could help a little bit in practice who knows you’d have to run a bunch of benchmarks cool and then finally the last aspect of this puzzle that I haven’t really spoken about

Because it’s kind of given for any problem like this is a cache if there are certain documents that tons of people are opening up per day uh we probably want some sort of lru in memory cache to actually store for all of those characters and their positions we can

Just do that in rdus uh you know you can pull these in as we’re trying to read them and then use lru to make sure that the unpopular ones get evicted and the popular ones stay in there so to actually go ahead and read a document just to reiterate the first step is

Going to be that we would get its document snapshot so 1 a and one rather sorry 1 a and 1B is going to be reading from these two guys then all of a sudden I have a snapshot in the version Vector associated with it I can start subscribing to these servers over here

The current right servers I’m going to receive a right and I’m going to say shoot you know what I might be missing a couple of wres let me actually reach out to one of these right databases over here to catch up and then once you’re caught up you’re good to go and you’re

Finished well guys that was a lot um I didn’t really focus nearly as much on the actual right ordering inv version Vector stuff in the last video but I actually think it’s quite important because uh when you have a crd like this um actually making sure that you don’t

Lose any of the rights and you don’t double apply any of the rights is uh is a bit of a struggle so with that being said I hope you enjoyed the video and I will see you in the next one

2 Comments

  1. I wonder why you would use another db plus two phase commit for the version vector table, instead of using the same db and use transactions instead.

Leave A Reply