From 94fda7935a3cc37488962fa4c8e957430aa635ff Mon Sep 17 00:00:00 2001 From: Martin von Zweigbergk Date: Sat, 8 Jan 2022 10:02:12 -0800 Subject: [PATCH] docs: add technical doc about lock-free concurrency design --- docs/technical/concurrency.md | 112 ++++++++++++++++++++++++++++++++++ 1 file changed, 112 insertions(+) create mode 100644 docs/technical/concurrency.md diff --git a/docs/technical/concurrency.md b/docs/technical/concurrency.md new file mode 100644 index 000000000..161c4c71d --- /dev/null +++ b/docs/technical/concurrency.md @@ -0,0 +1,112 @@ +# Concurrency + +## Introduction + +Concurrent editing is a key feature of DVCSs -- that's why they're called +*Distributed* Version Control Systems. A DVCS that didn't let users edit files +and create commits on separate machines at the same time wouldn't be much +of a distributed VCS. + +When conflicting changes are made in different clones, a DVCS will have to deal +with that when you push or pull. For example, when using Mercurial, if the +remote has updated a bookmark called `main` (Mercurial's bookmarks are similar +to Git's branches) and you had updated the same bookmark locally but made it +point to a different target, Mercurial would add a bookmark called `main@origin` +to indicate the conflict. Git instead prevents the conflict by renaming pulled +branches to `origin/main` whether or not there was a conflict. However, most +DVCSs treat local concurrency quite differently, typically by using lock files +to prevent concurrent edits. Unlike those DVCSs, Jujutsu treats concurrent edits +the same whether they're made locally or remotely. + +One problem with using lock files is that they don't work when the clone is in a +distributed file system. Most clones are of course not stored in distributed +file systems, but it is a *big* problem when they are (Mercurial repos +frequently get corrupted, for example). + +Another problem with using lock files is related to complexity of +implementation.
The simplest way of using lock files is to take coarse-grained locks +early: every command that may modify the repo takes a lock at the very +beginning. However, that means that operations that wouldn't actually conflict +would still have to wait for each other. The user experience can be improved by +using finer-grained locks and/or taking the locks later. The drawback of that is +complexity. For example, you need to verify that any assumptions you made before +locking are still valid. + +To avoid depending on lock files, Jujutsu takes a different approach by +accepting that concurrent changes can always happen. It instead exposes any +conflicting changes to the user, much like other DVCSs do for conflicting +changes done remotely. + +Jujutsu's lock-free concurrency means that it's possible to update copies of the +clone on different machines and then let `rsync` (or Dropbox, or NFS, etc.) +merge them. The working copy may mismatch what's supposed to be checked out, but +no changes to the repo will be lost (added commits, moved branches, etc.). If +conflicting changes were made, they will appear as conflicts. For example, if a +branch was moved to two different locations, it will appear in `jj log` in +both locations but with a "?" after the name, and `jj status` will also inform +the user about the conflict. + +The most important piece in the lock-free design is the "operation log". That is +what allows us to detect and merge concurrent operations. + + +## Operation log + +The operation log is similar to a commit DAG (such as in Git), but each commit +object is instead an "operation" and each tree object is instead a "view". The +view object contains the set of visible head commits, branches, tags, and the +current checkout. The operation object contains a pointer to the view object +(like how commit objects point to tree objects), pointers to parent operation(s) +(like how commit objects point to parent commit(s)), and metadata about the +operation.
These types are defined [here](../../lib/protos/op_store.proto). The +operation log is normally linear. It becomes non-linear if there are concurrent +operations. + +When a command starts, it loads the repo at the latest operation. Because the +associated view object completely defines the repo state, the running command +will not see any changes made by other processes thereafter. When the operation +completes, it is written with the start operation as parent. The operation +cannot fail to commit (except for disk failures and such). It is left for the +next command to notice if there were concurrent operations. It will have to be +able to do that anyway since the concurrent operation could have arrived via a +distributed file system. This model -- where each operation sees a consistent +view of the repo and is guaranteed to be able to commit its changes -- +greatly simplifies the implementation of commands. + +It is possible to load the repo at a particular operation with +`jj --at-operation `. If the command is mutational, that will result +in a fork in the operation log. That works exactly the same as if any later +operations had not existed when the command started. In other words, running +commands on a repo loaded at an earlier operation works the same way as if the +operations had been concurrent. This can be useful for simulating concurrent +operations. + +### Merging concurrent operations + +If Jujutsu tries to load the repo and finds multiple heads in the operation log, +it will do a 3-way merge of the view objects based on their common ancestor +(possibly several 3-way merges if there were more than two heads). Conflicts +are recorded in the resulting view object. For example, if branch `main` was +moved from commit A to commit B in one operation and moved to commit C in a +concurrent operation, then `main` will be recorded as "moved from A to B or C". +See the `RefTarget` [definition](../../lib/protos/op_store.proto).
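
The 3-way merge of branch targets described above can be sketched roughly as follows. This is only an illustration, not Jujutsu's actual code: the names `Conflict`, `merge_ref`, and `merge_views` are hypothetical, a view is reduced to a plain mapping from branch name to commit ID, and inputs that are already conflicted are not handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Conflict:
    """Hypothetical stand-in for a conflicted ref: moved from `removed` to any of `added`."""
    removed: Optional[str]
    added: Tuple[Optional[str], ...]


# A target is a commit ID, a recorded conflict, or None (branch absent).
Target = Union[str, Conflict, None]


def merge_ref(base: Target, left: Target, right: Target) -> Target:
    """3-way merge of a single ref, relative to the common ancestor `base`."""
    if left == right:
        return left            # both sides agree (or neither changed)
    if left == base:
        return right           # only the right side changed
    if right == base:
        return left            # only the left side changed
    # Both sides changed in different ways: record the conflict, don't fail.
    return Conflict(removed=base, added=(left, right))


def merge_views(base: Dict[str, Target],
                left: Dict[str, Target],
                right: Dict[str, Target]) -> Dict[str, Target]:
    """Merge the branch maps of two concurrent views."""
    merged: Dict[str, Target] = {}
    for name in sorted(set(base) | set(left) | set(right)):
        target = merge_ref(base.get(name), left.get(name), right.get(name))
        if target is not None:  # None means the branch ended up deleted
            merged[name] = target
    return merged
```

With `main` at `A` in the common ancestor, `B` in one view, and `C` in the other, `merge_views` records `main` as `Conflict(removed="A", added=("B", "C"))`, which corresponds to "moved from A to B or C". A branch changed on only one side simply takes that side's target.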
+ +Because we allow branches (etc.) to be in a conflicted state rather than just +erroring out when there are multiple heads, the user can continue to use the repo, +including performing further operations on the repo. Of course, some commands +will fail when using a conflicted branch. For example, `jj checkout main` when +`main` is in a conflicted state will result in an error telling you that `main` +resolved to multiple revisions. + +### Storage + +The operation objects and view objects are stored in content-addressed storage +just like Git commits are. That makes them safe to write without locking. + +We also need a way of finding the current head of the operation log. We do that +by keeping the ID of the current head(s) as a file in a directory. The ID is the +name of the file; it has no contents. When an operation completes, we add a file +pointing to the new operation and then remove the file pointing to the old +operation. Writing the new file is what makes the operation visible (if the old +file didn't get properly deleted, then future readers will take care of that). +This scheme ensures that transactions are atomic.
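
The head-file scheme above can be sketched as follows. This is a minimal illustration under assumptions, not Jujutsu's implementation: `publish_operation` and `current_heads` are hypothetical names, operation IDs are plain strings, and the reader-side cleanup of stale heads (which needs the operation graph) is left out.

```python
import os
from typing import List


def publish_operation(heads_dir: str, new_id: str, old_id: str) -> None:
    """Make operation `new_id` the head in place of `old_id`, without locking."""
    # Creating the empty file named after the new operation is the step
    # that makes the operation visible to other processes.
    with open(os.path.join(heads_dir, new_id), "a"):
        pass
    # Removing the old head is cleanup only. If this step is skipped (for
    # example because of a crash) or another process already removed the
    # file, readers still see the new head.
    try:
        os.remove(os.path.join(heads_dir, old_id))
    except FileNotFoundError:
        pass


def current_heads(heads_dir: str) -> List[str]:
    """Return the IDs of all current operation heads (the file names)."""
    return sorted(os.listdir(heads_dir))
```

If two processes both start from the same operation and each publishes its own result, both head files remain in the directory. The next command to load the repo then sees multiple heads and merges them as described under "Merging concurrent operations".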