mirror of
https://github.com/martinvonz/jj.git
synced 2025-01-10 06:19:27 +00:00
247 lines
9.9 KiB
Markdown
247 lines
9.9 KiB
Markdown
|
# Architecture
|
||
|
|
||
|
## Data model
|
||
|
|
||
|
The commit data model is similar
|
||
|
to [Git's object model](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects)
|
||
|
, but with some differences.
|
||
|
|
||
|
## Separation of library from UI
|
||
|
|
||
|
The `jj` binary consists of two Rust crates: the library crate
|
||
|
(`jujutsu-lib`) and the CLI crate (`jujutsu`). The library crate is currently
|
||
|
only used by the CLI crate, but it is meant to also be usable from a GUI or TUI,
|
||
|
or in a server serving requests from multiple users. As a result, the library
|
||
|
should avoid interacting directly with the user via the terminal or by other
|
||
|
means; all input/output is handled by the CLI crate [^1]. Since the library
|
||
|
crate is meant to usable in a server, it also cannot read configuration from the
|
||
|
user's home directory, or from user-specific environment variables.
|
||
|
|
||
|
[^1]: There are a few exceptions, such as for messages printed during automatic
|
||
|
upgrades of the repo format
|
||
|
|
||
|
A lot of thought has gone into making the library crate's API easy to use, but
|
||
|
not much has gone into "details" such as which collection types are used, or
|
||
|
which symbols are exposed in the API.
|
||
|
|
||
|
## Storage-independent APIs
|
||
|
|
||
|
One overarching principle in the design is that it should be easy to change
|
||
|
where data is stored. The goal was to be able to put storage on local-disk by
|
||
|
default but also be able to move storage to the cloud at Google
|
||
|
(and for anyone). To that end, commits (and trees, files, etc.) are stored by
|
||
|
the commit backend, operations (and views) are stored by the operation backend,
|
||
|
the heads of the operation log are stored by the "op heads" backend, the commit
|
||
|
index is stored by the index backend, and the working copy is stored by the
|
||
|
working copy backend. The interfaces are defined in terms of plain Rust data
|
||
|
types, not tied to a specific format. The last working copy doesn't have its own
|
||
|
trait defined yet, but its interface is small and easy to create traits for when
|
||
|
needed.
|
||
|
|
||
|
The commit backend to use when loading a repo is specified in
|
||
|
the `.jj/repo/store/backend` file. We don't yet have support for choosing
|
||
|
different implementations for other kinds of backends than the commit backend.
|
||
|
|
||
|
## Design of the library crate
|
||
|
|
||
|
### Overview
|
||
|
|
||
|
Here's a diagram showing some important types in the library crate. The
|
||
|
following sections describe each component.
|
||
|
|
||
|
```mermaid
|
||
|
graph TD;
|
||
|
ReadonlyRepo-->Store;
|
||
|
ReadonlyRepo-->OpStore;
|
||
|
ReadonlyRepo-->OpHeadsStore;
|
||
|
ReadonlyRepo-->ReadonlyIndex
|
||
|
MutableIndex-->ReadonlyIndex;
|
||
|
Store-->Backend;
|
||
|
GitBackend-->Backend;
|
||
|
LocalBackend-->Backend;
|
||
|
LocalBackend-->StackedTable;
|
||
|
MutableRepo-->ReadonlyRepo;
|
||
|
MutableRepo-->MutableIndex;
|
||
|
Transaction-->MutableRepo;
|
||
|
WorkingCopy-->TreeState;
|
||
|
Workspace-->WorkingCopy;
|
||
|
Workspace-->RepoLoader;
|
||
|
RepoLoader-->Store;
|
||
|
RepoLoader-->OpStore;
|
||
|
RepoLoader-->OpHeadsStore;
|
||
|
RepoLoader-->ReadonlyRepo;
|
||
|
Git-->GitBackend;
|
||
|
GitBackend-->StackedTable;
|
||
|
```
|
||
|
|
||
|
### Backend
|
||
|
|
||
|
The [`Backend`](../../lib/src/backend.rs) trait defines the interface each
|
||
|
commit backend needs to implement. The current in-tree commit backends
|
||
|
are [`GitBackend`]((../../lib/src/git_backend.rs))
|
||
|
and [`LocalBackend`](../../lib/src/local_backend.rs).
|
||
|
|
||
|
Since there are non-commit backends, the `Backend` trait should probably be
|
||
|
renamed to `CommitBackend`.
|
||
|
|
||
|
### GitBackend
|
||
|
|
||
|
The `GitBackend` stores commits in a Git repository. It uses `libgit2` to read
|
||
|
and write commits and refs.
|
||
|
|
||
|
To prevent GC from deleting commits that are still reachable from the operation
|
||
|
log, the `GitBackend` stores a ref for each commit in the operation log in
|
||
|
the `refs/jj/keep/` namespace.
|
||
|
|
||
|
Commit data that is available in Jujutsu's model but not in Git's model is
|
||
|
stored in a `StackedTable` in `.jj/repo/store/extra/`. That is currently the
|
||
|
change ID and the list of predecessors. For commits that don't have any data in
|
||
|
that table, which is any commit created by `git`, we use an empty list as
|
||
|
predecessors, and the bit-reversed commit ID as change ID.
|
||
|
|
||
|
Because we use the Git Object ID as commit ID, two commits that differ only in
|
||
|
their change ID, for example, will get the same commit ID, so we error out when
|
||
|
trying to write the second one of them.
|
||
|
|
||
|
### LocalBackend
|
||
|
|
||
|
The `LocalBackend` is just a proof of concept. It stores objects addressed by
|
||
|
their hash, with one file per object.
|
||
|
|
||
|
### Store
|
||
|
|
||
|
The `Store` type wraps the `Backend` and returns wrapped types for commits and
|
||
|
trees to make them easier to use. The wrapped objects have a reference to
|
||
|
the `Store` itself, so you can do e.g. `commit.parents()` without having to
|
||
|
provide the `Store` as an argument.
|
||
|
|
||
|
The `Store` type also provides caching of commits and trees.
|
||
|
|
||
|
### ReadonlyRepo
|
||
|
|
||
|
A `ReadonlyRepo` represents the state of a repo at a specific operation. It
|
||
|
keeps the view object associated with that operation.
|
||
|
|
||
|
The repository doesn't know where on disk any working copies live. It knows, via
|
||
|
the view object, which commit is supposed to be the current working-copy commit
|
||
|
in each workspace.
|
||
|
|
||
|
### MutableRepo
|
||
|
|
||
|
A `MutableRepo` is a mutable version of `ReadonlyRepo`. It has a reference to
|
||
|
its base `ReadonlyRepo`, but it has its own copy of the view object and lets the
|
||
|
caller modify it.
|
||
|
|
||
|
### Transaction
|
||
|
|
||
|
The `Transaction` object has a `MutableRepo` and metadata that will go into the
|
||
|
operation log. When the transaction commits, the `MutableRepo` becomes a view
|
||
|
object in the operation log on disk, and the `Transaction` object becomes an
|
||
|
operation object. In memory, `Transaction::commit()` returns a
|
||
|
new `ReadonlyRepo`.
|
||
|
|
||
|
### RepoLoader
|
||
|
|
||
|
The `RepoLoader` represents a repository at an unspecified operation. You can
|
||
|
think of as a pointer to the `.jj/repo/` directory. It can create
|
||
|
a `ReadonlyRepo` given an operation ID.
|
||
|
|
||
|
### TreeState
|
||
|
|
||
|
The `TreeState` type represents the state of the files in a working copy. It
|
||
|
keep track of the mtime and size for each tracked file. It knows the `TreeId`
|
||
|
that the working copy represents. It has a `snapshot()` method that will use the
|
||
|
recorded mtimes and sizes and detect changes in the working copy. If anything
|
||
|
changed, it will return a new `TreeId`. It also has `checkout()` for updating
|
||
|
the files on disk to match a requested `TreeId`.
|
||
|
|
||
|
The `TreeState` type supports sparse checkouts. In fact, all working copies are
|
||
|
sparse; they simply track the full repo in most cases.
|
||
|
|
||
|
### WorkingCopy
|
||
|
|
||
|
The `WorkingCopy` type has a `TreeState` but also knows which `WorkspaceId` it
|
||
|
has and at which operation it was most recently updated.
|
||
|
|
||
|
### Workspace
|
||
|
|
||
|
The `Workspace` type represents the combination of a repo and a working copy (
|
||
|
like Git's 'worktree' concept).
|
||
|
|
||
|
The repo view at the current operation determines the desired working-copy
|
||
|
commit in each workspace. The `WorkingCopy` determines what is actually in the
|
||
|
working copy. The working copy can become stale if the working-copy commit was
|
||
|
changed from another workspace (or if the process updating the working copy
|
||
|
crashed, for example).
|
||
|
|
||
|
### Git
|
||
|
|
||
|
The `git` module contains functionality for interoperating with a Git repo, at a
|
||
|
higher level than the `GitBackend`. The `GitBackend` is restricted by
|
||
|
the `Backend` trait; the `git` module is specifically for Git-backed repos. It
|
||
|
has functionality for importing refs from the Git repo and for exporting to refs
|
||
|
in the Git repo. It also has functionality for pushing and pulling to/from Git
|
||
|
remotes.
|
||
|
|
||
|
### Revsets
|
||
|
|
||
|
A user-provided revset expression string goes through a few different stages to
|
||
|
be evaluated:
|
||
|
|
||
|
1. Parse the expression into a `RevsetExpression`, which is close to an AST
|
||
|
2. Resolve symbols and functions like `tags()` into specific commits. After
|
||
|
this stage, the expression is still a `RevsetExpression`, but it won't have
|
||
|
any `CommitRef` variants in it.
|
||
|
3. Resolve visibility. This stage resolves `visible_heads()` and `all()` and
|
||
|
produces a `ResolvedExpression`.
|
||
|
4. Evaluate the `ResolvedExpression` into a `Revset`.
|
||
|
|
||
|
This evaluation step is performed by `Index::evaluate_revset()`, allowing
|
||
|
the `Revset` implementation to leverage the specifics of a custom index
|
||
|
implementation. The first three steps are independent of the index
|
||
|
implementation.
|
||
|
|
||
|
### StackedTable
|
||
|
|
||
|
`StackedTable` (actually `ReadonlyTable` and `MutableTable`) is a simple disk
|
||
|
format for storing key-value pairs sorted by key. The keys have to have the same
|
||
|
size but the values can have different sizes. We use our own format because we
|
||
|
want [lock-free concurrency](concurrency.md) and there doesn't seem to be an
|
||
|
existing key-value store we could use.
|
||
|
|
||
|
The file format contains a lookup table followed by concatenated values. The
|
||
|
lookup table is a sorted list of keys, where each key is followed by the
|
||
|
associated value's offset in the concatenated values.
|
||
|
|
||
|
A table can have a parent table. When looking up a key, if it's not found in the
|
||
|
current table, the parent table is searched. We never update a table in place.
|
||
|
If the number of new entries to write is less than half the number of entries in
|
||
|
the parent table, we create a new table with the new entries and a pointer to
|
||
|
the parent. Otherwise, we copy the entries from the parent table and the new
|
||
|
entries into a new table with the grandparent as the parent. We do that
|
||
|
recursively so parent tables are at least 2 times as large as child tables. This
|
||
|
results in O(log N) amortized insertion time and lookup time.
|
||
|
|
||
|
There's no garbage collection of unreachable tables yet.
|
||
|
|
||
|
The tables are named by their hash. We keep a separate directory of pointers to
|
||
|
the current leaf tables, in the same way as we
|
||
|
do [for the operation log](concurrency.md#storage).
|
||
|
|
||
|
|
||
|
## Design of the CLI crate
|
||
|
|
||
|
### Templates
|
||
|
|
||
|
The concept is copied from Mercurial, but the syntax is different. The main
|
||
|
difference is that the top-level expression is a template expression, not a
|
||
|
string like in Mercurial. There is also no string interpolation (e.g.
|
||
|
`"Commit ID: {node}"` in Mercurial).
|
||
|
|
||
|
### Diff-editing
|
||
|
|
||
|
Diff-editing works by creating two very sparse working copies, containing only
|
||
|
the files we want the user to edit. We then let the user edit the right-hand
|
||
|
side of the diff. Then we simply snapshot that working copy to create the new
|
||
|
tree.
|