To support tree-level conflicts, we're going to need to update the
working copy from one `MergedTree` to another. We're going need to
store multiple tree ids in the `tree_state` file. This patch gets us
closer to that by getting the diff from `MergedTree`s`, even though we
assume that they are legacy trees for now, so we can write to the
single-tree `tree_state` file.
It's currently a bit complicated to snapshot the working copy and
there's a lot of duplication in tests. This commit introduces a
function to simplify it. I made the function snapshot the working copy
and save the updated state. Some of the tests I changed previously
discarded the changes instead of saving them, but I think they all did
so because it was simpler. I left a few call sites unchanged because
they make concurrent changes.
We don't seem to have any tests that our protection from undetected
changes caused by writes happening right after checkout, so let's add
one. The test case loops 100 times and each iteration fails slightly
more than 80% of the time on my machine (if I remove the protection in
`TreeState::update_file_state()`), so it seems quite good at
triggering the race.
I don't see any reason for these tests to differ depending on backend,
so let's avoid running them twice (there are probably lots of other
tests that we're also running with different backends for little
reason).
Almost everyone calls the project "jj", and there seeems to be
consensus that we should rename the crates. I originally wanted the
crates to be called `jj` and `jj-lib`, but `jj` was already
taken. `jj-cli` is probably at least as good for it anyway.
Once we've published a 0.8.0 under the new names, we'll release 0.7.1
versions under the old names with pointers to the new crates names.
I ran an upgraded Clippy on the codebase. All the changes seem to be
about using variables directly in format strings instead of passing
them as separate arguments.
Let's acknowledge everyone's contributions by replacing "Google LLC"
in the copyright header by "The Jujutsu Authors". If I understand
correctly, it won't have any legal effect, but maybe it still helps
reduce concerns from contributors (though I haven't heard any
concerns).
Google employees can read about Google's policy at
go/releasing/contributions#copyright.
The `testutils` module should ideally not be part of the library
dependencies. Since they're used by the integration tests (and the CLI
tests), we need to move them to a separate crate to achieve that.
In many of these places, we don't need an owned value, so using a
reference means we don't force the caller to clone the value. I really
doubt it will have any noticeable impact on performance (I think these
are all once-per-repo paths); it's just a little simpler this way.
`wc_commit` seems clearer than `checkout` and not too much longer. I
considered `working_copy` but it was less clear (could be the path to
the working copy, or an instance of `WorkingCopy`). I also considered
`working_copy_commit`, but that seems a bit too long.
I think I copied the name `write_tree()` from Git, but I find it quite
confusing, since it's not clear if it write a tree to the working copy
or reads the working copy and writes a tree to the store (it's the
former).
The library crate shouldn't look up the user's `$HOME` directory
(maybe the library is used by a server process), so let's have the
caller pass it into the library crate instead.
We no longer need the commit ID, so we shouldn't make the callers pass
it. This lets us simplify several tests, because they no longer to
create commits just to check out a tree in the working copy.
Most tests need a repo but don't need a working copy. Let's have a
function for setting up a test repo. But first, let's free up the name
`init_repo()` by renaming it to `init_workspace()` (which is also more
accurate).
When there are concurrent operations that want to update the working
copy, it's useful to know which operation was the last to successfully
update the working copy. That can help use decide how to resolve a
mismatch between the repo view's record and the working copy's
record. If we detect such a difference, we can look at the working
copy's operation ID to see if it was updated by an operation before or
after we loaded the repo.
If the working copy's record says that it was updated at operation A
and we have loaded the repo at operation B (after A), we know that the
working copy is stale, so we can automatically update it (or tell the
user to run some command to update it if we think that's more
user-friendly).
Conversely, if we have loaded the repo at operation A and the working
copy's record says that it was updated at operation B, we know that
there was some concurrent operation that updated it. We can then
decide to print a warning telling the user that we skipped updating
because of the conflict. We already have logic for not updating the
working copy if the repo is loaded at an earlier operation, but maybe
we can drop that if we record the operation in the working copy (as
this patch does).
`WorkingCopy::check_out()` currently fails if the commit recorded on
disk has changed since it was last read. It fails with a "concurrent
checkout" error. That usually works well in practice, but one can
imagine cases where it's not correct. For an example where the current
behavior is wrong, consider this sequence of events:
1. Process A loads the repo and working copy.
2. Process B loads the repo at operation A. It has not loaded the
working copy yet.
3. Process A writes an operation and updates the working copy.
4. Process B loads the working copy and sees that it is checked out
to the commit process B set it to. We don't currently have any
checks that the working copy commit matches the view's checkout
(though I plan to add that).
5. Process B finishes its operation (which is now divergent with the
operation written by process A). It updates the working copy to
the checkout set in the repo view by process B. There's no data
loss here, but the behavior is surprising because we would usually
tell the user that we detected a concurrent update to the working
copy.
We should instead check that the working copy's commit on disk matches
what the previous repo view said, i.e. the view at the start of the
operation we just committed. This patch does that by having the caller
pass in the expected old commit ID.
This patch changes the interface for making changes to the working
copy by replacing `write_tree()` and `untrack()` by a single
`start_mutation()` method. The two functions now live on the returned
`LockedWorkingCopy` object instead. That is more flexible because the
caller can make multiple changes while the working copy is locked. It
also helps us reduce the risk of buggy callers that read the commit ID
before taking the lock, because we can now make it accessible only on
`LockedWorkingCopy`.
The `Repo` doesn't do anything with the `WorkingCopy` except keeping a
reference to it for its users to use. In fact, the entire lib crate
doesn't do antyhing with the `WorkingCopy`. It therefore seems simpler
to have the users of the crate manage the `WorkingCopy` instance. This
patch does that by letting `Workspace` own it. By not keeping an
instance in `Repo`, which is `Sync`, we can also drop the
`Arc<Mutex<>>` wrapping.
I left `Repo::working_copy()` for convenience for now, but now it
creates a new instance every time. It's only used in tests.
This further decoupling should help us add support for multiple
working copies (#13).
The auto-rebasing of descendants doesn't work if you have an open
commit checked out, which means that you may still end up with orphans
in that case (though that's usually a short-lived problem since they
get rebased when you close the commit). I'm also about to make
branches update to successors, but that also doesn't work when the
branch is on a working copy commit that gets rewritten. To fix this
problem, I've decided to let the caller of `WorkingCopy::commit()`
responsible for the transaction.
I expect that some of the code that this change moves from the lib
crate to the cli crate will later move back into the lib crate in some
form.
I had initially hoped that the type-safety provided by the separate
`FileRepoPath` and `DirRepoPath` types would help prevent bugs. I'm
not sure if it has prevented any bugs so far. It has turned out that
there are more cases than I had hoped where it's unknown whether a
path is for a directory or a file. One such example is for the path of
a conflict. Since it can be conflict between a directory and a file,
it doesn't make sense to use either. Instead we end up with quite a
bit of conversion between the types. I feel like they are not worth
the extra complexity. This patch therefore starts simplifying it by
replacing uses of `FileRepoPath` by `RepoPath`. `DirRepoPath` is a
little more complicated because its string form ends with a '/'. I'll
address that in separate patches.
I just changed my `~/.gitignore` and some tests started failing
because the working copy respects the user's `~/.gitignore`. We should
probably not depend on `$HOME` in the library crate. For now, this
patch just makes sure we set it to an arbitrary directory in the tests
where it matters.
This commits makes it so that running commands outside a repo results
in an error message instead of a panic.
We still don't look for a `.jj/` directory in ancestors of the current
directory.
I'm preparing to publish an early version before someone takes the
name(s) on crates.io. "jj" has been taken by a seemingly useless
project, but "jujube" and "jujube-lib" are still available, so let's
use those.