Tree-level conflicts (#1624) will be stored as multiple trees
associated with a single commit. This patch adds support for that in
`backend::Commit` and in the backends.
When the Git backend writes a tree conflict, it creates a special root
tree for the commit. That tree has only the individual trees from the
conflict as subtrees. That way we prevent the trees from getting
GC'd. We also write the tree ids to the extra metadata table
(i.e. outside of the Git repo) so we don't need to load the tree
object to determine if there are conflicts.
I also added new flag to `backend::Commit` indicating whether the
commit is a new-style commit (with support for tree-level
conflicts). That will help with the migration. We will remove it once
we no longer care about old repos. When the flag is set, we know that
a commit with a single tree cannot have conflicts. When the flag is
not set, it's an old-style commit where we have to walk the whole tree
to find conflicts.
With `MergedTree`, we can iterate over conflicts by descending into
only the subdirectories that cannot be trivially resolved. We assume
that the trees have previously been resolved as much as possible, so
we don't attempt to resolve conflicts again.
This adds a function for resolving conflicts that can be automatically
resolved, i.e. like our current `merge_trees()` function. However, the
new function is written to merge an arbitrary number of trees and, in
case of unresolvable conflicts, to produce a `Conflict<TreeId>` as
result instead of writing path-level conflicts to the backend. Like
`merge_trees()`, it still leaves conflicts unresolved at the file
level if any hunks conflict, and it resolves paths that can be
trivially resolved even if there are other paths that do conflict.
In order to store conflicts in the commit, as conflicts between a set
of trees, we want to be able merge those trees on the fly. This
introduces a type for that. It has a `Merge(Conflict(Tree))` variant,
where the individual trees cannot have path-level conflicts. It also
has a `Legacy(Tree)` variant, which does allow path-level conflicts. I
think that should help us with the migration.
Alternatively, we can wrap BTreeMap<String, Option<RefTarget>> to flatten
Option<&Option<..>> internally, but doing that would be tedious. It would
also be unclear if map.remove(name) should construct an absent RefTarget if
the ref doesn't exist.
The next commit will change these maps to store Option<RefTarget> entries, but
None entries will still be omitted from the serialized data. Since ContentHash
should describe the serialized data, relying on the generic ContentHash would
cause future hash conflict where absent RefTarget entries will be preserved.
For example, ([remove], [None, add]) will be serialized as ([remove], [add]),
and deserialized to ([remove], [add, None]). If we add support for lossless
serialization, hash(([remove], [None, add])) should differ from the lossy one.
Summary: Unneeded with the MSRV bump. The other one
will have to wait until Rust 1.72.0
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Ifb3d862aedd3f2aeb05a86ce76978d4f
Summary: Now that we have Rust 1.71.0 at our fingertips, the `map_first_last`
feature has been stabilized. That means we can get rid of the `jj-lib` build
script and also the `nightly_shims` module.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Ibb5ce3258818a2de670763fbbaf3c2e7
Summary: Let's be more aggressive about tracking the latest stable Rust release.
There's little benefit to being conservative so early on, especially when no
users seem to have faced any issue with upgrading, or strictly required an old
Rust version.
Right now, just lagging Rust by 1 major release probably seems fine. We're
targeting 1.71.0 to get ahead of the curve, since 1.72.0 will likely release
sometime before the next `jj` release.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: I4e691b6ba63b5b9023a75ae0a6917672
Since RefTarget will be reimplemented on top of Conflict<Option<CommitId>>,
we won't be able to simply return a slice of type &[CommitId]. These functions
are also renamed in order to disambiguate from Conflict::adds()/removes().
We saw a failure in `test_merge_views_git_heads()` in the GitHub CI,
but I wasn't able to reproduce it locally. Using the
`commit_transactions()` helper in the test should fix it.
.unwrap() calls will be removed if we migrate RefTarget to new Conflict-based
type. Some of them were previously .unwrap_or_default(), but they would panic
anyway inside ref_target_from_proto().
is_none()/is_some() are wrapped as is_absent()/is_present() respectively.
If we see new RefTarget as a Conflict, an "absent" target isn't an emptyish
value like None, but an add of single [None]. It seemed a bit weird to call
such target is_none().
RefTarget type will be a wrapper around Conflict<Option<CommitId>>, and the
match arms would be more verbose. Fortunately, new code looks simpler since
we can merge non-conflicting cases.
It's named after Conflict::from_legacy_form(). If RefTarget is migrated to
new Conflict type, from_legacy_form([], [add]) will create a normal target,
and from_legacy_form([], []) will be equivalent to the current None target.
That's why this function isn't named as RefTarget::conflict().
If RefTarget is migrated to new Conflict type, this function will create
RefTarget(Conflict::resolved(Some(id))).
We still need some .unwrap() to insert Option<RefTarget> into map, but maps
will be changed to store new RefTarget type, and their mutation API will
guarantee that Conflict::resolved(None) is eliminated.
Perhaps, all view.get_<kind>(name) functions can return reference, but I'm
not willing to change the interface at this point. I'll revisit this after
migrating Option<RefTarget> to new Conflict-based type.
If we migrate RefTarget to new Conflict-based type, it won't store
Conflict<CommitId>, but Conflict<Option<CommitId>>. As the Option will
be internalized, new RefTarget type will also represent an absent target.
The 'target: Option<RefTarget>' argument will be replaced with new RefTarget
type.
I've also renamed the function for consistency with the following changes.
It would be surprising if set_local_branch(name, target) could remove the
branch. I feel the name set_local_branch_target() is less confusing.
This might look more complicated than the original code, but it clarifies
that we always eliminate a (remove, add) pair. The behavior slightly changed
since an absent ref (i.e. None) in "removes" can now be paired with an "add"
even if "removes" contained other ids. Before, it was possible only when
removes.is_empty().
The order of conflicted ids slightly changed since Conflict::simplify()
tries to preserve diff pairs. It shouldn't matter so long as the result is
stable.
Git CLI rejects it (though the data model would probably work fine with
"HEAD" branch.) This ensures that HEAD@git in JJ world is not an exported
branch named "HEAD".
This is (almost) a result of running
cargo +nightly clippy --workspace --all-targets --fix \
-- -A 'clippy::needless_raw_string_hashes'
with yesterday's nightly clippy.
https://github.com/mitsuhiko/insta/issues/389 causes numerous additional
`needless_raw_string_hashes` warnings, but it will hopefully be fixed soon.
For now, I recommend appending the second line to your invocations.
This will help to rewrite refs::merge_ref_targets() to leverage Conflict
type. I'm not pretty sure if we'll want to do that, but this change seems
also good for the existing code.
Almost everyone calls the project "jj", and there seeems to be
consensus that we should rename the crates. I originally wanted the
crates to be called `jj` and `jj-lib`, but `jj` was already
taken. `jj-cli` is probably at least as good for it anyway.
Once we've published a 0.8.0 under the new names, we'll release 0.7.1
versions under the old names with pointers to the new crates names.
It seemed too verbose to always include @remote branches, so synced remotes
are omitted by default. If the given symbol contained '@', all remote symbols
are populated so that the distance of remote fragment is taken into account.
Errors that may occur while loading backend would vary per backends, and
it's unlikely that these errors could be mapped to BackendError variants
other than BackendError::Other. So let's extract Other(_) of that kind as
a separate type to clarify there would be no other error variants.
Perhaps, Backend/Error will be renamed to CommitBackend/Error or
CommitStore/Error?, whereas I think BackendInit/LoadError can be shared
among store factories.
This is much simpler and I was slightly surprised that it doesn't have
much impact on performance. I tried `jj --ignore-working-copy diff -s
--from root --to v5.15` in the Linux kernel repo, and there was
perhaps a 1.5% slowdown (508 ms -> 515 ms). In more normal cases (like
diffing a single commit against its parent), I couldn't measure any
difference at all.
This helps to map initialization error to BackendError without too general
From impl. I don't think io::Error (or our PathError) should be automatically
translated to BackendError::Other because BackendError has more specific
variants depending on context. If the error is specific to initialization,
it makes sense to translate it to Other variant.
I don't think we'll want to record a label for each term, because such
labels would get stale, and it seems hard to make them make sense
after transferring a remote to another repo. I think we'll probably
want to infer labels on demand instead (#1176).
Perhaps, this would handle patterns like ["a(b", "c)"] better. It might not
be correct to error out on "(", but should be better than building wrong
regexp pattern "a(b|c)".
Now that we've replaced `MergeHunk` by a `Conflict`, it makes sense to
convert the input `Conflict<FileId>` by mapping each term. Unlike
`Option::map()` I made `Conflict::map()` take a reference `self`,
because it's not uncommon to want to map the same conflict multiple
times. I'm going to use that for producing a
`Conflict<Option<TreeValue>>` from a `Conflict<Tree>` and a set of
paths.
Since `Conflict`s can represent the resolved state, so
`Conflict<ContentHunk>` can represent the states that we use
`MergeHunk` for. `MergeHunk` does force the user to handle the
resolved case, which may be useful. I suppose one could use the same
argument for making `Conflict` an enum, i.e. if we think that
`MergeHunk`'s two variants are beneficial, then we should consider
making `Conflict` an enum with those two variants.
It's useful to have a more readable `Debug` format for `Vec<u8>`
(`"foo"` is better than `[102, 111, 111]`). It might also make types
in function signatures and elsewhere more readable.
This only parses the fields relevant to us, i.e.:
- name: the stable identifier of the submodule
- path: the path to the submodule in the current commit
- url: the remote we can clone the submodule from
The full list of .gitmodules fields can be found at
https://git-scm.com/docs/gitmodules.
This eliminates indirect access through Vec<u8> and improves cache locality
while sorting the index entries. We can achieve a similar result by using
SmallVec<[u8; 24]> in place of Commit/ChangeId(Vec<u8>), but we would have
to determine a reasonable id length across backends. Indexing [u8; 4] performs
better, at the cost of the API and implementation complexity.
For temporary Commit/ChangeId allocation in general, I think a borrowed type
like Path/PathBuf will help.
Testing with my "linux" repo, this saves ~670ms needed to initialize both
change id index and disambiguation indexes.
I'll rewrite resolve_prefix_range() to branch depending on the prefix length,
and the easiest way to do that is passing iterator to continuation function
instead of returning iterator as an either (or boxed) type.
I'm going to rewrite IdIndex to store only first few bytes of the key. A
separate table helps there.
At this point, it wouldn't make sense to convert usize to u32, but the new
index will store ([u8; 4], u32) pairs.
It allows us to build multiple IdIndex instances within a single loop. As the
final sorting is heavy operation, I don't want to implement Default + Extend
for IdIndex to be compatible with Iterator::unzip().
With my colocated "linux" repo, this appears to save ~50ms startup overhead.
Since the repo has lots of indirect tags, we can't eliminate tag object
loading at all. But still, it's faster than falling back to peel_to_commit().