This partially reverts 4c8f484278 "graphlog: key by commit id (not index
position)." As Martin pointed out, it made "log -r 'tags()' -T.." in git
repo super slow. Apparently, both clone() and hash map insertion/lookup costs
increased by that change. Since we don't need CommitId inside the graph
iterator, we can simply replace it with IndexPosition, and resolve it to
CommitId later.
Copied from MergedTree::has_conflict(). I feel it's slightly better since
RefTarget "is" always a Conflict-based type. It could be inverted to
RefTarget::is_resolved(), but refs are usually resolved, and all callers
have special case for !is_resolved() state.
`MergedTree` is now ready to be used when checking if a commit has
conflicts, and when listing conflicts. We don't yet a way for the user
to say they want to use tree-level conflicts even for these
cases. However, since the backend can decide, we should be able to
have our backend return tree-level conflicts. All writes will still
use path-level conflicts, so the experimentation we can do at Google
is limited.
Beacause `MergedTree` doesn't yet have a way of walking conflicts
while restricting it by a matcher, this will make `jj resolve` a
little slower. I suspect no one will notice.
Tree-level conflicts (#1624) will be stored as multiple trees
associated with a single commit. This patch adds support for that in
`backend::Commit` and in the backends.
When the Git backend writes a tree conflict, it creates a special root
tree for the commit. That tree has only the individual trees from the
conflict as subtrees. That way we prevent the trees from getting
GC'd. We also write the tree ids to the extra metadata table
(i.e. outside of the Git repo) so we don't need to load the tree
object to determine if there are conflicts.
I also added new flag to `backend::Commit` indicating whether the
commit is a new-style commit (with support for tree-level
conflicts). That will help with the migration. We will remove it once
we no longer care about old repos. When the flag is set, we know that
a commit with a single tree cannot have conflicts. When the flag is
not set, it's an old-style commit where we have to walk the whole tree
to find conflicts.
With `MergedTree`, we can iterate over conflicts by descending into
only the subdirectories that cannot be trivially resolved. We assume
that the trees have previously been resolved as much as possible, so
we don't attempt to resolve conflicts again.
This adds a function for resolving conflicts that can be automatically
resolved, i.e. like our current `merge_trees()` function. However, the
new function is written to merge an arbitrary number of trees and, in
case of unresolvable conflicts, to produce a `Conflict<TreeId>` as
result instead of writing path-level conflicts to the backend. Like
`merge_trees()`, it still leaves conflicts unresolved at the file
level if any hunks conflict, and it resolves paths that can be
trivially resolved even if there are other paths that do conflict.
In order to store conflicts in the commit, as conflicts between a set
of trees, we want to be able merge those trees on the fly. This
introduces a type for that. It has a `Merge(Conflict(Tree))` variant,
where the individual trees cannot have path-level conflicts. It also
has a `Legacy(Tree)` variant, which does allow path-level conflicts. I
think that should help us with the migration.
Alternatively, we can wrap BTreeMap<String, Option<RefTarget>> to flatten
Option<&Option<..>> internally, but doing that would be tedious. It would
also be unclear if map.remove(name) should construct an absent RefTarget if
the ref doesn't exist.
The next commit will change these maps to store Option<RefTarget> entries, but
None entries will still be omitted from the serialized data. Since ContentHash
should describe the serialized data, relying on the generic ContentHash would
cause future hash conflict where absent RefTarget entries will be preserved.
For example, ([remove], [None, add]) will be serialized as ([remove], [add]),
and deserialized to ([remove], [add, None]). If we add support for lossless
serialization, hash(([remove], [None, add])) should differ from the lossy one.
Summary: Unneeded with the MSRV bump. The other one
will have to wait until Rust 1.72.0
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Ifb3d862aedd3f2aeb05a86ce76978d4f
Summary: Now that we have Rust 1.71.0 at our fingertips, the `map_first_last`
feature has been stabilized. That means we can get rid of the `jj-lib` build
script and also the `nightly_shims` module.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Ibb5ce3258818a2de670763fbbaf3c2e7
Summary: Let's be more aggressive about tracking the latest stable Rust release.
There's little benefit to being conservative so early on, especially when no
users seem to have faced any issue with upgrading, or strictly required an old
Rust version.
Right now, just lagging Rust by 1 major release probably seems fine. We're
targeting 1.71.0 to get ahead of the curve, since 1.72.0 will likely release
sometime before the next `jj` release.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: I4e691b6ba63b5b9023a75ae0a6917672
Since RefTarget will be reimplemented on top of Conflict<Option<CommitId>>,
we won't be able to simply return a slice of type &[CommitId]. These functions
are also renamed in order to disambiguate from Conflict::adds()/removes().
We saw a failure in `test_merge_views_git_heads()` in the GitHub CI,
but I wasn't able to reproduce it locally. Using the
`commit_transactions()` helper in the test should fix it.
.unwrap() calls will be removed if we migrate RefTarget to new Conflict-based
type. Some of them were previously .unwrap_or_default(), but they would panic
anyway inside ref_target_from_proto().
is_none()/is_some() are wrapped as is_absent()/is_present() respectively.
If we see new RefTarget as a Conflict, an "absent" target isn't an emptyish
value like None, but an add of single [None]. It seemed a bit weird to call
such target is_none().
RefTarget type will be a wrapper around Conflict<Option<CommitId>>, and the
match arms would be more verbose. Fortunately, new code looks simpler since
we can merge non-conflicting cases.
It's named after Conflict::from_legacy_form(). If RefTarget is migrated to
new Conflict type, from_legacy_form([], [add]) will create a normal target,
and from_legacy_form([], []) will be equivalent to the current None target.
That's why this function isn't named as RefTarget::conflict().
If RefTarget is migrated to new Conflict type, this function will create
RefTarget(Conflict::resolved(Some(id))).
We still need some .unwrap() to insert Option<RefTarget> into map, but maps
will be changed to store new RefTarget type, and their mutation API will
guarantee that Conflict::resolved(None) is eliminated.
Perhaps, all view.get_<kind>(name) functions can return reference, but I'm
not willing to change the interface at this point. I'll revisit this after
migrating Option<RefTarget> to new Conflict-based type.
If we migrate RefTarget to new Conflict-based type, it won't store
Conflict<CommitId>, but Conflict<Option<CommitId>>. As the Option will
be internalized, new RefTarget type will also represent an absent target.
The 'target: Option<RefTarget>' argument will be replaced with new RefTarget
type.
I've also renamed the function for consistency with the following changes.
It would be surprising if set_local_branch(name, target) could remove the
branch. I feel the name set_local_branch_target() is less confusing.
This might look more complicated than the original code, but it clarifies
that we always eliminate a (remove, add) pair. The behavior slightly changed
since an absent ref (i.e. None) in "removes" can now be paired with an "add"
even if "removes" contained other ids. Before, it was possible only when
removes.is_empty().
The order of conflicted ids slightly changed since Conflict::simplify()
tries to preserve diff pairs. It shouldn't matter so long as the result is
stable.
Git CLI rejects it (though the data model would probably work fine with
"HEAD" branch.) This ensures that HEAD@git in JJ world is not an exported
branch named "HEAD".
This is (almost) a result of running
cargo +nightly clippy --workspace --all-targets --fix \
-- -A 'clippy::needless_raw_string_hashes'
with yesterday's nightly clippy.
https://github.com/mitsuhiko/insta/issues/389 causes numerous additional
`needless_raw_string_hashes` warnings, but it will hopefully be fixed soon.
For now, I recommend appending the second line to your invocations.
This will help to rewrite refs::merge_ref_targets() to leverage Conflict
type. I'm not pretty sure if we'll want to do that, but this change seems
also good for the existing code.
Almost everyone calls the project "jj", and there seeems to be
consensus that we should rename the crates. I originally wanted the
crates to be called `jj` and `jj-lib`, but `jj` was already
taken. `jj-cli` is probably at least as good for it anyway.
Once we've published a 0.8.0 under the new names, we'll release 0.7.1
versions under the old names with pointers to the new crates names.
It seemed too verbose to always include @remote branches, so synced remotes
are omitted by default. If the given symbol contained '@', all remote symbols
are populated so that the distance of remote fragment is taken into account.
Errors that may occur while loading backend would vary per backends, and
it's unlikely that these errors could be mapped to BackendError variants
other than BackendError::Other. So let's extract Other(_) of that kind as
a separate type to clarify there would be no other error variants.
Perhaps, Backend/Error will be renamed to CommitBackend/Error or
CommitStore/Error?, whereas I think BackendInit/LoadError can be shared
among store factories.
This is much simpler and I was slightly surprised that it doesn't have
much impact on performance. I tried `jj --ignore-working-copy diff -s
--from root --to v5.15` in the Linux kernel repo, and there was
perhaps a 1.5% slowdown (508 ms -> 515 ms). In more normal cases (like
diffing a single commit against its parent), I couldn't measure any
difference at all.
This helps to map initialization error to BackendError without too general
From impl. I don't think io::Error (or our PathError) should be automatically
translated to BackendError::Other because BackendError has more specific
variants depending on context. If the error is specific to initialization,
it makes sense to translate it to Other variant.
I don't think we'll want to record a label for each term, because such
labels would get stale, and it seems hard to make them make sense
after transferring a remote to another repo. I think we'll probably
want to infer labels on demand instead (#1176).
Perhaps, this would handle patterns like ["a(b", "c)"] better. It might not
be correct to error out on "(", but should be better than building wrong
regexp pattern "a(b|c)".
Now that we've replaced `MergeHunk` by a `Conflict`, it makes sense to
convert the input `Conflict<FileId>` by mapping each term. Unlike
`Option::map()` I made `Conflict::map()` take a reference `self`,
because it's not uncommon to want to map the same conflict multiple
times. I'm going to use that for producing a
`Conflict<Option<TreeValue>>` from a `Conflict<Tree>` and a set of
paths.
Since `Conflict`s can represent the resolved state, so
`Conflict<ContentHunk>` can represent the states that we use
`MergeHunk` for. `MergeHunk` does force the user to handle the
resolved case, which may be useful. I suppose one could use the same
argument for making `Conflict` an enum, i.e. if we think that
`MergeHunk`'s two variants are beneficial, then we should consider
making `Conflict` an enum with those two variants.
It's useful to have a more readable `Debug` format for `Vec<u8>`
(`"foo"` is better than `[102, 111, 111]`). It might also make types
in function signatures and elsewhere more readable.
This only parses the fields relevant to us, i.e.:
- name: the stable identifier of the submodule
- path: the path to the submodule in the current commit
- url: the remote we can clone the submodule from
The full list of .gitmodules fields can be found at
https://git-scm.com/docs/gitmodules.
This eliminates indirect access through Vec<u8> and improves cache locality
while sorting the index entries. We can achieve a similar result by using
SmallVec<[u8; 24]> in place of Commit/ChangeId(Vec<u8>), but we would have
to determine a reasonable id length across backends. Indexing [u8; 4] performs
better, at the cost of the API and implementation complexity.
For temporary Commit/ChangeId allocation in general, I think a borrowed type
like Path/PathBuf will help.
Testing with my "linux" repo, this saves ~670ms needed to initialize both
change id index and disambiguation indexes.
I'll rewrite resolve_prefix_range() to branch depending on the prefix length,
and the easiest way to do that is passing iterator to continuation function
instead of returning iterator as an either (or boxed) type.
I'm going to rewrite IdIndex to store only first few bytes of the key. A
separate table helps there.
At this point, it wouldn't make sense to convert usize to u32, but the new
index will store ([u8; 4], u32) pairs.
It allows us to build multiple IdIndex instances within a single loop. As the
final sorting is heavy operation, I don't want to implement Default + Extend
for IdIndex to be compatible with Iterator::unzip().
With my colocated "linux" repo, this appears to save ~50ms startup overhead.
Since the repo has lots of indirect tags, we can't eliminate tag object
loading at all. But still, it's faster than falling back to peel_to_commit().
It was convenient that what the git backend stored in its "extras"
table is exactly a subset of the fields that local backend stores, but
it's bit ugly and limiting. For example, it makes it possible to
populate the `author` field in the git extras, but that would have no
effect. It's better that it's not possible to do that (we store the
author field in the git commit, of course).
What made me notice this now was that I'm working on tree-level
conflicts (#1624) and I'm thinking of adding a field to the git extras
saying "this commit has single tree, but it's still a new-style
commit", so we can know not to walking such trees to find path-level
conflicts. That's only needed for the git backend because we don't
care about compatibility for the local backend.
If we allow `Conflict::simplify()` to swap the removes and adds as
freely as we currently do, we may present the user with a conflict
marker with a diff that has never appeared anywhere before the
simplification. That seems very confusing. Let's instead preserve the
diffs when we simplify conflicts.
It's a bit weird to simplify a conflict like `A B->C D->E C->F` to `A
B->E D->F` because it changes which diffs are in the conflict, but
that's what we currently do. Let's have a test for that.
We actually already have tests showing how `A B->C D->A` gets
simplified to `C B->D`, but those are less obviously weird because
when rendered as `removes = [B], adds = [C, D]`, it doesn't look that
different from the reverse.
This commit fixes#1305
Before this commit, running `jj init --git-repo=./` in a folder that
does not have a .git would cause jj to panick and leave an unfinished corrupted jj repo.
This commit fixes that by changing the call chain to return an error
instead of calling .unwrap() and panicking. This commit also adds logic to delete the unfinished jj
repository when the git backend initialization failed.
Before this commit, running the above command would result in the following
```
Running `jj/target/debug/jj init --git-repo=./`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { code: -3, klass: 2, message: "failed to resolve path '/Users/kevincliao/github/jj/test-repo/.jj/repo/store/../../../.git': No such file or directory" }', lib/src/git_backend.rs:83:75
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
After this commit, the result is the following and the jj repo is deleted:
```
Running `jj/target/debug/jj init --git-repo=./`
Error: Failed to access the repository: Error: Failed to open git repository: failed to resolve path '/Users/kevincliao/github/jj/test-repo/.jj/repo/store/../../../.git': No such file or directory; class=Os (2); code=NotFound (-3)
```
..and other assorted boilerplate. These are just stubs for now, but now
that we've reserved the `submodule_store` subdirectory, we can start
adding more functionality.
This changes the behavior in one of the cases ilyagr@
[mentioned](https://github.com/martinvonz/jj/pull/1610#discussion_r1199823932)
to match his suggestion. After some more thinking while working on
tree-level conflicts, I now think it's clear that the added `+C-C`
terms should have no effect on the result. A very similar argument is
that `Conflict::simplify()` should not change the result of
`trivial_merge()`. I'll add tests for that next.
Before we had `conflicts::Conflict`, most of these functions took a
`backend::Conflict`. I think I didn't want to pollute the `backend`
module with this kind of logic, trying to keep it focused on
storage. Now that we have the type in `conflicts`, however, I think it
makes sense to move these functions onto it.
This gets rid of round-trip conversion from queries like "(main..)-". I have
such expression in my default log/disambiguation revset, and the query could
take ~150ms to convert head positions back and forth if the repository had
tons of unmerged commits.
Since unchanged refs should be pinned by new_git_heads, we only need to
consider about "changed" old_git_targets. This allows us to return early
if hidable_git_heads.is_empty().
In colocated mid-size "linux" repo, this saves ~450ms needed to do
enforce_view_invariants(). We could instead make add_head() to return early,
but the condition would be a bit weird since HEAD@git is typically a parent
of known heads, not a head itself.
While playing with perf.data captured with "jj log", I noticed RepoPath::join()
has measurable cost. The first half is small alloc()s for Vec and Strings, and
the latter is realloc() on Vec::push(). Removing realloc() is easy, so let's
do that.
We already have the new `Conflict::from_backend_conflict()` for
converting from a `backend::Conflict`, but we model conflicts in a
similar way in at least `RefTarget`. I'd like to be able to use
`conflicts::Conflict` there too. To prepare for that, let's extract
generic methods from `Conflict::from_backend_conflict()` and
`Conflict::to_backend_conflict()`.
I'm not sure I'll get around to making `RefTarget` use `Conflict` but
this commit seems like nice cleanup either way. It makes the tests
simpler if nothing else.
If one side changes the contents and one side changes the executable
bit, we get a non-trivial conflict in the `TreeValue`s, but once we've
split them up into `FileId`s and bools, we can trivially resolve them
separately, without having to read file contents.
It seems generally useful to be able to simplify a conflict, and it's
not specific to merging trees, so let's move it to
`conflicts.rs`. Once we're done with the migration to tree-level
conflicts, I think `Conflict::simplify()` will remain but
`tree::simplify_conflict()` will be gone.
The tests I added there are quite similar to those of
`trivial_merge()`. I hope we can make `Conflict::simplify()` call
`trivial_merge()` later. I think it would also make sense to move
`trivial_merge()` onto `Conflict`, or at least have a
`Conflict::resolve_trivial()` calling `trivial_merge()`.
Since we switched to the new `conflicts::Conflict` type, we represent
a missing tree entry by a `None` value in the conflict, not a missing
"add", so the condition removed in this commit will never happen, and
the case will be handled by the case just below it instead.
For tree-level conflicts (#1624), I plan to remove `ConflictId`
completely. This commit removes `ConflictId` from
`update_conflict_from_content()` by instead making it take a
`Conflict<Option<TreeValue>>` and return a possibly different such
value.
I made the call site in `working_copy` avoid writing the conflict to
the store if it's unchanged, but I didn't make the same optimization
in `merge_tools` becuase it's much more likely to have changed there.
Use `br@git` instead.
Before, if there is not a local branch `br`, jj tried to resolve
it as a git ref `refs/heads/br`. Unchanged from before, `br` can
still be resolved as a tag `refs/tag/br`.
This doesn't change the way @git branches are stored in `git_refs` as opposed
to inside `BranchTarget` like normal remote-tracking branches. There are
subtle differences in behavior with e.g. `jj branch forget` and I'm not sure
how easy it is to rewrite `jj git import/export` to support a different
way of storage.
I've decided to call these "local-git tracking branches" since they track
branches in the local git repository. "local git-tracking" branches sounds a
bit more natural, but these could be confused with there are no remote
git-tracking branches. If one had the idea these might exist, they would be
confused with remote-tracking branches in the local git repo.
This addresses a portion of #1666
Suppose the operation log is mostly linear, this means "jj op log" iterator
won't look ahead more than one entry.
Another idea is to either add a "generation" number to operation data, or
build index of operations. Since we'll eventually add GC command, I don't
think op index would be required. I think readdir() is good enough to resolve
hex prefix against ~10k entries.
For now, walk_ancestors() is a free function. If we add Repo-like abstraction
over OpStore + OpHeadsStore, this function will probably be migrated there.
The idea is that the DAG can be split at single fork point while walking
chronologically, and run DFS-based topological sort for each sub graph.
This works well for operation log.
We could also build a topo-sort stack while splitting, but we couldn't detect
cycles in that way. It would also be quite expensive on pessimistic cases.