If we allow `Conflict::simplify()` to swap the removes and adds as
freely as we currently do, we may present the user with a conflict
marker with a diff that has never appeared anywhere before the
simplification. That seems very confusing. Let's instead preserve the
diffs when we simplify conflicts.
It's a bit weird to simplify a conflict like `A B->C D->E C->F` to `A
B->E D->F` because it changes which diffs are in the conflict, but
that's what we currently do. Let's have a test for that.
We actually already have tests showing how `A B->C D->A` gets
simplified to `C B->D`, but those are less obviously weird because
when rendered as `removes = [B], adds = [C, D]`, it doesn't look that
different from the reverse.
This commit fixes#1305
Before this commit, running `jj init --git-repo=./` in a folder that
does not have a .git would cause jj to panick and leave an unfinished corrupted jj repo.
This commit fixes that by changing the call chain to return an error
instead of calling .unwrap() and panicking. This commit also adds logic to delete the unfinished jj
repository when the git backend initialization failed.
Before this commit, running the above command would result in the following
```
Running `jj/target/debug/jj init --git-repo=./`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { code: -3, klass: 2, message: "failed to resolve path '/Users/kevincliao/github/jj/test-repo/.jj/repo/store/../../../.git': No such file or directory" }', lib/src/git_backend.rs:83:75
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
After this commit, the result is the following and the jj repo is deleted:
```
Running `jj/target/debug/jj init --git-repo=./`
Error: Failed to access the repository: Error: Failed to open git repository: failed to resolve path '/Users/kevincliao/github/jj/test-repo/.jj/repo/store/../../../.git': No such file or directory; class=Os (2); code=NotFound (-3)
```
..and other assorted boilerplate. These are just stubs for now, but now
that we've reserved the `submodule_store` subdirectory, we can start
adding more functionality.
This changes the behavior in one of the cases ilyagr@
[mentioned](https://github.com/martinvonz/jj/pull/1610#discussion_r1199823932)
to match his suggestion. After some more thinking while working on
tree-level conflicts, I now think it's clear that the added `+C-C`
terms should have no effect on the result. A very similar argument is
that `Conflict::simplify()` should not change the result of
`trivial_merge()`. I'll add tests for that next.
Before we had `conflicts::Conflict`, most of these functions took a
`backend::Conflict`. I think I didn't want to pollute the `backend`
module with this kind of logic, trying to keep it focused on
storage. Now that we have the type in `conflicts`, however, I think it
makes sense to move these functions onto it.
This gets rid of round-trip conversion from queries like "(main..)-". I have
such expression in my default log/disambiguation revset, and the query could
take ~150ms to convert head positions back and forth if the repository had
tons of unmerged commits.
Since unchanged refs should be pinned by new_git_heads, we only need to
consider about "changed" old_git_targets. This allows us to return early
if hidable_git_heads.is_empty().
In colocated mid-size "linux" repo, this saves ~450ms needed to do
enforce_view_invariants(). We could instead make add_head() to return early,
but the condition would be a bit weird since HEAD@git is typically a parent
of known heads, not a head itself.
While playing with perf.data captured with "jj log", I noticed RepoPath::join()
has measurable cost. The first half is small alloc()s for Vec and Strings, and
the latter is realloc() on Vec::push(). Removing realloc() is easy, so let's
do that.
We already have the new `Conflict::from_backend_conflict()` for
converting from a `backend::Conflict`, but we model conflicts in a
similar way in at least `RefTarget`. I'd like to be able to use
`conflicts::Conflict` there too. To prepare for that, let's extract
generic methods from `Conflict::from_backend_conflict()` and
`Conflict::to_backend_conflict()`.
I'm not sure I'll get around to making `RefTarget` use `Conflict` but
this commit seems like nice cleanup either way. It makes the tests
simpler if nothing else.
If one side changes the contents and one side changes the executable
bit, we get a non-trivial conflict in the `TreeValue`s, but once we've
split them up into `FileId`s and bools, we can trivially resolve them
separately, without having to read file contents.
It seems generally useful to be able to simplify a conflict, and it's
not specific to merging trees, so let's move it to
`conflicts.rs`. Once we're done with the migration to tree-level
conflicts, I think `Conflict::simplify()` will remain but
`tree::simplify_conflict()` will be gone.
The tests I added there are quite similar to those of
`trivial_merge()`. I hope we can make `Conflict::simplify()` call
`trivial_merge()` later. I think it would also make sense to move
`trivial_merge()` onto `Conflict`, or at least have a
`Conflict::resolve_trivial()` calling `trivial_merge()`.
Since we switched to the new `conflicts::Conflict` type, we represent
a missing tree entry by a `None` value in the conflict, not a missing
"add", so the condition removed in this commit will never happen, and
the case will be handled by the case just below it instead.
For tree-level conflicts (#1624), I plan to remove `ConflictId`
completely. This commit removes `ConflictId` from
`update_conflict_from_content()` by instead making it take a
`Conflict<Option<TreeValue>>` and return a possibly different such
value.
I made the call site in `working_copy` avoid writing the conflict to
the store if it's unchanged, but I didn't make the same optimization
in `merge_tools` becuase it's much more likely to have changed there.
Use `br@git` instead.
Before, if there is not a local branch `br`, jj tried to resolve
it as a git ref `refs/heads/br`. Unchanged from before, `br` can
still be resolved as a tag `refs/tag/br`.
This doesn't change the way @git branches are stored in `git_refs` as opposed
to inside `BranchTarget` like normal remote-tracking branches. There are
subtle differences in behavior with e.g. `jj branch forget` and I'm not sure
how easy it is to rewrite `jj git import/export` to support a different
way of storage.
I've decided to call these "local-git tracking branches" since they track
branches in the local git repository. "local git-tracking" branches sounds a
bit more natural, but these could be confused with there are no remote
git-tracking branches. If one had the idea these might exist, they would be
confused with remote-tracking branches in the local git repo.
This addresses a portion of #1666
Suppose the operation log is mostly linear, this means "jj op log" iterator
won't look ahead more than one entry.
Another idea is to either add a "generation" number to operation data, or
build index of operations. Since we'll eventually add GC command, I don't
think op index would be required. I think readdir() is good enough to resolve
hex prefix against ~10k entries.
For now, walk_ancestors() is a free function. If we add Repo-like abstraction
over OpStore + OpHeadsStore, this function will probably be migrated there.
The idea is that the DAG can be split at single fork point while walking
chronologically, and run DFS-based topological sort for each sub graph.
This works well for operation log.
We could also build a topo-sort stack while splitting, but we couldn't detect
cycles in that way. It would also be quite expensive on pessimistic cases.
I added a function for updating the description on an existing
transaction. That way we can create the transaction earlier. I'll try
to make `--change` and `--branch` not mutually exclusive next.
Currently, if the user modifies a modify/delete conflict, we always
consider the result resolved. That happens because we materialize the
missing side of the conflict as an empty string but when we parse the
conflict, we expect only the number of sides in the input
conflict. For example, if the input is a regular modify/delete
conflict with one remove and one add, the materialized markers will
have one remove and two adds (one of them empty), but when we try to
parse it, we expect one remove and only one add. When we fail to parse
it, we consider it resolved.
This commit fixes the bug by using
`conflicts::Conflict<Option<TreeValue>>` and keeping track of which
sides were supposed to be empty. We could have fixed the bug without
switching to `conflicts::Conflict`, but we want to switch anyway, and
the fix happens naturally when switching.
For support for tree-level conflicts (#1624), I'm probably going to
introduce a `MergedTree` type representing a set of trees to
merge. That will be similar to `Tree`, but instead of having values of
type `TreeValue`, it will have values that can represent a single
state or a conflict. The `TreeValue` type itself will eventually lose
its `Conflict` variant.
To prepare for that, this commit introduces a `Conflict<T>` type. That
type is intended to be close to what the future
`MergedTree::path_value()`, `MergedTree::entries()`, etc. The next few
commits will replace most current uses of `backend::Conflict` by this
new `conflicts::Conflict` type. They will use `Option<TreeValue>` as
type parameter. Unlike the current `backend::Conflict` type, the
explicit tracking of `None` values will let us better preserve the
ordering and tying it to the tree-level conflict's order.
Suppose many override entries share the same parent directories, it should
be cheaper to look up the tree_cache from leaf than root. I also think
recursion is easier to follow than for loop.
I made a typo and got something like this:
```
Error: Commit or change id prefix "wl" is ambiguous
```
Since we can tell commit ids from change ids these days, let's make
the error message say which kind of id it is. Changing that also kind
of forced me to make a special error for empty strings. Otherwise we
would have to arbitrarily say that an empty string is a commit id or
change id. A specific error message for empty strings seems helpful,
so that's probably for the better anyway.
This prepares for allowing the base tree to be a conflict at the
root-tree level (#1624).
We could remove the `Conflict` variant completely. I tried doing that
and it slowed down `jj diff` by ~3% in the Linux repo with a clean
working copy with only mtime bumped on all files.
I don't know why I made it return an owned value. It seems like an
unnecessary restriction that the value implements `Clone`, so let's
return a reference instead.
We can't get rid of the other "impl Index"es because .as_composite() must
return a real reference type. Maybe we could turn CompositeIndex into an
owned wrapper, but I don't know if that would be worth the effort.
It might sound scary to add public .mutable_index() accessor, but I think
it's okay because immutable MutableIndex reference has no more power than
Index.
This allows us to implement Index for lifetime-bound type such as
CompositeIndex<'_>.
The idea is that .as_composite() is equivalent to .as_index(), but for the
implementation type. I'm going to add "impl Index for CompositeIndex" to
clean up index references passed to revset engine.
This handles the basic case of where the matcher says that a whole
subtree is not matched. In the Linux repo, That's already enough to
speed up `jj --ignore-working-copy files samples` from 298 ms to 129
ms.
Note that one test changed because the new `trivial_merge()` is more
strict than the old algorithm. I don't think that's a problem because
5-way conflicts are not very common, and I prefer to be strict now and
possibly relax it later if we decide that we would prefer that.
All call paths already check before calling the function that the
condition is true. One caller - `tree::try_resolve_file_conflict()` -
checks it itself. The other caller -
`conflicts::materialize_merge_result()` - doesn't, but its callers
have checked it via `extract_file_conflict_as_single_hunk()`.
The deleted comment about empty strings seems to be obsolete since
e48ace56d1. The caller pads the inputs with empty strings since that
commit.
I think we should ideally change this function's signature to make it
impossible to call it with bad inputs, and I hope to get back to that
soon.
We already resolve merge conflicts between hunks, trees, and refs, and
maybe more. They each have their own code for the handling trivial
merges (where the output is equal to one of the inputs). They look
surprisingly different. This commit adds a generic function for doing
that. Curiously, this new implementation uses implements it in yet
another way (basically using a multi-set).
I've added a helper function because the construction of the range expression
is a bit noisy. It could be a Repo method, but I don't want to make it a
default implementation of the trait method.
revset::walk_revs() let the caller handle RevsetEvaluationError since the
evaluation engine may error out even with such a trivial query. For now, most
callers just .unwrap() the error as before.
Otherwise, "jj init --git-repo ." would create extra table files per commit,
and merge them.
I considered adding an explicit GitBackend method to be called from
git::import_refs(), but the call order matters. The method should be invoked
before calling store.get_commit(..) or mut_repo.add_head(..). Since commits
are likely to be loaded from the head, we can instead make read_commit()
import ancestor metadata at all.
Alternatively, we could make a Git commit hidden until it's inserted into
the extra table. It's rather big change, and I wouldn't like to do that
without thinking more thoroughly.
I'm going to extract a helper function that converts git2::Commit to
backend::Commit struct, and the commit id can also be obtained from the
git2::Commit object.
My first attempt was to fix up corrupted index when merging, but it turned
out to be not easy because the self side may contain corrupted data. It's
also possible that two concurrent commit operations have exactly the same
view state (because change id isn't hashed into commit id), and only the
table heads diverge.
#924
GitBackend will reuse this lock to not assign multiple change ids to a
single commit. We could add a separate lock file that covers the section
from get_head() to save_table(), but I think reusing the table lock is good
enough.
This bug concerns the way `import_refs` that gets called by `fetch` computes
the heads that should be visible after the import.
Previously, the list of such heads was computed *before* local branches were
updated based on changes to the remote branches. So, commits that should have
been abandoned based on this update of the local branches weren't properly
abandoned.
Now, `import_refs` tracks the heads that need to be visible because of some ref
in a mapping keyed by the ref. If the ref moves or is deleted, the
corresponding heads are updated.
Fixes#864
Now that we return the written commit from `write_commit()`, let's
make the timestamps match what was actually written, accounting for
the whole-second precision and the adjustment we do to avoid
collisions.
The internal backend at Google doesn't let you write any value you
want for in the committer field. The `Store` type still caches the
value it attempted to write, which gets a little weird when the
written value is not what we tried to write. We should use the value
the backend actually wrote. However, we don't know if the backend
changed anything without reading the value back, which is often
wasteful. This commit changes the API to return the written value.
I only changed the signature of `write_commit()` for now. Maybe we
should make a similar change to `write_tree()`.
This has several advantages:
* Makes it possible to downcast to non-Git custom backends (might be
useful at Google, but we haven't needed it yet)
* Lets us access more specific functionality on the `GitBackend`,
making it possible to access the `git2::Repository` without
creating a copy of it.
* Removes the dependency on Git from the backend
In large repos, the unique prefixes can get somewhat long (~6 hex
digits seems typical in the Linux repo), which makes them less useful
for manually entering on the CLI. The user typically cares most about
a small set of commits, so it would be nice to give shorter unique ids
to those. That's what Mercurial enables with its
`experimental.revisions.disambiguatewithin` config. This commit
provides an implementation of that feature in `IdPrefixContext`.
In very large repos, it can also be slow to calculate the unique
prefixes, especially if it involves a request to a server. This
feature becomes much more important in such repos.
I would like to copy Mercurial's way of abbreviating ids within a
user-configurable revset. We would do it for both commit ids and
change ids. For that feature, we need a place to keep the set of
commits the revset evaluates to. This commit adds a new
`IdPrefixContext` type which will eventually be that place. The new
type has functions for going back and forth between full and
abbreviated ids. I've updated the templater to use it.
When creating `RevsetExpression` programmatically, I think we should
use commit ids instead of symbols in the expression. This commit adds
a check for that by using a `SymbolResolver` that always errors
out.