There's a comment saying that mutating the file's current state
simplifies later logic, but I don't think that's true. It might have
been true in the past, when we had `FileType::Conflict`.
When snapshotting a conflict, we read the contents and parse conflict
markers. If we determine that it's no longer a conflict, then we write
a normal file entry to the tree instead. When we do that, we re-read
the file from the working copy. Let's instead write the file contents
we already read.
When a file's mtime has changed on disk, we update our record of that
mtime, but we did so in a separate place for conflicts compared to
non-conflicts. Let's reuse it.
The working copy's current tree tracks whether a file is a
conflict. We also track that in the `TreeState` object. That allows us
to not read the trees object to decide if we should try to parse a
file as a conflict.
One disadvantage is that it's redundant information that needs to be
kept in sync with the tree object. Also, for Watchman, we would like
to completely ignore the persisted `FileState`.
This commit removes the `FileType::Conflict` variant and instead
checks in the tree object whether a given path was a conflict. This is
the change I mentioned in dc8a207737. We still skip the check
completely if the file's mtime etc. is unchanged, so it shouldn't have
much effect in the common case of a mostly unchanged working copy. I
measured a slowdown on `jj diff` by ~3% in the Linux repo with a clean
working copy with all mtimes bumped. I think the simpler code and
reduced risk of subtle bugs is worth the performance hit.
The idea is simple. New heads are ignored until the node dependency resolution
stuck. Then, only the first head that will unblock the visit will be queued.
Closes#242
The original idea was similar to Mercurial's "topo" sorting, but it was bad
at handling merge-heavy history. In order to render merges of topic branches
nicely, we need to prioritize branches at merge point, not at fork point.
OTOH, we do also want to place unmerged branches as close to the fork point
as possible. This commit implements the former requirement, and the latter
will be addressed by the next commit.
I think this is similar to Git's sorting logic described in the following blog
post. In our case, the in-degree walk can be dumb since topological order is
guaranteed by the index. We keep HashSet<CommitId> instead of an in-degree
integer value, which will be used in the next commit to resolve new heads as
late as possible.
https://github.blog/2022-08-30-gits-database-internals-ii-commit-history-queries/#topological-sorting
Compared to Sapling's beautify_graph(), this is lazy, and can roughly preserve
the index (or chronological) order. I tried beautify_graph() with prioritizing
the @ commit, but the result seemed too aggressively reordered. Perhaps, for
more complex history, beautify_graph() would produce a better result. For my
wip branches (~30 branches, a couple of commits per branch), this works pretty
well.
#242
This partially reverts 4c8f484278 "graphlog: key by commit id (not index
position)." As Martin pointed out, it made "log -r 'tags()' -T.." in git
repo super slow. Apparently, both clone() and hash map insertion/lookup costs
increased by that change. Since we don't need CommitId inside the graph
iterator, we can simply replace it with IndexPosition, and resolve it to
CommitId later.
Copied from MergedTree::has_conflict(). I feel it's slightly better since
RefTarget "is" always a Conflict-based type. It could be inverted to
RefTarget::is_resolved(), but refs are usually resolved, and all callers
have special case for !is_resolved() state.
`MergedTree` is now ready to be used when checking if a commit has
conflicts, and when listing conflicts. We don't yet a way for the user
to say they want to use tree-level conflicts even for these
cases. However, since the backend can decide, we should be able to
have our backend return tree-level conflicts. All writes will still
use path-level conflicts, so the experimentation we can do at Google
is limited.
Beacause `MergedTree` doesn't yet have a way of walking conflicts
while restricting it by a matcher, this will make `jj resolve` a
little slower. I suspect no one will notice.
Tree-level conflicts (#1624) will be stored as multiple trees
associated with a single commit. This patch adds support for that in
`backend::Commit` and in the backends.
When the Git backend writes a tree conflict, it creates a special root
tree for the commit. That tree has only the individual trees from the
conflict as subtrees. That way we prevent the trees from getting
GC'd. We also write the tree ids to the extra metadata table
(i.e. outside of the Git repo) so we don't need to load the tree
object to determine if there are conflicts.
I also added new flag to `backend::Commit` indicating whether the
commit is a new-style commit (with support for tree-level
conflicts). That will help with the migration. We will remove it once
we no longer care about old repos. When the flag is set, we know that
a commit with a single tree cannot have conflicts. When the flag is
not set, it's an old-style commit where we have to walk the whole tree
to find conflicts.
With `MergedTree`, we can iterate over conflicts by descending into
only the subdirectories that cannot be trivially resolved. We assume
that the trees have previously been resolved as much as possible, so
we don't attempt to resolve conflicts again.
This adds a function for resolving conflicts that can be automatically
resolved, i.e. like our current `merge_trees()` function. However, the
new function is written to merge an arbitrary number of trees and, in
case of unresolvable conflicts, to produce a `Conflict<TreeId>` as
result instead of writing path-level conflicts to the backend. Like
`merge_trees()`, it still leaves conflicts unresolved at the file
level if any hunks conflict, and it resolves paths that can be
trivially resolved even if there are other paths that do conflict.
In order to store conflicts in the commit, as conflicts between a set
of trees, we want to be able merge those trees on the fly. This
introduces a type for that. It has a `Merge(Conflict(Tree))` variant,
where the individual trees cannot have path-level conflicts. It also
has a `Legacy(Tree)` variant, which does allow path-level conflicts. I
think that should help us with the migration.
Alternatively, we can wrap BTreeMap<String, Option<RefTarget>> to flatten
Option<&Option<..>> internally, but doing that would be tedious. It would
also be unclear if map.remove(name) should construct an absent RefTarget if
the ref doesn't exist.
The next commit will change these maps to store Option<RefTarget> entries, but
None entries will still be omitted from the serialized data. Since ContentHash
should describe the serialized data, relying on the generic ContentHash would
cause future hash conflict where absent RefTarget entries will be preserved.
For example, ([remove], [None, add]) will be serialized as ([remove], [add]),
and deserialized to ([remove], [add, None]). If we add support for lossless
serialization, hash(([remove], [None, add])) should differ from the lossy one.
Summary: Now that we have Rust 1.71.0 at our fingertips, the `map_first_last`
feature has been stabilized. That means we can get rid of the `jj-lib` build
script and also the `nightly_shims` module.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Ibb5ce3258818a2de670763fbbaf3c2e7
Since RefTarget will be reimplemented on top of Conflict<Option<CommitId>>,
we won't be able to simply return a slice of type &[CommitId]. These functions
are also renamed in order to disambiguate from Conflict::adds()/removes().
.unwrap() calls will be removed if we migrate RefTarget to new Conflict-based
type. Some of them were previously .unwrap_or_default(), but they would panic
anyway inside ref_target_from_proto().
is_none()/is_some() are wrapped as is_absent()/is_present() respectively.
If we see new RefTarget as a Conflict, an "absent" target isn't an emptyish
value like None, but an add of single [None]. It seemed a bit weird to call
such target is_none().
RefTarget type will be a wrapper around Conflict<Option<CommitId>>, and the
match arms would be more verbose. Fortunately, new code looks simpler since
we can merge non-conflicting cases.
It's named after Conflict::from_legacy_form(). If RefTarget is migrated to
new Conflict type, from_legacy_form([], [add]) will create a normal target,
and from_legacy_form([], []) will be equivalent to the current None target.
That's why this function isn't named as RefTarget::conflict().
If RefTarget is migrated to new Conflict type, this function will create
RefTarget(Conflict::resolved(Some(id))).
We still need some .unwrap() to insert Option<RefTarget> into map, but maps
will be changed to store new RefTarget type, and their mutation API will
guarantee that Conflict::resolved(None) is eliminated.
Perhaps, all view.get_<kind>(name) functions can return reference, but I'm
not willing to change the interface at this point. I'll revisit this after
migrating Option<RefTarget> to new Conflict-based type.
If we migrate RefTarget to new Conflict-based type, it won't store
Conflict<CommitId>, but Conflict<Option<CommitId>>. As the Option will
be internalized, new RefTarget type will also represent an absent target.
The 'target: Option<RefTarget>' argument will be replaced with new RefTarget
type.
I've also renamed the function for consistency with the following changes.
It would be surprising if set_local_branch(name, target) could remove the
branch. I feel the name set_local_branch_target() is less confusing.