Commit graph

2244 commits

Author SHA1 Message Date
Martin von Zweigbergk
120115a20d cli: pass MaterializedTreeValue into git_diff_part()
Just a little preparation for reading the materialized values
concurrently.
2023-11-10 04:54:47 -08:00
Waleed Khan
a60733f632 tree: remove unsafe with ouroboros for self-referential iterators 2023-11-09 21:50:29 -08:00
Yuya Nishihara
6ff3a4f3df repo: reimplement DirtyCell without using unsafe
While the safe implementation is a bit more complex (and probably more branchy),
I don't think the runtime overhead would matter here. Let's remove one more
unsafe for better code maintainability.
2023-11-10 07:42:45 +09:00
Martin von Zweigbergk
9b24d24612 conflicts: add another helper for materializing a tree value
We have a few places where we have a `MergedTreeValue` and need to
read the data associated with it so we can write to the working copy
or include it in a diff. Let's extract some of that shared logic to a
function so we can reuse it. I plan to use it for reading file
contents in advance while streaming a diff in `local_working_copy`
soon (and probably in `jj diff` thereafter), but I think it seems like
an improvement on its own.
2023-11-08 21:21:38 -08:00
Martin von Zweigbergk
65bd5cacba working copy: on checkout, move read from store out of write_*() functions
I'd like to read N files ahead from the backend, to avoid serializing
too many server calls on backends that are backed by a server. Moving
the reads a little earlier is a little step towards that.

The `TreeState::write_*()` functions can now be made into free/static
functions if we prefer.
2023-11-08 21:21:38 -08:00
Yuya Nishihara
084b99e1e2 index: rewrite CompositeIndex::entry_by_pos() by leveraging ancestors iterator
We no longer have "unsafe" in this function, so let's use the iterator API
instead of recursion. Apparently I haven't pushed this change before because
unsafe in .find_map() looked scary.
2023-11-08 12:09:33 +09:00
Anton Bulakh
d27351b978 misc: drop a few low-hanging unsafes
Remove a couple of unnecessary unsafes:
- The NonZeroUsize is a constant where the unwrap will optimize away
anyway and we don't have an unsafe without any good reason there :)
- The other two were simply not needed, lifetimes worked fine, maybe
Rust became better since that code was written? NLL? Anyway, they're
gone now
2023-11-08 02:16:08 +02:00
Yuya Nishihara
2ac9865ce7 revset: exclude @git branches from remote_branches()
As discussed in Discord, it's less useful if remote_branches() included
Git-tracking branches. Users wouldn't consider the backing Git repo as
a remote.

We could allow explicit 'remote_branches(remote=exact:"git")' query by changing
the default remote pattern to something like 'remote=~exact:"git"'. I don't
know which will be better overall, but we don't have support for negative
patterns anyway.
2023-11-08 07:34:30 +09:00
Yuya Nishihara
59640496aa cargo: sort dependencies list alphabetically 2023-11-07 23:46:05 +09:00
Yuya Nishihara
d1b0c4cc48 merge: relax input type of Merge::from_removes_adds() 2023-11-07 17:10:12 +09:00
Yuya Nishihara
e0c35684af merge: rename Merge::new() to Merge::from_removes_adds()
Since (removes, adds) pair is no longer the canonical representation of Merge,
the name Merge::new() seems too generic. Let's give more verbose name.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
2c128f1b61 merged_tree: convert from legacy conflicts through interleaved list
This is basically the same change as the previous commit.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
a734f46130 merged_tree: build unresolved Merge<Tree> from interleaved list
We no longer need to iterate removes and adds separately.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
dd26b7be40 merge: add Merge constructor that accepts interleaved values
Also migrated some callers of 3-way merge, where [left, base, right] order
looks okay.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
803b41c426 merge: load legacy Merge values without allocating intermediate buffers 2023-11-07 17:10:12 +09:00
Yuya Nishihara
09987c1d27 merge: micro-optimize allocation of Merge object for resolved value
It's super common that a Merge object holds a resolved value, so let's inline
up to 1 element. T of Merge<T> usually consists of a couple of pointer-sized
fields. I don't see any measurable speed up, but it's no worse than the
original.
2023-11-07 17:10:12 +09:00
Martin von Zweigbergk
1140295829 merged_tree: extract polling of tree futures into a function 2023-11-07 00:03:50 -08:00
Martin von Zweigbergk
c77417d4e4 merged_tree: drop outer loop in TreeDiffStreamImpl::poll_next()
As suggested by Yuya. I also added a comment and an assertion in the
case where return `Poll::Pending`.
2023-11-07 00:03:50 -08:00
Martin von Zweigbergk
d989d4093d merged_tree: let backend influence whether to use new diff algo
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to use switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
f40adb84fc merged_tree: add a Stream for concurrent diff off trees
When diffing two trees, we currently start at the root and diff those
trees. Then we diff each subtree, one at a time, recursively. When
using a commit backend that uses remote storage, like our backend at
Google does, diffing the subtrees one at a time gets very slow. We
should be able to diff subtrees concurrently. That way, the number of
roundtrips to a server becomes determined by the depth of the deepest
difference instead of by the number of differing trees (times 2,
even). This patch implements such an algorithm behind a `Stream`
interface. It's not hooked in to `MergedTree::diff_stream()` yet; that
will happen in the next commit.

I timed the new implementation by updating `jj diff -s` to use the new
diff stream and then ran it on the Linux repo with `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. That slowed down by
~20%, from ~750 ms to ~900 ms. Maybe we can get some of that
performance back but I think it'll be hard to match
`MergedTree::diff()`. We can decide later if we're okay with the
difference (after hopefully reducing the gap a bit) or if we want to
keep both implementations.

I also timed the new implementation on our cloud-based repo at
Google. As expected, it made some diffs much faster (I'm not sure if
I'm allowed to share figures).
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
9af09ec236 test_meregd_tree: test diffing with a matcher
We didn't have any tests at all for `MergedTree::diff()` with a
matcher other than `EverythingMatcher`. This patch adds a few.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
16aa8e8f10 test_merged_tree: nest each part of test_diff_dir_file()
I'm about to add a few more checks for diffing with a matcher. I think
it will help make it readable and reduce the risk of mixing up
variables between each part of the test if we use some nested blocks.

I also removed some unnecessary `.clone()` calls while at it.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
c9ce80a82a merged_tree: extract function for merged iterator of basenames in diff
I'm going to reuse this for stream/async diffing.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
b72f04ba61 merged_tree: rename all_tree_conflict_names() since it's not about conflicts 2023-11-06 23:12:02 -08:00
Yuya Nishihara
3fddc31da8 merge: remove Merge::take() which is no longer used
Merge::take() is no longer a cheap function. We can add into_vec() if needed.
2023-11-07 06:52:35 +09:00
Yuya Nishihara
92dfe59ade refs: run non-trivial merge of ref targets without destructuring Merge object 2023-11-07 06:52:35 +09:00
Yuya Nishihara
93601541cb refs: use swap_remove() in non-trivial merge of ref targets
I'm going to add a Merge method that removes negative/positive terms pair, and
swap_remove() is the easiest option. The order of the conflicted ref targets
doesn't matter.
2023-11-07 06:52:35 +09:00
Yuya Nishihara
895bbce8c0 files: use borrowed Merge iterator in merge()
Since the underlying Merge data type is no longer (Vec<T>, Vec<T>), it doesn't
make sense to build removes/adds Vecs and concatenate them.
2023-11-07 06:52:35 +09:00
Yuya Nishihara
f1898a31b5 merge: simply print interleaved conflict values in debug output
We could apply that for the resolved case, but Resolved/Conflicted label
seems more useful than just printing Merge([value]).
2023-11-06 07:21:06 +09:00
Yuya Nishihara
b07b370ed3 merge: simply generate content hash from interleaved values 2023-11-06 07:21:06 +09:00
Yuya Nishihara
46ffb2f0b2 merge: store negative/positive terms internally in an interleaved Vec
Many callers use interleaved iterators, and recently-added serialization code
is built on top of that, so I think it's better to store terms in that format.

map() functions no longer use MergeBuilder as we know the mapped values are
ordered properly. flatten() and simplify() are reimplemented to work with the
interleaved values. The other changes are trivial.
2023-11-06 07:21:06 +09:00
Yuya Nishihara
287728fee7 merge: extract trivial_merge() that takes interleaved adds/removes iterator
The Merge type will store interleaved terms instead of separate adds/removes
vecs.
2023-11-06 07:21:06 +09:00
Yuya Nishihara
01523ba4f3 merge: rewrite bottom half of trivial_merge() for non-copyable types
The input values of trivial_merge() will be changed to Iterator<Item = T>
where T: Eq + Hash. It could be <Item = &'a T>, but it doesn't have to be.
2023-11-06 07:21:06 +09:00
Martin von Zweigbergk
7c923514ee git: add config to disable abandoning of unreachable commits
Some users prefer to have commits not get abandoned when importing
refs. This adds a config option for that.

Closes #2504.
2023-11-05 06:10:54 -08:00
Martin von Zweigbergk
7bf8906f9c git: extract a function for abandoning unreachable commits
This motivation for this is so we can easily skip calling the function
if the user has opted out of the propagation of abandoned commits we
usually do (#2504). However, it seems like a good piece of code to
extract regardless of that feature.
2023-11-05 06:10:54 -08:00
Yuya Nishihara
d9fbf21794 merge: have Merge::adds()/removes() return iterator
The Merge type will be changed to store interleaved values internally.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
1c6913d618 merge: use Merge::iter() instead of adds()/removes() where order doesn't matter
Merge::iter() will be a slice::Iter, and be more efficient than chaining adds
and removes.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
99e6ff493a merge: fix copy-paste error in doc comment for adds() 2023-11-05 16:43:06 +09:00
Yuya Nishihara
f6d85c51cd merge: add non-optional Merge accessor to the zeroth value
We have a few callers which just need to obtain an object common among all
the merge values. Let's add a non-failing accessor for that purpose.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
b12c688ea0 merge: add method for indexed adds/removes access
The current adds()/removes() will be changed to return iterators.
2023-11-05 16:43:06 +09:00
Martin von Zweigbergk
6a5615c933 rewrite: use MergedTree::diff_stream() when restoring from tree 2023-11-04 21:07:49 -07:00
Yuya Nishihara
602b44258e workspace: add function that initializes colocated git repository
One less git2 API use in CLI.

The function name GitBackend::init_colocated() is a bit odd, but we need to
specify the work-tree path, not the ".git" repo path. So we can't eliminate
the notion of the working copy path anyway.
2023-11-05 08:48:35 +09:00
Yuya Nishihara
77e16243d6 tests: assert paths of initialized GitBackend 2023-11-05 08:48:35 +09:00
Yuya Nishihara
ce46c10c96 git_backend: extract inner function that initializes backend with open git repo 2023-11-05 08:48:35 +09:00
Yuya Nishihara
dce640aaf1 workspace: one less cloning of workspace_root in init_external_git()
Just a trivial code cleanup.
2023-11-05 08:48:35 +09:00
Yuya Nishihara
c866b4a42d workspace: fix repository path in init_internal_git() doc comment
Also rephrased "Git backend" as "Git repo" since the new backend storage will
be created.
2023-11-05 08:48:35 +09:00
Antoine Cezar
5973ab47b9 commands: move rebase_to_dest_parent to jj_lib::rewrite
What make rebase_to_dest_parent a good candidate for jj_lib::rewrite module:

- It is used both in obslog and interdiff. It's a sign that it may be moved to a lower layer
- CommandError is returned by converting from TreeMergeError. Not explicitly.
- It only use jj_lib::rewrite fonctions.
2023-11-03 20:48:00 +01:00
Martin von Zweigbergk
904c37d36d working copy: use MergedTree::diff_stream()
This will make it a little faster to update the working copy at Google
once we've made `MergedTree::diff_stream()` fetch trees
concurrently. (It only makes it a little faster because we still fetch
files serially.)
2023-11-03 08:15:10 -07:00
Martin von Zweigbergk
72245cfac5 merged_tree: add Stream-based version of diff(), delegating for now
I'm going to implement a `Stream`-based version optimized for
high-latency (RPC-based) commit backends. So far, that implementation
is about 20% slower in the Linux repo when running `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. I think that's almost
only because the algorithm is different, not because it's async per
se.

This commit adds a `Stream`-based version of `MergedTree::diff()` that
just wraps the regular iterator in stream. I updated `jj diff` to use
it. I couldn't measure any difference on the command above in the
Linux repo. I think that means we can safely use the same
`Stream`-based interface regardless of backend, even if we end up
needing two different implementations of the `Stream`. We would then
be using the wrapped iterator from this commit for local backends, and
the new implementation for remote backends. But ideally we can make
the remote-friendly implementation fast enough that we don't need two
implementations.
2023-11-03 08:15:10 -07:00
Martin von Zweigbergk
24b706641f async: switch to pollster's block_on()
During the transition to using more async code, I keep running into
https://github.com/rust-lang/futures-rs/issues/2090. Right now, I want
to convert `MergedTree::diff()` into a `Stream`. I don't want to
update all call sites at once, so instead I'm adding a
`MergedTree::diff_stream()` method, which just wraps
`MergedTree::diff()` in a `Stream. However, since the iterator is
synchronous, it needs to block on the async `Backend::read_tree()`
calls. If we then also block on the `Stream` in the CLI, we run into
the panic.
2023-11-03 08:15:10 -07:00