This commit is a little out of place in this sequence, but
it seems to make more sense for MutRepo to own these maps.
@yuja [pointed out] that any tests written using `create_descendant_rebaser` now
need to do this cleanup, but there are no longer any such tests after the
previous commits and a follow-up commit removes `create_descendant_rebaser`
entirely.
[pointed out]: https://github.com/martinvonz/jj/pull/2737#discussion_r1435754370
This removes uses of `DescendantRebaser::new` or
`MutRepo::create_descendant_rebaser` from most tests. The exceptions are the
tests having to do with abandoning empty commits on rebase, since adjusting
those is a bit more elaborate (see follow-up commits).
A possible use case is when doing some archaeology around a certain operation.
The current implementation is quadratic if + is repeated. Suppose op_id is
usually close to the current op heads, I think it'll practically work better
than building a reverse lookup table.
This removes CommandError dependency from these resolution functions. We might
want to refactor the error types again if we introduce a real "opset" evaluator.
The error message for unresolved op heads now includes "@" instead of the whole
expression.
The resolver callback usually returns wider error type, which I don't think
is a variant of OpHeadResolutionError.
To help type inference, resolver's error type is E, not E1 where E: From<E1>.
If the commit backend has high latency, it can make a big difference
to read files concurrently. This patch updates the working copy code
to do that in the update code (when reading files from the backend to
write to the working copy). Because our backend at Google reads files
from a local daemon process that already does a lot of prefetching,
this patch doesn't actually help us. I think it's still the right
thing to do for backends that don't do the same kind of
prefetching. It speeds up `jj sparse set --add` by >10x when I disable
the prefetching in our daemon (our `Backend::concurrency()` is 100).
My jj repo contains such head commits, and "jj debug reindex" fails. To address
this problem, we'll probably need to implement GC, and the user will discard
operations before the first bad op id.
I'm going to add a public method that rebuilds index, and its return type will
be different. I also added "build_" because "index" could be misinterpreted
as noun.
The method arguments are reordered to follow the public IndexStore interface.
The error types are shared with the commit store backend. We could add per-store
error types, but it's unlikely that the caller needs to discriminate them.
I think the idea behind `handle_ancestor_ops()` was to let our backend
at Google delegate the work to the server, which could then avoid
walking ancestors. However, we're now thinking that we're going to
make our server resolve divergent operations on its own instead, so
the client will never see more than one op head, unless it manually
creates the second op head itself (e.g. because the user ran two
concurrent commands). In those cases it should be fine to do the
walk. So let's simplify the trait by removing the function.
Consider how one would implment the current `OpHeadsStore` interface
for a cloud-based backend. After `OpHeadsStore::add_op_head()` is
called, the set of op heads temporarily contains two heads (typically)
until `OpHeadsStore::remove_op_head()` is called. That's not invalid,
but it's annoying to have to deal with that state more than
necessary. Also, it's unnecessarily inefficient to send the addition
and removal of op heads as separate RPCs. This patch therefore adds a
`update_op_heads()` method that takes a list of old heads to remove
and a single new head to add. Coming patches will start migrating to
that method.
We move `.jj/repo/op_heads/*` into `.jj/repo/op_heads/heads/*` almost
a year ago, in commits 90a66ec262 and 37ba17589d. We said we would
drop support for it in 0.9+. I think we said that before we started
doing monthly releases, but I we're still past the goal of 6 months
(which is what I think we were aiming for).
I'll probably add change id lookup methods to CompositeIndex. The Index trait
won't gain resolve_change_id_prefix(), but I also renamed its resolve_prefix()
for consistency.
Since Readonly/MutableIndexSegment no longer implement Index trait, there's
no ambiguity between segment-local and index-global operations. Let's shorten
the method names.
I'm thinking of changing some IndexSegment methods to return LocalPosition
instead of global IndexPosition, and using u32 there would be a source of bugs.
Both readonly and mutable segments know the commit ids to return, and the
caller only needs the ids. Since segment_commit_id(local_pos) scans the graph
entries, doing that would increase the chance of cache miss.
This requires creating a new public API as a substitute. I took the opportunity
to also add some comments to the
`MutRepo::record_rewritten_commit`/`record_abandoned_commit` functions.
I imade the simplest possible addition to the API; it is not a very elegant
one. Eventually, the entire `record_rewritten_commit` API should probably be
refactored again.
I also added some comments explaining what these functions do.
Spotted while adding error propagation there. This wouldn't likely be a real
problem because "jj debug reindex" removes all of the operation links.
The "} else {" condition is removed because it doesn't make sense to exclude
only the exact parent_op_id operation. This can be optimized to not walk
ancestors of the parent_op_id operation, but I don't see a motivation to add
tests covering such scenarios. It's pretty rare that an intermediate operation
link is missing.
There are many unit tests that call mutable_segment.save_in(), but I don't
think these callers expect that the segment file could be squashed depending
on the size. Let's make it caller's responsibility.
maybe_squash_with_ancestors() should be cheap if segment_num_commits() == 0,
so it's okay to call it before checking the emptiness.
I think "for" loop is easier to follow. Maybe it could be rewritten further to
.find_map() loop, but that would be too clever.
I also made ancestor_index_segments() pub(super) since it doesn't make sense
to only provide ancestor_files_without_local().
If the error is permanent (because the repo predates the no-gc-ref fix for
example), there's no easy way to recover. Still, panicking in this function
seems wrong.
This is the last use of Read/WriteBytesExt. The byteorder crate is great, but
we don't need an abstraction of endianness. Let's simply use the std functions.
If read_exact() or read_u32() reached to EOF, the index file should be
considered corrupted. File not found error is also treated as data corruption
because an invalid file name could be read from the child segment file. It
can't handle special file names like "..", though.
Index file name also applies to io::Error. New error type reuses io::Error to
represent data corruption. We could add an inner Corrupt|Io enum instead, but
we'll need to remap some io::Error variants (e.g. UnexpectedEof) to Corrupt
anyway.
The idea is that we don't have to reload parent files as we already have the
chain of the parent segments. The resulting readonly index will be constructed
from the loaded parent segments + local entries blob.
I thought IndexLoadError and DefaultIndexStoreError would represent "load" and
"store" failures respectively, but they aren't. Actually, DefaultIndexStoreError
is the store-level error, and IndexLoadError should be wrapped in it.
Since OpStoreError can also include io::Error, it doesn't make much sense to
have Io variant at this level. Let's split it to context-specific errors, and
extract helper method that maps io::Error.
As far as I can see in the chat, there's no objection to changing the default,
and git.auto-local-branch = false is generally preferred.
docs/branches.md isn't updated as it would otherwise conflict with #2625. I
think the "Remotes" section will need a non-trivial rewrite.
#1136, #1862
PersistError is basically a pair of io::Error and NamedTempFile instance. It's
unlikely that we would want to propagate the open file instance to the CLI
error handler, leaving the temporary file alive.
Just a minor cleanup to remove lifetime parameter from the types. I tried to
reimplement them by using itertools, but I couldn't find a simple way to
encode short-circuiting at the end of either left or right iterator.
We'll probably need a better abstraction, but a separate method is good
enough to remove unsafe code from ReadonlyRepo.
I'm not sure if this is feasible for the other backends, but I guess there
would be less lifetimed variables than DefaultReadonlyIndex.
The idea is that InternalRevset will store a 'static boilerplate function that
borrows an 'index passed by function argument. This way, we can abstract the
index type over Arc<T> and &T without introducing too much generics.
I don't know why the dependabot didn't catch this, but there are things to
fix manually. EntryMode was changed to a u16 wrapper, and the enum was renamed
to EntryKind. Other than that, I don't find anything breaking our codebase.
Perhaps, this is the most controversial part. It could be moved to new
"segment" module (or something like "common"), but I think IndexSegment can be
considered a trait that enables the CompositeIndex abstraction.
Added pub(super) or pub where needed. I won't implement accessor methods on
IndexPositionByGeneration and IndexPosition as they are purely value types,
and protecting the inner values wouldn't make sense.
It seems better to have the caller pass the transaction description
when we finish the transaction than when we start it. That way we have
all the information we want to include more readily available.
Previously, the function relied on both the `self.parent_mapping` and
`self.rebased`. If `(A,B)` was in `parent_mapping` and `(B,C)` was in `rebased`,
`new_parents` would map `A` to `C`.
Now, `self.rebased` is ignored by `new_parents`. In the same situation,
DescendantRebaser is changed so that both `(A,B)` and `(B,C)` are in
`parent_mapping` before. `new_parents` now applies `parent_mapping` repeatedly,
and will map `A` to `C` in this situation.
## Cons
- The semantics are changed; `new_parents` now panics if `self.parent_mapping`
contain cycles. AFAICT, such cycles never happen in `jj` anyway, except for
one test that I had to fix. I think it's a sensible restriction to live with;
if you do want to swap children of two commits, you can call
`rebase_descendants` twice.
## Pros
- I find the new logic much easier to reason about. I plan to extract it into a
function, to be used in refactors for `jj rebase -r` and `jj new --after`. It
will make it much easier to have a correct implementation of `jj rebase -r
--after`, even when rebasing onto a descendant.
- The de-duplication is no longer O(n^2). I tried to keep the common case fast.
## Alternatives
- We could make `jj rebase` and `jj new` use a separate function with the
algorithm shown here, without changing DescendantRebaser. I believe that the new
algorithm makes DescendatRebaser easier to understand, though, and it feels more
elegant to reduce code duplication.
- The de-duplication optimization here is independent of other changes, and
could be used on its own.
into_segment() could be added instead of save_in(), but I decided to wrap
save_in(). save_in() may squash ancestor files, so it could be considered an
index-level operation.
default_index_store.rs is relatively big, and it contains types and impls in
arbitrary order. Let's split them into sub modules. After everything moved,
mod.rs will only contain tests.
I'm going to remove 'index lifetime from InternalRevset so Revset<'static>
can be easily constructed from DefaultReadonlyIndex. As the first step, this
series removes some lifetime complexity from EvaluationContext methods.
We don't need an descendant iterator API, but it helps to add separate function
to collect into HashSet<IndexPosition> instead of returning a pair of
ordered vec and set.
The wrapper type isn't needed for the mutable layer, but this mirrors the
readonly type structure. Test cases are also migrated to be using the index
wrapper so long as we don't have to care for the nesting of the segment files.
The idea is that the ReadonlyIndexSegment is a sub component of the index. The
Index trait could be implemented for any Segment type, but we don't need a
public interface to access sub segment as an index.
I'm going to split the internal Segment types and the public Index types
in order to clarify the layering concept. The public Index types will be
wrappers like DefaultReadonlyIndex.
Strictly speaking, ReadonlyIndexImpl is a segment + parent pointer pair,
but I think calling it a segment is pretty okay. It could be called a
ReadonlyIndexFile, but "File" can't apply to the mutable part.
Gitoxide errors don't implement PartialEq. We could instead stringify the
errors, but there aren't many callers who expect FailedRefExportReason to
be comparable.
Before, an absolute path would be saved in the git_target file if .git is a
symlink. That's not wrong, but seemed a bit weird. Let's consolidate the
behavior across .git file types.
Apparently, libgit2 doesn't deduce "core.bare" config from the directory name,
but gitoxide implements it correctly. So we shouldn't blindly canonicalize
the Git repository path. Fortunately, the saved git_target path isn't a fully-
canonicalized form (unless user explicitly sepcified "--git-repo ./.git"), so
we don't need a hack to remap git_target back to the symlink path.
is_colocated_git_workspace() is adjusted since the git_workdir is no longer
resolved from the fully-canonicalized repo path, at least in our code. Still we
have the ".git/.." fallback because test_init_git_colocated_symlink_gitlink()
would otherwise fail. I haven't figured out why, and the test might be actually
wrong compared to the git CLI behavior, but let's not change that for now.
Fixes#2668
This adds an initial `jj util gc` command, which simply calls `git gc`
when using the Git backend. That should already be useful in
non-colocated repos because it's not obvious how to GC (repack) such
repos. In my own jj repo, it shrunk `.jj/repo/store/` from 2.4 GiB to
780 MiB, and `jj log --ignore-working-copy` was sped up from 157 ms to
86 ms.
I haven't added any tests because the functionality depends on having
`git` binary on the PATH, which we don't yet depend on anywhere
else. I think we'll still be able to test much of the future parts of
garbage collection without a `git` binary because the interesting
parts are about manipulating the Git repo before calling `git gc` on
it.
I think this is a variant of the problem fixed by 7fda80fc22 "tree: simplify
conflict before resolving at hunk level." We need to simplify() the conflict
before and after extracting file ids because the source conflict values may
contain trees to be cancelled out, and the file values may differ only in exec
bits. Since the legacy tree passes a simplified conflict in to this function,
I made the merged tree do the same.
Fixes#2654
Otherwise an empty subtree would be added to the parent tree.
If the stored tree contained an empty subtree, simplify() wouldn't work
against new "absent" subtree representation. I don't know if there's a
such code path, but I believe it's very rare to encounter the problem.
#2654
This basically backs out the change 1b9a3e27e0 "merged_tree: read before/after
trees concurrently." As we decided to add a separate impl for async access, it
doesn't make sense to read before/after pair in parallel.
The async single_tree() is moved to TreeDiffStreamImpl. It will help remove
the sync version when the performance problem is solved.
In snapshot(), changed_file_states are received in arbitrary order. For the
other callers, entries are in diff_stream order, so we don't have to sort
them.
With watchman enabled, we can see the cost of sorting the sorted proto entries.
I don't think this is significant, but we can mitigate it by adding
is_file_states_sorted flag to the proto message if needed:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux files ~/mirrors/linux/no-match"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux files ~/mirrors/linux/no-match
Time (mean ± σ): 164.8 ms ± 16.6 ms [User: 50.2 ms, System: 111.7 ms]
Range (min … max): 148.1 ms … 195.0 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux files ~/mirrors/linux/no-match
Time (mean ± σ): 171.8 ms ± 13.6 ms [User: 61.7 ms, System: 109.0 ms]
Range (min … max): 159.5 ms … 192.1 ms 20 runs
```
Without watchman:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux files ~/mirrors/linux/no-match"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux files ~/mirrors/linux/no-match
Time (mean ± σ): 367.3 ms ± 30.3 ms [User: 1415.2 ms, System: 633.8 ms]
Range (min … max): 325.4 ms … 421.7 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux files ~/mirrors/linux/no-match
Time (mean ± σ): 327.7 ms ± 24.9 ms [User: 1059.1 ms, System: 654.3 ms]
Range (min … max): 296.0 ms … 385.4 ms 20 runs
```
I haven't measured snapshotting against dirty working copy, but I don't think
it would be slower than the original implementation.
I'll replace the current lazy loading mechanism with this. Read-only methods
are implemented on the borrowed type so that we can narrow lookup scope
recursively.
self.contains(other) means that the self tree contains the other tree (i.e.
the self path is prefix of the other), but it could be confused the other way
around if we were thinking about the path literal, not the tree. Let's add
.starts_with() instead by copying the std::path::Path definition.
This enables cheap str-to-RepoPath cast, which is useful when sorting and
filtering a large Vec<(String, _)> list by using matcher for example. It
will also eliminate temporary allocation by repo_path.parent().
I'm going to add borrowed RepoPath type, and most from_internal_string()
callers will be migrated to it. For the remaining callers, it makes more
sense to move the ownership of String to RepoPathBuf.
This follows up on 3967f63 (see that commit's description for more
motivation) and e79c8b6.
In a discussion linked below, it was decided that forbidding `-r --skip-empty`
entirely is preferable to the mixed behavior introduced in 3967f63.
3967f637dc (commitcomment-133539911)
RepoPath::from_components() is removed since it is no longer a primitive
function.
The components iterator could be implemented on top of str::split(), but
it's not as we'll probably want to add components.as_path() -> &RepoPath.
Tree walking and tree_states map construction get slightly faster thanks to
fewer allocations and/or better cache locality. If we add a borrowed RepoPath
type, we can also implement a cheap &str to &RepoPath conversion on top. Then,
we can get rid of BTreeMap<RepoPath, FileState> construction at all.
Snapshot without watchman:
```
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux status"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
Time (mean ± σ): 950.1 ms ± 24.9 ms [User: 1642.4 ms, System: 681.1 ms]
Range (min … max): 913.8 ms … 990.9 ms 10 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
Time (mean ± σ): 872.1 ms ± 14.5 ms [User: 1922.3 ms, System: 625.8 ms]
Range (min … max): 853.2 ms … 895.9 ms 10 runs
Relative speed comparison
1.09 ± 0.03 target/release-with-debug/jj-0 -R ~/mirrors/linux status
1.00 target/release-with-debug/jj-1 -R ~/mirrors/linux status
```
Tree walk:
```
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux files --ignore-working-copy"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux files --ignore-working-copy
Time (mean ± σ): 375.3 ms ± 15.4 ms [User: 223.3 ms, System: 151.8 ms]
Range (min … max): 359.4 ms … 394.1 ms 10 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux files --ignore-working-copy
Time (mean ± σ): 357.1 ms ± 16.2 ms [User: 214.7 ms, System: 142.6 ms]
Range (min … max): 341.6 ms … 378.9 ms 10 runs
Relative speed comparison
1.05 ± 0.06 target/release-with-debug/jj-0 -R ~/mirrors/linux files --ignore-working-copy
1.00 target/release-with-debug/jj-1 -R ~/mirrors/linux files --ignore-working-copy
```
The added test would fail if paths were purely ordered by concatenated strings.
I'm not sure if we want to preserve the current ordering, but let's not break
it for the moment.
This follows up on @matts1 's #2609.
We still allow the `-r` commit to become empty. I would be more comfortable if
there was a test for that, but I haven't done that (yet?) and it seems pretty
safe. If that's a problem, I'm happy to forbid `-r --skip-empty` entirely,
since it is far less useful than `-s --skip-empty` or `-b --skip-empty`.
I think it is undesired to abandon emptied descendants. As far as descendants
of `A` are concerned, `jj rebase -r A` should be equivalent to `jj abandon A`,
and `jj abandon` does not remove emptied commits. It also doesn't seem very
useful to do that, since I think descendant commits of an abandoned (or moved
with `-r`) commit only become empty in pathological cases.
Additionally, if we did want -r to empty descendants of `A`, we'd have to add
thorough tests and possibly improve the algorithm. I want to refactor `rebase
-r` and add features to it, and having to consider cases of commits becoming
abandoned makes everything harder.
For example, if we have
```
root -> A -> B -> C
```
and `jj rebase -r A -d C` empties commit `B` (or `C`), I do not know whether
the current algorithm will work correctly. It seems possible that it would, but
that depends on the fact that empty merge commits are not abandoned for
descendants. That seems dangerous to rely on without tests.
I hope (but can't promise) that in the near future, making DescendantRebaser
return more information should help make it possible to create such
functionality in a more robust way. I am likely to attempt this as part of
implementing `-r --after`.
This is a step towards introducing a borrowed RepoPath type. The current
RepoPath type is inefficient as each component String is usually short. We
could apply short-string optimization, but still each inlined component would
consume 24 bytes just for e.g. "src", and increase the chance of random memory
access. If the owned RepoPath type is backed by String, we can implement cheap
cast from &str to borrowed &RepoPath type.
Leading/trailing slashes would introduce a bit of complexity if we migrate
the backing type from Vec<String> to String. Empty components are okay, but
let's reject them as they are cryptic and invalid.
It helps to implement transparent conversion from &str to &Wrapped(str). We
could instead wrap the reference as Wrapped<'a>(&'a str), but it has various
drawbacks. Notably we can't implement Borrow and Deref because these traits
require a reference in return position.
Since the unsafe bits are pretty small, we can instead implement cast functions
without using the ref-cast crate. However, I believe we'll trust ref-cast more
than hand-crafted unsafe code.
https://crates.io/crates/ref-casthttps://docs.rs/ref-cast/1.0.20/ref_cast/attr.ref_cast_custom.html
It seems generally useful to optimize revset expressions in
`evaluate_programmatic()` so the caller doesn't have to remember to do
it. It should generally be cheap to do so even if it's often not
needed.
`RevsetExpression::resolve()` is meant for programmatically created
expressions. In particular, it may not contain symbols. Let's try to
clarify that by renaming the function and documenting it.
Hopefully this will fix the unfinished Windows CI issue. A possible scenario
is that recent migration to gitoxide made this test flaky on Windows. For
example, gitoxide might have in-memory object cache that relies on file mtime,
and occasionally fails to detect new object on Windows.
While it got faster to build a large BTreeMap<RepoPath, _>, there's still
a measurable cost. Let's eliminate it if watchman is enabled and the working
copy is clean. Perhaps, we should introduce new serialization format that
supports instant loading and lookup, but this hack works for the moment.
I'm not sure if the new tree_state format should be flat (RepoPath, _) list,
or tree like the backend storage btw.
In my "linux" repo (watchman enabled):
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux status"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
Time (mean ± σ): 768.9 ms ± 14.2 ms [User: 630.7 ms, System: 131.2 ms]
Range (min … max): 742.3 ms … 783.1 ms 10 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
Time (mean ± σ): 713.0 ms ± 16.8 ms [User: 587.9 ms, System: 116.2 ms]
Range (min … max): 681.5 ms … 731.1 ms 10 runs
Relative speed comparison
1.08 ± 0.03 target/release-with-debug/jj-0 -R ~/mirrors/linux status
1.00 target/release-with-debug/jj-1 -R ~/mirrors/linux status
This ensures that the root fsmonitor_matcher matches nothing if there are no
working-copy changes. The query result can be observed by "jj debug watchman
query-changed-files".
I don't have expertise on watchman query language, but using the watchman API
is probably better than .filter()-ing the result manually.
Suppose the input list is presorted, sorting a sorted vec would be cheaper
than .insert()-ing sorted items one by one.
In my "linux" repo (watchman eanbled):
- jj-0: baseline
- jj-1: previous (don't randomize by HashMap)
- jj-2: this
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \
"target/release-with-debug/{bin} -R ~/mirrors/linux status"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
Time (mean ± σ): 1.034 s ± 0.020 s [User: 0.881 s, System: 0.212 s]
Range (min … max): 1.011 s … 1.068 s 10 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
Time (mean ± σ): 849.3 ms ± 13.8 ms [User: 710.7 ms, System: 199.3 ms]
Range (min … max): 821.7 ms … 870.2 ms 10 runs
Benchmark 3: target/release-with-debug/jj-2 -R ~/mirrors/linux status
Time (mean ± σ): 786.2 ms ± 16.7 ms [User: 650.7 ms, System: 204.1 ms]
Range (min … max): 760.8 ms … 805.2 ms 10 runs
Relative speed comparison
1.32 ± 0.04 target/release-with-debug/jj-0 -R ~/mirrors/linux status
1.08 ± 0.03 target/release-with-debug/jj-1 -R ~/mirrors/linux status
1.00 target/release-with-debug/jj-2 -R ~/mirrors/linux status
According to the doc, this is compatible with the map syntax.
https://protobuf.dev/programming-guides/proto3/#maps
This change means that the serialized file states are sorted by RepoPath,
so BTreeMap<RepoPath, _> can be reconstructed with fewer cache misses.
In my "linux" repo (watchman enabled):
- jj-0: baseline
- jj-1: this
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \
"target/release-with-debug/{bin} -R ~/mirrors/linux status"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
Time (mean ± σ): 1.034 s ± 0.020 s [User: 0.881 s, System: 0.212 s]
Range (min … max): 1.011 s … 1.068 s 10 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
Time (mean ± σ): 849.3 ms ± 13.8 ms [User: 710.7 ms, System: 199.3 ms]
Range (min … max): 821.7 ms … 870.2 ms 10 runs
Relative speed comparison
1.32 ± 0.04 target/release-with-debug/jj-0 -R ~/mirrors/linux status
1.08 ± 0.03 target/release-with-debug/jj-1 -R ~/mirrors/linux status
Cache-misses got reduced:
% perf stat -e task-clock,cycles,instructions,cache-references,cache-misses \
-- ./target/release-with-debug/jj-0 -R ~/mirrors/linux --no-pager status
1,091.68 msec task-clock # 1.032 CPUs utilized
4,179,596,978 cycles # 3.829 GHz
6,166,231,489 instructions # 1.48 insn per cycle
134,032,047 cache-references # 122.776 M/sec
29,322,707 cache-misses # 21.88% of all cache refs
1.057474164 seconds time elapsed
0.897042000 seconds user
0.194819000 seconds sys
% perf stat -e task-clock,cycles,instructions,cache-references,cache-misses \
-- ./target/release-with-debug/jj-1 -R ~/mirrors/linux --no-pager status
927.05 msec task-clock # 1.083 CPUs utilized
3,451,299,198 cycles # 3.723 GHz
6,222,418,272 instructions # 1.80 insn per cycle
98,499,363 cache-references # 106.251 M/sec
11,998,523 cache-misses # 12.18% of all cache refs
0.855938336 seconds time elapsed
0.720568000 seconds user
0.207924000 seconds sys
The idea is the same as the heads_pos() change in 9832ee205d. While
IndexEntry::position() should be cheap, saving 20 bytes per entry appears to
improve the performance in mid-size repos.
In my "linux" repo:
revsets/all()
-------------
baseline 1.24 156.0±1.06ms
this 1.00 126.0±0.51ms
I don't see significant difference in small-sized repos like "jj" or "git".
IndexEntryByPosition isn't removed since it's still used by the revset engine.
This removes the last use of `ouroboros`. Since `TreeEntriesDirItem`
is only used in "legacy trees" (before tree-level conflicts), I didn't
bother to check the performance impact. I also didn't bother to check
the matcher before adding the entries to the list, instead leaving
that where it is in `Iterator::next()`.
This removes the last use of `ouroboros` in `merged_tree.rs`. The set
of conflicts to iterate is usually so small that I didn't bother
checking the performance impact.
While importing the `ouroboros` crate and the `aliasable` crate it
depends on, the "unsafe Rust reviewer" expressed some concern that
they contain a lot of unsafe code that's hard to review. We can avoid
the unsafe code altogether by making `TreeEntriesIterator` not
self-refential. Instead, we can collect the matching entries in an
individual tree up front. It does have some performance cost:
```
❯ hyperfine --warmup 3 --runs 30 \
'/tmp/jj-before --ignore-working-copy files -r v6.0' \
'/tmp/jj-after --ignore-working-copy files -r v6.0'
Benchmark 1: /tmp/jj-before --ignore-working-copy files -r v6.0
Time (mean ± σ): 461.4 ms ± 14.3 ms [User: 232.1 ms, System: 229.4 ms]
Range (min … max): 443.4 ms … 496.3 ms 30 runs
Benchmark 2: /tmp/jj-after --ignore-working-copy files -r v6.0
Time (mean ± σ): 482.0 ms ± 14.3 ms [User: 257.2 ms, System: 224.9 ms]
Range (min … max): 461.8 ms … 513.3 ms 30 runs
Summary
'/tmp/jj-before --ignore-working-copy files -r v6.0' ran
1.04 ± 0.04 times faster than '/tmp/jj-after --ignore-working-copy files -r v6.0'
```
I think that's acceptable.
This is much faster (maybe because of better cache locality?) Another option
is to use BTreeSet, but the BinaryHeap version is slightly faster.
"bench revset" result in my linux repo:
revsets/heads(tags())
---------------------
baseline 3.28 560.6±4.01ms
1 2.92 500.0±2.99ms
2 1.98 339.6±1.64ms
3 (this) 1.00 171.2±0.30ms
Apparently, IndexEntry::generation_number() isn't cheap probably because it
involves random access to larger memory region, and the u32 value might not
be aligned. Let's instead store the generation numbers in BinaryHeap.
Also, heads_pos() becomes slightly faster by keeping the BinaryHeap entries
small, so I've removed the IndexEntry at all.
This makes the default log and disambiguation revsets fast, which evaluate
'heads(immutable_heads())'.
"bench revset" result in my linux repo:
revsets/heads(tags())
---------------------
baseline 3.28 560.6±4.01ms
1 2.92 500.0±2.99ms
2 (this) 1.98 339.6±1.64ms
All callers just iterate over the parent entries.
"bench revset" result in my linux repo:
revsets/heads(tags())
---------------------
baseline 3.28 560.6±4.01ms
1 (this) 2.92 500.0±2.99ms
For loose refs, uninteresting directories can be just skipped. For packed refs,
gix will have to do binary search for each prefix to find the starting point.
Still it's better overall if the repository contains tons of refs/jj/keep refs.
With my linux repo containing ~5k loose jj refs, this saves ~40ms:
% hyperfine --warmup 3 --runs 10 \
"/tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux" \
"/tmp/jj-gix-iter --ignore-working-copy git import -R ~/mirrors/linux"
Benchmark 1: /tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux
Time (mean ± σ): 151.6 ms ± 11.4 ms [User: 38.8 ms, System: 111.6 ms]
Range (min … max): 129.8 ms … 159.5 ms 10 runs
Benchmark 2: /tmp/jj-gix-iter --ignore-working-copy git import -R ~/mirrors/linux
Time (mean ± σ): 109.9 ms ± 11.6 ms [User: 27.5 ms, System: 82.4 ms]
Range (min … max): 89.4 ms … 117.8 ms 10 runs
Gitoxide errors are boxed since there are various error types and they tend
to exceed the clippy size limit.
Apparently, gitoxide is faster than git2:
% hyperfine --warmup 3 --runs 10 \
"/tmp/jj-baseline --ignore-working-copy git import -R ~/mirrors/linux" \
"/tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux"
Benchmark 1: /tmp/jj-baseline --ignore-working-copy git import -R ~/mirrors/linux
Time (mean ± σ): 205.4 ms ± 15.7 ms [User: 59.6 ms, System: 144.6 ms]
Range (min … max): 189.7 ms … 223.9 ms 10 runs
Benchmark 2: /tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux
Time (mean ± σ): 176.2 ms ± 13.7 ms [User: 41.2 ms, System: 134.0 ms]
Range (min … max): 155.4 ms … 186.5 ms 10 runs
If a commit pointed to by HEAD or ref is missing, the ref is considered
invalid and excluded by import_refs(). The current test behavior appears to
depend on some in-memory cache of git2::Repository.
We need to .collect_vec() the parents iterator to temporary buffer since the
borrowed iterator can't be returned back to the dag_walk functions. Another
option is to clone op_store and parent ids to remove &self lifetime from the
iterator, but that also means a temporary Vec is created.
Unlike dfs_ok(), this function short-circuits at an Err as we use non-lazy
topo_order_forward() internally. I think that's good enough. If we implement
GC on operation log, deleted parents will be excluded (or mapped to tombstone)
by caller. An Err shouldn't mean it's GC-ed.
This unblocks the use of Result<T, E> in op.parents().
There are two ways to encode errors:
a. impl IntoIterator<Item = Result<T, E>>
b. Result<V, E> where V: FromIterator<Item = T>
I think (a) is more natural to algorithms like dfs(), which can process error
nodes transparently.
Still the caller might have to collect the source iterator to temporary Vec
to conform to the neighbors_fn signature. It's not easy for neighbors_fn to
return an iterator borrowing the input node. We already have GAT, but doesn't
have return-position impl Trait in trait yet.
Recognize signature metadata from git commit objects, implement
a basic version of that for the native backend.
Extract the signed data (a commit binary repr without the signature) to
be verified later.
Otherwise, ref updates would fail if we port git::export_refs() to gitoxide.
This change isn't strictly needed for the backend itself, but we'll reuse the
gix::Repository instance created by the backend when importing and exporting
Git refs.
GitBackend will use it to configure gix::Repository. I think UserSettings
is generally useful to pass store-specific parameters, so I've updated all
factory functions.
While the safe implementation is a bit more complex (and probably more branchy),
I don't think the runtime overhead would matter here. Let's remove one more
unsafe for better code maintainability.
We have a few places where we have a `MergedTreeValue` and need to
read the data associated with it so we can write to the working copy
or include it in a diff. Let's extract some of that shared logic to a
function so we can reuse it. I plan to use it for reading file
contents in advance while streaming a diff in `local_working_copy`
soon (and probably in `jj diff` thereafter), but I think it seems like
an improvement on its own.
I'd like to read N files ahead from the backend, to avoid serializing
too many server calls on backends that are backed by a server. Moving
the reads a little earlier is a little step towards that.
The `TreeState::write_*()` functions can now be made into free/static
functions if we prefer.
We no longer have "unsafe" in this function, so let's use the iterator API
instead of recursion. Apparently I haven't pushed this change before because
unsafe in .find_map() looked scary.
Remove a couple of unnecessary unsafes:
- The NonZeroUsize is a constant where the unwrap will optimize away
anyway and we don't have an unsafe without any good reason there :)
- The other two were simply not needed, lifetimes worked fine, maybe
Rust became better since that code was written? NLL? Anyway, they're
gone now
As discussed in Discord, it's less useful if remote_branches() included
Git-tracking branches. Users wouldn't consider the backing Git repo as
a remote.
We could allow explicit 'remote_branches(remote=exact:"git")' query by changing
the default remote pattern to something like 'remote=~exact:"git"'. I don't
know which will be better overall, but we don't have support for negative
patterns anyway.
It's super common that a Merge object holds a resolved value, so let's inline
up to 1 element. T of Merge<T> usually consists of a couple of pointer-sized
fields. I don't see any measurable speed up, but it's no worse than the
original.
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to use switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.
When diffing two trees, we currently start at the root and diff those
trees. Then we diff each subtree, one at a time, recursively. When
using a commit backend that uses remote storage, like our backend at
Google does, diffing the subtrees one at a time gets very slow. We
should be able to diff subtrees concurrently. That way, the number of
roundtrips to a server becomes determined by the depth of the deepest
difference instead of by the number of differing trees (times 2,
even). This patch implements such an algorithm behind a `Stream`
interface. It's not hooked in to `MergedTree::diff_stream()` yet; that
will happen in the next commit.
I timed the new implementation by updating `jj diff -s` to use the new
diff stream and then ran it on the Linux repo with `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. That slowed down by
~20%, from ~750 ms to ~900 ms. Maybe we can get some of that
performance back but I think it'll be hard to match
`MergedTree::diff()`. We can decide later if we're okay with the
difference (after hopefully reducing the gap a bit) or if we want to
keep both implementations.
I also timed the new implementation on our cloud-based repo at
Google. As expected, it made some diffs much faster (I'm not sure if
I'm allowed to share figures).
I'm about to add a few more checks for diffing with a matcher. I think
it will help make it readable and reduce the risk of mixing up
variables between each part of the test if we use some nested blocks.
I also removed some unnecessary `.clone()` calls while at it.
I'm going to add a Merge method that removes negative/positive terms pair, and
swap_remove() is the easiest option. The order of the conflicted ref targets
doesn't matter.