This addresses the test instability. The underlying problem still exists, but
it's unlikely to trigger user-facing issues because of that. A repo instance
won't be reused after gc() call.
Fixes#3537
Apparently, these gc() invocations rely on that the previous "git gc" packed
all refs so there are no loose refs to compare mtimes. If there were new (or
remaining) loose refs, mtime comparison could fail. I also added +1sec to
effectively turn off the keep_newer option, which isn't important in these
tests.
There are several existing commands that would benefit from an API
that makes it easier to rewrite a whole graph of commits while
transforming them in some way.
`jj squash` is one example. When squashing into an ancestor, that
command currently rewrites the ancestor, then rebases descendants, and
then rewrites the rewritten source commit. It would be better to
rewrite the source commit (and any descendants) only once.
Another example is the future `jj fix`. That command will want to
rewrite a graph while updating the trees. There's currently no good
API for that; you have to manually iterate over descendants and
rewrite them.
This patch adds a new `MutableRepo::transform_descendants()` method
that takes a callback which gets a `CommitRewriter` passed to it. The
callback can then decide to change the parents, the tree, etc. The
callback is also free to leave the commit in place or to abandon it.
I updated the regular `rebase_descendants()` to use the new function
in order to exercise it. I hope we can replace all of the
`rebase_descendant_*()` flavors later.
I added a `replace_parent()` method that was a bit useful for the test
case. It could easily be hard-coded in the test case instead, but I
think the method will be useful for `jj git sync` and similar in the
future.
`CommitRewriter` wraps 3 of the arguments, so I think it makes sense
to pass it instead. More importantly, I hope to continue refactoring
so many of the callers already have a `CommitRewriter`.
It's cheap to look up commits again from the cache in `Store` but it
can be expensive to look up commits we didn't end up needing. This
will make it easier to refactor further and be able to cheaply set
preliminary parents for a rewritten commits and then let the caller
update them.
I'm going to add a helper struct to help with rewriting commits. I
want to make that struct own the old commit and the new parents to
simplify lifetimes. This patch prepares for that by passing the
commits by value to `rebase_commit()`.
This implements the other workaround described in 57167cefda "git: on
import_refs(), don't abandon ancestors of newly fetched refs":
> I think there are two ways to fix the problem:
> a. pin non-tracking remote branches just like local refs
> b. pin newly fetched refs in addition to local refs
> This patch implements (b) because it's simpler and more obvious that the
> fetched commits would never be abandoned immediately.
The idea of (a) is that untracked remote branches are independent read-only
refs, and read-only branches shouldn't be rewritten implicitly. Once the
branch gets rewritten or abandoned by user, these remote refs will be hidden,
and won't be pinned anymore.
Since (a) effectively supersedes (b), this patch also removes the original
workaround.
Fixes#3495
I don't think we have any callers left that call
`record_rewritten_commit()` multiple times within a transaction and
expect it to result in divergence. I think we should consider it a bug
to do that.
Apart from (IMO) looking nicer, this will also sidestep the potential problem
that if the file contains actual jj conflict markers (`>>>>>>>` in the beginning
of a line, for example), jj would currently have trouble materializing and
subsequently parsing conflicts in the file if it actually became conflicted.
I'll demo this bug in either this or a subsequent PR. It's the kind of bug that
sounds serious in theory but might never cause a problem in practice.
After this PR, only `docs/tutorial.md` has a conflict marker that's not indented.
There's only one there, so hopefully it won't be too much of a pain to deal with.
I also indented other strings in `test_conflicts.rs`. IMO, this looks nice and
more consistent with the `insta::assert_snapshot` output. I didn't spend the
time to do the same for `test_resolve_command`.
Suppose the generation value is usually small, it should be faster to do
bounded range look up first 'y-', then walk ancestors with the unwanted set
'y-..x'.
This helps to eliminate higher-ranked trait bounds from RevWalkRevset and
RevWalk combinators to be added. Since &CompositeIndex is now a real reference,
it can be passed to functions as index: &T.
Initially we were thinking to have `Revset` return something like
`CachedRevset`:
```
pub trait CachedRevset {
fn iter(&self) -> Box<dyn Iterator<Item = Commit>>;
fn contains(&self, &CommitId) -> bool;
}
```
But we weren't sure what use case for `iter` would be, so we dropped the `iter`
method. `CachedRevset` with single `contains` method needed a better name. We
weren't able to come up with one, so we decided instead to have a method on
`Revset` that returns a closure to check if a commit is in a revset.
Follows up 7552f939c6 "tests: disable most gpg integration tests on Windows."
I couldn't find this test failing in a few samples before, but it does now.
This removes the special handling of the working-copy commit. By
recording when an empty/emptied commit was abanoned, we rebase
descendants correctly and create a new empty working-copy commit on
top.
This adds a guard to the gpg signing tests which will skip the test if
`gpg` is not installed on the system.
This is done in order to avoid requiring all collaborators to have setup
all the tools on their local machines that are required to test commit
signing.
When doing things like testing snapshot performance differences,
this allows you to turn off the monitor, no matter what the enabled
user or repository configuration has, e.g.
jj st --config-toml='core.fsmonitor="none"'
Signed-off-by: Austin Seipp <aseipp@pobox.com>
It should be useful at least in the presentation layer to know which
operations correspond to working-copy snapshots. They might be
rendered differently in the graph, for example. Or maybe an undo
command wants to warn if you just undid a snapshot operation. This
patch just introduces a field in the metadata to store the
information.
I think the conclusion from #2600 is that at least auto-rebasing
should not simplify merge commits that merge a commit with its
ancestor. Let's start by adding an option for that in the library.
The shortest change id prefix will become a few digits longer, but I think
that's acceptable. Entries included in the "revsets.short-prefixes" set are
unaffected.
The reachable set is calculated eagerly, but this is still faster as we no
longer need to sort the reachable entries by change id. The lazy version will
save another ~100ms in mid-size repos.
"jj log" without working copy snapshot:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
-s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
"target/release-with-debug/{bin} -R ~/mirrors/linux \
--ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 353.6 ms ± 11.9 ms [User: 266.7 ms, System: 87.0 ms]
Range (min … max): 329.0 ms … 365.6 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 271.3 ms ± 9.9 ms [User: 183.8 ms, System: 87.7 ms]
Range (min … max): 250.5 ms … 282.7 ms 20 runs
Relative speed comparison
1.99 ± 0.16 target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
1.53 ± 0.12 target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
```
"jj status" with working copy snapshot (watchman enabled):
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
-s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
"target/release-with-debug/{bin} -R ~/mirrors/linux \
status --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 396.6 ms ± 10.1 ms [User: 300.7 ms, System: 94.0 ms]
Range (min … max): 373.6 ms … 408.0 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 318.6 ms ± 12.6 ms [User: 219.1 ms, System: 94.1 ms]
Range (min … max): 294.2 ms … 333.0 ms 20 runs
Relative speed comparison
1.85 ± 0.14 target/release-with-debug/jj-0 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
1.48 ± 0.12 target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
```
This basically means that the change ids are interned. We'll implement binary
search over the sorted change ids table. The table could be sorted differently
for better cache locality, but it is in lexicographical order for simplicity.
With my testing, the cost of the id lookup isn't dominant.
Unlike the parent entries, the size of the per-id overflow items isn't saved.
That's s because the number of the same-change-id commits is either 1 or many.
It doesn't make sense to allocate 8 bytes for each change id. Instead, we'll
pay extra indirection cost to determine the size.
Apparently, gix has 100ms timeout. Since this test tries to create contended
situation, it's possible that the ref lock can't be acquired. I've added
upper bound to the retry loop at b37293fa68 "tests: add upper bound to
test_concurrent_read_write_commit() loop", so ignoring arbitrary errors
should be okay.
The problem can be reproduced on my Linux machine by inserting 10ms sleep() to
gix and increasing the concurrency.
Fixes#3069
The `ContentHash` documentation specifies that implementations for enums should
hash the ordinal number of the variant contained in the enum as a 32-bit
little-endian number and then hash the contents of the variant, if any.
The current implementations for `std::Option`, `MergedTreeId`, and
`RemoteRefState` are non-conformant since they hash the ordinal number as a u8
with platform specific endianness.
Fixes#3051
Since IdIndex sorts the entries by using .sort_unstable_by_key(), the order of
the same-key elements is undefined. Perhaps, it's stable for short arrays, and
the test passes because of that.
I'm going to introduce breaking changes in index format. Some of them will
affect the file size, so version number or signature won't be needed. However,
I think it's safer to detect the format change as early as possible.
I have no idea if embedded version number is the best way. Because segment
files are looked up through the operation links, the version number could be
stored there and/or the "segments" directory could be versioned. If we want to
support multiple format versions and clients, it might be better to split the
tables into data chunks (e.g. graph entries, commit id table, change id table),
and add per-chunk version/type tag. I choose the per-file version just because
it's simple and would be non-controversial.
As I'm going to introduce format change pretty soon, this patch doesn't
implement data migration. The existing index files will be deleted and new
files will be created from scratch.
Planned index format changes include:
1. remove unused "flags" field
2. inline commit parents up to two
3. add sorted change ids table
This is #3002 with tests rerun to account for changes
to `strsim`, as @thoughtpolice noticed in
https://github.com/martinvonz/jj/pull/3002#issuecomment-1936763101
The string similarity changes include an example that
seems better and one that seems worse. Decreasing
the threshold definitely makes things worse.
I was a bit surprised to learn (or be reminded?) that checking out
symlinks on Windows leads to a panic. This patch fixes the crash by
materializing symlinks from the repo as regular files. It also updates
the snapshotting code so we preserve the symlink-ness of a path. The
user can update the symlink in the repo by updating the regular file
in the working copy. This seems to match Git's behavior on Windows
when symlinks are disabled.
this greatly speeds up the time to run all tests, at the cost of slightly larger recompile times for individual tests.
this unfortunately adds the requirement that all tests are listed in `runner.rs` for the crate.
to avoid forgetting, i've added a new test that ensures the directory is in sync with the file.
## benchmarks
before this change, recompiling all tests took 32-50 seconds and running a single test took 3.5 seconds:
```
; hyperfine 'touch lib/src/lib.rs && cargo t --test test_working_copy'
Time (mean ± σ): 3.543 s ± 0.168 s [User: 2.597 s, System: 1.262 s]
Range (min … max): 3.400 s … 3.847 s 10 runs
```
after this change, recompiling all tests take 4 seconds:
```
; hyperfine 'touch lib/src/lib.rs ; cargo t --test runner --no-run'
Time (mean ± σ): 4.055 s ± 0.123 s [User: 3.591 s, System: 1.593 s]
Range (min … max): 3.804 s … 4.159 s 10 runs
```
and running a single test takes about the same:
```
; hyperfine 'touch lib/src/lib.rs && cargo t --test runner -- test_working_copy'
Time (mean ± σ): 4.129 s ± 0.120 s [User: 3.636 s, System: 1.593 s]
Range (min … max): 3.933 s … 4.346 s 10 runs
```
about 1.4 seconds of that is the time for the runner, of which .4 is the time for the linker. so
there may be room for further improving the times.
Our virtual file system at Google (CitC) would like to know the commit
so it can scan backwards and find the closest mainline tree based on
it. Since we always record an operation id (which resolves to a
working-copy commit) when we write the working-copy state, it doesn't
seem like a restriction to require a commit.