This is simpler than carefully tracking mutation through old/new git refs and
merged local branches. There are two subtle behavior changes:
a. unimported git refs excluded by git_ref_filter() are not pinned.
b. unexported branches are pinned (so a fetched deletion doesn't abandon the
branch if it's referenced by another branch).
I think (a) is okay (and even more correct) since such refs aren't known to jj
yet. (b) is desired.
This appears to be a bit slower (1.170s -> 1.211s with "log -R git -r 'tags()'
-Tcommit_id --ignore-working-copy"), but it seemed better than keeping a
growing cache.
This improves `jj status` time by a factor of ~2x on my machine (M1 MacBook Pro 2021 16-inch, with an SSD):
```sh
$ hyperfine --parameter-list hash before,after --parameter-list repo nixpkgs,gecko-dev --setup 'git checkout {hash} && cargo build --profile release-with-debug' --warmup 3 './target/release-with-debug/jj -R ../{repo} st'
Benchmark 1: ./target/release-with-debug/jj -R ../nixpkgs st (hash = before)
Time (mean ± σ): 1.640 s ± 0.019 s [User: 0.580 s, System: 1.044 s]
Range (min … max): 1.621 s … 1.673 s 10 runs
Benchmark 2: ./target/release-with-debug/jj -R ../nixpkgs st (hash = after)
Time (mean ± σ): 760.0 ms ± 5.4 ms [User: 812.9 ms, System: 2214.6 ms]
Range (min … max): 751.4 ms … 768.7 ms 10 runs
Benchmark 3: ./target/release-with-debug/jj -R ../gecko-dev st (hash = before)
Time (mean ± σ): 11.403 s ± 0.648 s [User: 4.546 s, System: 5.932 s]
Range (min … max): 10.553 s … 12.718 s 10 runs
Benchmark 4: ./target/release-with-debug/jj -R ../gecko-dev st (hash = after)
Time (mean ± σ): 5.974 s ± 0.028 s [User: 5.387 s, System: 11.959 s]
Range (min … max): 5.937 s … 6.024 s 10 runs
$ hyperfine --parameter-list repo nixpkgs,gecko-dev --warmup 3 'git -C ../{repo} status'
Benchmark 1: git -C ../nixpkgs status
Time (mean ± σ): 865.4 ms ± 8.4 ms [User: 119.4 ms, System: 1401.2 ms]
Range (min … max): 852.8 ms … 879.1 ms 10 runs
Benchmark 2: git -C ../gecko-dev status
Time (mean ± σ): 2.892 s ± 0.029 s [User: 0.458 s, System: 14.244 s]
Range (min … max): 2.837 s … 2.934 s 10 runs
```
Conclusions:
- ~2x improvement over the previous `jj status` time.
- Slightly faster than Git on nixpkgs.
- Still 2x slower than Git on gecko-dev, not sure why.
For reference, Git's default number of threads is defined in the `online_cpus` function: ee48e70a82/thread-utils.c (L21-L66). We are using whatever the Rayon default is.
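If we ever want to make that explicit (or cap it), Rayon's global pool can be configured up front. A minimal sketch, assuming the `rayon` crate as a dependency; this is a suggestion, not something the code currently does:
```rust
use std::thread;

fn main() {
    // Rayon's default pool already uses roughly the number of logical CPUs,
    // similar in spirit to Git's online_cpus(). To pin it explicitly:
    let n = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    rayon::ThreadPoolBuilder::new()
        .num_threads(n)
        .build_global()
        .expect("global Rayon pool already initialized");
    println!("using {n} threads");
}
```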
In preparation for traversing the filesystem in parallel, send updates via `channel`.
An alternative is to modify shared mutable state, e.g. put `self.file_states` behind a mutex or use a concurrent hash-map. This risks leaving the `TreeState` in an invalid state if an error occurs, and makes invariants harder to reason about.
Using a channel introduces a small performance regression. (I didn't try out the concurrent hash-map approach.)
```sh
$ hyperfine --parameter-list hash before,after --setup 'git checkout {hash} && cargo build --profile release-with-debug' --warmup 3 './target/release-with-debug/jj -R ../nixpkgs st'
Benchmark 1: ./target/release-with-debug/jj -R ../nixpkgs st (hash = before)
Time (mean ± σ): 1.533 s ± 0.013 s [User: 0.587 s, System: 0.926 s]
Range (min … max): 1.510 s … 1.559 s 10 runs
Benchmark 2: ./target/release-with-debug/jj -R ../nixpkgs st (hash = after)
Time (mean ± σ): 1.563 s ± 0.021 s [User: 0.607 s, System: 0.936 s]
Range (min … max): 1.518 s … 1.595 s 10 runs
Summary
./target/release-with-debug/jj -R ../nixpkgs st (hash = before) ran
1.02 ± 0.02 times faster than ./target/release-with-debug/jj -R ../nixpkgs st (hash = after)
```
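For reference, the shape of the channel-based approach looks roughly like this (a self-contained sketch using `std::sync::mpsc` and scoped threads; the types and paths are stand-ins, not jj's actual `TreeState` code):
```rust
use std::collections::BTreeMap;
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

// Stand-in for the per-file data jj records; not the real FileState.
#[derive(Debug)]
struct FileState {
    size: u64,
}

fn main() {
    let dirs: Vec<PathBuf> = vec!["a".into(), "b".into(), "c".into()];
    let (tx, rx) = mpsc::channel::<(PathBuf, FileState)>();

    thread::scope(|s| {
        for dir in &dirs {
            let tx = tx.clone();
            // Each worker walks one directory and sends updates over the
            // channel instead of mutating shared state behind a mutex.
            s.spawn(move || {
                let path = dir.join("file.txt");
                tx.send((path, FileState { size: 42 })).unwrap();
            });
        }
        drop(tx); // close the channel so the receiver loop below terminates

        // A single receiver applies all updates, so `file_states` is never
        // observed in a half-updated state and errors stay easy to handle.
        let mut file_states: BTreeMap<PathBuf, FileState> = BTreeMap::new();
        for (path, state) in rx {
            file_states.insert(path, state);
        }
        println!("{} files snapshotted", file_states.len());
    });
}
```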
`.gitignores` in ignored directories should be ignored. Before this
commit, we would visit ignored directories like any others if there
were any ignored paths in them.
I've done a lot of preparation for this commit, but there's still a
bit of duplication between the new code and the existing code. I don't
mind improving it if anyone has suggestions. Otherwise I might end up
doing that when I get back to working on snapshotting tree-level
conflicts soon.
This fixes #1785.
The `sub_path` is created by joining `dir` to a basename. I think
calling it just `path` is clear, especially since it's the main path
involved in each iteration of the loop.
We currently use the same code path for handling changes to tracked
paths inside ignored directories as for those outside them, but I'm
about to change that.
I also updated the assertion in the test to compare all entries
instead of just the tree id, so it's easier to spot errors if it
fails.
The `FileStateUpdate` enum now looks very similar to `Option`, so
let's just use that. I also renamed `get_updated_file_state()` to
`get_updated_tree_value()` since it returns a `TreeValue`.
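As a toy illustration of that shape (the types and variant names here are stand-ins, not jj's real `FileStateUpdate`/`TreeValue`):
```rust
// Stand-in type; jj's real TreeValue is richer than this.
#[derive(Debug, PartialEq)]
struct TreeValue(String);

// Before: a dedicated enum carrying exactly the same information as Option
// (variant names are illustrative).
#[allow(dead_code)]
enum FileStateUpdate {
    Changed(TreeValue),
    Unchanged,
}

// After: return Option<TreeValue> directly; None means "nothing to update".
fn get_updated_tree_value(on_disk: &str, in_tree: &str) -> Option<TreeValue> {
    if on_disk == in_tree {
        None
    } else {
        Some(TreeValue(on_disk.to_owned()))
    }
}

fn main() {
    assert_eq!(get_updated_tree_value("a", "a"), None);
    assert_eq!(
        get_updated_tree_value("b", "a"),
        Some(TreeValue("b".to_owned()))
    );
}
```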
We currently remove the file state for deleted files after walking the
working copy and noticing that the file is not there. However, in the
case of files that have been replaced by special files like Unix
sockets, we delete the file state inside the loop. Let's simplify a
tiny bit by not doing that.
If we don't have a recorded state for a file, we assume that it's new,
so we add it to the tree as the type it appears on disk. That means we
won't check if it exists as a conflict in the current tree. As another
step towards making the file state just a cache, let's instead treat
this case as a dirty file, so we look up the current value from the
tree. That means that adding files will be a tiny bit slower, but I
doubt it will be noticeable (we need to read the file from disk and
write it to the backend anyway).
I want to replace the `DirEntry` argument to
`get_updated_file_state()` by a `PathBuf` and a `Metadata`. To avoid
always reading the metadata, we need to check for ignored files
outside of `get_updated_file_state()`. I also think that gives the
call site a nice symmetry in how we use the `git_ignore` for
directories (`.matches_all_files_in()`) and files
(`.matches_file()`).
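The call-site symmetry I have in mind looks roughly like this (a self-contained sketch; `GitIgnore` and its two methods only borrow their names from the text above, and the matching logic is a trivial placeholder):
```rust
use std::fs;
use std::io;
use std::path::Path;

// Stand-in for jj's gitignore matcher; placeholder logic only.
struct GitIgnore;

impl GitIgnore {
    fn matches_all_files_in(&self, dir: &str) -> bool {
        dir.contains("target/")
    }
    fn matches_file(&self, path: &str) -> bool {
        path.ends_with(".tmp")
    }
}

fn visit(dir: &Path, git_ignore: &GitIgnore) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let name = path.to_string_lossy().into_owned();
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            // Symmetric ignore check for directories...
            if git_ignore.matches_all_files_in(&format!("{name}/")) {
                continue;
            }
            visit(&path, git_ignore)?;
        } else {
            // ...and for files, done *before* reading the metadata.
            // (jj also needs to handle ignored-but-tracked files; omitted here.)
            if git_ignore.matches_file(&name) {
                continue;
            }
            let metadata = entry.metadata()?;
            println!("{} ({} bytes)", name, metadata.len());
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    visit(Path::new("."), &GitIgnore)
}
```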
This removes another little bit (literally) of dependency on the
cached file state by reading the old executable bit from the current
tree instead. That helps make it possible to discard the file states
without affecting the resulting snapshot, as we may want to do with
Watchman.
With this change, `write_path_to_store()` contains all the logic for
reading a file from disk and writing it to a `TreeBuilder`, making the
code for added and modified files more similar.
The `--allow-large-revsets` flag we have on `jj rebase` and `jj new`
allows the user to do e.g. `jj rebase --allow-large-revsets -b
main.. -d main` to rebase all commits that are not in main onto
main. The reason we don't allow these revsets to resolve to multiple
commits by default is that we think users might specify multiple
commits by mistake. That's probably not much of a problem with `jj
rebase -b` (maybe we should always allow that to resolve to multiple
commits), but the user might want to know if `jj rebase -d @-`
resolves to multiple commits.
One problem with having a flag to allow multiple commits is that it
needs to be added to every command where we want to allow multiple
commits but default to one. Also, it should probably apply to each
revset argument those commands take. For example, even if the user
meant `-b main..` to resolve to multiple commits, they might not have
meant `-d main` to resolve to multiple commits (which it will in case
of a conflicted branch), so we might want separate
`--allow-large-revsets-in-destination` and
`--allow-large-revsets-in-source`, which gets quite cumbersome. It
seems better to have some syntax in the individual revsets for saying
that multiple commits are allowed.
One proposal I had was to use a `multiple()` revset function which
would have no effect in general but would be used as a marker if used
at the top level (e.g. `jj rebase -d 'multiple(@-)'`). After some
discussion on the PR adding that function (#1911), it seems that the
consensus is to instead use a prefix like `many:` or `all:`. That
avoids the problem with having a function that has no effect unless
it's used at the top level (`jj rebase -d 'multiple(x)|y'` would have
no effect).
Since we already have the `:` operator for DAG ranges, we need to
change it to make room for `many:`/`all:` syntax. This commit starts
that by allowing both `:` and `::`.
I have tried to update the documentation in this commit to either
mention both forms, or just the new and preferred `::` form. However,
it's useless to search for `:` in Rust code, so I'm sure I've missed
many instances. We'll have to address those as we notice them. I'll
let most tests use `:` until we deprecate it or delete it.
It seemed awkward that a merged PR was sometimes rendered as the first branch.
Instead of sorting edges in index order, let's build a HashSet only when
deduplication is needed.
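The idea, sketched with a generic edge type (not jj's actual graph code, and with a plain length check standing in for the real "only when needed" condition):
```rust
use std::collections::HashSet;
use std::hash::Hash;

// Keep edges in their original order; only pay for a HashSet when there are
// enough edges that duplicates are even possible.
fn dedup_edges<T: Eq + Hash + Clone>(edges: &mut Vec<T>) {
    if edges.len() <= 1 {
        return; // nothing to deduplicate, no HashSet allocated
    }
    let mut seen = HashSet::with_capacity(edges.len());
    edges.retain(|edge| seen.insert(edge.clone()));
}

fn main() {
    let mut edges = vec!["b", "a", "b", "c", "a"];
    dedup_edges(&mut edges);
    assert_eq!(edges, vec!["b", "a", "c"]); // original order preserved
}
```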
It's faster to add only files matched by the Watchman matcher to the
set of deleted files than to add all files and then remove the files
that aren't matched. This speeds up `jj diff` with Watchman in the
Linux repo from
~530 ms to ~460 ms.
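The difference is roughly between these two shapes (a sketch in which a closure stands in for the Watchman-backed matcher; not jj's actual types):
```rust
use std::collections::HashSet;
use std::path::PathBuf;

// Slower shape: insert every tracked file, then drop the ones the matcher
// doesn't cover.
fn candidates_insert_then_remove(
    tracked: &[PathBuf],
    matches: &dyn Fn(&PathBuf) -> bool,
) -> HashSet<PathBuf> {
    let mut set: HashSet<PathBuf> = tracked.iter().cloned().collect();
    set.retain(|path| matches(path));
    set
}

// Faster shape: only ever insert the files the matcher actually selects.
fn candidates_filtered(
    tracked: &[PathBuf],
    matches: &dyn Fn(&PathBuf) -> bool,
) -> HashSet<PathBuf> {
    tracked.iter().filter(|p| matches(*p)).cloned().collect()
}

fn main() {
    let tracked: Vec<PathBuf> = vec!["src/lib.rs".into(), "README.md".into()];
    let matches = |p: &PathBuf| p.starts_with("src");
    assert_eq!(
        candidates_insert_then_remove(&tracked, &matches),
        candidates_filtered(&tracked, &matches)
    );
}
```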
The intention in this and some future commits is to have `update_file_state` accept `&self` instead of `&mut self` to make clear what data is updated by `update_file_state` and to ensure transactional safety of the `TreeState` contents.
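The rough direction, illustrated with stand-in types (not jj's real `TreeState`/`FileState`):
```rust
use std::collections::BTreeMap;

// Stand-in types; jj's real TreeState and FileState are richer than this.
#[derive(Clone, Debug, PartialEq)]
struct FileState {
    mtime: u64,
}

struct TreeState {
    file_states: BTreeMap<String, FileState>,
}

impl TreeState {
    // Takes &self: it computes what *would* change but mutates nothing, so an
    // error partway through a snapshot can't leave file_states half-updated.
    fn updated_file_state(&self, path: &str, disk_mtime: u64) -> Option<FileState> {
        match self.file_states.get(path) {
            Some(old) if old.mtime == disk_mtime => None,
            _ => Some(FileState { mtime: disk_mtime }),
        }
    }

    // All mutation happens in one place, after the walk has succeeded.
    fn apply_updates(&mut self, updates: impl IntoIterator<Item = (String, FileState)>) {
        self.file_states.extend(updates);
    }
}

fn main() {
    let mut tree_state = TreeState {
        file_states: BTreeMap::new(),
    };
    if let Some(state) = tree_state.updated_file_state("src/lib.rs", 123) {
        tree_state.apply_updates([("src/lib.rs".to_owned(), state)]);
    }
    assert_eq!(tree_state.file_states.len(), 1);
}
```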
It's currently a bit complicated to snapshot the working copy and
there's a lot of duplication in tests. This commit introduces a
function to simplify it. I made the function snapshot the working copy
and save the updated state. Some of the tests I changed previously
discarded the changes instead of saving them, but I think they all did
so because it was simpler. I left a few call sites unchanged because
they make concurrent changes.
This is a breaking change. An old jj binary will panic if it sees a view
saved by a new jj. Alternatively, we could store both new and legacy data
for backward compatibility.
This also lets us compare the resulting tree because the working copy
now exactly matches the tree (it used to be that the `.gitignore` file
wasn't initially snapshotted).