ok/jj
1
0
Fork 0
forked from mirrors/jj
Commit graph

220 commits

Author SHA1 Message Date
Martin von Zweigbergk
5641ef9a42 working_copy: don't send unchanged file states over channel
This doesn't seem to make any difference right now, but it will if we
write the state file when there are mtime-only changes, which we
currently don't do.
2023-08-22 14:45:52 -07:00
Benjamin Saunders
6c4b8a7383 settings: support human-readable byte sizes for max-new-file-size 2023-08-17 19:29:38 -07:00
Ben Saunders
351e7feef5 working_copy: don't snapshot new files larger than 1MiB by default 2023-08-17 19:29:38 -07:00
Martin von Zweigbergk
7ad2270c05 working_copy: pass Merge, not ConflictId, to write_conflict()
This is another small step towards making this code work with
tree-level conflicts.
2023-08-16 22:59:12 -07:00
Martin von Zweigbergk
1571541214 working_copy: combine blocks for updating added/modified paths
There's a lot of duplication between the blocks of code for updating
modified and added paths. This commit combines them.
2023-08-16 22:59:12 -07:00
Martin von Zweigbergk
01a6578ada working_copy: move up special case for exec-bit-only change
This is also just to make the next change simpler.
2023-08-16 22:59:12 -07:00
Martin von Zweigbergk
8ded5ae03b working_copy: convert Diff into Options for matching
This just a little refactoring to make the next step of sharing code
between `Modified` and `Added` simpler.
2023-08-16 22:59:12 -07:00
Martin von Zweigbergk
5b8c1e013f working_copy: add a helper for getting the current tree
The code for getting the current tree object was repeated a few times
over. I'm going to soon make it return a `MergedTree` and I don't want
to repeat that code (it's more complicated than the current code).
2023-08-16 22:59:12 -07:00
Martin von Zweigbergk
9138bb5517 working_copy: use MergedTree for current tree when snapshotting
We now have all the pieces in place to read the current tree as a
`MergedTree` when snapshotting the working copy. For now, it's still
always a legacy tree. We'll need to update the working copy state file
to support storing multiple trees before we can create a `MergedTree`
with multiple sides here.
2023-08-15 07:56:55 -07:00
Martin von Zweigbergk
c126e75b2b working_copy: make write_path_to_store() work with merged values
For tree-level conflicts, we're going to be getting
`Merge<Option<TreeValue>>` from the current tree and produce a new
such value if contents changes on disk. This commit gets us a little
closer to that by passing in a value of that type into
`write_path_to_store()`.

This seems to have a small but measurable performance
impact. Snapshotting the working copy in the git repo with all files
`touch`ed went from 2.36 s to 2.43 s (3%). I think that's okay,
especially since most files' mtimes rarely change, and we only pay the
price when it has.
2023-08-15 07:56:55 -07:00
Martin von Zweigbergk
3f97a6da78 working_copy: avoid adding unchanged values to tree builder
If the value at a path hasn't changed, there's no need to send it over
the channel and have the receiver add it to `TreeBuilder`. I couldn't
measure any performance impact.

Now we should no longer send `TreeValue::Conflict` variants over the
tree entry channel.
2023-08-14 23:32:52 -07:00
Martin von Zweigbergk
eacdad3ebd working_copy: move writing of conflicts to receiver side of channel
When writing tree-level conflicts, we're going to be writing multiple
tree (maybe using some new `MergedTreeBuilder`), so we'll need the
full `Merge<Option<TreeValue>>` object. This gets us closer to that by
sending such objects over the channel and having the receiver write
the conflict object.

Note that we still sometimes send `TreeValue::Conflict` variants over
the channel. That only happens if they're unchanged.
2023-08-14 23:32:52 -07:00
Martin von Zweigbergk
03f00bbf30 working_copy: return Merge<Option<TreeValue>> over channel
When writing tree-level conflicts, we won't pass `TreeValue::Conflict`
over the `tree_entries` channel. Instead, we're going to pass possibly
unresolved `Merge<Option<TreeValue>>` instances. This commit prepares
for that by changing the type even though we'll only pass
`Merge::normal()` over the channel at this point.

I did this partly to see what the performance impact is. I tested that
by touching all files in the git.git repo to force the trees (and
files) to be rewritten. There was no measurable impact at all
(best-of-10 time was 2.44 s before and 2.40 s after, but I assume that
was a fluke).
2023-08-14 23:32:52 -07:00
Martin von Zweigbergk
6c5d6d7e39 working_copy: delete duplicate comment
I copied a comment that I should have just moved in 37a770e8b4.
2023-08-14 23:32:52 -07:00
Martin von Zweigbergk
4eadb06251 working_copy: propagate errors from writing conflict parts to store 2023-08-14 23:32:52 -07:00
Martin von Zweigbergk
f1b817e8ca cleanup: fix warnings from nightly clippy 2023-08-14 22:11:56 -07:00
Martin von Zweigbergk
e414f3b73c cleanup: use fs:read() instead of File::open().read_to_end() 2023-08-13 14:04:59 +00:00
Martin von Zweigbergk
f9e0feaaf8 working_copy: return early from write_path_to_store() for non-files
Almost the entire method deals with `FileType::Normal`, so we can
reduce indentation and repeated matching on the file type by doing it
early and returning in the non-normal-file cases.
2023-08-13 01:00:31 +00:00
Martin von Zweigbergk
23f54b8151 working_copy: propagate errors when reading conflicted file 2023-08-13 01:00:31 +00:00
Martin von Zweigbergk
33a93b6d2d working_copy: reduce scope of a content variable
This also avoids reading non-file conflict from disk.
2023-08-13 01:00:31 +00:00
Martin von Zweigbergk
585c212617 working_copy: reduce scope of an executable variable 2023-08-13 01:00:31 +00:00
Martin von Zweigbergk
2102de94b0 working_copy: inline write_conflict_to_store()
For tree-level conflicts, we're eventually not going to have
`ConflictId`. We'd want to make `write_conflict_to_store()` take a
`Merge<Option<TreeValue>>` and return an updated such value. That
would leave very little logic in the function, so let's just inline it
instead.
2023-08-13 01:00:31 +00:00
Martin von Zweigbergk
4c46398b1c conflicts: make update_from_content() write resolved content to store
`update_from_content()` already writes file content for each term of
an unresolved merge, so it seems consistent for it to also write the
file content for resolved merges. I think this should simplify further
refactoring for tree-level conflicts and for preserving the executable
bit.
2023-08-11 23:59:44 +00:00
Martin von Zweigbergk
0b85f06e3d conflicts: make update_from_content() work with only FileIds
Since `update_from_contents()` only works with file contents and not
the executable or other kinds of paths, I think it makes more sense
for it to deal with `FileId`s instead of `TreeValue`s.
2023-08-11 23:59:44 +00:00
Martin von Zweigbergk
a995c66635 merge: move some methods back to conflicts as free functions
I think I moved way too many functions onto `Merge<Option<TreeValue>>`
in 82883e648d. This effectively reverts almost all of that
commit. The `Merge<T>` type is simple container and it seems like it
should be at fairly low level in the dependency graph. By moving
functions off of it, we can get rid of the back-depdencies from the
`merge` module to the `conflict` module that I introduced when I moved
`Merge` to the `merge` module. I'm thinking the `conflict` module can
focus on materialized conflicts.
2023-08-11 21:11:25 +00:00
Martin von Zweigbergk
abc7312dbc working_copy: avoid an unused variable on Windows 2023-08-11 01:14:52 +00:00
Martin von Zweigbergk
14ddd17673 working_copy: add debug assertion that tree and file states match
Perhaps the most important invariant in `.jj/working_copy/tree_state`
is that its set of files in it matches the files in its tree. In
particular, if a file that exists in the tree doesn't exist in the
file state and doesn't exist on disk either, we won't notice that it's
gone, and we will therefore not delete it from the tree on future
rounds of snapshotting either.
2023-08-06 22:17:18 +00:00
Martin von Zweigbergk
6cce5e758b working_copy: reduce scope of some variables
With the recent refactorings, we don't need the `tree_builder` and
`deleted_files` until a bit later.
2023-08-06 22:17:18 +00:00
Martin von Zweigbergk
16d00581f6 working_copy: add trace scope to tree-writing call
Writing the tree can probably take a bit of time when the working copy
has changed.
2023-08-06 22:17:18 +00:00
Martin von Zweigbergk
d06f51a88c working_copy: split up tracing scope a bit
Now that we process the outputs from the file system traversal by
reading from channels, we can separate the processing from the file
system traversal. When the working copy is unchanged, processing tree
entries and deleted files takes practically no time, but processing
file states and present files takes significant time.
2023-08-06 22:17:18 +00:00
Martin von Zweigbergk
b27b686b4e working_copy: rename deleted_files_tx to present_files_tx
We use the chanell to report the files that exist, so
`deleted_files_tx` seems confusing.
2023-08-06 22:17:18 +00:00
Waleed Khan
e1c194ce67 working_copy: rename WorkItem -> DirectoryToVisit 2023-08-03 19:09:59 +00:00
Waleed Khan
84f807d222 working_copy: traverse filesystem in parallel
This improves `jj status` time by a factor of ~2x on my machine (M1 Macbook Pro 2021 16-inch, uses an SSD):

```sh
$ hyperfine --parameter-list hash before,after --parameter-list repo nixpkgs,gecko-dev --setup 'git checkout {hash} && cargo build --profile release-with-debug' --warmup 3 './target/release-with-debug/jj -R ../{repo} st'
Benchmark 1: ./target/release-with-debug/jj -R ../nixpkgs st (hash = before)
  Time (mean ± σ):      1.640 s ±  0.019 s    [User: 0.580 s, System: 1.044 s]
  Range (min … max):    1.621 s …  1.673 s    10 runs

Benchmark 2: ./target/release-with-debug/jj -R ../nixpkgs st (hash = after)
  Time (mean ± σ):     760.0 ms ±   5.4 ms    [User: 812.9 ms, System: 2214.6 ms]
  Range (min … max):   751.4 ms … 768.7 ms    10 runs

Benchmark 3: ./target/release-with-debug/jj -R ../gecko-dev st (hash = before)
  Time (mean ± σ):     11.403 s ±  0.648 s    [User: 4.546 s, System: 5.932 s]
  Range (min … max):   10.553 s … 12.718 s    10 runs

Benchmark 4: ./target/release-with-debug/jj -R ../gecko-dev st (hash = after)
  Time (mean ± σ):      5.974 s ±  0.028 s    [User: 5.387 s, System: 11.959 s]
  Range (min … max):    5.937 s …  6.024 s    10 runs

$ hyperfine --parameter-list repo nixpkgs,gecko-dev --warmup 3 'git -C ../{repo} status'
Benchmark 1: git -C ../nixpkgs status
  Time (mean ± σ):     865.4 ms ±   8.4 ms    [User: 119.4 ms, System: 1401.2 ms]
  Range (min … max):   852.8 ms … 879.1 ms    10 runs

Benchmark 2: git -C ../gecko-dev status
  Time (mean ± σ):      2.892 s ±  0.029 s    [User: 0.458 s, System: 14.244 s]
  Range (min … max):    2.837 s …  2.934 s    10 runs
```

Conclusions:

- ~2x improvement from previous `jj status` time.
- Slightly faster than Git on nixpkgs.
- Still 2x slower than Git on gecko-dev, not sure why.

For reference, Git's default number of threads is defined in the `online_cpus` function: ee48e70a82/thread-utils.c (L21-L66). We are using whatever the Rayon default is.
2023-08-03 18:20:49 +00:00
Waleed Khan
326be7c91e working_copy: send updates via channel
In preparation of traversing the filesystem in parallel, send updates via `channel`.

An alternative is to modify shared mutable state, e.g. put `self.file_states` behind a mutex or use a concurrent hash-map. This risks leaving the `TreeState` in an invalid state if an error occurs, and makes invariants harder to reason about.

Using a channel introduces a small performance regression. (I didn't try out the concurrent hash-map approach.)

```sh
$ hyperfine --parameter-list hash before,after --setup 'git checkout {hash} && cargo build --profile release-with-debug' --warmup 3 './target/release-with-debug/jj -R ../nixpkgs st'
Benchmark 1: ./target/release-with-debug/jj -R ../nixpkgs st (hash = before)
  Time (mean ± σ):      1.533 s ±  0.013 s    [User: 0.587 s, System: 0.926 s]
  Range (min … max):    1.510 s …  1.559 s    10 runs

Benchmark 2: ./target/release-with-debug/jj -R ../nixpkgs st (hash = after)
  Time (mean ± σ):      1.563 s ±  0.021 s    [User: 0.607 s, System: 0.936 s]
  Range (min … max):    1.518 s …  1.595 s    10 runs

Summary
  ./target/release-with-debug/jj -R ../nixpkgs st (hash = before) ran
    1.02 ± 0.02 times faster than ./target/release-with-debug/jj -R ../nixpkgs st (hash = after)
```
2023-08-03 17:56:05 +00:00
Waleed Khan
174704d752 working_copy: extract visit_directory function for snapshotting 2023-08-03 17:40:18 +00:00
Waleed Khan
515fb02049 working_copy: extract WorkItem to top-level struct 2023-08-03 09:49:22 -07:00
Yuya Nishihara
d17ef14956 merge_tools: extract 2-way diff checkout helper
The directory prefix is renamed to "jj-diff-" as I'm going to use it for
"jj diff --tool <external-diff-generator>".
2023-08-03 13:53:37 +09:00
Martin von Zweigbergk
48b1a1c533 working_copy: in ignored directories, visit only already tracked paths
`.gitignores` in ignored directories should be ignored. Before this
commit, we would visit ignored directories like any others if there
were any ignored paths in them.

I've done a lot of preparation for this commit, but There's still a
bit of duplication between the new code and the existing code. I don't
mind improving it if anyone has suggestions. Otherwise I might end up
doing that when I get back to working on snapshotting tree-level
conflicts soon.

This fixes #1785.
2023-08-01 06:31:52 +00:00
Martin von Zweigbergk
bcba1c6682 working_copy: rename sub_path to path
The `sub_path` is created by joining `dir` to a basename. I think
calling it just `path` is clear, especially since its the main path
involved in each iteration of the loop.
2023-08-01 06:31:52 +00:00
Martin von Zweigbergk
0dc5d967ae working_copy: move a duplicate statement out of match block 2023-08-01 06:31:52 +00:00
Martin von Zweigbergk
b48b3780c8 working_copy: replace FileStateUpdate by Option
The `FileStateUpdate` enum now looks very similar to `Option`, so
let's just use that. I also renamed `get_updated_file_state()` to
`get_updated_tree_value()` since it returns a `TreeValue`.
2023-08-01 06:31:52 +00:00
Martin von Zweigbergk
035d4bbbae working_copy: remove file state for deleted files in only one place
We currently remove the file state for deleted files after walking the
working copy and noticing that the file is not there. However, in the
case of files that have been replaced by special files like Unix
sockets, we delete the file state inside the loop. Let's simplify a
tiny bit by not doing that.
2023-08-01 06:31:52 +00:00
Martin von Zweigbergk
4fa2a27f38 working_copy: treat a missing file state as dirty
If we don't have a recorded state for a file, we assume that it's new,
so we add it to the tree as the type it appears on disk. That means we
won't check if it exists as a conflict in the current tree. As another
step towards making the file state just a cache, let's instead treat
this case as a dirty file, so we look up the current value from the
tree. That means that adding files will be a tiny bit slower, but I
doubt it will be noticeable (we need to read the file from disk and
write it to the backend anyway).
2023-07-31 05:59:30 +00:00
Martin von Zweigbergk
cb8ff84cc8 working_copy: don't pass FileState through get_updated_file_state()
Since the caller now has the `FileState`, there's no need to pass it
in by value only to get it back in the return value.
2023-07-31 05:59:30 +00:00
Martin von Zweigbergk
01feb40fbb working_copy: handle deleted files outside get_updated_file_state()
This is simpler, and it will enable further simplfications.
2023-07-31 05:59:30 +00:00
Martin von Zweigbergk
5cc2c91453 working_copy: pass in PathBuf and Metadata to get_updated_file_state()
This will let us call the function even if we don't have a `DirEntry`.
2023-07-31 05:59:30 +00:00
Martin von Zweigbergk
37d9aae894 working_copy: handle ignored files outside of get_updated_file_state()
I want to replace the `DirEntry` argument to
`get_updated_file_state()` by a `PathBuf` and a `Metadata`. To avoid
always reading the metadata, we need to check for ignored files
outside of `get_updated_file_state()`. I also think that gives the
call site a nice symmetry in how we use the `git_ignore` for
directories (`.matches_all_files_in()`)) and files
(`.matches_file()`).
2023-07-31 05:59:30 +00:00
Martin von Zweigbergk
be8d471e76 working_copy: preserve executable-ness from tree on Windows
This removes another little bit (literally) of dependency on the
cached file state by reading the old executable bit from the current
tree instead. That helps make it possible to discard the file states
without affecting the resulting snapshot, as we may want to do with
Watchman.
2023-07-31 05:48:32 +00:00
Martin von Zweigbergk
37a770e8b4 working_copy: make write_conflict_to_store() also handle conflicts
With this change, `write_path_to_store()` contains all the logic for
reading a file from disk and writing it to a `TreeBuilder`, making the
code for added and modified files more similar.
2023-07-31 05:48:32 +00:00
Martin von Zweigbergk
beb997e85a watchman: don't even add non-watchman files to set of deleted files
It's faster to add only files matched by the Watchman matcher to the
set of deleted files than to add all files and then removed files not
matched. This speeds up `jj diff` with Watchman in the Linux repo from
~530 ms to ~460 ms.
2023-07-28 12:12:09 -07:00