ok/jj - ok.software

Author	SHA1	Message	Date
Martin von Zweigbergk	5641ef9a42	working_copy: don't send unchanged file states over channel This doesn't seem to make any difference right now, but it will if we write the state file when there are mtime-only changes, which we currently don't do.	2023-08-22 14:45:52 -07:00
Benjamin Saunders	6c4b8a7383	settings: support human-readable byte sizes for max-new-file-size	2023-08-17 19:29:38 -07:00
Ben Saunders	351e7feef5	working_copy: don't snapshot new files larger than 1MiB by default	2023-08-17 19:29:38 -07:00
Martin von Zweigbergk	7ad2270c05	working_copy: pass `Merge`, not `ConflictId`, to `write_conflict()` This is another small step towards making this code work with tree-level conflicts.	2023-08-16 22:59:12 -07:00
Martin von Zweigbergk	1571541214	working_copy: combine blocks for updating added/modified paths There's a lot of duplication between the blocks of code for updating modified and added paths. This commit combines them.	2023-08-16 22:59:12 -07:00
Martin von Zweigbergk	01a6578ada	working_copy: move up special case for exec-bit-only change This is also just to make the next change simpler.	2023-08-16 22:59:12 -07:00
Martin von Zweigbergk	8ded5ae03b	working_copy: convert `Diff` into `Options` for matching This is just a little refactoring to make the next step of sharing code between `Modified` and `Added` simpler.	2023-08-16 22:59:12 -07:00
Martin von Zweigbergk	5b8c1e013f	working_copy: add a helper for getting the current tree The code for getting the current tree object was repeated a few times over. I'm going to soon make it return a `MergedTree` and I don't want to repeat that code (it's more complicated than the current code).	2023-08-16 22:59:12 -07:00
Martin von Zweigbergk	9138bb5517	working_copy: use `MergedTree` for current tree when snapshotting We now have all the pieces in place to read the current tree as a `MergedTree` when snapshotting the working copy. For now, it's still always a legacy tree. We'll need to update the working copy state file to support storing multiple trees before we can create a `MergedTree` with multiple sides here.	2023-08-15 07:56:55 -07:00
Martin von Zweigbergk	c126e75b2b	working_copy: make `write_path_to_store()` work with merged values For tree-level conflicts, we're going to be getting `Merge<Option<TreeValue>>` from the current tree and produce a new such value if the contents change on disk. This commit gets us a little closer to that by passing in a value of that type into `write_path_to_store()`. This seems to have a small but measurable performance impact. Snapshotting the working copy in the git repo with all files `touch`ed went from 2.36 s to 2.43 s (3%). I think that's okay, especially since most files' mtimes rarely change, and we only pay the price when one has.	2023-08-15 07:56:55 -07:00
Martin von Zweigbergk	3f97a6da78	working_copy: avoid adding unchanged values to tree builder If the value at a path hasn't changed, there's no need to send it over the channel and have the receiver add it to `TreeBuilder`. I couldn't measure any performance impact. Now we should no longer send `TreeValue::Conflict` variants over the tree entry channel.	2023-08-14 23:32:52 -07:00
Martin von Zweigbergk	eacdad3ebd	working_copy: move writing of conflicts to receiver side of channel When writing tree-level conflicts, we're going to be writing multiple trees (maybe using some new `MergedTreeBuilder`), so we'll need the full `Merge<Option<TreeValue>>` object. This gets us closer to that by sending such objects over the channel and having the receiver write the conflict object. Note that we still sometimes send `TreeValue::Conflict` variants over the channel. That only happens if they're unchanged.	2023-08-14 23:32:52 -07:00
Martin von Zweigbergk	03f00bbf30	working_copy: return `Merge<Option<TreeValue>>` over channel When writing tree-level conflicts, we won't pass `TreeValue::Conflict` over the `tree_entries` channel. Instead, we're going to pass possibly unresolved `Merge<Option<TreeValue>>` instances. This commit prepares for that by changing the type even though we'll only pass `Merge::normal()` over the channel at this point. I did this partly to see what the performance impact is. I tested that by touching all files in the git.git repo to force the trees (and files) to be rewritten. There was no measurable impact at all (best-of-10 time was 2.44 s before and 2.40 s after, but I assume that was a fluke).	2023-08-14 23:32:52 -07:00
Martin von Zweigbergk	6c5d6d7e39	working_copy: delete duplicate comment I copied a comment that I should have just moved in `37a770e8b4`.	2023-08-14 23:32:52 -07:00
Martin von Zweigbergk	4eadb06251	working_copy: propagate errors from writing conflict parts to store	2023-08-14 23:32:52 -07:00
Martin von Zweigbergk	f1b817e8ca	cleanup: fix warnings from nightly clippy	2023-08-14 22:11:56 -07:00
Martin von Zweigbergk	e414f3b73c	cleanup: use `fs::read()` instead of `File::open().read_to_end()`	2023-08-13 14:04:59 +00:00
Martin von Zweigbergk	f9e0feaaf8	working_copy: return early from `write_path_to_store()` for non-files Almost the entire method deals with `FileType::Normal`, so we can reduce indentation and repeated matching on the file type by doing it early and returning in the non-normal-file cases.	2023-08-13 01:00:31 +00:00
Martin von Zweigbergk	23f54b8151	working_copy: propagate errors when reading conflicted file	2023-08-13 01:00:31 +00:00
Martin von Zweigbergk	33a93b6d2d	working_copy: reduce scope of a `content` variable This also avoids reading non-file conflict from disk.	2023-08-13 01:00:31 +00:00
Martin von Zweigbergk	585c212617	working_copy: reduce scope of an `executable` variable	2023-08-13 01:00:31 +00:00
Martin von Zweigbergk	2102de94b0	working_copy: inline `write_conflict_to_store()` For tree-level conflicts, we're eventually not going to have `ConflictId`. We'd want to make `write_conflict_to_store()` take a `Merge<Option<TreeValue>>` and return an updated such value. That would leave very little logic in the function, so let's just inline it instead.	2023-08-13 01:00:31 +00:00
Martin von Zweigbergk	4c46398b1c	conflicts: make `update_from_content()` write resolved content to store `update_from_content()` already writes file content for each term of an unresolved merge, so it seems consistent for it to also write the file content for resolved merges. I think this should simplify further refactoring for tree-level conflicts and for preserving the executable bit.	2023-08-11 23:59:44 +00:00
Martin von Zweigbergk	0b85f06e3d	conflicts: make `update_from_content()` work with only `FileId`s Since `update_from_content()` only works with file contents and not the executable bit or other kinds of paths, I think it makes more sense for it to deal with `FileId`s instead of `TreeValue`s.	2023-08-11 23:59:44 +00:00
Martin von Zweigbergk	a995c66635	merge: move some methods back to `conflicts` as free functions I think I moved way too many functions onto `Merge<Option<TreeValue>>` in `82883e648d`. This effectively reverts almost all of that commit. The `Merge<T>` type is a simple container and it seems like it should be at a fairly low level in the dependency graph. By moving functions off of it, we can get rid of the back-dependencies from the `merge` module to the `conflict` module that I introduced when I moved `Merge` to the `merge` module. I'm thinking the `conflict` module can focus on materialized conflicts.	2023-08-11 21:11:25 +00:00
Martin von Zweigbergk	abc7312dbc	working_copy: avoid an unused variable on Windows	2023-08-11 01:14:52 +00:00
Martin von Zweigbergk	14ddd17673	working_copy: add debug assertion that tree and file states match Perhaps the most important invariant in `.jj/working_copy/tree_state` is that the set of files in it matches the files in its tree. In particular, if a file that exists in the tree doesn't exist in the file state and doesn't exist on disk either, we won't notice that it's gone, and we will therefore not delete it from the tree on future rounds of snapshotting either.	2023-08-06 22:17:18 +00:00
Martin von Zweigbergk	6cce5e758b	working_copy: reduce scope of some variables With the recent refactorings, we don't need the `tree_builder` and `deleted_files` until a bit later.	2023-08-06 22:17:18 +00:00
Martin von Zweigbergk	16d00581f6	working_copy: add trace scope to tree-writing call Writing the tree can probably take a bit of time when the working copy has changed.	2023-08-06 22:17:18 +00:00
Martin von Zweigbergk	d06f51a88c	working_copy: split up tracing scope a bit Now that we process the outputs from the file system traversal by reading from channels, we can separate the processing from the file system traversal. When the working copy is unchanged, processing tree entries and deleted files takes practically no time, but processing file states and present files takes significant time.	2023-08-06 22:17:18 +00:00
Martin von Zweigbergk	b27b686b4e	working_copy: rename `deleted_files_tx` to `present_files_tx` We use the channel to report the files that exist, so `deleted_files_tx` seems confusing.	2023-08-06 22:17:18 +00:00
Waleed Khan	e1c194ce67	working_copy: rename `WorkItem` -> `DirectoryToVisit`	2023-08-03 19:09:59 +00:00
Waleed Khan	84f807d222	working_copy: traverse filesystem in parallel This improves `jj status` time by a factor of ~2x on my machine (M1 Macbook Pro 2021 16-inch, uses an SSD): ```sh $ hyperfine --parameter-list hash before,after --parameter-list repo nixpkgs,gecko-dev --setup 'git checkout {hash} && cargo build --profile release-with-debug' --warmup 3 './target/release-with-debug/jj -R ../{repo} st' Benchmark 1: ./target/release-with-debug/jj -R ../nixpkgs st (hash = before) Time (mean ± σ): 1.640 s ± 0.019 s [User: 0.580 s, System: 1.044 s] Range (min … max): 1.621 s … 1.673 s 10 runs Benchmark 2: ./target/release-with-debug/jj -R ../nixpkgs st (hash = after) Time (mean ± σ): 760.0 ms ± 5.4 ms [User: 812.9 ms, System: 2214.6 ms] Range (min … max): 751.4 ms … 768.7 ms 10 runs Benchmark 3: ./target/release-with-debug/jj -R ../gecko-dev st (hash = before) Time (mean ± σ): 11.403 s ± 0.648 s [User: 4.546 s, System: 5.932 s] Range (min … max): 10.553 s … 12.718 s 10 runs Benchmark 4: ./target/release-with-debug/jj -R ../gecko-dev st (hash = after) Time (mean ± σ): 5.974 s ± 0.028 s [User: 5.387 s, System: 11.959 s] Range (min … max): 5.937 s … 6.024 s 10 runs $ hyperfine --parameter-list repo nixpkgs,gecko-dev --warmup 3 'git -C ../{repo} status' Benchmark 1: git -C ../nixpkgs status Time (mean ± σ): 865.4 ms ± 8.4 ms [User: 119.4 ms, System: 1401.2 ms] Range (min … max): 852.8 ms … 879.1 ms 10 runs Benchmark 2: git -C ../gecko-dev status Time (mean ± σ): 2.892 s ± 0.029 s [User: 0.458 s, System: 14.244 s] Range (min … max): 2.837 s … 2.934 s 10 runs ``` Conclusions: - ~2x improvement from previous `jj status` time. - Slightly faster than Git on nixpkgs. - Still 2x slower than Git on gecko-dev, not sure why. For reference, Git's default number of threads is defined in the `online_cpus` function: `ee48e70a82/thread-utils.c (L21-L66)`. We are using whatever the Rayon default is.	2023-08-03 18:20:49 +00:00
Waleed Khan	326be7c91e	working_copy: send updates via `channel` In preparation of traversing the filesystem in parallel, send updates via `channel`. An alternative is to modify shared mutable state, e.g. put `self.file_states` behind a mutex or use a concurrent hash-map. This risks leaving the `TreeState` in an invalid state if an error occurs, and makes invariants harder to reason about. Using a channel introduces a small performance regression. (I didn't try out the concurrent hash-map approach.) ```sh $ hyperfine --parameter-list hash before,after --setup 'git checkout {hash} && cargo build --profile release-with-debug' --warmup 3 './target/release-with-debug/jj -R ../nixpkgs st' Benchmark 1: ./target/release-with-debug/jj -R ../nixpkgs st (hash = before) Time (mean ± σ): 1.533 s ± 0.013 s [User: 0.587 s, System: 0.926 s] Range (min … max): 1.510 s … 1.559 s 10 runs Benchmark 2: ./target/release-with-debug/jj -R ../nixpkgs st (hash = after) Time (mean ± σ): 1.563 s ± 0.021 s [User: 0.607 s, System: 0.936 s] Range (min … max): 1.518 s … 1.595 s 10 runs Summary ./target/release-with-debug/jj -R ../nixpkgs st (hash = before) ran 1.02 ± 0.02 times faster than ./target/release-with-debug/jj -R ../nixpkgs st (hash = after) ```	2023-08-03 17:56:05 +00:00
Waleed Khan	174704d752	working_copy: extract `visit_directory` function for snapshotting	2023-08-03 17:40:18 +00:00
Waleed Khan	515fb02049	working_copy: extract `WorkItem` to top-level `struct`	2023-08-03 09:49:22 -07:00
Yuya Nishihara	d17ef14956	merge_tools: extract 2-way diff checkout helper The directory prefix is renamed to "jj-diff-" as I'm going to use it for "jj diff --tool <external-diff-generator>".	2023-08-03 13:53:37 +09:00
Martin von Zweigbergk	48b1a1c533	working_copy: in ignored directories, visit only already tracked paths `.gitignores` in ignored directories should be ignored. Before this commit, we would visit ignored directories like any others if there were any ignored paths in them. I've done a lot of preparation for this commit, but there's still a bit of duplication between the new code and the existing code. I don't mind improving it if anyone has suggestions. Otherwise I might end up doing that when I get back to working on snapshotting tree-level conflicts soon. This fixes #1785.	2023-08-01 06:31:52 +00:00
Martin von Zweigbergk	bcba1c6682	working_copy: rename `sub_path` to `path` The `sub_path` is created by joining `dir` to a basename. I think calling it just `path` is clear, especially since it's the main path involved in each iteration of the loop.	2023-08-01 06:31:52 +00:00
Martin von Zweigbergk	0dc5d967ae	working_copy: move a duplicate statement out of `match` block	2023-08-01 06:31:52 +00:00
Martin von Zweigbergk	b48b3780c8	working_copy: replace `FileStateUpdate` by `Option` The `FileStateUpdate` enum now looks very similar to `Option`, so let's just use that. I also renamed `get_updated_file_state()` to `get_updated_tree_value()` since it returns a `TreeValue`.	2023-08-01 06:31:52 +00:00
Martin von Zweigbergk	035d4bbbae	working_copy: remove file state for deleted files in only one place We currently remove the file state for deleted files after walking the working copy and noticing that the file is not there. However, in the case of files that have been replaced by special files like Unix sockets, we delete the file state inside the loop. Let's simplify a tiny bit by not doing that.	2023-08-01 06:31:52 +00:00
Martin von Zweigbergk	4fa2a27f38	working_copy: treat a missing file state as dirty If we don't have a recorded state for a file, we assume that it's new, so we add it to the tree as the type it appears on disk. That means we won't check if it exists as a conflict in the current tree. As another step towards making the file state just a cache, let's instead treat this case as a dirty file, so we look up the current value from the tree. That means that adding files will be a tiny bit slower, but I doubt it will be noticeable (we need to read the file from disk and write it to the backend anyway).	2023-07-31 05:59:30 +00:00
Martin von Zweigbergk	cb8ff84cc8	working_copy: don't pass `FileState` through `get_updated_file_state()` Since the caller now has the `FileState`, there's no need to pass it in by value only to get it back in the return value.	2023-07-31 05:59:30 +00:00
Martin von Zweigbergk	01feb40fbb	working_copy: handle deleted files outside `get_updated_file_state()` This is simpler, and it will enable further simplifications.	2023-07-31 05:59:30 +00:00
Martin von Zweigbergk	5cc2c91453	working_copy: pass in PathBuf and Metadata to `get_updated_file_state()` This will let us call the function even if we don't have a `DirEntry`.	2023-07-31 05:59:30 +00:00
Martin von Zweigbergk	37d9aae894	working_copy: handle ignored files outside of `get_updated_file_state()` I want to replace the `DirEntry` argument to `get_updated_file_state()` by a `PathBuf` and a `Metadata`. To avoid always reading the metadata, we need to check for ignored files outside of `get_updated_file_state()`. I also think that gives the call site a nice symmetry in how we use the `git_ignore` for directories (`.matches_all_files_in()`) and files (`.matches_file()`).	2023-07-31 05:59:30 +00:00
Martin von Zweigbergk	be8d471e76	working_copy: preserve executable-ness from tree on Windows This removes another little bit (literally) of dependency on the cached file state by reading the old executable bit from the current tree instead. That helps make it possible to discard the file states without affecting the resulting snapshot, as we may want to do with Watchman.	2023-07-31 05:48:32 +00:00
Martin von Zweigbergk	37a770e8b4	working_copy: make `write_conflict_to_store()` also handle conflicts With this change, `write_path_to_store()` contains all the logic for reading a file from disk and writing it to a `TreeBuilder`, making the code for added and modified files more similar.	2023-07-31 05:48:32 +00:00
Martin von Zweigbergk	beb997e85a	watchman: don't even add non-watchman files to set of deleted files It's faster to add only files matched by the Watchman matcher to the set of deleted files than to add all files and then remove files not matched. This speeds up `jj diff` with Watchman in the Linux repo from ~530 ms to ~460 ms.	2023-07-28 12:12:09 -07:00
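
Commit `326be7c91e` above describes sending updates over a channel instead of mutating shared state, so that only the receiving side ever touches the collected file states. The following is a minimal sketch of that general pattern using only the standard library. It is not jj's code: `FileStateSketch` and `collect_file_states` are hypothetical names, the "directories" are in-memory lists rather than a real filesystem walk, and plain scoped threads stand in for the Rayon pool mentioned in `84f807d222`.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::channel;
use std::thread;

/// Hypothetical stand-in for a per-file state record (not jj's `FileState`).
#[derive(Debug, Clone, PartialEq)]
struct FileStateSketch {
    size: u64,
}

/// Visits each "directory" on its own thread. Workers only send
/// `(path, state)` pairs over the channel; the receiving side is the only
/// place where the resulting map is built.
fn collect_file_states(dirs: &[Vec<(String, u64)>]) -> BTreeMap<String, FileStateSketch> {
    let (tx, rx) = channel();
    thread::scope(|s| {
        for dir in dirs {
            // Each worker gets its own sender handle.
            let tx = tx.clone();
            s.spawn(move || {
                for (path, size) in dir {
                    tx.send((path.clone(), FileStateSketch { size: *size }))
                        .unwrap();
                }
            });
        }
        // Drop the original sender so the receiver stops once every worker
        // has finished and dropped its clone.
        drop(tx);
        rx.iter().collect()
    })
}
```

If a worker fails partway through, the map is simply never returned, which is one way to avoid leaving shared state half-updated, the concern the commit message raises about the mutex alternative.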

220 commits
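
Commits `351e7feef5` and `6c4b8a7383` in the list above introduce a 1MiB default limit for newly snapshotted files and human-readable byte sizes for `max-new-file-size`. As a rough illustration of what parsing such a value can look like, here is a small sketch. It is an assumption-laden example, not jj's implementation: the function names are hypothetical, and the accepted suffixes (`B`, `KiB`, `MiB`, `GiB`) are chosen for the example and may differ from what jj actually accepts.

```rust
/// Parses sizes such as "1024", "512KiB", or "1MiB" into a byte count.
/// Returns `None` for missing digits, unknown units, or overflow.
/// Hypothetical helper; the set of units is an assumption of this sketch.
fn parse_byte_size(input: &str) -> Option<u64> {
    let s = input.trim();
    // Split at the first character that is not an ASCII digit.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, unit) = s.split_at(split);
    let value: u64 = digits.parse().ok()?;
    let multiplier: u64 = match unit.trim() {
        "" | "B" => 1,
        "KiB" => 1 << 10,
        "MiB" => 1 << 20,
        "GiB" => 1 << 30,
        _ => return None,
    };
    value.checked_mul(multiplier)
}

/// Returns true if a new file of `file_size` bytes is over the limit and
/// would therefore be skipped under a policy like the one in `351e7feef5`.
fn exceeds_limit(file_size: u64, max_new_file_size: u64) -> bool {
    file_size > max_new_file_size
}
```

With these definitions, `parse_byte_size("1MiB")` yields `Some(1048576)`, and a 2,000,000-byte new file exceeds that limit.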