ok/jj - ok.software

Author	SHA1	Message	Date
Yuya Nishihara	f5187fa063	copies: determine copy/rename operation by CopiesTreeDiffStream Not all callers need this information, but I assumed it's relatively cheap to look up the source path in the target tree compared to diffing. This could be represented as Regular(_)|Copied(_, _)|Renamed(_, _), but it's a bit weird if Copied and Renamed were separate variants. Instead, I decided to wrap copy metadata in Option.	2024-08-23 10:29:12 +09:00
Yuya Nishihara	b6060ce6dd	copies: wrap source path in Option to save allocation Most diff entries should have no copy sources.	2024-08-23 10:29:12 +09:00
Yuya Nishihara	08262eb152	copies: extract (source, target) path pair to separate type This patch adds accessor methods as I'm going to change the underlying data types. Since entry values are consumed separately, these methods are implemented on CopiesTreeDiffEntryPath, not on *TreeDiffEntry.	2024-08-23 10:29:12 +09:00
Yuya Nishihara	43bf195314	merged_tree: rename diff entry field from "value" to "values" It seems slightly better, and aligns with the local variable name in materialized_diff_stream().	2024-08-23 10:29:12 +09:00
Matt Kulukundis	8ead72e99f	formatting only: switch to Item level import granularity	2024-08-22 14:52:54 -04:00
Yuya Nishihara	352a4a0eea	copies: filter rename source entries by CopiesTreeDiffStream	2024-08-22 20:17:19 +09:00
Yuya Nishihara	d85e66bbb4	copies: turn add_records() into non-stream API, block_on_stream() by caller This is simpler, and I think it's generally better to not spawn executor in library code.	2024-08-22 20:17:19 +09:00
Martin von Zweigbergk	3acb89e7cc	merged_tree: remove `TreeDiffEntry::source`	2024-08-18 22:16:41 -07:00
Martin von Zweigbergk	70598498b0	merged_tree: provide separate version of `diff_stream()` with copy info I plan to provide a richer version of `TreeDiffEntry` with copy info (and to make `TreeDiffEntry` itself "poorer"). Most callers want to know about copies/renames, but at least working copy implementations probably don't. This patch adds separate `diff_stream()` and `diff_stream_with_copies()` so we can provide the simpler interface for callers that don't need copy info.	2024-08-18 22:16:41 -07:00
Martin von Zweigbergk	e670837ff6	copies: implement copy support in `MergedTree::diff_stream()` as adapter The support for copy tracing is already simply added to the stream just before yielding the item, so we can easily implement it as a stream adapter. That ensures that we use the same logic for the iterator- and stream-based versions. More importantly, it enables further cleanups and a simpler interface.	2024-08-18 22:16:41 -07:00
Martin von Zweigbergk	fd9a236be5	copies: move `CopyRecords` to new `copies` module Copy/rename handling is complicated. It seems worth having a module for it. I'm going to add more content to it next.	2024-08-18 22:16:41 -07:00
Yuya Nishihara	f7377fbbcd	merged_tree: replace MergedTreeVal<'_> by Merge<Option<&TreeValue>> MergedTreeVal was roughly equivalent to Merge<Option<Cow<_>>>. As we've dropped support for the legacy trees, it can be simplified to Merge<Option<&_>>.	2024-08-12 23:01:46 +09:00
Matt Kulukundis	5911e5c9b2	copy-tracking: Add copy tracking as a post iteration step - force each diff command to explicitly enable copy tracking - enable copy tracking in diff_summary - post-process for diff iterator - post-process for diff stream - update changelog	2024-08-11 17:01:45 -04:00
Matt Kulukundis	34b0f87584	copy-tracking: plumb CopyRecordMap through diff method	2024-08-11 17:01:45 -04:00
Matt Kulukundis	8e84c60157	copy-tracking: create an explicit TreeDiffEntry struct	2024-08-11 17:01:45 -04:00
Yuya Nishihara	6fc7cec4a5	merged_tree: make TreeDiffIterator accept trees as &Merge<Tree> For the same reason as the patch for TreeEntriesIterator. It's probably better to assume that MergedTree represents the root tree.	2024-08-08 23:05:37 +09:00
Yuya Nishihara	9378adedb7	merged_tree: hold store globally by TreeDiffIterator Since TreeDiffDirItem is now calculated eagerly, it doesn't make sense to keep MergedTree in it.	2024-08-08 23:05:37 +09:00
Martin von Zweigbergk	ec7725064b	merged_tree: make `MergedTree` a struct I considered making `MergedTree` just a newtype (1-tuple) but I went with a struct instead because we may want to add copy information in a separate field in the future.	2024-08-08 05:32:16 -07:00
Martin von Zweigbergk	109391f9c7	merged_tree: delete `MergedTree::Legacy`	2024-08-08 05:32:16 -07:00
Yuya Nishihara	24b8934b14	tests: migrate .diff() callers to .diff_stream()	2024-08-08 10:45:59 +09:00
Yuya Nishihara	63e254d052	tests: use pollster instead of futures::executor::block_on() It doesn't matter in tests and I have no preference over these, but we tend to use .block_on().	2024-08-08 10:45:59 +09:00
Martin von Zweigbergk	65a988e3d2	merged_tree: make tree builder attempt to resolve conflicts As we discovered in the `jj fix` tests, `MergedTreeBuilder::write_tree()` doesn't try to resolve conflicts, not even trivial ones. This patch fixes that.	2024-06-08 20:29:30 +09:00
Martin von Zweigbergk	776b2d981f	merged_tree: make `resolve()` return a `MergedTree` It seems like a method on `MergedTree` should return another `MergedTree` when reasonable. I'm not sure why I made it return a `Merge<Tree>` instead.	2024-06-08 20:29:30 +09:00
Martin von Zweigbergk	7e6a968415	conflicts: consider the empty tree a non-legacy tree Since we no longer depend on legacy trees being preserved when we build new trees or merge trees, we can consider the root tree a non-legacy tree.	2024-05-27 06:25:27 -07:00
Martin von Zweigbergk	07bb1d81b7	tree_builder: propagate errors from `write_tree()`	2024-05-22 06:46:38 -07:00
Martin von Zweigbergk	1970ddef15	tree: propagate errors from `sub_tree()`/`path_value()`	2024-05-22 06:46:38 -07:00
Martin von Zweigbergk	facfb71f7b	test_merged_tree: reduce duplication and wrapping with helper lambdas I'm about to make `[Merged]Tree::path_value()` return a `Result`. This will help even more then.	2024-05-22 06:46:38 -07:00
Martin von Zweigbergk	0d1ff8a150	merged_tree: propagate errors from `TreeEntriesIterator` We shouldn't panic if we fail to read a tree from the backend.	2024-05-01 06:10:08 -07:00
Ilya Grigoriev	a88c06068e	clippy: new nightly fixes For some reason, clippy also suggested surrounding `self.value` with parentheses. Not sure whether that's a clippy bug. Cc: https://github.com/rust-lang/rust-clippy/issues/12268	2024-02-10 16:06:28 -08:00
Yuya Nishihara	35f718f212	merged_tree: remove canceling terms prior to resolving file-level conflict I think this is a variant of the problem fixed by `7fda80fc22` "tree: simplify conflict before resolving at hunk level." We need to simplify() the conflict before and after extracting file ids because the source conflict values may contain trees to be cancelled out, and the file values may differ only in exec bits. Since the legacy tree passes a simplified conflict in to this function, I made the merged tree do the same. Fixes #2654	2023-12-03 07:44:58 +09:00
Yuya Nishihara	4ffbf40c82	merged_tree: do not propagate conflicting empty tree value to parent Otherwise an empty subtree would be added to the parent tree. If the stored tree contained an empty subtree, simplify() wouldn't work against new "absent" subtree representation. I don't know if there's such a code path, but I believe it's very rare to encounter the problem. #2654	2023-12-03 07:44:58 +09:00
Yuya Nishihara	28ab9593c3	repo_path: split RepoPath into owned and borrowed types This enables cheap str-to-RepoPath cast, which is useful when sorting and filtering a large Vec<(String, _)> list by using matcher for example. It will also eliminate temporary allocation by repo_path.parent().	2023-11-28 07:33:28 +09:00
Yuya Nishihara	0a1bc2ba42	repo_path: add stub RepoPathBuf type, update callers Most RepoPath::from_internal_string() callers will be migrated to the function that returns &RepoPath, and cloning &RepoPath won't work.	2023-11-28 07:33:28 +09:00
Yuya Nishihara	d322df0c8d	matchers: make Files/PrefixMatcher constructors accept slice of borrowed paths RepoPath will become slice type (like str), and it doesn't make sense to require &[RepoPathBuf] here.	2023-11-28 07:33:28 +09:00
Yuya Nishihara	974a6870b3	repo_path: make RepoPath::components() return iterator This allows us to change the backing type from Vec<String> to String.	2023-11-27 08:42:09 +09:00
Yuya Nishihara	59ef3f0023	repo_path: split RepoPathComponent into owned and borrowed types This is a step towards introducing a borrowed RepoPath type. The current RepoPath type is inefficient as each component String is usually short. We could apply short-string optimization, but still each inlined component would consume 24 bytes just for e.g. "src", and increase the chance of random memory access. If the owned RepoPath type is backed by String, we can implement cheap cast from &str to borrowed &RepoPath type.	2023-11-26 18:21:40 +09:00
Yuya Nishihara	f2096da2d6	repo_path: add stub type to introduce borrowed RepoPathComponent type The current RepoPathComponent will be renamed to RepoPathComponentBuf, and new str wrapper will be added as RepoPathComponent.	2023-11-26 18:21:40 +09:00
Yuya Nishihara	6344cd56b3	repo_path: remove RepoPathJoin trait, just implement join() on the type I don't think we'll add join() that takes different types.	2023-11-26 07:14:47 +09:00
Yuya Nishihara	e0c35684af	merge: rename Merge::new() to Merge::from_removes_adds() Since (removes, adds) pair is no longer the canonical representation of Merge, the name Merge::new() seems too generic. Let's give more verbose name.	2023-11-07 17:10:12 +09:00
Martin von Zweigbergk	d989d4093d	merged_tree: let backend influence whether to use new diff algo Since the concurrent diff algorithm is significantly slower when using the Git backend, I think we'll have to switch between the two algorithms depending on backend. Even if the concurrent version always performed as well as the sequential version, exactly how concurrent it should be probably still depends on the backend. This commit therefore adds a function to the `Backend` trait, so each backend can say how much concurrency they deal well with. I then use that number for choosing between the sequential and concurrent versions in `MergedTree::diff_stream()`, and also to decide the number of concurrent reads to do in the concurrent version.	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	f40adb84fc	merged_tree: add a `Stream` for concurrent diff of trees When diffing two trees, we currently start at the root and diff those trees. Then we diff each subtree, one at a time, recursively. When using a commit backend that uses remote storage, like our backend at Google does, diffing the subtrees one at a time gets very slow. We should be able to diff subtrees concurrently. That way, the number of roundtrips to a server becomes determined by the depth of the deepest difference instead of by the number of differing trees (times 2, even). This patch implements such an algorithm behind a `Stream` interface. It's not hooked in to `MergedTree::diff_stream()` yet; that will happen in the next commit. I timed the new implementation by updating `jj diff -s` to use the new diff stream and then ran it on the Linux repo with `jj diff --ignore-working-copy -s --from v5.0 --to v6.0`. That slowed down by ~20%, from ~750 ms to ~900 ms. Maybe we can get some of that performance back but I think it'll be hard to match `MergedTree::diff()`. We can decide later if we're okay with the difference (after hopefully reducing the gap a bit) or if we want to keep both implementations. I also timed the new implementation on our cloud-based repo at Google. As expected, it made some diffs much faster (I'm not sure if I'm allowed to share figures).	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	9af09ec236	test_merged_tree: test diffing with a matcher We didn't have any tests at all for `MergedTree::diff()` with a matcher other than `EverythingMatcher`. This patch adds a few.	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	16aa8e8f10	test_merged_tree: nest each part of `test_diff_dir_file()` I'm about to add a few more checks for diffing with a matcher. I think it will help make it readable and reduce the risk of mixing up variables between each part of the test if we use some nested blocks. I also removed some unnecessary `.clone()` calls while at it.	2023-11-06 23:12:02 -08:00
Yuya Nishihara	895bbce8c0	files: use borrowed Merge iterator in merge() Since the underlying Merge data type is no longer (Vec<T>, Vec<T>), it doesn't make sense to build removes/adds Vecs and concatenate them.	2023-11-07 06:52:35 +09:00
Martin von Zweigbergk	a1ef9dc845	merged_tree: propagate backend errors in diff iterator I want to fix error propagation before I start using async in this code. This makes the diff iterator propagate errors from reading tree objects. Errors include the path and don't stop the iteration. The idea is that we should be able to show the user an error inline in diff output if we failed to read a tree. That's going to be especially useful for backends that can return `BackendError::AccessDenied`. That error variant doesn't yet exist, but I plan to add it, and use it in Google's internal backend.	2023-10-26 06:20:56 -07:00
Martin von Zweigbergk	6ad71e658d	merged_tree: rename `MergedTreeValue` to `MergedTreeVal` I'm going to add `MergedTreeValue` as an alias for `Merge<Option<TreeValue>>`, but we already have a type by that name in `merged_tree`. This patch renames it away, to make room for the new alias. I used `MergedTreeVal` for this borrowing version to be a bit like how `str` is a borrowed version of `String`.	2023-10-26 06:20:56 -07:00
Martin von Zweigbergk	7fda80fc22	tree: simplify conflict before resolving at hunk level I ran into a bug the other day where `jj status` said there was a conflict in a file but there were no conflict markers in the working copy. The commit was created when I squashed a conflict resolution into the commit's parent. The rebased child commit then ended up in this state. I.e., it looked something like this before squashing: ``` C (no conflict) | | B conflict |/ A conflict ``` The conflict in B was different from the conflict in A. When I squashed in C, jj would try to resolve the conflicts by first creating a 7-way conflict (3 from A, 3 from B, 1 from C). Because of the exact content-level changes, the 7-way conflict couldn't be automatically resolved by `files::merge()` (the way it currently works anyway). However, after simplifying the conflict, it could be resolved. Because `MergedTree::merge()` does another round of conflict simplification of the result at the end of the function, it was the simplified version that actually got stored in the commit. So when inspecting the conflict later (e.g. in the working copy, as I did), it could be automatically resolved. I think there are at least two ways to solve this. One is to call `merge_trees()` again after calling `tree.simplify()` in `MergedTree::merge()`. However, I think it would only matter in the case of content-level conflicts. Therefore, it seems better to make the content-level resolution solve this case to start with. I've done that by simplifying the conflict before passing it into `files::merge()`. We could even do the fix in `files::merge()`, but doing it before calling it has the advantage that we can avoid reading some unchanged content from the backend.	2023-09-27 22:14:39 -07:00
Martin von Zweigbergk	e3f82cd99a	tests: leverage `TestRepo::init()` in `test_merged_tree` I forgot to update these call sites when I introduced (the new version of) `TestRepo::init()`.	2023-09-20 07:47:30 -07:00
Martin von Zweigbergk	7ecd64fde1	merged_tree: use child path when merging child This fixes a bug where we used the parent directory's path when trying to read trees and files for a child entry. Many tests in `test_merged_tree` fail after switching to the test backend there without this fix.	2023-09-18 07:53:19 -07:00
Martin von Zweigbergk	9c30d7500b	testutils: delete bool-typed `init()` in favor of enum-typed version It makes the call sites clearer if we pass the `TestRepoBackend` enum instead of the boolean `use_git` value. It's also more extensible (I plan to add another backend for tests).	2023-09-18 07:15:37 -07:00
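
The conflict simplification mentioned in the 7fda80fc22 and 35f718f212 messages (removing terms that cancel out before attempting a content-level merge) can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Conflict` type, its `removes`/`adds` fields, and the `simplify()` and `is_resolved()` methods below are hypothetical stand-ins written for this example and are not jj's actual `Merge` type or API.

```rust
// Hypothetical, simplified stand-in for a conflict. A well-formed conflict
// has exactly one more entry in `adds` than in `removes`.
// This is NOT jj's real `Merge` type; it only illustrates the idea.
#[derive(Debug, Clone, PartialEq)]
struct Conflict<T> {
    removes: Vec<T>,
    adds: Vec<T>,
}

impl<T: PartialEq> Conflict<T> {
    /// Drops every (remove, add) pair whose two values are equal,
    /// since such a pair contributes nothing to the merge result.
    fn simplify(mut self) -> Self {
        let mut i = 0;
        while i < self.removes.len() {
            // Look for an add that cancels the current remove.
            let pos = self.adds.iter().position(|a| *a == self.removes[i]);
            if let Some(j) = pos {
                self.removes.remove(i);
                self.adds.remove(j);
                // Do not advance `i`: the next remove shifted into slot `i`.
            } else {
                i += 1;
            }
        }
        self
    }

    /// A conflict with no removes and a single add is fully resolved.
    fn is_resolved(&self) -> bool {
        self.removes.is_empty() && self.adds.len() == 1
    }
}

fn main() {
    // A 7-term conflict (3 removes, 4 adds) with made-up values.
    let conflict = Conflict {
        removes: vec!["base", "a", "b"],
        adds: vec!["a", "b", "base", "resolved"],
    };
    let simplified = conflict.simplify();
    assert!(simplified.is_resolved());
    assert_eq!(simplified.adds, vec!["resolved"]);
}
```

In this toy model, simplifying first means a content-level merge would only have to look at the terms that survive, which matches the motivation given in the message for simplifying before calling `files::merge()`.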

67 commits