Commit graph

3096 commits

Author SHA1 Message Date
Yuya Nishihara
6d6f5990de merged_tree: add merge-join iterator over Merge<Tree> entries
For the same reason as 2cb7e91d "merged_tree: do not re-look up non-conflicting
tree values by name." This appears to bring a similar performance improvement.

I assume this change is/will be covered by test_merged_tree.rs. I considered
adding a few unit tests, but constructing Tree object isn't trivial, and the
iterator implementation is relatively straightforward.
2024-08-12 10:20:34 +09:00
Matt Kulukundis
5911e5c9b2 copy-tracking: Add copy tracking as a post iteration step
- force each diff command to explicitly enable copy tracking
- enable copy tracking in diff_summary
- post-process for diff iterator
- post-process for diff stream
- update changelog
2024-08-11 17:01:45 -04:00
Matt Kulukundis
0349d9ead3 copy-tracking: extract next_impl from next in diff iter/stream 2024-08-11 17:01:45 -04:00
Matt Kulukundis
34b0f87584 copy-tracking: plumb CopyRecordMap through diff method 2024-08-11 17:01:45 -04:00
Matt Kulukundis
6bae5eaf9d copy-tracking: create a MaterializedTreeDiffEntry type 2024-08-11 17:01:45 -04:00
Matt Kulukundis
e123eb21b9 copy-tracking: add source field to TreeDiffEntry
- add the field and make it compile, but don't use it yet
2024-08-11 17:01:45 -04:00
Matt Kulukundis
8e84c60157 copy-tracking: create an explicit TreeDiffEntry struct 2024-08-11 17:01:45 -04:00
Matt Kulukundis
ee6b922144 copy-tracking: create CopyRecordMap and add it to diff summaries 2024-08-11 17:01:45 -04:00
Matt Kulukundis
e667a2b403 copy-tracking: adjust backend signature
- use a single commit instead of an array of them.  This simplifies the
  implementation.  A higher level api can wrap this when an array of
  commits is desired and those semantics are figured out.
- since this API is directly 1-1 on parents, there are no conflicts
- if we introduce a higher level API that handles lists of commits, we
  may need to restore the conflict/resolved distinction, but for now
  simplify
2024-08-11 17:01:45 -04:00
Yuya Nishihara
c9e147c425 merged_tree: allow to postpone resolution of intermediate trees
This allows us to diff trees without fully resolving conflicts:

    let from_tree = merge_no_resolve(..);
    for (path, (from, to)) in from_tree.diff(to_tree, matcher) {
        let from = resolve_conflicts(from);
        if from == to {
            continue; // resolved file may be identical
        ...

I originally considered adding a matcher argument to merge() functions, but the
resulting API looked misleading. If merge() took a matcher, callers might expect
unmatched trees and files were omitted, not left unresolved. It's also slower
than diffing unresolved trees because merge(.., matcher) would have to write
partially resolved trees to the store.

Since "ancestor_tree" isn't resolved by itself, this patch has subtle behavior
change. For example, "jj diff -r9eaef582" in the "git" repository is no longer
empty. I think the new behavior is also technically correct, but I'm not pretty
sure.
2024-08-11 18:23:21 +09:00
Yuya Nishihara
5d141befc2 tests: evaluate file()/diff_contains() revset against merged parents
These tests would fail if trees are compared without resolving file conflicts.
2024-08-11 18:23:21 +09:00
Yuya Nishihara
dac04960f0 rewrite: remove redundant commit_id.clone() from merge_commit_trees*() 2024-08-11 18:23:21 +09:00
Yuya Nishihara
ed1c07e73e tree: fill in valid id to null tree, rename function to empty()
If a null tree were written to the store, GitBackend would crash because of
invalid hash length.
2024-08-11 18:23:21 +09:00
Yuya Nishihara
2cb7e91dc7 merged_tree: do not re-look up non-conflicting tree values by name
While measuring file(path) query, I noticed BTreeMap lookup appears in perf.
It actually has a measurable cost if the history is linear and parent trees
don't have to be merged dynamically. For merge-heavy history, the cost of
tree merges is more significant. I'll address that separately.

```
% hyperfine --sort command --warmup 3 --runs 50 -L bin jj-1,jj-2 \
  'target/release-with-debug/{bin} -R ~/mirrors/git --ignore-working-copy \
   log -r "::trunk() & ~merges() & file(root:builtin)" --no-graph -n100'
Benchmark 1: target/release-with-debug/jj-1 ..
  Time (mean ± σ):     239.7 ms ±   7.1 ms    [User: 192.1 ms, System: 46.5 ms]
  Range (min … max):   222.2 ms … 249.7 ms    50 runs

Benchmark 2: target/release-with-debug/jj-2 ..
  Time (mean ± σ):     201.7 ms ±   6.9 ms    [User: 153.7 ms, System: 46.6 ms]
  Range (min … max):   184.2 ms … 211.1 ms    50 runs

Relative speed comparison
        1.19 ±  0.05  target/release-with-debug/jj-1 ..
        1.00          target/release-with-debug/jj-2 ..
```
2024-08-09 00:17:37 +09:00
Yuya Nishihara
19b62d29ba merged_tree: leverage .to_tree_merge() in TreeDiffIterator 2024-08-08 23:05:37 +09:00
Yuya Nishihara
6fc7cec4a5 merged_tree: make TreeDiffIterator accept trees as &Merge<Tree>
For the same reason as the patch for TreeEntriesIterator. It's probably
better to assume that MergedTree represents the root tree.
2024-08-08 23:05:37 +09:00
Yuya Nishihara
9378adedb7 merged_tree: hold store globally by TreeDiffIterator
Since TreeDiffDirItem is now calculated eagerly, it doesn't make sense to
keep MergedTree in it.
2024-08-08 23:05:37 +09:00
Yuya Nishihara
37c41d0eaf tests: do not pass in commit objects loaded from different store
Otherwise the assertion would fail in the next patch.
2024-08-08 23:05:37 +09:00
Yuya Nishihara
8b72dad095 merged_tree: replace explicit .is_tree() call in TreeEntriesIterator
The value here shouldn't be absent, so .is_tree() is equivalent to
.to_tree_merge().is_some().
2024-08-08 23:05:37 +09:00
Yuya Nishihara
12434b49b8 merged_tree: make TreeEntriesIterator accept trees as &Merge<Tree>
Suppose we add copy information to MergedTree, a MergedTree can be considered
a root tree representation plus global metadata. I think Merge<Tree> is a better
type for sub trees.
2024-08-08 23:05:37 +09:00
Yuya Nishihara
8a3e4ad966 merged_tree: hold store globally by TreeEntriesIterator
Since TreeEntriesDirItem is now calculated eagerly, it doesn't make sense to
keep MergedTree in it.
2024-08-08 23:05:37 +09:00
Martin von Zweigbergk
ec7725064b merged_tree: make MergedTree a struct
I considered making `MergedTree` just a newtype (1-tuple) but I went
with a struct instead because we may want to add copy information in a
separate field in the future.
2024-08-08 05:32:16 -07:00
Martin von Zweigbergk
7596935285 merged_tree: make ConflictIterator a struct 2024-08-08 05:32:16 -07:00
Martin von Zweigbergk
109391f9c7 merged_tree: delete MergedTree::Legacy 2024-08-08 05:32:16 -07:00
Martin von Zweigbergk
10aab1bdc3 conflicts: always promote legacy trees to merged trees
In order to remove the `MergedTree::Legacy` form, we need to stop
creating such instances. This patch removes the last place we create
them, which is in `Store::get_root_tree()`.

The main practical consequence of this change is that loading legacy
trees gets a lot slower on large repos. However, since the default log
template includes the `conflict` keyword, we ended up scanning all
paths in `jj log` anyway, so I'm not sure many people will notice.
2024-08-08 05:32:16 -07:00
Yuya Nishihara
202fb533f4 merged_tree: remove .diff() method in favor of .diff_stream()
It's unlikely we'll need the iterator version of .diff() except for testing
the stream implementation.
2024-08-08 10:45:59 +09:00
Yuya Nishihara
24b8934b14 tests: migrate .diff() callers to .diff_stream() 2024-08-08 10:45:59 +09:00
Yuya Nishihara
63e254d052 tests: use pollster instead of futures::executor::block_on()
It doesn't matter in tests and I have no preference over these, but we tend
to use .block_on().
2024-08-08 10:45:59 +09:00
Yuya Nishihara
26f744ab2d revset: use .diff_stream() in file() evaluation, handle backend error
This is the last .diff() caller in non-test code. Though it wouldn't be
important to use async version here, this change helps remove .diff() API.
2024-08-08 10:45:59 +09:00
Yuya Nishihara
7bdb28f1fe cli: make "op abandon" not fail with multiple op heads
Since "op abandon" just rewrites DAG, it works no matter if the heads are
merged or not. This change will help crash recovery. "op abandon
--at-op=<one-of-the-heads>" can't be used because ancestor operations would be
preserved by the other head.
2024-08-07 10:51:44 +09:00
Yuya Nishihara
399110b1fc op_walk: allow to resolve operation expression from multiple heads
I'll make "op abandon" work without merging op heads.
2024-08-07 10:51:44 +09:00
Yuya Nishihara
4e3a5de6e9 op_walk: sort current heads to stabilize multiple ops error message 2024-08-07 10:51:44 +09:00
Yuya Nishihara
f7836aa687 cli: obslog: show diffs from all predecessors, not first predecessor
Suppose a squash node in obslog is analogous to a merge in revisions log, it
makes sense to show diffs from auto-merge (or auto-squash) parents. This
basically means a non-partial squash node no longer shows diffs.

This also fixes missing diffs at the root predecessors if there were.
2024-08-07 10:51:23 +09:00
Yuya Nishihara
d061c3782f merged_tree: remove .diff_summary()
There are no non-test callers since 452fecb7c4 "cli: colorize diff summary
and sort by path."
2024-08-06 10:15:44 +09:00
Yuya Nishihara
d435a8a793 tests: compare trees without using .diff_summary()
I don't think modification types matter here. Testing paths should be good
enough.
2024-08-06 10:15:44 +09:00
Yuya Nishihara
b290af8e29 op_walk: include operation ids in multiple match error 2024-08-03 09:22:26 +09:00
Stephen Jennings
6c41b1bef8 revset: add author_date and committer_date revset functions
Author dates and committer dates can be filtered like so:

    committer_date(before:"1 hour ago") # more than 1 hour ago
    committer_date(after:"1 hour ago")  # 1 hour ago or less

A date range can be created by combining revsets. For example, to see any
revisions committed yesterday:

    committer_date(after:"yesterday") & committer_date(before:"today")
2024-08-01 09:04:07 -07:00
Stephen Jennings
ff9e739798 revset: create DatePattern type
Creates a DatePattern type that can be created by parsing a string in any
format supported by the chrono-english crate, including:

- 2024-03-25
- 2024-03-25T00:00:00
- 2024-03-25T00:00:00-08:00
- 2 weeks ago
- 5 minutes ago
- yesterday
- yesterday 5pm
- yesterday 10:30
- yesterday 15:30
- tomorrow

A `kind` can be specified to indicate whether the pattern should match dates at
or after (`after`) or strictly before (`before`) the given instant.

chrono-english supports US and UK dialects to disambiguate mm/dd/yy from
dd/mm/yy, but for now we default to US. This should probably be a config
setting.
2024-08-01 09:04:07 -07:00
dploch
bfa1ce8936 workspace: make the constructor public
This allows constructing a workspace in a custom environment where the standard filesystem API cannot be used
2024-07-31 19:45:37 -04:00
dploch
5f7e3883e8 repo: define a public constructor for RepoLoader
This enables the creation of Repo objects in environments without standard filesystem support, by allowing the caller to load the store objects however they see fit. This confines interaction with the filesystem to the WorkingCopy abstractions.
2024-07-31 19:45:37 -04:00
Yuya Nishihara
d2f933eed3 commit_builder: remove unneeded &mut from .write_hidden()
Since the backend::Commit has to be cloned, .write_hidden() doesn't mutate the
self.commit object.
2024-07-25 22:39:00 +09:00
Martin von Zweigbergk
d740f1801b conflicts: use non-legacy MergedTreeId for root commit
This is part of migrating away from legacy trees (with path-level
conflicts). I can't think of any practical impact (we already compare
the tree ids equal).
2024-07-24 14:33:05 +02:00
Martin von Zweigbergk
352ca72314 tests: make helpers create non-legacy trees
Extracted and modified from #3746 by @ilyagr.
2024-07-24 14:33:05 +02:00
Yuya Nishihara
bafb357209 git: on abandoning unreachable commits, don't count HEAD ref
This basically reverts 20eb9ecec1 "git: don't abandon HEAD commit when it
loses a branch." I think the new behavior is more consistent because the Git
HEAD is equivalent to @- in jj, so it shouldn't be considered a named ref.

Note that we've made old HEAD branch not considered at 92cfffd843 "git: on
external HEAD move, do not abandon old branch."

#4108
2024-07-24 21:22:26 +09:00
Yuya Nishihara
da221eb888 repo: load index eagerly to simplify error handling
If readonly_index() and index() returned Result, it would propagate to many
call sites. That seems bad for API ergonomics. Suppose most "repo" commands
depend on an index, I think it's okay to load index eagerly:

 - "jj config" doesn't load repo (nor index)
 - "jj workspace root" doesn't load repo (nor index)
 - some other mutation commands load index when printing commit summary
 - many other commands load index when resolving revset
2024-07-23 18:26:16 +09:00
Yuya Nishihara
626aa90610 repo: use DetachedCommitBuilder constructors
I think this makes it clear that the builder doesn't add any rewrite records
to the mut_repo.
2024-07-23 18:22:40 +09:00
Yuya Nishihara
337dcef6ee commit_builder: add public interface that writes temporary commit to store
In order to render description template, we'll need a Commit object that
represents the old state (with new tree and parents) before updating the
commit description. The added functions will help generate an intermediate
Commit object.

Alternatively, we can create an in-memory Commit object with some fake
CommitId. It should be lightweight, but might cause weird issue because the
fake id wouldn't be found in the store.

I think it's okay to write a temporary commit and rely on GC as we do for
merge trees. However, I should note that temporary commits are more likely to
be preserved as they are pinned by no-gc refs until "jj util gc".
2024-07-23 18:22:40 +09:00
Yuya Nishihara
b4bf1358a5 commit_builder: extract inner builder which isn't lifetimed by mut_repo
This allows us to construct a builder, format description template with an
intermediate commit, then write() a final commit object to the repo.

I originally considered removing mut_repo from CommitBuilder at all, but
rewriter APIs rely on that CommitBuilder has &mut_repo, and splitting them
would make call sites uglier.

The inner builder methods are based on &mut Self instead of Self, because it's
easier to wrap, and users of the inner builder will bind it to a named variable
anyway.
2024-07-23 18:22:40 +09:00
Yuya Nishihara
6516a40c19 commit_builder: extract free function that sets up signing and write commit
I'll add another write() method that doesn't consume self, which will have to
clone self.commit.
2024-07-23 18:22:40 +09:00
Yuya Nishihara
fab310f53f commit_builder: keep Store internally
I'm going to extract an inner builder that is free from &mut MutableRepo
lifetime.
2024-07-23 18:22:40 +09:00