try_resolve_file_conflict() is also updated. It could be a generic function,
but there are only two callers, and the legacy tree one is used only in tests.
For the same reason as 2cb7e91d "merged_tree: do not re-look up non-conflicting
tree values by name." This appears to bring a similar performance improvement.
I assume this change is/will be covered by test_merged_tree.rs. I considered
adding a few unit tests, but constructing Tree object isn't trivial, and the
iterator implementation is relatively straightforward.
- force each diff command to explicitly enable copy tracking
- enable copy tracking in diff_summary
- post-process for diff iterator
- post-process for diff stream
- update changelog
- use a single commit instead of an array of them. This simplifies the
implementation. A higher level api can wrap this when an array of
commits is desired and those semantics are figured out.
- since this API is directly 1-1 on parents, there are no conflicts
- if we introduce a higher level API that handles lists of commits, we
may need to restore the conflict/resolved distinction, but for now
simplify
This allows us to diff trees without fully resolving conflicts:
let from_tree = merge_no_resolve(..);
for (path, (from, to)) in from_tree.diff(to_tree, matcher) {
let from = resolve_conflicts(from);
if from == to {
continue; // resolved file may be identical
...
I originally considered adding a matcher argument to merge() functions, but the
resulting API looked misleading. If merge() took a matcher, callers might expect
unmatched trees and files were omitted, not left unresolved. It's also slower
than diffing unresolved trees because merge(.., matcher) would have to write
partially resolved trees to the store.
Since "ancestor_tree" isn't resolved by itself, this patch has subtle behavior
change. For example, "jj diff -r9eaef582" in the "git" repository is no longer
empty. I think the new behavior is also technically correct, but I'm not pretty
sure.
While measuring file(path) query, I noticed BTreeMap lookup appears in perf.
It actually has a measurable cost if the history is linear and parent trees
don't have to be merged dynamically. For merge-heavy history, the cost of
tree merges is more significant. I'll address that separately.
```
% hyperfine --sort command --warmup 3 --runs 50 -L bin jj-1,jj-2 \
'target/release-with-debug/{bin} -R ~/mirrors/git --ignore-working-copy \
log -r "::trunk() & ~merges() & file(root:builtin)" --no-graph -n100'
Benchmark 1: target/release-with-debug/jj-1 ..
Time (mean ± σ): 239.7 ms ± 7.1 ms [User: 192.1 ms, System: 46.5 ms]
Range (min … max): 222.2 ms … 249.7 ms 50 runs
Benchmark 2: target/release-with-debug/jj-2 ..
Time (mean ± σ): 201.7 ms ± 6.9 ms [User: 153.7 ms, System: 46.6 ms]
Range (min … max): 184.2 ms … 211.1 ms 50 runs
Relative speed comparison
1.19 ± 0.05 target/release-with-debug/jj-1 ..
1.00 target/release-with-debug/jj-2 ..
```
Suppose we add copy information to MergedTree, a MergedTree can be considered
a root tree representation plus global metadata. I think Merge<Tree> is a better
type for sub trees.
I considered making `MergedTree` just a newtype (1-tuple) but I went
with a struct instead because we may want to add copy information in a
separate field in the future.
In order to remove the `MergedTree::Legacy` form, we need to stop
creating such instances. This patch removes the last place we create
them, which is in `Store::get_root_tree()`.
The main practical consequence of this change is that loading legacy
trees gets a lot slower on large repos. However, since the default log
template includes the `conflict` keyword, we ended up scanning all
paths in `jj log` anyway, so I'm not sure many people will notice.
Since "op abandon" just rewrites DAG, it works no matter if the heads are
merged or not. This change will help crash recovery. "op abandon
--at-op=<one-of-the-heads>" can't be used because ancestor operations would be
preserved by the other head.
Suppose a squash node in obslog is analogous to a merge in revisions log, it
makes sense to show diffs from auto-merge (or auto-squash) parents. This
basically means a non-partial squash node no longer shows diffs.
This also fixes missing diffs at the root predecessors if there were.
Author dates and committer dates can be filtered like so:
committer_date(before:"1 hour ago") # more than 1 hour ago
committer_date(after:"1 hour ago") # 1 hour ago or less
A date range can be created by combining revsets. For example, to see any
revisions committed yesterday:
committer_date(after:"yesterday") & committer_date(before:"today")
Creates a DatePattern type that can be created by parsing a string in any
format supported by the chrono-english crate, including:
- 2024-03-25
- 2024-03-25T00:00:00
- 2024-03-25T00:00:00-08:00
- 2 weeks ago
- 5 minutes ago
- yesterday
- yesterday 5pm
- yesterday 10:30
- yesterday 15:30
- tomorrow
A `kind` can be specified to indicate whether the pattern should match dates at
or after (`after`) or strictly before (`before`) the given instant.
chrono-english supports US and UK dialects to disambiguate mm/dd/yy from
dd/mm/yy, but for now we default to US. This should probably be a config
setting.
This enables the creation of Repo objects in environments without standard filesystem support, by allowing the caller to load the store objects however they see fit. This confines interaction with the filesystem to the WorkingCopy abstractions.
This is part of migrating away from legacy trees (with path-level
conflicts). I can't think of any practical impact (we already compare
the tree ids equal).
This basically reverts 20eb9ecec1 "git: don't abandon HEAD commit when it
loses a branch." I think the new behavior is more consistent because the Git
HEAD is equivalent to @- in jj, so it shouldn't be considered a named ref.
Note that we've made old HEAD branch not considered at 92cfffd843 "git: on
external HEAD move, do not abandon old branch."
#4108
If readonly_index() and index() returned Result, it would propagate to many
call sites. That seems bad for API ergonomics. Suppose most "repo" commands
depend on an index, I think it's okay to load index eagerly:
- "jj config" doesn't load repo (nor index)
- "jj workspace root" doesn't load repo (nor index)
- some other mutation commands load index when printing commit summary
- many other commands load index when resolving revset
In order to render description template, we'll need a Commit object that
represents the old state (with new tree and parents) before updating the
commit description. The added functions will help generate an intermediate
Commit object.
Alternatively, we can create an in-memory Commit object with some fake
CommitId. It should be lightweight, but might cause weird issue because the
fake id wouldn't be found in the store.
I think it's okay to write a temporary commit and rely on GC as we do for
merge trees. However, I should note that temporary commits are more likely to
be preserved as they are pinned by no-gc refs until "jj util gc".
This allows us to construct a builder, format description template with an
intermediate commit, then write() a final commit object to the repo.
I originally considered removing mut_repo from CommitBuilder at all, but
rewriter APIs rely on that CommitBuilder has &mut_repo, and splitting them
would make call sites uglier.
The inner builder methods are based on &mut Self instead of Self, because it's
easier to wrap, and users of the inner builder will bind it to a named variable
anyway.
As the doc comment says, it's called only from CommitBuilder. Let's clarify
that. I'm also planning to extract a builder that only writes to the store
(without mutably borrowing a mut_repo.) It will help implement description
template.
It's inconvenient that we have to quote glob patterns as 'glob:"*.rs"'. Suppose
filesets are usually specified in shell, it's better to allow unquoted strings
if possible. This change also means we'll probably abandon #2101 "make the
parsing of string arguments stricter."
Note that we can no longer introduce ? operator or [] subscript syntax in
filesets.
Closes#4053
The text pattern is applied prior to comparison as we do in Mercurial. This
might affect hunk selection, but is much faster than computing diff of full
file contents. For example, the following hunk wouldn't be caught by
diff_contains("a") because the line "b\n" is filtered out:
- a
b
+ a
Closes#2933
Since fileset and revset languages are syntactically close, we can reparse
revset expression as a fileset. This might sound a bit scary, but helps
eliminate nested quoting like file("~glob:'*.rs'"). One oddity exists in alias
substitution, though. Another possible problem is that we'll need to add fake
operator parsing rules if we introduce incompatibility in fileset, or want to
embed revset expressions in a fileset.
Since "file(x, y)" is equivalent to "file(x|y)", the former will be deprecated.
I'll probably add a mechanism to collect warnings during parsing.
For the same reason as the previous patch. I'm going to make DiffHunk leverage
BStr wrapper instead of custom Debug impl.
b"" literals in tests are changed to &str to get around type incompatibility
between &[u8; N].
This helps migrate internal [u8] variables to BStr.
b"" literals in tests are changed to &str to get around potential type
incompatibility between &[u8; N].
Adds support for revset functions `tracked_remote_branches()` and
`untracked_remote_branches()`. I think this would be especially useful
for configuring `immutable_heads()` because rewriting untracked remote
branches usually wouldn't be desirable (since it wouldn't update the
remote branch). It also makes it easy to hide branches that you don't
care about from the log, since you could hide untracked branches and
then only track branches that you care about.
As suggested by @crackcomm on discord, use eprintln!() to print warnings
to avoid messing up template output, e.g.:
jj --no-pager --ignore-working-copy show --tool true -T change_id -r rv...
rv...ignoring git submodule at "some/submodule"
Signed-off-by: Tim Janik <timj@gnu.org>
The return type T doesn't have to be a literal, and I'm going to use this
function to reparse fileset expression. We might also want to add another
expect_literal_with() helper that parses enum-like string value.
Maybe it'll also be good to keep RevsetExpression::Union(_) flattened, but
that's not needed to get around stack overflow. The constructed expression
tree is balanced.
test_expand_symbol_alias() is slightly adjusted since there are more than
one representation for "a|b|c" now.
Fixes#4031
Partially resolve a 1.5‐year‐old TODO comment.
Add opt‐in syntax for case‐insensitive matching, suffixing the
pattern kind with `-i`. Not every context supports case‐insensitive
patterns (e.g. Git branch fetch settings). It may make sense to make
this the default in at least some contexts (e.g. the commit signature
and description revsets), but it would require some thought to avoid
more confusing context‐sensitivity.
Make `mine()` match case‐insensitively unconditionally, since email
addresses are conventionally case‐insensitive and it doesn’t take
a pattern anyway.
This currently only handles ASCII case folding, due to the complexities
of case‐insensitive Unicode comparison and the `glob` crate’s lack
of support for it. This is unlikely to matter for email addresses,
which very rarely contain non‐ASCII characters, but is unfortunate
for names and descriptions. However, the current matching behaviour is
already seriously deficient for non‐ASCII text due to the lack of any
normalization, so this hopefully shouldn’t be a blocker to adding the
interface. An expository comment has been left in the code for anyone
who wants to try and address this (perhaps a future version of myself).
I don't think there's a possibility that uncommon_shared_words can become
non-empty by trimming the same amount of lines from both sides. Well, there's
an edge case regarding max_occurrences, but that shouldn't matter in practice.
Patience diff starts by lining up unique elements (e.g. lines) to find
matching segments of the inputs. After that, it refines the
non-matching segments by repeating the process. Histogram expands on
that by not just considering unique elements but by continuing with
elements of count 2, then 3, etc.
Before this commit, when diffing "a b a b b" against "a b a b a b", we
would match the two "a"s in the first input against the first two "a"s
in the second input. After this patch, we ignore the "a"s because
their counts differ, so we try to align the "b"s instead.
I have had this commit lying around since I wrote the histogram diff
implementation in 1e657c5331. I vaguely remember thinking that the
way I had implemented it (without this commit) was a bit weird, but I
wasn't sure if this commit would be an improvement or not. The bug
report from @chooglen today of a case where we behave differently from
Git is enough to make me think that we make this change after all.
#761
This is adapted from Breezy/Python patiencediff. AFAICT, Git implementation is
slightly different (and maybe more efficient?), but it's not super easy to
integrate with our diff logic. I'm not sure which one is better overall, but I
think the result is good so long as "uncommon LCS" matching is attempted first.
a9a3e4edc3/patiencediff/_patiencediff_py.py (L108)
This patch prevents some weird test changes that would otherwise be introduced
by the next patch.