I don't think there's a possibility that uncommon_shared_words can become
non-empty by trimming the same amount of lines from both sides. Well, there's
an edge case regarding max_occurrences, but that shouldn't matter in practice.
Patience diff starts by lining up unique elements (e.g. lines) to find
matching segments of the inputs. After that, it refines the
non-matching segments by repeating the process. Histogram expands on
that by not just considering unique elements but by continuing with
elements of count 2, then 3, etc.
Before this commit, when diffing "a b a b b" against "a b a b a b", we
would match the two "a"s in the first input against the first two "a"s
in the second input. After this patch, we ignore the "a"s because
their counts differ, so we try to align the "b"s instead.
I have had this commit lying around since I wrote the histogram diff
implementation in 1e657c5331. I vaguely remember thinking that the
way I had implemented it (without this commit) was a bit weird, but I
wasn't sure if this commit would be an improvement or not. The bug
report from @chooglen today of a case where we behave differently from
Git is enough to make me think that we make this change after all.
#761
This is adapted from Breezy/Python patiencediff. AFAICT, Git implementation is
slightly different (and maybe more efficient?), but it's not super easy to
integrate with our diff logic. I'm not sure which one is better overall, but I
think the result is good so long as "uncommon LCS" matching is attempted first.
a9a3e4edc3/patiencediff/_patiencediff_py.py (L108)
This patch prevents some weird test changes that would otherwise be introduced
by the next patch.
I've run into change ID prefixes being 4-5 characters instead of the
usual 1-2 characters in commands like `jj duplicate` and `jj split`
fairly often, and it seems like this should resolve that.
For the same reason as the previous commit.
Created and moved stats are printed separately because it's unusual to do both
within one "branch set" invocation.
For the same reason as cdc0cc3601. This will help notice problems like wrong
target revision.
The warning for multiple branches is reorganized as a hint for "-r" option,
which I think is the main purpose of this warning. Unlike "squash", we don't
check if an argument can be parsed as a revset because branch name is usually
a valid symbol expression.
The output looks somewhat similar to color-words diffs. Unified diffs are
verbose, but are easier to follow if adjacent lines are added/removed + modified
for example.
Word-level diffing is forcibly enabled. We can also add a config knob (or
!color condition) to turn it off to save CPU time.
I originally considered disabling highlights in block insertion/deletion, but
that wasn't always great. This can be addressed separately as it also applies
to color-words diffs. #3958
Forgetting a workspace removes its working-copy commit, so it makes
sense for it to be abandoned if it is discardable just like editing a
new commit will cause the old commit to be abandoned if it is
discardable.
Currently, if two workspaces are editing the same discardable commit and
one of them switches to editing a different commit, it is abandoned even
though the other workspace is still editing it. This commit treats
workspaces as referencing their working-copy commits so that they won't
be abandoned.
It's nice to see the result of "branch move", "create", etc., and this is more
important in "branch move" because the source branches can be specified in an
abstracted way. I originally considered printing a list of affected branches,
but it looked rather verbose. Since the destination revision is unique, we can
use commit_summary template instead.
This patch also removes a warning about multiple branches because the branch
names are included in the commit summary. I think the hint message is good
enough to signal possible mistake.
We usually print stats at the end of mutable operation, and I think these
messages are useful even if N = 1. I understand that "Deleted N" (N > 1) is
unusual and the original intent of these messages was to signal possible
mistakes. However, I don't think printing N=1 stats would nullify the original
purpose.
No emptiness check is needed for delete/forget, but names can be empty in
track/untrack because of noop changes.
The last hunk could be truncated instead, but the .peekable() version is easier
to follow. If we truncated lines, we would have to adjust line ranges
accordingly.