Commit graph

3244 commits

Author SHA1 Message Date
Yuya Nishihara
383cca4c4d diff: return matching hunk contents from all inputs
We're likely to use the right (or new) context lines in rendered diffs, but
it's odd that the hunks iterator choose which context hunk to return. We'll
also need both contents to calculate left/right line numbers.

Since the hunk content types are the same, I also split enum DiffHunk into
{ kind, contents } pair.
2024-10-06 09:45:27 +09:00
Yuya Nishihara
c5f926103a diff: use low-level HashTable in Histogram
This change made some diff benches slow, maybe because the generated code
becomes slightly worse due to the added abstraction? I'll revisit the
performance problem later. There are a couple of ways to mitigate it.

```
group                             new                     old
-----                             ---                     ---
bench_diff_git_git_read_tree_c    1.02     61.0±0.23µs    1.00     59.7±0.38µs
bench_diff_lines/modified/10k     1.00     41.6±0.24ms    1.02     42.3±0.22ms
bench_diff_lines/modified/1k      1.00      3.8±0.07ms    1.00      3.8±0.03ms
bench_diff_lines/reversed/10k     1.29     23.4±0.20ms    1.00     18.2±0.26ms
bench_diff_lines/reversed/1k      1.05    517.2±5.55µs    1.00   493.7±59.72µs
bench_diff_lines/unchanged/10k    1.00      3.9±0.10ms    1.08      4.2±0.10ms
bench_diff_lines/unchanged/1k     1.01    356.8±2.33µs    1.00    353.7±1.99µs
```
(I don't get stable results on my noisy machine, so the results would vary.)
2024-10-05 08:12:30 +09:00
Yuya Nishihara
de137c8f9a diff: implement some ignore-space rules
The added comparison functions correspond to --ignore-all-space and
--ignore-space-change. --ignore-space-at-eol can be combined with the other
flags, so it will have to be implemented as a preprocessing function.
--ignore-blank-lines will also require some change in the tokenizer function.
2024-10-05 08:12:30 +09:00
Yuya Nishihara
f672c92509 diff: add trait for bytes comparison
This could be implemented as a newtype `Wrapper<'a>(&'a [u8])`, but a lifetime
of the wrap function couldn't be specified correctly:

  fn diff(left: &[u8], right: &[u8], wrap_fn: F, ..)
  where
    F: for<'a> Fn(&'a [u8]) -> W<'a>, // F::Output<'a> can't be specified
    W: Copy + Eq + Hash

If the wrapper were of `&Wrapper([u8])` type, `Fn(&[u8]) -> &W` works. However,
it means we can no longer set comparison parameter (such as Regex) dynamically.

Another idea is to add some filter function of `Fn(&[u8]) -> Cow<'_, [u8]>`
type, but I don't think we would want to pay the allocation cost in
hashing/comparison code. `Fn(&[u8]) -> impl Iterator<Item = &[u8]>` might work,
but it would be equally complex.
2024-10-05 08:12:30 +09:00
Yuya Nishihara
dfaa52c88a cargo: add hashbrown dependency
We'll use low-level HashTable to customize Eq/Hash without implementing newtype
wrappers.

Unneeded default features are disabled for now. Note that the new default
hasher, foldhash, is released under the Zlib license, which isn't currently
included in the allow list.
2024-10-05 08:12:30 +09:00
Samuel Tardieu
43711de61c style: use .filter_map() instead of .flat_map() where appropriate 2024-10-04 22:29:13 +02:00
Samuel Tardieu
12f4d6d17b style: avoid using .to_owned()/.to_vec() on owned objects
`.clone()` is more explicit when we already have an object
of the right type.
2024-10-04 22:29:13 +02:00
Samuel Tardieu
3f2ef2ee04 style: add semicolon at the end of expressions used as statements 2024-10-04 22:29:13 +02:00
Samuel Tardieu
46e2723464 style: inline variables into format strings 2024-10-04 22:29:13 +02:00
Samuel Tardieu
62f582e6ab style: remove useless uses of .iter()
Most collection references implement `.into_iter()` or its mutable version,
so it is possible to iterate over the elements without using an explicit
method to do so.
2024-10-04 22:29:13 +02:00
Samuel Tardieu
3f0703ca2c cargo: inherit lints configuration from workspace 2024-10-04 22:29:13 +02:00
Samuel Tardieu
8ba33b7383 lib: add short method summary to its documentation 2024-10-04 17:09:54 +02:00
Samuel Tardieu
87840a5c2c testutils: add short method summary to its documentation 2024-10-04 17:09:54 +02:00
Samuel Tardieu
ac95a86584 proc-macros: add short method summary to its documentation 2024-10-04 17:09:54 +02:00
Matt Stark
6878b5047b Fix: Disallow revset function names starting with a number. 2024-10-04 15:25:11 +10:00
Yuya Nishihara
4f8ae69367 diff: keep absolute ranges to support unchanged regions of different lengths
We have two options to achieve "diff --ignore-*-space":
 a. preprocess contents to be diffed, then translate hunk ranges back
 b. add hooks to customize eq and hash functions
I originally thought (a) would be easier, but actually, there aren't many
changes needed to implement (b). And (b) should have a fewer logic errors.

This patch removes assumption that each unchanged region has the same content
length. It won't be true if whitespace characters are ignored.
2024-10-01 21:24:02 +09:00
Yuya Nishihara
e28fdac48e diff: extract helper that returns texts between two UnchangedRanges 2024-10-01 21:24:02 +09:00
Yuya Nishihara
055f15b6a8 diff: don't destructure UnchangedRange in a loop, rename base_range field
I'll replace offsets with absolute [Range, ..] to support unchanged regions of
different lengths. All fields in UnchangedRange will be ranges.
2024-10-01 21:24:02 +09:00
Yuya Nishihara
db226d9f64 revset: fix crash on "log --at-op 00000000 -r 'root()'"
Spotted while refactoring IdPrefixContext.
2024-10-01 21:23:47 +09:00
Yuya Nishihara
0ac6df7073 revset: make present(unknown@) recover from missing working copy error
Missing working-copy commit is similar situation to unknown ref, and should
be caught by present().
2024-10-01 20:04:06 +09:00
Yuya Nishihara
68176d965e diff: do not translate word-range indices by collect_unchanged_ranges()
Intersection of unchanged ranges becomes a simple merge-join loop, so I've
removed the existing tests. I also added a fast path for the common 2-way
diffs in which we don't have to build vec![(pos, vec![pos])].

One source of confusion introduced by this change is that WordPosition means
both global and local indices. This is covered by the added tests, but I might
add separate local/global types later.
2024-10-01 06:31:22 +09:00
Yuya Nishihara
483db9d7d2 diff: reuse result vec when recurse into unchanged_ranges()
It's silly that we build new Vec for each recursion stack and merge elements
back. I don't see a measurable performance difference in the diff bench, but
this change will help simplify the next patch. If a result vec were created for
each unchanged_ranges() invocation, it would probably make more sense to return
a list of "local" word positions. Then, callers would have to translate the
returned positions to the caller's local positions.
2024-10-01 06:31:22 +09:00
Yuya Nishihara
f6277bbdb8 diff: remove redundant check for empty ranges from unchanged_ranges() recursion
It's cheap to create an empty Vec, and I'm going to remove it anyway.
2024-10-01 06:31:22 +09:00
Yuya Nishihara
64e1ae277d diff: remove redundant hash map lookup of uncommon shared words 2024-09-28 07:49:28 +09:00
Yuya Nishihara
5c52b4ec13 diff: omit construction of count-to-words map for right-side histogram
This also allows us to borrow Vec<WordPositions> from &self.
2024-09-28 07:49:28 +09:00
Yuya Nishihara
493f610fd5 diff: build left-right index map without using (word, occurrence) hash keys
We can assign a unique integer to each (word, occurrence) pair instead. As a
bonus, HashMap can be replaced with Vec.

```
group                             new                     old
-----                             ---                     ---
bench_diff_git_git_read_tree_c    1.00     72.5±3.25µs    1.08     78.5±0.48µs
bench_diff_lines/modified/10k     1.00     45.1±1.18ms    1.10     49.8±1.85ms
bench_diff_lines/modified/1k      1.00      4.1±0.07ms    1.11      4.5±0.34ms
bench_diff_lines/reversed/10k     1.00     19.0±0.12ms    1.12     21.2±1.26ms
bench_diff_lines/reversed/1k      1.00   558.5±37.42µs    1.17   655.6±16.27µs
bench_diff_lines/unchanged/10k    1.00      5.3±0.78ms    1.33      7.0±0.89ms
bench_diff_lines/unchanged/1k     1.00   422.0±16.68µs    1.28   540.7±13.96µs
```
2024-09-28 07:49:28 +09:00
Yuya Nishihara
6cc76ba543 revset: fix copy-paste error in conflict() deprecation message 2024-09-25 16:26:49 +09:00
Yuya Nishihara
dd93e8f60b diff: introduce newtype that represents word-range index
There are usize text indices/ranges and word-range indices. Let's make them
somewhat distinct.
2024-09-25 07:39:41 +09:00
Yuya Nishihara
739a5d8617 diff: pack (text, ranges) pair in a struct
I'll add a few more helper methods there. It might also make sense to cache
precomputed hash values.

unchanged_ranges() is made private since there are no external callers, and
I'm going to add more private types.
2024-09-25 07:39:41 +09:00
Yuya Nishihara
1b469321e2 diff: sort word occurrences only by positions
Since uncommon_shared_words are unique, their occurrence positions should also
be unique.
2024-09-25 07:39:41 +09:00
Yuya Nishihara
5842267c73 diff: use iter::zip() instead of slice indexing 2024-09-25 07:39:41 +09:00
Essien Ita Essien
895d53f395 Rename conflict and file revsets to conflicts and files.
See discussion thread in linked issue.

With this PR, all revset functions in [BUILTIN_FUNCTION_MAP](8d166c7642/lib/src/revset.rs (L570))
that return multiple values are either named in plural or the naming is hard to misunderstand (e.g. `reachable`)

Fixes: #4122
2024-09-24 20:02:49 +01:00
Samuel Tardieu
90280ad2fd repo: introduce MutableRepo::reparent_descendants() 2024-09-24 09:30:28 +02:00
Samuel Tardieu
736163c8d3 repo: add MutableRepo::rebase_descendants documentation 2024-09-24 09:30:28 +02:00
Yuya Nishihara
49e45cc245 revset, templater: add deprecation warnings 2024-09-23 07:07:07 +09:00
Yuya Nishihara
8b1760ca5d revset: pass diagnostics receiver around
Stacking at AliasExpanded node looks wonky. If we migrate error handling to
Diagnostics API, it might make sense to remove AliasExpanded node and add
node.aliases: vec![(id, span), ..] field instead.

Some closure arguments are inlined in order to help type inference.
2024-09-23 07:07:07 +09:00
Yuya Nishihara
4b477fa59e fileset: pass diagnostics receiver around, add printing function
CLI tests will be added later.
2024-09-23 07:07:07 +09:00
Yuya Nishihara
df8967970e revset, templater: make context message of nested errors less specific
So that these error variants can be reused as warning contexts.
2024-09-23 07:07:07 +09:00
Yuya Nishihara
6b92305102 dsl_util: add basic diagnostics receiver
This object will be passed around AST processing functions. It's basically a
Vec<ParseError>.
2024-09-23 07:07:07 +09:00
Mateusz Mikuła
8dd3003bec refactor: mark Timestamp struct as Copy 2024-09-22 16:23:53 +02:00
Yuya Nishihara
c02fd3103c op_heads_store: don't test "no heads" error without acquiring lock
I recently got random test_commit_parallel*() failures, and this patch appears
to fix the problem.

SimpleOpHeadsStore::update_op_heads() adds new_id file and removes old_ids
files in that order. It ensures that there exists at least one id file, but it
doesn't mean readdir() can observe the added id file.

 1. process A: get_op_heads() -> opendir()
 2. process B: update_op_heads() -> add_op_head(new_id), remove_op_head(old_id)
 3. process A:                -> readdir() (can miss new_id)

update_op_heads() could do rename(old_ids[0], new_id), but I don't remember if
readdir() can always pick up a renamed entry.
2024-09-21 11:24:00 +09:00
Kevin Liao
412ef36259 cli: Support renaming workspaces
fixes #4342
2024-09-16 19:35:36 -07:00
Yuya Nishihara
a684076f16 hex_util: simplify common_hex_len() a bit to compare input bytes once
I think it's slightly easier to follow if we calculate a diff of input bits
first. I don't know which one is faster, but I assume compiler can optimize to
similar instructions.
2024-09-17 07:02:01 +09:00
Samuel Tardieu
56dbbb8fc6 lib: optimize common prefix computation of two hex strings
Comparing each byte before comparing the nibbles is more efficient. A
benchmark comparing the old and new implementations with various common
prefix lengths shows:

```
Common hex len/old/3    time:   [7.5444 ns 7.5807 ns 7.6140 ns]
Common hex len/new/3    time:   [1.2100 ns 1.2144 ns 1.2192 ns]
Common hex len/old/6    time:   [11.849 ns 11.879 ns 11.910 ns]
Common hex len/new/6    time:   [1.9950 ns 2.0046 ns 2.0156 ns]
Common hex len/old/32   time:   [63.030 ns 63.345 ns 63.718 ns]
Common hex len/new/32   time:   [6.4647 ns 6.4800 ns 6.4999 ns]
```
2024-09-15 18:32:28 +02:00
Yuya Nishihara
c6ee6130da revset: use generic GraphEdge type in default graph iterator 2024-09-15 07:06:47 +09:00
Yuya Nishihara
bea013acd6 id_prefix: fix crash on hidden change id disambiguation
The short-prefixes revset may contain remote_branches() for example.

Fixes #4446
2024-09-13 19:32:53 +09:00
Martin von Zweigbergk
63e616c801 git: restore support for git.push-branch-prefix config but deprecate it 2024-09-12 23:28:30 -07:00
Martin von Zweigbergk
8d4445d5d1 bookmarks: rename proto symbols from "branch"
Proto fields are identified by the tag (and the message names are not
used), so it's safe to rename them.
2024-09-11 20:49:50 -07:00
Martin von Zweigbergk
1aa2aec141 bookmarks: update some leftover uses of the word "branch" 2024-09-11 19:19:31 -07:00
Philip Metzger
d9c68e08b1 everything: Rename branches to bookmarks
Jujutsu's branches do not behave like Git branches, which is a major
hurdle for people adopting it from Git. They rather behave like
Mercurial's (hg) bookmarks. 

We've had multiple discussions about it in the last ~1.5 years about this rename in the Discord, 
where multiple people agreed that this _false_ familiarity does not help anyone. Initially we were 
reluctant to do it but overtime, more and more users agreed that `bookmark` was a better for name 
the current mechanism. This may be hard break for current `jj branch` users, but it will immensly 
help Jujutsu's future, by defining it as our first own term. The `[experimental-moving-branches]` 
config option is currently left alone, to force not another large config update for
users, since the last time this happened was when `jj log -T show` was removed, which immediately 
resulted in breaking users and introduced soft deprecations.

This name change will also make it easier to introduce Topics (#3402) as _topological branches_ 
with a easier model. 

This was mostly done via LSP, ripgrep and sed and a whole bunch of manual changes either from
me being lazy or thankfully pointed out by reviewers.
2024-09-11 18:54:45 +02:00