Commit graph

2647 commits

Author SHA1 Message Date
Yuya Nishihara
0c0eb37f2e index: don't store commit ids in sorted lookup table to save disk space
This reduces the index file size. In my linux mirror repo containing 1591524
commits, the initial index file shrank from 122MB to 92MB. In theory, this
makes commit id lookup slow because of additional indirection and cache miss,
but I don't see significant difference. In mid-size repo, this is actually a
bit faster thanks to smaller index reads.

Alternatively, the commit id field could be removed from the CommitGraphEntry,
but doing that would introduce indirect lookup there, and the index disk size
isn't as small as this change.

- jj-0 baseline                         122MB
- jj-1 shrink CommitLookupEntry (this)   92MB
- jj-3 shrink CommitGraphEntry           98MB

Mid-size repo, "log" with default template
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2,jj-3 \
  -s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
  "target/release-with-debug/{bin} -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     177.7 ms ±  12.9 ms    [User: 96.3 ms, System: 81.5 ms]
  Range (min … max):   156.8 ms … 191.2 ms    20 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     169.8 ms ±  13.8 ms    [User: 93.3 ms, System: 76.6 ms]
  Range (min … max):   151.1 ms … 191.5 ms    20 runs

Benchmark 4: target/release-with-debug/jj-3 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     170.3 ms ±  13.4 ms    [User: 90.1 ms, System: 79.7 ms]
  Range (min … max):   154.8 ms … 186.2 ms    20 runs

Relative speed comparison
        1.05 ±  0.11  target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
        1.00          target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
        1.00 ±  0.11  target/release-with-debug/jj-3 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
```

Small repo, "log" thousands of commits with -T"commit_id.shortest()"
```
% hyperfine --sort command --warmup 3 --runs 100 -L bin jj-0,jj-1,jj-2,jj-3 \
  -s "target/release-with-debug/{bin} -R ~/mirrors/git debug reindex" \
  "target/release-with-debug/{bin} -R ~/mirrors/git --ignore-working-copy log -r.. -l5000 -T'commit_id.shortest()' --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/git --ignore-working-copy log -r.. -l5000 -T'commit_id.shortest()' --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     179.3 ms ±  12.8 ms    [User: 149.7 ms, System: 29.6 ms]
  Range (min … max):   155.2 ms … 191.0 ms    100 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/git --ignore-working-copy log -r.. -l5000 -T'commit_id.shortest()' --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     179.1 ms ±  13.7 ms    [User: 148.5 ms, System: 30.5 ms]
  Range (min … max):   157.2 ms … 196.7 ms    100 runs

Benchmark 4: target/release-with-debug/jj-3 -R ~/mirrors/git --ignore-working-copy log -r.. -l5000 -T'commit_id.shortest()' --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     178.2 ms ±  13.6 ms    [User: 148.7 ms, System: 29.6 ms]
  Range (min … max):   156.5 ms … 191.7 ms    100 runs

Relative speed comparison
        1.01 ±  0.11  target/release-with-debug/jj-0 -R ~/mirrors/git --ignore-working-copy log -r.. -l5000 -T'commit_id.shortest()' --config-toml='revsets.short-prefixes=""'
        1.01 ±  0.11  target/release-with-debug/jj-1 -R ~/mirrors/git --ignore-working-copy log -r.. -l5000 -T'commit_id.shortest()' --config-toml='revsets.short-prefixes=""'
        1.01 ±  0.11  target/release-with-debug/jj-3 -R ~/mirrors/git --ignore-working-copy log -r.. -l5000 -T'commit_id.shortest()' --config-toml='revsets.short-prefixes=""'
```
2024-02-19 11:36:45 +09:00
Vladimir Petrzhikovskii
06d67f02d8 cli: list new remote branches during git fetch 2024-02-18 17:36:01 +01:00
Yuya Nishihara
a1b16c5583 index: build reachable change ids set lazily
Instead of abstracting RevWalk over borrowed/Arc-ed index types, I decided to
implement bitset-based ancestor traversal. It's simpler and probably faster so
long as the set isn't sparse.

"jj log" without working copy snapshot:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
  -s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
  "target/release-with-debug/{bin} -R ~/mirrors/linux \
   --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     271.3 ms ±   9.9 ms    [User: 183.8 ms, System: 87.7 ms]
  Range (min … max):   250.5 ms … 282.7 ms    20 runs

Benchmark 3: target/release-with-debug/jj-2 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     177.5 ms ±  12.6 ms    [User: 94.6 ms, System: 82.9 ms]
  Range (min … max):   154.4 ms … 188.7 ms    20 runs

Relative speed comparison
        1.53 ±  0.12  target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
        1.00          target/release-with-debug/jj-2 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
```

"jj status" with working copy snapshot (watchman enabled):
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
  -s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
  "target/release-with-debug/{bin} -R ~/mirrors/linux \
   status --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     318.6 ms ±  12.6 ms    [User: 219.1 ms, System: 94.1 ms]
  Range (min … max):   294.2 ms … 333.0 ms    20 runs

Benchmark 3: target/release-with-debug/jj-2 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     214.7 ms ±  15.0 ms    [User: 117.4 ms, System: 96.1 ms]
  Range (min … max):   198.4 ms … 243.3 ms    20 runs

Relative speed comparison
        1.48 ±  0.12  target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
        1.00          target/release-with-debug/jj-2 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
```
2024-02-19 00:54:43 +09:00
Yuya Nishihara
adcb01ef95 index: move RevWalk tests to inner module
The main tests module is getting bigger, and these tests are very specific
to the RevWalk* implementations.
2024-02-19 00:54:43 +09:00
Yuya Nishihara
924a5fc842 index: inline entry size calculation
There aren't many callers now, and using self.commit_id_length might help
compiler remove redundant bounds checking in CommitLookupEntry.
2024-02-19 00:47:46 +09:00
Yuya Nishihara
d5c75da4f5 index: precompute base data offsets
These offsets are getting messier, so let's calculate them in one place. This
will probably help compiler optimization.
2024-02-19 00:47:46 +09:00
Yuya Nishihara
3c7aa75b9b index: switch to persistent change id index
The shortest change id prefix will become a few digits longer, but I think
that's acceptable. Entries included in the "revsets.short-prefixes" set are
unaffected.

The reachable set is calculated eagerly, but this is still faster as we no
longer need to sort the reachable entries by change id. The lazy version will
save another ~100ms in mid-size repos.

"jj log" without working copy snapshot:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
  -s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
  "target/release-with-debug/{bin} -R ~/mirrors/linux \
   --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     353.6 ms ±  11.9 ms    [User: 266.7 ms, System: 87.0 ms]
  Range (min … max):   329.0 ms … 365.6 ms    20 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     271.3 ms ±   9.9 ms    [User: 183.8 ms, System: 87.7 ms]
  Range (min … max):   250.5 ms … 282.7 ms    20 runs

Relative speed comparison
        1.99 ±  0.16  target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
        1.53 ±  0.12  target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
```

"jj status" with working copy snapshot (watchman enabled):
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
  -s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
  "target/release-with-debug/{bin} -R ~/mirrors/linux \
   status --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     396.6 ms ±  10.1 ms    [User: 300.7 ms, System: 94.0 ms]
  Range (min … max):   373.6 ms … 408.0 ms    20 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     318.6 ms ±  12.6 ms    [User: 219.1 ms, System: 94.1 ms]
  Range (min … max):   294.2 ms … 333.0 ms    20 runs

Relative speed comparison
        1.85 ±  0.14  target/release-with-debug/jj-0 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
        1.48 ±  0.12  target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
```
2024-02-18 09:44:57 +09:00
Yuya Nishihara
5f3a31300b index: implement index-level change id lookup methods
These methods are basically the same as the commit_id versions, but
resolve_change_id_prefix() is a bit more involved as we need to gather matches
from multiple segments.
2024-02-18 09:44:57 +09:00
Yuya Nishihara
f73e590837 index: implement segment-level change id lookup methods
In resolve_change_id_prefix(), I've implemented two different ways of
collecting the overflow items. I don't think they impact the performance,
but we can switch to the alternative method as needed.
2024-02-18 09:44:57 +09:00
Yuya Nishihara
8cdf6d752c index: move change ids to sstable, build change-id-to-pos lookup table
This basically means that the change ids are interned. We'll implement binary
search over the sorted change ids table. The table could be sorted differently
for better cache locality, but it is in lexicographical order for simplicity.
With my testing, the cost of the id lookup isn't dominant.

Unlike the parent entries, the size of the per-id overflow items isn't saved.
That's s because the number of the same-change-id commits is either 1 or many.
It doesn't make sense to allocate 8 bytes for each change id. Instead, we'll
pay extra indirection cost to determine the size.
2024-02-18 09:44:57 +09:00
Yuya Nishihara
9974a46327 index: clarify parent entries are global positions
I'm going to add change id overflow table whose elements are of LocalPosition
type. Let's make sure that the serialization code would break if we changed
the underlying data type.
2024-02-18 09:44:57 +09:00
Thomas Castiglione
aaa5d6bc4f working_copy: add Send supertrait
If WorkingCopy: Send, then Workspace is Send, which is useful for long-running
servers. All existing impls are Send already, so this is just a marker.
2024-02-17 15:13:25 +08:00
Yuya Nishihara
5eea88d26a tests: fix concurrent git read/write test to retry on ref lock contention
Apparently, gix has 100ms timeout. Since this test tries to create contended
situation, it's possible that the ref lock can't be acquired. I've added
upper bound to the retry loop at b37293fa68 "tests: add upper bound to
test_concurrent_read_write_commit() loop", so ignoring arbitrary errors
should be okay.

The problem can be reproduced on my Linux machine by inserting 10ms sleep() to
gix and increasing the concurrency.

Fixes #3069
2024-02-17 15:09:27 +09:00
Yuya Nishihara
ce295f8bc2 op_store: remove unneeded repr(u8) from RemoteRefState
It no longer makes sense after e1fd402d39 "Fix the ContentHash implementations
for std::Option, MergedTreeId, and RemoteRefState."
2024-02-17 02:13:44 +09:00
Yuya Nishihara
718d080e7a index: make reindexing message less scary 2024-02-17 01:45:23 +09:00
Evan Mesterhazy
a80d0183a2 Implement ContentHash for u32 and u64
This is for completeness and to avoid accidents such as someone calling
`ContentHash::hash(1234u32.to_le_bytes())` and expecting it to hash properly as
a u32 instead of a 4 byte slice, which produces a different hash due to hashing
the length of the slice before its contents.
2024-02-16 10:23:39 -05:00
Evan Mesterhazy
e1fd402d39 Fix the ContentHash implementations for std::Option, MergedTreeId, and RemoteRefState
The `ContentHash` documentation specifies that implementations for enums should
hash the ordinal number of the variant contained in the enum as a 32-bit
little-endian number and then hash the contents of the variant, if any.

The current implementations for `std::Option`, `MergedTreeId`, and
`RemoteRefState` are non-conformant since they hash the ordinal number as a u8
with platform specific endianness.


Fixes #3051
2024-02-16 09:27:32 -05:00
Yuya Nishihara
903f18acfd index: extract helper functions for id lookup in mutable table
Similar to the previous commit, these functions will be reused by the change id
lookup methods. The return value isn't cloned because resolve_id_prefix() will
return (key, value) pair, and the current caller doesn't need a cloned value.
2024-02-16 11:12:53 +09:00
Yuya Nishihara
000cb41c7e index: extract helper struct for post processing binary search result
This code will be shared among commit id and change id lookup functions.
2024-02-16 11:12:53 +09:00
Yuya Nishihara
6fa660d9a8 index: extract inner binary search function
The callback returns Ordering instead of &[u8] due to lifetime difficulty.
2024-02-16 11:12:53 +09:00
Yuya Nishihara
2e64bf83fd index: pass bytes prefix to binary search function
This helps extract common binary search helper to be used by change id index.
2024-02-14 23:34:47 +09:00
Yuya Nishihara
91a68b950d index: adjust binary search function to conform to std behavior
This removes redundant case from resolve_neighbor_commit_ids(). The returned
position should never be lower than the prefix id.

The implementation is basically a copy of slice::binary_search_by(). We still
use (low + high) / 2 as the size wouldn't exceed 2^31.

https://github.com/rust-lang/rust/blob/1.76.0/library/core/src/slice/mod.rs#L2825
2024-02-14 23:34:47 +09:00
Yuya Nishihara
e2c8a8fabd index: fix change id resolution test to not depend on deterministic order
Since IdIndex sorts the entries by using .sort_unstable_by_key(), the order of
the same-key elements is undefined. Perhaps, it's stable for short arrays, and
the test passes because of that.
2024-02-14 23:22:23 +09:00
Yuya Nishihara
8b1dfa7157 index: compact parent encoding, inline up to two parents
This saves 4 more bytes per entry, and more importantly, most commit parents
can be resolved with no indirection to the overflow table.

IIRC, Git always inlines the first parent, but that wouldn't be useful in jj
since jj diffs merge commit against the auto-merge parent. The first merge
parent is nothing special.

I'll use a similar encoding in change id sstable, where only one position
will be inlined (to optimize for imported commits.)

Benchmark number measuring the cost of change id index building:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1 \
  -s "target/release-with-debug/{bin} -R ~/mirrors/linux \
      --ignore-working-copy debug reindex" \
  "target/release-with-debug/{bin} -R ~/mirrors/linux \
    --ignore-working-copy log -r@ --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r@ --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     342.9 ms ±  14.5 ms    [User: 202.4 ms, System: 140.6 ms]
  Range (min … max):   326.6 ms … 360.6 ms    20 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r@ --config-toml='revsets.short-prefixes=""'
  Time (mean ± σ):     325.0 ms ±  13.6 ms    [User: 196.2 ms, System: 128.8 ms]
  Range (min … max):   311.6 ms … 343.2 ms    20 runs

Relative speed comparison
        1.06 ±  0.06  target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r@ --config-toml='revsets.short-prefixes=""'
        1.00          target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r@ --config-toml='revsets.short-prefixes=""'
```
2024-02-14 22:33:48 +09:00
Yuya Nishihara
89928ffdd8 index: remove local-global pos round trip from entry_by_id() 2024-02-14 22:33:48 +09:00
Yuya Nishihara
249449ff1a index: store local position in lookup table 2024-02-14 22:33:48 +09:00
Yuya Nishihara
1d11cffcfa index: use local position in segment-local operations
I'm going to change the index format to store local positions in the lookup
table. That's not super important, but I think it makes sense because the
lookup table should never contain inter-segment links.

The mutable segment now stores local positions in its lookup map. The readonly
segment will be updated later.
2024-02-14 22:33:48 +09:00
Yuya Nishihara
3e43108abb index: remove unused flag field from readonly index segment
This is remainder of fdb861b957 "backend: remove unused Commit::is_pruned."
As I'm going to change the index format, let's remove unused fields, too.
2024-02-14 22:33:48 +09:00
Yuya Nishihara
26528091e6 revset: drop now unused is_legacy flag from dag ranges 2024-02-14 10:04:56 +09:00
Yuya Nishihara
815670a4ad revset: add parsing rule and expression node dedicated for kind:"pattern"
This unblocks removal of 'is_legacy: bool' fields.

Note that all legacy dag range expressions can't be accepted by the new grammar.
For example, 'x:y()' is parsed as ('x:y', error) because 'x:y' is a valid string
pattern expression, and '(' isn't an infix operator. The old compat_dag_range_op
is NOT removed as it can still translate 'x():y' or 'x:(y)' to a better error,
and we might make the string pattern syntax stricter #2101.
2024-02-14 10:04:56 +09:00
Yuya Nishihara
815437598f revset: disable parsing rules of legacy dag range operator
The legacy parsing rules are turned into compatibility errors. The x:y rule
is temporarily enabled when parsing string patterns. It's weird, but we can't
isolate the parsing function because a string pattern may be defined in an
alias.
2024-02-14 10:04:56 +09:00
Yuya Nishihara
2905a70b18 doc, tests: drop use of deprecated revset dag range operator 2024-02-14 10:04:56 +09:00
Yuya Nishihara
1f6d1de62d index: on reindexing, print error details to stderr
It's not ideal to print the error there, but using stderr should be slightly
better. It could be a tracing message, but tracing won't be displayed by
default.
2024-02-12 19:38:36 +09:00
Yuya Nishihara
b0e8e2a1af index: move segment files to sub directory, add version number
I'm going to introduce breaking changes in index format. Some of them will
affect the file size, so version number or signature won't be needed. However,
I think it's safer to detect the format change as early as possible.

I have no idea if embedded version number is the best way. Because segment
files are looked up through the operation links, the version number could be
stored there and/or the "segments" directory could be versioned. If we want to
support multiple format versions and clients, it might be better to split the
tables into data chunks (e.g. graph entries, commit id table, change id table),
and add per-chunk version/type tag. I choose the per-file version just because
it's simple and would be non-controversial.

As I'm going to introduce format change pretty soon, this patch doesn't
implement data migration. The existing index files will be deleted and new
files will be created from scratch.

Planned index format changes include:
 1. remove unused "flags" field
 2. inline commit parents up to two
 3. add sorted change ids table
2024-02-12 19:38:36 +09:00
Yuya Nishihara
4b541e6c93 index: on reinit(), don't remove "operations" directory itself
This should be slightly safer as the store may be accessed concurrently from
another process.
2024-02-12 19:38:36 +09:00
Yuya Nishihara
81837897dc index: extract dir.join("operations") to private method 2024-02-12 19:38:36 +09:00
Martin von Zweigbergk
48a9f9ef56 repo: use Transaction for creating repo-init operation
Since the operation log has a root operation, we don't need to create
the repo-initialization operation in order to create a valid
`ReadonlyRepo` instance. I think it's conceptually simpler to create
the instance at the root operation id and then add the initial
operation using the usual `Transaction` API. That's what this patch
does.

Doing that also brought two issues to light:

 1. The empty view object doesn't have the root commit as head.
 2. The initialized `OpHeadsStore` doesn't have the root operation as
     head.

Both of those seem somewhat reasonable, but maybe we should change
them. For now, I just made the initial repo (before the initial
operation) have a single op head (to compensate for (2)). It might be
worth addressing both issues so the repo is in a better state before
we create the initial operation. Until we do, we probably shouldn't
drop the initial operation.
2024-02-11 21:19:30 -08:00
Martin von Zweigbergk
305a507ae3 repo: move creation of repo-init operation to end of init()
Since we now have a root operation, we don't need the
repo-initialization operation to create the repo. Let's move it later
to clarify that.
2024-02-11 21:19:30 -08:00
Ilya Grigoriev
a9c3af8153 test_local_working_copy: use std::fs:write instead of OpenOptions 2024-02-10 16:06:28 -08:00
Ilya Grigoriev
b2e37d448b clippy: add truncate option as suggested by clippy
In the next commit, I replace the whole thing with
std::fs::write, but I'll leave this here in case
the next commit is somhow incorrect
2024-02-10 16:06:28 -08:00
Ilya Grigoriev
a88c06068e clippy: new nightly fixes
For some reason, clippy also suggested surrounding
`self.value` with parentheses. Not sure whether
that's a clippy bug.

Cc: https://github.com/rust-lang/rust-clippy/issues/12268
2024-02-10 16:06:28 -08:00
dependabot[bot]
6d1faf9b03 Update strsim (changes tests), clap, clap_complete
This is #3002 with tests rerun to account for changes
to `strsim`, as @thoughtpolice noticed in
https://github.com/martinvonz/jj/pull/3002#issuecomment-1936763101

The string similarity changes include an example that
seems better and one that seems worse. Decreasing
the threshold definitely makes things worse.
2024-02-10 00:01:47 -08:00
Yuya Nishihara
e908bd9a17 simple_op_store: use TryFrom<i32> instead of deprecated from_i32() 2024-02-10 09:15:30 +09:00
Yuya Nishihara
421ab592be cargo: bump gix to 0.58.0, migrate to ObjectId::try_from()
The panicking conversion function appears to be renamed, and try_from() is
added instead.
2024-02-10 09:15:30 +09:00
Austin Seipp
5b517b542e rust: bump MSRV to 1.76.0
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2024-02-09 15:48:01 -06:00
Martin von Zweigbergk
6c1aeff7a9 working copy: materialize symlinks on Windows as regular files
I was a bit surprised to learn (or be reminded?) that checking out
symlinks on Windows leads to a panic. This patch fixes the crash by
materializing symlinks from the repo as regular files. It also updates
the snapshotting code so we preserve the symlink-ness of a path. The
user can update the symlink in the repo by updating the regular file
in the working copy. This seems to match Git's behavior on Windows
when symlinks are disabled.
2024-02-09 09:20:24 -08:00
Martin von Zweigbergk
b253a28788 merge: add as_normal(), taken from RefTarget
The `RefTarget::as_normal()` function is not specific to `RefTarget`,
and I plan to use it from `local_working_copy`.
2024-02-09 09:20:24 -08:00
Martin von Zweigbergk
5a898b16a8 working_copy: handle symlink outside write_path_to_store()
The `write_path_to_store()` has almost no overlapping code between the
handling of symlinks and regular files, which suggests that we should
move out the handling of symlinks to the caller (there's only one).
2024-02-09 09:20:24 -08:00
Jonathan Tan
33f3a420a1 workspace: recover from missing operation
If the operation corresponding to a workspace is missing for some reason
(the specific situation in the test in this commit is that an operation
was abandoned and garbage-collected from another workspace), currently,
jj fails with a 255 error code. Teach jj a way to recover from this
situation.

When jj detects such a situation, it prints a message and stops
operation, similar to when a workspace is stale. The message tells the
user what command to run.

When that command is run, jj loads the repo at the @ operation (instead
of the operation of the workspace), creates a new commit on the @
commit with an empty tree, and then proceeds as usual - in particular,
including the auto-snapshotting of the working tree, which creates
another commit that obsoletes the newly created commit.

There are several design points I considered.

1) Whether the recovery should be automatic, or (as in this commit)
manual in that the user should be prompted to run a command. The user
might prefer to recover in another way (e.g. by simply deleting the
workspace) and this situation is (hopefully) rare enough that I think
it's better to prompt the user.

2) Which command the user should be prompted to run (and thus, which
command should be taught to perform the recovery). I chose "workspace
update-stale" because the circumstances are very similar to it: it's
symptom is that the regular jj operation is blocked somewhere at the
beginning, and "workspace update-stale" already does some special work
before the blockage (this commit adds more of such special work). But it
might be better for something more explicitly named, or even a sequence
of commands (e.g. "create a new operation that becomes @ that no
workspace points to", "low-level command that makes a workspace point to
the operation @") but I can see how this can be unnecessarily confusing
for the user.

3) How we recover. I can think of several ways:
a) Always create a commit, and allow the automatic snapshotting to
create another commit that obsoletes this commit.
b) Create a commit but somehow teach the automatic snapshotting to
replace the created commit in-place (so it has no predecessor, as viewed
in "obslog").
c) Do either a) or b), with the added improvement that if there is no
diff between the newly created commit and the former @, to behave as if
no new commit was created (@ remains as the former @).
I chose a) since it was the simplest and most easily reasoned about,
which I think is the best way to go when recovering from a rare
situation.
2024-02-09 00:38:47 -08:00
Ilya Grigoriev
12c3be70f4 lib refs.rs: rename TrackingRefPair to LocalAndRemoteRef
As discussed in
https://github.com/martinvonz/jj/pull/2962#discussion_r1479384841, the
previous name is confusing since the struct is used for pairs where the
remote branch is not tracked by the local branch.
2024-02-07 17:06:28 -08:00
jyn
d66fcf2ca0 compile integration tests as a single binary
this greatly speeds up the time to run all tests, at the cost of slightly larger recompile times for individual tests.

this unfortunately adds the requirement that all tests are listed in `runner.rs` for the crate.
to avoid forgetting, i've added a new test that ensures the directory is in sync with the file.

 ## benchmarks

before this change, recompiling all tests took 32-50 seconds and running a single test took 3.5 seconds:

```
; hyperfine 'touch lib/src/lib.rs && cargo t --test test_working_copy'
  Time (mean ± σ):      3.543 s ±  0.168 s    [User: 2.597 s, System: 1.262 s]
  Range (min … max):    3.400 s …  3.847 s    10 runs
```

after this change, recompiling all tests take 4 seconds:
```
;  hyperfine 'touch lib/src/lib.rs ; cargo t --test runner --no-run'
  Time (mean ± σ):      4.055 s ±  0.123 s    [User: 3.591 s, System: 1.593 s]
  Range (min … max):    3.804 s …  4.159 s    10 runs
```
and running a single test takes about the same:
```
; hyperfine 'touch lib/src/lib.rs && cargo t --test runner -- test_working_copy'
  Time (mean ± σ):      4.129 s ±  0.120 s    [User: 3.636 s, System: 1.593 s]
  Range (min … max):    3.933 s …  4.346 s    10 runs
```

about 1.4 seconds of that is the time for the runner, of which .4 is the time for the linker. so
there may be room for further improving the times.
2024-02-06 18:19:41 -08:00
Ilya Grigoriev
1741ab22e4 view.rs: clarify some internal function docstrings
Mostly, I was a bit confused that some of these functions return a
`TrackingRefPair` but don't seem to take into account whether the remote
branch is being tracked or not.
2024-02-06 17:52:01 -08:00
Martin von Zweigbergk
b343289238 working_copy: make reset() take a commit instead of a tree
Our virtual file system at Google (CitC) would like to know the commit
so it can scan backwards and find the closest mainline tree based on
it. Since we always record an operation id (which resolves to a
working-copy commit) when we write the working-copy state, it doesn't
seem like a restriction to require a commit.
2024-02-06 12:41:09 -08:00
Yuya Nishihara
77ceadbfd0 cleanup: remove remaining ": {source}" from error message templates 2024-02-04 09:13:21 +09:00
Yuya Nishihara
1efadd96c8 git: remove ": {source}" from FailedRefExportReason, walk chain by caller
The error output gets more verbose because all gix error sources are printed.
Maybe we'll need a better formatting, but changing to multi-line output doesn't
look nice either.
2024-02-04 09:13:21 +09:00
Yuya Nishihara
a0cefb8b7b revset, template: remove ": {source}" from parse error message template
These error types are special because the message is embedded in ASCII art. I
think it would be a source of bugs if some error types had ": {source}" but
others don't. So I'm going to remove all ": {source}"s, and let the callers
concatenate them when needed.
2024-02-04 09:13:21 +09:00
Ilya Grigoriev
d439de073d rewrite.rs: revert commits cfcc7c5e and becbc889
This mostly reverts https://github.com/martinvonz/jj/pull/2901 as well as its
fixup https://github.com/martinvonz/jj/pull/2903. The related bug is reopened,
see https://github.com/martinvonz/jj/issues/2869#issuecomment-1920367932.

The problem is that while the fix did fix #2869 in most cases, it did
reintroduce the more severe bug https://github.com/martinvonz/jj/issues/2760
in one case, if the working copy is the commit being rebased.

For example, suppose you have the tree

```
root -> A -> B -> @ (empty) -> C
```

### Before this commit

#### Case 1

`jj rebase -s B -d root --skip-empty` would work perfectly before this
commit, resulting in

```
root -> A
  \-------B -> C
           \- @ (new, empty)
```

#### Case 2

Unfortunately, if you run `jj rebase -s @ -d A --skip-empty`, you'd have the
following result (before this commit), which shows the reintroduction of #2760:

```
root -> A @ -> C
         \-- B
```

with the working copy at `A`. The reason for this is explained in
https://github.com/martinvonz/jj/pull/2901#issuecomment-1920043560.

### After this commit

After this commit, both case 1 and case 2 will be wrong in the sense of #2869,
but it will no longer exhibit the worse bug #2760 in the second case.

Case 1 would result in:

```
root -> A
  \-------B -> @ (empty) -> C
```

Case 2 would result in:

```
root -> A -> @ -> C
         \-- B
```

with the working copy remaining a descendant of A
2024-02-03 15:56:44 -08:00
Essien Ita Essien
8423c63a04 cli: Refactor workspace root directory creation
* Add file_util::create_or_reuse_dir() which is needed by all init
  functionality regardless of the backend.
2024-02-03 14:15:05 +00:00
Yuya Nishihara
ec0f2753ae repo: mark inner error of EditCommitError as source 2024-02-01 16:59:44 +09:00
Martin von Zweigbergk
7c87fe243c backends: implement as_any() on OpStore and OpHeadsStore too
It's useful for custom commands to be able to downcast to custom
backend types.
2024-01-31 00:15:29 -08:00
Ilya Grigoriev
cfcc7c5e34 test_rewrite: Fixup test comment after becbc88 2024-01-30 23:43:05 -08:00
Martin von Zweigbergk
9efa66e8c9 rewrite: remove return value from rebase_next()
`rebase_next()` returns an `Option<RebasedDescendant>`, but the only
way we use it is to decide whether to terminate the loop over
`to_visit`. Let's simplify by making the caller iterate over
`to_visit` instead.
2024-01-30 23:27:48 -08:00
Martin von Zweigbergk
881d75e899 rewrite: drop TODO about changing the API
The `rebase_next()` method is private, so I think we've addressed the
TODO.
2024-01-30 23:27:48 -08:00
Ilya Grigoriev
becbc88915 rewrite.rs: fix working copy position after jj rebase --abandon-empty
Fixes #2869
2024-01-30 22:53:55 -08:00
Ilya Grigoriev
1fff6e37a1 rewrite.rs DescendantRebaser: rename variable for clarity
The `edit` argument seems to be true if and only if the
old commit was *not* abandoned. So, I flipped its value
and renamed it to `abandoned_old_commit`.
2024-01-30 22:53:55 -08:00
Yuya Nishihara
976b801208 index: on reinit(), delete all segment files to save disk space
Perhaps, reinit() will evolve to gc() function? It's basically a gc() with
empty operation set.
2024-01-31 09:40:52 +09:00
Yuya Nishihara
3d68601c01 index: remove redundant stat() of operation link file, handle error instead
This wouldn't matter in practice, but the operation link file could be deleted
after testing the existence.
2024-01-31 09:40:52 +09:00
Yuya Nishihara
3d0b3d57d8 git_backend: on gc(), remove unreachable no-gc refs and compact them
With my jj repo, the number of jj/keep refs went down from 87887 to 27733.
The .git directory size is halved, but we'll need to clean up extra and index
files to save disk space. "git gc --prune=now && jj debug reindex" passed, so
the repo wouldn't be corrupted.

#12
2024-01-27 10:18:11 +09:00
Yuya Nishihara
351487b9f5 backend: pass Index and keep_newer timestamp parameters to gc()
GitBackend::gc() will need to check if a commit is reachable from any
historical operations. This could be calculated from the view and commit
objects, but the Index will do a better job.
2024-01-27 10:18:11 +09:00
Yuya Nishihara
845eb4ce01 git_backend: when running "git gc", chdir instead of specifying it by GIT_DIR
Hopefully this will be more reliable on Windows where path/environment stuff
is messy.
2024-01-27 10:18:11 +09:00
Yuya Nishihara
4e54021930 backend: have gc() return BackendError instead of opaque error type
The gc() implementation is likely to call other backend functions, which
return BackendError.
2024-01-27 10:18:11 +09:00
Yuya Nishihara
84949dd551 backend: mark BackendError::Other as transparent
The inner error should be the source, and I don't think the "Error:" prefix
gives additional context.
2024-01-27 10:18:11 +09:00
Yuya Nishihara
8a67191d25 git: simplify import_head() as it doesn't have to process multiple head commits 2024-01-27 00:01:59 +09:00
Yuya Nishihara
fc114ef217 git: extract Git HEAD handling bits from import_some_refs()
I'm going to make WorkspaceCommandHelper::maybe_snapshot() snapshot the working
copy before importing refs. git::import_some_refs() can rebase the working copy
branch and therefore @ can be moved. git::import_head() doesn't, and it should
be invoked before snapshotting.

git::import_head() is inserted to some of the git:import_refs() callers where
HEAD seems to matter. I feel it's a bit odd that the HEAD ref is imported to
non-colocated repo, but "jj init --git-repo" relies on that, and I think the
existence of HEAD@git is harmless. It's merely a ref to the revision checked
out somewhere else.
2024-01-27 00:01:59 +09:00
Yuya Nishihara
5a88180720 git_backend: fix import_head_commits() to not issue duplicated ref edits
This was broken at afa72ff496 "git_backend: inline prevent_gc() to bulk-update
refs." Since no-gc refs are created within a transaction, duplicated edits are
no longer allowed.
2024-01-27 00:00:57 +09:00
Ilya Grigoriev
dff440c4a8 clippy: Fix nightly warnings about "useless use of vec!" 2024-01-25 22:00:26 -08:00
Daniel Ploch
20cbe77bf5 workspace: support creating shares of custom workspaces 2024-01-25 11:46:07 -08:00
Daniel Ploch
cb889f0b45 workspace: combine working copy functions into a trait 2024-01-25 11:46:07 -08:00
Yuya Nishihara
5a7d8ac596 working_copy: don't follow symlinks when visiting files in gitignored directory
Fixes #2878
2024-01-24 16:38:48 +09:00
Yuya Nishihara
d0d4496258 tests: add executable files and symlinks to gitignored directory test 2024-01-24 16:38:48 +09:00
Martin von Zweigbergk
502150b2f4 conflicts: test materialization with with negative snapshots
We didn't have any tests with negative snapshots (after a `-------`
line). I initially thought we couldn't produce such conflict markers
anymore. I'm not sure we want to render conflicts like the one in the
test like this. I don't think I intended for `add_index` in the code
to be able to be two steps ahead of the remove. Maybe we should
rewrite the algorithm to not do that and thus never produce negative
snapshots.
2024-01-23 07:18:54 -08:00
Ilya Grigoriev
d168fd2b09 test_rebase_abandoning_empty: add children of an empty @ to the test case
This demonstrates the minor bug discussed in
https://github.com/martinvonz/jj/pull/2766#discussion_r1442365389
AKA https://github.com/martinvonz/jj/issues/2869.

It's also interesting whether changing the definition of "discardable" commit
would affect this test, see
https://github.com/martinvonz/jj/issues/2859#issuecomment-1903275884

(I think it won't, but still)
2024-01-22 18:36:49 -08:00
Jonathan Tan
0bc1341fd0 revset: add count_estimate() to Revset trait
The count() function in this trait is used by "jj branch" to determine
(and then report) how many commits a certain branch is ahead/behind
another branch. This is currently implemented by walking all commits
in the revset, counting how many were encountered. But this could be
improved: if the number is large, it is probably sufficient to report
"at least N" (instead of walking all the way), and this does not scale
well to jj backends that may not have all commits present locally (which
may prefer to return an estimate, rather than access the network).

Therefore, add a function that is explicitly documented to be O(1)
and that can return a range of values if the backend so chooses.

Also remove count(), as it is not immediately obvious that it is an
expensive call, and callers that are willing to pay the cost can obtain
the exact same functionality through iter().count() anyway. (In this
commit, all users of count() are migrated to iter().count() to preserve
all existing functionality; they will be migrated to count_estimate() in
a subsequent commit.)

"branch" needed to be updated due to this change. Although jj
is currently only available in English, I have attempted to keep
user-visible text from being assembled piece by piece, so that if we
later decide to translate jj into other languages, things will be easier
for translators.
2024-01-22 15:07:00 -08:00
Yuya Nishihara
c7be4d019c index: add all_heads_for_gc() that iterates heads of all indexed commits
GitBackend::gc() will recreate no-gc refs for the indexed heads. We could
collect all historical heads by traversing operation log, but it isn't enough
because there may be predecessor links to hidden commits, and "git gc" isn't
aware of predecessors.
2024-01-17 23:07:14 +09:00
Yuya Nishihara
afa72ff496 git_backend: inline prevent_gc() to bulk-update refs 2024-01-17 10:43:25 +09:00
Yuya Nishihara
96ee9bdb9f git_backend: ensure no-gc refs are created for all imported head commits
This also means that we can implement GC without taking care of extra
metadata. I haven't tried, but it wouldn't be easy to keep Git refs and extra
table in sync.
2024-01-17 10:43:25 +09:00
Yuya Nishihara
2e1aa6c49c git_backend: remove fast path testing imported commits, filter them by caller
The idea is that GC, if implemented, will clean up objects based on the Index
knowledge. It's probably okay to leave some extra metadata of unreachable
objects, but GC-ed refs should be recreated if the corresponding heads get
reimported. See also the next patch.
2024-01-17 10:43:25 +09:00
Yuya Nishihara
48c4985e34 git_backend: ensure that no-gc ref target never conflicts 2024-01-17 10:43:25 +09:00
Yuya Nishihara
f66c859fe4 git_backend: use lower-level API to create no-gc refs
This will allow us to issue multiple prevent_gc() requests all at once. It's
not important here, but will be unavoidable when implementing GC. Deleting
tons of refs from packed refs is super slow if the requests were processed one
by one.
2024-01-17 10:43:25 +09:00
Yuya Nishihara
34956f17e5 op_walk: assert that virtual root op is not reparented
This is enforced by the caller, but it's scary if it weren't.
2024-01-16 21:46:54 +09:00
Yuya Nishihara
fb3e006a45 op_store: add special case for root id resolution 2024-01-16 21:46:54 +09:00
Yuya Nishihara
660806ffed tests: set up unparented operations for id prefix tests
Otherwise we can't easily pick i to create operation id starting with "0".
2024-01-16 21:46:54 +09:00
Yuya Nishihara
df1be14aa8 tests: split op id resolution tests, don't require merged op for prefix tests
This makes it easy to set up crafted environment for prefix resolution tests.
2024-01-16 21:46:54 +09:00
Essien Ita Essien
dc074363d1 no-op: Move external git repo canonicalization into Workspace::init_git_external
* Move canonicalization of the external git repo path into the Workspace::init_git_external().
  This keeps necessary code together.
* Add a new variant of WorkspaceInitError for reporting path not found errors. The user error
  string is written to pass existing tests.
2024-01-16 10:46:02 +00:00
Yuya Nishihara
da218d19db repo: optimize enforce_view_invariants() to not traverse ancestors until root
Because the default index cuts off the traversal at min(generations), including
the root id means all ancestors will be visited. This could be worked around at
the index side, but I think it's the repo/view's responsibility. That being
said, it's not uncommon to pad a revset with "root()", so it might make sense
for the index to special case the root id.

I also removed the redundant .clone().
2024-01-15 09:57:02 +09:00
Martin von Zweigbergk
6e302bb3a2 op_store: add a virtual root operation, similar to root commit
It seems obvious in hindsight to have a virtual root operation just
like we have a virtual root commit. It removes the same kind of
problems by making sure there's always a common ancestor (or multiple)
between any two commits.

I think the reason I didn't add a root operation from the beginning
was that there used to be a mandatory working-copy commit in the view
(this was before support for multiple workspaces).

Perhaps we should remove the "initialize repo" operation now. The only
difference between their view objects is that the "initialize repo"
operation adds the root commit as a head. We could add that to the
root operation, but then the root operation's value depends on the
commit backend.
2024-01-14 10:15:14 -08:00
Martin von Zweigbergk
c9af8bf43a view: drop tracking of public heads
We've had the public_heads for as long as we've had the View object,
IIRC (I didn't check), but we still don't use it for anything. I don't
have any concrete plans for using it either. Maybe our config for
immutable commits is good enough, or maybe we'll want something more
generic (like Mercurial's phases). For now, I think we should simplify
by removing it the storage for public heads.
2024-01-13 22:23:57 -08:00
Martin von Zweigbergk
a66e2a0a6d working_copy: mark commit_id field in proto reserved
By marking it reserved, we prevent accidental use. We can still read
working copy protos that have the field.
2024-01-12 17:38:23 -08:00
Yuya Nishihara
543036c753 cli: run "op log" without loading repo or merging concurrent ops
When debugging behavior of badly-GCed repos, I find it's annoying that "op log"
fails because the index can't be loaded. Since "op log" doesn't need a repo, I
think it's better to display the exact op-heads state without merging.
2024-01-13 10:38:10 +09:00
Yuya Nishihara
831a530283 op_walk: make walk_ancestors() sort head ops to stabilize output
I thought this would be done by dag_walk::topo_order_reverse_lazy_ok(), but
apparently I made it preserve the input order in a way topo_order_reverse()
would do.
2024-01-13 10:38:10 +09:00