Commit graph

2147 commits

Author SHA1 Message Date
Yuya Nishihara
c9150d02fc working_copy: don't look up file state twice while visiting directories 2023-11-30 12:09:31 +09:00
Yuya Nishihara
6ce7bd5338 repo_path: replace .contains() with .starts_with(), flipping the arguments
self.contains(other) means that the self tree contains the other tree (i.e.
the self path is prefix of the other), but it could be confused the other way
around if we were thinking about the path literal, not the tree. Let's add
.starts_with() instead by copying the std::path::Path definition.
2023-11-29 08:41:23 +09:00
Yuya Nishihara
266690a46b repo_path: make strip_prefix() public function returning &RepoPath
There are no external callers, but I think it's useful.
2023-11-29 08:41:23 +09:00
Yuya Nishihara
73690ed54e matchers: clean up .walk_to(dir) to yield &RepoPath instead of iterator 2023-11-29 08:41:23 +09:00
Yuya Nishihara
bc9725c73c working_copy: use RepoPath::parent() which no longer allocates temporary object 2023-11-29 08:41:23 +09:00
Yuya Nishihara
016fc2b5cc repo_path: change .split() and .parent() to return &RepoPath 2023-11-29 08:41:23 +09:00
Yuya Nishihara
28ab9593c3 repo_path: split RepoPath into owned and borrowed types
This enables cheap str-to-RepoPath cast, which is useful when sorting and
filtering a large Vec<(String, _)> list by using matcher for example. It
will also eliminate temporary allocation by repo_path.parent().
2023-11-28 07:33:28 +09:00
Yuya Nishihara
0a1bc2ba42 repo_path: add stub RepoPathBuf type, update callers
Most RepoPath::from_internal_string() callers will be migrated to the function
that returns &RepoPath, and cloning &RepoPath won't work.
2023-11-28 07:33:28 +09:00
Yuya Nishihara
f5938985f0 repo_path: make RepoPath::from_internal_string() accept owned string
I'm going to add borrowed RepoPath type, and most from_internal_string()
callers will be migrated to it. For the remaining callers, it makes more
sense to move the ownership of String to RepoPathBuf.
2023-11-28 07:33:28 +09:00
Yuya Nishihara
d322df0c8d matchers: make Files/PrefixMatcher constructors accept slice of borrowed paths
RepoPath will become slice type (like str), and it doesn't make sense to
require &[RepoPathBuf] here.
2023-11-28 07:33:28 +09:00
Yuya Nishihara
a23bb5b958 matchers: in tests, use alias to RepoPath::from_internal_string()
It looked verbose to fully spell the function name.
2023-11-28 07:33:28 +09:00
Ilya Grigoriev
6aef4bb52e cli rebase: do not allow -r --skip-empty
This follows up on 3967f63 (see that commit's description for more
motivation) and e79c8b6.

In a discussion linked below, it was decided that forbidding `-r --skip-empty`
entirely is preferable to the mixed behavior introduced in 3967f63.

3967f637dc (commitcomment-133539911)
2023-11-27 10:16:36 -08:00
Yuya Nishihara
55f75278bc repo_path: make to_internal_file_string() return &str, rename accordingly 2023-11-27 08:42:09 +09:00
Yuya Nishihara
12d7f8be16 repo_path: turn RepoPath into String wrapper
RepoPath::from_components() is removed since it is no longer a primitive
function.

The components iterator could be implemented on top of str::split(), but
it's not as we'll probably want to add components.as_path() -> &RepoPath.

Tree walking and tree_states map construction get slightly faster thanks to
fewer allocations and/or better cache locality. If we add a borrowed RepoPath
type, we can also implement a cheap &str to &RepoPath conversion on top. Then,
we can get rid of BTreeMap<RepoPath, FileState> construction at all.

Snapshot without watchman:
```
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux status"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
  Time (mean ± σ):     950.1 ms ±  24.9 ms    [User: 1642.4 ms, System: 681.1 ms]
  Range (min … max):   913.8 ms … 990.9 ms    10 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
  Time (mean ± σ):     872.1 ms ±  14.5 ms    [User: 1922.3 ms, System: 625.8 ms]
  Range (min … max):   853.2 ms … 895.9 ms    10 runs

Relative speed comparison
        1.09 ±  0.03  target/release-with-debug/jj-0 -R ~/mirrors/linux status
        1.00          target/release-with-debug/jj-1 -R ~/mirrors/linux status
```

Tree walk:
```
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux files --ignore-working-copy"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux files --ignore-working-copy
  Time (mean ± σ):     375.3 ms ±  15.4 ms    [User: 223.3 ms, System: 151.8 ms]
  Range (min … max):   359.4 ms … 394.1 ms    10 runs

Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux files --ignore-working-copy
  Time (mean ± σ):     357.1 ms ±  16.2 ms    [User: 214.7 ms, System: 142.6 ms]
  Range (min … max):   341.6 ms … 378.9 ms    10 runs

Relative speed comparison
        1.05 ±  0.06  target/release-with-debug/jj-0 -R ~/mirrors/linux files --ignore-working-copy
        1.00          target/release-with-debug/jj-1 -R ~/mirrors/linux files --ignore-working-copy
```
2023-11-27 08:42:09 +09:00
Yuya Nishihara
974a6870b3 repo_path: make RepoPath::components() return iterator
This allows us to change the backing type from Vec<String> to String.
2023-11-27 08:42:09 +09:00
Yuya Nishihara
aba8c640be repo_path: capture current Vec<String> ordering by tests
The added test would fail if paths were purely ordered by concatenated strings.
I'm not sure if we want to preserve the current ordering, but let's not break
it for the moment.
2023-11-27 08:42:09 +09:00
Ilya Grigoriev
3967f637dc cli rebase: do not allow -r --skip-empty to drop emptied descendants
This follows up on @matts1 's #2609.

We still allow the `-r` commit to become empty. I would be more comfortable if
there was a test for that, but I haven't done that (yet?) and it seems pretty
safe. If that's a problem, I'm happy to forbid `-r --skip-empty` entirely,
since it is far less useful than `-s --skip-empty` or `-b --skip-empty`.

I think it is undesired to abandon emptied descendants. As far as descendants
of `A` are concerned, `jj rebase -r A` should be equivalent to `jj abandon A`,
and `jj abandon` does not remove emptied commits. It also doesn't seem very
useful to do that, since I think descendant commits of an abandoned (or moved
with `-r`) commit only become empty in pathological cases.

Additionally, if we did want -r to empty descendants of `A`, we'd have to add
thorough tests and possibly improve the algorithm. I want to refactor `rebase
-r` and add features to it, and having to consider cases of commits becoming
abandoned makes everything harder.

For example, if we have

```
root -> A -> B -> C
```

and `jj rebase -r A -d C` empties commit `B` (or `C`), I do not know whether
the current algorithm will work correctly. It seems possible that it would, but
that depends on the fact that empty merge commits are not abandoned for
descendants. That seems dangerous to rely on without tests.

I hope (but can't promise) that in the near future, making DescendantRebaser
return more information  should help make it possible to create such
functionality in a more robust way. I am likely to attempt this as part of
implementing `-r --after`.
2023-11-26 10:56:58 -08:00
Yuya Nishihara
59ef3f0023 repo_path: split RepoPathComponent into owned and borrowed types
This is a step towards introducing a borrowed RepoPath type. The current
RepoPath type is inefficient as each component String is usually short. We
could apply short-string optimization, but still each inlined component would
consume 24 bytes just for e.g. "src", and increase the chance of random memory
access. If the owned RepoPath type is backed by String, we can implement cheap
cast from &str to borrowed &RepoPath type.
2023-11-26 18:21:40 +09:00
Yuya Nishihara
f2096da2d6 repo_path: add stub type to introduce borrowed RepoPathComponent type
The current RepoPathComponent will be renamed to RepoPathComponentBuf, and
new str wrapper will be added as RepoPathComponent.
2023-11-26 18:21:40 +09:00
Yuya Nishihara
e14b31a033 repo_path: reject leading slash and empty path components
Leading/trailing slashes would introduce a bit of complexity if we migrate
the backing type from Vec<String> to String. Empty components are okay, but
let's reject them as they are cryptic and invalid.
2023-11-26 18:21:40 +09:00
Yuya Nishihara
755af75c30 repo_path: in tests, use alias to RepoPath::from_internal_string()
It seemed too verbose to spell the full function name in tests.
2023-11-26 18:21:40 +09:00
Yuya Nishihara
b7543f8a08 rewrite: fix check for newly-empty commit in optimized path
'old_base_tree_id == None' means the rebased tree is unchanged, so the commit
shouldn't be considered newly-empty.
2023-11-26 14:42:17 +09:00
Yuya Nishihara
2f93de9299 rewrite: flatten mapping from EmptyBehaviour to desired action
I think this is slightly easier to follow.
2023-11-26 14:42:17 +09:00
Ilya Grigoriev
c32847696d rewrite.rs: rename new_parents to parent_mapping
The function `new_parents` makes sense, but I found the mapping
being named `new_parents` confusing.
2023-11-25 21:36:35 -08:00
Yuya Nishihara
6344cd56b3 repo_path: remove RepoPathJoin trait, just implement join() on the type
I don't think we'll add join() that takes different types.
2023-11-26 07:14:47 +09:00
Yuya Nishihara
d7df2516c5 repo_path: remove RepoPathComponent::string(), use as_str() instead
There are only two callers, and one does further conversion to BString.
2023-11-26 07:14:47 +09:00
Martin von Zweigbergk
6d54afa60e revset: make evaluate_programmatic() optimize expression
It seems generally useful to optimize revset expressions in
`evaluate_programmatic()` so the caller doesn't have to remember to do
it. It should generally be cheap to do so even if it's often not
needed.
2023-11-24 21:13:58 -10:00
Martin von Zweigbergk
550164209c revset: add a RevsetExpression::evaluate_programmatic()
We often resolve a programmatic revset and then immediately evaluate
it. This patch adds a convenience method for those two steps.
2023-11-24 21:13:58 -10:00
Martin von Zweigbergk
f2602f78cf revset: make resolve_programmatic() not return a Result
I think it's always a programming error if `resolve_programmatic()`
returns a `Result`, so it shouldn't have to return a `Result`.
2023-11-24 21:13:58 -10:00
Martin von Zweigbergk
f27f52984e revset: rename resolve() to resolve_programmatic()
`RevsetExpression::resolve()` is meant for programmatically created
expressions. In particular, it may not contain symbols. Let's try to
clarify that by renaming the function and documenting it.
2023-11-24 21:13:58 -10:00
Matt Stark
0a95e20ebe lib: Implement skipping of empty commits 2023-11-24 14:48:06 +11:00
Matt Stark
dc89566039 lib: Create struct RebaseOptions 2023-11-24 14:48:06 +11:00
Anton Bulakh
5c3c0e9f6e sign: Implement generic commit signing on the backend 2023-11-23 22:52:20 +02:00
Anton Bulakh
5ab00e197a backend: Inline gix::Repository::commit_as to prepare for signing
Additional bonus is that this allows us to avoid creating keep refs for the
intermediate commits in the data race preventing loop.
2023-11-23 22:52:20 +02:00
Yuya Nishihara
042d26049c working_copy: lazily construct file_states BTreeMap
While it got faster to build a large BTreeMap<RepoPath, _>, there's still
a measurable cost. Let's eliminate it if watchman is enabled and the working
copy is clean. Perhaps, we should introduce new serialization format that
supports instant loading and lookup, but this hack works for the moment.
I'm not sure if the new tree_state format should be flat (RepoPath, _) list,
or tree like the backend storage btw.

In my "linux" repo (watchman enabled):
    % hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1 \
      "target/release-with-debug/{bin} -R ~/mirrors/linux status"
    Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
      Time (mean ± σ):     768.9 ms ±  14.2 ms    [User: 630.7 ms, System: 131.2 ms]
      Range (min … max):   742.3 ms … 783.1 ms    10 runs

    Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
      Time (mean ± σ):     713.0 ms ±  16.8 ms    [User: 587.9 ms, System: 116.2 ms]
      Range (min … max):   681.5 ms … 731.1 ms    10 runs

    Relative speed comparison
            1.08 ±  0.03  target/release-with-debug/jj-0 -R ~/mirrors/linux status
            1.00          target/release-with-debug/jj-1 -R ~/mirrors/linux status
2023-11-23 18:48:14 +09:00
Yuya Nishihara
12cd657837 working_copy: extract file_states_to_proto() helper
Just minimizing the changes in the next commit. As we already have
file_states_from_proto(), it makes sense to extract the "to" function.
2023-11-23 18:48:14 +09:00
Yuya Nishihara
74c4ef32aa fsmonitor: exclude .git and .jj directories from changed files
This ensures that the root fsmonitor_matcher matches nothing if there are no
working-copy changes. The query result can be observed by "jj debug watchman
query-changed-files".

I don't have expertise on watchman query language, but using the watchman API
is probably better than .filter()-ing the result manually.
2023-11-23 18:48:14 +09:00
Yuya Nishihara
1ddcaa43b3 fsmonitor: don't apply prefix matching to paths obtained from watchman
If I understand it, watchman returns changed files and directories, and a
directory change doesn't mean we need to scan all files under the directory.
2023-11-23 10:06:00 +09:00
Yuya Nishihara
767e94f5af fsmonitor: drop unneeded mut from make_fsmonitor_matcher()
We only need &self.working_copy_path here.
2023-11-23 10:06:00 +09:00
Yuya Nishihara
c16c89bc27 fsmonitor: keep paths relative to the workspace root
Since the caller wants repo-relative paths, it doesn't make sense to convert
them back and forth.
2023-11-23 10:06:00 +09:00
Yuya Nishihara
a4f6e0de0b repo_path: extract helper that converts Path to RepoPath literally 2023-11-23 10:06:00 +09:00
Yuya Nishihara
31def4b131 cleanup: don't use debug format to print source errors 2023-11-23 10:05:37 +09:00
Yuya Nishihara
16620e0e4c merged_tree: drop legacy tree handling from ConflictsDirItem constructor
No callers pass in a legacy tree.
2023-11-21 07:45:30 +09:00
Yuya Nishihara
4ad3db2e84 merged_tree: extract value() function of non-legacy trees 2023-11-21 07:45:30 +09:00
Yuya Nishihara
ca3f549c9e merged_tree: remove redundant clone() from ConflictIterator construction 2023-11-21 07:45:30 +09:00
Martin von Zweigbergk
acc35a89a8 merged_tree: inline non-recursive entry iterator 2023-11-19 20:29:40 -10:00
Martin von Zweigbergk
426f6d0cdd merged_tree: inline non-recursive conflict iterator
The abstraction is no longer useful since we made the types not
self-referential.
2023-11-19 20:29:40 -10:00
Yuya Nishihara
5186066cf5 working_copy: simply collect() proto file states into BTreeMap
Suppose the input list is presorted, sorting a sorted vec would be cheaper
than .insert()-ing sorted items one by one.

In my "linux" repo (watchman eanbled):
 - jj-0: baseline
 - jj-1: previous (don't randomize by HashMap)
 - jj-2: this

    % hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \
        "target/release-with-debug/{bin} -R ~/mirrors/linux status"
    Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
      Time (mean ± σ):      1.034 s ±  0.020 s    [User: 0.881 s, System: 0.212 s]
      Range (min … max):    1.011 s …  1.068 s    10 runs

    Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
      Time (mean ± σ):     849.3 ms ±  13.8 ms    [User: 710.7 ms, System: 199.3 ms]
      Range (min … max):   821.7 ms … 870.2 ms    10 runs

    Benchmark 3: target/release-with-debug/jj-2 -R ~/mirrors/linux status
      Time (mean ± σ):     786.2 ms ±  16.7 ms    [User: 650.7 ms, System: 204.1 ms]
      Range (min … max):   760.8 ms … 805.2 ms    10 runs

    Relative speed comparison
            1.32 ±  0.04  target/release-with-debug/jj-0 -R ~/mirrors/linux status
            1.08 ±  0.03  target/release-with-debug/jj-1 -R ~/mirrors/linux status
            1.00          target/release-with-debug/jj-2 -R ~/mirrors/linux status
2023-11-20 08:29:33 +09:00
Yuya Nishihara
ee6a1e2c0a working_copy: don't build intermediate HashMap from proto file states
According to the doc, this is compatible with the map syntax.
https://protobuf.dev/programming-guides/proto3/#maps

This change means that the serialized file states are sorted by RepoPath,
so BTreeMap<RepoPath, _> can be reconstructed with fewer cache misses.

In my "linux" repo (watchman enabled):
 - jj-0: baseline
 - jj-1: this

    % hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1,jj-2 \
      "target/release-with-debug/{bin} -R ~/mirrors/linux status"
    Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
      Time (mean ± σ):      1.034 s ±  0.020 s    [User: 0.881 s, System: 0.212 s]
      Range (min … max):    1.011 s …  1.068 s    10 runs

    Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
      Time (mean ± σ):     849.3 ms ±  13.8 ms    [User: 710.7 ms, System: 199.3 ms]
      Range (min … max):   821.7 ms … 870.2 ms    10 runs

    Relative speed comparison
            1.32 ±  0.04  target/release-with-debug/jj-0 -R ~/mirrors/linux status
            1.08 ±  0.03  target/release-with-debug/jj-1 -R ~/mirrors/linux status

Cache-misses got reduced:

    % perf stat -e task-clock,cycles,instructions,cache-references,cache-misses \
      -- ./target/release-with-debug/jj-0 -R ~/mirrors/linux --no-pager status

              1,091.68 msec task-clock                       #    1.032 CPUs utilized
         4,179,596,978      cycles                           #    3.829 GHz
         6,166,231,489      instructions                     #    1.48  insn per cycle
           134,032,047      cache-references                 #  122.776 M/sec
            29,322,707      cache-misses                     #   21.88% of all cache refs

           1.057474164 seconds time elapsed

           0.897042000 seconds user
           0.194819000 seconds sys

    % perf stat -e task-clock,cycles,instructions,cache-references,cache-misses \
      -- ./target/release-with-debug/jj-1 -R ~/mirrors/linux --no-pager status

                927.05 msec task-clock                       #    1.083 CPUs utilized
         3,451,299,198      cycles                           #    3.723 GHz
         6,222,418,272      instructions                     #    1.80  insn per cycle
            98,499,363      cache-references                 #  106.251 M/sec
            11,998,523      cache-misses                     #   12.18% of all cache refs

           0.855938336 seconds time elapsed

           0.720568000 seconds user
           0.207924000 seconds sys
2023-11-20 08:29:33 +09:00
Yuya Nishihara
56047cb7ec working_copy: don't pass all proto data to from_proto() functions
Just a code cleanup. This allows us to consume proto fields if needed.
I also removed redundant .clone() and .as_str().
2023-11-20 08:29:33 +09:00
Yuya Nishihara
d7e63837f4 index: use extend_wanted/unwanted() to initialize RevWalk 2023-11-17 21:38:56 +09:00
Yuya Nishihara
a9f3dd95e5 index: remove index field from RevWalkQueue, move it to caller 2023-11-17 21:38:56 +09:00
Yuya Nishihara
ff1bb3133e index: move index-dependent operations out of RevWalkQueue
RevWalkQueue no longer depend on the RevWalkIndex trait, and the index field
can be removed.
2023-11-17 21:38:56 +09:00
Yuya Nishihara
32a7b33e56 index: optimize RevWalk to store IndexPosition instead of IndexEntry
The idea is the same as the heads_pos() change in 9832ee205d. While
IndexEntry::position() should be cheap, saving 20 bytes per entry appears to
improve the performance in mid-size repos.

In my "linux" repo:

    revsets/all()
    -------------
    baseline  1.24     156.0±1.06ms
    this      1.00     126.0±0.51ms

I don't see significant difference in small-sized repos like "jj" or "git".

IndexEntryByPosition isn't removed since it's still used by the revset engine.
2023-11-17 21:38:56 +09:00
Martin von Zweigbergk
9be24db051 tree: make TreeEntriesDirItem not self-referential
This removes the last use of `ouroboros`. Since `TreeEntriesDirItem`
is only used in "legacy trees" (before tree-level conflicts), I didn't
bother to check the performance impact. I also didn't bother to check
the matcher before adding the entries to the list, instead leaving
that where it is in `Iterator::next()`.
2023-11-17 03:50:34 -08:00
Martin von Zweigbergk
c0295c5dbc merged_tree: make ConflictsDirItem not self-referential
This removes the last use of `ouroboros` in `merged_tree.rs`. The set
of conflicts to iterate is usually so small that I didn't bother
checking the performance impact.
2023-11-17 03:50:34 -08:00
Martin von Zweigbergk
e1a02c5c5b merged_tree: make TreeDiffDirItem not self-referential
This removes another dependency on `ouroboros`, for a small
performance hit:


```
❯ hyperfine --warmup 3 --runs 30 \
      '/tmp/jj-before --ignore-working-copy diff -s --from v5.0 --to v6.0' \
      '/tmp/jj-after --ignore-working-copy diff -s --from v5.0 --to v6.0'
Benchmark 1: /tmp/jj-before --ignore-working-copy diff -s --from v5.0 --to v6.0
  Time (mean ± σ):     689.7 ms ±  23.9 ms    [User: 400.0 ms, System: 289.8 ms]
  Range (min … max):   666.9 ms … 759.2 ms    30 runs

Benchmark 2: /tmp/jj-after --ignore-working-copy diff -s --from v5.0 --to v6.0
  Time (mean ± σ):     710.9 ms ±  19.2 ms    [User: 420.4 ms, System: 290.6 ms]
  Range (min … max):   688.5 ms … 752.0 ms    30 runs

Summary
  '/tmp/jj-before --ignore-working-copy diff -s --from v5.0 --to v6.0' ran
    1.03 ± 0.05 times faster than '/tmp/jj-after --ignore-working-copy diff -s --from v5.0 --to v6.0'
```
2023-11-17 03:50:34 -08:00
Martin von Zweigbergk
61d87fe296 merged_tree: make TreeEntriesIterator not self-referential
While importing the `ouroboros` crate and the `aliasable` crate it
depends on, the "unsafe Rust reviewer" expressed some concern that
they contain a lot of unsafe code that's hard to review. We can avoid
the unsafe code altogether by making `TreeEntriesIterator` not
self-refential. Instead, we can collect the matching entries in an
individual tree up front. It does have some performance cost:


```
❯ hyperfine --warmup 3 --runs 30 \
      '/tmp/jj-before --ignore-working-copy files -r v6.0' \
      '/tmp/jj-after --ignore-working-copy files -r v6.0'
Benchmark 1: /tmp/jj-before --ignore-working-copy files -r v6.0
  Time (mean ± σ):     461.4 ms ±  14.3 ms    [User: 232.1 ms, System: 229.4 ms]
  Range (min … max):   443.4 ms … 496.3 ms    30 runs

Benchmark 2: /tmp/jj-after --ignore-working-copy files -r v6.0
  Time (mean ± σ):     482.0 ms ±  14.3 ms    [User: 257.2 ms, System: 224.9 ms]
  Range (min … max):   461.8 ms … 513.3 ms    30 runs

Summary
  '/tmp/jj-before --ignore-working-copy files -r v6.0' ran
    1.04 ± 0.04 times faster than '/tmp/jj-after --ignore-working-copy files -r v6.0'
```

I think that's acceptable.
2023-11-17 03:50:34 -08:00
Yuya Nishihara
93fbcec2f7 index: use BinaryHeap instead of BTreeSet in common_ancestors_pos()
For the same reason as the heads_pos() change. We just want to omit duplicated
items.
2023-11-16 08:27:59 +09:00
Yuya Nishihara
d4059520a9 index: cache generation numbers during common_ancestors_pos() computation
I'm not sure if this is better, but common_ancestors_pos() would have a
similar property to heads_pos().
2023-11-16 08:27:59 +09:00
Yuya Nishihara
ea4bdd718d index: use "while let" in common_ancestors_pos() 2023-11-16 08:27:59 +09:00
Yuya Nishihara
02c84a8596 index: remove stale "allow(unstable_name_collisions)"
I think this is remainder of nightly shims.
2023-11-16 08:27:59 +09:00
Yuya Nishihara
6399c392fd index: make heads_pos() deduplicate entries without building separate set
This is much faster (maybe because of better cache locality?) Another option
is to use BTreeSet, but the BinaryHeap version is slightly faster.

"bench revset" result in my linux repo:

    revsets/heads(tags())
    ---------------------
    baseline  3.28     560.6±4.01ms
    1         2.92     500.0±2.99ms
    2         1.98     339.6±1.64ms
    3 (this)  1.00     171.2±0.30ms
2023-11-16 08:27:59 +09:00
Yuya Nishihara
9832ee205d index: optimize heads_pos() to cache generation numbers during computation
Apparently, IndexEntry::generation_number() isn't cheap probably because it
involves random access to larger memory region, and the u32 value might not
be aligned. Let's instead store the generation numbers in BinaryHeap.

Also, heads_pos() becomes slightly faster by keeping the BinaryHeap entries
small, so I've removed the IndexEntry at all.

This makes the default log and disambiguation revsets fast, which evaluate
'heads(immutable_heads())'.

"bench revset" result in my linux repo:

    revsets/heads(tags())
    ---------------------
    baseline  3.28     560.6±4.01ms
    1         2.92     500.0±2.99ms
    2 (this)  1.98     339.6±1.64ms
2023-11-16 08:27:59 +09:00
Yuya Nishihara
1e933b84dd index: make IndexEntry::parents() lazy instead of collecting to Vec
All callers just iterate over the parent entries.

"bench revset" result in my linux repo:

    revsets/heads(tags())
    ---------------------
    baseline  3.28     560.6±4.01ms
    1 (this)  2.92     500.0±2.99ms
2023-11-16 08:27:59 +09:00
Yuya Nishihara
39b065f7ab git: on import_refs(), exclude uninteresting dirs such as refs/jj/keep
For loose refs, uninteresting directories can be just skipped. For packed refs,
gix will have to do binary search for each prefix to find the starting point.
Still it's better overall if the repository contains tons of refs/jj/keep refs.

With my linux repo containing ~5k loose jj refs, this saves ~40ms:

    % hyperfine --warmup 3 --runs 10 \
        "/tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux" \
        "/tmp/jj-gix-iter --ignore-working-copy git import -R ~/mirrors/linux"
    Benchmark 1: /tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux
      Time (mean ± σ):     151.6 ms ±  11.4 ms    [User: 38.8 ms, System: 111.6 ms]
      Range (min … max):   129.8 ms … 159.5 ms    10 runs
    Benchmark 2: /tmp/jj-gix-iter --ignore-working-copy git import -R ~/mirrors/linux
      Time (mean ± σ):     109.9 ms ±  11.6 ms    [User: 27.5 ms, System: 82.4 ms]
      Range (min … max):    89.4 ms … 117.8 ms    10 runs
2023-11-14 17:35:27 +09:00
Yuya Nishihara
044716ee40 git: migrate import_refs() to gix::Repository
Gitoxide errors are boxed since there are various error types and they tend
to exceed the clippy size limit.

Apparently, gitoxide is faster than git2:

    % hyperfine --warmup 3 --runs 10 \
        "/tmp/jj-baseline --ignore-working-copy git import -R ~/mirrors/linux" \
        "/tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux"
    Benchmark 1: /tmp/jj-baseline --ignore-working-copy git import -R ~/mirrors/linux
      Time (mean ± σ):     205.4 ms ±  15.7 ms    [User: 59.6 ms, System: 144.6 ms]
      Range (min … max):   189.7 ms … 223.9 ms    10 runs
    Benchmark 2: /tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux
      Time (mean ± σ):     176.2 ms ±  13.7 ms    [User: 41.2 ms, System: 134.0 ms]
      Range (min … max):   155.4 ms … 186.5 ms    10 runs
2023-11-14 17:35:27 +09:00
Yuya Nishihara
6c98dfcdcb git: have import_refs() obtain git2::Repository instance from store
This helps gitoxide migration. It's theoretically possible to import Git refs
from non-Git backend, but I don't think such API flexibility is needed.
2023-11-14 17:35:27 +09:00
Yuya Nishihara
dbb1adaf0a git: move import-related types close to import_refs() function 2023-11-14 17:35:27 +09:00
Yuya Nishihara
8e143541a5 operation: propagate OpStoreError from parents()
We need to .collect_vec() the parents iterator to temporary buffer since the
borrowed iterator can't be returned back to the dag_walk functions. Another
option is to clone op_store and parent ids to remove &self lifetime from the
iterator, but that also means a temporary Vec is created.
2023-11-14 07:16:39 +09:00
Yuya Nishihara
8ddad859e8 dag_walk: add fallible topo_order_reverse_lazy()
Unlike dfs_ok(), this function short-circuits at an Err as we use non-lazy
topo_order_forward() internally. I think that's good enough. If we implement
GC on operation log, deleted parents will be excluded (or mapped to tombstone)
by caller. An Err shouldn't mean it's GC-ed.
2023-11-14 07:16:39 +09:00
Yuya Nishihara
3d5a07e86a dag_walk: add fallible dfs(), topo_order(), heads(), and closest_common_node()
This unblocks the use of Result<T, E> in op.parents().

There are two ways to encode errors:
 a. impl IntoIterator<Item = Result<T, E>>
 b. Result<V, E> where V: FromIterator<Item = T>
I think (a) is more natural to algorithms like dfs(), which can process error
nodes transparently.

Still the caller might have to collect the source iterator to temporary Vec
to conform to the neighbors_fn signature. It's not easy for neighbors_fn to
return an iterator borrowing the input node. We already have GAT, but doesn't
have return-position impl Trait in trait yet.
2023-11-14 07:16:39 +09:00
Yuya Nishihara
e5a9a26911 dag_walk: remove unused and untested leaves() function 2023-11-14 07:16:39 +09:00
Anton Bulakh
e3a1e5b80e sign: Implement storage for digital commit signatures
Recognize signature metadata from git commit objects, implement
a basic version of that for the native backend.
Extract the signed data (a commit binary repr without the signature) to
be verified later.
2023-11-12 03:37:13 +02:00
Yuya Nishihara
b42a69db6d git_backend: configure committer (and author) of gix::Repository
Otherwise, ref updates would fail if we port git::export_refs() to gitoxide.
This change isn't strictly needed for the backend itself, but we'll reuse the
gix::Repository instance created by the backend when importing and exporting
Git refs.
2023-11-11 22:35:54 +09:00
Yuya Nishihara
ea32c0cb9e git_backend: pass UserSettings to GitBackend constructors 2023-11-11 22:35:54 +09:00
Yuya Nishihara
8a2048a0e5 repo: pass UserSettings to store factories and initializers
GitBackend will use it to configure gix::Repository. I think UserSettings
is generally useful to pass store-specific parameters, so I've updated all
factory functions.
2023-11-11 22:35:54 +09:00
Yuya Nishihara
6125fb160e op_store: embed details in operation/view not found error
This is basically a copy of BackendError::ObjectNotFound. The failed id may
be either view or operation id.
2023-11-11 22:35:40 +09:00
Yuya Nishihara
ea96513fd1 op_store: deduplicate functions that map io::Error to OpStoreError
io_to_read_error() also translates ErrorKind::NotFound.
2023-11-11 22:35:40 +09:00
Martin von Zweigbergk
120115a20d cli: pass MaterializedTreeValue into git_diff_part()
Just a little preparation for reading the materialized values
concurrently.
2023-11-10 04:54:47 -08:00
Waleed Khan
a60733f632 tree: remove unsafe with ouroboros for self-referential iterators 2023-11-09 21:50:29 -08:00
Yuya Nishihara
6ff3a4f3df repo: reimplement DirtyCell without using unsafe
While the safe implementation is a bit more complex (and probably more branchy),
I don't think the runtime overhead would matter here. Let's remove one more
unsafe for better code maintainability.
2023-11-10 07:42:45 +09:00
Martin von Zweigbergk
9b24d24612 conflicts: add another helper for materializing a tree value
We have a few places where we have a `MergedTreeValue` and need to
read the data associated with it so we can write to the working copy
or include it in a diff. Let's extract some of that shared logic to a
function so we can reuse it. I plan to use it for reading file
contents in advance while streaming a diff in `local_working_copy`
soon (and probably in `jj diff` thereafter), but I think it seems like
an improvement on its own.
2023-11-08 21:21:38 -08:00
Martin von Zweigbergk
65bd5cacba working copy: on checkout, move read from store out of write_*() functions
I'd like to read N files ahead from the backend, to avoid serializing
too many server calls on backends that are backed by a server. Moving
the reads a little earlier is a little step towards that.

The `TreeState::write_*()` functions can now be made into free/static
functions if we prefer.
2023-11-08 21:21:38 -08:00
Yuya Nishihara
084b99e1e2 index: rewrite CompositeIndex::entry_by_pos() by leveraging ancestors iterator
We no longer have "unsafe" in this function, so let's use the iterator API
instead of recursion. Apparently I haven't pushed this change before because
unsafe in .find_map() looked scary.
2023-11-08 12:09:33 +09:00
Anton Bulakh
d27351b978 misc: drop a few low-hanging unsafes
Remove a couple of unnecessary unsafes:
- The NonZeroUsize is a constant where the unwrap will optimize away
anyway and we don't have an unsafe without any good reason there :)
- The other two were simply not needed, lifetimes worked fine, maybe
Rust became better since that code was written? NLL? Anyway, they're
gone now
2023-11-08 02:16:08 +02:00
Yuya Nishihara
2ac9865ce7 revset: exclude @git branches from remote_branches()
As discussed in Discord, it's less useful if remote_branches() included
Git-tracking branches. Users wouldn't consider the backing Git repo as
a remote.

We could allow explicit 'remote_branches(remote=exact:"git")' query by changing
the default remote pattern to something like 'remote=~exact:"git"'. I don't
know which will be better overall, but we don't have support for negative
patterns anyway.
2023-11-08 07:34:30 +09:00
Yuya Nishihara
d1b0c4cc48 merge: relax input type of Merge::from_removes_adds() 2023-11-07 17:10:12 +09:00
Yuya Nishihara
e0c35684af merge: rename Merge::new() to Merge::from_removes_adds()
Since (removes, adds) pair is no longer the canonical representation of Merge,
the name Merge::new() seems too generic. Let's give more verbose name.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
2c128f1b61 merged_tree: convert from legacy conflicts through interleaved list
This is basically the same change as the previous commit.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
a734f46130 merged_tree: build unresolved Merge<Tree> from interleaved list
We no longer need to iterate removes and adds separately.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
dd26b7be40 merge: add Merge constructor that accepts interleaved values
Also migrated some callers of 3-way merge, where [left, base, right] order
looks okay.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
803b41c426 merge: load legacy Merge values without allocating intermediate buffers 2023-11-07 17:10:12 +09:00
Yuya Nishihara
09987c1d27 merge: micro-optimize allocation of Merge object for resolved value
It's super common that a Merge object holds a resolved value, so let's inline
up to 1 element. T of Merge<T> usually consists of a couple of pointer-sized
fields. I don't see any measurable speed up, but it's no worse than the
original.
2023-11-07 17:10:12 +09:00
Martin von Zweigbergk
1140295829 merged_tree: extract polling of tree futures into a function 2023-11-07 00:03:50 -08:00
Martin von Zweigbergk
c77417d4e4 merged_tree: drop outer loop in TreeDiffStreamImpl::poll_next()
As suggested by Yuya. I also added a comment and an assertion in the
case where return `Poll::Pending`.
2023-11-07 00:03:50 -08:00
Martin von Zweigbergk
d989d4093d merged_tree: let backend influence whether to use new diff algo
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to use switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
f40adb84fc merged_tree: add a Stream for concurrent diff off trees
When diffing two trees, we currently start at the root and diff those
trees. Then we diff each subtree, one at a time, recursively. When
using a commit backend that uses remote storage, like our backend at
Google does, diffing the subtrees one at a time gets very slow. We
should be able to diff subtrees concurrently. That way, the number of
roundtrips to a server becomes determined by the depth of the deepest
difference instead of by the number of differing trees (times 2,
even). This patch implements such an algorithm behind a `Stream`
interface. It's not hooked in to `MergedTree::diff_stream()` yet; that
will happen in the next commit.

I timed the new implementation by updating `jj diff -s` to use the new
diff stream and then ran it on the Linux repo with `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. That slowed down by
~20%, from ~750 ms to ~900 ms. Maybe we can get some of that
performance back but I think it'll be hard to match
`MergedTree::diff()`. We can decide later if we're okay with the
difference (after hopefully reducing the gap a bit) or if we want to
keep both implementations.

I also timed the new implementation on our cloud-based repo at
Google. As expected, it made some diffs much faster (I'm not sure if
I'm allowed to share figures).
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
c9ce80a82a merged_tree: extract function for merged iterator of basenames in diff
I'm going to reuse this for stream/async diffing.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
b72f04ba61 merged_tree: rename all_tree_conflict_names() since it's not about conflicts 2023-11-06 23:12:02 -08:00
Yuya Nishihara
3fddc31da8 merge: remove Merge::take() which is no longer used
Merge::take() is no longer a cheap function. We can add into_vec() if needed.
2023-11-07 06:52:35 +09:00
Yuya Nishihara
92dfe59ade refs: run non-trivial merge of ref targets without destructuring Merge object 2023-11-07 06:52:35 +09:00
Yuya Nishihara
93601541cb refs: use swap_remove() in non-trivial merge of ref targets
I'm going to add a Merge method that removes negative/positive terms pair, and
swap_remove() is the easiest option. The order of the conflicted ref targets
doesn't matter.
2023-11-07 06:52:35 +09:00
Yuya Nishihara
895bbce8c0 files: use borrowed Merge iterator in merge()
Since the underlying Merge data type is no longer (Vec<T>, Vec<T>), it doesn't
make sense to build removes/adds Vecs and concatenate them.
2023-11-07 06:52:35 +09:00
Yuya Nishihara
f1898a31b5 merge: simply print interleaved conflict values in debug output
We could apply that for the resolved case, but Resolved/Conflicted label
seems more useful than just printing Merge([value]).
2023-11-06 07:21:06 +09:00
Yuya Nishihara
b07b370ed3 merge: simply generate content hash from interleaved values 2023-11-06 07:21:06 +09:00
Yuya Nishihara
46ffb2f0b2 merge: store negative/positive terms internally in an interleaved Vec
Many callers use interleaved iterators, and recently-added serialization code
is built on top of that, so I think it's better to store terms in that format.

map() functions no longer use MergeBuilder as we know the mapped values are
ordered properly. flatten() and simplify() are reimplemented to work with the
interleaved values. The other changes are trivial.
2023-11-06 07:21:06 +09:00
Yuya Nishihara
287728fee7 merge: extract trivial_merge() that takes interleaved adds/removes iterator
The Merge type will store interleaved terms instead of separate adds/removes
vecs.
2023-11-06 07:21:06 +09:00
Yuya Nishihara
01523ba4f3 merge: rewrite bottom half of trivial_merge() for non-copyable types
The input values of trivial_merge() will be changed to Iterator<Item = T>
where T: Eq + Hash. It could be <Item = &'a T>, but it doesn't have to be.
2023-11-06 07:21:06 +09:00
Martin von Zweigbergk
7c923514ee git: add config to disable abandoning of unreachable commits
Some users prefer to have commits not get abandoned when importing
refs. This adds a config option for that.

Closes #2504.
2023-11-05 06:10:54 -08:00
Martin von Zweigbergk
7bf8906f9c git: extract a function for abandoning unreachable commits
This motivation for this is so we can easily skip calling the function
if the user has opted out of the propagation of abandoned commits we
usually do (#2504). However, it seems like a good piece of code to
extract regardless of that feature.
2023-11-05 06:10:54 -08:00
Yuya Nishihara
d9fbf21794 merge: have Merge::adds()/removes() return iterator
The Merge type will be changed to store interleaved values internally.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
1c6913d618 merge: use Merge::iter() instead of adds()/removes() where order doesn't matter
Merge::iter() will be a slice::Iter, and be more efficient than chaining adds
and removes.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
99e6ff493a merge: fix copy-paste error in doc comment for adds() 2023-11-05 16:43:06 +09:00
Yuya Nishihara
f6d85c51cd merge: add non-optional Merge accessor to the zeroth value
We have a few callers which just need to obtain an object common among all
the merge values. Let's add a non-failing accessor for that purpose.
2023-11-05 16:43:06 +09:00
Yuya Nishihara
b12c688ea0 merge: add method for indexed adds/removes access
The current adds()/removes() will be changed to return iterators.
2023-11-05 16:43:06 +09:00
Martin von Zweigbergk
6a5615c933 rewrite: use MergedTree::diff_stream() when restoring from tree 2023-11-04 21:07:49 -07:00
Yuya Nishihara
602b44258e workspace: add function that initializes colocated git repository
One less git2 API use in CLI.

The function name GitBackend::init_colocated() is a bit odd, but we need to
specify the work-tree path, not the ".git" repo path. So we can't eliminate
the notion of the working copy path anyway.
2023-11-05 08:48:35 +09:00
Yuya Nishihara
ce46c10c96 git_backend: extract inner function that initializes backend with open git repo 2023-11-05 08:48:35 +09:00
Yuya Nishihara
dce640aaf1 workspace: one less cloning of workspace_root in init_external_git()
Just a trivial code cleanup.
2023-11-05 08:48:35 +09:00
Yuya Nishihara
c866b4a42d workspace: fix repository path in init_internal_git() doc comment
Also rephrased "Git backend" as "Git repo" since the new backend storage will
be created.
2023-11-05 08:48:35 +09:00
Antoine Cezar
5973ab47b9 commands: move rebase_to_dest_parent to jj_lib::rewrite
What make rebase_to_dest_parent a good candidate for jj_lib::rewrite module:

- It is used both in obslog and interdiff. It's a sign that it may be moved to a lower layer
- CommandError is returned by converting from TreeMergeError. Not explicitly.
- It only use jj_lib::rewrite fonctions.
2023-11-03 20:48:00 +01:00
Martin von Zweigbergk
904c37d36d working copy: use MergedTree::diff_stream()
This will make it a little faster to update the working copy at Google
once we've made `MergedTree::diff_stream()` fetch trees
concurrently. (It only makes it a little faster because we still fetch
files serially.)
2023-11-03 08:15:10 -07:00
Martin von Zweigbergk
72245cfac5 merged_tree: add Stream-based version of diff(), delegating for now
I'm going to implement a `Stream`-based version optimized for
high-latency (RPC-based) commit backends. So far, that implementation
is about 20% slower in the Linux repo when running `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. I think that's almost
only because the algorithm is different, not because it's async per
se.

This commit adds a `Stream`-based version of `MergedTree::diff()` that
just wraps the regular iterator in stream. I updated `jj diff` to use
it. I couldn't measure any difference on the command above in the
Linux repo. I think that means we can safely use the same
`Stream`-based interface regardless of backend, even if we end up
needing two different implementations of the `Stream`. We would then
be using the wrapped iterator from this commit for local backends, and
the new implementation for remote backends. But ideally we can make
the remote-friendly implementation fast enough that we don't need two
implementations.
2023-11-03 08:15:10 -07:00
Martin von Zweigbergk
24b706641f async: switch to pollster's block_on()
During the transition to using more async code, I keep running into
https://github.com/rust-lang/futures-rs/issues/2090. Right now, I want
to convert `MergedTree::diff()` into a `Stream`. I don't want to
update all call sites at once, so instead I'm adding a
`MergedTree::diff_stream()` method, which just wraps
`MergedTree::diff()` in a `Stream. However, since the iterator is
synchronous, it needs to block on the async `Backend::read_tree()`
calls. If we then also block on the `Stream` in the CLI, we run into
the panic.
2023-11-03 08:15:10 -07:00
Martin von Zweigbergk
3a378dc234 cli: add a function for restoring part of a tree from another tree
We had similar code in two places for restoring paths from one tree to
another. Let's reuse it instead.

I put the new function in the `rewrite` module. I'm not sure if that's
right place. Maybe it belongs in `tree`?
2023-11-02 06:07:45 -07:00
Yuya Nishihara
162dcd49b4 cli: rewrite base GitIgnoreFile lookup to use gitoxide instead of libgit2
Since gix::Repository::config_snapshot() borrows the repo instance, it has to
be allocated in caller's stack. That's why GitBackend::git_config() is removed.
2023-11-02 19:33:06 +09:00
Yuya Nishihara
c88e69ad6f git_backend: replace git2::Repository with gix::Repository
My gut feeling is that gitoxide aims to be more transparent than libgit2. We'll
need to know more about the underlying Git data model.

Random comments on gix API:

 * gix::Repository provides API similar to git2::Repository, but has less
   "convenient" functions. For example, we need to use .find_object() +
   .try_to/into_<kind>() instead of .find_<kind>().
 * gix::Object, Blob, etc. own raw data as bytes. gix::object and gix::objs
   types provide high-level views on such data.
 * Tree building is pretty low-level compared to git2.
 * gix leverages bstr (i.e. bytes) extensively.

It's probably not difficult to migrate git::import/export_refs(). It might
help eliminate the startup overhead of libssl initialization. The gix-based
GitBackend appears to be a bit faster, but that wouldn't practically matter.

#2316
2023-11-02 19:33:06 +09:00
Yuya Nishihara
f5a61dc2b7 git_backend: open just-initialized repo with canonicalized path
Otherwise, the initialized repo could have a different work-dir path than the
load()-ed one. libgit2 appears to do some normalization somewhere, but gix
won't.
2023-11-02 19:33:06 +09:00
Yuya Nishihara
fd187d266f git_backend: box GitBackendInit/LoadError up front
These error enums will wrap gix error types, and will become bigger enough for
clippy to complain.
2023-11-02 19:33:06 +09:00
Isabella Basso
749d8bb15a git: preserve HEAD when possible
Closes: #2210
2023-11-01 08:23:52 -03:00
Yuya Nishihara
1788b5014e git_backend: remove redundant copy back of author timestamp
Only the committer timestamp can be updated inside a loop.
2023-10-31 06:51:27 +09:00
Yuya Nishihara
f5aa739c70 git_backend: use .strip_suffix() instead of manual slicing 2023-10-31 06:51:27 +09:00
Yuya Nishihara
9bd84c55e0 git_backend: use file mode extensively in read_tree()
Both filemode() and kind() are calculated from the same underlying data,
and kind() is libgit2-specific API.
2023-10-31 06:51:27 +09:00
Yuya Nishihara
b3c9cab12d git_backend: handle read_tree() lookup/encoding errors gracefully 2023-10-31 06:51:27 +09:00
Yuya Nishihara
847adc832f git_backend: use lossy conversion to decode non-UTF-8 commit message
If message() returned None, it doesn't mean the commit message is empty. I
originally mapped it to an error, but that made import of linux repo fail.

https://docs.rs/git2/latest/git2/struct.Commit.html#method.message
2023-10-31 06:51:27 +09:00
Yuya Nishihara
06c254e742 git_backend: use non-owned str::from_utf8() to decode symlink target
Just for consistency with the other changes. str::Utf8Error is 2 words long,
so I removed the boxing.
2023-10-31 06:51:27 +09:00
Yuya Nishihara
d1c71c05c9 git_backend: remove redundant error handling for invalid hash length
The only error that could be returned by libgit2 is invalid hash length, and
we check that explicitly. If we switch the backends to gitoxide, there will be
panicking constructor.

https://docs.rs/git2/latest/git2/struct.Oid.html#method.from_bytes
2023-10-31 06:51:27 +09:00
Ilya Grigoriev
8bc3f5fd67 cli rebase: Allow jj rebase -r to rebase a commit onto a descendant
#1188

There are some additional test changes because children and descendants are now
rebased before the commit itself.
2023-10-30 10:56:27 -07:00
Yuya Nishihara
2d3fe7eee2 rewrite: replace use of "lift"ed function application with try_collect()
Also removed redundant borrow + clone.
2023-10-30 13:50:37 +09:00
Martin von Zweigbergk
35a23172ec backend: delete unused Phase enum
The idea was to support phases like in hg, but that hasn't happened
yet. We can add back this simple enum if we do add support for phases.
2023-10-29 12:02:40 -07:00
Martin von Zweigbergk
cfcdd71865 backend: make read_conflict synchronous again
This avoids https://github.com/rust-lang/futures-rs/issues/2090. I
don't think we need to worry about reading legacy conflicts
asynchronously - async is really only useful for Google's backend
right now, and we don't use the legacy format at Google. In
particular, I don't want `MergedTree::value()` to have to be async.
2023-10-28 16:45:40 -07:00
Martin von Zweigbergk
a1ef9dc845 merged_tree: propagate backend errors in diff iterator
I want to fix error propagation before I start using async in this
code. This makes the diff iterator propagate errors from reading tree
objects.

Errors include the path and don't stop the iteration. The idea is that
we should be able to show the user an error inline in diff output if
we failed to read a tree. That's going to be especially useful for
backends that can return `BackendError::AccessDenied`. That error
variant doesn't yet exist, but I plan to add it, and use it in
Google's internal backend.
2023-10-26 06:20:56 -07:00
Martin von Zweigbergk
309f1200d6 merge: introduce a type alias for Merge<Option<TreeValue>>
Reasons to introduce this alias:

* Reduces complexity of a type, to silence Clippy warnings in the
  future if we use this type as a type parameter

* The type is used quite frequently, so it makes sense to have a name
  for it

* It's easier to visually scan for the end of the type when you don't
  have to match opening and closing angle brackets
2023-10-26 06:20:56 -07:00
Martin von Zweigbergk
6ad71e658d merged_tree: rename MergedTreeValue to MergedTreeVal
I'm going to add `MergedTreeValue` as an alias for
`Merge<Option<TreeValue>>`, but we already have a type by that name in
`merged_tree`. This patch renames it away, to make room for the new
alias. I used `MergedTreeVal` for this borrowing version to be a bit
like how `str` is a borrowed version of `String`.
2023-10-26 06:20:56 -07:00
Yuya Nishihara
1bfe5b5b56 cli: add string pattern support to "git push --branch"
Since "jj git fetch --branch" supports glob patterns, users would expect that
"jj git push --branch glob:.." also works.

The error handling bits are copied from "branch" sub commands. We might want to
extract it to a common helper function, but I haven't figured out a reasonable
boundary point yet.
2023-10-26 04:51:17 +09:00
Yuya Nishihara
a6ac9b46e7 git: simply call fetch() with one or more branch name filters 2023-10-25 03:58:48 +09:00
Yuya Nishihara
560b63544a cli: parse "git fetch --branch" parameter as string pattern
Even though "*" can't be used as a branch name to fetch, it should be better
to explicitly enable glob matching like the other commands.
2023-10-25 03:58:48 +09:00
Martin von Zweigbergk
8157d4ff63 merge: materialize conflicts in executable files like regular files
AFAICT, all callers of `Merge::to_file_merge()` are already well
prepared for working with executable files. It's called from these
places:

* `local_working_copy.rs`: Materialized conflicts are correctly
  updated using `Merge::with_new_file_ids()`.

* `merge_tools/`: Same as above.

* `cmd_cat()`: We already ignore the executable bit when we print
  non-conflicted files, so it makes sense to also ignore it for
  conflicted files.

* `git_diff_part()`: We print all conflicts with mode "100644" (the
  mode for regular files). Maybe it's best to use "100755" for
  conflicts that are unambiguously executable, or maybe it's better to
  use a fake mode like "000000" for all conflicts. Either way, the
  current behavior seems fine.

* `diff_content()`: We use the diff content in various diff
  formats. We could add more detail about the executable bits in some
  of them, but I think the current output is fine. For example,
  instead of our current "Created conflict in my-file", we could say
  "Created conflict in executable file my-file" or "Created conflict
  in ambiguously executable file my-file". That's getting verbose,
  though.

So, I think all we need to do is to make `Merge::to_file_merge()` not
require its inputs to be non-executable.

Closes #1279.
2023-10-24 06:45:45 -07:00
Martin von Zweigbergk
21b11df8a9 merge: make non-conflicted debug string for Merge shorter
Resolves states are most common and the current format is pretty
verbose. Let's print it as if `Merge` were an enum with `Resolved` and
`Conflicted` variants instead.
2023-10-24 06:45:45 -07:00
Yuya Nishihara
831dc84c5b git: remove RefName::GitRef variant which isn't used anymore 2023-10-24 09:45:01 +09:00
Yuya Nishihara
a756333216 git: move RefName enum from view module
It's no longer used by View API.
2023-10-24 09:45:01 +09:00
Yuya Nishihara
95919699d4 repo: add merge methods per ref kind, remove RefName indirection
Since local/remote branches are now of different types, it doesn't make much
sense to dispatch merging through RefName. Let's add merge_<kind>() methods
instead.
2023-10-24 09:45:01 +09:00
Yuya Nishihara
6dfe1572a0 git: expand RefName match arms in import_refs()
This prepares for the removal of merge_single_refs().
2023-10-24 09:45:01 +09:00
Yuya Nishihara
ffe7e5f142 repo: inline merge_single_ref() from view
MutableRepo handles merging of the other kind of refs internally, and the
merge function is short enough to inline. I also removed early returns since
most callers provide non-identical ref targets, and merge_ref_targets() should
be cheap if the inputs can be trivially merged.
2023-10-24 09:45:01 +09:00
Yuya Nishihara
5543f7a11c refs: merge tracking state of remote branches
Otherwise "jj op undo" can't roll back tracking states (whereas "op restore"
can.)
2023-10-24 07:13:58 +09:00
Yuya Nishihara
3d45540ff6 refs: add function to diff RemoteRef pairs 2023-10-24 07:13:58 +09:00
Yuya Nishihara
6450194ccd refs: rename diff_named_refs() to diff_named_ref_targets()
I'm going to add RemoteRef variant of this function, and "refs" there will
be ambiguous.
2023-10-24 07:13:58 +09:00
Yuya Nishihara
59eb03eec5 refs: add helper function to iterate local/remote ref pairs
This partially reverts the change in 30fb7995c2 "view: make local/remote
branches iterator yield RemoteRef instead of RefTarget." As I'm going to add
diff function for RemoteRef pairs, we'll need a generic version of merge-join
iterator anyway.
2023-10-24 07:13:58 +09:00
Yuya Nishihara
16ef57907b str_util: add more StringPattern methods for convenience
The Display impl helps to format error messages. Since both Regex and
glob::Pattern implement Display, it's probably okay for our type to copy that.
2023-10-22 04:07:49 +09:00
Yuya Nishihara
4c4eb31a62 view: add methods that filter local/remote branches by pattern
We'll use them in CLI code.
2023-10-22 04:07:49 +09:00
Martin von Zweigbergk
578d61ec2e git: use forward slashes in relative path to backing git repo
Closes #2403.
2023-10-20 22:51:14 -05:00
Yuya Nishihara
cfcc76571c revset: add support for glob:pattern 2023-10-21 09:55:01 +09:00
Yuya Nishihara
f7c8622981 str_util: extract StringPattern filtering iterator from revset module 2023-10-21 09:55:01 +09:00
Yuya Nishihara
2d80f071de str_util: extract function that constructs StringPattern from string 2023-10-21 09:55:01 +09:00
Yuya Nishihara
5707a194d5 str_util: extract StringPattern from revset module
Branch name filtering in CLI will be migrated to this, and I'll probably add
glob:<pattern> in place of --glob option.
2023-10-21 09:55:01 +09:00
Martin von Zweigbergk
8764ad9826 conflicts: make materialization async
We need to let async-ness propagate up from the backend because
`block_on()` doesn't like to be called recursively. The conflict
materialization code is a good place to make async because it doesn't
depends on anything that isn't already async-ready.
2023-10-20 07:38:34 -07:00
Martin von Zweigbergk
f541f9f3a6 cleanup: import futures::exectutor::block_on() instead of qualifying
It seems we'll end up using `block_on()` quite a bit, at least until
we're done transitioning to async, and the function name doesn't
conflict with anything else, so let's always import it when we need
it.
2023-10-20 07:38:34 -07:00
Yuya Nishihara
390b3208da revset: always include non-tracking remote branches in suggestion
For the same reason as the previous commit. Non-tracking remote branch
shouldn't be shadowed by a local branch of the same name.
2023-10-17 16:42:36 +09:00
Yuya Nishihara
9e0b9e6dc8 refs: error out push if non-tracking remote branches exist
We can provide more actionable error message than "not fast-forwardable". If
the push was fast-forwardable, "jj branch track" should be able to merge the
remote branch without conflicts, so the added step would be minimal.
2023-10-17 15:06:03 +09:00
Yuya Nishihara
089503abfb refs: classify push action based on tracking target
Although this is logically correct, the error message is a bit cryptic. It's
probably better to reject push if non-tracking remote branches exist.

#1136
2023-10-17 15:06:03 +09:00
Yuya Nishihara
30fb7995c2 view: make local/remote branches iterator yield RemoteRef instead of RefTarget
We'll use remote_ref.tracking_target() to classify push action, but not all
callers of local_remote_branches() need tracking_target() instead of target.
2023-10-17 15:06:03 +09:00
Yuya Nishihara
e0965c4533 git: on push, update jj's view of remote branches without using import_refs()
This means that the commits previously pinned by remote branches are no longer
abandoned. I think that's more correct since "push" is the operation to
propagate local view to remote, and uninteresting commits should have been
locally abandoned.
2023-10-17 15:06:03 +09:00
Yuya Nishihara
c8a848d260 git: prohibit push to remote named "git"
Since I'm going to make git::push_branches() update the repo view internally,
it should fail fast if the remote name is reserved. Before, the problem was
detected on git::import_refs().
2023-10-17 15:06:03 +09:00
Yuya Nishihara
58897d79c7 git: extract push function that processes branches instead of git refs
Since pushed remote branches will share the common base targets with locals,
these branches should be marked as tracking. git::push_branches() will handle
that. It looks ugly that the public GitBranchPushTargets type keeps "force"-d
branches as a separate set, but we'll need to rework that anyway when we
implement --force-with-lease behavior. So let's leave it for now.

Some of the git::push_updates() tests have been migrated to the new function.
I left a couple of basic tests for git::push_updates() because push_updates()
will be used to implement a low-level "jj git push-refs" command.
2023-10-17 15:06:03 +09:00
Yuya Nishihara
57167cefda git: on import_refs(), don't abandon ancestors of newly fetched refs
I made import_refs() not preserve commits referenced by remote branches at
520f692a46 "git: on import_refs(), don't preserve old branches referenced by
remote refs." The idea is that remote branches are weak, and commits referenced
by these refs can be freely rewritten by future local changes without moving
the refs. I don't think that's wrong, but 520f692a46 also made "new" remote
changes be abandoned by old remote refs. This problem occurs only when
git.auto-local-branch is off.

I think there are two ways to fix the problem:
 a. pin non-tracking remote branches just like local refs
 b. pin newly fetched refs in addition to local refs
This patch implements (b) because it's simpler and more obvious that the
fetched commits would never be abandoned immediately.
2023-10-17 14:49:49 +09:00
Martin von Zweigbergk
c3b45b6fd1 workspace: make working-copy type customizable
This add support for custom `jj` binaries to use custom working-copy
backends. It works in the same way as with the other backends, i.e. we
write a `.jj/working_copy/type` file when the working copy is
initialized, and then we let that file control which implementation to
use (see previous commit).

I included an example of a (useless) working-copy implementation. I
hope we can figure out a way to test the examples some day.
2023-10-16 22:33:44 -07:00
Martin von Zweigbergk
6bfd618275 workspace: load working copy implementation dynamically
This makes `Workspace::load()` look a new `.jj/working_copy/type` file
in order to load the right working copy implementation, just like
`Repo::load()` picks the right backends based on `.jj/store/type`,
`.jj/op_store/type`, etc. We don't write the file yet, and we don't
have a way of adding alternative working copy implementations, so it
will always be `LocalWorkingCopy` for now.
2023-10-16 22:33:44 -07:00
Martin von Zweigbergk
e1f00d9426 working copy: pass commit instead of tree into check_out()
Our internal working copy implementations at Google will need the
commit so they can walk history backwards until they get to a "public"
commit. They'll then use that to tell build tools and virtual file
systems to present that as a base.

I'm not sure if we'll need to update `reset()` too. It's currently
only used by `jj untrack`, which doesn't change the commit's parent,
so it wouldn't affect any history walks.
2023-10-16 22:33:44 -07:00
Martin von Zweigbergk
7c8a0a18f9 repo: define types for backend initializer functions
`ReadonlyRepo::init()` takes callbacks for initializing each kind of
backend. We called these things like `op_store_initializer`. I found
that confusing because it is not a `OpStoreFactory` (which is for
loading an existing backend). This patch tries to clarify that by
renaming the arguments and adding types for each kind of callback
function.
2023-10-16 22:33:44 -07:00
Yuya Nishihara
9cafff87e1 cli: add API and branch subcommand to track/untrack remote branches
This patch adds MutableRepo::track_remote_branch() as we'll probably need to
track the default branch on "jj git clone". untrack_remote_branch() is also
added for consistency.
2023-10-16 23:21:05 +09:00
Yuya Nishihara
4af678848d op_store: minimal change to load/store tracking state of remote branches
We could instead migrate the storage types to (local_branches, remote_views),
but that would be more involved and break forward compatibility with little
benefit. Maybe we can do that later when we introduce remote tags.
2023-10-16 23:21:05 +09:00
Yuya Nishihara
4cd2518be0 git: on import_refs(), respect tracking state of existing remote refs
In this commit, new behavior is tested by using in-memory view data. Data
persistence and track/untrack commands will be implemented soon.
2023-10-16 23:21:05 +09:00
Yuya Nishihara
a697175674 view: add tracking state to RemoteRef
The state field isn't saved yet. git import/export code paths are migrated,
but new tracking state is always calculated based on git.auto-local-branch
setting. So the tracking state is effectively a global flag.

As we don't know whether the existing remote branches have been merged in to
local branches, we assume that remote branches are "tracking" so long as the
local counterparts exist. This means existing locally-deleted branch won't
be pushed without re-tracking it. I think it's rare to leave locally-deleted
branches for long. For "git.auto-local-branch = false" setup, users might have
to untrack branches if they've manually "merged" remote branches and want to
continue that workflow. I considered using git.auto-local-branch setting in the
migration path, but I don't think that would give a better result. The setting
may be toggled after the branches got merged, and I'm planning to change it
default off for better Git interop.

Implementation-wise, the state enum can be a simple bool. It's enum just
because I originally considered to pack "forgotten" concept into it. I have
no idea which will be better for future extension.
2023-10-16 23:21:05 +09:00
Martin von Zweigbergk
0582893144 working copy: return Box<dyn LockedWorkingCopy> from start_mutation() 2023-10-15 16:13:19 -07:00
Martin von Zweigbergk
580586d008 working copy: return Box<dyn WorkingCopy> from finish() 2023-10-15 16:13:19 -07:00
Martin von Zweigbergk
6a13fa8264 working copy: add tree_id() to backend trait
Looks like I missed this earlier. I think it makes sense to have on
all working copy implementations.
2023-10-15 16:13:19 -07:00
Martin von Zweigbergk
a733fceba9 working copy: add functions to start/finish modification to backend trait
To keep this patch small, the functions still return the concrete
local-disk implementations. I'll fix that soon.
2023-10-15 16:13:19 -07:00
Martin von Zweigbergk
63654d064b working copy: add sparse pattern functions to backend trait 2023-10-15 15:59:49 -07:00
Martin von Zweigbergk
6457a13260 working copy: add reset() function to the backend trait
This includes documenting the new function and the other types moved
to the `working_copy` module.
2023-10-15 15:59:49 -07:00
Martin von Zweigbergk
0d2247b0df working copy: add check_out() function to the backend trait
This includes documenting the new function and the other types moved
to the `working_copy` module.
2023-10-15 15:59:49 -07:00
Martin von Zweigbergk
781859cb51 working copy: add snapshot() function to the backend trait
This includes documenting the new function and the other types moved
to the `working_copy` module.
2023-10-15 15:59:49 -07:00
Martin von Zweigbergk
3aa57b1a04 working copy: start defining a trait for a locked working copy
As with the `WorkingCopy` trait, this just contains some trivial
methods for now.
2023-10-15 15:59:49 -07:00
Martin von Zweigbergk
49637cb0fd working copy: don't clear tree_state_dirty just before it's dropped 2023-10-15 15:59:49 -07:00
Martin von Zweigbergk
8826639e4e working copy: remove reference from locked instance to base instance
It's going to be easier to define a `LockedWorkingCopy` trait if it
doesn't need to borrow from `WorkingCopy`, so let's remove the
reference we currently have and have
`LockedLocalWorkingCopy::finish()` return the new `LocalWorkingCopy`
instead.

I think the main disadvantage is that we now have to remember to
replace the old `LocalWorkingCopy` instance by the new one, whereas
the compiler would remind us before this commit. We could make
`start_modification()` take an owned `self`, but that would be a bit
annoying to work with when we have the instance stored in a field.
2023-10-15 15:59:49 -07:00
Martin von Zweigbergk
c8184845d7 workspace: replace working_copy_mut() by wrapper type
I'm about to make `LockedLocalWorkingCopy` not borrow from
`LocalWorkingCopy`. That will make it easier to forget to update any
`LocalWorkingCopy` variables when the modifications have been
committed. This patch introduces a wrapper around
`LockedLocalWorkingCopy` to help prevent that.

Thanks to Yuya for the suggestion.
2023-10-15 15:59:49 -07:00
Martin von Zweigbergk
35f9c12cb5 working copy: move LocalWorkingCopy::check_out() to Workspace
`LocalWorkingCopy::check_out()` can be expressed using the planned
`WorkingCopy` trait, so it doesn't need to be in the trait itself
`WorkingCopy`. I wasn't sure if I should make it a free function in
`working_copy`, but I ended up moving it onto `Workspace`.
2023-10-15 15:59:49 -07:00
Yuya Nishihara
8b46712bcb view: add stub method to set remote branch target and state, migrate callers
Since set_remote_branch_target() is called while merging refs, its tracking
state shouldn't be reinitialized. The other callers are migrated to new setter
to keep the story simple.
2023-10-16 05:12:19 +09:00
Yuya Nishihara
35ea607abd view: make get_remote_branch() return RemoteRef instead of RefTarget
I'm going to add tracking state to RemoteRef, and we should compare both
target and state in the tests.
2023-10-16 05:12:19 +09:00
Yuya Nishihara
d730b250a0 view: add absent RemoteRef constructors, proxy methods, and conversion impls
Copied from RefTarget.
2023-10-16 05:12:19 +09:00
Yuya Nishihara
ab1a6e2f71 view: make remote branches iterator yield RemoteRef instead of RefTarget
git::import_refs() will need to read RemoteRef's tracking state.
2023-10-16 05:12:19 +09:00
Yuya Nishihara
e7d93e5bf1 view: turn BranchTarget into borrowed type
This isn't important, but I'm going to change remote_targets to store RemoteRef
instead of RefTarget, so I went ahead and change the other field types as well.
2023-10-16 05:12:19 +09:00
Yuya Nishihara
92facbf21a view: add method to iterate branches of specified remote 2023-10-16 05:12:19 +09:00
Yuya Nishihara
8bdef924c8 view: rename remote_branches() to all_remote_branches()
I'm going to add a method that iterates branches of certain remote, and I
can't find a better name for it than remote_branches(remote_name).
2023-10-16 05:12:19 +09:00
Martin von Zweigbergk
6ca7b5d352 repo: levarage the new static <backend type>::name() functions 2023-10-14 15:21:53 -07:00
Martin von Zweigbergk
f8be0b2030 backends: deduplicate definition of backend names
I copied the example set by `DefaultSubmoduleStore`.
2023-10-14 06:38:35 -07:00
Yuya Nishihara
9186d0fd38 workspace: convert external git repo path to store-relative by constructor
We could fix do_git_clone() instead, but it seemed a bit weird that the
git_repo_path is relative to the store path which is unknown to callers.

Fixes #2374
2023-10-14 22:20:09 +09:00
Yuya Nishihara
b7c7b19eb8 view: migrate in-memory structure to per-remote branches map
There's a subtle behavior change. Unlike the original remove_remote_branch(),
remote_views entry is not discarded when the branches map becomes empty. The
reasoning here is that the remote view can be added/removed when the remote
is added/removed respectively, though that's not implemented yet. Since the
serialized data cannot represent an empty remote, such view may generate
non-unique content hash.
2023-10-14 22:20:00 +09:00
Yuya Nishihara
3ec3cac90b revset: use op_store::View type to resolve branches()/remote_branches()
These functions depend heavily on the underlying data structure, and I haven't
decided abstract View API to access to per-remote data types. Let's use the
underlying data type for now.
2023-10-14 22:20:00 +09:00
Yuya Nishihara
0160eaefd9 view: add function that converts branches to/from per-remote view-like data 2023-10-14 22:20:00 +09:00
Yuya Nishihara
2b78275e22 view: add per-remote view types and iterator that reconstructs BranchTarget
I'm planning to add support for untracked remote branches, and under that
model, there will be many remote branches without local counterparts. That's
the main reason why remote branches are grouped by remote, not by branch name.

The added helper functions will be used by simple_op_store and view.
2023-10-14 22:20:00 +09:00
Yuya Nishihara
198dfa5cbe view: extract remove_remote() from git module
This will be just self.data.remote_views.remove(remote_name), so I'm not gonna
refactor the implementation at this point.
2023-10-13 18:12:45 +09:00
Yuya Nishihara
1d3c830e85 view: rewrite get_branch() callers in tests, remove the method
get_branch() would need to reconstruct the remote_targets map if we migrate
the underlying data structure to per-remote views. Let's remove the method as
it is only used in tests.
2023-10-13 18:12:45 +09:00
Yuya Nishihara
3fb0a3b926 view: add has_branch(name), replace some of get_branch(name) callers
get_branch(name) will be removed soon.
2023-10-13 18:12:45 +09:00
Yuya Nishihara
d840610bad view: rewrite set_branch() callers in tests, remove the method
I'm going to reorganize the underlying data structure, and set_branch() won't
be as simple as it is now.
2023-10-13 18:12:45 +09:00
Martin von Zweigbergk
0aa5f1ae10 working copy: rename working_copy_path() to just path()
It seems pretty clear from the context. Turns out we only use the
function in a test case. Maybe we don't even need it. It's easy to
provide it, though.
2023-10-12 16:10:38 -07:00
Martin von Zweigbergk
9e43207911 working copy: don't expose TreeStateError in LocalWorkingCopy API
The `TreeStateError` type is specific to the current local-disk
working-copy backend, so it should not be part of the generic
working-copy interface I'm trying to create.
2023-10-12 16:10:38 -07:00
Martin von Zweigbergk
0e09d53ce6 working copy: make some reset errors less specific
Same reasoning as the previous commits.
2023-10-12 16:10:38 -07:00
Martin von Zweigbergk
645be615b4 working copy: make some snapshot errors less specific
Same reasoning as the previous commit.
2023-10-12 16:10:38 -07:00
Martin von Zweigbergk
324c40d4c5 working copy: make some checkout errors less specific
I think some of the errors variants in `CheckoutError` are too
specific to the local-disk implementation. Let's merge them and make
them less specific, so it's easier to define a reasonable trait for
the working copy.
2023-10-12 16:10:38 -07:00
Martin von Zweigbergk
33d27ed09f working copy: start defining a working copy trait
This just extracts a trait for the trivial bits to start with.
2023-10-12 16:10:38 -07:00
Yuya Nishihara
69a30b47af refs: migrate classify_branch_push_action() to local/remote targets pair 2023-10-12 16:50:09 +09:00
Yuya Nishihara
420bf79217 view: add method to iterate local/remote pairs of certain remote
This is common operation, and we don't need a map of all remote targets
to calculate push action.
2023-10-12 16:50:09 +09:00
Yuya Nishihara
679a591a22 repo: rewrite diffing of named refs to compare each targets pair
As I'm going to split branches into per-remote map, .get_branch(name) will need
to gather remote branches by name to construct remote_targets map. Let's instead
iterate local/remote branches separately. I also migrated diffing of the other
kinds of refs to filter out unchanged entries upfront.
2023-10-11 19:24:24 +09:00
Yuya Nishihara
6bc19bfa95 git: remove workaround for "branch forget && git fetch"
As we now diff incoming git refs against our known remote branches, the problem
described in the comment no longer occurs. test_branch_forget_fetched_branch()
passes, and the inline comments in the test are still valid.
2023-10-11 06:18:36 +09:00
Yuya Nishihara
7d78ef60d1 git: rewrite diffing of exportable branches to not re-lookup targets by name
As we need to build a set of all branch names anyway, we can also put old/new
targets there. InvalidGitName is moved to caller since the diff function no
longer converts RefName to "refs/" string.
2023-10-11 06:18:36 +09:00
Yuya Nishihara
6f5cc2fd32 git: on export_refs(), copy already-exported local branches to "git" remote
This ensures that our view of the "git" remote is updated even if the last
imported/exported git_refs were out of sync because of "op restore".
2023-10-11 06:18:36 +09:00
Yuya Nishihara
aaf1bbcb4a git: use HashMap to track failed branches internally in export_refs()
This helps to filter out unexported refs in the next commit.
2023-10-11 06:18:36 +09:00
Yuya Nishihara
e160970b79 git: migrate import_refs() to diffing git_refs and known remote refs 2023-10-09 22:31:20 +09:00
Yuya Nishihara
36ee24379b git: build separate lists of git/remote refs to be imported
The idea is that the "remote" refs could have been "op restore"-d whereas
view.git_refs() will never be. The next commit will update known_remote_refs
to be constructed from the current remote branches.

Instead of building these lists in a single loop, we could load new git_refs
to the view first, and then build diffs of the remote refs. I considered that,
but I feel it would be a bit awkward to update refs before importing commits
to the view.

The "remote" refs are stored in BTreeMap since merging order should be stable.
2023-10-09 22:31:20 +09:00
Yuya Nishihara
5bd2ab76f4 git: check reserved remote name while diffing
As I'm going to add separate lists of changed git_refs/remote_refs, it'll
become a bit unclear which one we should check for reserved remotes. The
diff might also be reorganized as a list of (remote, name, kind, old_target,
new_target) where remote == "git" means the git-tracking branch. In this
data structure, the notion of reserved remote name would be lost.
2023-10-09 22:31:20 +09:00
Yuya Nishihara
f062df6da7 git: rename local variables in import_refs()
I'm going to add separate lists of changes for git_refs and remote refs, and
the current changed_git_refs will be the list for the remote refs.
2023-10-09 22:31:20 +09:00
Martin von Zweigbergk
1b9a3e27e0 merged_tree: read before/after trees concurrently
I'm going to rewrite `TreeDiffIterator` to fetch one level (depth) of
the tree at a time and concurrently. One step towards that is to
convert the iterator to a `Stream`. I'd like to do that by making the
current `Iterator` implementation call the new `Stream`
implementation. However, we can't call `futures::executor::block_on()`
on a future that itself calls `futures::executor::block_on()` (as
`Store::read_tree()` does), so the first step is to bubble up the
async-ness a bit. This patch does that by fetching both sides of the
diff concurrently. That should give close to a 2x speedup on
high-latency backends. (It doesn't help with our backend at Google,
however, because we have a daemon process that does some speculative
prefetching that usually downloads the child trees anyway.)
2023-10-08 23:36:49 -07:00
Martin von Zweigbergk
815cf9bf07 merge: implement Default and Extend on MergeBuilder
`futures::stream::Stream::collect()` requires a collection that
implements `Default` and `Extend`, and I would like to to be able to
collect a stream of trees.
2023-10-08 23:36:49 -07:00
Martin von Zweigbergk
5174489959 backend: make read functions async
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:

 * Make the backend methods `async`
 * Use many threads for reading from the backend
 * Add backend methods for batch reading

I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.

I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).

I've tried to measure the performance impact. That's the largest
difference I've been able to measure was on `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0` in the Linux repo,
which increases from 749 ms to 773 ms (3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).
2023-10-08 23:36:49 -07:00
Martin von Zweigbergk
bd5eef9c5e git_backend: rename some store variables backend in tests
This is to avoid confusion with instances of the `Store` type.
2023-10-08 23:36:49 -07:00
Martin von Zweigbergk
b9a122ffe7 working_copy: inline apply_diff closure
This effectively undoes d8a313cdd4, which is no longer needed since
we just changed that error handling. It should make it easier to share
some of the current if/else blocks.
2023-10-07 14:02:31 -07:00
Martin von Zweigbergk
44eb902171 working_copy: don't crash when updating and tracked file exits on disk
Before this patch, when updating to a commit that has a file that's
currently an ignored file on disk, jj would crash. After this patch,
we instead leave the conflicting files or directories on disk. We
print a helpful message about how to inspect the differences between
the intended working copy and the actual working copy, and how to
discard the unintended changes.

Closes #976.
2023-10-07 14:02:31 -07:00
Martin von Zweigbergk
4601c87710 working_copy: move creation of parent dirs to one place
I'm about to add handling of parent dirs that are existing ignored
files, so it's better to have it in one place. The only functional
difference should be that we now create parent directories for git
submodules. I don't think that matters.
2023-10-07 14:02:31 -07:00
Martin von Zweigbergk
187ba9430a working_copy: rename to local_working_copy
It's about time we make the working copy a pluggable backend like we
have for the other storage. We will use it at Google for at least two
reasons:

 * To support our virtual file system. That will be a completely
   separate working copy backend, which will interact with the virtual
   file system to update and snapshot the working copy.

 * On local disk, we need to tell our build system where to find the
   paths that are not in the sparse patterns. We plan to do that by
   wrapping the standard local working copy backend (the one moved in
   this commit), writing a symlink that points to the mainline commit
   where the "background" files can be read from.

Let's start by renaming the exising implementation to
`local_working_copy`.
2023-10-07 08:19:03 -07:00
Yuya Nishihara
d0bc34e0f2 git: look up "git" remote branches normally 2023-10-07 19:33:35 +09:00
Yuya Nishihara
717d0d3d6d git: on deserialize/import/export, copy refs/heads/* to remote named "git"
I've added a boolean flag to the store to ensure that the migration never runs
more than once after the view gets "op restore"-d. I'll probably reorganize the
branches structure to support non-tracking branches later, but updating the
storage format in a single commit would be too involved.

If jj is downgraded, these "git" remote refs would be exported to the Git repo.
Users might have to remove them manually.
2023-10-07 19:33:35 +09:00
Yuya Nishihara
9407d4ecca view: on deserialize, remove reserved "git" remote refs stored by old jj
I'm going to migrate "refs/heads/" branches to .remote_targets["git"]. This
commit will simplify the story as we won't have to exclude "refs/remotes/git/"
refs when diffing or renaming/removing remote.
2023-10-07 19:33:35 +09:00
Yuya Nishihara
0e82e52c3a revset: remove extra step to resolve full commit id, use prefix matching
Since both has_id() and resolve_prefix() do binary search, their costs are
practically the same. I think has_id() would complete with fewer ops, but such
level of optimization wouldn't be needed here. More importantly, this ensures
that unreachable commits aren't imported by GitBackend::read_commit().
2023-10-07 02:08:36 +09:00
Yuya Nishihara
f0ad1f53ea git_backend: on read_commit(), fall back to importing extras as needed
One problematic scenario is that we have commits imported by old jj, and all
of their descendant commits are created by jj. Therefore import_head_commits()
wouldn't reach the old ancestor commits.

This change might bury a real bug, but I don't have a better alternative. Maybe
we can remove this hack after a couple of jj releases, and add a debug command
that imports all reachable Git commits from all historical heads.

Closes #2343
2023-10-07 02:08:36 +09:00
Yuya Nishihara
a302090de9 git: add hack to unset Git HEAD by using placeholder ref
As we can set HEAD to an arbitrary ref by using .reference_symbolic(), we don't
have to manage a ref that can also be valid as a branch name.

Fixes #1495
2023-10-05 01:32:48 +09:00
Yuya Nishihara
7ccbc0424c git: extract function that resets Git HEAD
I'll add a workaround for the root parent issue #1495 there. We can pass in
the wc parent id instead of the wc_commit object, but we might want to use
wc_commit.id() to generate a unique placeholder ref name.
2023-10-04 01:43:34 +09:00
Yuya Nishihara
7c96cead34 git_backend: rename git_repo_clone() as it isn't just cloning, propagate error
Since git2::Repository::open() will access to the filesystem, it can technically
fail.
2023-10-04 00:04:24 +09:00
Yuya Nishihara
837dc4f47c git_backend: rewrite remaining git_repo() callers, make it private
While debugging git issues, I often ended up creating a deadlock by adding
debug prints. It's also not obvious that git::export_refs() works even if the
git_repo() has already been locked, whereas git::import_refs() wouldn't. Let's
consolidate lock handling to the backend implementation.
2023-10-04 00:04:24 +09:00
Yuya Nishihara
0d63223dad git_backend: proxy git2::Repository methods to manage lock scope internally
Since git_repo() acquires Mutex, it's super easy to create a deadlock.
2023-10-04 00:04:24 +09:00