Adds support for revset functions `tracked_remote_branches()` and
`untracked_remote_branches()`. I think this would be especially useful
for configuring `immutable_heads()` because rewriting untracked remote
branches usually wouldn't be desirable (since it wouldn't update the
remote branch). It also makes it easy to hide branches that you don't
care about from the log, since you could hide untracked branches and
then only track branches that you care about.
Maybe it'll also be good to keep RevsetExpression::Union(_) flattened, but
that's not needed to get around stack overflow. The constructed expression
tree is balanced.
test_expand_symbol_alias() is slightly adjusted since there are more than
one representation for "a|b|c" now.
Fixes#4031
Partially resolve a 1.5‐year‐old TODO comment.
Add opt‐in syntax for case‐insensitive matching, suffixing the
pattern kind with `-i`. Not every context supports case‐insensitive
patterns (e.g. Git branch fetch settings). It may make sense to make
this the default in at least some contexts (e.g. the commit signature
and description revsets), but it would require some thought to avoid
more confusing context‐sensitivity.
Make `mine()` match case‐insensitively unconditionally, since email
addresses are conventionally case‐insensitive and it doesn’t take
a pattern anyway.
This currently only handles ASCII case folding, due to the complexities
of case‐insensitive Unicode comparison and the `glob` crate’s lack
of support for it. This is unlikely to matter for email addresses,
which very rarely contain non‐ASCII characters, but is unfortunate
for names and descriptions. However, the current matching behaviour is
already seriously deficient for non‐ASCII text due to the lack of any
normalization, so this hopefully shouldn’t be a blocker to adding the
interface. An expository comment has been left in the code for anyone
who wants to try and address this (perhaps a future version of myself).
Forgetting a workspace removes its working-copy commit, so it makes
sense for it to be abandoned if it is discardable just like editing a
new commit will cause the old commit to be abandoned if it is
discardable.
Currently, if two workspaces are editing the same discardable commit and
one of them switches to editing a different commit, it is abandoned even
though the other workspace is still editing it. This commit treats
workspaces as referencing their working-copy commits so that they won't
be abandoned.
Fixes#2476.
Previously, if there was a change id match within the short prefix
lookup set, `jj` would not look for commits with that same change id
outside the short prefix set. So, it wouldn't find the conflicted
commits for a commit with a divergent (AKA conflicted) change id.
It's common to create empty working-copy commits while using jj, and
currently the author timestamp for a commit is only set when it is first
created. If you create an empty commit, then don't work on a repo for a
few days, and then start working on a new feature without abandoning the
working-copy commit, the author timestamp will remain as the time the
commit was created rather than being updated to the time that work began
or finished.
This commit changes the behavior so that discardable commits (empty
commits with no description) by the current user have their author
timestamps reset when they are rewritten, meaning that the author
timestamp will become finalized whenever a commit is given a description
or becomes non-empty.
Since we've split (local, remotes) branches to (locals, remotes { branches }),
.has_branch() API no longer makes much sense. Callers often need to check if
a remote branch is tracked.
- make an internal set of watchman extensions until the client api gets
updates with triggers
- add a config option to enable using triggers in watchman
Co-authored-by: Waleed Khan <me@waleedkhan.name>
It's been more than 6 months since we added support for dynamically
selecting the working copy implementation. This patch drops support
for selecting the default implementation of that and other stores.
As we discovered in the `jj fix` tests,
`MergedTreeBuilder::write_tree()` doesn't try to resolve conflicts,
not even trivial ones. This patch fixes that.
Otherwise, newly created default branch would be re-imported as a new Git HEAD.
This could be addressed by cmd_git_init(), but the same situation can be
crafted by using "git checkout -b".
Some backends, like the one we have at Google, can restrict access to
certain files. For such files, if they return a regular
`BackendError::ReadObject`, then that will terminate iteration in many
cases (e.g. when diffing or listing files). This patch adds a new
error variant for them to return instead, plus handling of such errors
in diff output and in the working copy.
In order to test the feature, I added a new commit backend that
returns the new `ReadAccessDenied` error when the caller tries to read
certain objects.
Merge commits are very similar to non-merge commits in jj. An empty
merge commit with no description is not really different from an empty
non-merge commit with no description. As we discussed on
https://github.com/martinvonz/jj/issues/2859, we should not treat
merge commits differently when updating away from them. This patch
adds test for the current behavior (which is to leave the merge commit
in place).
I recently needed to test something on top of a two branches at the
same time, so I created a new commit on top of both of them (i.e. a
merge commit). I then ran tests and made some adjustments to the
code. These adjustments belonged in one of the parent branches, so I
used `jj squash --into` to squash it in there. Unfortunately, that
meant that my working copy became a single-parent commit based on one
of the branches only. We already had #2859 for tracking this issue.
This patch changes the behavior so we create a new working-copy commit
with all of the previous parents.
This should be a no-op, though that is not necessarily obvious in corner
cases.
Note that libgit2 already performs the push negotiation even when
pushing without force (without `+` in the refspec).
As explained in the commit, our logic is a bit more complicated than
that of `git push --force-with-lease`. This is to match the behavior of
`jj git fetch` and branch conflict resolution rules.
In chromium/src.git, this gives an approximate ~0.83s speedup for
commands if the Git index is empty, and a ~0.14s slowdown if the Git
index is non-empty.
As most users using jj will likely be using jj to write to a colocated
repo - and therefore avoid modifying the Git index - this should be a
general speedup for most colocated checkouts.
This revset correctly implements "reachability" from a set of source commits following both parent and child edges as far as they can go within a domain set. This type of 'bfs' query is currently impossible to express with existing revset functions.
This follows up on https://github.com/martinvonz/jj/pull/3459 and adds a
label to the closing delimeter of each conflict, e.g. "Conflict 1 of 3
ends".
I didn't initially put any label at the ending delimeter since the
starting delimeter is already marked with "Conflict 1 of 3". However,
I'm now realizing that when I resolve conflicts, I usually go from top
to bottom. The first thing I do is delete the starting conflict
delimeter. It is when I get to the *end* of the conflict that I wonder
whether there are any more conflicts left in the file.
The function now returns an iterator over `Result`s, matching
`Operation::parents()`.
I updated callers to also propagate the error where it was trivial.
For example,
```
<<<<<<< Conflict 1 of 3
+++++++ Contents of side #1
left 3.1
left 3.2
left 3.3
%%%%%%% Changes from base to side #2
-line 3
+right 3.1
>>>>>>>
```
or
```
<<<<<<< Conflict 1 of 1
%%%%%%% Changes from base to side #1
-line 3
+right 3.1
+++++++ Contents of side #2
left 3.1
left 3.2
left 3.3
>>>>>>>
```
Currently, there is no way to disable these, this is TODO for a future
PR. Other TODOs for future PRs: make these labels configurable. After
that, we could support a `diff3/git`-like conflict format as well, in
principle.
Counting conflicts helps with knowing whether you fixed all the
conflicts while you are in the editor.
While labeling "side #1", etc, does not tell you the commit id or
description as requested in #1176, I still think it's an improvement.
Most importantly, I hope this will make `jj`'s conflict format less
scary-looking for new users.
I've used this for a bit, and I like it. Without the labels, I would see
that the two conflicts have a different order of conflict markers, but I
wouldn't be able to remember what that means. For longer diffs, it can
be tricky for me to quickly tell that it's a diff as opposed to one of
the sides. This also creates some hope of being able to navigate a
conflict with more than 2 sides.
Another not-so-secret goal for this is explained in
https://github.com/martinvonz/jj/pull/3109#issuecomment-2014140627. The
idea is a little weird, but I *think* it could be helpful, and I'd like
to experiment with it.
The format is 7 characters of the separator followed by a space and arbitrary
text, followed by a newline. Separator followed by a newline is also allowed.
E.g.:
<<<<<<< Random text
%%%%%%% Random text
line 2
-line 3
+left
line 4
+++++++ Random text
right
%%%%%%% Random text
line 2
+forward
line 3
line 4
>>>>>>> Random text
This commit only allows reading such conflicts.
I considered allowing longer separators (`<<<<<<<<<<<<<< Random text`), but we
wouldn't currently write them, so let's be strict for now.
7 characters if they are followed by a space and arbitrary text
We already have two uses for this function and I think we're soon
going to have more.
The function record the old commit as abandoned with the new parents,
which is typically what you want. We could record it as abandoned with
the old parents instead but then we'd have to do an extra iteration to
find the parents when rebasing any children. It would also be
confusing if
`rewriter.set_parents(new_parents).record_abandoned_commit()` didn't
respect the new parents.
This addresses the test instability. The underlying problem still exists, but
it's unlikely to trigger user-facing issues because of that. A repo instance
won't be reused after gc() call.
Fixes#3537
Apparently, these gc() invocations rely on that the previous "git gc" packed
all refs so there are no loose refs to compare mtimes. If there were new (or
remaining) loose refs, mtime comparison could fail. I also added +1sec to
effectively turn off the keep_newer option, which isn't important in these
tests.
There are several existing commands that would benefit from an API
that makes it easier to rewrite a whole graph of commits while
transforming them in some way.
`jj squash` is one example. When squashing into an ancestor, that
command currently rewrites the ancestor, then rebases descendants, and
then rewrites the rewritten source commit. It would be better to
rewrite the source commit (and any descendants) only once.
Another example is the future `jj fix`. That command will want to
rewrite a graph while updating the trees. There's currently no good
API for that; you have to manually iterate over descendants and
rewrite them.
This patch adds a new `MutableRepo::transform_descendants()` method
that takes a callback which gets a `CommitRewriter` passed to it. The
callback can then decide to change the parents, the tree, etc. The
callback is also free to leave the commit in place or to abandon it.
I updated the regular `rebase_descendants()` to use the new function
in order to exercise it. I hope we can replace all of the
`rebase_descendant_*()` flavors later.
I added a `replace_parent()` method that was a bit useful for the test
case. It could easily be hard-coded in the test case instead, but I
think the method will be useful for `jj git sync` and similar in the
future.
`CommitRewriter` wraps 3 of the arguments, so I think it makes sense
to pass it instead. More importantly, I hope to continue refactoring
so many of the callers already have a `CommitRewriter`.
It's cheap to look up commits again from the cache in `Store` but it
can be expensive to look up commits we didn't end up needing. This
will make it easier to refactor further and be able to cheaply set
preliminary parents for a rewritten commits and then let the caller
update them.
I'm going to add a helper struct to help with rewriting commits. I
want to make that struct own the old commit and the new parents to
simplify lifetimes. This patch prepares for that by passing the
commits by value to `rebase_commit()`.
This implements the other workaround described in 57167cefda "git: on
import_refs(), don't abandon ancestors of newly fetched refs":
> I think there are two ways to fix the problem:
> a. pin non-tracking remote branches just like local refs
> b. pin newly fetched refs in addition to local refs
> This patch implements (b) because it's simpler and more obvious that the
> fetched commits would never be abandoned immediately.
The idea of (a) is that untracked remote branches are independent read-only
refs, and read-only branches shouldn't be rewritten implicitly. Once the
branch gets rewritten or abandoned by user, these remote refs will be hidden,
and won't be pinned anymore.
Since (a) effectively supersedes (b), this patch also removes the original
workaround.
Fixes#3495
I don't think we have any callers left that call
`record_rewritten_commit()` multiple times within a transaction and
expect it to result in divergence. I think we should consider it a bug
to do that.
Apart from (IMO) looking nicer, this will also sidestep the potential problem
that if the file contains actual jj conflict markers (`>>>>>>>` in the beginning
of a line, for example), jj would currently have trouble materializing and
subsequently parsing conflicts in the file if it actually became conflicted.
I'll demo this bug in either this or a subsequent PR. It's the kind of bug that
sounds serious in theory but might never cause a problem in practice.
After this PR, only `docs/tutorial.md` has a conflict marker that's not indented.
There's only one there, so hopefully it won't be too much of a pain to deal with.
I also indented other strings in `test_conflicts.rs`. IMO, this looks nice and
more consistent with the `insta::assert_snapshot` output. I didn't spend the
time to do the same for `test_resolve_command`.
Suppose the generation value is usually small, it should be faster to do
bounded range look up first 'y-', then walk ancestors with the unwanted set
'y-..x'.
This helps to eliminate higher-ranked trait bounds from RevWalkRevset and
RevWalk combinators to be added. Since &CompositeIndex is now a real reference,
it can be passed to functions as index: &T.
Initially we were thinking to have `Revset` return something like
`CachedRevset`:
```
pub trait CachedRevset {
fn iter(&self) -> Box<dyn Iterator<Item = Commit>>;
fn contains(&self, &CommitId) -> bool;
}
```
But we weren't sure what use case for `iter` would be, so we dropped the `iter`
method. `CachedRevset` with single `contains` method needed a better name. We
weren't able to come up with one, so we decided instead to have a method on
`Revset` that returns a closure to check if a commit is in a revset.
Follows up 7552f939c6 "tests: disable most gpg integration tests on Windows."
I couldn't find this test failing in a few samples before, but it does now.
This removes the special handling of the working-copy commit. By
recording when an empty/emptied commit was abanoned, we rebase
descendants correctly and create a new empty working-copy commit on
top.
This adds a guard to the gpg signing tests which will skip the test if
`gpg` is not installed on the system.
This is done in order to avoid requiring all collaborators to have setup
all the tools on their local machines that are required to test commit
signing.
When doing things like testing snapshot performance differences,
this allows you to turn off the monitor, no matter what the enabled
user or repository configuration has, e.g.
jj st --config-toml='core.fsmonitor="none"'
Signed-off-by: Austin Seipp <aseipp@pobox.com>
It should be useful at least in the presentation layer to know which
operations correspond to working-copy snapshots. They might be
rendered differently in the graph, for example. Or maybe an undo
command wants to warn if you just undid a snapshot operation. This
patch just introduces a field in the metadata to store the
information.
I think the conclusion from #2600 is that at least auto-rebasing
should not simplify merge commits that merge a commit with its
ancestor. Let's start by adding an option for that in the library.
The shortest change id prefix will become a few digits longer, but I think
that's acceptable. Entries included in the "revsets.short-prefixes" set are
unaffected.
The reachable set is calculated eagerly, but this is still faster as we no
longer need to sort the reachable entries by change id. The lazy version will
save another ~100ms in mid-size repos.
"jj log" without working copy snapshot:
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
-s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
"target/release-with-debug/{bin} -R ~/mirrors/linux \
--ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 353.6 ms ± 11.9 ms [User: 266.7 ms, System: 87.0 ms]
Range (min … max): 329.0 ms … 365.6 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 271.3 ms ± 9.9 ms [User: 183.8 ms, System: 87.7 ms]
Range (min … max): 250.5 ms … 282.7 ms 20 runs
Relative speed comparison
1.99 ± 0.16 target/release-with-debug/jj-0 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
1.53 ± 0.12 target/release-with-debug/jj-1 -R ~/mirrors/linux --ignore-working-copy log -r.. -l100 --config-toml='revsets.short-prefixes=""'
```
"jj status" with working copy snapshot (watchman enabled):
```
% hyperfine --sort command --warmup 3 --runs 20 -L bin jj-0,jj-1,jj-2 \
-s "target/release-with-debug/{bin} -R ~/mirrors/linux debug reindex" \
"target/release-with-debug/{bin} -R ~/mirrors/linux \
status --config-toml='revsets.short-prefixes=\"\"'"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 396.6 ms ± 10.1 ms [User: 300.7 ms, System: 94.0 ms]
Range (min … max): 373.6 ms … 408.0 ms 20 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
Time (mean ± σ): 318.6 ms ± 12.6 ms [User: 219.1 ms, System: 94.1 ms]
Range (min … max): 294.2 ms … 333.0 ms 20 runs
Relative speed comparison
1.85 ± 0.14 target/release-with-debug/jj-0 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
1.48 ± 0.12 target/release-with-debug/jj-1 -R ~/mirrors/linux status --config-toml='revsets.short-prefixes=""'
```
This basically means that the change ids are interned. We'll implement binary
search over the sorted change ids table. The table could be sorted differently
for better cache locality, but it is in lexicographical order for simplicity.
With my testing, the cost of the id lookup isn't dominant.
Unlike the parent entries, the size of the per-id overflow items isn't saved.
That's s because the number of the same-change-id commits is either 1 or many.
It doesn't make sense to allocate 8 bytes for each change id. Instead, we'll
pay extra indirection cost to determine the size.
Apparently, gix has 100ms timeout. Since this test tries to create contended
situation, it's possible that the ref lock can't be acquired. I've added
upper bound to the retry loop at b37293fa68 "tests: add upper bound to
test_concurrent_read_write_commit() loop", so ignoring arbitrary errors
should be okay.
The problem can be reproduced on my Linux machine by inserting 10ms sleep() to
gix and increasing the concurrency.
Fixes#3069
The `ContentHash` documentation specifies that implementations for enums should
hash the ordinal number of the variant contained in the enum as a 32-bit
little-endian number and then hash the contents of the variant, if any.
The current implementations for `std::Option`, `MergedTreeId`, and
`RemoteRefState` are non-conformant since they hash the ordinal number as a u8
with platform specific endianness.
Fixes#3051
Since IdIndex sorts the entries by using .sort_unstable_by_key(), the order of
the same-key elements is undefined. Perhaps, it's stable for short arrays, and
the test passes because of that.
I'm going to introduce breaking changes in index format. Some of them will
affect the file size, so version number or signature won't be needed. However,
I think it's safer to detect the format change as early as possible.
I have no idea if embedded version number is the best way. Because segment
files are looked up through the operation links, the version number could be
stored there and/or the "segments" directory could be versioned. If we want to
support multiple format versions and clients, it might be better to split the
tables into data chunks (e.g. graph entries, commit id table, change id table),
and add per-chunk version/type tag. I choose the per-file version just because
it's simple and would be non-controversial.
As I'm going to introduce format change pretty soon, this patch doesn't
implement data migration. The existing index files will be deleted and new
files will be created from scratch.
Planned index format changes include:
1. remove unused "flags" field
2. inline commit parents up to two
3. add sorted change ids table
This is #3002 with tests rerun to account for changes
to `strsim`, as @thoughtpolice noticed in
https://github.com/martinvonz/jj/pull/3002#issuecomment-1936763101
The string similarity changes include an example that
seems better and one that seems worse. Decreasing
the threshold definitely makes things worse.
I was a bit surprised to learn (or be reminded?) that checking out
symlinks on Windows leads to a panic. This patch fixes the crash by
materializing symlinks from the repo as regular files. It also updates
the snapshotting code so we preserve the symlink-ness of a path. The
user can update the symlink in the repo by updating the regular file
in the working copy. This seems to match Git's behavior on Windows
when symlinks are disabled.
this greatly speeds up the time to run all tests, at the cost of slightly larger recompile times for individual tests.
this unfortunately adds the requirement that all tests are listed in `runner.rs` for the crate.
to avoid forgetting, i've added a new test that ensures the directory is in sync with the file.
## benchmarks
before this change, recompiling all tests took 32-50 seconds and running a single test took 3.5 seconds:
```
; hyperfine 'touch lib/src/lib.rs && cargo t --test test_working_copy'
Time (mean ± σ): 3.543 s ± 0.168 s [User: 2.597 s, System: 1.262 s]
Range (min … max): 3.400 s … 3.847 s 10 runs
```
after this change, recompiling all tests take 4 seconds:
```
; hyperfine 'touch lib/src/lib.rs ; cargo t --test runner --no-run'
Time (mean ± σ): 4.055 s ± 0.123 s [User: 3.591 s, System: 1.593 s]
Range (min … max): 3.804 s … 4.159 s 10 runs
```
and running a single test takes about the same:
```
; hyperfine 'touch lib/src/lib.rs && cargo t --test runner -- test_working_copy'
Time (mean ± σ): 4.129 s ± 0.120 s [User: 3.636 s, System: 1.593 s]
Range (min … max): 3.933 s … 4.346 s 10 runs
```
about 1.4 seconds of that is the time for the runner, of which .4 is the time for the linker. so
there may be room for further improving the times.
Our virtual file system at Google (CitC) would like to know the commit
so it can scan backwards and find the closest mainline tree based on
it. Since we always record an operation id (which resolves to a
working-copy commit) when we write the working-copy state, it doesn't
seem like a restriction to require a commit.
This mostly reverts https://github.com/martinvonz/jj/pull/2901 as well as its
fixup https://github.com/martinvonz/jj/pull/2903. The related bug is reopened,
see https://github.com/martinvonz/jj/issues/2869#issuecomment-1920367932.
The problem is that while the fix did fix#2869 in most cases, it did
reintroduce the more severe bug https://github.com/martinvonz/jj/issues/2760
in one case, if the working copy is the commit being rebased.
For example, suppose you have the tree
```
root -> A -> B -> @ (empty) -> C
```
### Before this commit
#### Case 1
`jj rebase -s B -d root --skip-empty` would work perfectly before this
commit, resulting in
```
root -> A
\-------B -> C
\- @ (new, empty)
```
#### Case 2
Unfortunately, if you run `jj rebase -s @ -d A --skip-empty`, you'd have the
following result (before this commit), which shows the reintroduction of #2760:
```
root -> A @ -> C
\-- B
```
with the working copy at `A`. The reason for this is explained in
https://github.com/martinvonz/jj/pull/2901#issuecomment-1920043560.
### After this commit
After this commit, both case 1 and case 2 will be wrong in the sense of #2869,
but it will no longer exhibit the worse bug #2760 in the second case.
Case 1 would result in:
```
root -> A
\-------B -> @ (empty) -> C
```
Case 2 would result in:
```
root -> A -> @ -> C
\-- B
```
with the working copy remaining a descendant of A
With my jj repo, the number of jj/keep refs went down from 87887 to 27733.
The .git directory size is halved, but we'll need to clean up extra and index
files to save disk space. "git gc --prune=now && jj debug reindex" passed, so
the repo wouldn't be corrupted.
#12
I'm going to make WorkspaceCommandHelper::maybe_snapshot() snapshot the working
copy before importing refs. git::import_some_refs() can rebase the working copy
branch and therefore @ can be moved. git::import_head() doesn't, and it should
be invoked before snapshotting.
git::import_head() is inserted to some of the git:import_refs() callers where
HEAD seems to matter. I feel it's a bit odd that the HEAD ref is imported to
non-colocated repo, but "jj init --git-repo" relies on that, and I think the
existence of HEAD@git is harmless. It's merely a ref to the revision checked
out somewhere else.
We didn't have any tests with negative snapshots (after a `-------`
line). I initially thought we couldn't produce such conflict markers
anymore. I'm not sure we want to render conflicts like the one in the
test like this. I don't think I intended for `add_index` in the code
to be able to be two steps ahead of the remove. Maybe we should
rewrite the algorithm to not do that and thus never produce negative
snapshots.
It seems obvious in hindsight to have a virtual root operation just
like we have a virtual root commit. It removes the same kind of
problems by making sure there's always a common ancestor (or multiple)
between any two commits.
I think the reason I didn't add a root operation from the beginning
was that there used to be a mandatory working-copy commit in the view
(this was before support for multiple workspaces).
Perhaps we should remove the "initialize repo" operation now. The only
difference between their view objects is that the "initialize repo"
operation adds the root commit as a head. We could add that to the
root operation, but then the root operation's value depends on the
commit backend.
We've had the public_heads for as long as we've had the View object,
IIRC (I didn't check), but we still don't use it for anything. I don't
have any concrete plans for using it either. Maybe our config for
immutable commits is good enough, or maybe we'll want something more
generic (like Mercurial's phases). For now, I think we should simplify
by removing it the storage for public heads.
Since hidden commits can be looked up by remote_branches() revset for example,
reindexing should traverse ancestors from all named refs in addition to the
visible heads.
change_id_index() is only used by Readonly/MutableRepo, so we don't need an
abstraction at Index. evaluate_revset() is somewhat similar, but the callers
rely on &dyn Repo.
Since new operations and views may be added concurrently by another process,
there's a risk of data corruption. The keep_newer parameter is a mitigation
for this problem. It's set to preserve files modified within the last 2 weeks,
which is the default of "git gc". Still, a concurrent process may replace an
existing view which is about to be deleted by the gc process, and the view
file would be lost.
#12
We current have `Revset::change_id_index()` for creating a
`ChangeIdIndex` for a given revset. I think it will be hard to make it
performant for general revsets, especially in very large repos and
with custom index implementations, like the one we have at Google. If
we instead restrict it to including all ancestors of a set of heads, I
think it will be much easier to implement. We only use
`Revset::change_id_index()` with revsets including all visible commits
today, so we won't lose any current functionality by making it more
restricted.
I plan to replace `Revset::change_id_index()` by
`Index::change_id_index(heads)`, but one of the tests currently uses a
set of commits that does not include ancestors. This patch updates it
to include ancestors (and changes the set of heads to keep the set
small enough for the test).
This isn't technically needed, but it prevents API misuse. Another option
is to do some compile-time substitution, but most callers are tests and the
runtime performance wouldn't matter.
I'm going to add try_from_hex(), which requires Self: Sized. Such trait bound
could be added, but I don't think we'll need abstracted ObjectId constructors
at all.
I'm going to add a prefix resolution method to OpStore, but OpStore is
unrelated to the index. I think ObjectId, HexPrefix, and PrefixResolution can
be extracted to this module.
This will be used in "jj op abandon ..op_id" command. The "op_id..@" range will
be reparented onto the root operation.
The current implementation is good enough for local repos, but it won't scale.
We might want to extract it as a trait method or introduce OpIndex for
efficient DAG operation.
Fixes#2760
Given the tree:
```
A-B-C
\
B2
```
And the command `jj rebase -s B -d B2`
We were previously marking B as abandoned, despite the comment stating that we were marking it as being succeeded by B2. This resulted in a call to `rewrite(rewrites={}, abandoned={B})` instead of `rewrite(rewrites={B=>B2}, abandoned={})`, which then made the new parent of `C` into `A` instead of `B2`
This completes the process of removing DescendantRebaser-related APIs from
tests. It requires creating some new test utils and a new
`rebase_descendants_with_option_return_map`.
This removes uses of `DescendantRebaser::new` or
`MutRepo::create_descendant_rebaser` from most tests. The exceptions are the
tests having to do with abandoning empty commits on rebase, since adjusting
those is a bit more elaborate (see follow-up commits).
A possible use case is when doing some archaeology around a certain operation.
The current implementation is quadratic if + is repeated. Suppose op_id is
usually close to the current op heads, I think it'll practically work better
than building a reverse lookup table.
My jj repo contains such head commits, and "jj debug reindex" fails. To address
this problem, we'll probably need to implement GC, and the user will discard
operations before the first bad op id.
This requires creating a new public API as a substitute. I took the opportunity
to also add some comments to the
`MutRepo::record_rewritten_commit`/`record_abandoned_commit` functions.
I imade the simplest possible addition to the API; it is not a very elegant
one. Eventually, the entire `record_rewritten_commit` API should probably be
refactored again.
I also added some comments explaining what these functions do.
Spotted while adding error propagation there. This wouldn't likely be a real
problem because "jj debug reindex" removes all of the operation links.
The "} else {" condition is removed because it doesn't make sense to exclude
only the exact parent_op_id operation. This can be optimized to not walk
ancestors of the parent_op_id operation, but I don't see a motivation to add
tests covering such scenarios. It's pretty rare that an intermediate operation
link is missing.
We'll probably need a better abstraction, but a separate method is good
enough to remove unsafe code from ReadonlyRepo.
I'm not sure if this is feasible for the other backends, but I guess there
would be less lifetimed variables than DefaultReadonlyIndex.
It seems better to have the caller pass the transaction description
when we finish the transaction than when we start it. That way we have
all the information we want to include more readily available.
Previously, the function relied on both the `self.parent_mapping` and
`self.rebased`. If `(A,B)` was in `parent_mapping` and `(B,C)` was in `rebased`,
`new_parents` would map `A` to `C`.
Now, `self.rebased` is ignored by `new_parents`. In the same situation,
DescendantRebaser is changed so that both `(A,B)` and `(B,C)` are in
`parent_mapping` before. `new_parents` now applies `parent_mapping` repeatedly,
and will map `A` to `C` in this situation.
## Cons
- The semantics are changed; `new_parents` now panics if `self.parent_mapping`
contain cycles. AFAICT, such cycles never happen in `jj` anyway, except for
one test that I had to fix. I think it's a sensible restriction to live with;
if you do want to swap children of two commits, you can call
`rebase_descendants` twice.
## Pros
- I find the new logic much easier to reason about. I plan to extract it into a
function, to be used in refactors for `jj rebase -r` and `jj new --after`. It
will make it much easier to have a correct implementation of `jj rebase -r
--after`, even when rebasing onto a descendant.
- The de-duplication is no longer O(n^2). I tried to keep the common case fast.
## Alternatives
- We could make `jj rebase` and `jj new` use a separate function with the
algorithm shown here, without changing DescendantRebaser. I believe that the new
algorithm makes DescendatRebaser easier to understand, though, and it feels more
elegant to reduce code duplication.
- The de-duplication optimization here is independent of other changes, and
could be used on its own.
default_index_store.rs is relatively big, and it contains types and impls in
arbitrary order. Let's split them into sub modules. After everything moved,
mod.rs will only contain tests.