This will allow us to issue multiple prevent_gc() requests all at once. It's
not important here, but will be unavoidable when implementing GC. Deleting
tons of refs from packed refs is super slow if the requests were processed one
by one.
* Move canonicalization of the external git repo path into the Workspace::init_git_external().
This keeps necessary code together.
* Add a new variant of WorkspaceInitError for reporting path not found errors. The user error
string is written to pass existing tests.
Because the default index cuts off the traversal at min(generations), including
the root id means all ancestors will be visited. This could be worked around at
the index side, but I think it's the repo/view's responsibility. That being
said, it's not uncommon to pad a revset with "root()", so it might make sense
for the index to special case the root id.
I also removed the redundant .clone().
It seems obvious in hindsight to have a virtual root operation just
like we have a virtual root commit. It removes the same kind of
problems by making sure there's always a common ancestor (or multiple)
between any two commits.
I think the reason I didn't add a root operation from the beginning
was that there used to be a mandatory working-copy commit in the view
(this was before support for multiple workspaces).
Perhaps we should remove the "initialize repo" operation now. The only
difference between their view objects is that the "initialize repo"
operation adds the root commit as a head. We could add that to the
root operation, but then the root operation's value depends on the
commit backend.
We've had the public_heads for as long as we've had the View object,
IIRC (I didn't check), but we still don't use it for anything. I don't
have any concrete plans for using it either. Maybe our config for
immutable commits is good enough, or maybe we'll want something more
generic (like Mercurial's phases). For now, I think we should simplify
by removing it the storage for public heads.
When debugging behavior of badly-GCed repos, I find it's annoying that "op log"
fails because the index can't be loaded. Since "op log" doesn't need a repo, I
think it's better to display the exact op-heads state without merging.
I thought this would be done by dag_walk::topo_order_reverse_lazy_ok(), but
apparently I made it preserve the input order in a way topo_order_reverse()
would do.
Since hidden commits can be looked up by remote_branches() revset for example,
reindexing should traverse ancestors from all named refs in addition to the
visible heads.
change_id_index() is only used by Readonly/MutableRepo, so we don't need an
abstraction at Index. evaluate_revset() is somewhat similar, but the callers
rely on &dyn Repo.
Since new operations and views may be added concurrently by another process,
there's a risk of data corruption. The keep_newer parameter is a mitigation
for this problem. It's set to preserve files modified within the last 2 weeks,
which is the default of "git gc". Still, a concurrent process may replace an
existing view which is about to be deleted by the gc process, and the view
file would be lost.
#12
We current have `Revset::change_id_index()` for creating a
`ChangeIdIndex` for a given revset. I think it will be hard to make it
performant for general revsets, especially in very large repos and
with custom index implementations, like the one we have at Google. If
we instead restrict it to including all ancestors of a set of heads, I
think it will be much easier to implement. We only use
`Revset::change_id_index()` with revsets including all visible commits
today, so we won't lose any current functionality by making it more
restricted.
I plan to replace `Revset::change_id_index()` by
`Index::change_id_index(heads)`, but one of the tests currently uses a
set of commits that does not include ancestors. This patch updates it
to include ancestors (and changes the set of heads to keep the set
small enough for the test).
I'd like to move `change_id_index()` from `Revset` to `Index` (and
make it take the set of visible heads as argument). We currently use
`evaluate_revset_static()` only to get a `ChangeIdIndex`, so a good
place to start is to convert that into `change_id_index_static()`.
The `ChangeIdIndex` type is currently in defined in the `revset`
module because that's the only placed it's used. However, I'd like to
start using it directly from `index`. The idea is to make it possible
to create a `ChangeIdIndex` given a set of heads, without first
creating a `Revset`.
This isn't technically needed, but it prevents API misuse. Another option
is to do some compile-time substitution, but most callers are tests and the
runtime performance wouldn't matter.
The OpStore backends should have a better way to look up operation by id than
traversing from the op heads. The added method is similar to the commit Index
one, but returns an OpStoreResult because the backend operation can fail.
FWIW, if we want .shortest() in the op log template, we'll probably need a
trait method that returns an OpIndex instead.
I'm going to add try_from_hex(), which requires Self: Sized. Such trait bound
could be added, but I don't think we'll need abstracted ObjectId constructors
at all.
I'm going to add a prefix resolution method to OpStore, but OpStore is
unrelated to the index. I think ObjectId, HexPrefix, and PrefixResolution can
be extracted to this module.
This will be used in "jj op abandon ..op_id" command. The "op_id..@" range will
be reparented onto the root operation.
The current implementation is good enough for local repos, but it won't scale.
We might want to extract it as a trait method or introduce OpIndex for
efficient DAG operation.
Fixes#2760
Given the tree:
```
A-B-C
\
B2
```
And the command `jj rebase -s B -d B2`
We were previously marking B as abandoned, despite the comment stating that we were marking it as being succeeded by B2. This resulted in a call to `rewrite(rewrites={}, abandoned={B})` instead of `rewrite(rewrites={B=>B2}, abandoned={})`, which then made the new parent of `C` into `A` instead of `B2`
Finally, there are no test uses of these APIs. `DescendantRebaser` is made
`pub(crate)`, since it is used by `MutRepo`. Other functions are made private.
This completes the process of removing DescendantRebaser-related APIs from
tests. It requires creating some new test utils and a new
`rebase_descendants_with_option_return_map`.
This commit is a little out of place in this sequence, but
it seems to make more sense for MutRepo to own these maps.
@yuja [pointed out] that any tests written using `create_descendant_rebaser` now
need to do this cleanup, but there are no longer any such tests after the
previous commits and a follow-up commit removes `create_descendant_rebaser`
entirely.
[pointed out]: https://github.com/martinvonz/jj/pull/2737#discussion_r1435754370
This removes uses of `DescendantRebaser::new` or
`MutRepo::create_descendant_rebaser` from most tests. The exceptions are the
tests having to do with abandoning empty commits on rebase, since adjusting
those is a bit more elaborate (see follow-up commits).
A possible use case is when doing some archaeology around a certain operation.
The current implementation is quadratic if + is repeated. Suppose op_id is
usually close to the current op heads, I think it'll practically work better
than building a reverse lookup table.