If the existing index was corrupt, it would have to be rebuilt while loading a
repo (at least in colocated environment.) Then, the index will be rebuilt again.
change_id_index() is only used by Readonly/MutableRepo, so we don't need an
abstraction at Index. evaluate_revset() is somewhat similar, but the callers
rely on &dyn Repo.
Since new operations and views may be added concurrently by another process,
there's a risk of data corruption. The keep_newer parameter is a mitigation
for this problem. It's set to preserve files modified within the last 2 weeks,
which is the default of "git gc". Still, a concurrent process may replace an
existing view which is about to be deleted by the gc process, and the view
file would be lost.
#12
It doesn't make sense to do gc from a non-head operation because that means
either the head operation would be corrupted or the --at-op argument is
ignored.
We current have `Revset::change_id_index()` for creating a
`ChangeIdIndex` for a given revset. I think it will be hard to make it
performant for general revsets, especially in very large repos and
with custom index implementations, like the one we have at Google. If
we instead restrict it to including all ancestors of a set of heads, I
think it will be much easier to implement. We only use
`Revset::change_id_index()` with revsets including all visible commits
today, so we won't lose any current functionality by making it more
restricted.
I plan to replace `Revset::change_id_index()` by
`Index::change_id_index(heads)`, but one of the tests currently uses a
set of commits that does not include ancestors. This patch updates it
to include ancestors (and changes the set of heads to keep the set
small enough for the test).
I'd like to move `change_id_index()` from `Revset` to `Index` (and
make it take the set of visible heads as argument). We currently use
`evaluate_revset_static()` only to get a `ChangeIdIndex`, so a good
place to start is to convert that into `change_id_index_static()`.
The `ChangeIdIndex` type is currently in defined in the `revset`
module because that's the only placed it's used. However, I'd like to
start using it directly from `index`. The idea is to make it possible
to create a `ChangeIdIndex` given a set of heads, without first
creating a `Revset`.
This is quite minor, but it took me a few minutes to figure out
the correct command.
It might be slightly better to print this text inside the build
logs, where people will be looking for certain if the CI fails,
but I didn't immediately find a good way to do so without
complicating the config too much.
We would like to use a non-static version string at Google. We have a
workaround using Lazy, but it seems unfortunate to have to do
that. Using dynamic strings requires Clap's `string` feature, which
increases the binary size by 140 kiB (0.6%).
This isn't technically needed, but it prevents API misuse. Another option
is to do some compile-time substitution, but most callers are tests and the
runtime performance wouldn't matter.
The OpStore backends should have a better way to look up operation by id than
traversing from the op heads. The added method is similar to the commit Index
one, but returns an OpStoreResult because the backend operation can fail.
FWIW, if we want .shortest() in the op log template, we'll probably need a
trait method that returns an OpIndex instead.
I'm going to add try_from_hex(), which requires Self: Sized. Such trait bound
could be added, but I don't think we'll need abstracted ObjectId constructors
at all.
I'm going to add a prefix resolution method to OpStore, but OpStore is
unrelated to the index. I think ObjectId, HexPrefix, and PrefixResolution can
be extracted to this module.
In order to implement GC (#12), we'll need to somehow prune old operations.
Perhaps the easiest implementation is to just remove unwanted operation files
and put tombstone file instead (like git shallow.) However, the removed
operations might be referenced by another jj process running in parallel. Since
the parallel operation thinks all the historical head commits are reachable, the
removed operations would have to be resurrected (or fix up index data, etc.)
when the op heads get merged.
The idea behind this patch is to split the "op log" GC into two steps:
1. recreate operations to be retained and make the old history unreachable,
2. delete unreachable operations if the head was created e.g. 3 days ago.
The latter will be run by "jj util gc". I don't think GC can be implemented
100% safe against lock-less append-only storage, and we'll probably need some
timestamp-based mechanism to not remove objects that might be referenced by
uncommitted operation.
FWIW, another nice thing about this implementation is that the index is
automatically invalidated as the op id changes. The bad thing is that the
"undo" description would contain an old op id. It seems the performance is
pretty okay.
This will be used in "jj op abandon ..op_id" command. The "op_id..@" range will
be reparented onto the root operation.
The current implementation is good enough for local repos, but it won't scale.
We might want to extract it as a trait method or introduce OpIndex for
efficient DAG operation.
Fixes#2760
Given the tree:
```
A-B-C
\
B2
```
And the command `jj rebase -s B -d B2`
We were previously marking B as abandoned, despite the comment stating that we were marking it as being succeeded by B2. This resulted in a call to `rewrite(rewrites={}, abandoned={B})` instead of `rewrite(rewrites={B=>B2}, abandoned={})`, which then made the new parent of `C` into `A` instead of `B2`
Finally, there are no test uses of these APIs. `DescendantRebaser` is made
`pub(crate)`, since it is used by `MutRepo`. Other functions are made private.
This completes the process of removing DescendantRebaser-related APIs from
tests. It requires creating some new test utils and a new
`rebase_descendants_with_option_return_map`.
This commit is a little out of place in this sequence, but
it seems to make more sense for MutRepo to own these maps.
@yuja [pointed out] that any tests written using `create_descendant_rebaser` now
need to do this cleanup, but there are no longer any such tests after the
previous commits and a follow-up commit removes `create_descendant_rebaser`
entirely.
[pointed out]: https://github.com/martinvonz/jj/pull/2737#discussion_r1435754370
This removes uses of `DescendantRebaser::new` or
`MutRepo::create_descendant_rebaser` from most tests. The exceptions are the
tests having to do with abandoning empty commits on rebase, since adjusting
those is a bit more elaborate (see follow-up commits).