Commit graph

496 commits

Author SHA1 Message Date
Yuya Nishihara
38a7e7fd62 git_backend: on read_commit(), bulk-update extra metadata table of ancestors
Otherwise, "jj init --git-repo ." would create extra table files per commit,
and merge them.

I considered adding an explicit GitBackend method to be called from
git::import_refs(), but the call order matters. The method should be invoked
before calling store.get_commit(..) or mut_repo.add_head(..). Since commits
are likely to be loaded from the head, we can instead make read_commit()
import ancestor metadata at all.

Alternatively, we could make a Git commit hidden until it's inserted into
the extra table. It's rather big change, and I wouldn't like to do that
without thinking more thoroughly.
2023-05-21 08:29:00 +09:00
Yuya Nishihara
a9422460cb git_backend: ensure change id generated from git commit id never reassigned
Fixes #924
2023-05-20 15:53:23 +09:00
Yuya Nishihara
9aa72f6f1d git_backend: add lock to prevent racy change id assignments
My first attempt was to fix up corrupted index when merging, but it turned
out to be not easy because the self side may contain corrupted data. It's
also possible that two concurrent commit operations have exactly the same
view state (because change id isn't hashed into commit id), and only the
table heads diverge.

#924
2023-05-20 15:53:23 +09:00
Yuya Nishihara
3655da4f01 tests: add tests for concurrent git commit/change id assignment
Since non-Git metadata isn't hashed, we can't rely on the consistency
provided by content-addressed storage. The problem is also described in
https://github.com/martinvonz/jj/issues/3#issuecomment-947998487

#924
2023-05-20 15:53:23 +09:00
Ilya Grigoriev
714aff63e6 git.rs: properly abandon commits from moved/deleted branches on remote (#864)
This bug concerns the way `import_refs` that gets called by `fetch` computes
the heads that should be visible after the import.

Previously, the list of such heads was computed *before* local branches were
updated based on changes to the remote branches. So, commits that should have
been abandoned based on this update of the local branches weren't properly
abandoned.

Now, `import_refs` tracks the heads that need to be visible because of some ref
in a mapping keyed by the ref. If the ref moves or is deleted, the
corresponding heads are updated.

Fixes #864
2023-05-17 17:57:58 -07:00
Ilya Grigoriev
cf4a603eb4 Tests demonstrating a similar bug with moved rather than deleted branch 2023-05-17 17:57:58 -07:00
Ilya Grigoriev
a0ee2b0dbd lib/tests/test_git.rs: New test to demonstrate #864's root cause 2023-05-17 17:57:58 -07:00
Ilya Grigoriev
bda3d3e50b test_import_refs_reimport: very minor improvement to a test 2023-05-17 17:57:58 -07:00
Martin von Zweigbergk
87a925d736 git_backend: return timestamps for what was actually written
Now that we return the written commit from `write_commit()`, let's
make the timestamps match what was actually written, accounting for
the whole-second precision and the adjustment we do to avoid
collisions.
2023-05-12 15:20:44 -07:00
Martin von Zweigbergk
e7419e76a1 backend: replace git_repo() by as_any()
This has several advantages:

 * Makes it possible to downcast to non-Git custom backends (might be
   useful at Google, but we haven't needed it yet)

 * Lets us access more specific functionality on the `GitBackend`,
   making it possible to access the `git2::Repository` without
   creating a copy of it.

 * Removes the dependency on Git from the backend
2023-05-12 08:05:09 -07:00
Yuya Nishihara
f58beca760 revset: move resolve_symbol() to tests
It's no longer used in library code.
2023-05-12 21:31:29 +09:00
Martin von Zweigbergk
916b00c33e id_prefix: remove repo field from IdPrefixContext
By passing the repo as argument to the methods instead, we can remove
the `repo` field and the associated lifetime. Thanks to Yuya for the
suggestion.
2023-05-11 23:41:24 -07:00
Martin von Zweigbergk
6a4502cb5d prefixes: allow resolving shorter ids within a revset
In large repos, the unique prefixes can get somewhat long (~6 hex
digits seems typical in the Linux repo), which makes them less useful
for manually entering on the CLI. The user typically cares most about
a small set of commits, so it would be nice to give shorter unique ids
to those. That's what Mercurial enables with its
`experimental.revisions.disambiguatewithin` config. This commit
provides an implementation of that feature in `IdPrefixContext`.

In very large repos, it can also be slow to calculate the unique
prefixes, especially if it involves a request to a server. This
feature becomes much more important in such repos.
2023-05-11 23:41:24 -07:00
Martin von Zweigbergk
efd743339c revset: don't allow symbols in RevsetExpression::resolve()
When creating `RevsetExpression` programmatically, I think we should
use commit ids instead of symbols in the expression. This commit adds
a check for that by using a `SymbolResolver` that always errors
out.
2023-05-11 23:41:24 -07:00
Martin von Zweigbergk
99e9cd70d1 cli: make WorkspaceCommandHelper create SymbolResolver
I would eventually want the `SymbolResolver` to be customizable (in
custom `jj` binaries), so we want to make sure we always use the
customized version of it.

I left `RevsetExpression::resolve()` unchanged. I consider that to be
for programmatically created expressions.
2023-05-11 23:41:24 -07:00
Yuya Nishihara
92cfffd843 git: on external HEAD move, do not abandon old branch
The current behavior was introduced by 20eb9ecec1 "git: don't abandon
HEAD commit when it loses a branch." While the change made HEAD mutation
behavior more consistent with a plain ref operation, HEAD can also move on
checkout, and checkout shouldn't be considered a history rewriting operation.

I'm not saying the new behavior is always correct, but I think it's safer
than losing old HEAD branch. I also think this change will help if we want
to extract HEAD management function from git::import_refs().

Fixes #1042.
2023-05-11 10:15:31 +09:00
Yuya Nishihara
66d405fa5f git: add tests that simulate external checkout/amend in colocated repo
I'm going to change the behavior of _without_ref() case to mitigate #1042.
2023-05-11 10:15:31 +09:00
Benjamin Saunders
ccbb34fddb working_copy: introduce snapshot progress callback 2023-05-06 11:07:46 -07:00
Yuya Nishihara
837e8aa81a revset: add substitution rule for nested descendants/children
The substitution rule and tests are copied from ancestors/parents. The backend
logic will be reimplemented later. For now, it naively repeats children().
2023-04-24 20:45:13 +09:00
Martin von Zweigbergk
c60f14899a index: remove entry_by_id() from trait
It no longer needs to be on the `Index` trait, thereby removing the
last direct use of `IndexEntry` in the trait (it's still used
indirectly in `walk_revs()`).
2023-04-18 18:32:23 -07:00
Yuya Nishihara
5351371d51 revset: resolve visible heads prior to evaluation 2023-04-10 00:39:58 +09:00
Yuya Nishihara
0fcc13a6f4 revset: make resolve() return different type describing evaluation plan
New ResolvedExpression enum ensures that the evaluation engine doesn't have
to know the symbol resolution details. In this commit, I've moved Filter
and NotIn resolution to resolve_visibility(). Implicit All/VisibleHeads
resolution will be migrated later.

It's tempting to combine resolve_symbols() and resolve_visibility() to get
rid of panic!()s, but the resolution might have to be two passes to first
resolve&collect explicit commit ids, and then substitute "all()" with
"(:visible_heads())|commit_id|..". It's also possible to apply some tree
transformation after symbol resolution.
2023-04-10 00:39:58 +09:00
Yuya Nishihara
6c2525cb93 revset: add "resolve" method to RevsetExpression, always call it
I'll make the resolution stage mandatory, and have it return a "resolved"
type. RevsetExpression::evaluate() will be moved to the "resolved" type.
2023-04-10 00:39:58 +09:00
Yuya Nishihara
adfd52445b revset: reimplement children to not scan visible ancestors twice
It's slightly faster, and removes the use of RevsetExpression::descendants()
API.
2023-04-08 12:13:30 +09:00
Yuya Nishihara
85fb1f74c3 revset: for roots:heads, terminate ancestor lookup at min(roots) 2023-04-08 12:13:30 +09:00
Martin von Zweigbergk
24a512683b revset: add a revset function for finding commits with conflicts
This adds `conflict()` revset that selects commits with conflicts. We
may want to extend it later to consider only conflicts at certain
paths.
2023-04-06 16:46:21 -07:00
Martin von Zweigbergk
e1c57338a1 revset: split out no-args head() to visible_heads()
The `heads()` revset function with one argument is the counterpart to
`roots()`. Without arguments, it returns the visible heads in the
repo, i.e. `heads(all())`. The two use cases are quite different, and
I think it would be good to clarify that the no-arg form returns the
visible heads, so let's split that out to a new `visible_heads()`
function.
2023-04-03 23:46:34 -07:00
Yuya Nishihara
982062bd75 revset: do not always evaluate filter node to InternalRevset
This basically removes hidden 'all() &' from union/negation of filters. To
achieve that, I have two options: 1. add separate evaluation path (like the
one this commit introduced), or 2. wrap "all()" revset to override predicate
as Box::new(|_| true) function. I took the former since it's less ad-hoc.

We can add an explicit RevsetExpression node to branch between evaluate()
and evaluate_predicate(), but I don't think it would simplify the
implementation at this point. We might need such node if we want to resolve
"all()" at resolve_symbols(). It might be even better to extract a subset of
RevsetExpression enum, which only contains evaluatable nodes.

The cost of 'all() &' isn't significant for most filters. '~merges()' is
the exception. For jj repo,

    revsets/:v0.3.0 & (author(martinvonz) | committer(martinvonz))
    --------------------------------------------------------------
    base     1.06      11.2±0.04m
    new      1.00      10.5±0.05m

    revsets/~merges()
    -----------------
    base     1.69     750.0±8.47µ
    new      1.00     444.1±3.50µ
2023-04-04 15:21:21 +09:00
Yuya Nishihara
c28d2d7784 revset: split RevsetError into RevsetResolution/EvaluationError
This makes it clear that RevsetExpression::Present node is noop at the
evaluation stage.

RevsetEvaluationError::StoreError is unused right now, but I'm not sure if
it should be removed. It makes some sense that evaluate() can propagate
StoreError as it has access to the store.
2023-04-03 10:55:03 +09:00
Martin von Zweigbergk
3546cc1bf6 revset: pass in store, index, and heads instead of whole Repo
The `Repo` is a higher-level type that the index shouldn't have to
know about. With this change, a custom revset implementation should be
able evaluate the revset on a server without knowing which repo it
refers to.
2023-03-30 20:15:45 -07:00
Martin von Zweigbergk
3ff1ab520b revset: remove public_heads()
The `public_heads()` revset only contains the root commit in
practice. I'm not sure what we want to do about phases, but since we
don't have any real support for them yet, let's just remove this
revset. I didn't update the changelog because we don't seem to have
documented the revset function (and it seems unlikely that users who
found out about it found it useful enough to use it when they could
just use `root`).
2023-03-30 20:15:45 -07:00
Yuya Nishihara
0532301e03 revset: add latest(candidates, count) predicate
This serves the role of limit() in Mercurial. Since revsets in JJ is
(conceptually) an unordered set, a "limit" predicate should define its
ordering criteria. That's why the added predicate is named as "latest".

Closes #1110
2023-03-25 23:48:50 +09:00
Martin von Zweigbergk
baea314fc0 index: get generation number from specific impl in test 2023-03-24 10:09:40 -07:00
Martin von Zweigbergk
75605e36af revset: iterate over commit ids instead of index entries
There are no remaining places where we iterate over a revset and need
the `IndexEntry`s, so we can now make `Revset::iter()` yield
`CommitId`s instead.
2023-03-23 21:58:15 -07:00
Martin von Zweigbergk
b5ea79f32e revset: add new graph iterator function for tests
I'm about to make `Revset::iter()` yield just `CommitId`s, but the
tests in `test_default_revset_graph_iterator.rs` need an `IndexEntry`
iterator so they can pass it into `RevsetGraphIterator::new()`. This
commits prepares for the change by adding a
`RevsetImpl::iter_graph_impl()` that returns `RevsetGraphIterator`,
keeping `InternalRevset` still hidden within the revset engine. We
could instead have made that (and `ToPredicateFn`) visible to tests. I
can't say which is better.
2023-03-23 21:58:15 -07:00
Martin von Zweigbergk
c8f387d5b3 revset: pass IndexEntry iterator to graph iterator
The graph iterator is specific to the index implementation, and it
needs access to `IndexEntry`, which `Revset::iter()` will soon not
yield.
2023-03-23 21:58:15 -07:00
Martin von Zweigbergk
27a7fccefa revset: add a method returning a change id index
One of the remaining places we depend on index positions is when
creating a `ChangeIdIndex`. This moves that into the revset engine
(which is coupled to the commit index implementation) by adding a
`Revset::change_id_index()` method. We will also use this function
later when add support for resolving change id prefixes within a small
revset.

The current implementation simply creates an in-memory index using the
existing `IdIndex` we have in `repo.rs`.

The custom implementation at Google might do the same for small
revsets that are available on the client, but for revsets involving
many commits on the server, it might use a suboptimmal implementation
that uses longer-than-necessary prefixes for performance reasons. That
can be done by querying a server-side index including changes not in
the revset, and then verifying that the resulting commits are actually
in the revset.
2023-03-23 20:49:15 -07:00
Martin von Zweigbergk
d3cf543abc revset: move revset_for_commits() to test
The function is only used in tests, so it doesn't belong in
`default_revset_engine`. Also, it's not specific to that
implementation, so I rewrote as a revset evaluation.
2023-03-23 04:50:33 -07:00
Martin von Zweigbergk
5f74dd5db3 repo: implement Repo on ReadonlyRepo instead of its Arc
I'd like to be able to pass a `self` of `type `&ReadonlyRepo` to
functions that take a `&dyn Repo`. For that, we need `ReadonlyRepo`
itself to implement `Repo` instead of having `Arc<ReadonlyRepo>`
implement it. I could have solved it in a different way, but the `Arc`
requirement seems like an unnecessary constraint.
2023-03-21 21:43:44 -07:00
Martin von Zweigbergk
01d7239732 revset: make graph iterator yield commit ids (not index entries)
We only need `CommitId`s, and `IndexEntry` is specific to the default
index implementation.
2023-03-20 01:45:54 -07:00
Martin von Zweigbergk
2f876861ae graphlog: key by commit id (not index position)
The index position is specific to the default index implementation and
we don't want to use it in outside of there. This commit removes the
use of it as a key for nodes in the graphlog.

I timed it on the git.git repo using `jj log -r 'all()' -T commit_id`
(the worst case I can think of) and it slowed down from ~2.02 s to
~2.20 s (~9%).
2023-03-20 01:45:54 -07:00
Martin von Zweigbergk
91df7ec4c5 revset: rename graph iterator test to match implementation 2023-03-20 01:45:54 -07:00
Martin von Zweigbergk
f758b646a9 commit_builder: add accessors for most fields
I'd like to be able to access the current committer on a
`CommitBuilder`.
2023-03-19 00:48:05 -07:00
Martin von Zweigbergk
70d4a0f42e revset: remove context parameter from evaluate()
The `RevsetWorkspaceContext` argument is now instead used by the new
`resolve_symbol()` function.
2023-03-17 22:42:41 -07:00
Martin von Zweigbergk
d971148e4e revset: move resolve_symbol() back to revset module
The only caller is now in `revset.rs`.
2023-03-17 22:42:41 -07:00
Martin von Zweigbergk
94aec90bee revset: resolve symbols earlier, before passing to revset engine
For large repos, it's useful to be able to use shorter change id and
commit id prefixes by resolving the prefix in a limited subset of the
repo (typically the same subset that you'd want to see in your default
log output). For very large repos, like Google's internal one, the
shortest unique prefix evaluated within the whole repo is practically
useless because it's long enough that the user would want to copy and
paste it anyway.

Mercurial supports this with its `revisions.disambiguatewithin` config
(added in https://www.mercurial-scm.org/repo/hg/rev/503f936489dd). I'd
like to add the same feature to jj. Mercurial's implementation works
by attempting to resolve the prefix in the whole repo and then, if the
prefix was ambiguous, it resolves it in the configured subset
instead. The advantage of doing it that way is that there's no extra
cost of resolving the revset defining the subset if the prefix was not
ambiguous within the whole repo. However, there are two important
reasons to do it differently in jj:

* We support very large repos using custom backends, and it's probably
  cheaper to resolve a prefix within the subset because it can all be
  cached on the client. Resolving the prefix within the whole repo
  requires a roundtrip to the server.

* We want to be able to resolve change id prefixes, which is always
  done in *some* revset. That revset is currently `all()`, i.e. all
  visible commits. Even on local disk, it's probably cheaper to
  resolve a small revset first and then resolve the prefix within that
  than it is to build up the index of all visible change ids.

We could achieve the goal by letting each revset engine respect the
configured subset, but since the solution proposed above makes sense
also for local-disk repos, I think it's better to do it outside of the
revset engine, so all revset engines can share the code.

This commit prepares for the new functionality by moving the symbol
resolution out of `Index::evaluate_revset()`.
2023-03-17 22:42:41 -07:00
Martin von Zweigbergk
5afe5091a0 revset: add default_ prefix to graph iterator module
The current revset graph iterator is the default one, which the
default revset engine provides.
2023-03-14 05:32:02 -07:00
Martin von Zweigbergk
3871efd2f9 revset: move ReverseRevsetGraphIterator into revset module
The iterator is not specific to the implementation in
`revset_graph_iterator`, so it belongs in the standard `revset`
module.
2023-03-14 05:32:02 -07:00
Martin von Zweigbergk
f62fac24ac revset: move graph iteration onto Revset trait
We want to allow custom revset engines define their own graph
iterator. This commit helps with that by adding a
`Revset::iter_graph()` function that returns an abstract iterator.

The current `RevsetGraphIterator` can be configured to skip or include
transitive edges. It skips them by default and we don't expose option
in the CLI. I didn't bother including that functionality in the new
`iter_graph()` either. At least for now, it will be up to the
implementation whether it includes such edges (it would of course be
free to ignore the caller's request even if we added an option for it
in the API).
2023-03-14 05:32:02 -07:00
Martin von Zweigbergk
eed0b23009 revset: move current implementation to new module
We want to allow customization of the revset engine, so it can query
server indexes, for example. The current revset implementation will be
our default implementation for now. What's left in the `revset` module
after this commit is mostly parsing code.
2023-03-14 05:32:02 -07:00