When deciding the order to visit commits to rebase, we currently look
up parents in the index. I'm trying to remove the current `IndexEntry`
type and will probably have revsets iterators yield simply
`CommitId`. Let's therefore look up commit objects here.
I timed this by rewriting all commits in the jj repo. I couldn't
measure any difference. That makes sense since we cache the commits in
`Store` and we would read the commit when rebasing it anyway.
The function is only used in tests, so it doesn't belong in
`default_revset_engine`. Also, it's not specific to that
implementation, so I rewrote as a revset evaluation.
I'd like to be able to pass a `self` of `type `&ReadonlyRepo` to
functions that take a `&dyn Repo`. For that, we need `ReadonlyRepo`
itself to implement `Repo` instead of having `Arc<ReadonlyRepo>`
implement it. I could have solved it in a different way, but the `Arc`
requirement seems like an unnecessary constraint.
The functions resolving a change id to commits currently return a
`Vec<IndexEntry>`. We want to avoid depending on `IndexEntry` and we
only need the commit ids here.
The index position is specific to the default index implementation and
we don't want to use it in outside of there. This commit removes the
use of it as a key for nodes in the graphlog.
I timed it on the git.git repo using `jj log -r 'all()' -T commit_id`
(the worst case I can think of) and it slowed down from ~2.02 s to
~2.20 s (~9%).
Since we hid the graph iterator implementation behind
`Revset::iter_graph()`, I don't think we have any callers of
`Revset::iter()` require the iteration to be in index position order,
so let's not promise that. We do want to promise that the iteration is
in topological order with children before parents, however.
We need 1.64 to bump `clap` to `4.1`. We don't really need to upgrade
to that, but being on an older version causes minor confusions like
#1393. Rust 1.64 is very close to 6 months old at this point.
For large repos, it's useful to be able to use shorter change id and
commit id prefixes by resolving the prefix in a limited subset of the
repo (typically the same subset that you'd want to see in your default
log output). For very large repos, like Google's internal one, the
shortest unique prefix evaluated within the whole repo is practically
useless because it's long enough that the user would want to copy and
paste it anyway.
Mercurial supports this with its `revisions.disambiguatewithin` config
(added in https://www.mercurial-scm.org/repo/hg/rev/503f936489dd). I'd
like to add the same feature to jj. Mercurial's implementation works
by attempting to resolve the prefix in the whole repo and then, if the
prefix was ambiguous, it resolves it in the configured subset
instead. The advantage of doing it that way is that there's no extra
cost of resolving the revset defining the subset if the prefix was not
ambiguous within the whole repo. However, there are two important
reasons to do it differently in jj:
* We support very large repos using custom backends, and it's probably
cheaper to resolve a prefix within the subset because it can all be
cached on the client. Resolving the prefix within the whole repo
requires a roundtrip to the server.
* We want to be able to resolve change id prefixes, which is always
done in *some* revset. That revset is currently `all()`, i.e. all
visible commits. Even on local disk, it's probably cheaper to
resolve a small revset first and then resolve the prefix within that
than it is to build up the index of all visible change ids.
We could achieve the goal by letting each revset engine respect the
configured subset, but since the solution proposed above makes sense
also for local-disk repos, I think it's better to do it outside of the
revset engine, so all revset engines can share the code.
This commit prepares for the new functionality by moving the symbol
resolution out of `Index::evaluate_revset()`.
We want to allow custom revset engines define their own graph
iterator. This commit helps with that by adding a
`Revset::iter_graph()` function that returns an abstract iterator.
The current `RevsetGraphIterator` can be configured to skip or include
transitive edges. It skips them by default and we don't expose option
in the CLI. I didn't bother including that functionality in the new
`iter_graph()` either. At least for now, it will be up to the
implementation whether it includes such edges (it would of course be
free to ignore the caller's request even if we added an option for it
in the API).
This commit adds an `evaluate_revset()` function to the `Index`
trait. It will require some further cleanup, but it already achieves
the goal of letting the index implementation decide which revset
engine to use.
We want to allow customization of the revset engine, so it can query
server indexes, for example. The current revset implementation will be
our default implementation for now. What's left in the `revset` module
after this commit is mostly parsing code.
Now that there's a single implementation of `Revset`, I think it makes
more sense for `is_empty()` to be defined there. Maybe different
revset engines have different ways of implementing it. Even if they
don't, this is trivial to re-implement in each revset engine.
As the comment above `ToPredicateFn` says, it could be a private
type. This commit makes that happen by making the private `Revset`
implementations (`DifferenceRevset` etc.) instead implement an
internal revset type called `InternalRevset`. That type is what
extends `ToPredicateFn`, so the public type doesn't have to. The new
type will not need to implement the new functions I'm about to add to
the `Revset` trait.
We don't want the public `Revset` interface to know about
`ToPredicateFn`. In order to hide it, I'm wrapping the internal type
in another type, so only the internal type can keep implementing
`ToPredicateFn`.
I'd like to be able to change the return type of `evaluate_revset()`
to be an internal type. Since all external callers currently call the
function via `RevsetExpression::evaluate()`, it turns out it's easy to
make it private. To benefit from an internal type, we also need to
make the recursive calls be directly to the internal function.
We don't want custom index implementations to have to conform to the
same kind of stats as the default implementation. This commit also
makes the command error out on non-default index types.
I broke the commands in a27da7d8d5 and thought I just fixed it in
c7cf914694a8. However, as I added a test, I realized that I made it
only reindex the commits since the previous operation. I meant for the
command to do a full reindexing of th repo. This fixes that.
I broke `jj debug reindex` in a27da7d8d5. From that commit, we no
longer delete the pointer to the old index, so nothing happens when we
reload the index. This commit fixes that, and also makes the command
error out if run on a repo with a non-default index type.
This is yet another step towards making the index pluggable. The
`IndexStore` trait seems reasonable after this commit. There's still a
lot of work to remove `IndexPosition` from the `Index` trait.
I didn't make `ReadonlyIndex` extend `Index` because it needed an
`as_index()` to convert to `&dyn Index` trait object
anyway. Separating the types also gives us flexibility to implement
the two traits on different types.
Not all index implementations may want to store the readonly index
implementation in an Arc. Exposing the Arc in the interface is also
problematic because `Arc<IndexImpl>` cannot be cast to `Arc<dyn
Index>`.
These two files are closely related, and `Index` and `IndexStore` are
expected to be customized together, so it seems better to keep them in
a single file.
This is another step towards allowing a custom `jj` binary to have its
own index type. We're going to have a server-backed index
implementation at Google, for example.
This is a step towards making the index storage pluggable. The
interface will probably change a bit soon, but let's start with
functions that match the current implementation.
I called the current implementation the `DefaultIndexStore`. Calling
it `SimpleIndexStore` (like `SimpleOpStore` and `SimpleOpHeadsStore`)
didn't seem accurate.
In `git_fetch()`, any glob present in `globs` is an "allow" mark. Using
`&[]` to represent an "allow-all" may be misleading, as it could
indicate that no branch (only the git HEAD) should be fetched.
By using an `Option<&[&str]>`, it is clearer that `None` means that
all branches are fetched.
Using &[String] forces the caller to materalize owned strings if they
have only references, which is costly. Using &[&str] makes it cheap
if the caller owns strings as well.
To be able to make e.g. `jj log some/path` perform well on cloud-based
repos, a custom revset engine needs to be able to see the paths to
filter by. That way it is able pass those to a server-side index. This
commit helps with that by effectively converting `jj log -r foo
some/path` into `jj log -r 'foo & file(some/path)'`.
It makes the APIs much simpler if we don't have to pass in information
about the initial operation when we create the `OpHeadsStore`. It also
makes the alternative `OpHeadsStore` implementations simpler since we
move some logic into a shared location (`ReadonlyRepo::init()`).
This effectively undoes ec07104126. Maybe some further refactoring
made it possible to move it back as I'm doing in this commit?
By taking an `OperationId` argument to `IndexStore::write_index()`, we
can remove `associate_file_with_operation()` from the trait. That
simplifies the interace a little bit. The reason I noticed this was
that I'm trying to extract a trait for `IndexStore`, and the word
"file" in it is too specific for e.g. a cloud-based implementation.
I plan to make `RepoLoader::init()` return a `Result`, which means
that `WorkspaceLoader::load()` will need to return more kinds of
errors. Making it return `WorkspaceLoadError` is a good start. By also
extracting a function for converting `WorkspaceLoadError` to
`CommandError`, we can reuse a the handling of `PathError` in
`cli_util`.
I'm about to make `RepoLoader::init()` return a `Result`, and I don't
want to have to wrap that in a new error in
`ReadonlyRepo::load_at_head()` since that's only used in tests.
This should fix#1304. I think the added test simulates the behavior of
multiple rebase conflicts, but I don't have expertise around this.
add_index could be replaced with a peekable iterator, but the iterator version
wouldn't be as readable as the current implementation.
The outermost "op-log" label isn't moved to the default template. I think
it belongs to the command's formatter rather than the template.
Old bikeshedding items:
- "current_head", "is_head", or "is_head_op"
=> renamed to "current_operation"
- "templates.op-log" vs "templates.op_log" (the whole template is labeled
as "op-log")
=> renamed to "op_log"
- "template-aliases.'format_operation_duration(time_range)'"
=> renamed to 'format_time_range(time_range)'
The type doesn't seem to provide any benefit. I don't think I had a
good reason for creating it in the first place; it was probably just
unfamiliarity with Rust.
This is another step towards removing `RevsetIterator`. These types
are private, so someone using the library can't accidentally create a
`UnionRevsetIterator` with inputs in different order, for example.
I was thinking of replacing `RevsetIterator` by a regular
`Iterator<Item=IndexEntry>`. However, that would make it easier to
pass in an iterator that produces revisions in a non-topological order
into `RevsetGraphIterator`, which would produce unexpected results (it
would result in nodes that are not connected to their parents, if
their parents had already been emitted). I think it makes sense to
instead pass in a revset into `RevsetGraphIterator`.
Incidentally, it will also be useful to have the full revset available
in `RevsetGraphIterator` if we rewrite the algorithm to be more
similar to Mercurial's and Sapling's algorithm, which involves asking
the revset if it contains parent revisions.
We write conflict to the working copy by materializing them as
conflict markers in a file. When the file has been modified (or just
the mtime has changed), we parse the markers to reconstruct the
conflict. For example, let's say we see this conflict marker:
```
<<<<<<<
+++++++
b
%%%%%%%
-a
+c
>>>>>>>
```
Then we will create a hunk with ["a"] as removed and ["b", "c"] as
added.
Now, since commit b84be06c08, when we materialize conflicts, we
minimize the diff part of the marker (the `%%%%%%%` part). The problem
is that that minimization may result in a different order of the
positive conflict terms. That's particularly bad because we do the
minimization per hunk, so we can end up reconstructing an input that
never existed.
This commit fixes the bug by only considering the next add and the one
after that, and emitting either only the first with `%%%%%%%`, or both
of them, with the first one in `++++++++` and the second one in
`%%%%%%%`.
Note that the recent fix to add context to modify/delete conflicts
means that when we parse modified such conflicts, we'll always
consider them resolved, since the expected adds/removes we pass will
not match what's actually in the file. That doesn't seem so bad, and
it's not obvious what the fix should be, so I'll leave that for later.
The function only needs the `TreeValue` so it makes more sense this
way, I think. That will also let the caller keep the rest of the
`Conflict` value owned (though there is nothing but the `value` field
in it right now).
It took a while before I realized that conflicts could be modeled as
simple algebraic expressions with positive and negative terms (they
were modeled as recursive 3-way conflicts initially). We've been
thinking of them that way for a while now, so let's make the
`ConflictPart` name match that model.
When we materialize modify/delete conflicts, we currently don't
include any context lines. That's because modify/delete conflicts have
only two sides, so there's no common base to compare to. Hunks that
are unchanged on the "modify" side are therefore not considered
conflicting, and since they they don't contribute new changes, they're
simply skipped (here:
3dfedf5814/lib/src/files.rs (L228-L230)).
It seems more useful to instead pretend that the missing side is an
empty file. That way we'll get a conflict in the entire file.
We can still decide later to make e.g. `jj resolve` prompt the user on
modify/delete conflicts just like `hg resolve` does (or maybe it
actually happens earlier there, I don't remember).
Closes#1244.
If I understand correctly, the 'revset lifetimes on `Box<dyn
Revset<'index> + 'revset>` are not constrained by the lifetime of a
revset; we don't have any revsets that borrow data from other
revsets. Instead, they're all about constraining a boxed revset to the
index's lifetime. Without the lifetime annotation, it would default to
'static, and the borrow-checker doesn't like `dyn Revset<'index> +
'static`, since the revset could then live longer than the index it
borrows.
This is just a little preparation for extracting a `Repo` trait that's
implemented by both `ReadonlyRepo` and `MutableRepo`. The `index()`
function in that trait will of course have to return the same type in
both implementations, and that type will be `&dyn Index`.
Even though we don't know the details yet, we know that we want to
make the index pluggable like the commit and opstore
backends. Defining a trait for it should be a good step. We can refine
the trait later.
By separating the value spaces change ids and commit ids, we can
simplify lookup of a prefix. For example, if we know that a prefix is
for a change id, we don't have to try to find matching commit ids. I
think it might also help new users more quickly understand that change
ids are not commit ids.
This commit is a step towards that separation. It allows resolving
change ids by using hex digits from the back of the alphabet instead
of 0-f, so 'z'='0', 'y'='1', etc, and 'k'='f'. Thanks to @ilyagr for
the idea. The regular hex digits are still allowed.
Supported values are,
- `none` for no author information,
- `full` for both the name and email,
- `name` for just the name,
- `username` for username part of the email,
- (default) `email` (or any other gibberish for that matter) for the full email.
There's a subtle difference between
- 'expression = { whitespace* ... whitespace* }', and
- '_{ whitespace* ~ expression ~ whitespace* }'.
The former includes surrounding whitespace in an "expression", the latter
doesn't. This affects the span of error indication.
The added expect_arguments() is basically a copy from the template_parser.
I'll reimplement it to support keyword arguments, so I don't care much about
the current implementation.
I leave expect_no/one_argument() as wrappers because parsing 0/1 arguments
is pretty common.
Error messages are slightly changed. I personally prefer not to add extra
code for singular/plural handling, but if we do, I'll add 'if N == 1' case.
Our internal backend at Google uses a 32-byte change id, so I'd like
to make the backend able to decide the length. To start with, let's
make the backend able to decide what the root change id should
be. That's consistent with how we already let the backend decide what
the root commit id should be.
The function is currently only about the length of commit IDs, so
let's clarify that. I'm going to add another function for the length
of change IDs next. I don't know if we're going to care about lengths
of other hashes in the future. We might even be able to remove the
current restriction that all commit IDs and all change IDs have the
same length.
I think the CLI currently checks that the backend is not told to write
a merge commit with the root as one parent, but we should not panic if
those checks fail.
Git's HEAD ref is similar to other refs and can logically have
conflicts just like the other refs in `git_refs`. As with the other
refs, it can happen if you run concurrent commands importing two
different updates from Git. So let's treat `git_head` the same as
`git_refs` by making it an `Option<RefTarget>`.
Add a new git.auto-local-branch config option. When set to false, a
remote-tracking branch imported from Git will not automatically create a
local branch target. This is implemented by a new GitSettings struct
that passes Git-related settings from UserSettings.
This behavior is particularly useful in a co-located jj and Git repo,
because a Git remote might have branches that are not of everyday
interest to the user, so it does not make sense to export them as local
branches in Git. E.g. https://github.com/gitster/git, the maintainer's
fork of Git, has 379 branches, most of which are topic branches kept
around for historical reasons, and Git developers wouldn't be expected
to have local branches for each remote-tracking branch.
I don't think there's a good reason not to write the
`.jj/working_copy/tree_state` file on init. Being able to assume that
the file exists means that we won't need the store object to to lazily
load the `TreeState` object. Well, except that `TreeState` keeps an
`Arc<Store>`, but I'm trying to change that.
When building an initial index from an existing Git repo, for example,
we walk parents and predecessors to find all commits to index. Part of
that code was looking up the whole parent and predecessor commits even
though it only needed the ids. I don't know if this has a measurable
impact on performance, but it's not really any more complex to just
get the ids anyway.
I would expect `Commit::is_empty()` to check if the commit is empty in
our usual sense, i.e. that there are no changes compared to the
auto-merged parents. However, it would return `false` for any merge
commit (and for the root commit). Since we only use it in one place,
let's inline it there. The use there does seem reasonable, because
it's about abandoning an "uninteresting" working-copy commit.
I think of it more as style than a format, so using `style` in the
config key makes sense to me.
I didn't bother making upgrades easy by supporting the old name since
this was just released and only a few developers probably have it set.
revset::resolve_change_id() for ReadonlyRepo will be replaced with this
implementation. This doesn't mean revset query will speed up. A trivial
query will become slower due to the initialization cost of the change id
index. "jj log -r hex" will get faster since we have to pay the cost anyway.
Benchmark numbers (against my "linux" repo):
Command:
hyperfine --warmup 3 --runs 20 \
"jj log -r $hex -T '' --no-commit-working-copy --no-graph"
Linear search (e874570947):
Time (mean ± σ): 223.9 ms ± 16.2 ms [User: 181.2 ms, System: 42.7 ms]
Range (min … max): 207.7 ms … 247.6 ms 50 runs
Building IdIndex:
Time (mean ± σ): 855.0 ms ± 21.7 ms [User: 788.4 ms, System: 66.6 ms]
Range (min … max): 822.6 ms … 927.5 ms 50 runs
Building IdIndex, but hacked to store SmallVec<[u8; 20]>:
Time (mean ± σ): 406.1 ms ± 15.9 ms [User: 354.1 ms, System: 52.0 ms]
Range (min … max): 382.2 ms … 428.6 ms 50 runs
For my "jj" work repo, changes are < ~1ms.
I've preferred "working-copy commit" over "checkout" for a while
because I think it's clearer, but there were lots of places still
using "checkout". I've left "checkout" in places where it refers to
the action of updating the working copy or the working-copy commit.
`SimpleOpHeadsStore` currently stores its files in
`.jj/repo/op_heads/simple_op_heads/`. The `.jj/repo/op_heads/type`
file indicates the type of op-heads backend. If that contains
"simple_op_head_store", we use the `SimpleOpHeadsStore`
backend. There's no need for the `simple_op_heads` directory to also
indicate the type of backend in its name. I kept just the `heads` in
the name to make it less redundant with the parent directory (which is
`op_heads)`. We could alternatively call the directory `values` or
similar.
Since this function depends on both index and view, it can't be moved to
one of the storage objects. If we go forward with this approach, some
revset::resolve_*() functions will also be migrated to RepoRef.
This patch slightly changes the function name since a "prefix" might have
various meanings.
This should fix the panic in the case reported in #1107. It's a bit
hard to reproduce because we normally notice the missing commit when
we snapshot the working copy, but it's possible to reproduce it using
`--no-commit-working-copy`.
I suspect the added test is too brittle because it checks the exact
error message. On the other hand, it might be useful to have one test
case like this so we catch accidental changes in the format.
Since IdIndex is immutable, we don't need fast insertion provided by BTreeMap.
Let's simply use Vec for some speed up. More importantly, this allows us to
store multiple (ChangeId, CommitId) pairs for the same change id, and will
unblock the use of IdIndex in revset::resolve_symbol().
Some benchmark numbers (against my "linux" repo) follow.
Command:
hyperfine --warmup 3 "jj log -r master \
-T 'commit_id.short_prefix_and_brackets()' \
--no-commit-working-copy --no-graph"
Original:
Time (mean ± σ): 1.892 s ± 0.031 s [User: 1.800 s, System: 0.092 s]
Range (min … max): 1.833 s … 1.935 s 10 runs
This commit:
Time (mean ± σ): 867.5 ms ± 2.7 ms [User: 809.9 ms, System: 57.7 ms]
Range (min … max): 862.3 ms … 871.0 ms 10 runs
With my "jj" work repo, this saves ~4ms to show the log with default revset.
Command:
JJ_CONFIG=/dev/null hyperfine --warmup 3 --runs 100 \
"jj log -T 'commit_id.short_prefix_and_brackets() \
change_id.short_prefix_and_brackets()' \
--no-commit-working-copy"
Baseline (a7541e1ba4):
Time (mean ± σ): 54.1 ms ± 16.4 ms [User: 46.4 ms, System: 7.8 ms]
Range (min … max): 36.5 ms … 78.1 ms 100 runs
This commit:
Time (mean ± σ): 49.5 ms ± 16.4 ms [User: 42.4 ms, System: 7.2 ms]
Range (min … max): 31.4 ms … 70.9 ms 100 runs
This iterator will be used to merge neighbor commit ids across segments.
resolve_prefix() is simplified to non-short-circuiting loop. I think that's
fine because visiting parents is cheap, and the costly operation here is
segment_resolve_prefix().
entry_by_pos() could also be migrated to iterator, but I leave the unsafe
bits there.
ReadonlyIndex implementation leverages the existing binary search
function. MutableIndex one is basically the same as repo::IdIndex.
Shortest prefix length could be calculated for each segment, but I think
returning neighbors is better for testing.
This is ugly, but we need a special case because root_change_id and
root_commit_id aren't equal but share the same prefix bytes. In practice,
no one would care for the shortest root id prefix, but we'll need to deal
with a similar problem when migrating prefix id resolution to repo layer.
This helps us to migrate commit_id index to ReadonlyIndex. For large
repositories, this also reduces initialization cost, but that's not the main
intent of this change.
https://github.com/martinvonz/jj/pull/1041#issuecomment-1399225876
common_hex_len() and iter_half_bytes() are added to backend.rs since more
call sites will be added to index.rs, and I feel index.rs isn't a good place
to host this kind of utility functions.
I made it a free function. Alternatively, the root id could be instantiated
by and obtained through backend, but I don't think we'll need such level of
abstraction.
I'm going to add a workaround for shortest prefix calculation of the root ids,
where this function will be used.
Make op resolution a closed operation, powered by a callback provided by the
caller which runs under an internal lock scope. This allows for greatly
simplifying the internal lifetime structuring.
If commit_id[..prefix_len] < prefix, commit_id < prefix is obviously true.
If commit_id[..prefix_len] == prefix, commit_id < prefix returns false. So
slicing isn't needed.
This makes commit_id_byte_prefix_to_pos() basically the same as
segment_commit_id_to_pos(), and these two functions can be merged.
matches() is called from resolve_change_id() loop right now, so it's better to
not allocate String there. Regarding new IdIndex integration, I'll probably make
IdIndex store raw byte ids instead of hexes, and use HexPrefix to look up
range and test prefixes. I think this is basically the same as prefix lookup
in MutableIndex, but I have no idea if we can factor out a common interface.
I made HexPrefix store (Vec<u8>, bool) instead of (Vec<u8>, Option<u8>) so
both min/partial prefixes can be borrowed as slice.
By inlining `wite_commit_internal()` into `write_commit()`, we can
avoid redoing some steps when we retry. This includes taking the mutex
lock, and reading the tree object and parent commits. It also means
that we avoid cloning the input commit object, which we otherwise
would even in the non-retrying case. I haven't measured if any of this
makes a significant difference, but I think it also slightly
simplifies the code, so it doesn't have to.
This is fast enough to be used on medium-sized repositories such as git/git.
It is a bit slow, but bearable, on huge repositories such as torvalds/linux.
There is 0 performance penalty if the display of unique prefixes is disabled
A trie-based implementation will be submitted for consideration in a
follow-up PR. It is faster, but more complicated.
**Update:** I also just discovered https://sapling-scm.com/docs/internals/indexedlog/
There are three important aspects of performance that seemed relevant:
1. Speed of computing the shortest unique prefix per id. It is worlds faster
than the naive implementation before this commit. It can be optimized
furher by using a trie or maybe the `fst` crate.
2. Speed of inital loading of the index that happens before the first commit is
shown. This is the part that's noticeable but bearable on torvalds/linux.
This could be optimized by storing a sorted list of commit and change ids on
disk. This would likely involve reworking the `Index`.
Failing that, the speed of inital loading doesn't change if a trie is used
and would likely be worse with the `fst` crate
3. Memory use is unremarkable here. I don't have good tools to measure it
precisely, but it does not balloon to gigabytes even on the linux repo.
This creates a templater function `short_underscore_prefix` for commit and
change ids. It is similar to `short` function, but shows one fewer hexadecimal
digit and inserts an underscore after the shortest unique prefix.
Highlighting with an underline and perhaps color/bold will be in a follow-up
PR.
The implementation is quadratic, a simple comparison of each id with every
other id. It is replaced in a subsequent commit. The problem with it is that,
while it works fine for a `jj`-sized repo, it becomes is painfully slow with a
repo the size of git/git.
Still, this naive implemenation is included here since it's simple, and could
be used as a reference implementation.
The `shortest_unique_prefix_length` function goes into `repo.rs` since that's
convenient for follow-up commits in this PR to have nicer diffs.
Fixes https://github.com/martinvonz/jj/issues/1050
Thanks to Martin for suggesting the exact fix.
The tests go into the new tests/test_duplicate_command.rs, which will be
expanded shortly with other tests depending on this bugfix.
You may use "abc\\" in .gitignore to ignore a file named "abc\". In this
case, removing training spaces on "abc\\ " must result in "abc\\" as the
trailing space is not escaped, the preceeding backslash being part of
the previous "\\" escaping sequence.
- branches has the signature branches([needle]), meaning the needle is optional (branches() is equivalent to branches("")) and it matches all branches whose name contains needle as a substring
- remote_branches has the signature remote_branches([branch_needle[, remote_needle]]), meaning it can be called with no arguments, or one argument (in which case, it's similar to branches), or two arguments where the first argument matches branch names and the second argument matches remote names (similar to branches, remote_branches(), remote_branches("") and remote_branches("", "") are all equivalent)
Dereferencing `self` as `*self` in order to perform patten-matching
using `ref` is unnecessary and will be done automatically by the
compiler (match ergonomics, introduced in Rust 1.26).
We don't care the ref content as long as it is unique, so using threaded
RNG should be fine.
This change means refs/jj/keep will now contain refs of the following
forms:
- new create_no_gc_ref(): 0f8d6cd9721823906cfb55dac99d7bf5
- old create_no_gc_ref(): 0f6d93fe-0507-4db8-ad0a-6317f02e27b9
- prevent_gc(commit_id): 0f9c15100b6f1373f38186357e274a829fb6c4e2
I don't think Workspace::load() should be permissive in that regard.
WorkspaceLoader could provide such function, but I feel it's more like
CLI business. CLI can also look for parent '.git' directory to suggest
'jj init --git-repo=..' if needed.
Since per-repo config may contain CLI settings, it must be visible to CLI.
Therefore, UserSettings::with_repo() -> RepoSettings isn't used, and its
implementation is nullified by this commit.
#616
It's unclear whether parse_args() or its caller should update LayeredConfigs.
--config-toml is processed by callee to apply --color early. -R/--repository
will be processed by caller since it will instantiate WorkspaceLoader.
Maybe --config-toml can be removed from EarlyArgs, and handle_early_args()
just updates ui state based on --color argument?
This will be needed to test functionality for showing shortest
unique prefix for commit and change ids. As a bonus, this also
allows us to test log output with change ids.
As another bonus, this will prevent occasional CI failures like
https://github.com/martinvonz/jj/actions/runs/3817554687/jobs/6493881468.
I needed this in the course of debugging an error. Before this commit, the error looked like this:
```
Error: Unexpected error from backend: Object not found
```
After this commit, it looks like this:
```
Error: Unexpected error from backend: Object with CommitId 8f59646bc9bb6bb44b5624f1248f4a708f37003c not found: object not found - no match for id (8f59646bc9bb6bb44b5624f1248f4a708f37003c); class=Odb (9); code=NotFound (-3)
```
Strictly speaking, we could rely on e.g. `git2::Oid::from_str` to produce an error, but I figure that having an explicit error for a mismatching hash length might demystify some error condition in the future, since commit IDs and change IDs and potentially other backends' IDs may have different lengths, so this could flag a mismatch earlier/more obviously.
We forgot to actually call `StoreFactories::load_op_heads_store()` to
load the right type of `OpHeadsStore` depending on the contents of
`.jj/repo/op_heads/type`. That shouldn't have any effect yet since we
only have one type so far, and there are no out-of-tree types yet
either (clearly, since they would not work).
A file entry is represented as a Dirs of is_file flag set. This might seem
odd at this point, but allows us to remove special case from PrefixMatcher.
PrefixMatcher::new(&[RepoPath::root()]) will set is_file to the root entry.
When you're done with the `CommitBuilder`, you're going to have to
call `write_to_repo()`, passing it a mutable `MutableRepo`
reference. It's a bit simpler to pass that reference when we create
the `CommitBuilder` instead, so that's what this patch does.
A drawback of passing in the mutable reference when we create the
builder is that we can't have multiple unfinished `CommitBuilder`
instance live at the same time. We don't have any such use cases yet,
and it's not hard to work around them, so I think this change is worth
it.
When we fail to read the user's config, it seems obviously better to
use the default config than to not use it. It doesn't matter yet, but
it will matter when I've moved color configs out of `formatter.rs` and
into a `.toml` file. Without this change, we'd lose the default
coloring of the error message for config errors.
It's unlikely we'll need to customize these impls per type, so let's ensure
that these newtypes have identical implementations. This commit also adds
from_hex() to FileId, SymlinkId, and ConflictId.
Suggested by @arxanas.
Actually, it's easier to support these infix ops than erroring out, but I
don't want to make revset syntax more cryptic. "x- y" can't be handled by
this rule because "x-" is parsed as a parents expression.
The next commit will introduce a newtype for -m/--message argument which
can be converted Into<String>.
Since CommitBuilder is a thin wrapper, code bloat caused by generic parameters
wouldn't matter. I have another set of commits that makes all builder methods
accept Into/IntoIterator, which will remove some of .clone() calls from tests.
Performance on repositories with many commits is limited somewhat by repeatedly
stating the tablestore directory to work out what the head is. By caching the
table rather than looking it up from disk on every request, we can much more
rapidly satisfy requests.
This avoids the pathological case in #845 where jj operations take several
minutes to complete.
This patch doesn't change the normal flow of the write path: that will still
always call get_head() on the underlying TableStore, which will stat the
directory before writing out changes. It will however empty the cache when the
metadata has been written.
Fixes#845.
The implementation has some hoops to jump through because Rust does not allow
`self: &Arc<Self>` on trait methods, and two of the OpHeadsStore functions need
to return cloned selves. This is worked around by making the implementation type
itself a wrapper around Arc<>.
This is not particularly note worthy for the current implementation type where
the only data copied is a PathBuf, but for extensions it is likely to be more
critical that the lifetime management of the OpHeadsStore is properly
maintained.
I ran an upgraded Clippy on the codebase. All the changes seem to be
about using variables directly in format strings instead of passing
them as separate arguments.
While working on ancestor generation, I noticed Mercurial has this
substitution rule. Since it's easier to deal with Ancestors() than Range {},
'roots..heads' is first decomposed to ':heads & ~:roots'.
I failed to solve type puzzle for to_predicate_fn<'a>(&'a self) where
'repo: 'a, so struct RevWalkRevset<'repo, T> is bounded by T to consume
the lifetime parameter.
This will be a building block of 'parents(base)' revset. 'base---' will
be .filter_by_generation(3..4) for example. I think 'ancestors(base)' can
also have an optional generation parameter, but I haven't considered any
particular syntax yet.
Even though I couldn't determine if RevWalkGenerationRange has a measurable
cost compared to RevWalk, I'm not comfortable with enabling generation
tracking by default. So this patch adds a separate struct. I duplicated
Iterator::next() method as it seemed rather complicated to extract a common
iterator wrapper.
Actual filtering function and tests will be added by the next commit.
We're more likely to filter out empty commits, so this should be slightly
faster in practice.
The extra Option<> isn't needed, but it should clarify that "prefix([])"
is not "everything".
This basically transforms 's1 & (f() | s2)' to
's1.iter().filter(all && f || s2)'. Still the predicate part includes "all",
the filter function doesn't need to load commit data for every entry since
's1.iter().filter(all)' is tested first. To optimize "all" predicate out,
maybe we can add a wrapper that returns '|_: &IndexEntry| true'.
Instead of inserting AsFilter(_) node, I could add a recursive is_filter()
function. That would also work so long as the height of RevsetExpression tree
is limited. I chose node insertion just for ease of snapshot testing.