Although watchman client appears to fail at decoding non-UTF-8 path (somewhere
in serde), jj shouldn't panic if watchman could deal with that.
The outer error message "path not in the repo" would sounds odd, but I think
that's okay because 1. it's unlikely that a user input is not UTF-8, and 2.
it's technically correct that a non-UTF-8 path is not contained in the repo.
This should address both use cases:
1. If from_relative_path() is directly called, the error says ".." shouldn't
be included in the (normalized) relative path.
2. If parse_fs_path() is used, the error message contains paths relative to
cwd. #3216
I'll replace the current lazy loading mechanism with this. Read-only methods
are implemented on the borrowed type so that we can narrow lookup scope
recursively.
self.contains(other) means that the self tree contains the other tree (i.e.
the self path is prefix of the other), but it could be confused the other way
around if we were thinking about the path literal, not the tree. Let's add
.starts_with() instead by copying the std::path::Path definition.
This enables cheap str-to-RepoPath cast, which is useful when sorting and
filtering a large Vec<(String, _)> list by using matcher for example. It
will also eliminate temporary allocation by repo_path.parent().
I'm going to add borrowed RepoPath type, and most from_internal_string()
callers will be migrated to it. For the remaining callers, it makes more
sense to move the ownership of String to RepoPathBuf.
RepoPath::from_components() is removed since it is no longer a primitive
function.
The components iterator could be implemented on top of str::split(), but
it's not as we'll probably want to add components.as_path() -> &RepoPath.
Tree walking and tree_states map construction get slightly faster thanks to
fewer allocations and/or better cache locality. If we add a borrowed RepoPath
type, we can also implement a cheap &str to &RepoPath conversion on top. Then,
we can get rid of BTreeMap<RepoPath, FileState> construction at all.
Snapshot without watchman:
```
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux status"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux status
Time (mean ± σ): 950.1 ms ± 24.9 ms [User: 1642.4 ms, System: 681.1 ms]
Range (min … max): 913.8 ms … 990.9 ms 10 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux status
Time (mean ± σ): 872.1 ms ± 14.5 ms [User: 1922.3 ms, System: 625.8 ms]
Range (min … max): 853.2 ms … 895.9 ms 10 runs
Relative speed comparison
1.09 ± 0.03 target/release-with-debug/jj-0 -R ~/mirrors/linux status
1.00 target/release-with-debug/jj-1 -R ~/mirrors/linux status
```
Tree walk:
```
% hyperfine --sort command --warmup 3 --runs 10 -L bin jj-0,jj-1 \
"target/release-with-debug/{bin} -R ~/mirrors/linux files --ignore-working-copy"
Benchmark 1: target/release-with-debug/jj-0 -R ~/mirrors/linux files --ignore-working-copy
Time (mean ± σ): 375.3 ms ± 15.4 ms [User: 223.3 ms, System: 151.8 ms]
Range (min … max): 359.4 ms … 394.1 ms 10 runs
Benchmark 2: target/release-with-debug/jj-1 -R ~/mirrors/linux files --ignore-working-copy
Time (mean ± σ): 357.1 ms ± 16.2 ms [User: 214.7 ms, System: 142.6 ms]
Range (min … max): 341.6 ms … 378.9 ms 10 runs
Relative speed comparison
1.05 ± 0.06 target/release-with-debug/jj-0 -R ~/mirrors/linux files --ignore-working-copy
1.00 target/release-with-debug/jj-1 -R ~/mirrors/linux files --ignore-working-copy
```
The added test would fail if paths were purely ordered by concatenated strings.
I'm not sure if we want to preserve the current ordering, but let's not break
it for the moment.
This is a step towards introducing a borrowed RepoPath type. The current
RepoPath type is inefficient as each component String is usually short. We
could apply short-string optimization, but still each inlined component would
consume 24 bytes just for e.g. "src", and increase the chance of random memory
access. If the owned RepoPath type is backed by String, we can implement cheap
cast from &str to borrowed &RepoPath type.
Leading/trailing slashes would introduce a bit of complexity if we migrate
the backing type from Vec<String> to String. Empty components are okay, but
let's reject them as they are cryptic and invalid.
I've noticed `WorkspaceCommandHelper::format_file_path()` appear in
profiles a few times. A big part of that is spent in
`RepoPath::to_fs_path()`. I think I had been thinking that
`PathBuf::join()` takes `self` by value and mutates it, but it turns
out it creates a new instance. So our `result = result.join(...)` in a
loop was copying the `PathBuf` over and over. This fixes that and also
reserves the expected size. That speeds up `jj files
--ignore-working-copy -r v6.0` in the Linux repo from 546.0 s to 509.3
s (6.7%).
While playing with perf.data captured with "jj log", I noticed RepoPath::join()
has measurable cost. The first half is small alloc()s for Vec and Strings, and
the latter is realloc() on Vec::push(). Removing realloc() is easy, so let's
do that.
Let's acknowledge everyone's contributions by replacing "Google LLC"
in the copyright header by "The Jujutsu Authors". If I understand
correctly, it won't have any legal effect, but maybe it still helps
reduce concerns from contributors (though I haven't heard any
concerns).
Google employees can read about Google's policy at
go/releasing/contributions#copyright.
The `testutils` module should ideally not be part of the library
dependencies. Since they're used by the integration tests (and the CLI
tests), we need to move them to a separate crate to achieve that.
It seems a bit invasive that RepoPath constructor processes an environment
like cwd, but we need an unmodified input string to build a readable error.
The error could be rewrapped at cli boundary, but I don't think it would
worth inserting indirection just for that.
I made s/file_path/fs_path/ change because there's already to_fs_path()
function, and "file path" in RepoPath context may be ambiguous.
It's useful to be able to match path prefixes for many commands,
e.g. to allow `jj restore src` to restore all files in under `src/`
(or a file called `src`). I also plan to use it for sparse checkouts.
We'll need to be able to match path prefixes
I had initially hoped that the type-safety provided by the separate
`FileRepoPath` and `DirRepoPath` types would help prevent bugs. I'm
not sure if it has prevented any bugs so far. It has turned out that
there are more cases than I had hoped where it's unknown whether a
path is for a directory or a file. One such example is for the path of
a conflict. Since it can be conflict between a directory and a file,
it doesn't make sense to use either. Instead we end up with quite a
bit of conversion between the types. I feel like they are not worth
the extra complexity. This patch therefore starts simplifying it by
replacing uses of `FileRepoPath` by `RepoPath`. `DirRepoPath` is a
little more complicated because its string form ends with a '/'. I'll
address that in separate patches.