mirror of
https://github.com/martinvonz/jj.git
synced 2024-12-25 21:47:42 +00:00
sparse-v2: design doc proposition for Sparse Patterns refactoring
Kicks off work on issue #1896
parent
28dd0180ad
commit
01825953ac
2 changed files with 309 additions and 0 deletions
308
docs/design/sparse-v2.md
Normal file
|
@ -0,0 +1,308 @@
|
|||
# Sparse Patterns v2 redesign
|
||||
|
||||
Authors: [Daniel Ploch](mailto:dploch@google.com)
|
||||
|
||||
**Summary:** This document describes a redesign of the sparse command and
|
||||
its internal storage format in jj, in order to facilitate several desirable
|
||||
improvements for large repos. It covers both the migration path and the planned
|
||||
end state.
|
||||
|
||||
## Objective
|
||||
|
||||
Redesign Sparse Patterns to accommodate more advanced features for native
|
||||
and custom implementations. This includes three main goals:
|
||||
|
||||
1. Sparse Patterns should be versioned with the working copy
|
||||
1. Sparse Patterns should support more [flexible matching rules](https://github.com/martinvonz/jj/issues/1896)
|
||||
1. Sparse Patterns should support [client path remapping](https://github.com/martinvonz/jj/issues/2288)
|
||||
|
||||
## Current State (as of jj 0.13.0)
|
||||
|
||||
Sparse patterns are an effectively unordered list of prefix strings:
|
||||
|
||||
```txt
|
||||
path/one
|
||||
path/to/dir/two
|
||||
```
|
||||
|
||||
The _set_ of files identified by the Sparse Patterns is all paths which match
|
||||
any provided prefix. This governs what gets materialized in the working copy on
|
||||
checkout, and what is updated on snapshot. The set is stored in working copy
|
||||
state files which are not versioned in the Op Store.
|
||||
|
||||
Because all paths are bare strings with no escaping or higher-level formatting,
|
||||
the current design makes it difficult to add new features like exclusions or
|
||||
path remappings.
|
||||
|
||||
## Proposed State (Sparse Patterns v2)
|
||||
|
||||
Sparse Patterns v2 will be stored as objects in the Op Store, referenced
|
||||
by a `WorkingCopyPatternsId` from the active `View`. They will have a new,
|
||||
ordered structure which can fully represent previous patterns.
|
||||
|
||||
```rust
|
||||
/// Analogues of RepoPath, specifically describing paths in the working copy.
|
||||
struct WorkingCopyPathBuf(String);

struct WorkingCopyPath(str);
|
||||
|
||||
pub enum SparsePatternsPathType {
|
||||
Dir, // Everything under <path>/...
|
||||
Files, // Files under <path>/*
|
||||
Exact, // <path> exactly
|
||||
}
|
||||
|
||||
pub struct SparsePatternsPath {
|
||||
path_type: SparsePatternsPathType,
|
||||
include: bool, // True if included, false if excluded.
|
||||
path: RepoPathBuf,
|
||||
}
|
||||
|
||||
pub struct WorkingCopyMapping {
|
||||
src_path: RepoPathBuf,
|
||||
dst_path: WorkingCopyPathBuf,
|
||||
recursive: bool, // If false, only immediate children of src_path (files) are renamed.
|
||||
}
|
||||
|
||||
pub struct WorkingCopyPatterns {
|
||||
sparse_paths: Vec<SparsePatternsPath>,
|
||||
mappings: Vec<WorkingCopyMapping>,
|
||||
}
|
||||
|
||||
pub trait OpStore {
|
||||
...
|
||||
fn read_working_copy_patterns(&self, id: &WorkingCopyPatternsId) -> OpStoreResult<WorkingCopyPatterns>;
|
||||
fn write_working_copy_patterns(&self, sparse_patterns: &WorkingCopyPatterns) -> OpStoreResult<WorkingCopyPatternsId>;
|
||||
}
|
||||
```
|
||||
|
||||
To support these more complex behaviors, a new `WorkingCopyPatterns` type will
|
||||
be introduced, initially only as a thin wrapper around the existing prefix
|
||||
format, but soon to be expanded with richer types and functionality.
|
||||
|
||||
```rust
|
||||
impl WorkingCopyPatterns {
|
||||
pub fn to_matcher(&self) -> Box<dyn Matcher> {
|
||||
...
|
||||
}
|
||||
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
### Command Syntax
|
||||
|
||||
`SparsePatternsPath` rules can be specified on the CLI and in an editor via a
|
||||
compact syntax:
|
||||
|
||||
```txt
|
||||
(include|exclude):(dir|files|exact):<path>
|
||||
```
|
||||
|
||||
If both prefix terms are omitted, then `include:dir:` is assumed. If any prefix
|
||||
is specified, both must be specified. The editor and CLI will both accept path
|
||||
rules in either format going forward.
|
||||
|
||||
- `jj sparse set --add foo/bar` is equivalent to `jj sparse set --add include:dir:foo/bar`
|
||||
- `jj sparse set --add exclude:dir:foo/bar` adds a new `Dir` type rule with `include = false`
|
||||
- `jj sparse set --exclude foo/bar` may be offered as a shorthand for the above
|
||||
- `jj sparse list` will print the explicit rules
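
The compact syntax above could be parsed roughly as follows. This is a minimal sketch, not the planned implementation: `parse_rule` is a hypothetical helper, paths are plain `String`s rather than `RepoPathBuf`, and the question of how to treat a bare path that is literally named `include` or `exclude` is left open.

```rust
// Hypothetical sketch: type names mirror the design, but `parse_rule` and the
// use of String for paths are assumptions made for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SparsePatternsPathType {
    Dir,   // Everything under <path>/...
    Files, // Files under <path>/*
    Exact, // <path> exactly
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SparsePatternsPath {
    pub path_type: SparsePatternsPathType,
    pub include: bool,
    pub path: String,
}

/// Parses `(include|exclude):(dir|files|exact):<path>`, or a bare `<path>`,
/// which is treated as `include:dir:<path>`.
pub fn parse_rule(text: &str) -> Result<SparsePatternsPath, String> {
    let mut parts = text.splitn(3, ':');
    let include = match parts.next().unwrap_or("") {
        "include" => true,
        "exclude" => false,
        // No recognized prefix: the whole string is taken as a path.
        _ => {
            return Ok(SparsePatternsPath {
                path_type: SparsePatternsPathType::Dir,
                include: true,
                path: text.to_string(),
            })
        }
    };
    // Once one prefix term is present, both must be present.
    let path_type = match parts.next() {
        Some("dir") => SparsePatternsPathType::Dir,
        Some("files") => SparsePatternsPathType::Files,
        Some("exact") => SparsePatternsPathType::Exact,
        other => return Err(format!("expected dir|files|exact, got {:?}", other)),
    };
    let path = parts.next().ok_or_else(|| "missing path".to_string())?;
    Ok(SparsePatternsPath { path_type, include, path: path.to_string() })
}
```

With this sketch, `parse_rule("foo/bar")` and `parse_rule("include:dir:foo/bar")` yield the same rule, while `parse_rule("exclude:foo")` is rejected because only one prefix term is given.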
|
||||
|
||||
Paths will be stored in an ordered, canonical form which unambiguously describes
|
||||
the set of files to be included. Every `--add` command will append to the end of
|
||||
this list before the patterns are canonicalized. Whether a file is included is
|
||||
determined by the first matching rule in reverse order (the last matching rule wins).
|
||||
|
||||
For example:
|
||||
|
||||
```txt
|
||||
include:dir:foo
|
||||
exclude:dir:foo/bar
|
||||
include:dir:foo/bar/baz
|
||||
exclude:dir:foo/bar/baz/qux
|
||||
```
|
||||
|
||||
This produces a rule set which includes "foo/file.txt", excludes "foo/bar/file.txt",
|
||||
includes "foo/bar/baz/file.txt", and excludes "foo/bar/baz/qux/file.txt".
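
The "first matching rule in reverse order" evaluation can be sketched as below. This is an illustrative sketch under stated assumptions: `Rule`, `PathType`, `below`, and `is_included` are hypothetical names, paths are plain `/`-separated strings, a `Dir` rule is assumed to match only paths strictly below it, and a file matched by no rule is assumed to be excluded.

```rust
// Hypothetical sketch of rule evaluation; not the actual Matcher implementation.
pub enum PathType {
    Dir,   // Everything under <path>/...
    Files, // Files under <path>/*
    Exact, // <path> exactly
}

pub struct Rule {
    pub path_type: PathType,
    pub include: bool,
    pub path: &'static str,
}

/// Returns the part of `file` below `dir`, matching on whole path components.
fn below<'a>(file: &'a str, dir: &str) -> Option<&'a str> {
    let rest = if dir.is_empty() {
        file
    } else {
        file.strip_prefix(dir)?.strip_prefix('/')?
    };
    if rest.is_empty() { None } else { Some(rest) }
}

fn rule_matches(rule: &Rule, file: &str) -> bool {
    match rule.path_type {
        PathType::Exact => file == rule.path,
        PathType::Dir => below(file, rule.path).is_some(),
        // Only immediate children: nothing left over may contain a '/'.
        PathType::Files => below(file, rule.path).map_or(false, |rest| !rest.contains('/')),
    }
}

/// Walks the rules from last to first; the first match decides.
pub fn is_included(rules: &[Rule], file: &str) -> bool {
    rules
        .iter()
        .rev()
        .find(|rule| rule_matches(rule, file))
        .map_or(false, |rule| rule.include)
}
```

Applied to the four rules in the example, this sketch includes "foo/file.txt" and "foo/bar/baz/file.txt" and excludes "foo/bar/file.txt" and "foo/bar/baz/qux/file.txt". Because prefixes are stripped on component boundaries, "foobar/x" is not matched by `include:dir:foo`.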
|
||||
|
||||
If the rules are subtly re-ordered, they become canonicalized to a smaller, but
|
||||
functionally equivalent form:
|
||||
|
||||
```txt
|
||||
# Before
|
||||
include:dir:foo
|
||||
exclude:dir:foo/bar/baz/qux
|
||||
include:dir:foo/bar/baz
|
||||
exclude:dir:foo/bar
|
||||
|
||||
# Canonicalized
|
||||
include:dir:foo
|
||||
exclude:dir:foo/bar
|
||||
```
|
||||
|
||||
#### Canonicalization
|
||||
|
||||
There are many ways to represent functionally equivalent `WorkingCopyPatterns`.
|
||||
For instance, the following 4 rule sets are all functionally equivalent:
|
||||
|
||||
```txt
|
||||
# Set 1
|
||||
include:dir:bar
|
||||
include:dir:foo
|
||||
|
||||
# Set 2
|
||||
include:dir:foo
|
||||
include:dir:bar
|
||||
|
||||
# Set 3
|
||||
include:dir:bar
|
||||
include:dir:bar/baz/qux
|
||||
include:dir:foo
|
||||
|
||||
# Set 4
|
||||
include:dir:foo
|
||||
exclude:dir:foo/baz
|
||||
include:dir:bar
|
||||
include:dir:foo/baz
|
||||
```
|
||||
|
||||
Because these patterns are stored in the Op Store now, it is useful for all of
|
||||
these representations to be rewritten into a minimal, canonical form before
|
||||
serialization. In this case, `Set 1` will be the canonical set. The canonical
|
||||
form of a `WorkingCopyPatterns` is defined as the form such that:
|
||||
|
||||
- Every rule affects the functionality (there are no redundant rules)
|
||||
- Rules are sorted lexicographically, but with '/' sorted before all else
|
||||
- This special sorting order is useful for constructing path tries
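
The special sort order can be illustrated with a small comparator. This is a sketch only: `cmp_paths` is a hypothetical helper, and the design does not specify how the ordering is implemented. The point it shows is that sorting '/' first keeps every path directly after its parent directory, which is what makes trie construction straightforward.

```rust
use std::cmp::Ordering;

/// Compares two paths bytewise, except that '/' sorts before every other byte.
/// Hypothetical helper: the design states the ordering, not this function.
pub fn cmp_paths(a: &str, b: &str) -> Ordering {
    // Map '/' to 0 and shift all other bytes up by one so nothing ties with it.
    fn key(byte: u8) -> u16 {
        if byte == b'/' { 0 } else { u16::from(byte) + 1 }
    }
    a.bytes().map(key).cmp(b.bytes().map(key))
}
```

For example, a plain byte-wise sort places "foo-bar" between "foo" and "foo/bar" (because '-' precedes '/' in ASCII), whereas sorting with `cmp_paths` yields "foo", "foo/bar", "foo/bar/baz", "foo-bar".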
|
||||
|
||||
### Working Copy Map
|
||||
|
||||
WARNING: This section is intentionally incomplete; more research is needed.
|
||||
|
||||
All `WorkingCopyPatterns` will come equipped with a default no-op mapping.
|
||||
These mappings are inspired by and similar to [Perforce client views](https://www.perforce.com/manuals/cmdref/Content/CmdRef/views.html).
|
||||
|
||||
```rust
|
||||
vec![WorkingCopyMapping {
|
||||
src_path: RepoPathBuf::root(),
|
||||
dst_path: WorkingCopyPathBuf::root(),
|
||||
recursive: true,
|
||||
}]
|
||||
```
|
||||
|
||||
`WorkingCopyPatterns` will provide an interface to map working copy paths into
|
||||
repo paths and vice versa. The `WorkingCopy` trait will apply this mapping to
|
||||
all snapshot and checkout operations, and jj commands which accept relative
|
||||
paths will need to be updated to perform working copy path -> repo path
|
||||
translations as needed. It's not clear at this time _which_ commands will need
|
||||
changing, as some are more likely to refer to repo paths rather than working
|
||||
copy paths.
|
||||
|
||||
TODO: Expand this section.
|
||||
|
||||
In particular, the path rules for sparse patterns will _always_ be repo paths,
|
||||
not working copy paths. Thus, if a user wants to track "foo" and
|
||||
rename it to "subdir/bar", they must run `jj sparse set --add foo` and
|
||||
`jj map set --from foo --to subdir/bar`. In other words, the mapping operation can
|
||||
be thought of as always _after_ the sparse operation.
|
||||
|
||||
#### Command Syntax
|
||||
|
||||
New commands will enable editing of the `WorkingCopyMapping`s:
|
||||
|
||||
TODO: Maybe this should be `jj workspace map ...`?
|
||||
|
||||
- `jj map list` will print all mapping pairs.
|
||||
- `jj map add --from foo --to bar` will add a new mapping to the end of the list.
|
||||
- `jj map remove --from foo` will remove a specific mapping rule.
|
||||
- `jj map edit` will pull up a text editor for manual editing.
|
||||
|
||||
Like sparse paths, mappings will have a compact text syntax for editing in file
|
||||
form, or for adding a rule textually on the CLI:
|
||||
|
||||
```txt
|
||||
"<from>" -> "<to>" [nonrecursive]
|
||||
```
|
||||
|
||||
Like sparse paths, mapping rules are defined to apply in _order_ and on any
|
||||
save operation will be modified to a minimal canonical form. Thus,
|
||||
`jj map set --from "" --to ""` will always completely wipe the map.
|
||||
The first matching rule in reverse list order determines how a particular
|
||||
repo path should be mapped into the working copy, and likewise how a particular
|
||||
working copy path should be mapped into the repo. For simplicity, the
|
||||
'last rule wins' policy applies to repo->WC conversions as well as WC->repo
|
||||
conversions, using the same ordering.
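
Since this part of the design is still open, the following is only a rough sketch of how 'last rule wins' could apply in both directions. `Mapping`, `repo_to_wc`, and `wc_to_repo` are hypothetical names, paths are plain strings with `""` standing for the root, and the handling of a path that equals `src_path` exactly (a renamed single file) is an assumption.

```rust
// Hypothetical sketch of applying mappings; not a committed design.
pub struct Mapping {
    pub src_path: &'static str, // repo path; "" is the root
    pub dst_path: &'static str, // working copy path; "" is the root
    pub recursive: bool,        // if false, only immediate children are renamed
}

/// Returns the part of `path` inside `dir`, matching on whole components.
fn strip_dir<'a>(path: &'a str, dir: &str) -> Option<&'a str> {
    if dir.is_empty() {
        return Some(path);
    }
    path.strip_prefix(dir)?.strip_prefix('/')
}

fn join(dir: &str, rest: &str) -> String {
    if dir.is_empty() {
        rest.to_string()
    } else {
        format!("{}/{}", dir, rest)
    }
}

/// Translates `path` using the last mapping whose `from` side contains it.
fn translate(
    mappings: &[Mapping],
    path: &str,
    from: fn(&Mapping) -> &'static str,
    to: fn(&Mapping) -> &'static str,
) -> Option<String> {
    mappings.iter().rev().find_map(|m| {
        let rest = strip_dir(path, from(m))?;
        if !m.recursive && rest.contains('/') {
            return None; // deeper than an immediate child: try earlier rules
        }
        Some(join(to(m), rest))
    })
}

pub fn repo_to_wc(mappings: &[Mapping], repo_path: &str) -> Option<String> {
    translate(mappings, repo_path, |m| m.src_path, |m| m.dst_path)
}

pub fn wc_to_repo(mappings: &[Mapping], wc_path: &str) -> Option<String> {
    translate(mappings, wc_path, |m| m.dst_path, |m| m.src_path)
}
```

With the default root mapping followed by a recursive `foo` to `subdir/bar` mapping, this sketch sends the repo path "foo/a.txt" to "subdir/bar/a.txt" and back, while unrelated paths fall through to the root mapping unchanged.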
|
||||
|
||||
If a working copy mapping places the same repo file at two distinct working
|
||||
copy paths, snapshotting will fail unless these files are identical. Some
|
||||
specialized filesystems may even treat these as the 'same' file, allowing this
|
||||
to work in some cases.
|
||||
|
||||
If a working copy mapping places two distinct repo files at the same working
|
||||
copy path, checkout will fail with an error regardless of equivalence.
|
||||
|
||||
### Versioning and Storage
|
||||
|
||||
Updating the active `WorkingCopyPatterns` for a particular working copy will now
|
||||
take place in two separate steps: one transaction which updates the op store,
|
||||
and a separate `LockedWorkingCopy` operation which actually updates the working
|
||||
copy. The working copy proto will no longer store `WorkingCopyPatterns`
|
||||
directly, instead storing only a `WorkingCopyPatternsId`. On mismatch with the
|
||||
current op head, the user will be prompted to run `jj workspace update-stale`.
|
||||
|
||||
This gives the user the ability to update the active `WorkingCopyPatterns`
|
||||
whilst not interacting with the local working copy, which is useful for custom
|
||||
integrations which may not be _able_ to check out particular working copy
|
||||
patterns due to problems with the backend (encoding, permission errors, etc.). A
|
||||
bad `jj sparse set --add oops` command can thus be undone, even via `jj op undo`
|
||||
if desired.
|
||||
|
||||
#### View Updates
|
||||
|
||||
The View object will be migrated to store working copy patterns by ID. The
|
||||
indirection will save on storage since working copy patterns are not expected to
|
||||
change very frequently.
|
||||
|
||||
```rust
|
||||
// Before:
|
||||
pub wc_commit_ids: HashMap<WorkspaceId, CommitId>,
|
||||
|
||||
// After:
|
||||
pub struct WorkingCopyInfo {
|
||||
pub commit_id: CommitId,
|
||||
pub wc_patterns_id: WorkingCopyPatternsId,
|
||||
}
|
||||
...
|
||||
pub wc_info: HashMap<WorkspaceId, WorkingCopyInfo>,
|
||||
```
|
||||
|
||||
A View object with no stored working copy patterns will be modified at read
|
||||
time to include the current working copy patterns, thus all `read_view`
|
||||
operations will need to pass in the current working copy patterns for a
|
||||
migration period of at least 6 months. After that, we may choose to auto-fill
|
||||
missing working copy infos with a default `WorkingCopyPatterns` as needed.
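
The read-time migration could look roughly like the following. This is a sketch under loud assumptions: `StoredWorkingCopyInfo` and `fill_missing_patterns` are hypothetical, and plain `String`s stand in for `WorkspaceId`, `CommitId`, and `WorkingCopyPatternsId`.

```rust
use std::collections::HashMap;

// Simplified stand-in for the `WorkingCopyInfo` struct proposed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkingCopyInfo {
    pub commit_id: String,
    pub wc_patterns_id: String,
}

/// A view entry as read from storage (hypothetical type). The patterns id is
/// absent in views written before the migration.
pub struct StoredWorkingCopyInfo {
    pub commit_id: String,
    pub wc_patterns_id: Option<String>,
}

/// Fills in missing pattern ids with the id of the caller-supplied current
/// working copy patterns, leaving already-migrated entries untouched.
pub fn fill_missing_patterns(
    stored: HashMap<String, StoredWorkingCopyInfo>,
    current_patterns_id: &str,
) -> HashMap<String, WorkingCopyInfo> {
    stored
        .into_iter()
        .map(|(workspace, info)| {
            let wc_patterns_id = info
                .wc_patterns_id
                .unwrap_or_else(|| current_patterns_id.to_string());
            let commit_id = info.commit_id;
            (workspace, WorkingCopyInfo { commit_id, wc_patterns_id })
        })
        .collect()
}
```

After the migration period, the same shape would work with a default `WorkingCopyPatterns` id in place of the caller-supplied one.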
|
||||
|
||||
### Appendix
|
||||
|
||||
#### Related Work
|
||||
|
||||
[Perforce client maps](https://www.perforce.com/manuals/cmdref/Content/CmdRef/views.html)
|
||||
are very similar in concept to the entirety of `WorkingCopyPatterns`, and this
|
||||
design aims to achieve similar functionality.
|
||||
|
||||
The [Josh Project](https://github.com/josh-project/josh) implements partial git
|
||||
clones in a way similar to how sparse patterns try to work.
|
||||
|
||||
#### Patterns via configuration
|
||||
|
||||
There may be some scenarios where it is valuable to configure working copy
|
||||
patterns via a configuration file, rather than through explicit commands.
|
||||
Generally this only makes sense for automated repos, with the configuration
|
||||
coming from outside the repo - there are too many caveats and edge cases if the
|
||||
configuration comes from inside the repo and/or conflicts with a human's manual edits.
|
||||
|
||||
No configuration syntax is planned at this time, but if we add any, we should
|
||||
probably reuse the compact line syntaxes as much as possible for consistency.
|
|
@ -115,6 +115,7 @@ nav:
|
|||
- 'git-submodules': 'design/git-submodules.md'
|
||||
- 'git-submodule-storage': 'design/git-submodule-storage.md'
|
||||
- 'JJ run': 'design/run.md'
|
||||
- 'Sparse Patterns v2': 'design/sparse-v2.md'
|
||||
- 'Tracking branches': 'design/tracking-branches.md'
|
||||
|
||||
|
||||
|
|