353: start documenting plumbing r=nikomatsakis a=nikomatsakis

Feedback desired! I am trying to document an overview of the new salsa 2022 plumbing. I'd love for folks to [read these docs and tell me if they make sense](https://deploy-preview-353--salsa-rs.netlify.app/plumbing.html).

Co-authored-by: Niko Matsakis <niko@alum.mit.edu>
bors[bot] 2022-08-18 23:43:44 +00:00 committed by GitHub
commit 5f3e0ec6f5
11 changed files with 365 additions and 15 deletions

View file

@ -29,14 +29,11 @@
- [How Salsa works](./how_salsa_works.md)
- [Videos](./videos.md)
- [Plumbing](./plumbing.md)
- [Generated code](./plumbing/generated_code.md)
- [Diagram](./plumbing/diagram.md)
- [Query groups](./plumbing/query_groups.md)
- [Database](./plumbing/database.md)
- [The `salsa` crate](./plumbing/salsa_crate.md)
- [Query operations](./plumbing/query_ops.md)
- [maybe changed after](./plumbing/maybe_changed_after.md)
- [Fetch](./plumbing/fetch.md)
- [Jars and ingredients](./plumbing/jars_and_ingredients.md)
- [Databases and runtime](./plumbing/database_and_runtime.md)
- [Query operations](./plumbing/query_ops.md)
- [maybe changed after](./plumbing/maybe_changed_after.md)
- [Fetch](./plumbing/fetch.md)
- [Derived queries flowchart](./plumbing/derived_flowchart.md)
- [Cycle handling](./plumbing/cycles.md)
- [Terminology](./plumbing/terminology.md)
@ -46,11 +43,14 @@
- [Derived query](./plumbing/terminology/derived_query.md)
- [Durability](./plumbing/terminology/durability.md)
- [Input query](./plumbing/terminology/input_query.md)
- [Ingredient](./plumbing/terminology/ingredient.md)
- [LRU](./plumbing/terminology/LRU.md)
- [Memo](./plumbing/terminology/memo.md)
- [Query](./plumbing/terminology/query.md)
- [Query function](./plumbing/terminology/query_function.md)
- [Revision](./plumbing/terminology/revision.md)
- [Salsa item](./plumbing/terminology/salsa_item.md)
- [Salsa struct](./plumbing/terminology/salsa_struct.md)
- [Untracked dependency](./plumbing/terminology/untracked.md)
- [Verified](./plumbing/terminology/verified.md)

View file

@ -1,9 +1,19 @@
# Plumbing
{{#include caveat.md}}
This chapter documents the code that salsa generates and its "inner workings".
We refer to this as the "plumbing".
## History
## Overview
* 2020-07-05: Updated to take [RFC 6](rfcs/RFC0006-Dynamic-Databases.md) into account.
* 2020-06-24: Initial version.
The plumbing section is broken up into chapters:
* The [jars and ingredients](./plumbing/jars_and_ingredients.md) chapter covers how each salsa item (like a tracked function) specifies what data it needs at runtime, and how links between items work.
* The [database and runtime](./plumbing/database_and_runtime.md) chapter covers the data structures that are used at runtime to coordinate workers, trigger cancellation, track which functions are active and what dependencies they have accrued, and so forth.
* The [query operations](./plumbing/query_ops.md) chapter describes how the major operations on function ingredients work. This text was written for an older version of salsa but the logic is the same:
  * The [maybe changed after](./plumbing/maybe_changed_after.md) operation determines when a memoized value for a tracked function is out of date.
  * The [fetch](./plumbing/fetch.md) operation computes the most recent value.
  * The [derived queries flowchart](./plumbing/derived_flowchart.md) depicts the logic in flowchart form.
* The [cycle handling](./plumbing/cycles.md) chapter describes what happens when cycles occur.
* The [terminology](./plumbing/terminology.md) section describes various words that appear throughout.

View file

@ -0,0 +1,72 @@
# Database and runtime
A salsa database struct is declared by the user with the `#[salsa::db]` annotation.
It contains all the data that the program needs to execute:
```rust,ignore
#[salsa::db(jar0...jarn)]
struct MyDatabase {
    storage: Storage<Self>,
    maybe_other_fields: u32,
}
```
This data is divided into two categories:
* Salsa-governed storage, contained in the `Storage<Self>` field. This data is mandatory.
* Other fields (like `maybe_other_fields`) defined by the user. These can be anything; for example, they can give your code access to special resources.
## Parallel handles
When used across parallel threads, the database type defined by the user must support a "snapshot" operation.
This snapshot should create a clone of the database that can be used by the parallel threads.
The `Storage` struct itself supports `snapshot`.
The `snapshot` method returns a `Snapshot<DB>` type, which prevents these clones from being accessed via an `&mut` reference.
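
As a rough illustration of how a wrapper can rule out `&mut` access, here is a minimal sketch. It is not salsa's actual `Snapshot` type: `ToyDatabase` and its `revision` field are hypothetical, and the only point is that implementing `Deref` without `DerefMut` leaves callers with shared access only.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for a user database; not salsa's real type.
#[derive(Clone, Default)]
pub struct ToyDatabase {
    pub revision: u32,
}

/// Wrapper that hands out only shared access to the database it owns.
pub struct Snapshot<DB> {
    db: DB,
}

impl<DB> Snapshot<DB> {
    pub fn new(db: DB) -> Self {
        Snapshot { db }
    }
}

// `Deref` is implemented but `DerefMut` deliberately is not, so callers can
// reach `&DB` through the wrapper but never `&mut DB`.
impl<DB> Deref for Snapshot<DB> {
    type Target = DB;

    fn deref(&self) -> &DB {
        &self.db
    }
}

impl ToyDatabase {
    /// Clone the handle and wrap the clone so that it is read-only.
    pub fn snapshot(&self) -> Snapshot<ToyDatabase> {
        Snapshot::new(self.clone())
    }
}
```

With this shape, `db.snapshot().revision` compiles, while any attempt to call a `&mut self` method through the snapshot is rejected by the compiler.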
## The Storage struct
The salsa `Storage` struct contains all the data that salsa itself will use and work with.
There are three key bits of data:
* The `Shared` struct, which contains the data stored across all snapshots. This is primarily the ingredients described in the [jars and ingredients chapter](./jars_and_ingredients.md), but it also contains some synchronization information (a cond var). This is used for cancellation, as described below.
  * The data in the `Shared` struct is only shared across threads when other threads are active. Some operations, like mutating an input, require an `&mut` handle to the `Shared` struct. This is obtained by using the `Arc::get_mut` method, which only succeeds once all snapshots and threads have ceased executing, since there must be a single handle to the `Arc`.
* The `Routes` struct, which contains the information to find any particular ingredient -- this is also shared across all handles, and its construction is also described in the [jars and ingredients chapter](./jars_and_ingredients.md). The routes are separated out from the `Shared` struct because they are truly immutable at all times, and we want to be able to hold a handle to them while getting `&mut` access to the `Shared` struct.
* The `Runtime` struct, which is specific to a particular database instance. It contains the data for a single active thread, along with some links to shared data of its own.
## Incrementing the revision counter and getting mutable access to the jars
Salsa's general model is that there is a single "master" copy of the database and, potentially, multiple snapshots.
The snapshots are not directly owned; they are instead enclosed in a `Snapshot<DB>` type that permits only `&`-deref,
and so the only database that can be accessed with an `&mut`-ref is the master database.
Each of the snapshots, however, holds another handle on the `Arc` in `Storage` that stores the ingredients.
Whenever the user attempts to do an `&mut`-operation, such as modifying an input field, we need to
first cancel any parallel snapshots and wait for those parallel threads to finish.
Once the snapshots have completed, we can use `Arc::get_mut` to get an `&mut` reference to the ingredient data.
This allows us to get `&mut` access without any unsafe code and
guarantees that we have successfully managed to cancel the other worker threads
(or gotten ourselves into a deadlock).
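
The behavior of `Arc::get_mut` that this relies on can be seen in isolation. The sketch below uses only the standard library; the `try_bump` helper and the `Vec<u32>` payload are hypothetical and stand in for the ingredient data.

```rust
use std::sync::Arc;

/// Pushes a value if `handle` is the only handle to its allocation.
/// Returns whether the mutation happened.
pub fn try_bump(handle: &mut Arc<Vec<u32>>) -> bool {
    match Arc::get_mut(handle) {
        // No other `Arc` (or `Weak`) points at the data: safe `&mut` access.
        Some(data) => {
            data.push(1);
            true
        }
        // Some other handle, such as a snapshot, is still alive.
        None => false,
    }
}
```

While a second handle exists, `try_bump` returns `false`; once that handle is dropped, the same call succeeds. This is why waiting for the snapshots to go away is enough to make the later `unwrap` safe.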
The code to acquire `&mut` access to the database is the `jars_mut` method:
```rust
{{#include ../../../components/salsa-2022/src/storage.rs:jars_mut}}
```
The key initial point is that it invokes `cancel_other_workers` before proceeding:
```rust
{{#include ../../../components/salsa-2022/src/storage.rs:cancel_other_workers}}
```
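
To make the "set a flag, then wait for the other handles to be dropped" pattern concrete, here is a self-contained sketch built on the standard library rather than `parking_lot`. Every name in it (`Handle`, `Shared`, `signal`, `is_cancelled`) is hypothetical, and it differs from the included code in at least one respect: it waits with a timeout so that the toy cannot hang if a notification arrives between the check and the wait.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical stand-in for the shared state (the real one also holds the jars).
pub struct Shared {
    cancelled: AtomicBool,
}

/// Hypothetical database handle, used for the main handle and for snapshots.
pub struct Handle {
    // `Option` so that `Drop` can release the `Arc` before notifying.
    shared: Option<Arc<Shared>>,
    // One mutex/condvar pair shared by every handle.
    signal: Arc<(Mutex<()>, Condvar)>,
}

impl Handle {
    pub fn new() -> Self {
        Handle {
            shared: Some(Arc::new(Shared { cancelled: AtomicBool::new(false) })),
            signal: Arc::new((Mutex::new(()), Condvar::new())),
        }
    }

    pub fn snapshot(&self) -> Handle {
        Handle { shared: self.shared.clone(), signal: Arc::clone(&self.signal) }
    }

    /// Workers are expected to poll this and stop when it returns true.
    pub fn is_cancelled(&self) -> bool {
        self.shared.as_ref().map_or(false, |s| s.cancelled.load(Ordering::SeqCst))
    }

    /// Sets the cancellation flag and blocks until this is the only handle left.
    pub fn cancel_other_workers(&mut self) {
        let shared = self.shared.as_ref().expect("handle still owns the shared state");
        shared.cancelled.store(true, Ordering::SeqCst);
        let (mutex, cvar) = &*self.signal;
        let mut guard = mutex.lock().unwrap();
        while Arc::strong_count(shared) > 1 {
            // Timeout: re-check the count even if a wakeup was missed.
            guard = cvar.wait_timeout(guard, Duration::from_millis(10)).unwrap().0;
        }
        drop(guard);
        shared.cancelled.store(false, Ordering::SeqCst);
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Release our reference first, then wake anyone waiting on the count.
        drop(self.shared.take());
        self.signal.1.notify_all();
    }
}
```

A worker thread that owns a snapshot and polls `is_cancelled` will exit and drop its handle shortly after `cancel_other_workers` is called, at which point the loop in the main handle observes a strong count of one and returns.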
## The Salsa runtime
The salsa runtime offers helper methods that are accessed by the ingredients.
It tracks, for example, the active query stack, and contains methods for adding dependencies between queries (e.g., `report_tracked_read`) or [resolving cycles](./cycles.md).
It also tracks the current revision and information about when values with low or high durability last changed.
Basically, the ingredient structures store the "data at rest" -- like memoized values -- and things that are "per ingredient".
The runtime stores the "active, in-progress" data, such as which queries are on the stack, and/or the dependencies accessed by the currently active query.
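
The split between "data at rest" and "in-progress" data can be pictured with a toy runtime. This is only a sketch under simplifying assumptions: `ToyRuntime`, `QueryId`, `push_query`, and `pop_query` are hypothetical names, and the `report_tracked_read` here is loosely modeled on the method mentioned above rather than copied from salsa.

```rust
/// Hypothetical identifier for a query; salsa uses richer index types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct QueryId(pub u32);

/// One frame per query that is currently executing.
struct ActiveQuery {
    id: QueryId,
    dependencies: Vec<QueryId>,
}

/// Toy runtime that only tracks the stack of active queries.
#[derive(Default)]
pub struct ToyRuntime {
    stack: Vec<ActiveQuery>,
}

impl ToyRuntime {
    /// Called when a query starts executing.
    pub fn push_query(&mut self, id: QueryId) {
        self.stack.push(ActiveQuery { id, dependencies: Vec::new() });
    }

    /// Records that the query on top of the stack read `input`.
    pub fn report_tracked_read(&mut self, input: QueryId) {
        if let Some(top) = self.stack.last_mut() {
            top.dependencies.push(input);
        }
    }

    /// Called when a query finishes; returns the dependencies it accrued,
    /// which an ingredient could then store alongside the memoized value.
    pub fn pop_query(&mut self) -> Option<(QueryId, Vec<QueryId>)> {
        self.stack.pop().map(|q| (q.id, q.dependencies))
    }
}
```

Reads are always attributed to the innermost active query, so a nested query's inputs do not leak into its caller; the caller records only a dependency on the nested query itself.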

View file

@ -0,0 +1,217 @@
# Jars and ingredients
{{#include ../caveat.md}}
This page covers how data is organized in salsa and how links between salsa items (e.g., dependency tracking) work.
## Salsa items and ingredients
A **salsa item** is some item annotated with a salsa annotation that can be included in a jar.
For example, a tracked function is a salsa item:
```rust
#[salsa::tracked]
fn foo(db: &dyn Db, input: MyInput) { }
```
...and so is a salsa input...
```rust
#[salsa::input]
struct MyInput { }
```
...or a tracked struct:
```rust
#[salsa::tracked]
struct MyStruct { }
```
Each salsa item needs certain bits of data at runtime to operate.
These bits of data are called **ingredients**.
Most salsa items generate a single ingredient, but sometimes they make more than one.
For example, a tracked function generates a [`FunctionIngredient`].
A tracked struct, however, generates several ingredients: one for the struct itself (a [`TrackedStructIngredient`])
and one [`FunctionIngredient`] for each value field.
[`FunctionIngredient`]: https://github.com/salsa-rs/salsa/blob/becaade31e6ebc58cd0505fc1ee4b8df1f39f7de/components/salsa-2022/src/function.rs#L42
[`TrackedStructIngredient`]: https://github.com/salsa-rs/salsa/blob/becaade31e6ebc58cd0505fc1ee4b8df1f39f7de/components/salsa-2022/src/tracked_struct.rs#L18
### Ingredients define the core logic of salsa
Most of the interesting salsa code lives in these ingredients.
For example, when you create a new tracked struct, the method [`TrackedStruct::new_struct`] is invoked;
it is responsible for determining the tracked struct's id.
Similarly, when you call a tracked function, that is translated into a call to [`TrackedFunction::fetch`],
which decides whether there is a valid memoized value to return,
or whether the function must be executed.
[`TrackedStruct::new_struct`]: https://github.com/salsa-rs/salsa/blob/becaade31e6ebc58cd0505fc1ee4b8df1f39f7de/components/salsa-2022/src/tracked_struct.rs#L76
[`TrackedFunction::fetch`]: https://github.com/salsa-rs/salsa/blob/becaade31e6ebc58cd0505fc1ee4b8df1f39f7de/components/salsa-2022/src/function/fetch.rs#L15
### Ingredient interfaces are not stable or subject to semver
Interfaces are not meant to be directly used by salsa users.
The salsa macros generate code that invokes the ingredients.
The APIs may change in arbitrary ways across salsa versions,
as the macros are kept in sync.
### The `Ingredient` trait
Each ingredient implements the [`Ingredient<DB>`] trait, which defines generic operations supported by any kind of ingredient.
For example, the method [`maybe_changed_after`] can be used to check whether some particular piece of data stored in the ingredient may have changed since a given revision.
[`Ingredient<DB>`]: https://github.com/salsa-rs/salsa/blob/becaade31e6ebc58cd0505fc1ee4b8df1f39f7de/components/salsa-2022/src/ingredient.rs#L15
[`maybe_changed_after`]: https://github.com/salsa-rs/salsa/blob/becaade31e6ebc58cd0505fc1ee4b8df1f39f7de/components/salsa-2022/src/ingredient.rs#L21-L22
We'll see below that each database `DB` is able to take an `IngredientIndex` and use that to get a `&dyn Ingredient<DB>` for the corresponding ingredient.
This allows the database to perform generic operations on a numbered ingredient without knowing exactly what the type of that ingredient is.
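
A much-reduced sketch of the idea follows. It is not the real trait: the actual `Ingredient<DB>` is generic over the database and its methods take different arguments. Here `Revision`, the `key: usize` parameter, and `InputIngredient::changed_at` are all assumptions made for illustration.

```rust
/// Hypothetical revision counter; larger means more recent.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Revision(pub u32);

/// Simplified stand-in for the generic operations every ingredient supports.
pub trait Ingredient {
    /// Might the value stored under `key` have changed after `revision`?
    fn maybe_changed_after(&self, key: usize, revision: Revision) -> bool;
}

/// Toy input ingredient that remembers when each value was last written.
pub struct InputIngredient {
    pub changed_at: Vec<Revision>,
}

impl Ingredient for InputIngredient {
    fn maybe_changed_after(&self, key: usize, revision: Revision) -> bool {
        // Changed if the last write happened in a later revision.
        self.changed_at[key] > revision
    }
}
```

Because callers hold only a `&dyn Ingredient`, they can ask this question without knowing whether the ingredient behind it is an input, a tracked function, or something else.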
### Jars are a collection of ingredients
When you declare a salsa jar, you list out each of the salsa items that are included in that jar:
```rust,ignore
#[salsa::jar]
struct Jar(
    foo,
    MyInput,
    MyStruct
);
```
This expands to a struct like so:
```rust
struct Jar(
    <foo as IngredientsFor>::Ingredient,
    <MyInput as IngredientsFor>::Ingredient,
    <MyStruct as IngredientsFor>::Ingredient,
);
```
The `IngredientsFor` trait is used to define the ingredients needed by some salsa item, such as the tracked function `foo`
or the input `MyInput`.
Each salsa item defines a type `I`, so that `<I as IngredientsFor>::Ingredient` gives the ingredients needed by `I`.
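
The mapping from item to ingredient type can be sketched with an associated type. This is a simplified illustration: the `create_ingredient` method, the `debug_name` fields, and the `create_jar` function are hypothetical, and the real trait carries more information than shown here.

```rust
/// Simplified stand-in: maps a salsa item to the ingredient type it needs.
pub trait IngredientsFor {
    type Ingredient;

    /// Hypothetical constructor, added only so the sketch can be exercised.
    fn create_ingredient() -> Self::Ingredient;
}

/// Marker types standing in for the salsa items `foo` and `MyInput`.
#[allow(non_camel_case_types)]
pub struct foo;
pub struct MyInput;

pub struct FunctionIngredient {
    pub debug_name: &'static str,
}

pub struct InputIngredient {
    pub debug_name: &'static str,
}

impl IngredientsFor for foo {
    type Ingredient = FunctionIngredient;

    fn create_ingredient() -> FunctionIngredient {
        FunctionIngredient { debug_name: "foo" }
    }
}

impl IngredientsFor for MyInput {
    type Ingredient = InputIngredient;

    fn create_ingredient() -> InputIngredient {
        InputIngredient { debug_name: "MyInput" }
    }
}

/// The jar stores one ingredient per item, named only through the trait.
pub struct Jar(
    pub <foo as IngredientsFor>::Ingredient,
    pub <MyInput as IngredientsFor>::Ingredient,
);

pub fn create_jar() -> Jar {
    Jar(foo::create_ingredient(), MyInput::create_ingredient())
}
```

The jar definition never names `FunctionIngredient` or `InputIngredient` directly, which is what lets a macro generate it from nothing more than the list of item names.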
### Database is a tuple of jars
Salsa's database storage ultimately boils down to a tuple of jar structs,
where each jar struct (as we just saw) itself contains the ingredients
for the salsa items within that jar.
The database can thus be thought of as a list of ingredients,
although that list is organized into a 2-level hierarchy.
The reason for this 2-level hierarchy is that it permits separate compilation and privacy.
The crate that lists the jars doesn't have to know the contents of the jar to embed the jar struct in the database.
And some of the types that appear in the jar may be private to another crate.
### The HasJars trait and the Jars type
Each salsa database implements the `HasJars` trait,
generated by the `salsa::db` procedural macro.
The `HasJars` trait, among other things, defines a `Jars` associated type that maps to a tuple of the jars in the database.
For example, given a database like this...
```rust,ignore
#[salsa::db(Jar1, ..., JarN)]
struct MyDatabase {
    storage: salsa::Storage<Self>
}
```
...the `salsa::db` macro would generate a `HasJars` impl that (among other things) contains `type Jars = (Jar1, ..., JarN)`:
```rust,ignore
{{#include ../../../components/salsa-2022-macros/src/db.rs:HasJars}}
```
In turn, the `salsa::Storage<DB>` type ultimately contains a struct `Shared` that embeds `DB::Jars`, thus embedding all the data for each jar.
### Ingredient indices
During initialization, each ingredient in the database is assigned a unique index called the [`IngredientIndex`].
This is a 32-bit number that identifies a particular ingredient from a particular jar.
[`IngredientIndex`]: https://github.com/salsa-rs/salsa/blob/becaade31e6ebc58cd0505fc1ee4b8df1f39f7de/components/salsa-2022/src/routes.rs#L5-L9
### Routes
In addition to an index, each ingredient in the database also has a corresponding *route*.
A route is a closure that, given a reference to the `DB::Jars` tuple,
returns a `&dyn Ingredient<DB>` reference.
The route table allows us to go from the `IngredientIndex` for a particular ingredient
to its `&dyn Ingredient<DB>` trait object.
The route table is created while the database is being initialized,
as described shortly.
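
A route table of this kind might look roughly like the sketch below. It makes several simplifying assumptions: plain functions stand in for the closures, the `Ingredient` trait has a single hypothetical `debug_name` method, and `Jar1`, `Jar2`, and the `Routes` methods are illustrative rather than salsa's real definitions.

```rust
/// Minimal stand-in for the `Ingredient<DB>` trait.
pub trait Ingredient {
    fn debug_name(&self) -> &'static str;
}

pub struct FunctionIngredient;
pub struct InputIngredient;

impl Ingredient for FunctionIngredient {
    fn debug_name(&self) -> &'static str {
        "function"
    }
}

impl Ingredient for InputIngredient {
    fn debug_name(&self) -> &'static str {
        "input"
    }
}

/// Stand-in for `DB::Jars`: a tuple of jars, each holding its ingredients.
pub struct Jar1 {
    pub function: FunctionIngredient,
}
pub struct Jar2 {
    pub input: InputIngredient,
}
pub type Jars = (Jar1, Jar2);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct IngredientIndex(pub u32);

/// A route projects from the jars tuple to one ingredient's trait object.
pub type Route = fn(&Jars) -> &dyn Ingredient;

#[derive(Default)]
pub struct Routes {
    routes: Vec<Route>,
}

impl Routes {
    /// Registers a route and returns the index assigned to it.
    pub fn push(&mut self, route: Route) -> IngredientIndex {
        let index = IngredientIndex(self.routes.len() as u32);
        self.routes.push(route);
        index
    }

    /// Looks up the route registered under `index`.
    pub fn route(&self, index: IngredientIndex) -> Route {
        self.routes[index.0 as usize]
    }
}

pub fn route_to_function(jars: &Jars) -> &dyn Ingredient {
    &jars.0.function
}

pub fn route_to_input(jars: &Jars) -> &dyn Ingredient {
    &jars.1.input
}
```

Given an index, `routes.route(index)(&jars)` yields the trait object, so code that holds only the number can still reach the right ingredient inside the right jar.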
### Database keys and dependency keys
A `DatabaseKeyIndex` identifies a specific value stored in some specific ingredient.
It combines an [`IngredientIndex`] with a `key_index`, which is a `salsa::Id`:
```rust,ignore
{{#include ../../../components/salsa-2022/src/key.rs:DatabaseKeyIndex}}
```
A `DependencyIndex` is similar, but the `key_index` is optional.
This is used when we wish to refer to the ingredient as a whole, and not any specific value within the ingredient.
These kinds of indices are used to store connections between ingredients.
For example, each memoized value has to track its inputs.
Those inputs are stored as dependency indices.
We can then do things like ask, "did this input change since revision R?" by
* using the ingredient index to find the route and get a `&dyn Ingredient<DB>`
* and then invoking the `maybe_changed_after` method on that trait object.
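
The relationship between the two index types can be sketched as follows. The field names follow the text above, but the `whole_ingredient` constructor, the `From` impl, and the `Memo` struct are hypothetical and exist only to show how a memoized value might record its inputs.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct IngredientIndex(pub u32);

/// Stand-in for `salsa::Id`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Id(pub u32);

/// Identifies one specific value inside one specific ingredient.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DatabaseKeyIndex {
    pub ingredient_index: IngredientIndex,
    pub key_index: Id,
}

/// Like `DatabaseKeyIndex`, but may refer to the ingredient as a whole.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DependencyIndex {
    pub ingredient_index: IngredientIndex,
    pub key_index: Option<Id>,
}

impl DependencyIndex {
    /// Hypothetical constructor for a dependency on an entire ingredient.
    pub fn whole_ingredient(ingredient_index: IngredientIndex) -> Self {
        DependencyIndex { ingredient_index, key_index: None }
    }
}

impl From<DatabaseKeyIndex> for DependencyIndex {
    fn from(key: DatabaseKeyIndex) -> Self {
        DependencyIndex {
            ingredient_index: key.ingredient_index,
            key_index: Some(key.key_index),
        }
    }
}

/// Hypothetical memo: a cached value plus the inputs it was computed from.
pub struct Memo {
    pub value: String,
    pub inputs: Vec<DependencyIndex>,
}
```

Every `DatabaseKeyIndex` converts losslessly into a `DependencyIndex`, but not the other way around, since a dependency on a whole ingredient has no key.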
### HasJarsDyn
There is one catch in the above setup.
We need the database to be dyn-safe, and we also need to be able to define the database trait and so forth without knowing the final database type to enable separate compilation.
Traits like `Ingredient<DB>` require knowing the full `DB` type.
If we had one function ingredient directly invoke a method on `Ingredient<DB>`, that would imply that it has to be fully generic and only instantiated at the final crate, when the full database type is available.
We solve this via the `HasJarsDyn` trait. The `HasJarsDyn` trait exports methods that combine the "find ingredient, invoke method" steps into one method:
```rust,ignore
{{#include ../../../components/salsa-2022/src/storage.rs:HasJarsDyn}}
```
So, technically, to check if an input has changed, an ingredient:
* Invokes `HasJarsDyn::maybe_changed_after` on the `dyn Database`
* The impl for this method (generated by `#[salsa::db]`):
  * gets the route for the ingredient from the ingredient index
  * uses the route to get a `&dyn Ingredient`
  * invokes `maybe_changed_after` on that ingredient
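
The steps above can be sketched with a dyn-safe facade trait. This is an illustration under simplifying assumptions, not the generated code: a `Vec` of boxed ingredients stands in for the route table, the method signatures are invented, and `input_changed` is a hypothetical caller that knows nothing about the concrete database type.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct IngredientIndex(pub usize);

/// Generic over the concrete database type, like `Ingredient<DB>`.
pub trait Ingredient<DB> {
    fn maybe_changed_after(&self, db: &DB, key: u32, revision: u32) -> bool;
}

/// Dyn-safe facade: its signature never mentions the concrete database type.
pub trait HasJarsDyn {
    fn maybe_changed_after(&self, ingredient: IngredientIndex, key: u32, revision: u32) -> bool;
}

/// Toy ingredient that records the revision in which each key last changed.
pub struct InputIngredient {
    pub changed_at: Vec<u32>,
}

impl<DB> Ingredient<DB> for InputIngredient {
    fn maybe_changed_after(&self, _db: &DB, key: u32, revision: u32) -> bool {
        self.changed_at[key as usize] > revision
    }
}

/// The final database type, known only in the last crate.
pub struct MyDatabase {
    pub ingredients: Vec<Box<dyn Ingredient<MyDatabase>>>,
}

impl HasJarsDyn for MyDatabase {
    fn maybe_changed_after(&self, ingredient: IngredientIndex, key: u32, revision: u32) -> bool {
        // "Find ingredient" (a stand-in for following the route)...
        let found: &dyn Ingredient<MyDatabase> = &*self.ingredients[ingredient.0];
        // ...then "invoke method", all behind one dyn-safe call.
        found.maybe_changed_after(self, key, revision)
    }
}

/// Code like this can be compiled without knowing the final database type.
pub fn input_changed(db: &dyn HasJarsDyn, ingredient: IngredientIndex, key: u32, revision: u32) -> bool {
    db.maybe_changed_after(ingredient, key, revision)
}
```

Only the `impl HasJarsDyn for MyDatabase` block needs the full database type, which matches the observation that this impl is produced by the macro in the final crate.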
### Initializing the database
The last thing to discuss is how the database is initialized.
The `Default` implementation for `Storage<DB>` does the work:
```rust,ignore
{{#include ../../../components/salsa-2022/src/storage.rs:default}}
```
First, it creates an empty `Routes` instance.
Then it invokes the `DB::create_jars` method.
The implementation of this method is defined by the `#[salsa::db]` macro; it simply invokes the `Jar::create_jar` method on each of the jars:
```rust,ignore
{{#include ../../../components/salsa-2022-macros/src/db.rs:create_jars}}
```
This implementation for `create_jar` is generated by the `#[salsa::jar]` macro, and simply walks over the representative type for each salsa item and asks *it* to create its ingredients:
```rust,ignore
{{#include ../../../components/salsa-2022-macros/src/jar.rs:create_jar}}
```
The code to create the ingredients for any particular item is generated by their associated macros (e.g., `#[salsa::tracked]`, `#[salsa::input]`), but it always follows a particular structure.
To create an ingredient, we first invoke `Routes::push` which creates the routes to that ingredient and assigns it an `IngredientIndex`.
We can then invoke (e.g.) `FunctionIngredient::new` to create the structure.
The *routes* to an ingredient are defined as closures that, given the `DB::Jars`, can find the data for a particular ingredient.
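
The ordering described here (register the route first, then build the ingredient with the index it was given) can be sketched as below. All of the types are simplified stand-ins: a single jar with two function ingredients, routes written as plain functions rather than closures, and a `Storage` that holds nothing but the jars and the routes.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct IngredientIndex(pub u32);

/// Toy ingredient that remembers the index it was assigned.
pub struct FunctionIngredient {
    pub index: IngredientIndex,
}

impl FunctionIngredient {
    pub fn new(index: IngredientIndex) -> Self {
        FunctionIngredient { index }
    }
}

pub struct Jar {
    pub foo: FunctionIngredient,
    pub bar: FunctionIngredient,
}

/// Stand-in for `DB::Jars`, here a one-element tuple.
pub type Jars = (Jar,);

pub type Route = fn(&Jars) -> &FunctionIngredient;

#[derive(Default)]
pub struct Routes {
    routes: Vec<Route>,
}

impl Routes {
    pub fn push(&mut self, route: Route) -> IngredientIndex {
        let index = IngredientIndex(self.routes.len() as u32);
        self.routes.push(route);
        index
    }

    pub fn route(&self, index: IngredientIndex) -> Route {
        self.routes[index.0 as usize]
    }
}

fn route_foo(jars: &Jars) -> &FunctionIngredient {
    &jars.0.foo
}

fn route_bar(jars: &Jars) -> &FunctionIngredient {
    &jars.0.bar
}

/// For each item: push the route to obtain an index, then build the ingredient.
pub fn create_jar(routes: &mut Routes) -> Jar {
    let foo = FunctionIngredient::new(routes.push(route_foo));
    let bar = FunctionIngredient::new(routes.push(route_bar));
    Jar { foo, bar }
}

pub struct Storage {
    pub jars: Jars,
    pub routes: Routes,
}

impl Default for Storage {
    fn default() -> Self {
        // Start with an empty route table, then let each jar fill it in.
        let mut routes = Routes::default();
        let jars = (create_jar(&mut routes),);
        Storage { jars, routes }
    }
}
```

After `Storage::default()`, following the route stored at a given index leads back to the ingredient that carries that same index, which is the invariant the initialization code is meant to establish.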

View file

@ -0,0 +1,4 @@
# Ingredient
An *ingredient* is an individual piece of storage used to create a [salsa item](./salsa_item.md).
See the [jars and ingredients](../jars_and_ingredients.md) chapter for more details.

View file

@ -0,0 +1,4 @@
# Salsa item
A salsa item is something that is decorated with a `#[salsa::foo]` macro, like a tracked function or struct.
See the [jars and ingredients](../jars_and_ingredients.md) chapter for more details.

View file

@ -0,0 +1,9 @@
# Salsa struct
A salsa struct is a struct decorated with one of the salsa macros:
* `#[salsa::tracked]`
* `#[salsa::input]`
* `#[salsa::interned]`
See the [salsa overview](../../overview.md) for more details.

View file

@ -86,8 +86,10 @@ fn has_jars_impl(args: &Args, input: &syn::ItemStruct, storage: &syn::Ident) ->
let jar_paths: Vec<&syn::Path> = args.jar_paths.iter().collect();
let db = &input.ident;
parse_quote! {
// ANCHOR: HasJars
impl salsa::storage::HasJars for #db {
type Jars = (#(#jar_paths,)*);
// ANCHOR_END: HasJars
fn jars(&self) -> (&Self::Jars, &salsa::Runtime) {
self.#storage.jars()
@ -97,6 +99,7 @@ fn has_jars_impl(args: &Args, input: &syn::ItemStruct, storage: &syn::Ident) ->
self.#storage.jars_mut()
}
// ANCHOR: create_jars
fn create_jars(routes: &mut salsa::routes::Routes<Self>) -> Self::Jars {
(
#(
@ -104,6 +107,7 @@ fn has_jars_impl(args: &Args, input: &syn::ItemStruct, storage: &syn::Ident) ->
)*
)
}
// ANCHOR_END: create_jars
}
}
}

View file

@ -104,6 +104,7 @@ pub(crate) fn jar_impl(
.zip(0..)
.map(|(f, i)| Ident::new(&format!("i{}", i), f.ty.span()))
.collect();
// ANCHOR: create_jar
quote! {
impl<'salsa_db> salsa::jar::Jar<'salsa_db> for #jar_struct {
type DynDb = dyn #jar_trait + 'salsa_db;
@ -119,6 +120,7 @@ pub(crate) fn jar_impl(
}
}
}
// ANCHOR_END: create_jar
}
pub(crate) fn jar_struct(input: &ItemStruct) -> ItemStruct {

View file

@ -43,6 +43,7 @@ where
}
}
// ANCHOR: DatabaseKeyIndex
/// An "active" database key index represents a database key index
/// that is actively executing. In that case, the `key_index` cannot be
/// None.
@ -51,6 +52,7 @@ pub struct DatabaseKeyIndex {
pub(crate) ingredient_index: IngredientIndex,
pub(crate) key_index: Id,
}
// ANCHOR_END: DatabaseKeyIndex
impl DatabaseKeyIndex {
pub fn ingredient_index(self) -> IngredientIndex {

View file

@ -16,16 +16,23 @@ use super::{ParallelDatabase, Revision};
/// The "storage" struct stores all the data for the jars.
/// It is shared between the main database and any active snapshots.
pub struct Storage<DB: HasJars> {
/// Data shared across all databases.
/// Data shared across all databases. This contains the ingredients needed by each jar.
/// See the ["jars and ingredients" chapter](https://salsa-rs.github.io/salsa/plumbing/jars_and_ingredients.html)
/// for more detailed description.
///
/// Even though this struct is stored in an `Arc`, we sometimes get mutable access to it
/// by using `Arc::get_mut`. This is only possible when all parallel snapshots have been dropped.
shared: Arc<Shared<DB>>,
/// The "ingredients" structure stores the information about how to find each ingredient in the database.
/// It allows us to take the [`IngredientIndex`] assigned to a particular ingredient
/// and get back a [`dyn Ingredient`][`Ingredient`] for the struct that stores its data.
///
/// This is kept separate from `shared` so that we can clone it and retain `&`-access even when we have `&mut` access to `shared`.
routes: Arc<Routes<DB>>,
/// The runtime for this particular salsa database handle.
/// Each handle gets its own runtime, but the runtimes have shared state between them.s
/// Each handle gets its own runtime, but the runtimes have shared state between them.
runtime: Runtime,
}
@ -43,6 +50,7 @@ struct Shared<DB: HasJars> {
cvar: Condvar,
}
// ANCHOR: default
impl<DB> Default for Storage<DB>
where
DB: HasJars,
@ -60,6 +68,7 @@ where
}
}
}
// ANCHOR_END: default
impl<DB> Storage<DB>
where
@ -84,23 +93,35 @@ where
&self.runtime
}
// ANCHOR: jars_mut
/// Gets mutable access to the jars. This will trigger a new revision
/// and it will also cancel any ongoing work in the current revision.
/// Any actual writes that occur to data in a jar should use
/// [`Runtime::report_tracked_write`].
pub fn jars_mut(&mut self) -> (&mut DB::Jars, &mut Runtime) {
// Wait for all snapshots to be dropped.
self.cancel_other_workers();
// Increment revision counter.
self.runtime.new_revision();
let routes = self.routes.clone();
// Acquire `&mut` access to `self.shared` -- this is only possible because
// the snapshots have all been dropped, so we hold the only handle to the `Arc`.
let shared = Arc::get_mut(&mut self.shared).unwrap();
// Inform other ingredients that a new revision has begun.
// This gives them a chance to free resources that were being held until the next revision.
let routes = self.routes.clone();
for route in routes.reset_routes() {
route(&mut shared.jars).reset_for_new_revision();
}
// Return mut ref to jars + runtime.
(&mut shared.jars, &mut self.runtime)
}
// ANCHOR_END: jars_mut
// ANCHOR: cancel_other_workers
/// Sets cancellation flag and blocks until all other workers with access
/// to this storage have completed.
///
@ -119,11 +140,14 @@ where
// We create a mutex here because the cvar api requires it, but we
// don't really need one as the data being protected is actually
// the jars above.
//
// The cvar `self.shared.cvar` is notified by the `Drop` impl.
let mutex = parking_lot::Mutex::new(());
let mut guard = mutex.lock();
self.shared.cvar.wait(&mut guard);
}
}
// ANCHOR_END: cancel_other_workers
pub fn ingredient(&self, ingredient_index: IngredientIndex) -> &dyn Ingredient<DB> {
let route = self.routes.route(ingredient_index);
@ -170,7 +194,8 @@ pub trait HasJar<J> {
fn jar_mut(&mut self) -> (&mut J, &mut Runtime);
}
// Dyn friendly subset of HasJars
// ANCHOR: HasJarsDyn
/// Dyn friendly subset of HasJars
pub trait HasJarsDyn {
fn runtime(&self) -> &Runtime;
@ -196,6 +221,7 @@ pub trait HasJarsDyn {
/// [`SalsaStructInDb::register_dependent_fn`](`crate::salsa_struct::SalsaStructInDb::register_dependent_fn`).
fn salsa_struct_deleted(&self, ingredient: IngredientIndex, id: Id);
}
// ANCHOR_END: HasJarsDyn
pub trait HasIngredientsFor<I>
where