This PR adjusts the billing logic to not write any records to
`billing_events` if:
- The user is staff, as we don't want to bill staff members
- Billing is disabled (we currently enable billing based on the presence
of the Stripe API key)
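The two conditions above amount to a guard in front of the write. A minimal sketch, assuming hypothetical names (`BillingContext`, `should_record_billing_event`) that are not taken from the actual code:

```rust
/// Hypothetical inputs to the billing decision; the names are illustrative.
struct BillingContext<'a> {
    is_staff: bool,
    /// Billing is treated as enabled only when a Stripe API key is configured.
    stripe_api_key: Option<&'a str>,
}

/// Returns true only when a record should be written to `billing_events`.
fn should_record_billing_event(ctx: &BillingContext<'_>) -> bool {
    let billing_enabled = ctx.stripe_api_key.is_some();
    !ctx.is_staff && billing_enabled
}
```

Keeping the check in one function means every call site that writes billing events can share the same rule.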
Release Notes:
- N/A
This PR adds usage-based billing for LLM interactions in the Assistant.
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
This PR makes the `has_llm_subscription` and
`max_monthly_spend_in_cents` fields in the `LlmTokenClaims` required.
This change will be safe to deploy in ~45 minutes.
Release Notes:
- N/A
This PR adds a new `Cents` type that can be used to represent a monetary
value in cents.
This cuts down on the primitive obsession we had when dealing with money
in the billing code.
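As a rough illustration of the newtype idea, here is a sketch of what such a type could look like. The methods shown (`from_dollars`, `saturating_sub`) and the `u32` representation are assumptions for illustration, not necessarily what this PR adds:

```rust
use std::ops::Add;

/// A monetary amount in whole cents. Wrapping the integer keeps cents from
/// being mixed up with dollars or unrelated counters at compile time.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Cents(u32);

impl Cents {
    const ZERO: Cents = Cents(0);

    /// Converts whole dollars to cents (assumed helper).
    fn from_dollars(dollars: u32) -> Cents {
        Cents(dollars * 100)
    }

    /// Subtracts without underflowing below zero (assumed helper).
    fn saturating_sub(self, other: Cents) -> Cents {
        Cents(self.0.saturating_sub(other.0))
    }
}

impl Add for Cents {
    type Output = Cents;

    fn add(self, other: Cents) -> Cents {
        Cents(self.0 + other.0)
    }
}
```

With a type like this, a function signature such as `fn charge(amount: Cents)` documents its unit and rejects a bare integer.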
Release Notes:
- N/A
This PR makes the `github_user_login` field required in the
`LlmTokenClaims`.
We previously added this in
https://github.com/zed-industries/zed/pull/16316 and made it optional
for backwards-compatibility.
It's been more than long enough for all of the previous LLM tokens to
have expired, so we can now make the field required.
Release Notes:
- N/A
This PR reworks our existing billing code in preparation for charging
based on LLM usage.
We aren't yet exercising the new billing-related code outside of
development.
There are some noteworthy changes for our existing LLM usage tracking:
- A new `monthly_usages` table has been added for tracking usage
per-user, per-model, per-month
- The per-month usage measures have been removed, in favor of the
`monthly_usages` table
- All of the per-month metrics in the Clickhouse rows have been changed
from a rolling 30-day window to a calendar month
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Richard <richard@zed.dev>
Co-authored-by: Max <max@zed.dev>
This PR extends the LLM usage tracking to support tracking usage for
cache writes and reads for Anthropic models.
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Antonio <antonio@zed.dev>
Add `/auto` behind a feature flag that's disabled for now, even for
staff.
We've decided on a different design for context inference, but there are
parts of `/auto` that will be useful for that, so we want them in the code
base even if they're unused for now.
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
This PR adds a `GET /models` endpoint to the LLM service.
This endpoint returns the models that the authenticated user has access
to.
This is the first step towards populating the models for the hosted
service from the server.
Release Notes:
- N/A
- Cloudflare provides ISO-3166-1 country codes for protectorates. Expand our allowlist to include the territories of countries on the allowlist (US, UK, France, Australia, New Zealand).
- Also include the country_code in the error message when we block.
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
This PR fixes an issue where active user counts were being computed
across _all_ measures instead of the per-minute measures.
We now compute them using the tokens per minute measure, as we're
concerned with usage in recent minutes.
Release Notes:
- N/A
This PR fixes an issue where the active user count spanned individual
models.
We now track the active user counts on a per-model basis.
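To make the per-model counting concrete, here is a small sketch. The `UsageRecord` shape and the function name are hypothetical, and the real service computes this from the database rather than from an in-memory slice:

```rust
use std::collections::{HashMap, HashSet};

/// One usage observation from the recent window (hypothetical shape).
struct UsageRecord<'a> {
    user_id: u64,
    model: &'a str,
}

/// Counts distinct users separately for each model, so a user active on one
/// model does not inflate the count for another.
fn active_users_per_model<'a>(records: &[UsageRecord<'a>]) -> HashMap<&'a str, usize> {
    let mut users_by_model: HashMap<&'a str, HashSet<u64>> = HashMap::new();
    for record in records {
        users_by_model
            .entry(record.model)
            .or_default()
            .insert(record.user_id);
    }
    users_by_model
        .into_iter()
        .map(|(model, users)| (model, users.len()))
        .collect()
}
```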
Release Notes:
- N/A
This PR fixes the writing of LLM rate limit events to Clickhouse.
We had a typo in the table name: `llm_rate_limits` instead of
`llm_rate_limit_events`.
I also extracted a helper function to write to Clickhouse so we can use
it anywhere we need to.
Release Notes:
- N/A
This PR reworks how we do checks for model names in the LLM service.
We now normalize the model names using the models defined in the
database.
Release Notes:
- N/A
This PR updates the LLM service to include the GitHub login on its
spans.
We need to pass this information through on the LLM token, so it will
temporarily be `None` until this change is deployed and new tokens have
been issued.
Release Notes:
- N/A
- db deadlock in GetLlmToken for non-staff users
- typo in allowed model name for non-staff users
Release Notes:
- N/A
---------
Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Joseph <joseph@zed.dev>
This PR adds the ability to revoke access tokens for the LLM service.
There is a new `revoked_access_tokens` table that contains the
identifiers (`jti`) of revoked access tokens.
To revoke an access token, insert a record into this table:
```sql
insert into revoked_access_tokens (jti) values ('1e887b9e-37f5-49e8-8feb-3274e5a86b67');
```
We now attach the `jti` as `authn.jti` to the tracing spans so that we
can associate an access token with a given request to the LLM service.
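On the validation side, the check reduces to a membership test on the token's `jti`. A minimal sketch, assuming a hypothetical `check_not_revoked` helper and using an in-memory set as a stand-in for the `revoked_access_tokens` table:

```rust
use std::collections::HashSet;

/// Returns an error when the token's `jti` appears in the revoked set.
/// In the service this lookup would query `revoked_access_tokens`; the
/// `HashSet` here only stands in for that table.
fn check_not_revoked(jti: &str, revoked_jtis: &HashSet<String>) -> Result<(), String> {
    if revoked_jtis.contains(jti) {
        Err(format!("access token {jti} has been revoked"))
    } else {
        Ok(())
    }
}
```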
Release Notes:
- N/A
This PR makes it so Zed staff can use a separate Anthropic API key for
the LLM service.
We also added an `is_staff` column to the `usages` table so that we can
exclude staff usage from the "active users" metrics that influence the
rate limits.
Release Notes:
- N/A
---------
Co-authored-by: Max <max@zed.dev>
This prevents users from accessing other models, such as OpenAI's GPT-4
or Google's Gemini-Pro.
Staff members can still access all models.
Release Notes:
- N/A
---------
Co-authored-by: Thorsten <thorsten@zed.dev>
This PR removes the unused `ignore_checksum_mismatch` parameter to
`run_database_migrations`.
We were always passing `false`, which meant the behavior didn't need to
be parameterized.
Release Notes:
- N/A
This PR puts the initial infrastructure for the LLM service's database
in place.
The LLM service will be using a separate Postgres database, with its own
set of migrations.
Currently we only connect to the database in development, as we don't
yet have the database set up for the staging/production environments.
Release Notes:
- N/A
This PR updates the LLM service to authorize access to language model
providers based on the requester's country.
We detect the country using Cloudflare's
[`CF-IPCountry`](https://developers.cloudflare.com/fundamentals/reference/http-request-headers/#cf-ipcountry)
header.
The country code is then checked against the list of supported countries
for the given LLM provider. Countries that are not supported will
receive an `HTTP 451: Unavailable For Legal Reasons` response.
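The check itself can be sketched as below. The function name and the choice to reject a missing header are assumptions for illustration, and the supported-country list is passed in because the real per-provider lists are not shown here:

```rust
/// HTTP 451: Unavailable For Legal Reasons.
const UNAVAILABLE_FOR_LEGAL_REASONS: u16 = 451;

/// Authorizes a request based on the value of the `CF-IPCountry` header.
/// A missing header is treated as unsupported (an assumption of this sketch).
fn authorize_country(country_code: Option<&str>, supported: &[&str]) -> Result<(), u16> {
    match country_code {
        // Compare case-insensitively so "us" and "US" are treated alike.
        Some(code) if supported.iter().any(|s| s.eq_ignore_ascii_case(code)) => Ok(()),
        _ => Err(UNAVAILABLE_FOR_LEGAL_REASONS),
    }
}
```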
Release Notes:
- N/A
This PR introduces a separate backend service for making LLM calls.
It exposes an HTTP interface that can be called by Zed clients. To call
these endpoints, the client must provide a `Bearer` token. These tokens
are issued/refreshed by the collab service over RPC.
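For reference, extracting the token from an `Authorization` header value might look like the sketch below. The `bearer_token` helper is hypothetical, and validating the token's signature and claims is out of scope here:

```rust
/// Extracts the token from an `Authorization: Bearer <token>` header value.
/// This sketch matches the scheme case-sensitively, which is stricter than
/// HTTP requires.
fn bearer_token(authorization_header: &str) -> Option<&str> {
    let token = authorization_header.strip_prefix("Bearer ")?.trim();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}
```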
We're adding this in a backwards-compatible way. Right now the access
tokens can only be minted for Zed staff, and calling this separate LLM
service is behind the `llm-service` feature flag (which is not
automatically enabled for Zed staff).
Release Notes:
- N/A
---------
Co-authored-by: Marshall <marshall@zed.dev>
Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>