Summary
The table below provides an overview comparing different LangSmith configurations for various load patterns.

Load Pattern (Reads/Writes) | Concurrent Frontend Users | Traces Submitted per Second | Frontend Replicas | Platform Backend Replicas | Queue Replicas | Backend Replicas | Redis Resources | ClickHouse Resources | ClickHouse Setup | Postgres Resources | Blob Storage
---|---|---|---|---|---|---|---|---|---|---|---
Low/Low | 5 | 10 | 1 (default) | 3 (default) | 3 (default) | 2 (default) | 8 Gi (default) | 4 CPU, 16 Gi (default) | Single instance | 2 CPU, 8 GB memory, 10 GB storage (external) | Disabled
Low/High | 5 | 1000 | 4 | 20 | 160 | 5 | 200 Gi external | 10 CPU, 32 Gi memory | Single instance | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled
High/Low | 50 | 10 | 2 | 3 (default) | 6 | 40 | 8 Gi (default) | 8 CPU, 16 Gi per replica | 3-node replicated cluster | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled
Medium/Medium | 20 | 100 | 2 | 3 (default) | 10 | 16 | 13 Gi external | 16 CPU, 24 Gi memory | Single instance | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled
High/High | 50 | 1000 | 4 | 20 | 160 | 50 | 200 Gi external | 14 CPU, 24 Gi per replica | 3-node replicated cluster | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled
- Concurrent Frontend Users: Number of users actively viewing traces on the frontend
- Traces Submitted per Second: Number of traces being ingested via SDKs or API endpoints
- Replicated ClickHouse: Recommended for high read loads to prevent degraded performance. A managed ClickHouse service is another option.
- PostgreSQL: We recommend using an external instance and enabling disk autoexpansion to handle growing data requirements.
Each example below includes a `values.yaml` snippet for you to start with for your self-hosted LangSmith instance.
Trace ingestion (write path)
Common usage that puts load on the write path:

- Ingesting traces via the Python or JavaScript LangSmith SDK
- Ingesting traces via the `@traceable` wrapper
- Submitting traces via the `/runs/multipart` endpoint
The services involved in the write path are:

- Platform backend service: Receives the initial request to ingest traces and places them on a Redis queue
- Redis cache: Used to queue traces that need to be persisted
- Queue service: Persists traces for querying
- ClickHouse: Persistent storage used for traces
If trace ingestion is slow, consider the following adjustments (a `values.yaml` sketch follows this list):

- Give ClickHouse more resources (CPU and memory) if it is approaching its resource limits.
- Increase the number of platform backend pods if ingest requests are taking a long time to respond.
- Increase queue service pod replicas if traces are not being processed from Redis fast enough.
- Use a larger Redis cache if the current Redis instance is reaching its resource limits; this can also cause ingest requests to take a long time.
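As a rough illustration of where these knobs live, here is a minimal, hypothetical `values.yaml` sketch for scaling the write path. The key paths (`platformBackend.deployment.replicas`, `queue.deployment.replicas`, `redis.external.*`, and so on) and the replica counts are illustrative assumptions; confirm the exact keys against the values reference for your LangSmith Helm chart version.

```yaml
# Hypothetical write-path scaling sketch. Key paths and replica counts are
# illustrative assumptions; verify against your chart's values reference.
platformBackend:
  deployment:
    replicas: 10        # more pods to respond to ingest requests quickly
queue:
  deployment:
    replicas: 40        # more workers to drain the Redis trace queue
redis:
  external:
    enabled: true       # switch to a larger external Redis if the bundled one hits its limits
clickhouse:
  statefulSet:
    resources:
      requests:
        cpu: "8"        # give ClickHouse headroom if it approaches resource limits
        memory: 16Gi
```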
Trace querying (read path)
Common usage that puts load on the read path:

- Users on the frontend viewing tracing projects or individual traces
- Scripts that query for trace info
- Hitting either the `/runs/query` or `/runs/<run-id>` API endpoints
The services involved in the read path are:

- Backend service: Receives the request and submits a query to ClickHouse to then respond to the request
- ClickHouse: Persistent storage for traces. This is the main database that is queried when requesting trace info.
If trace queries are slow, consider the following adjustments (a `values.yaml` sketch follows this list):

- Increase the number of backend service pods. This is most impactful if backend service pods are reaching 1 core of CPU usage.
- Give ClickHouse more resources (CPU or memory). ClickHouse can be very resource intensive, but additional resources should lead to better performance.
- Move to a replicated ClickHouse cluster. Adding ClickHouse replicas helps with read performance, but we recommend staying below 5 replicas (start with 3).
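For reference, here is a minimal, hypothetical `values.yaml` sketch for scaling the read path. The key paths are assumptions about the Helm chart layout, and the numbers are illustrative (they roughly mirror the medium-load row of the summary table); confirm the exact keys against the values reference for your chart version.

```yaml
# Hypothetical read-path scaling sketch. Key paths and values are illustrative
# assumptions; verify against your chart's values reference.
backend:
  deployment:
    replicas: 16        # scale up when backend pods approach ~1 CPU core each
clickhouse:
  statefulSet:
    resources:
      requests:
        cpu: "16"       # ClickHouse drives most of the read-path performance
        memory: 24Gi
```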
Example LangSmith configurations for scale
Below we provide some example LangSmith configurations based on expected read and write loads.

For read load (trace querying):

- Low means roughly 5 users looking at traces at a time (about 10 requests per second)
- Medium means roughly 20 users looking at traces at a time (about 40 requests per second)
- High means roughly 50 users looking at traces at a time (about 100 requests per second)

For write load (trace ingestion):

- Low means up to 10 traces submitted per second
- Medium means up to 100 traces submitted per second
- High means up to 1000 traces submitted per second
The exact optimal configuration depends on your usage and trace payloads. Use the examples below in combination with the information above and your specific usage to update your LangSmith configuration as you see fit. If you have any questions, please reach out to the LangChain team.
Low reads, low writes
The default LangSmith configuration will handle this load. No custom resource configuration is needed here.

Low reads, high writes
You have a very high scale of trace ingestion, but a single-digit number of users on the frontend querying traces at any one time. For this, we recommend a configuration like this:
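As a starting point, here is a minimal, hypothetical `values.yaml` sketch for this pattern, using the replica counts and resources from the summary table above. The key paths are assumptions about the Helm chart layout; confirm them against the values reference for your chart version before applying.

```yaml
# Low reads / high writes -- values taken from the summary table above.
# Key paths are assumptions about the chart layout; verify before applying.
frontend:
  deployment:
    replicas: 4
platformBackend:
  deployment:
    replicas: 20        # accept ~1000 traces/sec of ingest requests
queue:
  deployment:
    replicas: 160       # drain the Redis trace queue at high write volume
backend:
  deployment:
    replicas: 5         # read path stays small (single-digit concurrent users)
redis:
  external:
    enabled: true       # ~200 Gi external Redis for the ingest queue
clickhouse:
  statefulSet:
    resources:
      requests:
        cpu: "10"
        memory: 32Gi
config:
  blobStorage:
    enabled: true       # offload large trace payloads to blob storage
```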
High reads, low writes

You have a relatively low scale of trace ingestion, but many frontend users querying traces and/or scripts that hit the `/runs/query` or `/runs/<run-id>` endpoints frequently.
For this, we strongly recommend setting up a replicated ClickHouse cluster to enable high read scale at low latency. See our external ClickHouse doc for more guidance on how to set up a replicated ClickHouse cluster. For this load pattern, we recommend using a 3-node replicated setup, where each replica in the cluster should have resource requests of 8+ cores and 16+ GB memory, and a resource limit of 12 cores and 32 GB memory.
For this, we recommend a configuration like this:
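The sketch below is a minimal, hypothetical `values.yaml` for this pattern, using the replica counts from the summary table and the replicated ClickHouse guidance above. The key paths are assumptions about the Helm chart layout; confirm them against the values reference for your chart version before applying.

```yaml
# High reads / low writes -- values taken from the summary table and the
# replicated ClickHouse guidance above. Key paths are assumptions; verify
# against your chart's values reference.
frontend:
  deployment:
    replicas: 2
queue:
  deployment:
    replicas: 6
backend:
  deployment:
    replicas: 40        # serve ~100 read requests/sec from the frontend and scripts
clickhouse:
  external:
    enabled: true       # 3-node replicated cluster, managed outside the chart
    # each replica: requests of 8+ CPU / 16+ Gi, limits of 12 CPU / 32 Gi
config:
  blobStorage:
    enabled: true
```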
Medium reads, medium writes
This is a good all-around configuration that should be able to handle most usage patterns of LangSmith. In internal testing, this configuration allowed us to scale to 100 traces ingested per second and 40 read requests per second. For this, we recommend a configuration like the sketch below. If you still notice slow reads with this configuration, we recommend moving to a replicated ClickHouse cluster setup.
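Here is a minimal, hypothetical `values.yaml` sketch for this pattern, using the replica counts and resources from the summary table above. The key paths are assumptions about the Helm chart layout; confirm them against the values reference for your chart version before applying.

```yaml
# Medium reads / medium writes -- values taken from the summary table above.
# Key paths are assumptions about the chart layout; verify before applying.
frontend:
  deployment:
    replicas: 2
queue:
  deployment:
    replicas: 10
backend:
  deployment:
    replicas: 16
redis:
  external:
    enabled: true       # ~13 Gi external Redis for the ingest queue
clickhouse:
  statefulSet:
    resources:
      requests:
        cpu: "16"
        memory: 24Gi
config:
  blobStorage:
    enabled: true
```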
High reads, high writes
You have a very high rate of trace ingestion (approaching 1000 traces submitted per second) and also have many users querying traces on the frontend (over 50 users) and/or scripts that are consistently making requests to the `/runs/query` or `/runs/<run-id>` endpoints.
For this, we very strongly recommend setting up a replicated ClickHouse cluster to prevent degraded read performance at high write scale. See our external ClickHouse doc for more guidance on how to set up a replicated ClickHouse cluster. For this load pattern, we recommend using a 3-node replicated setup, where each replica in the cluster should have resource requests of 14+ cores and 24+ GB memory, and a resource limit of 20 cores and 48 GB memory. We also recommend that each node/instance of ClickHouse has 600 Gi of volume storage for each day of TTL that you enable (as per the configuration below).
Overall, we recommend a configuration like this:
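The sketch below is a minimal, hypothetical `values.yaml` for this pattern, using the replica counts from the summary table and the replicated ClickHouse guidance above. The key paths are assumptions about the Helm chart layout; confirm them against the values reference for your chart version before applying.

```yaml
# High reads / high writes -- values taken from the summary table and the
# replicated ClickHouse guidance above. Key paths are assumptions; verify
# against your chart's values reference.
frontend:
  deployment:
    replicas: 4
platformBackend:
  deployment:
    replicas: 20
queue:
  deployment:
    replicas: 160
backend:
  deployment:
    replicas: 50
redis:
  external:
    enabled: true       # ~200 Gi external Redis for the ingest queue
clickhouse:
  external:
    enabled: true       # 3-node replicated cluster, managed outside the chart
    # each replica: requests of 14+ CPU / 24+ Gi, limits of 20 CPU / 48 Gi,
    # plus ~600 Gi of volume storage per day of trace TTL
config:
  blobStorage:
    enabled: true
```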
Ensure that the Kubernetes cluster is configured with sufficient resources to scale to the recommended size. After deployment, all of the pods in the Kubernetes cluster should be in a `Running` state. Pods stuck in `Pending` may indicate that you are reaching node pool limits or need larger nodes.

Also, ensure that any ingress controller deployed on the cluster is able to handle the desired load to prevent bottlenecks.