Resolved
More timeline details.
Resolved
Updated timeline details.
Resolved
Added incident details regarding replay and metrics endpoint.
Resolved
After monitoring the fix, we can confirm our team has resolved the disruption. We appreciate your patience and understanding with our team today as we worked to address this.
Monitoring
The burst traffic that was overloading session metrics retrieval appears to have been resolved by our new rate limits and query optimizations. We are monitoring the fix and will continue to post updates.
Investigating
Problem: We are experiencing degraded performance on the Stagehand API.
Impact: Customers are encountering slow performance when accessing the Stagehand API `/replay` endpoint (used by `stagehand.metrics`).
Causes: We identified that retrieval of Stagehand session replay metrics is sporadically overloaded by high traffic to a few outlier sessions that each have thousands of logged actions.
Steps to resolve: We have temporarily rate-limited these Stagehand API queries, updated the status page, and investigated both infrastructure and query execution, but a few customers are still experiencing degraded performance on the `/act` and `/replay` endpoints.
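
For readers unfamiliar with the mitigation, a per-session rate limit of the kind mentioned above can be sketched as a token bucket. This is only an illustration under assumptions: the class names, the `rate` and `capacity` parameters, and the per-session keying are hypothetical and do not describe the actual Stagehand implementation.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows short bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start with a full bucket
        self.last = clock()         # time of the last refill

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SessionRateLimiter:
    """Keeps one bucket per session ID (hypothetical keying) so that a single
    outlier session cannot consume the capacity of every other session."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, session_id: str) -> bool:
        bucket = self.buckets.get(session_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[session_id] = bucket
        return bucket.allow()
```

A server using this approach would typically reject a request with HTTP 429 when `allow()` returns `False`, leaving other sessions unaffected.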