Here are a few key steps to follow if your project has become unresponsive:
1. Gather Data
We need to figure out what's wrong before we can fix it.
First, head to your project's Performance Metrics page.
Check your CPU Usage graph. Does the CPU usage of your database or production PHP instances appear to be elevated?
Check your Requests Per Second graph. Are you seeing a spike in traffic?
Check the Response Time graph. Are response times above 1-2 seconds on average? If so, make a note of the time when they started to climb.
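If you copy the Response Time data points out of the graph, a small script can pinpoint when response times first crossed your threshold. This is a minimal sketch that assumes the data is a list of (timestamp, average seconds) pairs, oldest first; that format and the 1.5-second default threshold are assumptions, not a documented export from the Performance Metrics page. The sample values are illustrative only.

```python
from __future__ import annotations

from datetime import datetime


def climb_start(samples: list[tuple[datetime, float]],
                threshold: float = 1.5) -> datetime | None:
    """Return when the current run of above-threshold samples began.

    Assumes samples are (timestamp, average response time in seconds) pairs,
    sorted oldest first (a hypothetical format). Returns None if the latest
    sample is at or below the threshold.
    """
    start = None
    for ts, avg_seconds in samples:
        if avg_seconds > threshold:
            if start is None:
                start = ts  # first sample of a run above the threshold
        else:
            start = None  # response times recovered, so the run is broken
    return start


samples = [
    (datetime(2024, 5, 1, 10, 0), 0.4),
    (datetime(2024, 5, 1, 10, 5), 2.1),
    (datetime(2024, 5, 1, 10, 10), 3.8),
]
print(climb_start(samples))  # 2024-05-01 10:05:00
```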
Head to your project's Logs page.
Make a note of any errors in the logs that might explain the outage. Bear in mind that they may be the cause, or they may just be symptoms of another issue.
Head to your project's Traffic Metrics page.
Can you see any unusual spikes in traffic on the Traffic Over Time graph? If so, try narrowing the time period by dragging across the area of the graph that covers the spike. If you need a wider view instead, use the period dropdown to show data for a longer period of time.
Take a look at the Traffic By Source section. Are there any attributes (e.g. IP address, user agent, country, ASN) that stand out as higher than expected? Try filtering on any values that stand out to view only the requests that match them. What paths are they hitting? Does the traffic appear legitimate, or does it seem to be malicious?
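The same filtering can be reproduced offline if you have request records to hand. A minimal sketch, assuming each request is a dict with hypothetical keys such as `ip` and `path`: it reports attribute values that account for more than a chosen share of requests, then lists the paths those requests hit. The records use documentation-only IP ranges and are illustrative.

```python
from collections import Counter


def standout_values(records, attribute, min_share=0.2):
    """Values of `attribute` that make up at least `min_share` of all requests."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {value: n for value, n in counts.items() if n / total >= min_share}


def paths_for(records, attribute, value):
    """Paths hit by requests whose `attribute` equals `value`, busiest first."""
    return Counter(r["path"] for r in records if r[attribute] == value).most_common()


# Illustrative records; the keys "ip" and "path" are assumed, not a real export format.
records = [
    {"ip": "203.0.113.7", "path": "/search"},
    {"ip": "203.0.113.7", "path": "/search"},
    {"ip": "203.0.113.7", "path": "/"},
    {"ip": "198.51.100.2", "path": "/"},
]
print(standout_values(records, "ip", min_share=0.5))  # {'203.0.113.7': 3}
print(paths_for(records, "ip", "203.0.113.7"))  # [('/search', 2), ('/', 1)]
```

The same two functions work for any attribute key, such as a country or ASN field, provided the records include it.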
We should have enough data now to triage the issue.
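The choice between the scenarios in the next step can be summarized as a rough decision function. This is a simplified sketch: the yes/no inputs are hypothetical judgments you make from the graphs above, and a real outage may not fit any one branch cleanly.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    # Each flag is your own judgment from the metrics pages (hypothetical inputs).
    high_request_rate: bool
    high_php_cpu: bool
    high_db_cpu: bool
    busy_queue: bool
    slow_responses: bool


def triage(o: Observations) -> str:
    """Map observations to the scenario names used in the Triage step."""
    cpu_high = o.high_php_cpu or o.high_db_cpu
    if o.high_request_rate and cpu_high:
        return "High Request Rate, High CPU Usage"
    if o.high_db_cpu and o.busy_queue:
        return "High Database CPU Usage, Busy Queue"
    if cpu_high and not o.busy_queue:
        return "High CPU Usage, Quiet Queue"
    if o.slow_responses and not cpu_high:
        return "Low CPU Usage"
    return "None Of The Above"


print(triage(Observations(high_request_rate=False, high_php_cpu=True,
                          high_db_cpu=False, busy_queue=False,
                          slow_responses=True)))  # High CPU Usage, Quiet Queue
```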
2. Triage
Pick the scenario below that best matches what you're seeing:
High Request Rate, High CPU Usage
When there's both an elevated request rate and high CPU usage, it's nearly always because the traffic is overwhelming the project's resources.
To resolve this kind of outage, we need to either:
Reduce the request rate
Increase the CPU capacity of the project
Reduce the request rate
If any IP addresses are making a lot of requests and we're confident they're malicious rather than legitimate users, go ahead and block them.
If any countries or cloud ASNs are sending a suspicious amount of traffic and we're confident it isn't legitimate, we can target them with Cloudflare's Managed Challenge captchas in the project's Access Control page.
If it's clear that the high levels of traffic hitting the project are automated and malicious, activate Under Attack mode to force all visitors to the site through a Cloudflare captcha.
Increase the CPU capacity of the project
If the site is experiencing an influx of legitimate traffic that is overwhelming it (e.g. a surprise email campaign has gone out), we'd suggest activating Spike Protection to temporarily move the project onto a larger server with more capacity to handle the traffic.
If the site just needs a little extra PHP CPU capacity, then the Additional PHP Instance addon can be used to horizontally scale the project's PHP resources.
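To gauge roughly how much extra PHP capacity a given request rate needs, Little's law (requests in flight = arrival rate × average time per request) gives a back-of-the-envelope figure. A minimal sketch: `workers_per_instance`, the number of requests one PHP instance can process at once, is a hypothetical input rather than a documented Servd figure, and the response time should be one measured while the site was healthy, since times recorded during an outage include queueing delay.

```python
import math


def instances_needed(requests_per_second: float,
                     avg_response_seconds: float,
                     workers_per_instance: int) -> int:
    """Estimate PHP instances needed using Little's law.

    in_flight = arrival rate x time each request spends being processed.
    workers_per_instance is a hypothetical concurrency figure per instance.
    """
    in_flight = requests_per_second * avg_response_seconds
    return max(1, math.ceil(in_flight / workers_per_instance))


# Hypothetical figures: 40 req/s, 0.5 s healthy response time, 8 workers each.
print(instances_needed(40, 0.5, 8))  # 3
```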
Recovering once traffic has reduced
When your project receives a large spike of traffic, each request is put in a queue to wait until it can be processed. If a lot of traffic has been received, this backlog can take several minutes to clear and can keep CPU usage high even after traffic has dropped. To clear the request backlog, restart the project's PHP instances.
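The time a backlog takes to clear on its own depends on how much spare processing capacity is left after new requests are handled. A minimal sketch with hypothetical figures, showing why a backlog can linger for minutes after traffic drops, and why it never drains while arrivals still match or exceed throughput.

```python
import math


def drain_seconds(backlog: int, throughput_rps: float, arrival_rps: float) -> float:
    """Seconds until a request backlog clears, or infinity if it never does."""
    spare = throughput_rps - arrival_rps  # capacity left over for the backlog
    if spare <= 0:
        return math.inf
    return backlog / spare


# Hypothetical: 6,000 queued requests, 30 req/s capacity, 20 req/s still arriving.
print(drain_seconds(6000, 30, 20) / 60)  # 10.0 (minutes)
```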
High CPU Usage, Quiet Queue
When there's a large CPU increase but no obvious increase in traffic and no queue jobs running, the cause is usually which paths the traffic is hitting rather than its volume.
The first task is identifying which paths are causing the high CPU usage. This often means digging deeper into the Traffic Metrics page to see whether a subset of paths is being hit more frequently than usual. Zooming into the time on the Traffic Over Time graph when CPU usage became elevated can help narrow down the likely culprits.
Once we have a clearer idea of the paths causing the outage, we need to either:
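Comparing the incident window against a quieter baseline window makes the culprits easier to spot than raw counts alone. A minimal sketch, assuming request records with hypothetical `time` and `path` keys: it ranks paths by how much their share of traffic grew between the two windows. The records are illustrative.

```python
from collections import Counter
from datetime import datetime


def path_shares(records, start, end):
    """Fraction of requests per path with start <= time < end."""
    counts = Counter(r["path"] for r in records if start <= r["time"] < end)
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items()}


def rising_paths(records, baseline, incident, n=5):
    """Paths whose share of traffic grew most between two (start, end) windows."""
    before = path_shares(records, *baseline)
    after = path_shares(records, *incident)
    growth = {path: share - before.get(path, 0.0) for path, share in after.items()}
    return sorted(growth, key=growth.get, reverse=True)[:n]


def at(minute):
    return datetime(2024, 5, 1, 10, minute)


# Illustrative records; "time" and "path" are assumed keys.
records = (
    [{"time": at(5), "path": "/"}] * 3
    + [{"time": at(10), "path": "/search"}]
    + [{"time": at(40), "path": "/search"}] * 3
    + [{"time": at(45), "path": "/"}]
)
print(rising_paths(records, (at(0), at(30)), (at(30), at(59))))  # ['/search', '/']
```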
Reduce malicious traffic to the CPU-intensive paths
Increase CPU capacity of the project
Improve performance of the CPU-intensive paths
Reduce malicious traffic to the CPU-intensive paths
If it's clear that malicious traffic is frequently hitting CPU-intensive paths (e.g. `/search`), then we've got a few options.
If any IP addresses are making a lot of requests and we're confident they're malicious rather than legitimate users, go ahead and block them.
If any countries or cloud ASNs are sending a suspicious amount of traffic and we're confident it isn't legitimate, we can target them with Cloudflare's Managed Challenge captchas in the project's Access Control page.
If it's clear that the high levels of traffic hitting the project are automated and malicious, activate Under Attack mode to force all visitors to the site through a Cloudflare captcha.
Increase CPU capacity of the project
The CPU capacity of your project can be temporarily increased using Spike Protection or permanently increased by upgrading your project's plan.
If the site just needs a little extra PHP CPU capacity, then the Additional PHP Instance addon can be used to horizontally scale the project's PHP resources.
Improve performance of the CPU-intensive paths
Making use of Servd's Static Caching and Twig `{% cache %}` tags can be a good way to reduce the load of CPU-intensive paths. However, as these kinds of changes usually need testing to ensure they don't have any unintended impact, we generally recommend getting the site back online by other means first, and then addressing the underlying performance issues once the outage has been resolved.
High Database CPU Usage, Busy Queue
If the project is receiving a normal amount of traffic but has high database CPU usage, and the queue utility in the Craft Control Panel shows a lot of queue jobs being processed, then the queue jobs may be pushing the database to its limit and causing the outage.
To resolve this kind of issue we can either:
Release all queue jobs
Increase the database CPU capacity
Release all queue jobs
All pending queue jobs can be released via the project's Queue Manager, found under Utilities in the Craft Control Panel (at `/admin/utilities/queue-manager` by default), or by running `./craft queue/release all` on the CLI.
It's important to note that releasing jobs deletes them, so you won't be able to re-run them later. To avoid losing them, take a manual backup before releasing; you can then re-import the queue database table later to restore the released jobs.
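If you have direct MySQL access to the project's database (an assumption; your access method may differ), the backup can be limited to just the queue table. A minimal sketch that only builds and prints the commands for review; the host, user and database names are placeholders, and the table name assumes Craft's default `queue` table with no table prefix. The `--no-create-info` and `--insert-ignore` options make the dump contain only `INSERT IGNORE` row statements, so re-importing it adds the released jobs back without dropping any jobs queued since.

```python
import shlex


def queue_backup_cmd(host: str, user: str, database: str,
                     outfile: str, table: str = "queue") -> list[str]:
    """mysqldump command that saves only the rows of the queue table."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--password",            # no value given, so mysqldump prompts for it
        "--single-transaction",  # consistent snapshot without locking InnoDB tables
        "--no-create-info",      # rows only: no DROP/CREATE TABLE statements
        "--insert-ignore",       # skip rows whose IDs already exist on re-import
        f"--result-file={outfile}",
        database,
        table,
    ]


def queue_restore_cmd(host: str, user: str, database: str) -> list[str]:
    """mysql command that replays the dump when the file is fed on stdin."""
    return ["mysql", f"--host={host}", f"--user={user}", "--password", database]


# Placeholder connection details; review the commands before running them.
print(shlex.join(queue_backup_cmd("db.example.internal", "craft", "craft", "queue-backup.sql")))
print(shlex.join(queue_restore_cmd("db.example.internal", "craft", "craft")) + " < queue-backup.sql")
```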
Increase the database CPU capacity
The database CPU capacity can be temporarily increased using Spike Protection or permanently increased by upgrading your project's plan.
Low CPU Usage
When a site has high response times but low CPU usage, this is nearly always down to some form of I/O taking longer than normal. For example, if a page makes HTTP calls to a remote service to get the content it renders, and that service is experiencing an outage, you'll see low CPU usage because the project is blocked waiting for the HTTP requests to complete.
The project logs will generally contain lots of timeout errors, invalid response errors, and similar.
A common cause of these situations is when the `svg()` Twig function is used to inline an SVG asset file that's stored remotely. If the location that stores the SVG file experiences an outage, this can block pages from rendering and take a site down.
To resolve this kind of issue, we first need to locate the source of the slow remote I/O. Once we have, we can either:
Temporarily disable or comment out the code which makes the I/O request and then re-enable it once it's clear the remote endpoint has recovered.
Switch the I/O endpoint to a healthy mirror or alternative that isn't experiencing an outage.
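Where the remote call is made from code you control, capping how long it can block and falling back to a mirror reduces the chance of one slow endpoint taking the site down. A minimal sketch in Python for illustration only (Craft itself renders pages in PHP and Twig); the URLs and the 3-second timeout are placeholders, and `opener` is a parameter added so the function can be exercised without network access.

```python
import urllib.request


def fetch_first_healthy(urls, timeout=3.0, opener=urllib.request.urlopen):
    """Return the body from the first URL that responds without error.

    The timeout applies to each blocking socket operation, not to the
    whole download, so a very slow trickle of data can still take longer.
    """
    errors = []
    for url in urls:
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all endpoints failed:\n" + "\n".join(errors))
```

For example, `fetch_first_healthy(["https://primary.example/logo.svg", "https://mirror.example/logo.svg"])` would try the primary host first and only use the mirror if the primary errors or times out.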
None Of The Above
If none of the above scenarios appear applicable, or you're unable to resolve the outage, we'd recommend doing the following:
Restart all of the project's instances to reboot everything. This is a good catch-all option that can often help resolve a lot of issues.
Get in touch with our support team, either via the support chat widget or by email.