Race condition when scheduling commands with withoutOverlapping() using Redis/Memcached

### Laravel Version

10.46.0

### PHP Version

8.1.27

### Database Driver & Version

_No response_

### Description

We've identified an issue whereby in multi-server environments where commands are scheduled to run using `withoutOverlapping()` and `onOneServer()`, the commands will very occasionally not run at all on any server (about 0.2% of the time in our environment). Not a problem for things running every minute where we first spotted the problem, but more impactful when things running a few times or once per day are skipped.

Just going to start with outlining my understanding of how scheduled commands get run when `withoutOverlapping()` and `onOneServer()` are used:
- Commands due to run are iterated over ([ScheduleRunCommand::handle, L121](https://github.com/laravel/framework/blob/10.x/src/Illuminate/Console/Scheduling/ScheduleRunCommand.php#L121))
- Each command is checked to ensure filters pass before proceeding ([ScheduleRunCommand::handle, L122](https://github.com/laravel/framework/blob/10.x/src/Illuminate/Console/Scheduling/ScheduleRunCommand.php#L122))
  -  Because we used `withoutOverlapping()` a check is added to the filters to ensure the event mutex lock does not currently exist ([Event::withoutOverlapping, L713](https://github.com/laravel/framework/blob/10.x/src/Illuminate/Console/Scheduling/Event.php#L713))
- For each command that passes the filters (i.e. considered to be not currently running), `serverShouldRun()` is then called to check for and take a scheduling lock for the current server to run the command for the current hour/minute ([ScheduleRunCommand::runSingleServerEvent, L156](https://github.com/laravel/framework/blob/10.x/src/Illuminate/Console/Scheduling/ScheduleRunCommand.php#L156))
- Once confirmed that the event is not currently running and the server has the lock to run the event for the current schedule run, `Event::run` is called ([Event::run, L217](https://github.com/laravel/framework/blob/10.x/src/Illuminate/Console/Scheduling/Event.php#L217))
- As a final check before the event runs, `shouldSkipDueToOverlapping()` is called which checks whether `withoutOverlapping()` was used and checks that the event lock can be obtained ([Event::shouldSkipDueToOverlapping, L237](https://github.com/laravel/framework/blob/10.x/src/Illuminate/Console/Scheduling/Event.php#L237))

Just for clarity, there's two locks in play;
- The event lock which is used by `withoutOverlapping()` to identify whether a command is already being run.
- The scheduling lock used by `onOneSever()` to check whether a command has already been run on a server for the current schedule run (identified by the current hour and minute.)

The problem stems from the way in which the event mutex lock is checked for existence, which was introduced in PR #45963.

Originally, the CacheEventMutex would just check if the cache had an entry for the lock ([CacheEventMutex::exists, L69](https://github.com/laravel/framework/blob/10.x/src/Illuminate/Console/Scheduling/CacheEventMutex.php#L69)) which worked fine, but PR #45963 changed that behavior. For cache stores that implement the LockProvider interface (Redis, Memcached) it now attempts to get the lock and then immediately releases it again (but this is not done atomically.)

So the race condition looks something like this if Server B's system clock is just slightly slower than Server A:

| Time  | Server A | Server B |
| --- | --- | --- |
| 12:00:00 | Checking that Event A is not currently running, takes event lock for Event A | | 
| 12:00:01 | Releases event lock for Event A, determining that it is not currently running so should be run  | |
| 12:00:02 | Takes scheduling lock for Event A to run on Server A | |
| 12:00:03 | | Checking that Event A is not currently running, takes event lock for Event A  |
| 12:00:04 | Attempts to take the event lock for Event A to run it, fails because Server B has the lock, then skips due to overlapping | |
| 12:00:05 | | Releases event lock for Event A, determining that it is not currently running so should be run   |
| 12:00:06 | | Attempts to take scheduling lock for Event A to run on Server B, fails as the scheduling lock was already taken by Server A, skips assuming command was run on another server already | |

So in this scenario the scheduled command is never run, there's no error message, and the `ScheduledTaskSkipped` doesn't even get fired because the call to `run()` just returns if `shouldSkipDueToOverlapping()` returns `true`.

In terms of a solution, it might make sense to be able to check for the existence of a lock without taking and releasing it, maybe a new abstract `exists()` on Lock (https://github.com/laravel/framework/blob/10.x/src/Illuminate/Cache/Lock.php) which CacheEventMutex could use and RedisLock/MemcachedLock could implement their own atomic means of checking for lock existence, although that probably constitutes a backwards-compatibility break. Happy to put together a PR for whatever the agreed solution is.

As a workaround, I've just bound an older version of CacheEventMutex to `Illuminate\Console\Scheduling\EventMutex` to override the changes implemented in #45963 which has solved the issue in our project for now.

### Steps To Reproduce

Given its a race condition its very hard to reproduce, but hopefully there's enough details in the description to understand what's going on.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition when scheduling commands with withoutOverlapping() using Redis/Memcached #50330

Laravel Version

PHP Version

Database Driver & Version

Description

Steps To Reproduce

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Time	Server A	Server B
12:00:00	Checking that Event A is not currently running, takes event lock for Event A
12:00:01	Releases event lock for Event A, determining that it is not currently running so should be run
12:00:02	Takes scheduling lock for Event A to run on Server A
12:00:03		Checking that Event A is not currently running, takes event lock for Event A
12:00:04	Attempts to take the event lock for Event A to run it, fails because Server B has the lock, then skips due to overlapping
12:00:05		Releases event lock for Event A, determining that it is not currently running so should be run
12:00:06		Attempts to take scheduling lock for Event A to run on Server B, fails as the scheduling lock was already taken by Server A, skips assuming command was run on another server already

Race condition when scheduling commands with withoutOverlapping() using Redis/Memcached #50330

Description

Laravel Version

PHP Version

Database Driver & Version

Description

Steps To Reproduce

Activity

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions