Go: sync.RWMutex internals and usage explained

Sreram K
Jul 6, 2021

How does sync.RWMutex work?

sync.RWMutex serializes write operations without serializing read operations, while ensuring that no read operation is active while a write operation is in progress.

Code blocks within the region enclosed by rwm.Lock() and rwm.Unlock() (henceforth referred to as the “writable-region” in this article), where rwm is an object of type sync.RWMutex, execute one after the other, without their executions overlapping.

But a code block within the region enclosed by the calls rwm.RLock() and rwm.RUnlock() (henceforth referred to as the “readable-region” in this article) does not force other regions enclosed by the same calls to execute one at a time. That is, different goroutines executing code enclosed by these calls can run concurrently. The readable-region does, however, guarantee that no concurrently executing writable-region overlaps with it.

Example usage:

Code 1: sync.RWMutex usage example
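The original embedded example is not reproduced in this text, so the following is a minimal sketch of the kind of usage described above; the Counter type, its fields, and its methods are illustrative assumptions, not the article's original code.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter guards its map with an RWMutex so that lookups can run
// concurrently while updates remain exclusive.
type Counter struct {
	rwm    sync.RWMutex
	counts map[string]int
}

// Get runs inside the "readable-region": many goroutines may hold the
// read lock at the same time.
func (c *Counter) Get(key string) int {
	c.rwm.RLock()
	defer c.rwm.RUnlock()
	return c.counts[key]
}

// Inc runs inside the "writable-region": it excludes both writers and readers.
func (c *Counter) Inc(key string) {
	c.rwm.Lock()
	defer c.rwm.Unlock()
	c.counts[key]++
}

func main() {
	c := &Counter{counts: make(map[string]int)}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); c.Inc("hits") }()
		go func() { defer wg.Done(); _ = c.Get("hits") }()
	}
	wg.Wait()
	fmt.Println(c.Get("hits")) // 4
}
```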

For a stricter separation of roles, rwm.RLocker() can be used to restrict certain objects to rwm.RLock() and rwm.RUnlock() and keep them from calling rwm.Lock() and rwm.Unlock(). It returns an instance of the Locker interface that provides access only to RLock() and RUnlock() (wrapping them as Lock() and Unlock(), the only methods declared in Locker), without granting access to the writer lock.

The above example can be rewritten with rwm.RLocker() as follows:

Code 2: sync.RWMutex usage example modified with RLocker()
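Again, the original embed is not shown here, so this is a hedged sketch of how the previous example might be reworked around rwm.RLocker(); the reader function and the Counter type are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

type Counter struct {
	rwm    sync.RWMutex
	counts map[string]int
}

func (c *Counter) Inc(key string) {
	c.rwm.Lock()
	defer c.rwm.Unlock()
	c.counts[key]++
}

// reader only ever receives a sync.Locker; Lock() and Unlock() on it are
// really RLock() and RUnlock() on the underlying RWMutex, so reader has no
// way to take the writer lock.
func reader(l sync.Locker, c *Counter, key string) int {
	l.Lock()
	defer l.Unlock()
	return c.counts[key]
}

func main() {
	c := &Counter{counts: make(map[string]int)}
	readLocker := c.rwm.RLocker()

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); c.Inc("hits") }()
		go func() { defer wg.Done(); _ = reader(readLocker, c, "hits") }()
	}
	wg.Wait()
	fmt.Println(reader(readLocker, c, "hits")) // 4
}
```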

The internals of sync.RWMutex explained

The structure used by sync.RWMutex is given below:

Code 3: sync.RWMutex structure
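The embedded listing is missing from this text; the declaration below is how sync.RWMutex appears in the Go source at roughly the time of writing (Go 1.16). Later releases may differ in detail, but these are the fields the algorithms below refer to.

```go
// From src/sync/rwmutex.go; field comments as in the source.
type RWMutex struct {
	w           Mutex  // held if there are pending writers
	writerSem   uint32 // semaphore for writers to wait for completing readers
	readerSem   uint32 // semaphore for readers to wait for completing writers
	readerCount int32  // number of pending readers
	readerWait  int32  // number of departing readers
}
```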

Brief algorithms describing the methods defined by sync.RWMutex are given below. Locks normally need to fail if they are released more times than they were acquired, but for simplicity the algorithms below omit those error-checking paths.

RLock():

  1. Atomically adds one to readerCount
  2. If the readerCount value produced by the update in (1) was less than zero, block acquiring the readerSem semaphore until it is released
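In code, these two steps look roughly as follows. This closely mirrors the standard library implementation, but race-detector hooks and error checks are stripped, and semacquire stands in for the runtime-internal semaphore call, so the snippet is illustrative rather than compilable on its own.

```go
func (rw *RWMutex) RLock() {
	if atomic.AddInt32(&rw.readerCount, 1) < 0 {
		// A writer is pending: block until Unlock() releases readerSem.
		semacquire(&rw.readerSem)
	}
}
```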

RUnlock():

  1. Atomically subtracts one from readerCount
  2. If the value of readerCount immediately after the update in (1) was less than zero, atomically subtract one from readerWait
  3. If the value of readerWait recorded immediately after the update in (2) was equal to zero, release the writerSem semaphore
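A sketch of the same steps in code, with the same caveats as above (semrelease stands in for the runtime-internal release call):

```go
func (rw *RWMutex) RUnlock() {
	if r := atomic.AddInt32(&rw.readerCount, -1); r < 0 {
		// A writer is pending.
		if atomic.AddInt32(&rw.readerWait, -1) == 0 {
			// This was the last active reader: unblock the waiting writer.
			semrelease(&rw.writerSem)
		}
	}
}
```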

Lock():

  1. Acquire lock w
  2. Atomically subtract rwmutexMaxReaders from readerCount (note: rwmutexMaxReaders is set to 1 << 30)
  3. If the value of readerCount immediately before the update in (2) was not zero (i.e. some reader locks were held at that moment), atomically add that value to readerWait
  4. If the value of readerWait immediately after the update in (3) was not zero, block acquiring the writerSem semaphore until it is released
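The same steps rendered as code, under the same caveats as the earlier sketches:

```go
func (rw *RWMutex) Lock() {
	// (1) Resolve competition with other writers first.
	rw.w.Lock()
	// (2) Announce the pending writer by driving readerCount negative;
	// r is the number of readers that were active at that instant.
	r := atomic.AddInt32(&rw.readerCount, -rwmutexMaxReaders) + rwmutexMaxReaders
	// (3, 4) Wait for those r readers to drain before proceeding.
	if r != 0 && atomic.AddInt32(&rw.readerWait, r) != 0 {
		semacquire(&rw.writerSem)
	}
}
```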

Unlock():

  1. Atomically add rwmutexMaxReaders to readerCount
  2. Release the readerSem semaphore as many times as the value of readerCount recorded immediately after the update in (1)
  3. Release the lock w
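And the corresponding sketch for Unlock(), again illustrative rather than compilable:

```go
func (rw *RWMutex) Unlock() {
	// (1) Announce to readers that no writer is active; r is the number of
	// readers that blocked in RLock() while the writer held the lock.
	r := atomic.AddInt32(&rw.readerCount, rwmutexMaxReaders)
	// (2) Wake each of those readers.
	for i := 0; i < int(r); i++ {
		semrelease(&rw.readerSem)
	}
	// (3) Allow the next writer to acquire w.
	rw.w.Unlock()
}
```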

All operations between the calls to Lock() and Unlock() are fully synchronized with every other goroutine that calls the same pair of methods. This ensures that only one write happens at a time.

The operations between RLock() and RUnlock() need not be synchronized with each other, but new calls to RLock() made after a call to Lock() are forced to wait. In step (1) of RLock(), the updated value will only be less than zero if Lock() has already subtracted rwmutexMaxReaders (a fairly large, statically defined constant) from readerCount in its step (2).

Every call to RLock() atomically adds one to readerCount. But Lock() subtracts a value large enough that, regardless of the number of RLock() calls (in any realistic setting), readerCount stays negative. Thus every call to RLock() that follows Lock() waits (step (2) of RLock()) until Unlock() is called, which atomically adds rwmutexMaxReaders back to readerCount, making the value positive again. The count obtained immediately after this addition is the number of goroutines waiting in RLock().

Subtracting rwmutexMaxReaders from readerCount and later adding it back acts as a signaling mechanism: it tells each subsequent call to RLock() whether it must block until Unlock() is called or whether it need not block at all.

The moment step (1) of the Unlock() operation completes, every RLock() that follows becomes non-blocking. At this point, the only goroutines waiting in RLock() are the ones that requested the read lock after step (2) of Lock() and before step (1) of Unlock(). The value of readerCount recorded immediately after the update in step (1) of Unlock() is therefore the precise number of RLock() calls waiting on the readerSem semaphore. Call this value n: there are n readers waiting on readerSem, and Unlock() must release the semaphore n times to let them continue their execution.
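A hypothetical numeric trace may make this arithmetic concrete. Assume three readers hold the lock when a writer arrives, and two more readers show up while the writer is pending; the constant and variable names mirror the fields above, and the program simply replays the additions and subtractions:

```go
package main

import "fmt"

const rwmutexMaxReaders = 1 << 30

func main() {
	// Three readers currently hold the read lock.
	readerCount := int32(3)

	// Lock(): announce the pending writer by subtracting rwmutexMaxReaders.
	r := readerCount // value observed just before the update
	readerCount -= rwmutexMaxReaders
	readerWait := r                      // 3 readers must drain before the writer proceeds
	fmt.Println(readerCount, readerWait) // -1073741821 3

	// Two new readers call RLock() while the writer is pending: each add
	// still leaves readerCount negative, so both block on readerSem.
	readerCount += 2
	fmt.Println(readerCount) // -1073741819

	// The 3 original readers call RUnlock(): readerWait reaches 0 and the
	// last of them releases writerSem, unblocking the writer.
	readerCount -= 3
	readerWait -= 3

	// Unlock(): adding rwmutexMaxReaders back makes readerCount positive
	// again; its new value is the number of blocked readers to wake.
	readerCount += rwmutexMaxReaders
	fmt.Println(readerCount) // 2
}
```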

The call to Lock() (steps 3 and 4) blocks until the read operations already in progress under RLock() complete. It is therefore necessary to record the number of active read operations, so that Lock() can wait for an equal number of RUnlock() calls before it becomes non-blocking. Steps 3 and 4 of Lock() first check whether any readers were active at the time readerCount was updated in step (2), and atomically add that count to readerWait.

RUnlock() has to undo the addition to readerCount made by the corresponding RLock() call, so it atomically subtracts one from it. But if Lock() was called before this RUnlock(), it must also atomically subtract one from readerWait. If readerWait reaches zero, it must unblock Lock() by releasing the writerSem semaphore: readerWait reaching zero signifies that this RUnlock() belonged to the last active read-locked block, and that call takes the slow path of waking the waiting Lock().

The benefits of using sync.RWMutex over sync.Mutex

sync.RWMutex does not block reads unless a write is pending, which makes it the more favorable choice when a resource is read far more often than it is written. RLock() and RUnlock() maintain their counts with atomic operations and only touch a semaphore on the slow path: when a reader has to block until a write finishes, or when the last active reader has to unblock a waiting Lock() call before a write starts. This reduces the frequency of lock contention and improves performance.

Drawbacks of using sync.RWMutex

The atomic.AddInt32(&rw.readerCount, 1) and atomic.AddInt32(&rw.readerCount, -1) calls in RLock() and RUnlock() cause cache contention when invoked frequently from many cores, which degrades performance. Partly in response to this, the Go authors added sync.Map in Go 1.9, which does not rely on a sync.RWMutex to synchronize access to a map. This improved performance in several places in the standard library that had previously used the built-in map guarded by a sync.RWMutex.
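For comparison, here is a minimal sketch of the sync.Map alternative mentioned above; Store, Load, and Range are its standard methods, and the keys and values are arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var m sync.Map

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.Store(i, i*i) // concurrent writers need no explicit lock
			if v, ok := m.Load(i); ok {
				_ = v // concurrent readers likewise
			}
		}(i)
	}
	wg.Wait()

	// Iterate over whatever ended up in the map.
	m.Range(func(k, v interface{}) bool {
		fmt.Println(k, v)
		return true
	})
}
```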

Dear reader,

I am actively looking for a full-time or remote contract job related to Go or any other technology. I write highly scalable microservices, and I can also help you implement reliable AI/ML solutions. Please email me at sreramk360@gmail.com if you are interested.

And please feel free to email me for any non-business reason as well!

Thanking you,
Sreram K

Or let’s connect on LinkedIn: https://www.linkedin.com/in/k-sreram-a04a90b7/
