March 12, 2019
This article was contributed by
Jade Alglave, Will Deacon, Boqun Feng, David Howells, Daniel
Lustig, Luc Maranget, Paul E. McKenney, Andrea Parri, Nicholas
Piggin, Alan Stern, Akira Yokosawa, and Peter Zijlstra
Introduction
The Linux-kernel memory model (LKMM) (described
here and
here)
was accepted into the Linux kernel in early 2018 and has seen
continued development since.
This article describes the addition of locking to LKMM.
This article is organized as follows:
- Why Model Locking?
- Modeling Locking
- Multiple Locks
- External Ordering Guarantees
- Adding Locking to LKMM
- Conclusions
This is followed by the inevitable
answers to the quick quizzes.
Why Model Locking?
Do we really need to model locking?
Perhaps the facilities that already exist within LKMM can be used to
emulate locking well enough for our purposes.
In theory, the operation of acquiring a lock can be described as
little more than an atomic acquire-exchange repeated until it sees
that the lock was previously unclaimed, and the operation of releasing
a lock can be described as a simple release-store.
Unfortunately, although LKMM already supports xchg_acquire()
and smp_store_release(), herd does not model loops.
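For reference, the spinloop-based emulation that herd cannot evaluate
would look something like the following sketch (hypothetical code,
given that herd has no way to model the loop):

	/* Hypothetical spinloop emulation; herd cannot evaluate this loop. */
	while (xchg_acquire(sl, 1))
		continue;		/* Spin until the lock was previously unclaimed. */
	WRITE_ONCE(*x0, 1);		/* Critical section. */
	r1 = READ_ONCE(*x1);
	smp_store_release(sl, 0);	/* Release the lock. */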
We therefore try omitting the spinloops from the emulation,
keeping the atomic exchanges and tests, as follows:
Litmus Test #1
1 C C-SB+l-o-o-u+l-o-o-u-IF
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = xchg_acquire(sl, 1);
12 if (r2 == 0) {
13 WRITE_ONCE(*x0, 1);
14 r1 = READ_ONCE(*x1);
15 smp_store_release(sl, 0);
16 }
17 }
18
19 P1(int *sl, int *x0, int *x1)
20 {
21 int r1;
22 int r2;
23
24 r2 = xchg_acquire(sl, 1);
25 if (r2 == 0) {
26 WRITE_ONCE(*x1, 1);
27 r1 = READ_ONCE(*x0);
28 smp_store_release(sl, 0);
29 }
30 }
31
32 exists (0:r1=0 /\ 1:r1=0)
Line 11 uses xchg_acquire() to atomically set the lock word
to the value 1, and if the return value is zero, P0() has
successfully acquired the lock.
In that case, lines 13 and 14 execute the critical section
and line 15 uses smp_store_release() to store
a zero, thus releasing the lock.
Likewise for P1().
Not surprisingly, the missing spinloops cause this approach to yield
undesired results:
Outcome for Litmus Test #1 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u-IF Allowed
2 States 3
3 0:r1=0; 1:r1=0;
4 0:r1=0; 1:r1=1;
5 0:r1=1; 1:r1=0;
6 Ok
7 Witnesses
8 Positive: 2 Negative: 2
9 Condition exists (0:r1=0 /\ 1:r1=0)
10 Observation C-SB+l-o-o-u+l-o-o-u-IF Sometimes 2 2
11 Time C-SB+l-o-o-u+l-o-o-u-IF 0.05
12 Hash=d74dd8f6462fa76c8939c645e232a3cc
The final state on line 3 clearly should not happen.
After all, with real locking both P0() and P1()
would eventually acquire the lock,
and the second to do so would have a non-zero final value for its
instance of r1.
This state appears because either (although not both) of
Litmus Test #1's
emulated lock acquisitions might fail, and in the absence of spinloops
there will be no retries.
Thus, if P0() succeeds in acquiring the lock
but P1()'s attempt occurs before P0() releases it,
then P0()'s
read from x1 will return zero and P1() won't read
x0 at all, leaving the default initial value of zero in
1:r1.
This kind of execution results in the state shown on line 3 above.
It is therefore necessary to indicate to herd that only executions in
which both P0() and P1() acquire the lock are to be
considered.
One way to do this is to augment the exists clause:
Litmus Test #2
1 C C-SB+l-o-o-u+l-o-o-u-IFE
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = xchg_acquire(sl, 1);
12 if (r2 == 0) {
13 WRITE_ONCE(*x0, 1);
14 r1 = READ_ONCE(*x1);
15 smp_store_release(sl, 0);
16 }
17 }
18
19 P1(int *sl, int *x0, int *x1)
20 {
21 int r1;
22 int r2;
23
24 r2 = xchg_acquire(sl, 1);
25 if (r2 == 0) {
26 WRITE_ONCE(*x1, 1);
27 r1 = READ_ONCE(*x0);
28 smp_store_release(sl, 0);
29 }
30 }
31
32 exists (0:r1=0 /\ 1:r1=0 /\ 0:r2=0 /\ 1:r2=0)
This gives the expected results:
Outcome for Litmus Test #2 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u-IFE Allowed
2 States 4
3 0:r1=0; 0:r2=0; 1:r1=0; 1:r2=1;
4 0:r1=0; 0:r2=0; 1:r1=1; 1:r2=0;
5 0:r1=0; 0:r2=1; 1:r1=0; 1:r2=0;
6 0:r1=1; 0:r2=0; 1:r1=0; 1:r2=0;
7 No
8 Witnesses
9 Positive: 0 Negative: 4
10 Condition exists (0:r1=0 /\ 1:r1=0 /\ 0:r2=0 /\ 1:r2=0)
11 Observation C-SB+l-o-o-u+l-o-o-u-IFE Never 0 4
12 Time C-SB+l-o-o-u+l-o-o-u-IFE 0.07
13 Hash=4764f76de9c6b1696d4279e872de5a19
However, this augmented exists clause makes the if
statements pointless: Any execution in which either P0()
or P1() fails to acquire the lock is ignored anyway.
There is therefore no harm in simplifying the litmus test by removing
the if statements:
Litmus Test #3
1 C C-SB+l-o-o-u+l-o-o-u-XE
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = xchg_acquire(sl, 1);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 smp_store_release(sl, 0);
15 }
16
17 P1(int *sl, int *x0, int *x1)
18 {
19 int r1;
20 int r2;
21
22 r2 = xchg_acquire(sl, 1);
23 WRITE_ONCE(*x1, 1);
24 r1 = READ_ONCE(*x0);
25 smp_store_release(sl, 0);
26 }
27
28 exists (0:r1=0 /\ 0:r2=0 /\ 1:r1=0 /\ 1:r2=0)
As can be seen by running this litmus test, the exists clause cannot
be satisfied, as expected:
Outcome for Litmus Test #3 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u-XE Allowed
2 States 10
3 0:r1=0; 0:r2=0; 1:r1=0; 1:r2=1;
4 0:r1=0; 0:r2=0; 1:r1=1; 1:r2=0;
5 0:r1=0; 0:r2=0; 1:r1=1; 1:r2=1;
6 0:r1=0; 0:r2=1; 1:r1=0; 1:r2=0;
7 0:r1=0; 0:r2=1; 1:r1=1; 1:r2=0;
8 0:r1=1; 0:r2=0; 1:r1=0; 1:r2=0;
9 0:r1=1; 0:r2=0; 1:r1=0; 1:r2=1;
10 0:r1=1; 0:r2=0; 1:r1=1; 1:r2=1;
11 0:r1=1; 0:r2=1; 1:r1=0; 1:r2=0;
12 0:r1=1; 0:r2=1; 1:r1=1; 1:r2=0;
13 No
14 Witnesses
15 Positive: 0 Negative: 18
16 Condition exists (0:r1=0 /\ 0:r2=0 /\ 1:r1=0 /\ 1:r2=0)
17 Observation C-SB+l-o-o-u+l-o-o-u-XE Never 0 18
18 Time C-SB+l-o-o-u+l-o-o-u-XE 0.09
19 Hash=3a3b6ac1d46c9d2aa0cd327ca6a37620
Although this use (or perhaps more accurately, abuse) of the exists
clause works, it forces herd to fully and wastefully evaluate executions
in which one of the locks was not acquired.
We can speed things up by instead using a filter clause for
the lock-related checks:
Litmus Test #4
1 C C-SB+l-o-o-u+l-o-o-u-XF
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = xchg_acquire(sl, 1);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 smp_store_release(sl, 0);
15 }
16
17 P1(int *sl, int *x0, int *x1)
18 {
19 int r1;
20 int r2;
21
22 r2 = xchg_acquire(sl, 1);
23 WRITE_ONCE(*x1, 1);
24 r1 = READ_ONCE(*x0);
25 smp_store_release(sl, 0);
26 }
27
28 filter (0:r2=0 /\ 1:r2=0)
29 exists (0:r1=0 /\ 1:r1=0)
This produces a terser result in significantly less time because the
irrelevant outcomes have been discarded:
Outcome for Litmus Test #4 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u-XF Allowed
2 States 2
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 No
6 Witnesses
7 Positive: 0 Negative: 2
8 Condition exists (0:r1=0 /\ 1:r1=0)
9 Observation C-SB+l-o-o-u+l-o-o-u-XF Never 0 2
10 Time C-SB+l-o-o-u+l-o-o-u-XF 0.06
11 Hash=d5f3e4dfa8b130a480224bd080c85536
A disadvantage of using xchg_acquire() is that every invocation
of xchg_acquire() generates a write event, and herd's runtime
is exponential in the number of write events.
One alternative is cmpxchg_acquire(), which avoids generating
write events in the case where the lock is not available:
Litmus Test #5
1 C C-SB+l-o-o-u+l-o-o-u-CF
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = cmpxchg_acquire(sl, 0, 1);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 smp_store_release(sl, 0);
15 }
16
17 P1(int *sl, int *x0, int *x1)
18 {
19 int r1;
20 int r2;
21
22 r2 = cmpxchg_acquire(sl, 0, 1);
23 WRITE_ONCE(*x1, 1);
24 r1 = READ_ONCE(*x0);
25 smp_store_release(sl, 0);
26 }
27
28 filter (0:r2=0 /\ 1:r2=0)
29 exists (0:r1=0 /\ 1:r1=0)
This also gets the same result, which is reassuring:
Outcome for Litmus Test #5 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u-CF Allowed
2 States 2
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 No
6 Witnesses
7 Positive: 0 Negative: 2
8 Condition exists (0:r1=0 /\ 1:r1=0)
9 Observation C-SB+l-o-o-u+l-o-o-u-CF Never 0 2
10 Time C-SB+l-o-o-u+l-o-o-u-CF 0.05
11 Hash=7fcfda414ad9b01b777cf22d18dc7d4e
Of course, these two-process models do not consume all that much CPU time,
but the exponential nature of herd's analog to full state-space search
means that things deteriorate quickly as processes are added:
# Processes        XE         CE        XF        CF
     2            0.058      0.039     0.027     0.022
     3            3.203      1.653     0.968     0.743
     4          500.96     151.962    74.818    59.565
In this table,
the “XE” column gives the runtime in seconds for
xchg_acquire() using the exists clause,
the “CE” column gives the runtime in seconds for
cmpxchg_acquire() using the exists clause,
the “XF” column gives the runtime in seconds for
xchg_acquire() using the filter clause, and
the “CF” column gives the runtime in seconds for
cmpxchg_acquire() using the filter clause.
Quick Quiz 1:
What do the litmus tests corresponding to the “CE”
column look like?
Answer
Use of cmpxchg_acquire() and the filter clause
outperforms the other three emulation options by a wide margin.
However, 60 seconds is still a long time to wait.
Besides, four-process RCU litmus tests complete in a few tens of
milliseconds, which suggests that directly modeling locking might provide
some serious order-of-magnitude performance improvements over emulation.
In addition, modeling locking instead of emulating it would permit
litmus tests to call the spin_lock() and spin_unlock()
functions, greatly improving ease of use.
Quick Quiz 2:
Why don't the numbers in the table match those in the
“Time” outcome lines?
Answer
Modeling Locking
Given the performance results presented in the previous section,
it should be no surprise that we chose to directly model locking.
This choice results in much more readable litmus tests. For example,
the following is equivalent to the tests in the preceding section:
Litmus Test #6
1 C C-SB+l-o-o-u+l-o-o-u
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl);
11 WRITE_ONCE(*x0, 1);
12 r1 = READ_ONCE(*x1);
13 spin_unlock(sl);
14 }
15
16 P1(spinlock_t *sl, int *x0, int *x1)
17 {
18 int r1;
19
20 spin_lock(sl);
21 WRITE_ONCE(*x1, 1);
22 r1 = READ_ONCE(*x0);
23 spin_unlock(sl);
24 }
25
26 exists (0:r1=0 /\ 1:r1=0)
Reassuringly, this has the same outcome as its emulated equivalents:
Outcome for Litmus Test #6 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u Allowed
2 States 2
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 No
6 Witnesses
7 Positive: 0 Negative: 2
8 Condition exists (0:r1=0 /\ 1:r1=0)
9 Observation C-SB+l-o-o-u+l-o-o-u Never 0 2
10 Time C-SB+l-o-o-u+l-o-o-u 0.01
11 Hash=fa27fa4e010e86daa0452ad764f5d545
Locking is straightforward in the sense that a great many people have
a good intuitive understanding of it and are comfortable using locks.
And an informal definition covering most use cases is quite simple:
Any accesses executed while holding a given lock will see the effects
of any accesses made during any previous critical section for that
same lock, but will not see the effects of accesses in later critical
sections.
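For example, consider this message-passing sketch (our illustration,
with a hypothetical test name): if P1()'s critical section follows
P0()'s, so that r2 ends up equal to 1, then this definition guarantees
that r1 also equals 1, and LKMM should report that the exists clause
can never be satisfied:

	C C-MP-lock-sketch

	{
	}

	P0(spinlock_t *sl, int *f, int *x)
	{
		spin_lock(sl);
		WRITE_ONCE(*x, 1);	/* Message. */
		WRITE_ONCE(*f, 1);	/* Flag marking this critical section. */
		spin_unlock(sl);
	}

	P1(spinlock_t *sl, int *f, int *x)
	{
		int r1;
		int r2;

		spin_lock(sl);
		r2 = READ_ONCE(*f);	/* r2 == 1 iff P0()'s critical section came first. */
		r1 = READ_ONCE(*x);
		spin_unlock(sl);
	}

	exists (1:r2=1 /\ 1:r1=0)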
There are nevertheless some complications, including the possibility
of multiple locks, as discussed in the next section.
Multiple Locks
Mutual exclusion requires consistent use of locking, so it should
be no surprise that using two different locks can be problematic:
Litmus Test #7
1 C C-SB+l0-o-o-u0+l1-o-o-u1
2
3 {
4 }
5
6 P0(spinlock_t *sl0, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl0);
11 WRITE_ONCE(*x0, 1);
12 r1 = READ_ONCE(*x1);
13 spin_unlock(sl0);
14 }
15
16 P1(spinlock_t *sl1, int *x0, int *x1)
17 {
18 int r1;
19
20 spin_lock(sl1);
21 WRITE_ONCE(*x1, 1);
22 r1 = READ_ONCE(*x0);
23 spin_unlock(sl1);
24 }
25
26 exists (0:r1=0 /\ 1:r1=0)
And, as expected, this litmus test does not forbid the cycle:
Outcome for Litmus Test #7 (linux-kernel model)
1 Test C-SB+l0-o-o-u0+l1-o-o-u1 Allowed
2 States 4
3 0:r1=0; 1:r1=0;
4 0:r1=0; 1:r1=1;
5 0:r1=1; 1:r1=0;
6 0:r1=1; 1:r1=1;
7 Ok
8 Witnesses
9 Positive: 1 Negative: 3
10 Condition exists (0:r1=0 /\ 1:r1=0)
11 Observation C-SB+l0-o-o-u0+l1-o-o-u1 Sometimes 1 3
12 Time C-SB+l0-o-o-u0+l1-o-o-u1 0.01
13 Hash=a55f2bae093da399bdf7cacd553fa9f9
Self-deadlock is just as bad an idea in LKMM as it is in real life:
Litmus Test #8
1 C C-SB+l-l-o-o-u-u+l-l-o-o-u-u
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl);
11 spin_lock(sl);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 spin_unlock(sl);
15 spin_unlock(sl);
16 }
17
18 P1(spinlock_t *sl, int *x0, int *x1)
19 {
20 int r1;
21
22 spin_lock(sl);
23 spin_lock(sl);
24 WRITE_ONCE(*x1, 1);
25 r1 = READ_ONCE(*x0);
26 spin_unlock(sl);
27 spin_unlock(sl);
28 }
29
30 exists (0:r1=0 /\ 1:r1=0)
This results in no executions at all!
Every possible execution deadlocks and thus cannot complete:
Outcome for Litmus Test #8 (linux-kernel model)
1 Test C-SB+l-l-o-o-u-u+l-l-o-o-u-u Allowed
2 States 0
3 No
4 Witnesses
5 Positive: 0 Negative: 0
6 Condition exists (0:r1=0 /\ 1:r1=0)
7 Observation C-SB+l-l-o-o-u-u+l-l-o-o-u-u Never 0 0
8 Time C-SB+l-l-o-o-u-u+l-l-o-o-u-u 0.01
9 Hash=c65707ce9201b97c75ff028bd0446632
However, LKMM is not a replacement for lockdep.
If there is a way for an execution to complete, it will be cheerfully
reported, and executions that fail to complete will be just as
cheerfully discarded.
For example, consider this ABBA locking scenario:
Litmus Test #9
1 C C-SB+l1-l0-o-o-u0-u1+l0-l1-o-o-u1-u0
2
3 {
4 }
5
6 P0(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl1);
11 spin_lock(sl0);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 spin_unlock(sl0);
15 spin_unlock(sl1);
16 }
17
18 P1(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
19 {
20 int r1;
21
22 spin_lock(sl0);
23 spin_lock(sl1);
24 WRITE_ONCE(*x1, 1);
25 r1 = READ_ONCE(*x0);
26 spin_unlock(sl1);
27 spin_unlock(sl0);
28 }
29
30 exists (0:r1=0 /\ 1:r1=0)
Although this clearly is prone to deadlock, LKMM looks at the bright
side and only reports those executions that do manage to complete:
Outcome for Litmus Test #9 (linux-kernel model)
1 Test C-SB+l1-l0-o-o-u0-u1+l0-l1-o-o-u1-u0 Allowed
2 States 2
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 No
6 Witnesses
7 Positive: 0 Negative: 2
8 Condition exists (0:r1=0 /\ 1:r1=0)
9 Observation C-SB+l1-l0-o-o-u0-u1+l0-l1-o-o-u1-u0 Never 0 2
10 Time C-SB+l1-l0-o-o-u0-u1+l0-l1-o-o-u1-u0 0.05
11 Hash=fdfcdf743c97d9fc2dbb0f67f1b74f69
Again, LKMM is not a replacement for lockdep.
However, it is eminently possible and highly advised to use multiple
locks in a more disciplined fashion, for example, by dedicating a
separate lock for each variable:
Litmus Test #10
1 C C-SB+l0-o-u0-l1-o-u1+l1-o-u1-l0-o-u0
2
3 {
4 }
5
6 P0(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl0);
11 WRITE_ONCE(*x0, 1);
12 spin_unlock(sl0);
13 spin_lock(sl1);
14 r1 = READ_ONCE(*x1);
15 spin_unlock(sl1);
16 }
17
18 P1(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
19 {
20 int r1;
21
22 spin_lock(sl1);
23 WRITE_ONCE(*x1, 1);
24 spin_unlock(sl1);
25 spin_lock(sl0);
26 r1 = READ_ONCE(*x0);
27 spin_unlock(sl0);
28 }
29
30 exists (0:r1=0 /\ 1:r1=0)
This approach is known to restore sequential consistency, which is
corroborated by LKMM:
Outcome for Litmus Test #10 (linux-kernel model)
1 Test C-SB+l0-o-u0-l1-o-u1+l1-o-u1-l0-o-u0 Allowed
2 States 3
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 0:r1=1; 1:r1=1;
6 No
7 Witnesses
8 Positive: 0 Negative: 3
9 Condition exists (0:r1=0 /\ 1:r1=0)
10 Observation C-SB+l0-o-u0-l1-o-u1+l1-o-u1-l0-o-u0 Never 0 3
11 Time C-SB+l0-o-u0-l1-o-u1+l1-o-u1-l0-o-u0 0.04
12 Hash=a5a35cc34ad44927b93bb363ed6f0be6
Another complication is the question of what guarantees locking
provides to code not holding any locks, a topic taken up by the
next section.
External Ordering Guarantees
In a perfect (or at least a more easily analyzed) world, all accesses to
all objects protected by a given lock would be carried out while holding
that lock.
Unfortunately, this dubious form of perfection is sometimes ruled out
by performance and deadlock considerations.
In some such cases, it is necessary to define the ordering guarantees
that locking provides to code not holding the lock in question.
And it does turn out that locking can in some cases order accesses outside
of lock-based critical sections.
For example, consider this variation of
Litmus Test #6:
Litmus Test #11
1 C C-SB+o-l-o-u+l-o-u-o
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0, int *x1)
7 {
8 int r1;
9
10 WRITE_ONCE(*x0, 1);
11 spin_lock(sl);
12 r1 = READ_ONCE(*x1);
13 spin_unlock(sl);
14 }
15
16 P1(spinlock_t *sl, int *x0, int *x1)
17 {
18 int r1;
19
20 spin_lock(sl);
21 WRITE_ONCE(*x1, 1);
22 spin_unlock(sl);
23 r1 = READ_ONCE(*x0);
24 }
25
26 exists (0:r1=0 /\ 1:r1=0)
The accesses to x0 have been moved out of their respective
critical sections.
Nevertheless:
Outcome for Litmus Test #11 (linux-kernel model)
1 Test C-SB+o-l-o-u+l-o-u-o Allowed
2 States 3
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 0:r1=1; 1:r1=1;
6 No
7 Witnesses
8 Positive: 0 Negative: 3
9 Condition exists (0:r1=0 /\ 1:r1=0)
10 Observation C-SB+o-l-o-u+l-o-u-o Never 0 3
11 Time C-SB+o-l-o-u+l-o-u-o 0.02
12 Hash=c70b4405bd1942e7ad6fec45a59981f5
The cycle is still forbidden because ordering against a prior critical
section applies not only to that critical section, but also to all accesses
preceding it in the same process.
The ordering provided by spin_unlock() on line 13
therefore applies to line 10.
Similarly, ordering against a later critical section applies not only
to that critical section, but also to all accesses following it in
the same process.
The ordering provided by spin_lock() on line 20
therefore applies to line 23.
Lock-based ordering can in some cases also apply to processes that never
did acquire the lock.
For example, the following litmus test checks whether a pair of reads
from different variables in successive lock-based critical sections are
seen as ordered by another process not holding the lock:
Litmus Test #12
1 C C-lock-RR-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0, int *x2)
7 {
8 int r1;
9
10 spin_lock(sl);
11 r1 = READ_ONCE(*x0);
12 WRITE_ONCE(*x2, 1);
13 spin_unlock(sl);
14 }
15
16 P1(spinlock_t *sl, int *x1, int *x2)
17 {
18 int r1;
19 int r2;
20
21 spin_lock(sl);
22 r1 = READ_ONCE(*x1);
23 r2 = READ_ONCE(*x2);
24 spin_unlock(sl);
25 }
26
27 P2(int *x0, int *x1)
28 {
29 int r1;
30
31 WRITE_ONCE(*x1, 1);
32 smp_mb();
33 WRITE_ONCE(*x0, 1);
34 }
35
36 exists (0:r1=1 /\ 1:r1=0 /\ 1:r2=1)
The two reads are on lines 11 and 22.
The write on line 12 and the read on line 23 verify
that P0()'s critical section precedes that of P1()
(exists clause 1:r2=1).
P2() is the external observer that checks the order of the two
reads: If the read on line 11 sees the value written by line 33
(exists clause 0:r1=1) but the read on line 22
does not see the value written by line 31 (exists clause
1:r1=0), then locking is failing to order the reads from the
viewpoint of an external observer not holding the lock.
Outcome for Litmus Test #12 (linux-kernel model)
1 Test C-lock-RR-3 Allowed
2 States 7
3 0:r1=0; 1:r1=0; 1:r2=0;
4 0:r1=0; 1:r1=0; 1:r2=1;
5 0:r1=0; 1:r1=1; 1:r2=0;
6 0:r1=0; 1:r1=1; 1:r2=1;
7 0:r1=1; 1:r1=0; 1:r2=0;
8 0:r1=1; 1:r1=1; 1:r2=0;
9 0:r1=1; 1:r1=1; 1:r2=1;
10 No
11 Witnesses
12 Positive: 0 Negative: 7
13 Condition exists (0:r1=1 /\ 1:r1=0 /\ 1:r2=1)
14 Observation C-lock-RR-3 Never 0 7
15 Time C-lock-RR-3 0.03
16 Hash=a86b085d58b19425cd99c36046f0b8fb
The exists clause is not satisfied, so successive lock-based
critical sections for the same lock do order reads.
The following litmus test checks for externally observable ordering of
prior reads against later writes:
Litmus Test #13
1 C C-lock-RW-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 spin_lock(sl);
12 r1 = READ_ONCE(*x0);
13 r2 = READ_ONCE(*x1);
14 spin_unlock(sl);
15 }
16
17 P1(spinlock_t *sl, int *x0, int *x1)
18 {
19 spin_lock(sl);
20 WRITE_ONCE(*x1, 1);
21 spin_unlock(sl);
22 }
23
24 P2(int *x0, int *x1)
25 {
26 int r1;
27
28 r1 = READ_ONCE(*x1);
29 smp_mb();
30 WRITE_ONCE(*x0, 1);
31 }
32
33 exists (0:r1=1 /\ 0:r2=0 /\ 2:r1=1)
The read is on line 12 and the write on line 20.
The read on line 13 verifies that P0()'s critical section
precedes that of P1() (exists clause 0:r2=0).
P2() is again the external observer that checks the order of
the prior read and later write: If the read on line 12 sees the
value written by line 30 (exists clause 0:r1=1)
and the read on line 28 sees the value written by line 20
(exists clause 2:r1=1), then locking is failing
to order the prior read and the later write from the viewpoint of an
external observer not holding the lock.
Outcome for Litmus Test #13 (linux-kernel model)
1 Test C-lock-RW-3 Allowed
2 States 7
3 0:r1=0; 0:r2=0; 2:r1=0;
4 0:r1=0; 0:r2=0; 2:r1=1;
5 0:r1=0; 0:r2=1; 2:r1=0;
6 0:r1=0; 0:r2=1; 2:r1=1;
7 0:r1=1; 0:r2=0; 2:r1=0;
8 0:r1=1; 0:r2=1; 2:r1=0;
9 0:r1=1; 0:r2=1; 2:r1=1;
10 No
11 Witnesses
12 Positive: 0 Negative: 7
13 Condition exists (0:r1=1 /\ 0:r2=0 /\ 2:r1=1)
14 Observation C-lock-RW-3 Never 0 7
15 Time C-lock-RW-3 0.03
16 Hash=ac1d4b5f00c02c467b6b1659bc56871b
The exists clause is not satisfied, so successive lock-based
critical sections for the same lock do order prior reads against
later writes.
The following litmus test checks for externally observable ordering of
prior writes against later reads:
Litmus Test #14
1 C C-lock-WR-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0)
7 {
8 spin_lock(sl);
9 WRITE_ONCE(*x0, 1);
10 spin_unlock(sl);
11 }
12
13 P1(spinlock_t *sl, int *x0, int *x1)
14 {
15 int r1;
16 int r2;
17
18 spin_lock(sl);
19 r1 = READ_ONCE(*x0);
20 r2 = READ_ONCE(*x1);
21 spin_unlock(sl);
22 }
23
24 P2(int *x0, int *x1)
25 {
26 int r1;
27
28 WRITE_ONCE(*x1, 1);
29 smp_mb();
30 r1 = READ_ONCE(*x0);
31 }
32
33 exists (1:r1=1 /\ 1:r2=0 /\ 2:r1=0)
The write is on line 9 and the read on line 20.
The read on line 19 verifies that P0()'s critical section
precedes that of P1() (exists clause 1:r1=1).
P2() is once again the external observer that checks the order of
the prior write and later read: If the read on line 30 sees the
value written by line 9 (exists clause 2:r1=0)
and the read on line 20 does not see the value written by line 28
(exists clause 1:r2=0), then locking is failing
to order the prior write and the later read from the viewpoint of an
external observer not holding the lock.
Outcome for Litmus Test #14 (linux-kernel model)
1 Test C-lock-WR-3 Allowed
2 States 8
3 1:r1=0; 1:r2=0; 2:r1=0;
4 1:r1=0; 1:r2=0; 2:r1=1;
5 1:r1=0; 1:r2=1; 2:r1=0;
6 1:r1=0; 1:r2=1; 2:r1=1;
7 1:r1=1; 1:r2=0; 2:r1=0;
8 1:r1=1; 1:r2=0; 2:r1=1;
9 1:r1=1; 1:r2=1; 2:r1=0;
10 1:r1=1; 1:r2=1; 2:r1=1;
11 Ok
12 Witnesses
13 Positive: 1 Negative: 7
14 Condition exists (1:r1=1 /\ 1:r2=0 /\ 2:r1=0)
15 Observation C-lock-WR-3 Sometimes 1 7
16 Time C-lock-WR-3 0.03
17 Hash=6e790e9b8c7f389cd72d5c6e1ed824d8
The exists clause is satisfied, so successive lock-based
critical sections for the same lock do not necessarily order
prior writes against later reads.
Quick Quiz 3:
But what if I need a lock release and later acquisition to order
prior writes against later reads?
Answer
Finally, the following litmus test checks for externally observable ordering of
writes to different variables in successive lock-based critical sections:
Litmus Test #15
1 C C-lock-WW-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0)
7 {
8 spin_lock(sl);
9 WRITE_ONCE(*x0, 1);
10 spin_unlock(sl);
11 }
12
13 P1(spinlock_t *sl, int *x0, int *x1)
14 {
15 int r1;
16
17 spin_lock(sl);
18 r1 = READ_ONCE(*x0);
19 WRITE_ONCE(*x1, 1);
20 spin_unlock(sl);
21 }
22
23 P2(int *x0, int *x1)
24 {
25 int r1;
26 int r2;
27
28 r1 = READ_ONCE(*x1);
29 smp_mb();
30 r2 = READ_ONCE(*x0);
31 }
32
33 exists (1:r1=1 /\ 2:r1=1 /\ 2:r2=0)
The writes are on lines 9 and 19.
The read on line 18 verifies that P0()'s critical section
precedes that of P1() (exists clause 1:r1=1).
P2() is yet again the external observer that checks the
order of the two writes: If the read on line 28 sees the value
written by line 19 (exists clause 2:r1=1) and
the read on line 30 does not see the value written by line 9
(exists clause 2:r2=0), then locking is failing to
order the two writes from the viewpoint of an external observer not
holding the lock.
Outcome for Litmus Test #15 (linux-kernel model)
1 Test C-lock-WW-3 Allowed
2 States 7
3 1:r1=0; 2:r1=0; 2:r2=0;
4 1:r1=0; 2:r1=0; 2:r2=1;
5 1:r1=0; 2:r1=1; 2:r2=0;
6 1:r1=0; 2:r1=1; 2:r2=1;
7 1:r1=1; 2:r1=0; 2:r2=0;
8 1:r1=1; 2:r1=0; 2:r2=1;
9 1:r1=1; 2:r1=1; 2:r2=1;
10 No
11 Witnesses
12 Positive: 0 Negative: 7
13 Condition exists (1:r1=1 /\ 2:r1=1 /\ 2:r2=0)
14 Observation C-lock-WW-3 Never 0 7
15 Time C-lock-WW-3 0.02
16 Hash=aeb873bf8def6e07ef92cc9e5edeb312
The exists clause is not satisfied, so successive lock-based
critical sections for the same lock do order writes.
However, this test demonstrates a difference between the modeling of
locks (and of their Linux-kernel guarantees) and emulated locking,
which can be seen by converting
Litmus Test #15
to emulated form
and running it through herd.
(Spoiler: In contrast with
Litmus Test #15,
the emulated form's exists clause will be satisfied.)
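For reference, one such conversion, following the cmpxchg_acquire()
and filter pattern of Litmus Test #5 (the test name and exact form
shown here are ours), might look like this:

	C C-lock-WW-3-CF

	{
	}

	P0(int *sl, int *x0)
	{
		int r2;

		r2 = cmpxchg_acquire(sl, 0, 1);	/* Emulated lock acquisition. */
		WRITE_ONCE(*x0, 1);
		smp_store_release(sl, 0);	/* Emulated lock release. */
	}

	P1(int *sl, int *x0, int *x1)
	{
		int r1;
		int r2;

		r2 = cmpxchg_acquire(sl, 0, 1);
		r1 = READ_ONCE(*x0);
		WRITE_ONCE(*x1, 1);
		smp_store_release(sl, 0);
	}

	P2(int *x0, int *x1)
	{
		int r1;
		int r2;

		r1 = READ_ONCE(*x1);
		smp_mb();
		r2 = READ_ONCE(*x0);
	}

	filter (0:r2=0 /\ 1:r2=0)
	exists (1:r1=1 /\ 2:r1=1 /\ 2:r2=0)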
This difference is due to weakly ordered hardware and to a desire for
slightly stronger external ordering from the locking primitives.
In short, from the viewpoint of a CPU not holding the lock, writes in
prior critical sections are not ordered against reads in later critical
sections, but everything else is ordered.
This not-universally-loved exception to the ordering rule allows
hardware to maintain its store-buffer optimization through both
spin_unlock() and spin_lock() calls.
Quick Quiz 4:
Does a spin_unlock() followed by a spin_lock()
within the same process also provide similar ordering?
Answer
Adding Locking to LKMM
The locking model is implemented by
lock.cat,
which is as follows, give or take comments:
1 include "cross.cat"
2 let RL = try RL with emptyset
3 let RU = try RU with emptyset
4 let LF = LF | RL
5 let ALL-LOCKS = LKR | LKW | UL | LF | RU
6 flag ~empty [M \ IW] ; loc ; [ALL-LOCKS] as mixed-lock-accesses
7 let lk-rmw = ([LKR] ; po-loc ; [LKW]) \ (po ; po)
8 let rmw = rmw | lk-rmw
9 flag ~empty LKW \ range(lk-rmw) as unpaired-LKW
10 flag ~empty LKR \ domain(lk-rmw) as unpaired-LKR
11 empty ([LKW] ; po-loc ; [LKR]) \ (po-loc ; [UL] ; po-loc) as lock-nest
12 flag ~empty [FW] ; loc ; [ALL-LOCKS] as lock-final
13 let R = R | LKR | LF | RU
14 let W = W | LKW
15 let Release = Release | UL
16 let Acquire = Acquire | LKR
17 let critical = ([LKW] ; po-loc ; [UL]) \ (po-loc ; [LKW | UL] ; po-loc)
18 flag ~empty UL \ range(critical) as unmatched-unlock
19 let UNMATCHED-LKW = LKW \ domain(critical)
20 empty ([UNMATCHED-LKW] ; loc ; [UNMATCHED-LKW]) \ id as unmatched-locks
21 let rfi-lf = ([LKW] ; po-loc ; [LF]) \ ([LKW] ; po-loc ; [UL] ; po-loc)
22 let all-possible-rfe-lf =
23 let possible-rfe-lf r =
24 let pair-to-relation p = p ++ 0
25 in map pair-to-relation ((LKW * {r}) & loc & ext)
26 in map possible-rfe-lf (LF \ range(rfi-lf))
27 with rfe-lf from cross(all-possible-rfe-lf)
28 let rf-lf = rfe-lf | rfi-lf
29 let rfi-ru = ([UL] ; po-loc ; [RU]) \ ([UL] ; po-loc ; [LKW] ; po-loc)
30 let all-possible-rfe-ru =
31 let possible-rfe-ru r =
32 let pair-to-relation p = p ++ 0
33 in map pair-to-relation (((UL | IW) * {r}) & loc & ext)
34 in map possible-rfe-ru RU
35 with rfe-ru from cross(all-possible-rfe-ru)
36 let rf-ru = rfe-ru | rfi-ru
37 let rf = rf | rf-lf | rf-ru
38 let co0 = co0 | ([IW] ; loc ; [LKW]) |
39 (([LKW] ; loc ; [UNMATCHED-LKW]) \ [UNMATCHED-LKW])
40 include "cos-opt.cat"
41 let W = W | UL
42 let M = R | W
43 let co = (co | critical | (critical^-1 ; co))+
44 let coe = co & ext
45 let coi = co & int
46 let rf = rf | ([IW | UL] ; singlestep(co) ; lk-rmw^-1)
47 let rfe = rf & ext
48 let rfi = rf & int
49 let fr = rf^-1 ; co
50 let fre = fr & ext
51 let fri = fr & int
52 show co,rf,fr
Quick Quiz 5:
Yikes!!!
The locking model is quite a bit larger than that of RCU!
What gives?
Answer
Line 1 pulls in common definitions.
Lines 2-5 gather basic locking events, with
RL being a spin_is_locked() that returns true,
RU being a spin_is_locked() that returns false,
LF being a failing spin_trylock() (and line 4 extends
LF to include cases where
spin_is_locked() returns true),
UL being a spin_unlock(), and
LKR and LKW being the read and write portions,
respectively, of a successful lock acquisition.
Thus ALL-LOCKS is the set of all lock-related events.
Line 6 complains if one of the locking primitives is ever invoked
on a variable that is used as something other than a lock.
Line 7 forms the relation linking the read and write portions
of each successful lock-acquisition operation, and line 8 merges
this relation into the rmw relation that covers the other atomic operations.
Lines 9 and 10 complain if there is a successful lock-acquisition
operation that has a read portion but not a write portion and vice versa.
Line 11 checks for self-deadlock due to double-acquiring a lock.
Line 12 complains if a lock's value is tested as part of an
exists clause.
Lines 13-16 pull locking events into herd's set of reads, writes,
releases, and acquires, respectively.
Line 17 forms a relation linking each lock acquisition with the
corresponding lock release and line 18 complains if there is an
unmatched spin_unlock().
Line 19 then forms the set of unmatched spin_lock()
and successful spin_trylock() operations, after which
line 20 complains if there is more than one such unmatched
lock acquisition for a given lock.
Quick Quiz 6:
Why would anyone want to allow spin_lock() and successful
spin_trylock() operations that don't have matching
spin_unlock() operations???
Answer
Line 21 forms a relation linking spin_lock() calls to
spin_trylock() calls within the ensuing critical section.
Clearly, all such spin_trylock() calls must fail.
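For example, in the following sketch (ours, with a hypothetical test
name), the spin_trylock() must fail because P0() itself already holds
the lock, so LKMM should never satisfy the exists clause:

	C C-lock-trylock-sketch

	{
	}

	P0(spinlock_t *sl)
	{
		int r1;

		spin_lock(sl);
		r1 = spin_trylock(sl);	/* Must fail: this process already holds sl. */
		spin_unlock(sl);
	}

	exists (0:r1=1)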
Lines 22-27 form a relation linking spin_lock() calls to
failing spin_trylock() calls in other processes.
The point here is that for every failing spin_trylock() call,
there must be a spin_lock() call that caused it to fail. This is
a design choice: One could equally well allow spin_trylock()
calls to spontaneously fail.
(And the C11 standard in fact does exactly that, which means that there
are a few lock-based algorithms that C11 is incapable of implementing.)
Line 28 then forms a relation linking to any failing
spin_trylock() call from the spin_lock() call that
caused it to fail, regardless of whether or not the two calls were in
the same process.
Line 29 forms a relation linking from a spin_unlock()
to a subsequent spin_is_locked() call within that same process
that returned false because of that spin_unlock().
Lines 30-35 then form a similar relation, but in the case where
the spin_unlock() and spin_is_locked() calls are
in different processes.
Line 36 forms the union of these two relations, that is, linking
from any spin_unlock() to all spin_is_locked() calls
that returned false because of that spin_unlock(), regardless
of which process these calls were in.
Finally, lines 37-52 add the lock-based ordering into the
standard herd relations.
Additional changes were required to linux-kernel.cat,
including the interactions of locking with
smp_mb__after_spinlock() and smp_mb__after_unlock_lock()
in the mb relation, and also
the addition of locking primitives to the release-acquire mechanics
via the po-unlock-rf-lock-po relation, which is in turn included
into the ppo and cumul-fence relations.
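For reference, this relation is defined along the following lines
(quoted from memory of linux-kernel.cat, so treat it as a sketch):
it links accesses po-before a spin_unlock(), through the rf relation
to the spin_lock() that reads from that unlock, and on to accesses
po-after that lock:

	let po-unlock-rf-lock-po = po ; [UL] ; rf ; [LKR] ; po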
So it is true that lock.cat is not the simplest example of
cat code out there.
However, the added complexity does have its advantages.
For example, the following table shows the performance of the model
compared to the four emulation approaches:
# Processes        XE         CE        XF        CF     Model
     2            0.058      0.039     0.027     0.022   0.004
     3            3.203      1.653     0.968     0.743   0.041
     4          500.96     151.962    74.818    59.565   0.374
     5                                                   4.905
The lock.cat model, though considerably slower than the RCU
model, has nevertheless delivered on the promise of multiple orders of
magnitude performance improvement over emulation of locking.
Quick Quiz 7:
Why not fill in the remaining values for the five-process row of
the table?
Answer
Conclusions
We have demonstrated LKMM's versatility by adding support for locking.
As before, we hope that this addition will prove useful
for education, for concurrent design, and for inclusion in other tooling,
especially given that locking is used very heavily within the Linux
kernel.
Acknowledgments
We owe thanks to the LWN editors for their review of this document.
We are also grateful to Jessica Murillo, Mark Figley, and Kara Todd
for their support of this effort.
This work represents the views of the authors and does not necessarily
represent the views of University College London, ARM Ltd., INRIA Paris,
Amarula Solutions, Harvard University, Intel, Red Hat, Nvidia,
or IBM Corporation.
Linux is a registered trademark of Linus Torvalds.
Other company, product, and service names may be trademarks or
service marks of others.
Answers to Quick Quizzes
Quick Quiz 1:
What do the litmus tests corresponding to the “CE”
column look like?
Answer:
Like this:
Litmus Test #16
1 C C-SB+l-o-o-u+l-o-o-u-CE
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = cmpxchg_acquire(sl, 0, 1);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 smp_store_release(sl, 0);
15 }
16
17 P1(int *sl, int *x0, int *x1)
18 {
19 int r1;
20 int r2;
21
22 r2 = cmpxchg_acquire(sl, 0, 1);
23 WRITE_ONCE(*x1, 1);
24 r1 = READ_ONCE(*x0);
25 smp_store_release(sl, 0);
26 }
27
28 exists (0:r1=0 /\ 0:r2=0 /\ 1:r1=0 /\ 1:r2=0)
Back to Quick Quiz 1.
Quick Quiz 2:
Why don't the numbers in the table match those in the
“Time” outcome lines?
Answer:
Because the tests generating the inline output were run on Paul's
laptop while it was also running rcutorture.
In contrast, the numbers in the table were collected using multiple
runs on an idle system.
Of course, as always, your mileage may vary.
Back to Quick Quiz 2.
Quick Quiz 3:
But what if I need a lock release and later acquisition to order
prior writes against later reads?
Answer:
Then you should use smp_mb__after_unlock_lock(), like this:
Litmus Test #17
1 C C-lock-WR-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0)
7 {
8 spin_lock(sl);
9 WRITE_ONCE(*x0, 1);
10 spin_unlock(sl);
11 }
12
13 P1(spinlock_t *sl, int *x0, int *x1)
14 {
15 int r1;
16 int r2;
17
18 spin_lock(sl);
19 smp_mb__after_unlock_lock();
20 r1 = READ_ONCE(*x0);
21 r2 = READ_ONCE(*x1);
22 spin_unlock(sl);
23 }
24
25 P2(int *x0, int *x1)
26 {
27 int r1;
28
29 WRITE_ONCE(*x1, 1);
30 smp_mb();
31 r1 = READ_ONCE(*x0);
32 }
33
34 exists (1:r1=1 /\ 1:r2=0 /\ 2:r1=0)
As you can see below, this provides the desired ordering.
Outcome for Litmus Test #17 (linux-kernel model)
1 Test C-lock-WR-3 Allowed
2 States 7
3 1:r1=0; 1:r2=0; 2:r1=0;
4 1:r1=0; 1:r2=0; 2:r1=1;
5 1:r1=0; 1:r2=1; 2:r1=0;
6 1:r1=0; 1:r2=1; 2:r1=1;
7 1:r1=1; 1:r2=0; 2:r1=1;
8 1:r1=1; 1:r2=1; 2:r1=0;
9 1:r1=1; 1:r2=1; 2:r1=1;
10 No
11 Witnesses
12 Positive: 0 Negative: 7
13 Condition exists (1:r1=1 /\ 1:r2=0 /\ 2:r1=0)
14 Observation C-lock-WR-3 Never 0 7
15 Time C-lock-WR-3 0.02
16 Hash=67cef9d4ce6e7598ef50bc43d28afeca
The Linux-kernel RCU implementation makes heavy use of
smp_mb__after_unlock_lock() in order to provide its
grace-period memory-ordering guarantees.
Back to Quick Quiz 3.
Quick Quiz 4:
Does a spin_unlock() followed by a spin_lock()
within the same process also provide similar ordering?
Answer:
Yes, assuming that the spin_unlock() and the spin_lock()
following it use the same lock.
However, this ordering is not preserved when two different locks
are used:
Litmus Test #18
1 C C-lock-RW-2
2
3 {
4 }
5
6 P0(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl0);
11 r1 = READ_ONCE(*x0);
12 spin_unlock(sl0);
13 spin_lock(sl1);
14 WRITE_ONCE(*x1, 1);
15 spin_unlock(sl1);
16 }
17
18 P1(int *x0, int *x1)
19 {
20 int r1;
21
22 r1 = READ_ONCE(*x1);
23 smp_mb();
24 WRITE_ONCE(*x0, 1);
25 }
26
27 exists (0:r1=1 /\ 1:r1=1)
As you can see below, this fails to provide the desired ordering.
Outcome for Litmus Test #18 (linux-kernel model)
1 Test C-lock-RW-2 Allowed
2 States 4
3 0:r1=0; 1:r1=0;
4 0:r1=0; 1:r1=1;
5 0:r1=1; 1:r1=0;
6 0:r1=1; 1:r1=1;
7 Ok
8 Witnesses
9 Positive: 1 Negative: 3
10 Condition exists (0:r1=1 /\ 1:r1=1)
11 Observation C-lock-RW-2 Sometimes 1 3
12 Time C-lock-RW-2 0.01
13 Hash=d27e32a7292825a15eb5ce3ef9944740
That said, you can insert
smp_mb__after_unlock_lock() after the
second
spin_lock() to enforce the ordering.
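For example, here is a sketch of Litmus Test #18 with the barrier
added (the modified test and its name are ours); LKMM should then
forbid the 0:r1=1 /\ 1:r1=1 outcome:

	C C-lock-RW-2-mb

	{
	}

	P0(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
	{
		int r1;

		spin_lock(sl0);
		r1 = READ_ONCE(*x0);
		spin_unlock(sl0);
		spin_lock(sl1);
		smp_mb__after_unlock_lock();	/* Promote unlock+lock to a full barrier. */
		WRITE_ONCE(*x1, 1);
		spin_unlock(sl1);
	}

	P1(int *x0, int *x1)
	{
		int r1;

		r1 = READ_ONCE(*x1);
		smp_mb();
		WRITE_ONCE(*x0, 1);
	}

	exists (0:r1=1 /\ 1:r1=1)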
Back to Quick Quiz 4.
Quick Quiz 5:
Yikes!!!
The locking model is quite a bit larger than that of RCU!
What gives?
Answer:
Several things.
First, simpler models are possible; it is just that they are significantly
slower.
(But please feel free to construct a correct model that is both simpler
and at least as fast!)
Second, there is no rcu_read_trylock() for the RCU model to deal
with, but locking must model spin_trylock().
Third, the RCU model does not implement rcu_read_lock_held(),
but locking does implement spin_is_locked().
Note that rcu_read_lock_held() tends to be used in cases
where RCU readers span many functions, so there does not appear to
be much point in modeling it, given the limited size of LKMM litmus tests.
Fourth, unlike spin_lock() and spin_unlock(),
rcu_read_lock() and rcu_read_unlock() provide
no ordering except in conjunction with synchronize_rcu().
Fifth, unlike RCU, locking affects many of herd's basic relations,
including co, fr, and rf relations, and
thus must provide code to properly update those relations.
Sixth and finally, just you wait until we add reader-writer locking!!!
Or until you add it! ;-)
Back to Quick Quiz 5.
Quick Quiz 6:
Why would anyone want to allow spin_lock() and successful
spin_trylock() operations that don't have matching
spin_unlock() operations???
Answer:
For one thing, this sort of thing really does occur in real code, if only
by mistake.
Perhaps more usefully, unmatched spin_lock() calls allow
litmus-test writers limited control over the order of lock acquisition
without the need to add memory accesses and terms to the exists
or filter clauses.
Specifically, the process that acquires a lock but does not release it
must necessarily be the last process to acquire that lock.
For example,
Litmus Test #12
could be simplified as follows:
Litmus Test #19
1 C C-lock-RR-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0)
7 {
8 int r1;
9
10 spin_lock(sl);
11 r1 = READ_ONCE(*x0);
12 spin_unlock(sl);
13 }
14
15 P1(spinlock_t *sl, int *x1)
16 {
17 int r1;
18
19 spin_lock(sl);
20 r1 = READ_ONCE(*x1);
21 }
22
23 P2(int *x0, int *x1)
24 {
25 int r1;
26
27 WRITE_ONCE(*x1, 1);
28 smp_mb();
29 WRITE_ONCE(*x0, 1);
30 }
31
32 exists (0:r1=1 /\ 1:r1=0)
This litmus test has the same outcome as its fully unlocked counterpart,
but without the need for the x2 variable:
Outcome for Litmus Test #19 (linux-kernel model)
1 Test C-lock-RR-3 Allowed
2 States 3
3 0:r1=0; 1:r1=0;
4 0:r1=0; 1:r1=1;
5 0:r1=1; 1:r1=1;
6 No
7 Witnesses
8 Positive: 0 Negative: 3
9 Condition exists (0:r1=1 /\ 1:r1=0)
10 Observation C-lock-RR-3 Never 0 3
11 Time C-lock-RR-3 0.01
12 Hash=45ba9be3cdc348a2620169b040843b60
For larger litmus tests, this sort of simplification could greatly
reduce herd's runtime.
Unmatched spin_lock() calls are also useful in debugging litmus
tests, for example, when trying to see whether a litmus test can deadlock
while holding a lock.
Back to Quick Quiz 6.
Quick Quiz 7:
Why not fill in the remaining values for the five-process row of
the table?
Answer:
Impatience.
But please feel free to run the full set of tests, send us the results,
and we will happily update the table.
Back to Quick Quiz 7.