March 12, 2019
This article was contributed by
Jade Alglave, Will Deacon, Boqun Feng, David Howells, Daniel
Lustig, Luc Maranget, Paul E. McKenney, Andrea Parri, Nicholas
Piggin, Alan Stern, Akira Yokosawa, and Peter Zijlstra
Introduction
The Linux-kernel memory model (LKMM) (described
here and
here)
was accepted into the Linux kernel in early 2018 and has seen
continued development since.
This article describes the addition of locking to LKMM.
This article is organized as follows:
- Why Model Locking?
- Modeling Locking
- Multiple Locks
- External Ordering Guarantees
- Adding Locking to LKMM
- Conclusions
This is followed by the inevitable
answers to the quick quizzes.
Why Model Locking?
Do we really need to model locking?
Perhaps the facilities that already exist within LKMM can be used to
emulate locking well enough for our purposes.
In theory, the operation of acquiring a lock can be described as
little more than an atomic acquire-exchange repeated until it sees
that the lock was previously unclaimed, and the operation of releasing
a lock can be described as a simple release-store.
Unfortunately, although LKMM already supports xchg_acquire()
and smp_store_release(), herd does not model loops.
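For reference, the spinloop-based emulation that herd cannot evaluate
would look something like the following sketch (hypothetical code,
given that herd has no way to model the loop):

	/* Hypothetical spinloop emulation; herd cannot evaluate this loop. */
	while (xchg_acquire(sl, 1))
		continue;		/* Spin until the lock was previously unclaimed. */
	WRITE_ONCE(*x0, 1);		/* Critical section. */
	r1 = READ_ONCE(*x1);
	smp_store_release(sl, 0);	/* Release the lock. */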
We therefore try omitting the spinloops from the emulation,
keeping the atomic exchanges and tests, as follows:
Litmus Test #1
1 C C-SB+l-o-o-u+l-o-o-u-IF
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = xchg_acquire(sl, 1);
12 if (r2 == 0) {
13 WRITE_ONCE(*x0, 1);
14 r1 = READ_ONCE(*x1);
15 smp_store_release(sl, 0);
16 }
17 }
18
19 P1(int *sl, int *x0, int *x1)
20 {
21 int r1;
22 int r2;
23
24 r2 = xchg_acquire(sl, 1);
25 if (r2 == 0) {
26 WRITE_ONCE(*x1, 1);
27 r1 = READ_ONCE(*x0);
28 smp_store_release(sl, 0);
29 }
30 }
31
32 exists (0:r1=0 /\ 1:r1=0)
Line 11 uses xchg_acquire() to atomically set the lock word
to the value 1, and if the return value is zero, P0() has
successfully acquired the lock.
In that case, lines 13 and 14 execute the critical section
and line 15 uses smp_store_release() to store
a zero, thus releasing the lock.
Likewise for P1().
Not surprisingly, the missing spinloops cause this approach to yield
undesired results:
Outcome for Litmus Test #1 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u-IF Allowed
2 States 3
3 0:r1=0; 1:r1=0;
4 0:r1=0; 1:r1=1;
5 0:r1=1; 1:r1=0;
6 Ok
7 Witnesses
8 Positive: 2 Negative: 2
9 Condition exists (0:r1=0 /\ 1:r1=0)
10 Observation C-SB+l-o-o-u+l-o-o-u-IF Sometimes 2 2
11 Time C-SB+l-o-o-u+l-o-o-u-IF 0.05
12 Hash=d74dd8f6462fa76c8939c645e232a3cc
The final state on line 3 clearly should not happen.
After all, with real locking both P0() and P1()
would eventually acquire the lock,
and the second to do so would have a non-zero final value for its
instance of r1.
This state appears because either (although not both) of
Litmus Test #1's
emulated lock acquisitions might fail, and in the absence of spinloops
there will be no retries.
Thus, if P0() succeeds in acquiring the lock
but P1()'s attempt occurs before P0() releases it,
then P0()'s
read from x1 will return zero and P1() won't read
x0 at all, leaving the default initial value of zero in
1:r1.
This kind of execution results in the state shown on line 3 above.
It is therefore necessary to indicate to herd that only executions in
which both P0() and P1() acquire the lock are to be
considered.
One way to do this is to augment the exists clause:
Litmus Test #2
1 C C-SB+l-o-o-u+l-o-o-u-IFE
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = xchg_acquire(sl, 1);
12 if (r2 == 0) {
13 WRITE_ONCE(*x0, 1);
14 r1 = READ_ONCE(*x1);
15 smp_store_release(sl, 0);
16 }
17 }
18
19 P1(int *sl, int *x0, int *x1)
20 {
21 int r1;
22 int r2;
23
24 r2 = xchg_acquire(sl, 1);
25 if (r2 == 0) {
26 WRITE_ONCE(*x1, 1);
27 r1 = READ_ONCE(*x0);
28 smp_store_release(sl, 0);
29 }
30 }
31
32 exists (0:r1=0 /\ 1:r1=0 /\ 0:r2=0 /\ 1:r2=0)
This gives the expected results:
Outcome for Litmus Test #2 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u-IFE Allowed
2 States 4
3 0:r1=0; 0:r2=0; 1:r1=0; 1:r2=1;
4 0:r1=0; 0:r2=0; 1:r1=1; 1:r2=0;
5 0:r1=0; 0:r2=1; 1:r1=0; 1:r2=0;
6 0:r1=1; 0:r2=0; 1:r1=0; 1:r2=0;
7 No
8 Witnesses
9 Positive: 0 Negative: 4
10 Condition exists (0:r1=0 /\ 1:r1=0 /\ 0:r2=0 /\ 1:r2=0)
11 Observation C-SB+l-o-o-u+l-o-o-u-IFE Never 0 4
12 Time C-SB+l-o-o-u+l-o-o-u-IFE 0.07
13 Hash=4764f76de9c6b1696d4279e872de5a19
However, this augmented exists clause makes the if
statements pointless: Any execution in which either P0()
or P1() fails to acquire the lock is ignored anyway.
There is therefore no harm in simplifying the litmus test by removing
the if statements:
Litmus Test #3
1 C C-SB+l-o-o-u+l-o-o-u-XE
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = xchg_acquire(sl, 1);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 smp_store_release(sl, 0);
15 }
16
17 P1(int *sl, int *x0, int *x1)
18 {
19 int r1;
20 int r2;
21
22 r2 = xchg_acquire(sl, 1);
23 WRITE_ONCE(*x1, 1);
24 r1 = READ_ONCE(*x0);
25 smp_store_release(sl, 0);
26 }
27
28 exists (0:r1=0 /\ 0:r2=0 /\ 1:r1=0 /\ 1:r2=0)
As can be seen by running this litmus test, the exists clause cannot
be satisfied, as expected:
Outcome for Litmus Test #3 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u-XE Allowed
2 States 10
3 0:r1=0; 0:r2=0; 1:r1=0; 1:r2=1;
4 0:r1=0; 0:r2=0; 1:r1=1; 1:r2=0;
5 0:r1=0; 0:r2=0; 1:r1=1; 1:r2=1;
6 0:r1=0; 0:r2=1; 1:r1=0; 1:r2=0;
7 0:r1=0; 0:r2=1; 1:r1=1; 1:r2=0;
8 0:r1=1; 0:r2=0; 1:r1=0; 1:r2=0;
9 0:r1=1; 0:r2=0; 1:r1=0; 1:r2=1;
10 0:r1=1; 0:r2=0; 1:r1=1; 1:r2=1;
11 0:r1=1; 0:r2=1; 1:r1=0; 1:r2=0;
12 0:r1=1; 0:r2=1; 1:r1=1; 1:r2=0;
13 No
14 Witnesses
15 Positive: 0 Negative: 18
16 Condition exists (0:r1=0 /\ 0:r2=0 /\ 1:r1=0 /\ 1:r2=0)
17 Observation C-SB+l-o-o-u+l-o-o-u-XE Never 0 18
18 Time C-SB+l-o-o-u+l-o-o-u-XE 0.09
19 Hash=3a3b6ac1d46c9d2aa0cd327ca6a37620
Although this use (or perhaps more accurately, abuse) of the exists
clause works, it forces herd to fully and wastefully evaluate executions
in which one of the locks was not acquired.
We can speed things up by instead using a filter clause for
the lock-related checks:
Litmus Test #4
1 C C-SB+l-o-o-u+l-o-o-u-XF
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = xchg_acquire(sl, 1);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 smp_store_release(sl, 0);
15 }
16
17 P1(int *sl, int *x0, int *x1)
18 {
19 int r1;
20 int r2;
21
22 r2 = xchg_acquire(sl, 1);
23 WRITE_ONCE(*x1, 1);
24 r1 = READ_ONCE(*x0);
25 smp_store_release(sl, 0);
26 }
27
28 filter (0:r2=0 /\ 1:r2=0)
29 exists (0:r1=0 /\ 1:r1=0)
This produces a terser result in significantly less time because the
irrelevant outcomes have been discarded:
Outcome for Litmus Test #4 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u-XF Allowed
2 States 2
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 No
6 Witnesses
7 Positive: 0 Negative: 2
8 Condition exists (0:r1=0 /\ 1:r1=0)
9 Observation C-SB+l-o-o-u+l-o-o-u-XF Never 0 2
10 Time C-SB+l-o-o-u+l-o-o-u-XF 0.06
11 Hash=d5f3e4dfa8b130a480224bd080c85536
A disadvantage of using xchg_acquire() is that every invocation
of xchg_acquire() generates a write event, and herd's runtime
is exponential in the number of write events.
One alternative is cmpxchg_acquire(), which avoids generating
write events in the case where the lock is not available:
Litmus Test #5
1 C C-SB+l-o-o-u+l-o-o-u-CF
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = cmpxchg_acquire(sl, 0, 1);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 smp_store_release(sl, 0);
15 }
16
17 P1(int *sl, int *x0, int *x1)
18 {
19 int r1;
20 int r2;
21
22 r2 = cmpxchg_acquire(sl, 0, 1);
23 WRITE_ONCE(*x1, 1);
24 r1 = READ_ONCE(*x0);
25 smp_store_release(sl, 0);
26 }
27
28 filter (0:r2=0 /\ 1:r2=0)
29 exists (0:r1=0 /\ 1:r1=0)
This also gets the same result, which is reassuring:
Outcome for Litmus Test #5 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u-CF Allowed
2 States 2
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 No
6 Witnesses
7 Positive: 0 Negative: 2
8 Condition exists (0:r1=0 /\ 1:r1=0)
9 Observation C-SB+l-o-o-u+l-o-o-u-CF Never 0 2
10 Time C-SB+l-o-o-u+l-o-o-u-CF 0.05
11 Hash=7fcfda414ad9b01b777cf22d18dc7d4e
Of course, these two-process models do not consume all that much CPU time,
but the exponential nature of herd's analog to full state-space search
means that things deteriorate quickly as processes are added:
# Processes        XE         CE        XF        CF
     2            0.058      0.039     0.027     0.022
     3            3.203      1.653     0.968     0.743
     4          500.96     151.962    74.818    59.565
In this table,
the “XE” column gives the runtime in seconds for
xchg_acquire() using the exists clause,
the “CE” column gives the runtime in seconds for
cmpxchg_acquire() using the exists clause,
the “XF” column gives the runtime in seconds for
xchg_acquire() using the filter clause, and
the “CF” column gives the runtime in seconds for
cmpxchg_acquire() using the filter clause.
Quick Quiz 1:
What do the litmus tests corresponding to the “CE”
column look like?
Answer
Use of cmpxchg_acquire() and the filter clause
outperforms the other three emulation options by a wide margin.
However, 60 seconds is still a long time to wait.
Besides, four-process RCU litmus tests complete in a few tens of
milliseconds, which suggests that directly modeling locking might provide
some serious order-of-magnitude performance improvements over emulation.
In addition, modeling locking instead of emulating it would permit
litmus tests to call the spin_lock() and spin_unlock()
functions, greatly improving ease of use.
Quick Quiz 2:
Why don't the numbers in the table match those in the
“Time” outcome lines?
Answer
Modeling Locking
Given the performance results presented in the previous section,
it should be no surprise that we chose to directly model locking.
This choice results in much more readable litmus tests. For example,
the following is equivalent to the tests in the preceding section:
Litmus Test #6
1 C C-SB+l-o-o-u+l-o-o-u
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl);
11 WRITE_ONCE(*x0, 1);
12 r1 = READ_ONCE(*x1);
13 spin_unlock(sl);
14 }
15
16 P1(spinlock_t *sl, int *x0, int *x1)
17 {
18 int r1;
19
20 spin_lock(sl);
21 WRITE_ONCE(*x1, 1);
22 r1 = READ_ONCE(*x0);
23 spin_unlock(sl);
24 }
25
26 exists (0:r1=0 /\ 1:r1=0)
Reassuringly, this has the same outcome as its emulated equivalents:
Outcome for Litmus Test #6 (linux-kernel model)
1 Test C-SB+l-o-o-u+l-o-o-u Allowed
2 States 2
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 No
6 Witnesses
7 Positive: 0 Negative: 2
8 Condition exists (0:r1=0 /\ 1:r1=0)
9 Observation C-SB+l-o-o-u+l-o-o-u Never 0 2
10 Time C-SB+l-o-o-u+l-o-o-u 0.01
11 Hash=fa27fa4e010e86daa0452ad764f5d545
Locking is straightforward in the sense that a great many people have
a good intuitive understanding of it and are comfortable using locks.
And an informal definition covering most use cases is quite simple:
Any accesses executed while holding a given lock will see the effects
of any accesses made during any previous critical section for that
same lock, but will not see the effects of accesses in later critical
sections.
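For example, consider this message-passing sketch (our illustration,
with a hypothetical test name): if P1()'s critical section follows
P0()'s, so that r2 ends up equal to 1, then this definition guarantees
that r1 also equals 1, and LKMM should report that the exists clause
can never be satisfied:

	C C-MP-lock-sketch

	{
	}

	P0(spinlock_t *sl, int *f, int *x)
	{
		spin_lock(sl);
		WRITE_ONCE(*x, 1);	/* Message. */
		WRITE_ONCE(*f, 1);	/* Flag marking this critical section. */
		spin_unlock(sl);
	}

	P1(spinlock_t *sl, int *f, int *x)
	{
		int r1;
		int r2;

		spin_lock(sl);
		r2 = READ_ONCE(*f);	/* r2 == 1 iff P0()'s critical section came first. */
		r1 = READ_ONCE(*x);
		spin_unlock(sl);
	}

	exists (1:r2=1 /\ 1:r1=0)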
There are nevertheless some complications, including the possibility
of multiple locks, as discussed in the next section.
Multiple Locks
Mutual exclusion requires consistent use of locking, so it should
be no surprise that using two different locks can be problematic:
Litmus Test #7
1 C C-SB+l0-o-o-u0+l1-o-o-u1
2
3 {
4 }
5
6 P0(spinlock_t *sl0, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl0);
11 WRITE_ONCE(*x0, 1);
12 r1 = READ_ONCE(*x1);
13 spin_unlock(sl0);
14 }
15
16 P1(spinlock_t *sl1, int *x0, int *x1)
17 {
18 int r1;
19
20 spin_lock(sl1);
21 WRITE_ONCE(*x1, 1);
22 r1 = READ_ONCE(*x0);
23 spin_unlock(sl1);
24 }
25
26 exists (0:r1=0 /\ 1:r1=0)
And, as expected, this litmus test does not forbid the cycle:
Outcome for Litmus Test #7 (linux-kernel model)
1 Test C-SB+l0-o-o-u0+l1-o-o-u1 Allowed
2 States 4
3 0:r1=0; 1:r1=0;
4 0:r1=0; 1:r1=1;
5 0:r1=1; 1:r1=0;
6 0:r1=1; 1:r1=1;
7 Ok
8 Witnesses
9 Positive: 1 Negative: 3
10 Condition exists (0:r1=0 /\ 1:r1=0)
11 Observation C-SB+l0-o-o-u0+l1-o-o-u1 Sometimes 1 3
12 Time C-SB+l0-o-o-u0+l1-o-o-u1 0.01
13 Hash=a55f2bae093da399bdf7cacd553fa9f9
Self-deadlock is just as bad an idea in LKMM as it is in real life:
Litmus Test #8
1 C C-SB+l-l-o-o-u-u+l-l-o-o-u-u
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl);
11 spin_lock(sl);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 spin_unlock(sl);
15 spin_unlock(sl);
16 }
17
18 P1(spinlock_t *sl, int *x0, int *x1)
19 {
20 int r1;
21
22 spin_lock(sl);
23 spin_lock(sl);
24 WRITE_ONCE(*x1, 1);
25 r1 = READ_ONCE(*x0);
26 spin_unlock(sl);
27 spin_unlock(sl);
28 }
29
30 exists (0:r1=0 /\ 1:r1=0)
This results in no executions at all!
Every possible execution deadlocks and thus cannot complete:
Outcome for Litmus Test #8 (linux-kernel model)
1 Test C-SB+l-l-o-o-u-u+l-l-o-o-u-u Allowed
2 States 0
3 No
4 Witnesses
5 Positive: 0 Negative: 0
6 Condition exists (0:r1=0 /\ 1:r1=0)
7 Observation C-SB+l-l-o-o-u-u+l-l-o-o-u-u Never 0 0
8 Time C-SB+l-l-o-o-u-u+l-l-o-o-u-u 0.01
9 Hash=c65707ce9201b97c75ff028bd0446632
However, LKMM is not a replacement for lockdep.
If there is a way for an execution to complete, it will be cheerfully
reported, and executions that fail to complete will be just as
cheerfully discarded.
For example, consider this ABBA locking scenario:
Litmus Test #9
1 C C-SB+l1-l0-o-o-u0-u1+l0-l1-o-o-u1-u0
2
3 {
4 }
5
6 P0(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl1);
11 spin_lock(sl0);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 spin_unlock(sl0);
15 spin_unlock(sl1);
16 }
17
18 P1(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
19 {
20 int r1;
21
22 spin_lock(sl0);
23 spin_lock(sl1);
24 WRITE_ONCE(*x1, 1);
25 r1 = READ_ONCE(*x0);
26 spin_unlock(sl1);
27 spin_unlock(sl0);
28 }
29
30 exists (0:r1=0 /\ 1:r1=0)
Although this clearly is prone to deadlock, LKMM looks at the bright
side and only reports those executions that do manage to complete:
Outcome for Litmus Test #9 (linux-kernel model)
1 Test C-SB+l1-l0-o-o-u0-u1+l0-l1-o-o-u1-u0 Allowed
2 States 2
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 No
6 Witnesses
7 Positive: 0 Negative: 2
8 Condition exists (0:r1=0 /\ 1:r1=0)
9 Observation C-SB+l1-l0-o-o-u0-u1+l0-l1-o-o-u1-u0 Never 0 2
10 Time C-SB+l1-l0-o-o-u0-u1+l0-l1-o-o-u1-u0 0.05
11 Hash=fdfcdf743c97d9fc2dbb0f67f1b74f69
Again, LKMM is not a replacement for lockdep.
However, it is eminently possible and highly advised to use multiple
locks in a more disciplined fashion, for example, by dedicating a
separate lock for each variable:
Litmus Test #10
1 C C-SB+l0-o-u0-l1-o-u1+l1-o-u1-l0-o-u0
2
3 {
4 }
5
6 P0(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl0);
11 WRITE_ONCE(*x0, 1);
12 spin_unlock(sl0);
13 spin_lock(sl1);
14 r1 = READ_ONCE(*x1);
15 spin_unlock(sl1);
16 }
17
18 P1(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
19 {
20 int r1;
21
22 spin_lock(sl1);
23 WRITE_ONCE(*x1, 1);
24 spin_unlock(sl1);
25 spin_lock(sl0);
26 r1 = READ_ONCE(*x0);
27 spin_unlock(sl0);
28 }
29
30 exists (0:r1=0 /\ 1:r1=0)
This approach is known to restore sequential consistency, which is
corroborated by LKMM:
Outcome for Litmus Test #10 (linux-kernel model)
1 Test C-SB+l0-o-u0-l1-o-u1+l1-o-u1-l0-o-u0 Allowed
2 States 3
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 0:r1=1; 1:r1=1;
6 No
7 Witnesses
8 Positive: 0 Negative: 3
9 Condition exists (0:r1=0 /\ 1:r1=0)
10 Observation C-SB+l0-o-u0-l1-o-u1+l1-o-u1-l0-o-u0 Never 0 3
11 Time C-SB+l0-o-u0-l1-o-u1+l1-o-u1-l0-o-u0 0.04
12 Hash=a5a35cc34ad44927b93bb363ed6f0be6
Another complication is the question of what guarantees locking
provides to code not holding any locks, a topic taken up by the
next section.
External Ordering Guarantees
In a perfect (or at least a more easily analyzed) world, all accesses to
all objects protected by a given lock would be carried out while holding
that lock.
Unfortunately, this dubious form of perfection is sometimes ruled out
by performance and deadlock considerations.
In some such cases, it is necessary to define the ordering guarantees
that locking provides to code not holding the lock in question.
And it does turn out that locking can in some cases order accesses outside
of lock-based critical sections.
For example, consider this variation of
Litmus Test #6:
Litmus Test #11
1 C C-SB+o-l-o-u+l-o-u-o
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0, int *x1)
7 {
8 int r1;
9
10 WRITE_ONCE(*x0, 1);
11 spin_lock(sl);
12 r1 = READ_ONCE(*x1);
13 spin_unlock(sl);
14 }
15
16 P1(spinlock_t *sl, int *x0, int *x1)
17 {
18 int r1;
19
20 spin_lock(sl);
21 WRITE_ONCE(*x1, 1);
22 spin_unlock(sl);
23 r1 = READ_ONCE(*x0);
24 }
25
26 exists (0:r1=0 /\ 1:r1=0)
The accesses to x0 have been moved out of their respective
critical sections.
Nevertheless:
Outcome for Litmus Test #11 (linux-kernel model)
1 Test C-SB+o-l-o-u+l-o-u-o Allowed
2 States 3
3 0:r1=0; 1:r1=1;
4 0:r1=1; 1:r1=0;
5 0:r1=1; 1:r1=1;
6 No
7 Witnesses
8 Positive: 0 Negative: 3
9 Condition exists (0:r1=0 /\ 1:r1=0)
10 Observation C-SB+o-l-o-u+l-o-u-o Never 0 3
11 Time C-SB+o-l-o-u+l-o-u-o 0.02
12 Hash=c70b4405bd1942e7ad6fec45a59981f5
The cycle is still forbidden because ordering against a prior critical
section applies not only to that critical section, but also to all accesses
preceding it in the same process.
The ordering provided by spin_unlock() on line 13
therefore applies to line 10.
Similarly, ordering against a later critical section applies not only
to that critical section, but also to all accesses following it in
the same process.
The ordering provided by spin_lock() on line 20
therefore applies to line 23.
Lock-based ordering can in some cases also apply to processes that never
did acquire the lock.
For example, the following litmus test checks whether a pair of reads
from different variables in successive lock-based critical sections are
seen as ordered by another process not holding the lock:
Litmus Test #12
1 C C-lock-RR-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0, int *x2)
7 {
8 int r1;
9
10 spin_lock(sl);
11 r1 = READ_ONCE(*x0);
12 WRITE_ONCE(*x2, 1);
13 spin_unlock(sl);
14 }
15
16 P1(spinlock_t *sl, int *x1, int *x2)
17 {
18 int r1;
19 int r2;
20
21 spin_lock(sl);
22 r1 = READ_ONCE(*x1);
23 r2 = READ_ONCE(*x2);
24 spin_unlock(sl);
25 }
26
27 P2(int *x0, int *x1)
28 {
29 int r1;
30
31 WRITE_ONCE(*x1, 1);
32 smp_mb();
33 WRITE_ONCE(*x0, 1);
34 }
35
36 exists (0:r1=1 /\ 1:r1=0 /\ 1:r2=1)
The two reads are on lines 11 and 22.
The write on line 12 and the read on line 23 verify
that P0()'s critical section precedes that of P1()
(exists clause 1:r2=1).
P2() is the external observer that checks the order of the two
reads: If the read on line 11 sees the value written by line 33
(exists clause 0:r1=1) but the read on line 22
does not see the value written by line 31 (exists clause
1:r1=0), then locking is failing to order the reads from the
viewpoint of an external observer not holding the lock.
Outcome for Litmus Test #12 (linux-kernel model)
1 Test C-lock-RR-3 Allowed
2 States 7
3 0:r1=0; 1:r1=0; 1:r2=0;
4 0:r1=0; 1:r1=0; 1:r2=1;
5 0:r1=0; 1:r1=1; 1:r2=0;
6 0:r1=0; 1:r1=1; 1:r2=1;
7 0:r1=1; 1:r1=0; 1:r2=0;
8 0:r1=1; 1:r1=1; 1:r2=0;
9 0:r1=1; 1:r1=1; 1:r2=1;
10 No
11 Witnesses
12 Positive: 0 Negative: 7
13 Condition exists (0:r1=1 /\ 1:r1=0 /\ 1:r2=1)
14 Observation C-lock-RR-3 Never 0 7
15 Time C-lock-RR-3 0.03
16 Hash=a86b085d58b19425cd99c36046f0b8fb
The exists clause is not satisfied, so successive lock-based
critical sections for the same lock do order reads.
The following litmus test checks for externally observable ordering of
prior reads against later writes:
Litmus Test #13
1 C C-lock-RW-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 spin_lock(sl);
12 r1 = READ_ONCE(*x0);
13 r2 = READ_ONCE(*x1);
14 spin_unlock(sl);
15 }
16
17 P1(spinlock_t *sl, int *x0, int *x1)
18 {
19 spin_lock(sl);
20 WRITE_ONCE(*x1, 1);
21 spin_unlock(sl);
22 }
23
24 P2(int *x0, int *x1)
25 {
26 int r1;
27
28 r1 = READ_ONCE(*x1);
29 smp_mb();
30 WRITE_ONCE(*x0, 1);
31 }
32
33 exists (0:r1=1 /\ 0:r2=0 /\ 2:r1=1)
The read is on line 12 and the write on line 20.
The read on line 13 verifies that P0()'s critical section
precedes that of P1() (exists clause 0:r2=0).
P2() is again the external observer that checks the order of
the prior read and later write: If the read on line 12 sees the
value written by line 30 (exists clause 0:r1=1)
and the read on line 28 sees the value written by line 20
(exists clause 2:r1=1), then locking is failing
to order the prior read and the later write from the viewpoint of an
external observer not holding the lock.
Outcome for Litmus Test #13 (linux-kernel model)
1 Test C-lock-RW-3 Allowed
2 States 7
3 0:r1=0; 0:r2=0; 2:r1=0;
4 0:r1=0; 0:r2=0; 2:r1=1;
5 0:r1=0; 0:r2=1; 2:r1=0;
6 0:r1=0; 0:r2=1; 2:r1=1;
7 0:r1=1; 0:r2=0; 2:r1=0;
8 0:r1=1; 0:r2=1; 2:r1=0;
9 0:r1=1; 0:r2=1; 2:r1=1;
10 No
11 Witnesses
12 Positive: 0 Negative: 7
13 Condition exists (0:r1=1 /\ 0:r2=0 /\ 2:r1=1)
14 Observation C-lock-RW-3 Never 0 7
15 Time C-lock-RW-3 0.03
16 Hash=ac1d4b5f00c02c467b6b1659bc56871b
The exists clause is not satisfied, so successive lock-based
critical sections for the same lock do order prior reads against
later writes.
The following litmus test checks for externally observable ordering of
prior writes against later reads:
Litmus Test #14
1 C C-lock-WR-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0)
7 {
8 spin_lock(sl);
9 WRITE_ONCE(*x0, 1);
10 spin_unlock(sl);
11 }
12
13 P1(spinlock_t *sl, int *x0, int *x1)
14 {
15 int r1;
16 int r2;
17
18 spin_lock(sl);
19 r1 = READ_ONCE(*x0);
20 r2 = READ_ONCE(*x1);
21 spin_unlock(sl);
22 }
23
24 P2(int *x0, int *x1)
25 {
26 int r1;
27
28 WRITE_ONCE(*x1, 1);
29 smp_mb();
30 r1 = READ_ONCE(*x0);
31 }
32
33 exists (1:r1=1 /\ 1:r2=0 /\ 2:r1=0)
The write is on line 9 and the read on line 20.
The read on line 19 verifies that P0()'s critical section
precedes that of P1() (exists clause 1:r1=1).
P2() is once again the external observer that checks the order of
the prior write and later read: If the read on line 30 sees the
value written by line 9 (exists clause 2:r1=0)
and the read on line 20 does not see the value written by line 28
(exists clause 1:r2=0), then locking is failing
to order the prior write and the later read from the viewpoint of an
external observer not holding the lock.
Outcome for Litmus Test #14 (linux-kernel model)
1 Test C-lock-WR-3 Allowed
2 States 8
3 1:r1=0; 1:r2=0; 2:r1=0;
4 1:r1=0; 1:r2=0; 2:r1=1;
5 1:r1=0; 1:r2=1; 2:r1=0;
6 1:r1=0; 1:r2=1; 2:r1=1;
7 1:r1=1; 1:r2=0; 2:r1=0;
8 1:r1=1; 1:r2=0; 2:r1=1;
9 1:r1=1; 1:r2=1; 2:r1=0;
10 1:r1=1; 1:r2=1; 2:r1=1;
11 Ok
12 Witnesses
13 Positive: 1 Negative: 7
14 Condition exists (1:r1=1 /\ 1:r2=0 /\ 2:r1=0)
15 Observation C-lock-WR-3 Sometimes 1 7
16 Time C-lock-WR-3 0.03
17 Hash=6e790e9b8c7f389cd72d5c6e1ed824d8
The exists clause is satisfied, so successive lock-based
critical sections for the same lock do not necessarily order
prior writes against later reads.
Quick Quiz 3:
But what if I need a lock release and later acquisition to order
prior writes against later reads?
Answer
Finally, the following litmus test checks for externally observable ordering of
writes to different variables in successive lock-based critical sections:
Litmus Test #15
1 C C-lock-WW-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0)
7 {
8 spin_lock(sl);
9 WRITE_ONCE(*x0, 1);
10 spin_unlock(sl);
11 }
12
13 P1(spinlock_t *sl, int *x0, int *x1)
14 {
15 int r1;
16
17 spin_lock(sl);
18 r1 = READ_ONCE(*x0);
19 WRITE_ONCE(*x1, 1);
20 spin_unlock(sl);
21 }
22
23 P2(int *x0, int *x1)
24 {
25 int r1;
26 int r2;
27
28 r1 = READ_ONCE(*x1);
29 smp_mb();
30 r2 = READ_ONCE(*x0);
31 }
32
33 exists (1:r1=1 /\ 2:r1=1 /\ 2:r2=0)
The writes are on lines 9 and 19.
The read on line 18 verifies that P0()'s critical section
precedes that of P1() (exists clause 1:r1=1).
P2() is yet again the external observer that checks the
order of the two writes: If the read on line 28 sees the value
written by line 19 (exists clause 2:r1=1) and
the read on line 30 does not see the value written by line 9
(exists clause 2:r2=0), then locking is failing to
order the two writes from the viewpoint of an external observer not
holding the lock.
Outcome for Litmus Test #15 (linux-kernel model)
1 Test C-lock-WW-3 Allowed
2 States 7
3 1:r1=0; 2:r1=0; 2:r2=0;
4 1:r1=0; 2:r1=0; 2:r2=1;
5 1:r1=0; 2:r1=1; 2:r2=0;
6 1:r1=0; 2:r1=1; 2:r2=1;
7 1:r1=1; 2:r1=0; 2:r2=0;
8 1:r1=1; 2:r1=0; 2:r2=1;
9 1:r1=1; 2:r1=1; 2:r2=1;
10 No
11 Witnesses
12 Positive: 0 Negative: 7
13 Condition exists (1:r1=1 /\ 2:r1=1 /\ 2:r2=0)
14 Observation C-lock-WW-3 Never 0 7
15 Time C-lock-WW-3 0.02
16 Hash=aeb873bf8def6e07ef92cc9e5edeb312
The exists clause is not satisfied, so successive lock-based
critical sections for the same lock do order writes.
However, this test demonstrates a difference between the modeling of
locks (and of their Linux-kernel guarantees) and emulated locking,
which can be seen by converting
Litmus Test #15
to emulated form
and running it through herd.
(Spoiler: In contrast with
Litmus Test #15,
the emulated form's exists clause will be satisfied.)
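For reference, one such conversion, following the cmpxchg_acquire()
and filter pattern of Litmus Test #5 (the test name and exact form
shown here are ours), might look like this:

	C C-lock-WW-3-CF

	{
	}

	P0(int *sl, int *x0)
	{
		int r2;

		r2 = cmpxchg_acquire(sl, 0, 1);	/* Emulated lock acquisition. */
		WRITE_ONCE(*x0, 1);
		smp_store_release(sl, 0);	/* Emulated lock release. */
	}

	P1(int *sl, int *x0, int *x1)
	{
		int r1;
		int r2;

		r2 = cmpxchg_acquire(sl, 0, 1);
		r1 = READ_ONCE(*x0);
		WRITE_ONCE(*x1, 1);
		smp_store_release(sl, 0);
	}

	P2(int *x0, int *x1)
	{
		int r1;
		int r2;

		r1 = READ_ONCE(*x1);
		smp_mb();
		r2 = READ_ONCE(*x0);
	}

	filter (0:r2=0 /\ 1:r2=0)
	exists (1:r1=1 /\ 2:r1=1 /\ 2:r2=0)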
This difference is due to weakly ordered hardware and to a desire for
slightly stronger external ordering from the locking primitives.
In short, from the viewpoint of a CPU not holding the lock, writes in
prior critical sections are not ordered against reads in later critical
sections, but everything else is ordered.
This not-universally-loved exception to the ordering rule allows
hardware to maintain its store-buffer optimization through both
spin_unlock() and spin_lock() calls.
Quick Quiz 4:
Does a spin_unlock() followed by a spin_lock()
within the same process also provide similar ordering?
Answer
Adding Locking to LKMM
The locking model is implemented by
lock.cat,
which is as follows, give or take comments:
1 include "cross.cat"
2 let RL = try RL with emptyset
3 let RU = try RU with emptyset
4 let LF = LF | RL
5 let ALL-LOCKS = LKR | LKW | UL | LF | RU
6 flag ~empty [M \ IW] ; loc ; [ALL-LOCKS] as mixed-lock-accesses
7 let lk-rmw = ([LKR] ; po-loc ; [LKW]) \ (po ; po)
8 let rmw = rmw | lk-rmw
9 flag ~empty LKW \ range(lk-rmw) as unpaired-LKW
10 flag ~empty LKR \ domain(lk-rmw) as unpaired-LKR
11 empty ([LKW] ; po-loc ; [LKR]) \ (po-loc ; [UL] ; po-loc) as lock-nest
12 flag ~empty [FW] ; loc ; [ALL-LOCKS] as lock-final
13 let R = R | LKR | LF | RU
14 let W = W | LKW
15 let Release = Release | UL
16 let Acquire = Acquire | LKR
17 let critical = ([LKW] ; po-loc ; [UL]) \ (po-loc ; [LKW | UL] ; po-loc)
18 flag ~empty UL \ range(critical) as unmatched-unlock
19 let UNMATCHED-LKW = LKW \ domain(critical)
20 empty ([UNMATCHED-LKW] ; loc ; [UNMATCHED-LKW]) \ id as unmatched-locks
21 let rfi-lf = ([LKW] ; po-loc ; [LF]) \ ([LKW] ; po-loc ; [UL] ; po-loc)
22 let all-possible-rfe-lf =
23 let possible-rfe-lf r =
24 let pair-to-relation p = p ++ 0
25 in map pair-to-relation ((LKW * {r}) & loc & ext)
26 in map possible-rfe-lf (LF \ range(rfi-lf))
27 with rfe-lf from cross(all-possible-rfe-lf)
28 let rf-lf = rfe-lf | rfi-lf
29 let rfi-ru = ([UL] ; po-loc ; [RU]) \ ([UL] ; po-loc ; [LKW] ; po-loc)
30 let all-possible-rfe-ru =
31 let possible-rfe-ru r =
32 let pair-to-relation p = p ++ 0
33 in map pair-to-relation (((UL | IW) * {r}) & loc & ext)
34 in map possible-rfe-ru RU
35 with rfe-ru from cross(all-possible-rfe-ru)
36 let rf-ru = rfe-ru | rfi-ru
37 let rf = rf | rf-lf | rf-ru
38 let co0 = co0 | ([IW] ; loc ; [LKW]) |
39 (([LKW] ; loc ; [UNMATCHED-LKW]) \ [UNMATCHED-LKW])
40 include "cos-opt.cat"
41 let W = W | UL
42 let M = R | W
43 let co = (co | critical | (critical^-1 ; co))+
44 let coe = co & ext
45 let coi = co & int
46 let rf = rf | ([IW | UL] ; singlestep(co) ; lk-rmw^-1)
47 let rfe = rf & ext
48 let rfi = rf & int
49 let fr = rf^-1 ; co
50 let fre = fr & ext
51 let fri = fr & int
52 show co,rf,fr
Quick Quiz 5:
Yikes!!!
The locking model is quite a bit larger than that of RCU!
What gives?
Answer
Line 1 pulls in common definitions.
Lines 2-5 gather basic locking events, with
RL being a spin_is_locked() that returns true,
RU being a spin_is_locked() that returns false,
LF being a failing spin_trylock() (and line 4 extends
LF to include cases where
spin_is_locked() returns true),
UL being a spin_unlock(), and
LKR and LKW being the read and write portions,
respectively, of a successful lock acquisition.
Thus ALL-LOCKS is the set of all lock-related events.
Line 6 complains if one of the locking primitives is ever invoked
on a variable that is used as something other than a lock.
Line 7 forms the relation linking the read and write portions
of each successful lock-acquisition operation, and line 8 merges
this relation into the rmw relation that covers the other atomic operations.
Lines 9 and 10 complain if there is a successful lock-acquisition
operation that has a read portion but not a write portion and vice versa.
Line 11 checks for self-deadlock due to double-acquiring a lock.
Line 12 complains if a lock's value is tested as part of an
exists clause.
Lines 13-16 pull locking events into herd's set of reads, writes,
releases, and acquires, respectively.
Line 17 forms a relation linking each lock acquisition with the
corresponding lock release and line 18 complains if there is an
unmatched spin_unlock().
Line 19 then forms the set of unmatched spin_lock()
and successful spin_trylock() operations, after which
line 20 complains if there is more than one such unmatched
lock acquisition for a given lock.
Quick Quiz 6:
Why would anyone want to allow spin_lock() and successful
spin_trylock() operations that don't have matching
spin_unlock() operations???
Answer
Line 21 forms a relation linking spin_lock() calls to
spin_trylock() calls within the ensuing critical section.
Clearly, all such spin_trylock() calls must fail.
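For example, in the following sketch (ours, with a hypothetical test
name), the spin_trylock() must fail because P0() itself already holds
the lock, so LKMM should never satisfy the exists clause:

	C C-lock-trylock-sketch

	{
	}

	P0(spinlock_t *sl)
	{
		int r1;

		spin_lock(sl);
		r1 = spin_trylock(sl);	/* Must fail: this process already holds sl. */
		spin_unlock(sl);
	}

	exists (0:r1=1)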
Lines 22-27 form a relation linking spin_lock() calls to
failing spin_trylock() calls in other processes.
The point here is that for every failing spin_trylock() call,
there must be a spin_lock() call that caused it to fail. This is
a design choice: One could equally well allow spin_trylock()
calls to spontaneously fail.
(And the C11 standard in fact does exactly that, which means that there
are a few lock-based algorithms that C11 is incapable of implementing.)
Line 28 then forms a relation linking to any failing
spin_trylock() call from the spin_lock() call that
caused it to fail, regardless of whether or not the two calls were in
the same process.
Line 29 forms a relation linking from a spin_unlock()
to a subsequent spin_is_locked() call within that same process
that returned false because of that spin_unlock().
Lines 30-35 then form a similar relation, but in the case where
the spin_unlock() and spin_is_locked() calls are
in different processes.
Line 36 forms the union of these two relations, that is, linking
from any spin_unlock() to all spin_is_locked() calls
that returned false because of that spin_unlock(), regardless
of which process these calls were in.
Finally, lines 37-52 add the lock-based ordering into the
standard herd relations.
Additional changes were required to linux-kernel.cat,
including the interactions of locking with
smp_mb__after_spinlock() and smp_mb__after_unlock_lock()
in the mb relation, and also
the addition of locking primitives to the release-acquire mechanics
via the po-unlock-rf-lock-po relation, which is in turn included
into the ppo and cumul-fence relations.
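For reference, this relation is defined along the following lines
(quoted from memory of linux-kernel.cat, so treat it as a sketch):
it links accesses po-before a spin_unlock(), through the rf relation
to the spin_lock() that reads from that unlock, and on to accesses
po-after that lock:

	let po-unlock-rf-lock-po = po ; [UL] ; rf ; [LKR] ; po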
So it is true that lock.cat is not the simplest example of
cat code out there.
However, the added complexity does have its advantages.
For example, the following table shows the performance of the model
compared to the four emulation approaches:
# Processes        XE         CE        XF        CF     Model
     2            0.058      0.039     0.027     0.022   0.004
     3            3.203      1.653     0.968     0.743   0.041
     4          500.96     151.962    74.818    59.565   0.374
     5                                                   4.905
The lock.cat model, though considerably slower than the RCU
model, has nevertheless delivered on the promise of multiple orders of
magnitude performance improvement over emulation of locking.
Quick Quiz 7:
Why not fill in the remaining values for the five-process row of
the table?
Answer
Conclusions
We have demonstrated LKMM's versatility by adding support for locking.
As before, we hope that this addition will prove useful
for education, for concurrent design, and for inclusion in other tooling,
especially given that locking is used very heavily within the Linux
kernel.
Acknowledgments
We owe thanks to the LWN editors for their review of this document.
We are also grateful to Jessica Murillo, Mark Figley, and Kara Todd
for their support of this effort.
This work represents the views of the authors and does not necessarily
represent the views of University College London, ARM Ltd., INRIA Paris,
Amarula Solutions, Harvard University, Intel, Red Hat, Nvidia,
or IBM Corporation.
Linux is a registered trademark of Linus Torvalds.
Other company, product, and service names may be trademarks or
service marks of others.
Answers to Quick Quizzes
Quick Quiz 1:
What do the litmus tests corresponding to the “CE”
column look like?
Answer:
Like this:
Litmus Test #16
1 C C-SB+l-o-o-u+l-o-o-u-CE
2
3 {
4 }
5
6 P0(int *sl, int *x0, int *x1)
7 {
8 int r1;
9 int r2;
10
11 r2 = cmpxchg_acquire(sl, 0, 1);
12 WRITE_ONCE(*x0, 1);
13 r1 = READ_ONCE(*x1);
14 smp_store_release(sl, 0);
15 }
16
17 P1(int *sl, int *x0, int *x1)
18 {
19 int r1;
20 int r2;
21
22 r2 = cmpxchg_acquire(sl, 0, 1);
23 WRITE_ONCE(*x1, 1);
24 r1 = READ_ONCE(*x0);
25 smp_store_release(sl, 0);
26 }
27
28 exists (0:r1=0 /\ 0:r2=0 /\ 1:r1=0 /\ 1:r2=0)
Back to Quick Quiz 1.
Quick Quiz 2:
Why don't the numbers in the table match those in the
“Time” outcome lines?
Answer:
Because the tests generating the inline output were run on Paul's
laptop while it was also running rcutorture.
In contrast, the numbers in the table were collected using multiple
runs on an idle system.
Of course, as always, your mileage may vary.
Back to Quick Quiz 2.
Quick Quiz 3:
But what if I need a lock release and later acquisition to order
prior writes against later reads?
Answer:
Then you should use smp_mb__after_unlock_lock(), like this:
Litmus Test #17
1 C C-lock-WR-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0)
7 {
8 spin_lock(sl);
9 WRITE_ONCE(*x0, 1);
10 spin_unlock(sl);
11 }
12
13 P1(spinlock_t *sl, int *x0, int *x1)
14 {
15 int r1;
16 int r2;
17
18 spin_lock(sl);
19 smp_mb__after_unlock_lock();
20 r1 = READ_ONCE(*x0);
21 r2 = READ_ONCE(*x1);
22 spin_unlock(sl);
23 }
24
25 P2(int *x0, int *x1)
26 {
27 int r1;
28
29 WRITE_ONCE(*x1, 1);
30 smp_mb();
31 r1 = READ_ONCE(*x0);
32 }
33
34 exists (1:r1=1 /\ 1:r2=0 /\ 2:r1=0)
As you can see below, this provides the desired ordering.
Outcome for Litmus Test #17 (linux-kernel model)
1 Test C-lock-WR-3 Allowed
2 States 7
3 1:r1=0; 1:r2=0; 2:r1=0;
4 1:r1=0; 1:r2=0; 2:r1=1;
5 1:r1=0; 1:r2=1; 2:r1=0;
6 1:r1=0; 1:r2=1; 2:r1=1;
7 1:r1=1; 1:r2=0; 2:r1=1;
8 1:r1=1; 1:r2=1; 2:r1=0;
9 1:r1=1; 1:r2=1; 2:r1=1;
10 No
11 Witnesses
12 Positive: 0 Negative: 7
13 Condition exists (1:r1=1 /\ 1:r2=0 /\ 2:r1=0)
14 Observation C-lock-WR-3 Never 0 7
15 Time C-lock-WR-3 0.02
16 Hash=67cef9d4ce6e7598ef50bc43d28afeca
The Linux-kernel RCU implementation makes heavy use of
smp_mb__after_unlock_lock() in order to provide its
grace-period memory-ordering guarantees.
Back to Quick Quiz 3.
Quick Quiz 4:
Does a spin_unlock() followed by a spin_lock()
within the same process also provide similar ordering?
Answer:
Yes, assuming that the spin_unlock() and the spin_lock()
following it use the same lock.
However, this ordering is not preserved when two different locks
are used:
Litmus Test #18
1 C C-lock-RW-2
2
3 {
4 }
5
6 P0(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
7 {
8 int r1;
9
10 spin_lock(sl0);
11 r1 = READ_ONCE(*x0);
12 spin_unlock(sl0);
13 spin_lock(sl1);
14 WRITE_ONCE(*x1, 1);
15 spin_unlock(sl1);
16 }
17
18 P1(int *x0, int *x1)
19 {
20 int r1;
21
22 r1 = READ_ONCE(*x1);
23 smp_mb();
24 WRITE_ONCE(*x0, 1);
25 }
26
27 exists (0:r1=1 /\ 1:r1=1)
As you can see below, this fails to provide the desired ordering.
Outcome for Litmus Test #18 (linux-kernel model)
1 Test C-lock-RW-2 Allowed
2 States 4
3 0:r1=0; 1:r1=0;
4 0:r1=0; 1:r1=1;
5 0:r1=1; 1:r1=0;
6 0:r1=1; 1:r1=1;
7 Ok
8 Witnesses
9 Positive: 1 Negative: 3
10 Condition exists (0:r1=1 /\ 1:r1=1)
11 Observation C-lock-RW-2 Sometimes 1 3
12 Time C-lock-RW-2 0.01
13 Hash=d27e32a7292825a15eb5ce3ef9944740
That said, you can insert
smp_mb__after_unlock_lock() after the
second
spin_lock() to enforce the ordering.
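For example, here is a sketch of Litmus Test #18 with the barrier
added (the modified test and its name are ours); LKMM should then
forbid the 0:r1=1 /\ 1:r1=1 outcome:

	C C-lock-RW-2-mb

	{
	}

	P0(spinlock_t *sl0, spinlock_t *sl1, int *x0, int *x1)
	{
		int r1;

		spin_lock(sl0);
		r1 = READ_ONCE(*x0);
		spin_unlock(sl0);
		spin_lock(sl1);
		smp_mb__after_unlock_lock();	/* Promote unlock+lock to a full barrier. */
		WRITE_ONCE(*x1, 1);
		spin_unlock(sl1);
	}

	P1(int *x0, int *x1)
	{
		int r1;

		r1 = READ_ONCE(*x1);
		smp_mb();
		WRITE_ONCE(*x0, 1);
	}

	exists (0:r1=1 /\ 1:r1=1)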
Back to Quick Quiz 4.
Quick Quiz 5:
Yikes!!!
The locking model is quite a bit larger than that of RCU!
What gives?
Answer:
Several things.
First, simpler models are possible; it is just that they are significantly
slower.
(But please feel free to construct a correct model that is both simpler
and at least as fast!)
Second, there is no rcu_read_trylock() for the RCU model to deal
with, but locking must model spin_trylock().
Third, the RCU model does not implement rcu_read_lock_held(),
but locking does implement spin_is_locked().
Note that rcu_read_lock_held() tends to be used in cases
where RCU readers span many functions, so there does not appear to
be much point in modeling it, given the limited size of LKMM litmus tests.
Fourth, unlike spin_lock() and spin_unlock(),
rcu_read_lock() and rcu_read_unlock() provide
no ordering except in conjunction with synchronize_rcu().
Fifth, unlike RCU, locking affects many of herd's basic relations,
including co, fr, and rf relations, and
thus must provide code to properly update those relations.
Sixth and finally, just you wait until we add reader-writer locking!!!
Or until you add it! ;-)
Back to Quick Quiz 5.
Quick Quiz 6:
Why would anyone want to allow spin_lock() and successful
spin_trylock() operations that don't have matching
spin_unlock() operations???
Answer:
For one thing, this sort of thing really does occur in real code, if only
by mistake.
Perhaps more usefully, unmatched spin_lock() calls allow
litmus-test writers limited control over the order of lock acquisition
without the need to add memory accesses and terms to the exists
or filter clauses.
Specifically, the process that acquires a lock but does not release it
must necessarily be the last process to acquire that lock.
For example,
Litmus Test #12
could be simplified as follows:
Litmus Test #19
1 C C-lock-RR-3
2
3 {
4 }
5
6 P0(spinlock_t *sl, int *x0)
7 {
8 int r1;
9
10 spin_lock(sl);
11 r1 = READ_ONCE(*x0);
12 spin_unlock(sl);
13 }
14
15 P1(spinlock_t *sl, int *x1)
16 {
17 int r1;
18
19 spin_lock(sl);
20 r1 = READ_ONCE(*x1);
21 }
22
23 P2(int *x0, int *x1)
24 {
25 int r1;
26
27 WRITE_ONCE(*x1, 1);
28 smp_mb();
29 WRITE_ONCE(*x0, 1);
30 }
31
32 exists (0:r1=1 /\ 1:r1=0)
This litmus test has the same outcome as its fully unlocked counterpart,
but without the need for the x2 variable:
Outcome for Litmus Test #19 (linux-kernel model)
1 Test C-lock-RR-3 Allowed
2 States 3
3 0:r1=0; 1:r1=0;
4 0:r1=0; 1:r1=1;
5 0:r1=1; 1:r1=1;
6 No
7 Witnesses
8 Positive: 0 Negative: 3
9 Condition exists (0:r1=1 /\ 1:r1=0)
10 Observation C-lock-RR-3 Never 0 3
11 Time C-lock-RR-3 0.01
12 Hash=45ba9be3cdc348a2620169b040843b60
For larger litmus tests, this sort of simplification could greatly
reduce herd's runtime.
Unmatched spin_lock() calls are also useful in debugging litmus
tests, for example, when trying to see whether a litmus test can deadlock
while holding a lock.
Back to Quick Quiz 6.
Quick Quiz 7:
Why not fill in the remaining values for the five-process row of
the table?
Answer:
Impatience.
But please feel free to run the full set of tests, send us the results,
and we will happily update the table.
Back to Quick Quiz 7.