From b1dc5ee6c26817efa76177b41382443f8d5bae91 Mon Sep 17 00:00:00 2001 From: vsadov <8218165+VSadov@users.noreply.github.com> Date: Thu, 16 Jun 2022 00:26:29 -0700 Subject: [PATCH 01/33] Created memory-model.md --- docs/design/coreclr/botr/memory-model.md | 127 +++++++++++++++++++++++ 1 file changed, 127 insertions(+) create mode 100644 docs/design/coreclr/botr/memory-model.md diff --git a/docs/design/coreclr/botr/memory-model.md b/docs/design/coreclr/botr/memory-model.md new file mode 100644 index 00000000000000..b322f03d335805 --- /dev/null +++ b/docs/design/coreclr/botr/memory-model.md @@ -0,0 +1,127 @@ +// TODO: elaborate further on many sections +// TODO: need a lot more examples. + + + +# CLR memory model + +## ECMA vs. CLR memory models. +ECMA 335 standard defines a very weak memory model. After two decades the desire to have a flexible model did not result in considerable benefits due to hardware being stricter. On the other hand programming against ECMA model requires extra complexity to handle scenarios that are hard to comprehend and not possible to test. + +In the course of multiple releases CLR implementation settled around a memory model that is a practical compromise between what can be implemented efficiently on the current hardware, while staying reasonably approachable by the developers. +This document rationalizes the invariants provided and expected by the CLR runtime in its current implementation with expectation of that being carried to future releases. + +While the memory model is generally the same among runtime implementations such as .Net FX, CoreCLR, Mono, NativeAOT, when discrepancies do happen they will be called out. + +## Hardware considerations +Current CoreCLR and libraries implementation makes a few expectations about the hardware memory model. These conditions are present on all currently supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions. + +* Naturally aligned reads and writes with sizes up to the platform pointer size are atomic. +That applies even for locations targeted by overlapping aligned reads and writes of different sizes. +**Example:** a read of a 4-byte aligned int32 variable will yield a value that existed prior some write or after some write. It will never be a mix of before/after bytes. + +* The memory is cache-coherent and writes to a single location will be seen by all cores in the same order (multicopy atomic). +**Example:** when the same location is updated with values in ascending order (like 1,2,3,4,...), no observer will see a descending sequence. + +* It may be possible for a thread to see its own writes before they appear to other cores (store buffer forwarding), as long as the single-thread consistency is not violated. + +* The memory managed by the runtime is ordinary memory (not device register file or the like) and the only sideeffects of memory operations are storing and reading of values. + +* It is possible to implement release consistency memory model. +Either the platform defaults to release consistency or stronger (i.e. x64 is TSO, which is stronger), or provides means to implement release consistency via fencing operations. + +* Memory ordering honors data dependency +**Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. 
+(Some versions of Alpha processors did not support this, most current architectures do) + +## Alignment +Data managed by CLR is *properly aligned* according to the data type size. + +1,2,4 - byte data types are aligned on 1,2,4 byte granularity. This applies to both heap and stack allocated memory. + +8-byte data types are 8-byte aligned on 64 bit platforms. + +Native-pointer-sized datatypes are always aligned. + +## Atomic memory accesses. +Memory accesses to *properly aligned* data are always atomic. The value that is observed is always a result of complete read and write operations. + +## Unmanaged memory access. +As pointers can point to any addressable memory, operations with pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. +**Example:** memory accesses through pointers which are *not properly aligned* may be not atomic or cause faults depending on the platform and hardware configuration. + +Although rare, unaligned access is a realistic scenario and thus there is some limited support for unaligned memory accesses, such as `.unaligned` IL prefix and `Unsafe.ReadUnaligned`, `Unsafe.WriteUnaligned` and ` Unsafe.CopyBlockUnaligned` helpers. +These facilities ensure fault-free access, but do not ensure atomicity. + +As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Support for unusual kinds of memory may be added if/when needed. + +## Sideeffects and optimizations of memory accesses. +CLR assumes that the sideeffects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** + +As a consequence: +* Speculative writes are not allowed. +* Reads cannot be introduced. +* Unused reads can be elided. +* Adjacent reads from the same location can be coalesced. +* Adjacent writes to the same location can be coalesced. + +## Thread-local memory accesses. +It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to perform further optimizations such as duplicating or removal of memory accesses. + +## Order of memory operations. +* **Ordinary memory accesses** +The effects of ordinary reads and writes can be reordered as long as that preserves single-thread consistency. Such reordering can happen both due to code generation strategy of the compiler or due to weak memory ordering in the hardware. +Volatile memory accesses + +* **Volatile reads** have "acquire semantics" - no read or write that is later in the program order may be speculatively executed ahead of a volatile read. + Operations with acquire semantics: + - IL load instructions with `.volatile` prefix when such prefix is supported + - Volatile.Read + +* **Volatile writes** have "release semantics" - the effects of a volatile write will not be observable before effects of all previous, in program order, reads and writes become observable. + Operations with release semantics: + - IL store instructions with `.volatile` prefix when such prefix is supported + - Volatile.Write + - Releasing a lock + +Note that volatile semantics does not imply that operation is atomic or has any effect on how soon the operation is committed to the coherent memory. It only specifies the order of effects when they eventually become observable. 
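To make the acquire/release pairing described above concrete, here is a minimal C# sketch. The `Publisher` type and its `_data`/`_ready` fields are illustrative names, not runtime APIs; `Volatile.Read`/`Volatile.Write` stand in for the `volatile.`-prefixed IL accesses discussed above.

```cs
using System.Threading;

class Publisher
{
    private int _data;
    private bool _ready;

    public void Publish()
    {
        _data = 42;                        // ordinary write
        Volatile.Write(ref _ready, true);  // release: the write to _data becomes observable no later than _ready
    }

    public bool TryConsume(out int value)
    {
        if (Volatile.Read(ref _ready))     // acquire: no later read is speculated ahead of this
        {
            value = _data;                 // guaranteed to observe 42 once _ready is seen as true
            return true;
        }

        value = 0;
        return false;
    }
}
```

Swapping the two writes in `Publish`, or reading `_data` before checking `_ready`, would void the guarantee, since only the acquire/release pair orders the surrounding accesses.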
+ +* **Full-fence operations** + Full-fence operations have "full-fence semantics" - effects of reads and writes must be observable no later or no earlier than a full-fence operation according to their relative program order. + Operations with full-fence semantics: + - Thread.MemoryBarrier + - Interlocked methods + - Lock acquire + +For historical reasons: + - Thread.VolatileRead performs a full fence after the read + - Thread.VolatileWrite performs a full fence prior to the write + +## Process-wide barrier +Process-wide barrier has full-fence semantics with an additional guarantee that each thread in the program effectively performs a full fence at arbitrary point synchronized with the process-wide barrier in such a way that effects of writes that precede both barriers are observable by memory operations that follow the barriers. + +The actual implementation may vary depending on the platform. For example interrupting the execution of every core in the current process' affinity mask could be a suitable implementation. + +## Object assignment +Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata. +The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (i.e. method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. +The reading thread does not need to perform an acquiring read before accessing the content of an instance since all supported platforms honor ordering of data-dependent reads. + +However, the ordering sideeffects of reference assignement should not be used for general ordering purposes because: +- ordinary reference assignments are still treated as ordinary assignments and could be reordered by the compiler. +- an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads. + +## Instance constructors +CLR does not prescribe any ordering effects to instance constructors. + +## Static constructors +All side effects of static constructor execution must happen before accessing any member of the type. + +//TODO: is this a case when RunClassConstructor is called? + +## Referential transparency. +Implementation of refs, pointers, remoting, MarshalByRefObject + +## Exceptions +Synchronous, asynchronous, thread.abort. +Anything to do with memory model? From 1fa2eccf98b9a81ee189b5ea76ec974a10fee7ab Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Fri, 17 Jun 2022 17:03:38 -0700 Subject: [PATCH 02/33] addresses some comments --- docs/design/coreclr/botr/memory-model.md | 119 +++++++++++------------ 1 file changed, 59 insertions(+), 60 deletions(-) diff --git a/docs/design/coreclr/botr/memory-model.md b/docs/design/coreclr/botr/memory-model.md index b322f03d335805..18fbd92ca61e2b 100644 --- a/docs/design/coreclr/botr/memory-model.md +++ b/docs/design/coreclr/botr/memory-model.md @@ -6,64 +6,43 @@ # CLR memory model ## ECMA vs. CLR memory models. -ECMA 335 standard defines a very weak memory model. After two decades the desire to have a flexible model did not result in considerable benefits due to hardware being stricter. On the other hand programming against ECMA model requires extra complexity to handle scenarios that are hard to comprehend and not possible to test. 
+ECMA 335 standard defines a very weak memory model. After two decades the desire to have a flexible model did not result in considerable benefits due to hardware being more strict. On the other hand programming against ECMA model requires extra complexity to handle scenarios that are hard to comprehend and not possible to test. -In the course of multiple releases CLR implementation settled around a memory model that is a practical compromise between what can be implemented efficiently on the current hardware, while staying reasonably approachable by the developers. -This document rationalizes the invariants provided and expected by the CLR runtime in its current implementation with expectation of that being carried to future releases. +In the course of multiple releases CLR implementation settled around a memory model that is a practical compromise between what can be implemented efficiently on the current hardware, while staying reasonably approachable by the developers. This document rationalizes the invariants provided and expected by the CLR runtime in its current implementation with expectation of that being carried to future releases. -While the memory model is generally the same among runtime implementations such as .Net FX, CoreCLR, Mono, NativeAOT, when discrepancies do happen they will be called out. - -## Hardware considerations -Current CoreCLR and libraries implementation makes a few expectations about the hardware memory model. These conditions are present on all currently supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions. - -* Naturally aligned reads and writes with sizes up to the platform pointer size are atomic. -That applies even for locations targeted by overlapping aligned reads and writes of different sizes. -**Example:** a read of a 4-byte aligned int32 variable will yield a value that existed prior some write or after some write. It will never be a mix of before/after bytes. - -* The memory is cache-coherent and writes to a single location will be seen by all cores in the same order (multicopy atomic). -**Example:** when the same location is updated with values in ascending order (like 1,2,3,4,...), no observer will see a descending sequence. - -* It may be possible for a thread to see its own writes before they appear to other cores (store buffer forwarding), as long as the single-thread consistency is not violated. - -* The memory managed by the runtime is ordinary memory (not device register file or the like) and the only sideeffects of memory operations are storing and reading of values. - -* It is possible to implement release consistency memory model. -Either the platform defaults to release consistency or stronger (i.e. x64 is TSO, which is stronger), or provides means to implement release consistency via fencing operations. - -* Memory ordering honors data dependency -**Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. -(Some versions of Alpha processors did not support this, most current architectures do) +The memory model is generally the same among different runtime implementations such as .Net FX, CoreCLR, Mono, NativeAOT. When discrepancies do happen they will be called out. ## Alignment -Data managed by CLR is *properly aligned* according to the data type size. 
- -1,2,4 - byte data types are aligned on 1,2,4 byte granularity. This applies to both heap and stack allocated memory. +When managed by CLR runtime, variables of built-in primitive types are *properly aligned* according to the data type size. This applies to both heap and stack allocated memory. -8-byte data types are 8-byte aligned on 64 bit platforms. - -Native-pointer-sized datatypes are always aligned. +1-byte, 2-byte, 4-byte variables are stored at 1-byte, 2-byte, 4-byte boundary, respectively. +8-byte variables are 8-byte aligned on 64 bit platforms. +Native-sized integer types and pointers have alignment that matches their size on the given platform. ## Atomic memory accesses. -Memory accesses to *properly aligned* data are always atomic. The value that is observed is always a result of complete read and write operations. +Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations. ## Unmanaged memory access. As pointers can point to any addressable memory, operations with pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. **Example:** memory accesses through pointers which are *not properly aligned* may be not atomic or cause faults depending on the platform and hardware configuration. -Although rare, unaligned access is a realistic scenario and thus there is some limited support for unaligned memory accesses, such as `.unaligned` IL prefix and `Unsafe.ReadUnaligned`, `Unsafe.WriteUnaligned` and ` Unsafe.CopyBlockUnaligned` helpers. -These facilities ensure fault-free access, but do not ensure atomicity. +Although rare, unaligned access is a realistic scenario and thus there is some limited support for unaligned memory accesses, such as: +* `.unaligned` IL prefix +* `Unsafe.ReadUnaligned`, `Unsafe.WriteUnaligned` and ` Unsafe.CopyBlockUnaligned` helpers. + +These facilities ensure fault-free access to potentially unaligned variables, but do not ensure atomicity. -As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Support for unusual kinds of memory may be added if/when needed. +As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Passing non-ordinary memory to the runtime by the means of pointer operations or native interop will result in Undefined Behavior. ## Sideeffects and optimizations of memory accesses. CLR assumes that the sideeffects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** As a consequence: * Speculative writes are not allowed. -* Reads cannot be introduced. -* Unused reads can be elided. -* Adjacent reads from the same location can be coalesced. -* Adjacent writes to the same location can be coalesced. +* Reads cannot be introduced. +* Unused reads can be elided. +* Adjacent reads from the same location can be coalesced. +* Adjacent writes to the same location can be coalesced. ## Thread-local memory accesses. It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to perform further optimizations such as duplicating or removal of memory accesses. 
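The unaligned-access helpers mentioned in the hunk above can be exercised with a small sketch like the following; the buffer layout and the odd offset are assumptions chosen purely to force a misaligned 4-byte access.

```cs
using System;
using System.Runtime.CompilerServices;

class UnalignedExample
{
    static void Main()
    {
        byte[] buffer = new byte[8];

        // The 4-byte value starts at offset 1, so it is not 4-byte aligned.
        // ReadUnaligned/WriteUnaligned will not fault on platforms that trap
        // misaligned loads, but the access is not guaranteed to be atomic.
        Unsafe.WriteUnaligned(ref buffer[1], 0x12345678);
        int value = Unsafe.ReadUnaligned<int>(ref buffer[1]);

        Console.WriteLine(value.ToString("X")); // 12345678
    }
}
```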
@@ -71,31 +50,34 @@ It may be possible for an optimizing compiler to prove that some data is accessi ## Order of memory operations. * **Ordinary memory accesses** The effects of ordinary reads and writes can be reordered as long as that preserves single-thread consistency. Such reordering can happen both due to code generation strategy of the compiler or due to weak memory ordering in the hardware. -Volatile memory accesses -* **Volatile reads** have "acquire semantics" - no read or write that is later in the program order may be speculatively executed ahead of a volatile read. +* **Volatile reads** have "acquire semantics" - no read or write that is later in the program order may be speculatively executed ahead of a volatile read. Operations with acquire semantics: - - IL load instructions with `.volatile` prefix when such prefix is supported - - Volatile.Read + - IL load instructions with `.volatile` prefix when instruction supports such prefix + - `System.Threading.Volatile.Read` + - `System.Thread.VolatileRead` + - Acquiring a lock (`System.Threading.Monitor.Enter` or entering a synchronized method) -* **Volatile writes** have "release semantics" - the effects of a volatile write will not be observable before effects of all previous, in program order, reads and writes become observable. +* **Volatile writes** have "release semantics" - the effects of a volatile write will not be observable before effects of all previous, in program order, reads and writes become observable. Operations with release semantics: - IL store instructions with `.volatile` prefix when such prefix is supported - - Volatile.Write - - Releasing a lock + - `System.Threading.Volatile.Write` + - `System.Thread.VolatileWrite` + - Releasing a lock (`System.Threading.Monitor.Exit` or leaving a synchronized method) + +Note that volatile semantics does not by itself imply that operation is atomic or has any effect on how soon the operation is committed to the coherent memory. It only specifies the order of effects when they eventually become observable. -Note that volatile semantics does not imply that operation is atomic or has any effect on how soon the operation is committed to the coherent memory. It only specifies the order of effects when they eventually become observable. +`.volatile` and `.unaligned` IL prefixes can be combined where both are permitted. + +// TODO: `cpblk` and `initblk` + +It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to omit volatile semantics when accessing such data. * **Full-fence operations** Full-fence operations have "full-fence semantics" - effects of reads and writes must be observable no later or no earlier than a full-fence operation according to their relative program order. Operations with full-fence semantics: - - Thread.MemoryBarrier - - Interlocked methods - - Lock acquire - -For historical reasons: - - Thread.VolatileRead performs a full fence after the read - - Thread.VolatileWrite performs a full fence prior to the write + - `System.Thread.MemoryBarrier` + - `System.Threading.Interlocked` methods ## Process-wide barrier Process-wide barrier has full-fence semantics with an additional guarantee that each thread in the program effectively performs a full fence at arbitrary point synchronized with the process-wide barrier in such a way that effects of writes that precede both barriers are observable by memory operations that follow the barriers. 
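The full-fence operations listed above are what cover store-load ordering, which acquire/release semantics alone do not. A hedged sketch of the classic Dekker-style fragment that depends on a full fence; the field and method names are illustrative, and the other thread is assumed to run the mirror image with the roles of `_flagA` and `_flagB` swapped.

```cs
using System.Threading;

class DekkerFragment
{
    private int _flagA;
    private int _flagB;

    public bool TryEnterFromThread1()
    {
        Volatile.Write(ref _flagA, 1);  // announce intent
        Thread.MemoryBarrier();         // full fence: the store above must become observable
                                        // before the load below is performed (store-load order)
        return Volatile.Read(ref _flagB) == 0;
    }
}
```

An interlocked operation on `_flagA` (for example `Interlocked.Exchange`) would provide the same full-fence guarantee as the explicit `Thread.MemoryBarrier()` call.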
@@ -112,16 +94,33 @@ However, the ordering sideeffects of reference assignement should not be used fo - an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads. ## Instance constructors -CLR does not prescribe any ordering effects to instance constructors. +CLR does not specify any ordering effects to the instance constructors. ## Static constructors All side effects of static constructor execution must happen before accessing any member of the type. //TODO: is this a case when RunClassConstructor is called? -## Referential transparency. -Implementation of refs, pointers, remoting, MarshalByRefObject - ## Exceptions -Synchronous, asynchronous, thread.abort. -Anything to do with memory model? +//TODO: Synchronous, asynchronous, thread.abort. Anything to do with memory model? + +## Hardware considerations +Current CoreCLR and libraries implementation makes a few expectations about the hardware memory model. These conditions are present on all currently supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions. + +* Naturally aligned reads and writes with sizes up to the platform pointer size are atomic. +That applies even for locations targeted by overlapping aligned reads and writes of different sizes. +**Example:** a read of a 4-byte aligned int32 variable will yield a value that existed prior some write or after some write. It will never be a mix of before/after bytes. + +* The memory is cache-coherent and writes to a single location will be seen by all cores in the same order (multicopy atomic). +**Example:** when the same location is updated with values in ascending order (like 1,2,3,4,...), no observer will see a descending sequence. + +* It may be possible for a thread to see its own writes before they appear to other cores (store buffer forwarding), as long as the single-thread consistency is not violated. + +* The memory managed by the runtime is ordinary memory (not device register file or the like) and the only sideeffects of memory operations are storing and reading of values. + +* It is possible to implement release consistency memory model. +Either the platform defaults to release consistency or stronger (i.e. x64 is TSO, which is stronger), or provides means to implement release consistency via fencing operations. + +* Memory ordering honors data dependency +**Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. +(Some versions of Alpha processors did not support this, most current architectures do) From c4b47653a32d093f214b98db2d6bef74e6a1e05f Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Fri, 17 Jun 2022 23:31:34 -0700 Subject: [PATCH 03/33] More details and samples. --- docs/design/coreclr/botr/memory-model.md | 160 ++++++++++++++++++++--- 1 file changed, 141 insertions(+), 19 deletions(-) diff --git a/docs/design/coreclr/botr/memory-model.md b/docs/design/coreclr/botr/memory-model.md index 18fbd92ca61e2b..732f230e2e278f 100644 --- a/docs/design/coreclr/botr/memory-model.md +++ b/docs/design/coreclr/botr/memory-model.md @@ -1,7 +1,3 @@ -// TODO: elaborate further on many sections -// TODO: need a lot more examples. - - # CLR memory model @@ -10,8 +6,6 @@ ECMA 335 standard defines a very weak memory model. 
After two decades the desire In the course of multiple releases CLR implementation settled around a memory model that is a practical compromise between what can be implemented efficiently on the current hardware, while staying reasonably approachable by the developers. This document rationalizes the invariants provided and expected by the CLR runtime in its current implementation with expectation of that being carried to future releases. -The memory model is generally the same among different runtime implementations such as .Net FX, CoreCLR, Mono, NativeAOT. When discrepancies do happen they will be called out. - ## Alignment When managed by CLR runtime, variables of built-in primitive types are *properly aligned* according to the data type size. This applies to both heap and stack allocated memory. @@ -23,14 +17,14 @@ Native-sized integer types and pointers have alignment that matches their size o Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations. ## Unmanaged memory access. -As pointers can point to any addressable memory, operations with pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. +As unmanaged pointers can point to any addressable memory, operations with such pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. **Example:** memory accesses through pointers which are *not properly aligned* may be not atomic or cause faults depending on the platform and hardware configuration. Although rare, unaligned access is a realistic scenario and thus there is some limited support for unaligned memory accesses, such as: * `.unaligned` IL prefix * `Unsafe.ReadUnaligned`, `Unsafe.WriteUnaligned` and ` Unsafe.CopyBlockUnaligned` helpers. -These facilities ensure fault-free access to potentially unaligned variables, but do not ensure atomicity. +These facilities ensure fault-free access to potentially unaligned locations, but do not ensure atomicity. As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Passing non-ordinary memory to the runtime by the means of pointer operations or native interop will result in Undefined Behavior. @@ -41,12 +35,16 @@ As a consequence: * Speculative writes are not allowed. * Reads cannot be introduced. * Unused reads can be elided. -* Adjacent reads from the same location can be coalesced. -* Adjacent writes to the same location can be coalesced. +* Adjacent nonvolatile reads from the same location can be coalesced. +* Adjacent nonvolatile writes to the same location can be coalesced. ## Thread-local memory accesses. It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to perform further optimizations such as duplicating or removal of memory accesses. +## Cross-thread access to local variables. +- There is no type-safe mechanism for accessing locations on one thread’s stack from another thread. +- Accessing managed references located on the stack of a different thread by the means of unsafe code will result in Undefiled Behavior. + ## Order of memory operations. * **Ordinary memory accesses** The effects of ordinary reads and writes can be reordered as long as that preserves single-thread consistency. 
Such reordering can happen both due to code generation strategy of the compiler or due to weak memory ordering in the hardware. @@ -64,13 +62,18 @@ The effects of ordinary reads and writes can be reordered as long as that preser - `System.Threading.Volatile.Write` - `System.Thread.VolatileWrite` - Releasing a lock (`System.Threading.Monitor.Exit` or leaving a synchronized method) + +* **.volatile initblk** has "release semantics" - the effects of `.volatile initblk` will not be observable earlier than the effects of preceeding reads and writes. + +* **.volatile cpblk** combines ordering semantics of a volatile read and write with respect to the read and written memory locations. + - The writes performed by `.volatile cpblk` will not be observable earlier than the effects of preceeding reads and writes. + - No read or write that is later in the program order may be speculatively executed before the reads performed by `.volatile cpblk` + - `cpblk` may be implemented as a sequence of reads and writes. The granularity and mutual order of such reads and writes is unspecified. Note that volatile semantics does not by itself imply that operation is atomic or has any effect on how soon the operation is committed to the coherent memory. It only specifies the order of effects when they eventually become observable. `.volatile` and `.unaligned` IL prefixes can be combined where both are permitted. -// TODO: `cpblk` and `initblk` - It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to omit volatile semantics when accessing such data. * **Full-fence operations** @@ -78,12 +81,15 @@ It may be possible for an optimizing compiler to prove that some data is accessi Operations with full-fence semantics: - `System.Thread.MemoryBarrier` - `System.Threading.Interlocked` methods - + ## Process-wide barrier Process-wide barrier has full-fence semantics with an additional guarantee that each thread in the program effectively performs a full fence at arbitrary point synchronized with the process-wide barrier in such a way that effects of writes that precede both barriers are observable by memory operations that follow the barriers. The actual implementation may vary depending on the platform. For example interrupting the execution of every core in the current process' affinity mask could be a suitable implementation. +## Synchronized methods +Synchronized methods have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method. + ## Object assignment Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata. The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (i.e. method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. @@ -99,13 +105,8 @@ CLR does not specify any ordering effects to the instance constructors. ## Static constructors All side effects of static constructor execution must happen before accessing any member of the type. -//TODO: is this a case when RunClassConstructor is called? - -## Exceptions -//TODO: Synchronous, asynchronous, thread.abort. Anything to do with memory model? 
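As an illustration of the static-constructor guarantee stated in the hunk above, a minimal sketch; the `Config` type and the `APP_CONN` variable name are assumptions made for the example.

```cs
using System;

static class Config
{
    public static readonly string ConnectionString;

    // All side effects of this static constructor are observable before any
    // thread is allowed to use a member of Config, so readers of
    // ConnectionString do not need an explicit fence of their own.
    static Config()
    {
        ConnectionString = Environment.GetEnvironmentVariable("APP_CONN") ?? "local";
    }
}

class Program
{
    static void Main()
    {
        // The first access from any thread runs the static constructor
        // (if it has not run yet) before the field value is returned.
        Console.WriteLine(Config.ConnectionString);
    }
}
```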
- ## Hardware considerations -Current CoreCLR and libraries implementation makes a few expectations about the hardware memory model. These conditions are present on all currently supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions. +Currently supported implementations of CLR and system libraries make a few expectations about the hardware memory model. These conditions are present on all supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions. * Naturally aligned reads and writes with sizes up to the platform pointer size are atomic. That applies even for locations targeted by overlapping aligned reads and writes of different sizes. @@ -124,3 +125,124 @@ Either the platform defaults to release consistency or stronger (i.e. x64 is TSO * Memory ordering honors data dependency **Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. (Some versions of Alpha processors did not support this, most current architectures do) + +## Examples and common patterns: +The following examples work correctly on all supported CLR implementations regardless of the target OS or architecture. + +* Constructing an instance and sharing with another thread is safe and does not require explicit fences. + +```cs + +static MyClass obj; + +// thread #1 +void ThreadFunc1() +{ + while (true) + { + obj = new MyClass(); + } +} + +// thread #2 +void ThreadFunc1() +{ + while (true) + { + obj = null; + } +} + +// thread #3 +void ThreadFunc2() +{ + MyClass localObj = obj; + if (localObj != null) + { + // accessing members of the local object is safe because + // - reads cannot be introduced, thus localObj cannot be re-read and become null + // - publishing assignment to obj will not become visible earlier than write operations in the MyClass constructor + // - indirect accesses via an instance are dependent reads, thus we will see results of constructor's writes + System.Console.WriteLine(localObj.ToString()); + } +} + +``` + +* Singleton (using a lock) + +```cs + +private object _lock = new object(); +private MyClass _inst; + +public MyClass GetSingleton() +{ + if (_inst == null) + { + lock (_lock) + { + // taking a lock is an acquire, the read of _inst will happen after taking the lock + // releasing a lock is a release, if another thread assigned _inst, the write will be observed no later than the release of the lock + // thus if another thread initialized the singleton, the current thread is guaranteed to see that here. 
+ + if (_inst == null) + { + _inst = new MyClass(); + } + } + } + + return _inst; +} + +``` + + +* Singleton (using an interlocked operation) + +```cs +private MyClass _inst; + +public MyClass GetSingleton() +{ + MyClass localInst = _inst; + + if (localInst == null) + { + // unlike the example with the lock, we may construct multiple instances + // only one will "win" and become a unique singleton object + Interlocked.CompareExchange(ref _inst, new MyClass(), null); + + // since Interlocked.CompareExchange is a full fence, + // we cannot possibly read null or some other spurious instance that is not the singleton + localInst = _inst; + } + + return localInst; +} +``` + +* Communicating with another thread by checking a flag. + +```cs +internal class Program +{ + static bool flag; + + static void Main(string[] args) + { + Task.Run(() => flag = true); + + // the repeated read will eventually see that the value of 'flag' has changed, + // but the read must be Volatile to ensure all reads are not coalesced + // into one read prior entering the while loop. + while (!Volatile.Read(ref flag)) + { + } + + System.Console.WriteLine("done"); + } +} + +``` From b872aba1a84d93eeadb0097b523bbe700760ec72 Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Fri, 16 Sep 2022 17:09:41 -0700 Subject: [PATCH 04/33] Fix trailing whitespaces. --- docs/design/coreclr/botr/memory-model.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/docs/design/coreclr/botr/memory-model.md b/docs/design/coreclr/botr/memory-model.md index 732f230e2e278f..7cd896d142d0b6 100644 --- a/docs/design/coreclr/botr/memory-model.md +++ b/docs/design/coreclr/botr/memory-model.md @@ -9,19 +9,19 @@ In the course of multiple releases CLR implementation settled around a memory mo ## Alignment When managed by CLR runtime, variables of built-in primitive types are *properly aligned* according to the data type size. This applies to both heap and stack allocated memory. -1-byte, 2-byte, 4-byte variables are stored at 1-byte, 2-byte, 4-byte boundary, respectively. -8-byte variables are 8-byte aligned on 64 bit platforms. -Native-sized integer types and pointers have alignment that matches their size on the given platform. +- 1-byte, 2-byte, 4-byte variables are stored at 1-byte, 2-byte, 4-byte boundary, respectively. +- 8-byte variables are 8-byte aligned on 64 bit platforms. +- Native-sized integer types and pointers have alignment that matches their size on the given platform. ## Atomic memory accesses. Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations. -## Unmanaged memory access. -As unmanaged pointers can point to any addressable memory, operations with such pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. -**Example:** memory accesses through pointers which are *not properly aligned* may be not atomic or cause faults depending on the platform and hardware configuration. +## Unmanaged memory access. +As unmanaged pointers can point to any addressable memory, operations with such pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. +**Example:** memory accesses through pointers which are *not properly aligned* may be not atomic or cause faults depending on the platform and hardware configuration. 
-Although rare, unaligned access is a realistic scenario and thus there is some limited support for unaligned memory accesses, such as: -* `.unaligned` IL prefix +Although rare, unaligned access is a realistic scenario and thus there is some limited support for unaligned memory accesses, such as: +* `.unaligned` IL prefix * `Unsafe.ReadUnaligned`, `Unsafe.WriteUnaligned` and ` Unsafe.CopyBlockUnaligned` helpers. These facilities ensure fault-free access to potentially unaligned locations, but do not ensure atomicity. @@ -31,8 +31,8 @@ As of this writing there is no specific support for operating with incoherent me ## Sideeffects and optimizations of memory accesses. CLR assumes that the sideeffects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** -As a consequence: -* Speculative writes are not allowed. +As a consequence: +* Speculative writes are not allowed. * Reads cannot be introduced. * Unused reads can be elided. * Adjacent nonvolatile reads from the same location can be coalesced. From aae251b330ee90afe6ef7cd0a86e8527a58cda5d Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Fri, 16 Sep 2022 17:16:03 -0700 Subject: [PATCH 05/33] More trailing whitespace --- docs/design/coreclr/botr/memory-model.md | 52 ++++++++++++------------ 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/docs/design/coreclr/botr/memory-model.md b/docs/design/coreclr/botr/memory-model.md index 7cd896d142d0b6..973fb7220a65eb 100644 --- a/docs/design/coreclr/botr/memory-model.md +++ b/docs/design/coreclr/botr/memory-model.md @@ -9,9 +9,9 @@ In the course of multiple releases CLR implementation settled around a memory mo ## Alignment When managed by CLR runtime, variables of built-in primitive types are *properly aligned* according to the data type size. This applies to both heap and stack allocated memory. -- 1-byte, 2-byte, 4-byte variables are stored at 1-byte, 2-byte, 4-byte boundary, respectively. -- 8-byte variables are 8-byte aligned on 64 bit platforms. -- Native-sized integer types and pointers have alignment that matches their size on the given platform. +1-byte, 2-byte, 4-byte variables are stored at 1-byte, 2-byte, 4-byte boundary, respectively. +8-byte variables are 8-byte aligned on 64 bit platforms. +Native-sized integer types and pointers have alignment that matches their size on the given platform. ## Atomic memory accesses. Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations. @@ -39,33 +39,33 @@ As a consequence: * Adjacent nonvolatile writes to the same location can be coalesced. ## Thread-local memory accesses. -It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to perform further optimizations such as duplicating or removal of memory accesses. +It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to perform further optimizations such as duplicating or removal of memory accesses. ## Cross-thread access to local variables. - There is no type-safe mechanism for accessing locations on one thread’s stack from another thread. - Accessing managed references located on the stack of a different thread by the means of unsafe code will result in Undefiled Behavior. 
## Order of memory operations. -* **Ordinary memory accesses** -The effects of ordinary reads and writes can be reordered as long as that preserves single-thread consistency. Such reordering can happen both due to code generation strategy of the compiler or due to weak memory ordering in the hardware. +* **Ordinary memory accesses** +The effects of ordinary reads and writes can be reordered as long as that preserves single-thread consistency. Such reordering can happen both due to code generation strategy of the compiler or due to weak memory ordering in the hardware. -* **Volatile reads** have "acquire semantics" - no read or write that is later in the program order may be speculatively executed ahead of a volatile read. - Operations with acquire semantics: +* **Volatile reads** have "acquire semantics" - no read or write that is later in the program order may be speculatively executed ahead of a volatile read. + Operations with acquire semantics: - IL load instructions with `.volatile` prefix when instruction supports such prefix - `System.Threading.Volatile.Read` - `System.Thread.VolatileRead` - Acquiring a lock (`System.Threading.Monitor.Enter` or entering a synchronized method) -* **Volatile writes** have "release semantics" - the effects of a volatile write will not be observable before effects of all previous, in program order, reads and writes become observable. +* **Volatile writes** have "release semantics" - the effects of a volatile write will not be observable before effects of all previous, in program order, reads and writes become observable. Operations with release semantics: - IL store instructions with `.volatile` prefix when such prefix is supported - `System.Threading.Volatile.Write` - `System.Thread.VolatileWrite` - Releasing a lock (`System.Threading.Monitor.Exit` or leaving a synchronized method) - + * **.volatile initblk** has "release semantics" - the effects of `.volatile initblk` will not be observable earlier than the effects of preceeding reads and writes. - -* **.volatile cpblk** combines ordering semantics of a volatile read and write with respect to the read and written memory locations. + +* **.volatile cpblk** combines ordering semantics of a volatile read and write with respect to the read and written memory locations. - The writes performed by `.volatile cpblk` will not be observable earlier than the effects of preceeding reads and writes. - No read or write that is later in the program order may be speculatively executed before the reads performed by `.volatile cpblk` - `cpblk` may be implemented as a sequence of reads and writes. The granularity and mutual order of such reads and writes is unspecified. @@ -76,14 +76,14 @@ Note that volatile semantics does not by itself imply that operation is atomic o It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to omit volatile semantics when accessing such data. -* **Full-fence operations** - Full-fence operations have "full-fence semantics" - effects of reads and writes must be observable no later or no earlier than a full-fence operation according to their relative program order. +* **Full-fence operations** + Full-fence operations have "full-fence semantics" - effects of reads and writes must be observable no later or no earlier than a full-fence operation according to their relative program order. 
Operations with full-fence semantics: - `System.Thread.MemoryBarrier` - `System.Threading.Interlocked` methods ## Process-wide barrier -Process-wide barrier has full-fence semantics with an additional guarantee that each thread in the program effectively performs a full fence at arbitrary point synchronized with the process-wide barrier in such a way that effects of writes that precede both barriers are observable by memory operations that follow the barriers. +Process-wide barrier has full-fence semantics with an additional guarantee that each thread in the program effectively performs a full fence at arbitrary point synchronized with the process-wide barrier in such a way that effects of writes that precede both barriers are observable by memory operations that follow the barriers. The actual implementation may vary depending on the platform. For example interrupting the execution of every core in the current process' affinity mask could be a suitable implementation. @@ -92,12 +92,12 @@ Synchronized methods have the same memory access semantics as if a lock is acqui ## Object assignment Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata. -The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (i.e. method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. +The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (i.e. method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. The reading thread does not need to perform an acquiring read before accessing the content of an instance since all supported platforms honor ordering of data-dependent reads. -However, the ordering sideeffects of reference assignement should not be used for general ordering purposes because: -- ordinary reference assignments are still treated as ordinary assignments and could be reordered by the compiler. -- an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads. +However, the ordering sideeffects of reference assignement should not be used for general ordering purposes because: +- ordinary reference assignments are still treated as ordinary assignments and could be reordered by the compiler. +- an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads. ## Instance constructors CLR does not specify any ordering effects to the instance constructors. @@ -108,22 +108,22 @@ All side effects of static constructor execution must happen before accessing an ## Hardware considerations Currently supported implementations of CLR and system libraries make a few expectations about the hardware memory model. These conditions are present on all supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions. 
-* Naturally aligned reads and writes with sizes up to the platform pointer size are atomic. -That applies even for locations targeted by overlapping aligned reads and writes of different sizes. +* Naturally aligned reads and writes with sizes up to the platform pointer size are atomic. +That applies even for locations targeted by overlapping aligned reads and writes of different sizes. **Example:** a read of a 4-byte aligned int32 variable will yield a value that existed prior some write or after some write. It will never be a mix of before/after bytes. -* The memory is cache-coherent and writes to a single location will be seen by all cores in the same order (multicopy atomic). +* The memory is cache-coherent and writes to a single location will be seen by all cores in the same order (multicopy atomic). **Example:** when the same location is updated with values in ascending order (like 1,2,3,4,...), no observer will see a descending sequence. * It may be possible for a thread to see its own writes before they appear to other cores (store buffer forwarding), as long as the single-thread consistency is not violated. * The memory managed by the runtime is ordinary memory (not device register file or the like) and the only sideeffects of memory operations are storing and reading of values. -* It is possible to implement release consistency memory model. +* It is possible to implement release consistency memory model. Either the platform defaults to release consistency or stronger (i.e. x64 is TSO, which is stronger), or provides means to implement release consistency via fencing operations. * Memory ordering honors data dependency -**Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. +**Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. (Some versions of Alpha processors did not support this, most current architectures do) ## Examples and common patterns: @@ -140,7 +140,7 @@ void ThreadFunc1() { while (true) { - obj = new MyClass(); + obj = new MyClass(); } } @@ -149,7 +149,7 @@ void ThreadFunc1() { while (true) { - obj = null; + obj = null; } } From f13605da62bda194ab61d8e1cbfea2addb5b6cda Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Fri, 16 Sep 2022 18:25:25 -0700 Subject: [PATCH 06/33] More trailing whitespace. --- docs/design/coreclr/botr/memory-model.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/design/coreclr/botr/memory-model.md b/docs/design/coreclr/botr/memory-model.md index 973fb7220a65eb..072c1eedcd1af4 100644 --- a/docs/design/coreclr/botr/memory-model.md +++ b/docs/design/coreclr/botr/memory-model.md @@ -81,7 +81,7 @@ It may be possible for an optimizing compiler to prove that some data is accessi Operations with full-fence semantics: - `System.Thread.MemoryBarrier` - `System.Threading.Interlocked` methods - + ## Process-wide barrier Process-wide barrier has full-fence semantics with an additional guarantee that each thread in the program effectively performs a full fence at arbitrary point synchronized with the process-wide barrier in such a way that effects of writes that precede both barriers are observable by memory operations that follow the barriers. 
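The process-wide barrier referred to in the hunk above is reachable from managed code through `Interlocked.MemoryBarrierProcessWide`; the call itself is a real API, but treating it as exactly the barrier described here is an assumption of this sketch.

```cs
using System.Threading;

class ProcessWideBarrierExample
{
    public static void Checkpoint()
    {
        // Much more expensive than Thread.MemoryBarrier(): every thread in the
        // process is brought to a point where it has effectively executed a full
        // fence, so writes made before this call are observable by reads that
        // follow the corresponding barrier on other threads.
        Interlocked.MemoryBarrierProcessWide();
    }
}
```

Because of its cost it is normally confined to rare slow paths (for example, asymmetric fencing schemes), so that frequently executed code on the other side can avoid per-access fences.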
@@ -93,7 +93,7 @@ Synchronized methods have the same memory access semantics as if a lock is acqui ## Object assignment Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata. The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (i.e. method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. -The reading thread does not need to perform an acquiring read before accessing the content of an instance since all supported platforms honor ordering of data-dependent reads. +The reading thread does not need to perform an acquiring read before accessing the content of an instance since all supported platforms honor ordering of data-dependent reads. However, the ordering sideeffects of reference assignement should not be used for general ordering purposes because: - ordinary reference assignments are still treated as ordinary assignments and could be reordered by the compiler. @@ -106,7 +106,7 @@ CLR does not specify any ordering effects to the instance constructors. All side effects of static constructor execution must happen before accessing any member of the type. ## Hardware considerations -Currently supported implementations of CLR and system libraries make a few expectations about the hardware memory model. These conditions are present on all supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions. +Currently supported implementations of CLR and system libraries make a few expectations about the hardware memory model. These conditions are present on all supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions. * Naturally aligned reads and writes with sizes up to the platform pointer size are atomic. That applies even for locations targeted by overlapping aligned reads and writes of different sizes. @@ -122,11 +122,11 @@ That applies even for locations targeted by overlapping aligned reads and writes * It is possible to implement release consistency memory model. Either the platform defaults to release consistency or stronger (i.e. x64 is TSO, which is stronger), or provides means to implement release consistency via fencing operations. -* Memory ordering honors data dependency +* Memory ordering honors data dependency **Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. (Some versions of Alpha processors did not support this, most current architectures do) -## Examples and common patterns: +## Examples and common patterns: The following examples work correctly on all supported CLR implementations regardless of the target OS or architecture. * Constructing an instance and sharing with another thread is safe and does not require explicit fences. @@ -223,7 +223,7 @@ public MyClass GetSingleton() } ``` -* Communicating with another thread by checking a flag. +* Communicating with another thread by checking a flag. 
```cs internal class Program From a8a100303725c07066467882199d3046d2fa6991 Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Fri, 16 Sep 2022 18:34:02 -0700 Subject: [PATCH 07/33] Apply suggestions from code review (typos) Co-authored-by: Jan Kotas Co-authored-by: Dan Moseley --- docs/design/coreclr/botr/memory-model.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/design/coreclr/botr/memory-model.md b/docs/design/coreclr/botr/memory-model.md index 072c1eedcd1af4..7e8699fe4fa036 100644 --- a/docs/design/coreclr/botr/memory-model.md +++ b/docs/design/coreclr/botr/memory-model.md @@ -26,7 +26,7 @@ Although rare, unaligned access is a realistic scenario and thus there is some l These facilities ensure fault-free access to potentially unaligned locations, but do not ensure atomicity. -As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Passing non-ordinary memory to the runtime by the means of pointer operations or native interop will result in Undefined Behavior. +As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Passing non-ordinary memory to the runtime by the means of pointer operations or native interop results in Undefined Behavior. ## Sideeffects and optimizations of memory accesses. CLR assumes that the sideeffects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** @@ -43,7 +43,7 @@ It may be possible for an optimizing compiler to prove that some data is accessi ## Cross-thread access to local variables. - There is no type-safe mechanism for accessing locations on one thread’s stack from another thread. -- Accessing managed references located on the stack of a different thread by the means of unsafe code will result in Undefiled Behavior. +- Accessing managed references located on the stack of a different thread by the means of unsafe code will result in Undefined Behavior. ## Order of memory operations. * **Ordinary memory accesses** @@ -95,7 +95,7 @@ Object assignment to a location potentially accessible by other threads is a rel The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (i.e. method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. The reading thread does not need to perform an acquiring read before accessing the content of an instance since all supported platforms honor ordering of data-dependent reads. -However, the ordering sideeffects of reference assignement should not be used for general ordering purposes because: +However, the ordering sideeffects of reference assignment should not be used for general ordering purposes because: - ordinary reference assignments are still treated as ordinary assignments and could be reordered by the compiler. - an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads. 
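For contrast with the flag-checking example touched by this patch, a sketch of the variant that the read-coalescing rules make unreliable; the `flag` field mirrors the one in the example above.

```cs
using System.Threading.Tasks;

internal class Program
{
    static bool flag;

    static void Main()
    {
        Task.Run(() => flag = true);

        // Broken: without Volatile.Read the repeated non-volatile reads of
        // 'flag' may legally be coalesced into a single read hoisted before
        // the loop, so the loop may never observe the update and never exit.
        while (!flag)
        {
        }
    }
}
```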
From b15a921df0698fc418c2bf3f6fd808c31e32e10e Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Wed, 21 Sep 2022 13:38:59 -0700 Subject: [PATCH 08/33] replaced references to CLR with ".NET runtime" --- docs/design/coreclr/botr/memory-model.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/design/coreclr/botr/memory-model.md b/docs/design/coreclr/botr/memory-model.md index 7e8699fe4fa036..5a5694d2ab4734 100644 --- a/docs/design/coreclr/botr/memory-model.md +++ b/docs/design/coreclr/botr/memory-model.md @@ -1,13 +1,13 @@ -# CLR memory model +# .NET memory model -## ECMA vs. CLR memory models. +## ECMA 335 vs. .NET memory models. ECMA 335 standard defines a very weak memory model. After two decades the desire to have a flexible model did not result in considerable benefits due to hardware being more strict. On the other hand programming against ECMA model requires extra complexity to handle scenarios that are hard to comprehend and not possible to test. -In the course of multiple releases CLR implementation settled around a memory model that is a practical compromise between what can be implemented efficiently on the current hardware, while staying reasonably approachable by the developers. This document rationalizes the invariants provided and expected by the CLR runtime in its current implementation with expectation of that being carried to future releases. +In the course of multiple releases .NET runtime implementations settled around a memory model that is a practical compromise between what can be implemented efficiently on the current hardware, while staying reasonably approachable by the developers. This document rationalizes the invariants provided and expected by the CLR runtime in its current implementation with expectation of that being carried to future releases. ## Alignment -When managed by CLR runtime, variables of built-in primitive types are *properly aligned* according to the data type size. This applies to both heap and stack allocated memory. +When managed by the .NET runtime, variables of built-in primitive types are *properly aligned* according to the data type size. This applies to both heap and stack allocated memory. 1-byte, 2-byte, 4-byte variables are stored at 1-byte, 2-byte, 4-byte boundary, respectively. 8-byte variables are 8-byte aligned on 64 bit platforms. @@ -29,7 +29,7 @@ These facilities ensure fault-free access to potentially unaligned locations, bu As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Passing non-ordinary memory to the runtime by the means of pointer operations or native interop results in Undefined Behavior. ## Sideeffects and optimizations of memory accesses. -CLR assumes that the sideeffects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** +.NET runtime assumes that the sideeffects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** As a consequence: * Speculative writes are not allowed. 
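A hedged sketch of what these rules permit for ordinary (non-volatile) accesses follows. The `CoalescingExample` type and its `_stop`/`_counter` fields are hypothetical, and a particular JIT may or may not actually perform these transformations.

```cs
using System.Threading;

public class CoalescingExample
{
    private bool _stop;     // ordinary (non-volatile) field
    private int _counter;

    public void SpinUntilStopped()
    {
        // The repeated non-volatile reads of _stop have no intervening side-effects,
        // so they may be coalesced into a single read hoisted before the loop.
        // If that happens, a store of 'true' by another thread is never observed.
        while (!_stop)
        {
        }
    }

    public void SpinUntilStoppedVolatile()
    {
        // A volatile read is not subject to that coalescing, so every iteration
        // performs a fresh read with acquire semantics.
        while (!Volatile.Read(ref _stop))
        {
            Thread.SpinWait(1);
        }
    }

    public void WriteTwice()
    {
        // Two adjacent non-volatile writes to the same location may be coalesced,
        // so the intermediate value 1 might never be stored or observed.
        _counter = 1;
        _counter = 2;
    }
}
```

Programs that rely on observing such intermediate states or cross-thread updates are expected to use volatile accesses instead of ordinary ones.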
From d554b05607ab746735ae7cc0d23e312681a697fe Mon Sep 17 00:00:00 2001 From: vsadov <8218165+VSadov@users.noreply.github.com> Date: Wed, 21 Sep 2022 17:44:48 -0700 Subject: [PATCH 09/33] Addressed some PR review feedback --- docs/design/coreclr/botr/memory-model.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/docs/design/coreclr/botr/memory-model.md b/docs/design/coreclr/botr/memory-model.md index 5a5694d2ab4734..62d6998b924272 100644 --- a/docs/design/coreclr/botr/memory-model.md +++ b/docs/design/coreclr/botr/memory-model.md @@ -16,12 +16,14 @@ Native-sized integer types and pointers have alignment that matches their size o ## Atomic memory accesses. Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations. +Note: since unmanaged pointers and managed references are always aligned to their size on the given platform, accesses of pointers and managed references are atomic. + ## Unmanaged memory access. As unmanaged pointers can point to any addressable memory, operations with such pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. **Example:** memory accesses through pointers which are *not properly aligned* may be not atomic or cause faults depending on the platform and hardware configuration. Although rare, unaligned access is a realistic scenario and thus there is some limited support for unaligned memory accesses, such as: -* `.unaligned` IL prefix +* `unaligned.` IL prefix * `Unsafe.ReadUnaligned`, `Unsafe.WriteUnaligned` and ` Unsafe.CopyBlockUnaligned` helpers. These facilities ensure fault-free access to potentially unaligned locations, but do not ensure atomicity. @@ -51,28 +53,28 @@ The effects of ordinary reads and writes can be reordered as long as that preser * **Volatile reads** have "acquire semantics" - no read or write that is later in the program order may be speculatively executed ahead of a volatile read. Operations with acquire semantics: - - IL load instructions with `.volatile` prefix when instruction supports such prefix + - IL load instructions with `volatile.` prefix when instruction supports such prefix - `System.Threading.Volatile.Read` - `System.Thread.VolatileRead` - Acquiring a lock (`System.Threading.Monitor.Enter` or entering a synchronized method) * **Volatile writes** have "release semantics" - the effects of a volatile write will not be observable before effects of all previous, in program order, reads and writes become observable. Operations with release semantics: - - IL store instructions with `.volatile` prefix when such prefix is supported + - IL store instructions with `volatile.` prefix when such prefix is supported - `System.Threading.Volatile.Write` - `System.Thread.VolatileWrite` - Releasing a lock (`System.Threading.Monitor.Exit` or leaving a synchronized method) -* **.volatile initblk** has "release semantics" - the effects of `.volatile initblk` will not be observable earlier than the effects of preceeding reads and writes. +* **volatile. initblk** has "release semantics" - the effects of `.volatile initblk` will not be observable earlier than the effects of preceeding reads and writes. -* **.volatile cpblk** combines ordering semantics of a volatile read and write with respect to the read and written memory locations. 
- - The writes performed by `.volatile cpblk` will not be observable earlier than the effects of preceeding reads and writes. - - No read or write that is later in the program order may be speculatively executed before the reads performed by `.volatile cpblk` +* **volatile. cpblk** combines ordering semantics of a volatile read and write with respect to the read and written memory locations. + - The writes performed by `volatile. cpblk` will not be observable earlier than the effects of preceeding reads and writes. + - No read or write that is later in the program order may be speculatively executed before the reads performed by `volatile. cpblk` - `cpblk` may be implemented as a sequence of reads and writes. The granularity and mutual order of such reads and writes is unspecified. Note that volatile semantics does not by itself imply that operation is atomic or has any effect on how soon the operation is committed to the coherent memory. It only specifies the order of effects when they eventually become observable. -`.volatile` and `.unaligned` IL prefixes can be combined where both are permitted. +`volatile.` and `unaligned.` IL prefixes can be combined where both are permitted. It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to omit volatile semantics when accessing such data. @@ -173,7 +175,7 @@ void ThreadFunc2() ```cs -private object _lock = new object(); +private readonly object _lock = new object(); private MyClass _inst; public MyClass GetSingleton() From 38f0585042ae8a2cb0a1d9211b3831bb7ff13311 Mon Sep 17 00:00:00 2001 From: vsadov <8218165+VSadov@users.noreply.github.com> Date: Wed, 21 Sep 2022 17:48:04 -0700 Subject: [PATCH 10/33] Moved to specs folder --- .../{coreclr/botr/memory-model.md => specs/Memory-model.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/design/{coreclr/botr/memory-model.md => specs/Memory-model.md} (100%) diff --git a/docs/design/coreclr/botr/memory-model.md b/docs/design/specs/Memory-model.md similarity index 100% rename from docs/design/coreclr/botr/memory-model.md rename to docs/design/specs/Memory-model.md From fe0a65edb853e76ed8e9e83b45f1d6a1de3803cc Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Wed, 21 Sep 2022 18:27:53 -0700 Subject: [PATCH 11/33] More addressing PR feedback --- docs/design/specs/Memory-model.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 62d6998b924272..7fe86b15ec63b2 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -84,13 +84,16 @@ It may be possible for an optimizing compiler to prove that some data is accessi - `System.Thread.MemoryBarrier` - `System.Threading.Interlocked` methods +## C# `volatile` feature. +One common way to introduce volatile memory accesses is by using C# `volatile` language feature. Declaring a field as `volatile` does not have any effect on how .NET runtime treats the field. The decoration works as a hint to the C# compiler itself (and compilers for other .Net languages) to emit reads and writes of such field as reads and writes with `volatile.` prefix. 
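A minimal sketch of that mapping, using an invented `Worker` type (not taken from the runtime sources): accessing a C# `volatile` field and calling `Volatile.Read`/`Volatile.Write` on an ordinary field yield essentially the same acquire/release accesses at each use site.

```cs
using System.Threading;

public class Worker
{
    // The C# compiler emits every ordinary read of _done as a 'volatile.' (acquire)
    // read and every write as a 'volatile.' (release) write.
    private volatile bool _done;

    // An ordinary field gets the same treatment only at call sites that go through
    // Volatile.Read/Volatile.Write explicitly.
    private bool _doneNoKeyword;

    public void Finish()
    {
        _done = true;                               // release write
        Volatile.Write(ref _doneNoKeyword, true);   // equivalent release write
    }

    public bool IsFinished()
    {
        bool first = _done;                               // acquire read
        bool second = Volatile.Read(ref _doneNoKeyword);  // equivalent acquire read
        return first && second;
    }
}
```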
+ ## Process-wide barrier Process-wide barrier has full-fence semantics with an additional guarantee that each thread in the program effectively performs a full fence at arbitrary point synchronized with the process-wide barrier in such a way that effects of writes that precede both barriers are observable by memory operations that follow the barriers. The actual implementation may vary depending on the platform. For example interrupting the execution of every core in the current process' affinity mask could be a suitable implementation. -## Synchronized methods -Synchronized methods have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method. +## Synchronized methods. +Methods decoratied with ```MethodImpl(MethodImplOptions.Synchronized)``` attribute have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method. ## Object assignment Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata. @@ -105,7 +108,7 @@ However, the ordering sideeffects of reference assignment should not be used for CLR does not specify any ordering effects to the instance constructors. ## Static constructors -All side effects of static constructor execution must happen before accessing any member of the type. +All side effects of static constructor execution will become observable not later than effects of accessing any member of the type. Other member methods of the type, when invoked, will observe complete results of the type's static constructor execution. ## Hardware considerations Currently supported implementations of CLR and system libraries make a few expectations about the hardware memory model. These conditions are present on all supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions. From e8861ecb13dd6fb0ec6a622552dffab727b318ed Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Wed, 21 Sep 2022 18:54:15 -0700 Subject: [PATCH 12/33] Volatile/Interlocked methods are atomic --- docs/design/specs/Memory-model.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 7fe86b15ec63b2..5f557ece0ec865 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -16,7 +16,11 @@ Native-sized integer types and pointers have alignment that matches their size o ## Atomic memory accesses. Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations. -Note: since unmanaged pointers and managed references are always aligned to their size on the given platform, accesses of pointers and managed references are atomic. +The following methods perform atomic memory accesses regardless of the platform.
+- `System.Threading.Interlocked` methods +- `System.Threading.Volatile` methods + +**Example:** `Volatile.Read(ref location)` on a 32 bit platform is atomic, while an ordinary read of `location` mat not be. ## Unmanaged memory access. As unmanaged pointers can point to any addressable memory, operations with such pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. @@ -92,7 +96,7 @@ Process-wide barrier has full-fence semantics with an additional guarantee that The actual implementation may vary depending on the platform. For example interrupting the execution of every core in the current process' affinity mask could be a suitable implementation. -## Synchronized methods. +## Synchronized methods. Methods decoratied with ```MethodImpl(MethodImplOptions.Synchronized)``` attribute have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method. ## Object assignment From 5a25cabd7d5a0815882381f90c5330dcb2715409 Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Wed, 21 Sep 2022 18:59:44 -0700 Subject: [PATCH 13/33] Better notes about atomicity of pointers --- docs/design/specs/Memory-model.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 5f557ece0ec865..50dcbc504ae3d9 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -16,6 +16,10 @@ Native-sized integer types and pointers have alignment that matches their size o ## Atomic memory accesses. Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations. +Values of unmanaged pointers are treated as native integer primitive types. Memory accesses to *properly aligned* values of unmanaged pointers are atomic. + +Managed references are always aligned to their size on the given platform and accesses are atomic. + The following methods perform atomic memory accesses regardless of the platform.
- `System.Threading.Interlocked` methods - `System.Threading.Volatile` methods From be72e2ee268585c5c9faa1f9fdbbeeec8ace1115 Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Wed, 21 Sep 2022 19:27:53 -0700 Subject: [PATCH 14/33] Apply suggestions from code review Co-authored-by: Jan Kotas --- docs/design/specs/Memory-model.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 50dcbc504ae3d9..337aba3ebe5de9 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -20,11 +20,11 @@ Values of unmanaged pointers are treated as native integer primitive types. Memo Managed references are always aligned to their size on the given platform and accesses are atomic. -The following methods perform atomic memory accesses regardless of the platform.
+The following methods perform atomic memory accesses regardless of the platform. - `System.Threading.Interlocked` methods - `System.Threading.Volatile` methods -**Example:** `Volatile.Read(ref location)` on a 32 bit platform is atomic, while an ordinary read of `location` mat not be. +**Example:** `Volatile.Read(ref location)` on a 32 bit platform is atomic, while an ordinary read of `location` may not be. ## Unmanaged memory access. As unmanaged pointers can point to any addressable memory, operations with such pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. From 75adccb89d12c4a9240bc743bcb5cb4998721c7e Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Fri, 23 Sep 2022 13:42:20 -0700 Subject: [PATCH 15/33] Addressing more PR feedback --- docs/design/specs/Memory-model.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 337aba3ebe5de9..74266ac84e22af 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -4,7 +4,7 @@ ## ECMA 335 vs. .NET memory models. ECMA 335 standard defines a very weak memory model. After two decades the desire to have a flexible model did not result in considerable benefits due to hardware being more strict. On the other hand programming against ECMA model requires extra complexity to handle scenarios that are hard to comprehend and not possible to test. -In the course of multiple releases .NET runtime implementations settled around a memory model that is a practical compromise between what can be implemented efficiently on the current hardware, while staying reasonably approachable by the developers. This document rationalizes the invariants provided and expected by the CLR runtime in its current implementation with expectation of that being carried to future releases. +In the course of multiple releases .NET runtime implementations settled around a memory model that is a practical compromise between what can be implemented efficiently on the current hardware, while staying reasonably approachable by the developers. This document rationalizes the invariants provided and expected by the .NET runtimes in their current implementation with expectation of that being carried to future releases. ## Alignment When managed by the .NET runtime, variables of built-in primitive types are *properly aligned* according to the data type size. This applies to both heap and stack allocated memory. @@ -13,6 +13,8 @@ When managed by the .NET runtime, variables of built-in primitive types are *pro 8-byte variables are 8-byte aligned on 64 bit platforms. Native-sized integer types and pointers have alignment that matches their size on the given platform. +The alignment of fields is not guaranteed when `FieldOffsetAttribute` is used to explicitly adjust field offsets. + ## Atomic memory accesses. Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations. @@ -20,7 +22,7 @@ Values of unmanaged pointers are treated as native integer primitive types. Memo Managed references are always aligned to their size on the given platform and accesses are atomic. -The following methods perform atomic memory accesses regardless of the platform. +The following methods perform atomic memory accesses regardless of the platform when the location of the variable is managed by the runtime. 
- `System.Threading.Interlocked` methods - `System.Threading.Volatile` methods @@ -101,7 +103,7 @@ Process-wide barrier has full-fence semantics with an additional guarantee that The actual implementation may vary depending on the platform. For example interrupting the execution of every core in the current process' affinity mask could be a suitable implementation. ## Synchronized methods. -Methods decoratied with ```MethodImpl(MethodImplOptions.Synchronized)``` attribute have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method. +Methods decorated with ```MethodImpl(MethodImplOptions.Synchronized)``` attribute have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method. ## Object assignment Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata. @@ -113,13 +115,13 @@ However, the ordering sideeffects of reference assignment should not be used for - an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads. ## Instance constructors -CLR does not specify any ordering effects to the instance constructors. +.NET runtime does not specify any ordering effects to the instance constructors. ## Static constructors All side effects of static constructor execution will become observable not later than effects of accessing any member of the type. Other member methods of the type, when invoked, will observe complete results of the type's static constructor execution. ## Hardware considerations -Currently supported implementations of CLR and system libraries make a few expectations about the hardware memory model. These conditions are present on all supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these too as the large body of preexisting software will make it burdensome to break common assumptions. +Currently supported implementations of .NET runtime and system libraries make a few expectations about the hardware memory model. These conditions are present on all supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these as well because the large body of preexisting software will make it burdensome to break common assumptions. * Naturally aligned reads and writes with sizes up to the platform pointer size are atomic. That applies even for locations targeted by overlapping aligned reads and writes of different sizes. @@ -140,7 +142,7 @@ Either the platform defaults to release consistency or stronger (i.e. x64 is TSO (Some versions of Alpha processors did not support this, most current architectures do) ## Examples and common patterns: -The following examples work correctly on all supported CLR implementations regardless of the target OS or architecture. +The following examples work correctly on all supported implementations of .NET runtime regardless of the target OS or architecture. * Constructing an instance and sharing with another thread is safe and does not require explicit fences. From eec3b9645691ca161fe637c05a4ea16ad3eb3d18 Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Fri, 23 Sep 2022 14:22:21 -0700 Subject: [PATCH 16/33] Updated singleton sample for more clarity. 
--- docs/design/specs/Memory-model.md | 66 +++++++++++++++++-------------- 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 74266ac84e22af..70ae0b3437998d 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -187,28 +187,32 @@ void ThreadFunc2() * Singleton (using a lock) ```cs +public class Singleton +{ + private static readonly object _lock = new object(); + private static Singleton _inst; -private readonly object _lock = new object(); -private MyClass _inst; + private Singleton() { } -public MyClass GetSingleton() -{ - if (_inst == null) + public static Singleton GetInstance() { - lock (_lock) + if (_inst == null) { - // taking a lock is an acquire, the read of _inst will happen after taking the lock - // releasing a lock is a release, if another thread assigned _inst, the write will be observed no later than the release of the lock - // thus if another thread initialized the singleton, the current thread is guaranteed to see that here. - - if (_inst == null) + lock (_lock) { - _inst = new MyClass(); + // taking a lock is an acquire, the read of _inst will happen after taking the lock + // releasing a lock is a release, if another thread assigned _inst, the write will be observed no later than the release of the lock + // thus if another thread initialized the _inst, the current thread is guaranteed to see that here. + + if (_inst == null) + { + _inst = new Singleton(); + } } } + + return _inst; } - - return _inst; } ``` @@ -217,24 +221,28 @@ public MyClass GetSingleton() * Singleton (using an interlocked operation) ```cs -private MyClass _inst; - -public MyClass GetSingleton() +public class Singleton { - MyClass localInst = _inst; - - if (localInst == null) + private static Singleton _inst; + + private Singleton() { } + + public static Singleton GetInstance() { - // unlike the example with the lock, we may construct multiple instances - // only one will "win" and become a unique singleton object - Interlocked.CompareExchange(ref _inst, new MyClass(), null); - - // since Interlocked.CompareExchange is a full fence, - // we cannot possibly read null or some other spurious instance that is not the singleton - localInst = _inst; + Singleton localInst = _inst; + if (localInst == null) + { + // unlike the example with the lock, we may construct multiple instances + // only one will "win" and become a unique singleton object + Interlocked.CompareExchange(ref _inst, new Singleton(), null); + + // since Interlocked.CompareExchange is a full fence, + // we cannot possibly read null or some other spurious instance that is not the singleton + localInst = _inst; + } + + return localInst; } - - return localInst; } ``` From aaeadfc1e1078139b657037dafd386212e06b94b Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Fri, 23 Sep 2022 14:27:52 -0700 Subject: [PATCH 17/33] Trailing whitespace. --- docs/design/specs/Memory-model.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 70ae0b3437998d..92d334b14d4aaf 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -22,7 +22,7 @@ Values of unmanaged pointers are treated as native integer primitive types. Memo Managed references are always aligned to their size on the given platform and accesses are atomic. 
-The following methods perform atomic memory accesses regardless of the platform when the location of the variable is managed by the runtime. +The following methods perform atomic memory accesses regardless of the platform when the location of the variable is managed by the runtime. - `System.Threading.Interlocked` methods - `System.Threading.Volatile` methods From 4cdac19bb26a9eb77922ddb2d65544b86cb2e5ed Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Mon, 26 Sep 2022 15:11:43 -0700 Subject: [PATCH 18/33] Move data dependent reads to general section --- docs/design/specs/Memory-model.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 92d334b14d4aaf..e849853328dc34 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -105,6 +105,10 @@ The actual implementation may vary depending on the platform. For example interr ## Synchronized methods. Methods decorated with ```MethodImpl(MethodImplOptions.Synchronized)``` attribute have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method. +## Data-dependent reads are ordered. +In all implementations of .NET runtime, memory ordering honors data dependency. When performing indirect reads from a location derived from a reference, it is guaranteed that reading of the data will not happen ahead of obtaining the reference. +**Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. + ## Object assignment Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata. The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (i.e. method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. @@ -137,9 +141,8 @@ That applies even for locations targeted by overlapping aligned reads and writes * It is possible to implement release consistency memory model. Either the platform defaults to release consistency or stronger (i.e. x64 is TSO, which is stronger), or provides means to implement release consistency via fencing operations. -* Memory ordering honors data dependency -**Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. -(Some versions of Alpha processors did not support this, most current architectures do) +* It is possible to guarantee ordering of data dependent reads. +Either the platform honors data dependedncy by default (all currently supported platforms), or provides means to order data dependent reads via fencing operations. ## Examples and common patterns: The following examples work correctly on all supported implementations of .NET runtime regardless of the target OS or architecture. From 05694e31086740502cc6473cf45aa98033a5d815 Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Mon, 26 Sep 2022 15:29:56 -0700 Subject: [PATCH 19/33] Compat disambiguation note on object assignments. 
--- docs/design/specs/Memory-model.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index e849853328dc34..0e6b408df4ff9d 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -118,6 +118,8 @@ However, the ordering sideeffects of reference assignment should not be used for - ordinary reference assignments are still treated as ordinary assignments and could be reordered by the compiler. - an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads. +There was a lot of ambiguity around the guarantees provided by object assignments. Going forward the runtimes will only provide the guarantees described in this document. + ## Instance constructors .NET runtime does not specify any ordering effects to the instance constructors. From 804942fa54d90822dbc1bc9b5cb0e9fb4ae81a02 Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Mon, 26 Sep 2022 15:30:54 -0700 Subject: [PATCH 20/33] Apply suggestions from code review Co-authored-by: Jan Kotas --- docs/design/specs/Memory-model.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 0e6b408df4ff9d..0bdc9db6177a6e 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -15,7 +15,7 @@ Native-sized integer types and pointers have alignment that matches their size o The alignment of fields is not guaranteed when `FieldOffsetAttribute` is used to explicitly adjust field offsets. -## Atomic memory accesses. +## Atomic memory accesses Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations. Values of unmanaged pointers are treated as native integer primitive types. Memory accesses to *properly aligned* values of unmanaged pointers are atomic. @@ -106,7 +106,7 @@ The actual implementation may vary depending on the platform. For example interr Methods decorated with ```MethodImpl(MethodImplOptions.Synchronized)``` attribute have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method. ## Data-dependent reads are ordered. -In all implementations of .NET runtime, memory ordering honors data dependency. When performing indirect reads from a location derived from a reference, it is guaranteed that reading of the data will not happen ahead of obtaining the reference. +Memory ordering honors data dependency. When performing indirect reads from a location derived from a reference, it is guaranteed that reading of the data will not happen ahead of obtaining the reference. **Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. ## Object assignment From f1e13ede5691bac7e5515b8ddc906031087d1b83 Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Mon, 26 Sep 2022 15:34:27 -0700 Subject: [PATCH 21/33] No dots at title ends --- docs/design/specs/Memory-model.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 0bdc9db6177a6e..a0d9f29e083715 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -1,7 +1,7 @@ # .NET memory model -## ECMA 335 vs. .NET memory models. 
+## ECMA 335 vs. .NET memory models ECMA 335 standard defines a very weak memory model. After two decades the desire to have a flexible model did not result in considerable benefits due to hardware being more strict. On the other hand programming against ECMA model requires extra complexity to handle scenarios that are hard to comprehend and not possible to test. In the course of multiple releases .NET runtime implementations settled around a memory model that is a practical compromise between what can be implemented efficiently on the current hardware, while staying reasonably approachable by the developers. This document rationalizes the invariants provided and expected by the .NET runtimes in their current implementation with expectation of that being carried to future releases. @@ -28,7 +28,7 @@ The following methods perform atomic memory accesses regardless of the platform **Example:** `Volatile.Read(ref location)` on a 32 bit platform is atomic, while an ordinary read of `location` may not be. -## Unmanaged memory access. +## Unmanaged memory access As unmanaged pointers can point to any addressable memory, operations with such pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. **Example:** memory accesses through pointers which are *not properly aligned* may be not atomic or cause faults depending on the platform and hardware configuration. @@ -40,7 +40,7 @@ These facilities ensure fault-free access to potentially unaligned locations, bu As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Passing non-ordinary memory to the runtime by the means of pointer operations or native interop results in Undefined Behavior. -## Sideeffects and optimizations of memory accesses. +## Sideeffects and optimizations of memory accesses .NET runtime assumes that the sideeffects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** As a consequence: @@ -50,14 +50,14 @@ As a consequence: * Adjacent nonvolatile reads from the same location can be coalesced. * Adjacent nonvolatile writes to the same location can be coalesced. -## Thread-local memory accesses. +## Thread-local memory accesses It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to perform further optimizations such as duplicating or removal of memory accesses. -## Cross-thread access to local variables. +## Cross-thread access to local variables - There is no type-safe mechanism for accessing locations on one thread’s stack from another thread. - Accessing managed references located on the stack of a different thread by the means of unsafe code will result in Undefined Behavior. -## Order of memory operations. +## Order of memory operations * **Ordinary memory accesses** The effects of ordinary reads and writes can be reordered as long as that preserves single-thread consistency. Such reordering can happen both due to code generation strategy of the compiler or due to weak memory ordering in the hardware. @@ -94,7 +94,7 @@ It may be possible for an optimizing compiler to prove that some data is accessi - `System.Thread.MemoryBarrier` - `System.Threading.Interlocked` methods -## C# `volatile` feature. 
+## C# `volatile` feature One common way to introduce volatile memory accesses is by using C# `volatile` language feature. Declaring a field as `volatile` does not have any effect on how .NET runtime treats the field. The decoration works as a hint to the C# compiler itself (and compilers for other .Net languages) to emit reads and writes of such field as reads and writes with `volatile.` prefix. ## Process-wide barrier @@ -102,10 +102,10 @@ Process-wide barrier has full-fence semantics with an additional guarantee that The actual implementation may vary depending on the platform. For example interrupting the execution of every core in the current process' affinity mask could be a suitable implementation. -## Synchronized methods. +## Synchronized methods Methods decorated with ```MethodImpl(MethodImplOptions.Synchronized)``` attribute have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method. -## Data-dependent reads are ordered. +## Data-dependent reads are ordered Memory ordering honors data dependency. When performing indirect reads from a location derived from a reference, it is guaranteed that reading of the data will not happen ahead of obtaining the reference. **Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. @@ -146,7 +146,7 @@ Either the platform defaults to release consistency or stronger (i.e. x64 is TSO * It is possible to guarantee ordering of data dependent reads. Either the platform honors data dependedncy by default (all currently supported platforms), or provides means to order data dependent reads via fencing operations. -## Examples and common patterns: +## Examples and common patterns The following examples work correctly on all supported implementations of .NET runtime regardless of the target OS or architecture. * Constructing an instance and sharing with another thread is safe and does not require explicit fences. @@ -251,7 +251,7 @@ public class Singleton } ``` -* Communicating with another thread by checking a flag. +* Communicating with another thread by checking a flag ```cs internal class Program From d7c179c20bed99034b1203546268aedf323346ba Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Mon, 26 Sep 2022 15:45:40 -0700 Subject: [PATCH 22/33] "Data-dependent" spelled with dash consistently --- docs/design/specs/Memory-model.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index a0d9f29e083715..6e00d0b0e76705 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -112,7 +112,7 @@ Memory ordering honors data dependency. When performing indirect reads from a lo ## Object assignment Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata. The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (i.e. method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. 
-The reading thread does not need to perform an acquiring read before accessing the content of an instance since all supported platforms honor ordering of data-dependent reads. +The reading thread does not need to perform an acquiring read before accessing the content of an instance since runtime guarantees ordering of data-dependent reads. However, the ordering sideeffects of reference assignment should not be used for general ordering purposes because: - ordinary reference assignments are still treated as ordinary assignments and could be reordered by the compiler. @@ -143,8 +143,8 @@ That applies even for locations targeted by overlapping aligned reads and writes * It is possible to implement release consistency memory model. Either the platform defaults to release consistency or stronger (i.e. x64 is TSO, which is stronger), or provides means to implement release consistency via fencing operations. -* It is possible to guarantee ordering of data dependent reads. -Either the platform honors data dependedncy by default (all currently supported platforms), or provides means to order data dependent reads via fencing operations. +* It is possible to guarantee ordering of data-dependent reads. +Either the platform honors data dependedncy by default (all currently supported platforms), or provides means to order data-dependent reads via fencing operations. ## Examples and common patterns The following examples work correctly on all supported implementations of .NET runtime regardless of the target OS or architecture. @@ -182,7 +182,7 @@ void ThreadFunc2() // accessing members of the local object is safe because // - reads cannot be introduced, thus localObj cannot be re-read and become null // - publishing assignment to obj will not become visible earlier than write operations in the MyClass constructor - // - indirect accesses via an instance are dependent reads, thus we will see results of constructor's writes + // - indirect accesses via an instance are data-dependent reads, thus we will see results of constructor's writes System.Console.WriteLine(localObj.ToString()); } } From 615c4345ae5d7530a6c8dd5bf0edded89e87f574 Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Mon, 26 Sep 2022 20:43:41 -0700 Subject: [PATCH 23/33] Apply suggestions from code review Co-authored-by: Jan Kotas --- docs/design/specs/Memory-model.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 6e00d0b0e76705..afc148c097bd17 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -106,7 +106,7 @@ The actual implementation may vary depending on the platform. For example interr Methods decorated with ```MethodImpl(MethodImplOptions.Synchronized)``` attribute have the same memory access semantics as if a lock is acquired at an entrance to the method and released upon leaving the method. ## Data-dependent reads are ordered -Memory ordering honors data dependency. When performing indirect reads from a location derived from a reference, it is guaranteed that reading of the data will not happen ahead of obtaining the reference. +Memory ordering honors data dependency. When performing indirect reads from a location derived from a reference, it is guaranteed that reading of the data will not happen ahead of obtaining the reference. This guarantee applies to both managed references and unmanaged pointers. 
**Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. ## Object assignment From 20b6b528b34279bb64e7331ddf81368ae14983ba Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Fri, 30 Sep 2022 17:47:57 -0700 Subject: [PATCH 24/33] Apply suggestions from code review Co-authored-by: Aaron Robinson --- docs/design/specs/Memory-model.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index afc148c097bd17..48f0f03d6a775f 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -40,15 +40,15 @@ These facilities ensure fault-free access to potentially unaligned locations, bu As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Passing non-ordinary memory to the runtime by the means of pointer operations or native interop results in Undefined Behavior. -## Sideeffects and optimizations of memory accesses -.NET runtime assumes that the sideeffects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** +## Side-effects and optimizations of memory accesses +.NET runtime assumes that the side-effects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** As a consequence: * Speculative writes are not allowed. * Reads cannot be introduced. * Unused reads can be elided. -* Adjacent nonvolatile reads from the same location can be coalesced. -* Adjacent nonvolatile writes to the same location can be coalesced. +* Adjacent non-volatile reads from the same location can be coalesced. +* Adjacent non-volatile writes to the same location can be coalesced. ## Thread-local memory accesses It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to perform further optimizations such as duplicating or removal of memory accesses. @@ -111,10 +111,10 @@ Memory ordering honors data dependency. When performing indirect reads from a lo ## Object assignment Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata. -The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (i.e. method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. +The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (for example, method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. The reading thread does not need to perform an acquiring read before accessing the content of an instance since runtime guarantees ordering of data-dependent reads. 
-However, the ordering sideeffects of reference assignment should not be used for general ordering purposes because: +However, the ordering side-effects of reference assignment should not be used for general ordering purposes because: - ordinary reference assignments are still treated as ordinary assignments and could be reordered by the compiler. - an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads. @@ -124,24 +124,24 @@ There was a lot of ambiguity around the guarantees provided by object assignment .NET runtime does not specify any ordering effects to the instance constructors. ## Static constructors -All side effects of static constructor execution will become observable not later than effects of accessing any member of the type. Other member methods of the type, when invoked, will observe complete results of the type's static constructor execution. +All side-effects of static constructor execution will become observable no later than effects of accessing any member of the type. Other member methods of the type, when invoked, will observe complete results of the type's static constructor execution. ## Hardware considerations Currently supported implementations of .NET runtime and system libraries make a few expectations about the hardware memory model. These conditions are present on all supported platforms and transparently passed to the user of the runtime. The future supported platforms will likely support these as well because the large body of preexisting software will make it burdensome to break common assumptions. * Naturally aligned reads and writes with sizes up to the platform pointer size are atomic. That applies even for locations targeted by overlapping aligned reads and writes of different sizes. -**Example:** a read of a 4-byte aligned int32 variable will yield a value that existed prior some write or after some write. It will never be a mix of before/after bytes. +**Example:** a read of a 4-byte aligned int32 variable will yield a value that existed prior to some write or after some write. It will never be a mix of before/after bytes. * The memory is cache-coherent and writes to a single location will be seen by all cores in the same order (multicopy atomic). -**Example:** when the same location is updated with values in ascending order (like 1,2,3,4,...), no observer will see a descending sequence. +**Example:** when the same location is updated with values in ascending order (for example, 1,2,3,4,...), no observer will see a descending sequence. * It may be possible for a thread to see its own writes before they appear to other cores (store buffer forwarding), as long as the single-thread consistency is not violated. -* The memory managed by the runtime is ordinary memory (not device register file or the like) and the only sideeffects of memory operations are storing and reading of values. +* The memory managed by the runtime is ordinary memory (not device register file or the like) and the only side-effects of memory operations are storing and reading of values. * It is possible to implement release consistency memory model. -Either the platform defaults to release consistency or stronger (i.e. x64 is TSO, which is stronger), or provides means to implement release consistency via fencing operations. +Either the platform defaults to release consistency or stronger (that is, x64 is TSO, which is stronger), or provides means to implement release consistency via fencing operations. 
* It is possible to guarantee ordering of data-dependent reads. Either the platform honors data dependedncy by default (all currently supported platforms), or provides means to order data-dependent reads via fencing operations. From 55793439763d6459590f1d02016916f2be43214b Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Wed, 2 Nov 2022 19:19:05 -0700 Subject: [PATCH 25/33] Apply suggestions from code review Co-authored-by: Bruce Forstall --- docs/design/specs/Memory-model.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 48f0f03d6a775f..171b1363e50e81 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -30,7 +30,7 @@ The following methods perform atomic memory accesses regardless of the platform ## Unmanaged memory access As unmanaged pointers can point to any addressable memory, operations with such pointers may violate guarantees provided by the runtime and expose undefined or platform-specific behavior. -**Example:** memory accesses through pointers which are *not properly aligned* may be not atomic or cause faults depending on the platform and hardware configuration. +**Example:** memory accesses through pointers whose target address is *not properly aligned* to the data access size may be not atomic or cause faults depending on the platform and hardware configuration. Although rare, unaligned access is a realistic scenario and thus there is some limited support for unaligned memory accesses, such as: * `unaligned.` IL prefix @@ -41,7 +41,7 @@ These facilities ensure fault-free access to potentially unaligned locations, bu As of this writing there is no specific support for operating with incoherent memory, device memory or similar. Passing non-ordinary memory to the runtime by the means of pointer operations or native interop results in Undefined Behavior. ## Side-effects and optimizations of memory accesses -.NET runtime assumes that the side-effects of memory reads and writes include only changing and observing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** +.NET runtime assumes that the side-effects of memory reads and writes include only observing and changing values at specified memory locations. This applies to all reads and writes - volatile or not. **This is different from ECMA model.** As a consequence: * Speculative writes are not allowed. From 31431ed16d53e9e136f844f30d174c6242584318 Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Wed, 2 Nov 2022 19:23:52 -0700 Subject: [PATCH 26/33] order of object assignment and data-dependent memory accesses --- docs/design/specs/Memory-model.md | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 171b1363e50e81..3fb8b8133134cb 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -107,15 +107,28 @@ Methods decorated with ```MethodImpl(MethodImplOptions.Synchronized)``` attribut ## Data-dependent reads are ordered Memory ordering honors data dependency. When performing indirect reads from a location derived from a reference, it is guaranteed that reading of the data will not happen ahead of obtaining the reference. This guarantee applies to both managed references and unmanaged pointers. 
+ **Example:** reading a field, will not use a cached value fetched from the location of the field prior obtaining a reference to the instance. + ```cs +var x = nonlocal.a.b; +var y = nonlocal.a; +var z = y.b; + +// cannot have execution order as: + +var x = nonlocal.a.b; +var y = nonlocal.a; +var z = x; +``` ## Object assignment -Object assignment to a location potentially accessible by other threads is a release with respect to write operations to the instance’s fields and metadata. +Object assignment to a location potentially accessible by other threads is a release with respect to accesses to the instance’s fields/elements and metadata. An optimizing compiler must preserve the order of object assignment and data-dependent memory accesses. + The motivation is to ensure that storing an object reference to shared memory acts as a "committing point" to all modifications that are reachable through the instance reference. It also guarantees that a freshly allocated instance is valid (for example, method table and necessary flags are set) when other threads, including background GC threads are able to access the instance. The reading thread does not need to perform an acquiring read before accessing the content of an instance since runtime guarantees ordering of data-dependent reads. -However, the ordering side-effects of reference assignment should not be used for general ordering purposes because: -- ordinary reference assignments are still treated as ordinary assignments and could be reordered by the compiler. +The ordering side-effects of reference assignment should not be used for general ordering purposes because: +- independent nonvolatile reference assignments could be reordered by the compiler. - an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads. There was a lot of ambiguity around the guarantees provided by object assignments. Going forward the runtimes will only provide the guarantees described in this document. From 8b23a886837eb10694ec9bf446e5922260024c2a Mon Sep 17 00:00:00 2001 From: Vladimir Sadov Date: Thu, 15 Dec 2022 19:58:05 -0800 Subject: [PATCH 27/33] Listed primitive types. --- docs/design/specs/Memory-model.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md index 3fb8b8133134cb..eddd85c02b29a9 100644 --- a/docs/design/specs/Memory-model.md +++ b/docs/design/specs/Memory-model.md @@ -16,7 +16,9 @@ Native-sized integer types and pointers have alignment that matches their size o The alignment of fields is not guaranteed when `FieldOffsetAttribute` is used to explicitly adjust field offsets. ## Atomic memory accesses -Memory accesses to *properly aligned* data of primitive types are always atomic. The value that is observed is always a result of complete read and write operations. +Memory accesses to *properly aligned* data of primitive and Enum types with size with sizes up to the platform pointer size are always atomic. The value that is observed is always a result of complete read and write operations. + +Primitive types: bool, char, int8, uint8, int16, uint16, int32, uint32, int64, uint64, float32, float64, native int, native unsigned int. Values of unmanaged pointers are treated as native integer primitive types. Memory accesses to *properly aligned* values of unmanaged pointers are atomic. 
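As a rough illustration of these atomicity rules (the `Counter` type below is invented for the example): a plain read of a 64-bit field is guaranteed atomic only where the pointer size is 64 bits, while `Interlocked` and `Volatile` accesses are atomic on every platform.

```cs
using System.Threading;

public class Counter
{
    // 64-bit field: ordinary reads and writes are guaranteed atomic only on 64-bit platforms.
    private long _value;

    public void Add(long amount)
    {
        // Atomic read-modify-write on every platform.
        Interlocked.Add(ref _value, amount);
    }

    public long ReadMaybeTorn()
    {
        // On a 32-bit platform this ordinary read may be split into two 32-bit reads
        // and observe a mix of two different writes (a torn value).
        return _value;
    }

    public long ReadAtomic()
    {
        // Interlocked.Read (or Volatile.Read) reads the whole 64-bit value atomically
        // regardless of the platform's pointer size.
        return Interlocked.Read(ref _value);
    }
}
```

The same reasoning is why the document calls out `Volatile.Read(ref location)` as atomic on 32-bit platforms even though an ordinary read of the same 64-bit location may not be.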
From e02a5df9390fc1265e66a11e1384717c0216a9ea Mon Sep 17 00:00:00 2001
From: Vladimir Sadov
Date: Thu, 15 Dec 2022 20:21:33 -0800
Subject: [PATCH 28/33] Briefly explained motivations for the treatment of memory access sideeffects.

---
 docs/design/specs/Memory-model.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md
index eddd85c02b29a9..c8f10898b29ada 100644
--- a/docs/design/specs/Memory-model.md
+++ b/docs/design/specs/Memory-model.md
@@ -48,10 +48,16 @@ As of this writing there is no specific support for operating with incoherent me
 As a consequence:
 * Speculative writes are not allowed.
 * Reads cannot be introduced.
-* Unused reads can be elided.
+* Unused reads can be elided. (note: if a read can cause a fault it is not "unused")
 * Adjacent non-volatile reads from the same location can be coalesced.
 * Adjacent non-volatile writes to the same location can be coalesced.

+The practical motivations for these rules are:
+- We can't allow speculative writes as we consider changing the value to be observable, thus the effects of a speculative write may not be possible to undo.
+- A read cannot be re-done, since it could fetch a different value and thus introduce a data race that the program did not have.
+- Reading from a variable and not observing side-effects of the read is the same as not performing a read, thus unused reads can be removed.
+- Coalescing of adjacent ordinary memory accesses to the same location is ok because most programs do not rely on the presence of data races; thus, unlike introducing data races, removing them is ok. Programs that do rely on observing data races shall use `volatile` accesses.
+
 ## Thread-local memory accesses
 It may be possible for an optimizing compiler to prove that some data is accessible only by a single thread. In such case it is permitted to perform further optimizations such as duplicating or removal of memory accesses.

From 280014d5cd118a4aa418a83c2870f7081a244cb8 Mon Sep 17 00:00:00 2001
From: Vladimir Sadov
Date: Fri, 16 Dec 2022 00:58:37 -0800
Subject: [PATCH 29/33] Update docs/design/specs/Memory-model.md

Co-authored-by: Jan Kotas
---
 docs/design/specs/Memory-model.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md
index c8f10898b29ada..bd8552eff98140 100644
--- a/docs/design/specs/Memory-model.md
+++ b/docs/design/specs/Memory-model.md
@@ -293,5 +293,4 @@ internal class Program
         System.Console.WriteLine("done");
     }
 }
-
 ```

From 608f45d69bc8e1a0cd0f856f570192f856acb9b9 Mon Sep 17 00:00:00 2001
From: Vladimir Sadov
Date: Fri, 16 Dec 2022 11:12:49 -0800
Subject: [PATCH 30/33] Link to the data-dependent accesses and compiler optimizations followup issue.

---
 docs/design/specs/Memory-model.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md
index bd8552eff98140..9e73b196fee7f9 100644
--- a/docs/design/specs/Memory-model.md
+++ b/docs/design/specs/Memory-model.md
@@ -139,7 +139,9 @@ The ordering side-effects of reference assignment should not be used for general
 - independent nonvolatile reference assignments could be reordered by the compiler.
 - an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads.

-There was a lot of ambiguity around the guarantees provided by object assignments. Going forward the runtimes will only provide the guarantees described in this document.
+There was a lot of ambiguity around the guarantees provided by object assignments. Going forward the runtimes will only provide the guarantees described in this document.
+
+_It is beleived that compiler optimizations do not violate the ordering guarantees in sections about [data-dependent reads](~#data-dependent-reads-are-ordered) and [object assignments]([~#object-assignment), but further investigations are needed to ensure compliance or to fix possible violations. That is tracked by the following issue:_ https://github.com/dotnet/runtime/issues/79764

 ## Instance constructors
 .NET runtime does not specify any ordering effects to the instance constructors.

From ed89b0473ccaae0d0d73868947a5dc5cf1c5fb83 Mon Sep 17 00:00:00 2001
From: Vladimir Sadov
Date: Fri, 16 Dec 2022 11:15:08 -0800
Subject: [PATCH 31/33] removed unnecessary `[`

---
 docs/design/specs/Memory-model.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md
index 9e73b196fee7f9..c38ac87b217153 100644
--- a/docs/design/specs/Memory-model.md
+++ b/docs/design/specs/Memory-model.md
@@ -141,7 +141,7 @@ The ordering side-effects of reference assignment should not be used for general

 There was a lot of ambiguity around the guarantees provided by object assignments. Going forward the runtimes will only provide the guarantees described in this document.

-_It is beleived that compiler optimizations do not violate the ordering guarantees in sections about [data-dependent reads](~#data-dependent-reads-are-ordered) and [object assignments]([~#object-assignment), but further investigations are needed to ensure compliance or to fix possible violations. That is tracked by the following issue:_ https://github.com/dotnet/runtime/issues/79764
+_It is beleived that compiler optimizations do not violate the ordering guarantees in sections about [data-dependent reads](~#data-dependent-reads-are-ordered) and [object assignments](~#object-assignment), but further investigations are needed to ensure compliance or to fix possible violations. That is tracked by the following issue:_ https://github.com/dotnet/runtime/issues/79764

 ## Instance constructors
 .NET runtime does not specify any ordering effects to the instance constructors.

From 136e63799cf612e7d353353f10d3347404789836 Mon Sep 17 00:00:00 2001
From: Jan Kotas
Date: Fri, 16 Dec 2022 11:16:59 -0800
Subject: [PATCH 32/33] Update docs/design/specs/Memory-model.md

---
 docs/design/specs/Memory-model.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md
index c38ac87b217153..d141ecb664a736 100644
--- a/docs/design/specs/Memory-model.md
+++ b/docs/design/specs/Memory-model.md
@@ -141,7 +141,7 @@ The ordering side-effects of reference assignment should not be used for general

 There was a lot of ambiguity around the guarantees provided by object assignments. Going forward the runtimes will only provide the guarantees described in this document.

-_It is beleived that compiler optimizations do not violate the ordering guarantees in sections about [data-dependent reads](~#data-dependent-reads-are-ordered) and [object assignments](~#object-assignment), but further investigations are needed to ensure compliance or to fix possible violations. That is tracked by the following issue:_ https://github.com/dotnet/runtime/issues/79764
+_It is believed that compiler optimizations do not violate the ordering guarantees in sections about [data-dependent reads](~#data-dependent-reads-are-ordered) and [object assignments](~#object-assignment), but further investigations are needed to ensure compliance or to fix possible violations. That is tracked by the following issue:_ https://github.com/dotnet/runtime/issues/79764

 ## Instance constructors
 .NET runtime does not specify any ordering effects to the instance constructors.

From 34a074de4b91e856be9c65bed8823bce4060757c Mon Sep 17 00:00:00 2001
From: Vladimir Sadov
Date: Fri, 16 Dec 2022 11:21:37 -0800
Subject: [PATCH 33/33] Trailing whitespace

---
 docs/design/specs/Memory-model.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/design/specs/Memory-model.md b/docs/design/specs/Memory-model.md
index d141ecb664a736..f184ac222f6162 100644
--- a/docs/design/specs/Memory-model.md
+++ b/docs/design/specs/Memory-model.md
@@ -139,7 +139,7 @@ The ordering side-effects of reference assignment should not be used for general
 - independent nonvolatile reference assignments could be reordered by the compiler.
 - an optimizing compiler can omit the release semantics if it can prove that the instance is not shared with other threads.

-There was a lot of ambiguity around the guarantees provided by object assignments. Going forward the runtimes will only provide the guarantees described in this document. 
+There was a lot of ambiguity around the guarantees provided by object assignments. Going forward the runtimes will only provide the guarantees described in this document.

 _It is believed that compiler optimizations do not violate the ordering guarantees in sections about [data-dependent reads](~#data-dependent-reads-are-ordered) and [object assignments](~#object-assignment), but further investigations are needed to ensure compliance or to fix possible violations. That is tracked by the following issue:_ https://github.com/dotnet/runtime/issues/79764
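To illustrate the object-assignment guarantee discussed above, here is a small sketch (the `Payload` and `PublicationDemo` names are illustrative and not taken from the specification). Publishing the reference is the committing point for the writes made to the instance, while an unrelated non-volatile flag gets no ordering from the publication:

```cs
using System;
using System.Threading;

// Sketch of the "object assignment is a release" guarantee: once a reader sees a
// non-null s_shared, the writes to Message and Number made before the reference was
// published are also visible, with no explicit acquire on the reader side.
internal sealed class Payload
{
    public string Message;
    public int Number;
}

internal static class PublicationDemo
{
    private static Payload s_shared;    // published via an ordinary reference assignment
    private static bool s_unrelated;    // not covered by the publication guarantee

    private static void Main()
    {
        var writer = new Thread(() =>
        {
            var p = new Payload { Message = "ready", Number = 42 };

            s_unrelated = true; // unrelated write: make it volatile if its ordering matters to readers

            s_shared = p;       // committing point: p's fields and method table become visible
                                // to any thread that observes this reference
        });

        var reader = new Thread(() =>
        {
            Payload p;
            // Volatile.Read keeps the polling read from being coalesced away;
            // it is not what makes the fields visible.
            while ((p = Volatile.Read(ref s_shared)) == null)
            {
                Thread.Yield();
            }

            // Reads of p.Message and p.Number are data-dependent on the read of p,
            // so they cannot observe the pre-publication state of the instance.
            Console.WriteLine($"{p.Message} {p.Number}");
        });

        writer.Start();
        reader.Start();
        writer.Join();
        reader.Join();
    }
}
```

The reader needs no explicit acquire to see `Message` and `Number` once it holds the reference, because data-dependent reads are ordered; the `Volatile.Read` in the polling loop is only there so the read of the shared location is not optimized away.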