This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Commit

Documentation naming fixes, spelling fixes, line length standardization. (#1836)

* Documentation naming fixes, spelling fixes, line length standardization.

* Fixed workitems copy issue
derekbjohnson authored and ahsonkhan committed Oct 10, 2017
1 parent 4446c23 commit ca88785
Showing 9 changed files with 375 additions and 222 deletions.
10 changes: 5 additions & 5 deletions docs/roadmap.md
# High-Performance Data Pipelines

Our goal is to optimize .NET much more for scenarios in which inefficiencies are
directly tied to your monthly billing. With ASP.NET Core, we've already
[improved significantly][TechEmpower13] and are now in the top 10 for the plain
text benchmark. But we believe there is still a lot more potential that we could
Consider an example for a typical web request:

![](./img/pipeline.png)

Most cloud apps are parallelized by running multiple requests at the same time
while each request is often executed as a single chain. This results in the
picture above where all components are daisy-chained. That means slowing down
one component will slow down the entire request.

…we're seeking to improve is scalability:
>
> --- [Wikipedia](https://en.wikipedia.org/wiki/Scalability)

Many areas affect scale, but an efficient request/response pipeline is key as
it's the backbone for all cloud solutions.

Other investments (faster GC, better JIT, AOT) aren't a replacement, but will
provide additional benefits.

## Current areas of concern
18 changes: 12 additions & 6 deletions docs/specs/encoding.md
# Encoding

The existing .NET encoding APIs (System.Text.Encoding) don't work well in
non-allocating data pipelines:

1. They allocate output arrays (as opposed to taking output buffers as
parameters); a short demonstration follows this list.
2. They don't work with Span\<T\>.
3. They have the overhead of virtual calls (which might be significant when
transcoding small slices).
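
As a quick demonstration of issue 1, added here for illustration and using only
the long-standing System.Text.Encoding API:

```c#
using System;
using System.Text;

class EncodingAllocationDemo
{
    static void Main()
    {
        // GetBytes returns a freshly allocated byte[] on every call, even if
        // the caller already holds a reusable buffer it would rather fill.
        byte[] utf8 = Encoding.UTF8.GetBytes("hello");
        Console.WriteLine(utf8.Length); // prints 5
    }
}
```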

The requirements for the new APIs are as follows:

1. Transcode bytes contained in ReadOnlySpan<byte> into a passed-in Span<byte>.
2. The API needs to handle running out of space in the output buffer and then
continuing when an additional output buffer is passed in.
3. The API needs to be fast (on par with native code implementations)
4. Needs to do this with zero GC allocations
5. Needs to be stateless (usable from multiple threads)
6. We need to support transcoding between UTF8, UTF16LE, ISO8859-1, and be able
to support other encodings in the future.

This document will describe these issues in detail and propose new APIs better
suited for data pipelines.
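
To make the requirements concrete, below is a minimal sketch (not part of this
commit or the existing spec) of a resumable, allocation-free transcoder for the
simplest supported pair, ISO8859-1 to UTF8. The type and member names are
invented for illustration.

```c#
using System;

// Invented names for illustration; the real proposal may differ.
public enum TranscodeStatus { Done, DestinationTooSmall }

public static class Latin1Transcoder
{
    // ISO8859-1 -> UTF8. Stateless and allocation-free: progress is reported
    // through the out parameters, so the caller can resume by re-slicing its
    // input and supplying a fresh output buffer (requirement 2).
    public static TranscodeStatus ToUtf8(
        ReadOnlySpan<byte> source, Span<byte> destination,
        out int bytesConsumed, out int bytesWritten)
    {
        int written = 0;
        for (int i = 0; i < source.Length; i++)
        {
            byte b = source[i];
            int needed = b < 0x80 ? 1 : 2;
            if (written + needed > destination.Length)
            {
                bytesConsumed = i;
                bytesWritten = written;
                return TranscodeStatus.DestinationTooSmall;
            }
            if (b < 0x80)
            {
                destination[written++] = b;
            }
            else
            {
                // ISO8859-1 code points equal their byte values, so every
                // non-ASCII byte becomes a two-byte UTF8 sequence.
                destination[written++] = (byte)(0xC0 | (b >> 6));
                destination[written++] = (byte)(0x80 | (b & 0x3F));
            }
        }
        bytesConsumed = source.Length;
        bytesWritten = written;
        return TranscodeStatus.Done;
    }
}
```

A caller that receives DestinationTooSmall slices `source` by `bytesConsumed`
and calls again with a new destination, satisfying requirements 2, 4, and 5
without any virtual dispatch.
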
103 changes: 61 additions & 42 deletions docs/specs/memory.md
…to the `Span<T>`. `Memory<T>` does not provide such a guarantee.
```c#
byte[] array = _pool.Rent(size);
var buffer = new Memory<byte>(array);
DoSomething(buffer); // so DoSomething could store the buffer in a static,
                     // for example.
_pool.Return(array); // if we return it here, we risk use-after-free bugs.
```


## Bonus Features

The design based on OwnedMemory\<T\> has several limitations that we might want
to address. The limitations and potential solutions are described in this
section.

### Pooling OwnedMemory\<T\>

A new OwnedMemory\<T\> instance must be allocated for every revocation, i.e.
although buffers can be pooled, the instances of OwnedMemory\<T\> cannot. The
instances cannot be pooled because they cannot be reused (resurrected). If
OwnedMemory\<T\> were resurrected, all previous Memory\<T\> values that were
supposed to be invalidated when the memory was returned to the pool
would point to the new, reused OwnedMemory\<T\>.
```c#
// pseudocode
OwnedMemory<byte> owned1 = s_stack.Pop(); // rent from pool
Memory<byte> memory = owned1.Memory;
owned1.Dispose();
s_stack.Push(owned1); // return to pool
OwnedMemory<byte> owned2 = s_stack.Pop(); // rent again
owned2.Resurrect();
memory.Span[0] = 0; // this should fail; owned1 was returned to the pool
```
We could address this limitation by storing a unique ID in OwnedMemory\<T\> when
it's rented from a pool, and giving this ID to all Memory\<T\> instances created
from this OwnedMemory\<T\>. All Memory\<T\> instances would have to pass this ID
to OwnedMemory\<T\> when requesting operations that access the memory.
The IDs would have to match for the operation to succeed.
The ID in OwnedMemory\<T\> would be changed when OwnedMemory\<T\> is disposed
and returned to the pool, at which point all existing Memory\<T\> instances
could not access it anymore.
```c#
public struct Memory<T> {
    OwnedMemory<T> _owner;
    int _id;
    int _index;
    int _length;

    public Span<T> Span => _owner.GetSpanInternal(_id).Slice(_index, _length);
}

public class OwnedMemory<T> {
    int _id = GenerateNewId();

    public Memory<T> Memory => new Memory<T>(this, _id);

    internal Span<T> GetSpanInternal(int id) {
        if (_id != id) ThrowIdHelper();
        return Span;
    }

    public void Dispose() => _id = int.MinValue;
    public void Resurrect() => _id = GenerateNewId();
}
```
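
Under this scheme, the pooled-rental sequence shown earlier fails safely. A
pseudocode sketch using the same invented helpers:

```c#
// pseudocode
OwnedMemory<byte> owned = s_stack.Pop(); // rent from pool
Memory<byte> memory = owned.Memory;      // memory captures the current ID
owned.Dispose();                         // the ID is invalidated
s_stack.Push(owned);                     // return to pool
owned.Resurrect();                       // the next rental gets a fresh ID
memory.Span[0] = 0;                      // fails: the captured ID no longer matches
```
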
The main negatives of this approach are additional complexity and larger size of
Memory<T> instances. The current prototype in the corefxlab repo implements this
feature: https://github.com/dotnet/corefxlab/blob/master/src/System.Slices/System/Buffers/Memory.cs#L14

### IOwnedMemory\<T\>

OwnedMemory\<T\> is a wrapper over actual memory buffers.
When wrapping T[], System.String, and other reference type buffers,
we end up with two managed objects for each instance of OwnedMemory\<T\>.
If OwnedMemory\<T\> were an interface, the interface could be implemented
directly by types representing memory buffers, and such types could directly
interoperate with Memory\<T\> (and so Span\<T\>).
```c#
// pseudocode
public class String : IOwnedMemory<char> {
    // ...
}
```

A proof of concept of such a feature is implemented in https://github.com/Krzyszto…

This feature has the following tradeoffs:

1. (good) It allows reference types that cannot inherit from OwnedMemory\<T\> to
directly interoperate with Memory\<T\> without the need to allocate wrappers.
2. (bad) This feature does not work with types that need object pooling support,
as there is no indirection between Memory\<T\> and IOwnedMemory\<T\> instances.
3. (ugly) The feature makes some of the hot path method calls (in particular
IOwnedMemory\<T\>.GetSpan) virtual/interface dispatch. We have some anecdotal
evidence this is unacceptable in some performance critical scenarios.
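
The spec leaves the interface's members implicit; as a rough sketch of the
shape implied above (the member names are guesses, not the proposed API):

```c#
using System;

public interface IOwnedMemory<T>
{
    int Length { get; }
    Span<T> GetSpan(); // the hot-path call that becomes interface dispatch
}
```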

### Safe Dispose

```c#
var owned = new OwnedNativeMemory(1024);
var memory = owned.Memory;
var span1 = memory.Span; // of course works as it should
Task.Run(() => {
    var span2 = memory.Span;
    // the following line is unsafe if the Dispose call (below) executes at
    // this point.
    span2[0] = 0;
});
owned.Dispose();
var span3 = memory.Span; // fails as it should, because the memory instance is
                         // now pointing to a disposed OwnedMemory<T>
```
In other words, the design is subject to a race condition when one thread
creates a Span\<T\> before its OwnedMemory\<T\> is disposed but accesses it
after the dispose.

We could address this issue by reference counting Span\<T\> when it is created
from Memory\<T\>. When OwnedMemory\<T\>.Dispose is called, we would check the
reference count, and fail the call if the count was positive.
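
A minimal sketch of the idea (invented names; a real design would route span
acquisition through Memory\<T\>.Span): every acquisition bumps a counter, and
Dispose refuses to run while the counter is positive.

```c#
using System;
using System.Threading;

public class CountedOwner : IDisposable
{
    private byte[] _array = new byte[1024];
    private int _refCount;

    // Every span acquisition costs an interlocked increment, and the matching
    // Release costs a decrement; this is the overhead discussed below.
    public Span<byte> AcquireSpan()
    {
        Interlocked.Increment(ref _refCount);
        return _array;
    }

    public void Release() => Interlocked.Decrement(ref _refCount);

    public void Dispose()
    {
        if (Volatile.Read(ref _refCount) > 0)
            throw new InvalidOperationException("span references still outstanding");
        _array = Array.Empty<byte>(); // revoke the buffer
    }
}
```

Note that the caller must pair every AcquireSpan with a Release, which is
exactly the leak hazard listed below as drawback 2.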

But reference counting comes with all its drawbacks:

1. It significantly impacts performance. In our case, every Memory\<T\>.Span
call would result in reference counting.
2. Unless we implement automatic reference counting, we would be risking leaks
resulting from hand-written code not releasing the counts.

To illustrate, let's consider the following method that formats an int into a
memory buffer:
```c#
public static bool Append(this Formatter formatter, int value) {
    Memory<byte> memory = formatter.Buffer;
    // ...
}
```
A real example of such a routine can be found at:
https://github.com/dotnet/corefxlab/blob/master/src/System.Text.Formatting/System/Text/Formatting/IOutputExtensions.cs#L152
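
Since the diff view truncates the routine above, here is one plausible,
self-contained completion. The `Formatter` type and its members are stand-ins
invented for this sketch, and `Utf8Formatter.TryFormat` (a real
System.Buffers.Text API) stands in for whatever helper the linked corefxlab
code actually uses.

```c#
using System;
using System.Buffers.Text;

public class Formatter
{
    private byte[] _backing = new byte[256];
    private int _used;

    public Memory<byte> Buffer =>
        new Memory<byte>(_backing, _used, _backing.Length - _used);

    public void Advance(int count) => _used += count;
}

public static class FormatterExtensions
{
    // The un-counted version: retrieving memory.Span is a plain access with
    // no reference-count traffic.
    public static bool Append(this Formatter formatter, int value)
    {
        Memory<byte> memory = formatter.Buffer;
        Span<byte> buffer = memory.Span;

        if (!Utf8Formatter.TryFormat(value, buffer, out int bytesWritten))
            return false; // out of space: the caller must supply a bigger buffer

        formatter.Advance(bytesWritten);
        return true;
    }
}
```
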
If we reference counted all calls to retrieve Span\<T\> from Memory\<T\>, the
routine would look like the following:
```c#
public static bool Append(this Formatter formatter, int value) {
    Memory<byte> memory = formatter.Buffer;
    // ... (the elided lines acquire and release a reference count around the
    // Span access)
    formatter.Advance(bytesWritten);
}
```
This would result in expensive reference count manipulation for every integer
being written.

Alternatively, formatters (and other types that need to access Span\<T\>) could
increment the reference count when they are instantiated, but then users of such
formatters would have to ensure that the formatters are disposed properly or
risk memory leaks, or we would need C# to support ARC at least for some special
types (e.g. ref-like/stack-only types).

## Issues/To-Design
1. Do we want the "pooling ID"? (see section above)
