Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address some System.Formats.Tar TODOs (infra and syscalls) #69107

Merged
merged 22 commits into from
May 17, 2022

Conversation

carlossanlop
Copy link
Member

Contributes to #68230

Here's the first batch of pending tasks and feedback from the original PR.

  • Fix Design time build errors by removing src project as dependency of the test project.
  • Add Browser to target platform identifiers. Ensure Browser can consume the Unix specific ArchivingUtils file too.
  • Remove nullable enable from csproj since it's now default. cc @eerhardt
  • Use FileStream constructor with FileMode.CreateNew to detect and throw if destination file exists when creating archive.
  • No error checking on syscalls that do not set errno. cc @tmds
  • Add RDev field in FileStatus and retrieve it with stat/lstat so devmajor and devminor get their correct values. cc @tmds
  • Simplify some File.Open calls with direct FileStream constructor calls. Simplify FileStreamOptions objects too.
  • Implement and consume p/invokes to retrieve uname from uid and gname from gid. cc @tmds

@ghost
Copy link

ghost commented May 10, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Contributes to #68230

Here's the first batch of pending tasks and feedback from the original PR.

  • Fix Design time build errors by removing src project as dependency of the test project.
  • Add Browser to target platform identifiers. Ensure Browser can consume the Unix specific ArchivingUtils file too.
  • Remove nullable enable from csproj since it's now default. cc @eerhardt
  • Use FileStream constructor with FileMode.CreateNew to detect and throw if destination file exists when creating archive.
  • No error checking on syscalls that do not set errno. cc @tmds
  • Add RDev field in FileStatus and retrieve it with stat/lstat so devmajor and devminor get their correct values. cc @tmds
  • Simplify some File.Open calls with direct FileStream constructor calls. Simplify FileStreamOptions objects too.
  • Implement and consume p/invokes to retrieve uname from uid and gname from gid. cc @tmds
Author: carlossanlop
Assignees: carlossanlop
Labels:

area-System.IO

Milestone: 7.0.0

entry._header._uName = "";// Interop.Sys.GetUName();
entry._header._gName = "";// Interop.Sys.GetGName();
entry._header._uName = Interop.Sys.GetUName(status.Uid) ?? string.Empty;
entry._header._gName = Interop.Sys.GetGName(status.Gid) ?? string.Empty;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to cache these in a Dictionary to avoid looking them up each time over?

And, I think the user may want to be able to control the the user id and name that get stored in the tar file.

I think it would be meaningful if the API had option classes that control the creation behavior, and extraction behavior. These classes can be extended later on, rather than adding new overloads to the existing methods.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to cache these in a Dictionary to avoid looking them up each time over?

Good point. If I already know the uname and gname associated to a uid and gid, there's no need to retrieve them again.

And, I think the user may want to be able to control the the user id and name that get stored in the tar file.

This code path is only reached if the user calls TarWriter.WriteEntry(string fileName, string? entryName). If they wish to choose the uname, gname, uid, gid to other values than the default, they can create the TarEntry instance manually, fill out its fields, and use the other overload TarWriter.WriteEntry(TarEntry entry).

I think it would be meaningful if the API had option classes that control the creation behavior, and extraction behavior. These classes can be extended later on, rather than adding new overloads to the existing methods.

I would welcome that 🙂. We can discuss it in a separate issue.

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for addressing my comments @carlossanlop !

// Output buffer was too small, loop around again and try with a larger buffer.
outputBufferSize = buffer.Length * 2;

if (outputBufferSize > 256) // Upper limit allowed for login name in kernel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are limiting to 256 anyway, can't we just stackalloc byte[256] and use that? Why go through the pain of a loop and ArrayPool Rent/Return?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless it's explicitly documented (pretty much, there's some define -- eg NI_MAXHOST for getnameinfo()) I recommend that we use the stackalloc then resize loop, as in this example:

byte* stackBuf = stackalloc byte[DefaultPathBufferSize];
string? result = GetCwdHelper(stackBuf, DefaultPathBufferSize);
if (result != null)
{
return result;
}
// If that was too small, try increasing large buffer sizes
int bufferSize = DefaultPathBufferSize;
while (true)
{
checked { bufferSize *= 2; }
byte[] buf = ArrayPool<byte>.Shared.Rent(bufferSize);
try
{
fixed (byte* ptr = &buf[0])
{
result = GetCwdHelper(ptr, buf.Length);
if (result != null)
{
return result;
}
}
}
finally
{
ArrayPool<byte>.Shared.Return(buf);
}

I did find at least one case in the pal where it mallocs the buffer and resizes it for you, but it seems our pattern is to do all this in managed code.

Ideally it returns the actual length it needs, or you only loop on ERANGE. I'm not sure what is the pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use a similar loop in ReadLink:

internal static string? ReadLink(ReadOnlySpan<char> path)
{
int outputBufferSize = 1024;
// Use an initial buffer size that prevents disposing and renting
// a second time when calling ConvertAndTerminateString.
using var converter = new ValueUtf8Converter(stackalloc byte[1024]);
while (true)
{
byte[] buffer = ArrayPool<byte>.Shared.Rent(outputBufferSize);
try
{
int resultLength = Interop.Sys.ReadLink(
ref MemoryMarshal.GetReference(converter.ConvertAndTerminateString(path)),
buffer,
buffer.Length);
if (resultLength < 0)
{
// error
return null;
}
else if (resultLength < buffer.Length)
{
// success
return Encoding.UTF8.GetString(buffer, 0, resultLength);
}
}
finally
{
ArrayPool<byte>.Shared.Return(buffer);
}
// Output buffer was too small, loop around again and try with a larger buffer.
outputBufferSize = buffer.Length * 2;
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike the GetCwd and the ReadLink examples, which do not have an upper bound on the length of the buffer, the uname and gname case do have a max length. So I am now more inclined to just use the 256 byte limit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is that length documented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://serverfault.com/questions/294121/what-is-the-maximum-username-length-on-current-gnu-linux-systems

The above page says that:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The limit is implementation specific, so there should be a resize loop.
256 is a good value for the stackalloc to start with because that is the Linux value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the Linux kernel only cares about the numeric ids. libc handles the names.

https://github.com/bminor/glibc/blob/master/sysdeps/unix/sysv/linux/bits/local_lim.h#L89-L90 defines this (implementation specific) limit for Linux:

/* Maximum login name length.  This is arbitrary.  */
#define LOGIN_NAME_MAX		256

Copy link
Member Author

@carlossanlop carlossanlop May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's another upper limit. It's in the code example in the manual: https://pubs.opengroup.org/onlinepubs/9699919799/functions/getgrgid.html

long int initlen = sysconf(_SC_GETGR_R_SIZE_MAX);
size_t len;
if (initlen == -1)
    /* Default initial length. */
    len = 1024;
else
    len = (size_t) initlen;

I can use 1024 as the absolute upper limit that would stop the loop, I think that's large enough (that's the value I got when printing _SC_GETGR_R_SIZE_MAX in Ubuntu).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can use 1024 as the absolute upper limit that would stop the loop, I think that's large enough (that's the value I got when printing _SC_GETGR_R_SIZE_MAX in Ubuntu).

I don't think that's necessarily the upper limit, eg., this man page for getgrnam_r says

The call
sysconf(_SC_GETGR_R_SIZE_MAX)
returns either -1, without changing errno, or an initial
suggested size for buf. (If this size is too small, the call
fails with ERANGE, in which case the caller can retry with a
larger buffer.)

In the man page for getgrgid it says it in different words, again referring to getgrnam_r

A call to sysconf(_SC_GETGR_R_SIZE_MAX) returns either -1
without changing errno or an initial value suggested for the size
of this buffer.

Although I see Mono just used it (or 1024) and did not resize without provision for resize, here is a bug report where it was not large enough. https://bugs.ruby-lang.org/issues/9600

@carlossanlop carlossanlop changed the title Address some System.Formats.Tar TODOs Address some System.Formats.Tar TODOs (infra and syscalls) May 11, 2022
}

[LibraryImport(Libraries.SystemNative, EntryPoint = "SystemNative_GetGName", StringMarshalling = StringMarshalling.Utf8, SetLastError = true)]
private static unsafe partial string GetGNameInternal(uint uid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the StringMarshalling know that it should free the char* that comes back from the native method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From looking at the generated code in ILSpy, it looks like it does:

[LibraryImport(/*Could not decode attribute arguments.*/)]
[GeneratedCode("Microsoft.Interop.LibraryImportGenerator", "42.42.42.42")]
private unsafe static string GetGroupNameInternal(uint uid)
{
	//IL_000b: Unknown result type (might be due to invalid IL or missing references)
	byte* __retVal_gen_native = default(byte*);
	Utf8StringMarshaller __retVal_gen_native__marshaller = default(Utf8StringMarshaller);
	int __lastError;
	string __retVal;
	try
	{
		Marshal.SetLastSystemError(0);
		__retVal_gen_native = __PInvoke__(uid);
		__lastError = Marshal.GetLastSystemError();
		((Utf8StringMarshaller)(ref __retVal_gen_native__marshaller)).FromNativeValue(__retVal_gen_native);
		__retVal = ((Utf8StringMarshaller)(ref __retVal_gen_native__marshaller)).ToManaged();
	}
	finally
	{
		((Utf8StringMarshaller)(ref __retVal_gen_native__marshaller)).FreeNative();  // <-- HERE
	}
	Marshal.SetLastPInvokeError(__lastError);
	return __retVal;
	[DllImport("libSystem.Native", EntryPoint = "SystemNative_GetGroupName", ExactSpelling = true)]
	[CompilerGenerated]
	static extern unsafe byte* __PInvoke__(uint uid);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AaronRobinsonMSFT @elinor-fung @jkoritzinsky - for my knowledge, if I have a P/Invoke where the returned string is shared and shouldn't be free'd, how would I write that? Just return the byte* pointer in my method and handle it myself? Or is there some built-in way to tell the marshalling code to not free the buffer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I have a P/Invoke where the returned string is shared and shouldn't be free'd, how would I write that?

That would be a special case and I'm dubious if it is all that common. There is no support for that with the source generator marshallers nor in the built-in marshallers. Feel free to file an issue on that and we can consider it. Full-disclosure, if there isn't a lot of traction on it we'd likely close it at the end of the .NET 8 cycle.

Just return the byte* pointer in my method and handle it myself?

That would be the easiest way to handle this case. Another way would be to copy the appropriate "StringMarshaller" type from source into the project and modify it as desired.

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice work here, @carlossanlop.

@carlossanlop carlossanlop merged commit 5408323 into dotnet:main May 17, 2022
@carlossanlop carlossanlop deleted the TarImprovements branch May 17, 2022 02:45
carlossanlop added a commit to carlossanlop/runtime that referenced this pull request May 26, 2022
)

* Fix Design time build errors by removing src project as dependency of the test project.

* Add Browser to target platform identifiers. Ensure Browser can consume the Unix specific ArchivingUtils file too.

* Remove nullable enable from csproj since it's now default

* Use FileStream constructor with FileMode.CreateNew to detect and throw if destination file exists when creating archive.

* No error checking on syscalls that do not set errno.

* Add RDev field in FileStatus and retrieve it with stat/lstat so devmajor and devminor get their correct values.

* Simplify some File.Open calls with direct FileStream constructor calls. Simplify FileStreamOptions objects too.

* Implement and consume p/invokes to retrieve uname from uid and gname from gid.

* size_t variables should not be checked for negative values

* FileStream calls simplified to File.OpenRead

* Remove check for RDev > 0

* Use dictionary to preserve repeated unames and gnames mapped to uids and gids

* Missing documentation in pal_uid.h new PALEXPORT methods.

* Adjust syscalls to thread-safe ones. Start with stackalloc, then use loop.

* Make dicts readonly and non-nullable, use TryGetValue

* Reuse 'GetNameFromUid' from pal_networking.c, move to pal_uid.c, use similar logic for Gid method. Simplify Interop.Sys method.

* Remove unnecessary comment

Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>

* Put TargetFrameworks back in the first position of the PropertyGroup.

* Address eerhardt suggestions

* Update src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.WriteEntry.File.Tests.Unix.cs

* Clarify in pal_uid.h methods comments that new memory is returned

Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com>
Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
carlossanlop added a commit that referenced this pull request Jun 2, 2022
* Address some System.Formats.Tar TODOs (infra and syscalls) (#69107)

Re-submitting the changes approved in PR: #69107

Addresses: #68230

Includes an extra change to prevent the android build failure with the addition of the new native call to getgrgid_r.

Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com>
Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
@ghost ghost locked as resolved and limited conversation to collaborators Jun 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants