-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tweak API support S3 bucket stream #2289
Conversation
|
||
|
||
public async Task<(Stream? Stream, string? FileName, string? ContentType)> GetFileStreamAsync(string path) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should check the path is null first? so skip the rest if the path is empty.
|
||
public async Task<(Stream? Stream, string? FileName, string? ContentType)> GetFileStreamAsync(string path) | ||
{ | ||
var fileReference = await GetByS3PathAsync(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can check if the path is in the S3 path format, to avoid the if... if...
var filesToDelete = await this.Context.FileReferences | ||
.Where(fr => fr.CreatedOn < beforeDate.ToUniversalTime() | ||
&& fr.IsSyncedToS3 | ||
&& (fr.ContentType.StartsWith("audio/") || fr.ContentType.StartsWith("video/")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could make a const variable in the class as an array ["audio", "video"], or make an enum
Easier to add or delete later.
} | ||
|
||
|
||
public async Task<(Stream? Stream, string? FileName, string? ContentType)> GetFileStreamAsync(string path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we searching for FileReferences by their path? We should only be searching by their primary key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, path is actually the key s3 path. However, that we don't have path defined as unique in our references, which could be a hidden problem. I've modified it to ignore the reference step and access s3 directly with path(key).
return NotFound($"Stream does not exist: '{path}'"); | ||
} | ||
|
||
if (fileReference.IsSyncedToS3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this logic appears to be identical to the code in the FileReferenceService? It appears like it could be centralized.
|
||
public async Task<FileReference?> GetByS3PathAsync(string s3Path) | ||
{ | ||
return await this.Context.FileReferences |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't be searching the database for a file reference by path. You should only be searching for file reference by its primary key.
.FirstOrDefaultAsync(fr => fr.S3Path == s3Path); | ||
} | ||
|
||
public async Task<FileReference?> GetByPathAsync(string path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't be searching the database for a file reference by path. You should only be searching for file reference by its primary key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to remove those. not search them. path
come from endpoint, I will let them goes to s3 file query or local file finder directly.
libs/net/dal/Config/S3Options.cs
Outdated
/// <summary> | ||
/// S3 config | ||
/// </summary> | ||
public static class S3Options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally using static classes this way isn't ideal. It makes it difficult or impossible to use dependency injection and then later unit tests.
Is there a reason you cannot use the default configuration options like everything else? Then dependency injection works out of the box correctly.
If you need a helper class, then it should also not be a static class. If you only need a single instance of a shared helper class then configure dependency injection so that it only uses that one class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. let me follow other options styles.
@@ -9,6 +9,7 @@ | |||
using TNO.DAL.Config; | |||
using TNO.DAL.Services; | |||
using TNO.Elastic; | |||
using Amazon.S3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required?
/// <returns>uploaded files and failed uploads</returns> | ||
[HttpPost("upload-files-to-s3")] | ||
[Produces(MediaTypeNames.Application.Json)] | ||
[ProducesResponseType(typeof(Dictionary<string, List<string>>), (int)HttpStatusCode.OK)] | ||
[ProducesResponseType(typeof(ErrorResponseModel), (int)HttpStatusCode.BadRequest)] | ||
[SwaggerOperation(Tags = new[] { "Storage" })] | ||
public async Task<IActionResult> UploadFilesToS3([FromQuery] DateTime? updatedBefore = null, [FromQuery] int? limit = null) | ||
public async Task<IActionResult> UploadFileToS3Async([FromQuery] DateTime? updatedBefore = null, [FromQuery] int? limit = null, [FromQuery] bool force = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a file was updated last has little meaning or value to MMI. We use those for audit fields, but generally nothing else. Uploading to S3 probably should be based on the Content.PublishedOn
date, as that has a relationship to the license. Also there will be a lot of historical content that was recently updated, but it's old and should be moved to S3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong word, it's a filter base on PublishedOn
, I will change to a proper one like PublishBefore
.
libs/net/dal/Config/S3Options.cs
Outdated
/// </summary> | ||
public class S3Options | ||
{ | ||
public string BucketName => Environment.GetEnvironmentVariable("S3_BUCKET_NAME") ?? ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't how configuration options should be set. This will result in always using the environment variable value S3_BUCKET_NAME
, where it should be following the dependency injection configuration you set below S3.BucketName
. Expression-bodied member
properties are not useful for configuration options.
You can review the existing code to see how all this works. Also the following are links to information to understand it.
https://learn.microsoft.com/en-us/dotnet/core/extensions/dependency-injection
https://learn.microsoft.com/en-us/dotnet/core/extensions/configuration
{ | ||
_connection = connection; | ||
_storageOptions = storageOptions.Value; | ||
_apiOptions = apiOptions.Value; | ||
_fileReferenceService = fileReferenceService; | ||
_logger = logger; | ||
_s3Options = s3Options.Value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you have chosen to use the scoped version of the configuration options. Which is the standard and in my opinion the correct way to do this.
@@ -23,9 +24,11 @@ public FileReferenceService( | |||
ClaimsPrincipal principal, | |||
IServiceProvider serviceProvider, | |||
StorageOptions options, | |||
S3Options s3Options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you are using the singleton. Is there a reason for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for configuration only. Use the standard IOptions<S3Options>
approach. If you need a singleton for connecting to S3, then you create a helper class and configure it through dependency injection as a singleton.
updatedBefore = updatedBefore.Value.ToUniversalTime(); | ||
query = query.Where(fr => fr.UpdatedOn < updatedBefore.Value); | ||
createdBefore = createdBefore.Value.ToUniversalTime(); | ||
query = query.Where(fr => fr.CreatedOn < createdBefore.Value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we searching for files created before a date?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we searching for files created before a date?
We don't want to move the latest files to the S3 bucket, might be move files before 30 days. I also add a parameter to let us select a date range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be using the Content.PublishedOn
date, not the FileReference.CreatedOn
date. The CreatedOn
date is an audit field that tells us when a row in the database was created. We are currently importing historical content, which means all the CreatedOn
dates will be at the same time. I suppose for testing purposes this will be okay, but the implementation could be problematic if the performance of S3 isn't ideal.
return entity; | ||
} | ||
|
||
public async Task<int> DeleteOldLocalFilesAsync(DateTime beforeDate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand the purpose of this method related to the business processes. What is the requirement that requires this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove it as well. Endpoint was removed.
public static IServiceCollection AddS3Config(this IServiceCollection services, IConfiguration config) | ||
{ | ||
services.Configure<S3Options>(config.GetSection("S3")); | ||
services.AddSingleton(sp => sp.GetRequiredService<IOptions<S3Options>>().Value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need a singleton of the configuration values. You may only need a singleton of the helper class.
@@ -23,9 +24,11 @@ public FileReferenceService( | |||
ClaimsPrincipal principal, | |||
IServiceProvider serviceProvider, | |||
StorageOptions options, | |||
S3Options s3Options, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for configuration only. Use the standard IOptions<S3Options>
approach. If you need a singleton for connecting to S3, then you create a helper class and configure it through dependency injection as a singleton.
updatedBefore = updatedBefore.Value.ToUniversalTime(); | ||
query = query.Where(fr => fr.UpdatedOn < updatedBefore.Value); | ||
createdBefore = createdBefore.Value.ToUniversalTime(); | ||
query = query.Where(fr => fr.CreatedOn < createdBefore.Value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be using the Content.PublishedOn
date, not the FileReference.CreatedOn
date. The CreatedOn
date is an audit field that tells us when a row in the database was created. We are currently importing historical content, which means all the CreatedOn
dates will be at the same time. I suppose for testing purposes this will be okay, but the implementation could be problematic if the performance of S3 isn't ideal.
… publishedAfter publishedBefore, filter by content's publishedOn
IsSyncedToS3
last_synced_to_s3_on
S3Path
toFileReference
force
param toGetFiles
method , andupload-files-to-s3
endpointdelete-old-local-files
apiTo make this PR small, I make another PR to refactor transcription service.