Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweak API support S3 bucket stream #2289

Merged
merged 15 commits into from
Sep 26, 2024
Merged

Tweak API support S3 bucket stream #2289

merged 15 commits into from
Sep 26, 2024

Conversation

kkwangsir
Copy link
Contributor

@kkwangsir kkwangsir commented Sep 17, 2024

  • add new fields IsSyncedToS3 last_synced_to_s3_on S3Path to FileReference
  • add force param to GetFiles method , and upload-files-to-s3 endpoint
  • Update Editor/Subscriber/Content and Storage Controller to support priority use of S3 files.
  • add delete-old-local-files api
    image

To make this PR small, I make another PR to refactor transcription service.

@JacobWang-bc JacobWang-bc added DB Migration A DB Migration may require refreshing or simply updating your database. devops DevOps templates, documentation, configuration, tools. labels Sep 17, 2024


public async Task<(Stream? Stream, string? FileName, string? ContentType)> GetFileStreamAsync(string path)
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should check the path is null first? so skip the rest if the path is empty.


public async Task<(Stream? Stream, string? FileName, string? ContentType)> GetFileStreamAsync(string path)
{
var fileReference = await GetByS3PathAsync(path);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can check if the path is in the S3 path format, to avoid the if... if...

var filesToDelete = await this.Context.FileReferences
.Where(fr => fr.CreatedOn < beforeDate.ToUniversalTime()
&& fr.IsSyncedToS3
&& (fr.ContentType.StartsWith("audio/") || fr.ContentType.StartsWith("video/"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could make a const variable in the class as an array ["audio", "video"], or make an enum
Easier to add or delete later.

api/net/Areas/Editor/Controllers/StorageController.cs Outdated Show resolved Hide resolved
api/net/Areas/Editor/Controllers/ContentController.cs Outdated Show resolved Hide resolved
api/net/Areas/Editor/Controllers/ContentController.cs Outdated Show resolved Hide resolved
api/net/Areas/Editor/Controllers/StorageController.cs Outdated Show resolved Hide resolved
api/net/Areas/Editor/Controllers/StorageController.cs Outdated Show resolved Hide resolved
api/net/Areas/Editor/Controllers/StorageController.cs Outdated Show resolved Hide resolved
}


public async Task<(Stream? Stream, string? FileName, string? ContentType)> GetFileStreamAsync(string path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we searching for FileReferences by their path? We should only be searching by their primary key.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, path is actually the key s3 path. However, that we don't have path defined as unique in our references, which could be a hidden problem. I've modified it to ignore the reference step and access s3 directly with path(key).

api/net/Areas/Editor/Controllers/StorageController.cs Outdated Show resolved Hide resolved
return NotFound($"Stream does not exist: '{path}'");
}

if (fileReference.IsSyncedToS3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this logic appears to be identical to the code in the FileReferenceService? It appears like it could be centralized.

libs/net/dal/Services/FileReferenceService.cs Outdated Show resolved Hide resolved
libs/net/dal/Services/FileReferenceService.cs Outdated Show resolved Hide resolved

public async Task<FileReference?> GetByS3PathAsync(string s3Path)
{
return await this.Context.FileReferences
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't be searching the database for a file reference by path. You should only be searching for file reference by its primary key.

.FirstOrDefaultAsync(fr => fr.S3Path == s3Path);
}

public async Task<FileReference?> GetByPathAsync(string path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't be searching the database for a file reference by path. You should only be searching for file reference by its primary key.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to remove those. not search them. path come from endpoint, I will let them goes to s3 file query or local file finder directly.

/// <summary>
/// S3 config
/// </summary>
public static class S3Options
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally using static classes this way isn't ideal. It makes it difficult or impossible to use dependency injection and then later unit tests.

Is there a reason you cannot use the default configuration options like everything else? Then dependency injection works out of the box correctly.

If you need a helper class, then it should also not be a static class. If you only need a single instance of a shared helper class then configure dependency injection so that it only uses that one class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. let me follow other options styles.

@@ -9,6 +9,7 @@
using TNO.DAL.Config;
using TNO.DAL.Services;
using TNO.Elastic;
using Amazon.S3;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required?

libs/net/dal/Services/FileReferenceService.cs Outdated Show resolved Hide resolved
/// <returns>uploaded files and failed uploads</returns>
[HttpPost("upload-files-to-s3")]
[Produces(MediaTypeNames.Application.Json)]
[ProducesResponseType(typeof(Dictionary<string, List<string>>), (int)HttpStatusCode.OK)]
[ProducesResponseType(typeof(ErrorResponseModel), (int)HttpStatusCode.BadRequest)]
[SwaggerOperation(Tags = new[] { "Storage" })]
public async Task<IActionResult> UploadFilesToS3([FromQuery] DateTime? updatedBefore = null, [FromQuery] int? limit = null)
public async Task<IActionResult> UploadFileToS3Async([FromQuery] DateTime? updatedBefore = null, [FromQuery] int? limit = null, [FromQuery] bool force = false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a file was updated last has little meaning or value to MMI. We use those for audit fields, but generally nothing else. Uploading to S3 probably should be based on the Content.PublishedOn date, as that has a relationship to the license. Also there will be a lot of historical content that was recently updated, but it's old and should be moved to S3.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong word, it's a filter base on PublishedOn , I will change to a proper one like PublishBefore.

/// </summary>
public class S3Options
{
public string BucketName => Environment.GetEnvironmentVariable("S3_BUCKET_NAME") ?? "";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't how configuration options should be set. This will result in always using the environment variable value S3_BUCKET_NAME, where it should be following the dependency injection configuration you set below S3.BucketName. Expression-bodied member properties are not useful for configuration options.

You can review the existing code to see how all this works. Also the following are links to information to understand it.

https://learn.microsoft.com/en-us/dotnet/core/extensions/dependency-injection
https://learn.microsoft.com/en-us/dotnet/core/extensions/configuration

libs/net/dal/Extensions/ServiceCollectionExtensions.cs Outdated Show resolved Hide resolved
{
_connection = connection;
_storageOptions = storageOptions.Value;
_apiOptions = apiOptions.Value;
_fileReferenceService = fileReferenceService;
_logger = logger;
_s3Options = s3Options.Value;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you have chosen to use the scoped version of the configuration options. Which is the standard and in my opinion the correct way to do this.

@@ -23,9 +24,11 @@ public FileReferenceService(
ClaimsPrincipal principal,
IServiceProvider serviceProvider,
StorageOptions options,
S3Options s3Options,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are using the singleton. Is there a reason for this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for configuration only. Use the standard IOptions<S3Options> approach. If you need a singleton for connecting to S3, then you create a helper class and configure it through dependency injection as a singleton.

updatedBefore = updatedBefore.Value.ToUniversalTime();
query = query.Where(fr => fr.UpdatedOn < updatedBefore.Value);
createdBefore = createdBefore.Value.ToUniversalTime();
query = query.Where(fr => fr.CreatedOn < createdBefore.Value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we searching for files created before a date?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we searching for files created before a date?

We don't want to move the latest files to the S3 bucket, might be move files before 30 days. I also add a parameter to let us select a date range.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using the Content.PublishedOn date, not the FileReference.CreatedOn date. The CreatedOn date is an audit field that tells us when a row in the database was created. We are currently importing historical content, which means all the CreatedOn dates will be at the same time. I suppose for testing purposes this will be okay, but the implementation could be problematic if the performance of S3 isn't ideal.

return entity;
}

public async Task<int> DeleteOldLocalFilesAsync(DateTime beforeDate)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the purpose of this method related to the business processes. What is the requirement that requires this function?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove it as well. Endpoint was removed.

public static IServiceCollection AddS3Config(this IServiceCollection services, IConfiguration config)
{
services.Configure<S3Options>(config.GetSection("S3"));
services.AddSingleton(sp => sp.GetRequiredService<IOptions<S3Options>>().Value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need a singleton of the configuration values. You may only need a singleton of the helper class.

@@ -23,9 +24,11 @@ public FileReferenceService(
ClaimsPrincipal principal,
IServiceProvider serviceProvider,
StorageOptions options,
S3Options s3Options,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for configuration only. Use the standard IOptions<S3Options> approach. If you need a singleton for connecting to S3, then you create a helper class and configure it through dependency injection as a singleton.

updatedBefore = updatedBefore.Value.ToUniversalTime();
query = query.Where(fr => fr.UpdatedOn < updatedBefore.Value);
createdBefore = createdBefore.Value.ToUniversalTime();
query = query.Where(fr => fr.CreatedOn < createdBefore.Value);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using the Content.PublishedOn date, not the FileReference.CreatedOn date. The CreatedOn date is an audit field that tells us when a row in the database was created. We are currently importing historical content, which means all the CreatedOn dates will be at the same time. I suppose for testing purposes this will be okay, but the implementation could be problematic if the performance of S3 isn't ideal.

… publishedAfter publishedBefore, filter by content's publishedOn
@Fosol Fosol merged commit 4287173 into bcgov:dev Sep 26, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DB Migration A DB Migration may require refreshing or simply updating your database. devops DevOps templates, documentation, configuration, tools.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants