Add format versions count report #98
Conversation
Force-pushed fe51478 → ce12964.
Force-pushed ce12964 → 21e78de.
AIPscan/helpers.py (outdated)

:param date_string: Date string. Valid format is YYYY-MM-DD.

:returns: datetime.datetime object or None
I might consider doing the work of _prepare_datetimes_for_query(...) here and returning a consistent data type in this helper. For me there are two main clues/reasons:

- Mixing return types might cause complexity upstream: testing for None vs. getting to work on a newly minted datetime.
- I appreciate the amount of thought that has gone into the date handling across the PR, but the story as I see it through the code is that we're largely only dealing with date ranges, which I believe these helpers are largely set up to handle going forward. I think we're always going to be working with a min or a max. So I might suggest a function signature something like:
def parse_datetime_bounds(date_string, upper=False):
    try:
        parsed = datetime.strptime(date_string, "%Y-%m-%d")
        return parsed + timedelta(days=1) if upper else parsed
    except (ValueError, TypeError):
        return datetime.max if upper else datetime.min
I think we might see a cleaner coding style appear by decoding early, rather than doing a little bit now, and a little bit later.
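For illustration, a hedged usage sketch of the helper as suggested above (the assertions are mine, not from the PR; timedelta is needed by the helper itself):

from datetime import datetime, timedelta

# Bounds always come back as datetimes, so callers can filter
# unconditionally with no None checks upstream.
start = parse_datetime_bounds("2020-01-01")
end = parse_datetime_bounds("2020-06-01", upper=True)
assert start == datetime(2020, 1, 1)
assert end == datetime(2020, 6, 2)  # upper bound: parsed date + one day
assert parse_datetime_bounds("oops") == datetime.min  # invalid input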
AIPscan/helpers.py (outdated)
if isinstance(date_string, str):
    try:
        return datetime.strptime(date_string, "%Y-%m-%d")
    except ValueError:
        pass
return None
I suspect you can get away with something like:
def parse_date(date_string):
    """Parse date parameter string into datetime object or None."""
    try:
        return datetime.strptime(date_string, "%Y-%m-%d")
    except (ValueError, TypeError):
        pass
That would simplify this a little, if the function is kept as is.
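To spell out why catching TypeError lets the isinstance check go away (my own illustration, restating the function above so it runs standalone):

from datetime import datetime

def parse_date(date_string):
    """Parse date parameter string into datetime object or None."""
    try:
        return datetime.strptime(date_string, "%Y-%m-%d")
    except (ValueError, TypeError):
        pass

assert parse_date("2020-01-01") == datetime(2020, 1, 1)
assert parse_date("not-a-date") is None  # strptime raises ValueError
assert parse_date(None) is None          # strptime raises TypeError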
@tw4l thanks for this. Overall I can see this is a high quality PR, although I still have to step into it in more detail. I feel like AIP created date is exactly the right date to use here, as I think this is specifically AIP-level reporting. You might be interested in this ticket I mentioned about the archival storage tab too: archivematica/Issues#731. I have left some initial comments about the date handling, which are good additions that I look forward to seeing applied to other existing reports. What I am a little bit stuck on before I start to review in more detail is the similarity in the shape of the data between the two data functions.
It reminds me of a conversation before, but the similarity feels too hard to ignore, and I think something should be done about it (my preference: consolidate, rather than duplicate or remove). So I'd just like to ask more about the rationale for 2 over 1. And I think we need to understand what happens to both now, with respect to what each of them offers. My initial instinct is that 2 should probably not have been created without acknowledgement of 1 (i.e. potentially adding to technical debt). I could be wrong, the partial duplication might be fine. Maybe this really is a case of repository-level reporting vs. starting to implement functions to support preservation watch and treatment?
Thank you for the feedback @ross-spencer! As always I appreciate your sharp mind and generosity.
Fantastic, and thanks for the helpful context with that issue! Given that, I will plan to add a commit in this PR that modifies the "Format count report" view to also use the AIP created date, to bring the reports into line with each other.
Thanks, these comments make a lot of sense! I agree with your insight that we'd be better off decoding early and always returning a datetime.
This is fair, and truthfully it's just the result of an oversight on my part. I started with the UI report and worked backwards from there (i.e. deciding what the report looks like, then writing the Data endpoint accordingly, and finally adding the API endpoint to keep good coverage there). I suspect that the similarity and the potential to extend what was already there would have been much more apparent to me if I had worked in the reverse order. But you're also right that we've had similar conversations before, so apologies for making us cover this ground again. Given what you've pointed out, I see a few potential ways forward:

1. Keep both sets of endpoints as they are.
2. Remove the new endpoints and extend the existing ones to serve the report.
3. A hybrid: keep a data function optimized for the reporting UI alongside the more general API endpoints.
I agree with you that having both API endpoints as they are doesn't add much value currently (your thought about "repository level reporting, vs. starting to implement functions to support preservation watch and treatment" is intriguing, but I'm not sure we're there yet), so that rules out approach 1 for me. My first thought was to go with approach 2, even though it would mean throwing out a lot of the work in this PR; however, it would require first fetching the data from the more general endpoint. We could also consider the inverse of approach 2 (i.e. keeping the new endpoints and removing the old), but I see value in keeping the API endpoints a bit generic, so that other systems can consume them and use them for their own purposes. So as I write this, I'm inclined to go with the hybrid approach 3. In essence, I see this as optimizing for speed in the reporting UI and thoroughness/generality in the API endpoint. What do you think? Does that sound reasonable to you?
Thanks @tw4l for laying these options out. It's really helpful. I'm thinking through them. I was thinking it might benefit me to share some previous context/thinking here in person, and I wonder what impact any particular approach might have on some of the decisions we make in development moving forward, so I requested a Google Hangout with yourself and Peter, if that sounds good? Going through some of this again doesn't feel like that big a deal to me. It's really timely. There's more that I could have done here to signpost intentions - and what I think might be a somewhat orphaned function in the data module/API endpoint certainly doesn't help! It's going to come up more for other folks as we get others contributing.
Sounds good, thanks @ross-spencer! Looking forward to discussing further this afternoon.
To sum up how we left the offline conversation:
@tw4l lots of this is looking good. I'm submitting a first-go code review in the hope it might aid your efforts this week. I know you're working on the changes we discussed. There are a few small style things here, something to think about in the tests, and possibly some fortification to add to the report generation, but I like the look of the report.
    This fixture is used to create expected state which is then used to
    test the Data.format_versions_count endpoint.
    """
These look good. I wonder if we can think about providing some helpers to make it easier to create fixtures? Simplify the constructors a little? Recognizing time restrictions this week, I've explored simplifying this a little and come up with the following - what do you think?
diff --git a/AIPscan/Data/tests/conftest.py b/AIPscan/Data/tests/conftest.py
index 6fb1e88..816cccf 100644
--- a/AIPscan/Data/tests/conftest.py
+++ b/AIPscan/Data/tests/conftest.py
@@ -3,6 +3,8 @@
"""This module defines shared Data blueprint pytest fixtures."""
import datetime
+import uuid
+
import pytest
from AIPscan import db, create_app
@@ -27,6 +29,121 @@ AIP_1_CREATION_DATE = "2020-01-01"
AIP_2_CREATION_DATE = "2020-06-01"
+def _create_test_storage_service(**kwargs):
+ """Wrapper to slightly more easily return a sample storage service
+ for conftest fixtures.
+
+ param: ...
+ param: ...
+ param: ...
+ """
+ name = kwargs.get("name", "test storage service")
+ url = kwargs.get("url", "http://example.com")
+ user_name = kwargs.get("user_name", "test")
+ api_key = kwargs.get("api_key", "test")
+ download_limit = kwargs.get("download_limit", 20)
+ download_offset = kwargs.get("download_offset", 10)
+ default = kwargs.get("default", True)
+ storage_service = StorageService(
+ name=name,
+ url=url,
+ user_name=user_name,
+ api_key=api_key,
+ download_limit=download_limit,
+ download_offset=download_offset,
+ default=default,
+ )
+ db.session.add(storage_service)
+ db.session.commit()
+ return storage_service
+
+
+def _create_fetch_job(**kwargs):
+ """Wrapper to slightly more easily return a sample fetch job for
+ conftest fixtures.
+
+ param: ...
+ param: ...
+ param: ...
+ """
+ total_packages = kwargs.get("total_packages", 1)
+ total_aips = kwargs.get("total_aips", 1)
+ total_deleted_aips = kwargs.get("total_deleted_aips", 0)
+ download_start = kwargs.get("download_start", datetime.datetime.now())
+ download_end = kwargs.get("download_end", datetime.datetime.now())
+ download_directory = kwargs.get("download_directory", "/some/directory/")
+ storage_service_id = kwargs.get("storage_service_id", 1)
+ fetch_job = FetchJob(
+ total_packages=total_packages,
+ total_aips=total_aips,
+ total_deleted_aips=total_deleted_aips,
+ download_start=download_start,
+ download_end=download_end,
+ download_directory=download_directory,
+ storage_service_id=storage_service_id,
+ )
+ db.session.add(fetch_job)
+ db.session.commit()
+ return fetch_job
+
+def _create_aip(**kwargs):
+ """Wrapper to slightly more easily return an AIP for conftest
+ fixtures.
+
+ param: ...
+ param: ...
+ param: ...
+ """
+ uuid_ = kwargs.get("uuid", str(uuid.uuid4()))
+ transfer_name = kwargs.get("transfer_name", "Test AIP")
+ create_date = kwargs.get("create_date", datetime.datetime.now())
+ storage_service_id = kwargs.get("storage_service_id", 1)
+ fetch_job_id = kwargs.get("fetch_job_id", 1)
+ return AIP(
+ uuid=uuid_,
+ transfer_name=transfer_name,
+ create_date=create_date,
+ storage_service_id=storage_service_id,
+ fetch_job_id=fetch_job_id,
+ )
+
+
+def _create_test_file(**kwargs):
+ """Wrapper to slightly more easily return a sample file for conftest
+ fixtures.
+
+ param: ...
+ param: ...
+ param: ...
+ """
+ name = kwargs.get("name", "file_name.ext")
+ filepath = kwargs.get("filepath", "/path/to/file_name.ext")
+ uuid_ = kwargs.get("uuid", str(uuid.uuid4()))
+ file_type = kwargs.get("file_type", FileType.original)
+ size = kwargs.get("size", "0")
+ date_created = kwargs.get("date_created", datetime.datetime.now())
+ puid = kwargs.get("puid", "fmt/test-1")
+ file_format = kwargs.get("file_format", "ACME File Format")
+ format_version = kwargs.get("format_version", "0.0.0")
+ checksum_type = kwargs.get("checksum_type", "test")
+ checksum_value = kwargs.get("checksum_value", "test")
+ aip_id = kwargs.get("aip_id", 1)
+ return File(
+ name=name,
+ filepath=filepath,
+ uuid=uuid_,
+ file_type=file_type,
+ size=size,
+ date_created=date_created,
+ puid=puid,
+ file_format=file_format,
+ format_version=format_version,
+ checksum_type=checksum_type,
+ checksum_value=checksum_value,
+ aip_id=aip_id,
+ )
+
+
@pytest.fixture
def app_with_populated_files(scope="package"):
"""Fixture with pre-populated data.
@@ -38,66 +155,30 @@ def app_with_populated_files(scope="package"):
with app.app_context():
db.create_all()
- storage_service = StorageService(
- name="test storage service",
- url="http://example.com",
- user_name="test",
- api_key="test",
- download_limit=20,
- download_offset=10,
- default=True,
- )
- db.session.add(storage_service)
- db.session.commit()
-
- fetch_job = FetchJob(
- total_packages=1,
- total_aips=1,
- total_deleted_aips=0,
- download_start=datetime.datetime.now(),
- download_end=datetime.datetime.now(),
- download_directory="/some/dir",
- storage_service_id=storage_service.id,
- )
- db.session.add(fetch_job)
- db.session.commit()
+ storage_service = _create_test_storage_service()
+ fetch_job = _create_fetch_job(storage_service_id=storage_service.id)
- aip = AIP(
- uuid="111111111111-1111-1111-11111111",
- transfer_name="test aip",
- create_date=datetime.datetime.now(),
+ aip = _create_aip(
storage_service_id=storage_service.id,
fetch_job_id=fetch_job.id,
)
db.session.add(aip)
db.session.commit()
- original_file = File(
- name="original.tif",
- filepath="/path/to/original.tif",
- uuid="111111111111-1111-1111-11111111",
+ original_file = _create_test_file(
file_type=FileType.original,
size=ORIGINAL_FILE_SIZE,
- date_created=datetime.datetime.now(),
puid=TIFF_PUID,
file_format=TIFF_FILE_FORMAT,
- checksum_type="test",
- checksum_value="test",
- aip_id=1,
)
- preservation_file = File(
- name="preservation.tif",
- filepath="/path/to/preservation.tif",
- uuid="222222222222-2222-2222-22222222",
+
+ preservation_file = _create_test_file(
file_type=FileType.preservation,
size=PRESERVATION_FILE_SIZE,
- date_created=datetime.datetime.now(),
puid=TIFF_PUID,
file_format=TIFF_FILE_FORMAT,
- checksum_type="test",
- checksum_value="test",
- aip_id=1,
)
+
db.session.add(original_file)
db.session.add(preservation_file)
db.session.commit()
@@ -114,45 +195,22 @@ def app_with_populated_format_versions(scope="package"):
This fixture is used to create expected state which is then used to
test the Data.format_versions_count endpoint.
"""
+ AIP_DATE_FORMAT = "%Y-%m-%d"
+
app = create_app("test")
with app.app_context():
db.create_all()
- storage_service = StorageService(
- name="test storage service",
- url="http://example.com",
- user_name="test",
- api_key="test",
- download_limit=20,
- download_offset=10,
- default=True,
- )
- db.session.add(storage_service)
- db.session.commit()
+ storage_service = _create_test_storage_service()
+ fetch_job = _create_fetch_job(storage_service_id=storage_service.id)
- fetch_job = FetchJob(
- total_packages=1,
- total_aips=1,
- total_deleted_aips=0,
- download_start=datetime.datetime.now(),
- download_end=datetime.datetime.now(),
- download_directory="/some/dir",
- storage_service_id=storage_service.id,
- )
- db.session.add(fetch_job)
- db.session.commit()
-
- aip1 = AIP(
- uuid="111111111111-1111-1111-11111111",
- transfer_name="test aip",
- create_date=datetime.datetime.strptime(AIP_1_CREATION_DATE, "%Y-%m-%d"),
+ aip1 = _create_aip(
+ create_date=datetime.datetime.strptime(AIP_1_CREATION_DATE, AIP_DATE_FORMAT),
storage_service_id=storage_service.id,
fetch_job_id=fetch_job.id,
)
- aip2 = AIP(
- uuid="222222222222-2222-2222-22222222",
- transfer_name="test aip",
- create_date=datetime.datetime.strptime(AIP_2_CREATION_DATE, "%Y-%m-%d"),
+ aip2 = _create_aip(
+ create_date=datetime.datetime.strptime(AIP_2_CREATION_DATE, AIP_DATE_FORMAT),
storage_service_id=storage_service.id,
fetch_job_id=fetch_job.id,
)
@@ -160,34 +218,22 @@ def app_with_populated_format_versions(scope="package"):
db.session.add(aip2)
db.session.commit()
- file1 = File(
- name="jpeg_1.01.jpg",
- filepath="/path/to/jpeg_1.01.jpg",
- uuid="111111111111-1111-1111-11111111",
- file_type=FileType.original,
+ file1 = _create_test_file(
size=ORIGINAL_FILE_SIZE,
- date_created=datetime.datetime.now(),
puid=JPEG_1_01_PUID,
file_format=JPEG_FILE_FORMAT,
format_version=JPEG_1_01_FORMAT_VERSION,
- checksum_type="test",
- checksum_value="test",
aip_id=aip1.id,
)
- file2 = File(
- name="jpeg_1.02.jpg",
- filepath="/path/to/jpeg_1.02.jpg",
- uuid="222222222222-2222-2222-22222222",
- file_type=FileType.original,
+
+ file2 = _create_test_file(
size=PRESERVATION_FILE_SIZE,
- date_created=datetime.datetime.now(),
puid=JPEG_1_02_PUID,
file_format=JPEG_FILE_FORMAT,
format_version=JPEG_1_02_FORMAT_VERSION,
- checksum_type="test",
- checksum_value="test",
aip_id=aip2.id,
)
+
db.session.add(file1)
db.session.add(file2)
db.session.commit()
The full script:
# -*- coding: utf-8 -*-

"""This module defines shared Data blueprint pytest fixtures."""

import datetime
import uuid

import pytest

from AIPscan import db, create_app
from AIPscan.models import StorageService, FetchJob, AIP, File, FileType

TIFF_FILE_FORMAT = "Tagged Image File Format"
TIFF_PUID = "fmt/353"

JPEG_FILE_FORMAT = "JPEG"
JPEG_1_01_FORMAT_VERSION = "1.01"
JPEG_1_01_PUID = "fmt/43"
JPEG_1_02_FORMAT_VERSION = "1.02"
JPEG_1_02_PUID = "fmt/44"

ORIGINAL_FILE_SIZE = 1000
PRESERVATION_FILE_SIZE = 2000

AIP_1_CREATION_DATE = "2020-01-01"
AIP_2_CREATION_DATE = "2020-06-01"


def _create_test_storage_service(**kwargs):
    """Wrapper to slightly more easily return a sample storage service
    for conftest fixtures.

    param: ...
    param: ...
    param: ...
    """
    name = kwargs.get("name", "test storage service")
    url = kwargs.get("url", "http://example.com")
    user_name = kwargs.get("user_name", "test")
    api_key = kwargs.get("api_key", "test")
    download_limit = kwargs.get("download_limit", 20)
    download_offset = kwargs.get("download_offset", 10)
    default = kwargs.get("default", True)
    storage_service = StorageService(
        name=name,
        url=url,
        user_name=user_name,
        api_key=api_key,
        download_limit=download_limit,
        download_offset=download_offset,
        default=default,
    )
    db.session.add(storage_service)
    db.session.commit()
    return storage_service


def _create_fetch_job(**kwargs):
    """Wrapper to slightly more easily return a sample fetch job for
    conftest fixtures.

    param: ...
    param: ...
    param: ...
    """
    total_packages = kwargs.get("total_packages", 1)
    total_aips = kwargs.get("total_aips", 1)
    total_deleted_aips = kwargs.get("total_deleted_aips", 0)
    download_start = kwargs.get("download_start", datetime.datetime.now())
    download_end = kwargs.get("download_end", datetime.datetime.now())
    download_directory = kwargs.get("download_directory", "/some/directory/")
    storage_service_id = kwargs.get("storage_service_id", 1)
    fetch_job = FetchJob(
        total_packages=total_packages,
        total_aips=total_aips,
        total_deleted_aips=total_deleted_aips,
        download_start=download_start,
        download_end=download_end,
        download_directory=download_directory,
        storage_service_id=storage_service_id,
    )
    db.session.add(fetch_job)
    db.session.commit()
    return fetch_job


def _create_aip(**kwargs):
    """Wrapper to slightly more easily return an AIP for conftest
    fixtures.

    param: ...
    param: ...
    param: ...
    """
    uuid_ = kwargs.get("uuid", str(uuid.uuid4()))
    transfer_name = kwargs.get("transfer_name", "Test AIP")
    create_date = kwargs.get("create_date", datetime.datetime.now())
    storage_service_id = kwargs.get("storage_service_id", 1)
    fetch_job_id = kwargs.get("fetch_job_id", 1)
    return AIP(
        uuid=uuid_,
        transfer_name=transfer_name,
        create_date=create_date,
        storage_service_id=storage_service_id,
        fetch_job_id=fetch_job_id,
    )


def _create_test_file(**kwargs):
    """Wrapper to slightly more easily return a sample file for conftest
    fixtures.

    param: ...
    param: ...
    param: ...
    """
    name = kwargs.get("name", "file_name.ext")
    filepath = kwargs.get("filepath", "/path/to/file_name.ext")
    uuid_ = kwargs.get("uuid", str(uuid.uuid4()))
    file_type = kwargs.get("file_type", FileType.original)
    size = kwargs.get("size", "0")
    date_created = kwargs.get("date_created", datetime.datetime.now())
    puid = kwargs.get("puid", "fmt/test-1")
    file_format = kwargs.get("file_format", "ACME File Format")
    format_version = kwargs.get("format_version", "0.0.0")
    checksum_type = kwargs.get("checksum_type", "test")
    checksum_value = kwargs.get("checksum_value", "test")
    aip_id = kwargs.get("aip_id", 1)
    return File(
        name=name,
        filepath=filepath,
        uuid=uuid_,
        file_type=file_type,
        size=size,
        date_created=date_created,
        puid=puid,
        file_format=file_format,
        format_version=format_version,
        checksum_type=checksum_type,
        checksum_value=checksum_value,
        aip_id=aip_id,
    )


@pytest.fixture
def app_with_populated_files(scope="package"):
    """Fixture with pre-populated data.

    This fixture is used to create expected state which is then used to
    test the Data.aips_by_file_format and Data.aips_by_puid endpoints.
    """
    app = create_app("test")
    with app.app_context():
        db.create_all()

        storage_service = _create_test_storage_service()
        fetch_job = _create_fetch_job(storage_service_id=storage_service.id)

        aip = _create_aip(
            storage_service_id=storage_service.id,
            fetch_job_id=fetch_job.id,
        )
        db.session.add(aip)
        db.session.commit()

        original_file = _create_test_file(
            file_type=FileType.original,
            size=ORIGINAL_FILE_SIZE,
            puid=TIFF_PUID,
            file_format=TIFF_FILE_FORMAT,
        )

        preservation_file = _create_test_file(
            file_type=FileType.preservation,
            size=PRESERVATION_FILE_SIZE,
            puid=TIFF_PUID,
            file_format=TIFF_FILE_FORMAT,
        )

        db.session.add(original_file)
        db.session.add(preservation_file)
        db.session.commit()

        yield app

        db.drop_all()


@pytest.fixture
def app_with_populated_format_versions(scope="package"):
    """Fixture with pre-populated data.

    This fixture is used to create expected state which is then used to
    test the Data.format_versions_count endpoint.
    """
    AIP_DATE_FORMAT = "%Y-%m-%d"

    app = create_app("test")
    with app.app_context():
        db.create_all()

        storage_service = _create_test_storage_service()
        fetch_job = _create_fetch_job(storage_service_id=storage_service.id)

        aip1 = _create_aip(
            create_date=datetime.datetime.strptime(
                AIP_1_CREATION_DATE, AIP_DATE_FORMAT
            ),
            storage_service_id=storage_service.id,
            fetch_job_id=fetch_job.id,
        )
        aip2 = _create_aip(
            create_date=datetime.datetime.strptime(
                AIP_2_CREATION_DATE, AIP_DATE_FORMAT
            ),
            storage_service_id=storage_service.id,
            fetch_job_id=fetch_job.id,
        )
        db.session.add(aip1)
        db.session.add(aip2)
        db.session.commit()

        file1 = _create_test_file(
            size=ORIGINAL_FILE_SIZE,
            puid=JPEG_1_01_PUID,
            file_format=JPEG_FILE_FORMAT,
            format_version=JPEG_1_01_FORMAT_VERSION,
            aip_id=aip1.id,
        )

        file2 = _create_test_file(
            size=PRESERVATION_FILE_SIZE,
            puid=JPEG_1_02_PUID,
            file_format=JPEG_FILE_FORMAT,
            format_version=JPEG_1_02_FORMAT_VERSION,
            aip_id=aip2.id,
        )

        db.session.add(file1)
        db.session.add(file2)
        db.session.commit()

        yield app

        db.drop_all()
It reduces the number of lines a little and helps with sensible defaults.
There are a few gaps, in that I didn't have a position on adding the param: descriptions for these. Also, I think the rebase will affect it slightly, but the overall functions should remain the same.
We could also make this a different future commit if you like?
Oh I like this a lot @ross-spencer. Thanks for putting in a lot of effort and good thinking here! I'll do my best to get this into my revisions but appreciate the option to kick it to a future commit/PR if needed.
Helpers added in commit dbe8350.
Awesome! The changes to conftest.py are looking good!
DATE_AFTER_AIP_2 = "2020-06-02"


class MockFormatVersionsCountQueryResult:
I like the sophistication here.
The approach also reminds me of named tuples, which I use occasionally. They're a little more lightweight, but I've no opinion on which is best here. Just sharing for reference.
I use them in my Palettes work for the painter goblin: https://github.com/ross-spencer/palettes/blob/9a422fa8aeed79995b2395c87278d8d55cf71787/palettes.py#L13
And here's a link to "module of the week" for more context: https://pymotw.com/2/collections/namedtuple.html
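For illustration, a minimal namedtuple version of such a stand-in. The field names are my assumptions: the last three mirror the mock's constructor arguments quoted below, and the first three are guessed from the report's columns.

from collections import namedtuple

# Hypothetical lightweight stand-in for a query-result row; attribute
# access works the same way as on the mock class above.
MockVersionRow = namedtuple(
    "MockVersionRow",
    ["puid", "file_format", "format_version",
     "version_count", "total_file_count", "total_file_size"],
)

row = MockVersionRow("fmt/43", "JPEG", "1.01", 1, 1, 1000)
assert row.format_version == "1.01"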
Oh interesting, this is new to me. Thanks for sharing!
    version_count,
    total_file_count,
    total_file_size,
):
All of this looks good! 👍
// this into it's own import too.
//
// TODO: Are there more idiomatic approaches? 🤷
bootstrap_alert = function(message, alertType) {
Great to see these removed!
var startdate = $('#startdate3').val();
if (startdate != "") {
    if (startdate > date) {
        alert("The start date must be before the end date");
I wonder, if the two alerts here are supposed to have the same string, whether we can just remove the duplication with a const/var above: const DATE_ALERT = "...";.
My reason for asking: if they aren't supposed to be the same, then I wonder if the end date alert might otherwise say "the end date must be after the start date"? I have no preference.
I've added two new consts here - DATE_ALERT_START and DATE_ALERT_END - for the two variations of the message.
@@ -54,8 +51,8 @@
<tr>
<td>Format version count</td>
<td></td>
<td></td>
<td></td>
<td><input type="text" id="startdate3" value="{{ start_date }}"></td>
We've been using integers as suffixes a little across this page, but do you think we need to think a little further ahead and perhaps name attributes more meaningfully, e.g. id="start_date_version_report" or something along those lines? Same for end date?
Yes, good point!
    columns=translate_headers(headers),
    versions=versions,
    total_file_count=sum(version.get(data.FIELD_COUNT, 0) for version in versions),
    total_size=sum(version.get(data.FIELD_SIZE, 0) for version in versions),
I'm getting a stack trace error here:
Traceback (most recent call last):
File "/home/ross-spencer/git/artefactual-labs/aipscanvenv/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/home/ross-spencer/git/artefactual-labs/aipscanvenv/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/ross-spencer/git/artefactual-labs/aipscanvenv/lib/python3.8/site-packages/flask_restx/api.py", line 639, in error_router
return original_handler(e)
File "/home/ross-spencer/git/artefactual-labs/aipscanvenv/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/ross-spencer/git/artefactual-labs/aipscanvenv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/home/ross-spencer/git/artefactual-labs/aipscanvenv/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/home/ross-spencer/git/artefactual-labs/aipscanvenv/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/home/ross-spencer/git/artefactual-labs/AIPscan/AIPscan/Reporter/report_format_versions_count.py", line 37, in report_format_versions_count
total_size=sum(version.get(data.FIELD_SIZE, 0) for version in versions),
It's not coming to me tonight why I might be getting None, but I am wondering if you have an instinct? We might also need to capture that in tests?
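The likely mechanism, reproduced in isolation (my own illustration, not from the thread): dict.get's default only applies when the key is missing, so a key that is present with a None value slips through to sum().

# A present key with value None bypasses .get()'s default...
versions = [{"PUID": "fmt/468", "Count": 1, "Size": None}]
# ...so this raises TypeError: unsupported operand type(s) for +:
# 'int' and 'NoneType', matching the stack trace above.
total = sum(version.get("Size", 0) for version in versions)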
I will add the check for None in the Data endpoint here, but agree that it would be good to capture this in tests! I will work on including that in my revisions.
<td>{{ version["Format"] }}</td> | ||
<td>{{ version["Version"] }}</td> | ||
<td>{{ version["Count"] }}</td> | ||
<td>{{ version["Size"] | filesizeformat }}</td> |
This might be related to the reporter module code, but here I have a {'PUID': 'fmt/468', 'Format': 'ISO Disk Image File', 'Version': None, 'Count': 1, 'Size': None} result, and the template expects a number to work with and not None, so I wonder how we can fortify this, or the data coming out of the DB (or into it), to help us avoid this? I looked at the original METS and it looks like the ISO Disk Image File was ingested as a standard transfer from one of my FPR test sets - but it's likely the image from the sample data dirs.
I think it's likely that this File object has the format and size it does because of this potentially misleading AttributeError handling:

AIPscan/AIPscan/Aggregator/database_helpers.py, lines 196 to 200 in 39dbf88:

except AttributeError:
    # File error/warning to log. Obviously this format may
    # be incorrect so it is our best guess.
    file_info["file_format"] = "ISO Disk Image File"
    file_info["puid"] = "fmt/468"
Regardless, I think you hit the nail on the head that we should be checking for None in the data endpoint to avoid this issue. Good spot!
Ah whoops, just re-read your comment and my "I wonder if it's actually an ISO disk image..." was off base here. Sorry about that! 😅
AIPscan/Data/data.py (outdated)
except AttributeError:
    pass
version_info[FIELD_COUNT] = version.file_count
version_info[FIELD_SIZE] = version.total_size
Ah, it looks like maybe we can test for None here? (see template comment)
Yes I think you're right!
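For reference, a minimal sketch of the guard being discussed, reusing the names from the snippet above (the PR's actual fix may differ):

# Hedged sketch: coerce None aggregates to 0 so the sum() calls in
# the report view never see None.
version_info[FIELD_COUNT] = version.file_count or 0
version_info[FIELD_SIZE] = version.total_size or 0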
AIPscan/Data/data.py (outdated)
:start_date: Optional inclusive AIP creation start date
(datetime.datetime object).
:end_date: Optional inclusive AIP creation end date
(datetime.datetime object).
Suggested change:

:param storage_service_id: Storage Service ID.
:start_date: Optional inclusive AIP creation start date
(datetime.datetime object).
:end_date: Optional inclusive AIP creation end date
(datetime.datetime object).
I am still getting used to this more complex style of docstring. Admittedly I haven't spent a lot of time looking at this, but what do you think about adding a hanging indent: https://stackoverflow.com/a/50455731 - to help readability? It would be good to do that wherever we break a docstring over two lines. Initially it's hard to pick up as part of the same string.
Is there another idiomatic approach that you happen to know about?
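For instance, the docstring above with the hanging indent applied (the function name is from this PR's endpoint; the signature is my assumption):

def format_versions_count(storage_service_id, start_date, end_date):
    """Return format version counts for a Storage Service.

    :param storage_service_id: Storage Service ID.
    :start_date: Optional inclusive AIP creation start date
        (datetime.datetime object).
    :end_date: Optional inclusive AIP creation end date
        (datetime.datetime object).
    """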
These detailed docstrings are a new-ish habit for me so I don't have a ready answer for other approaches but this is a good suggestion and I agree it would help a lot with readability.
Force-pushed 21e78de → dbe8350.
Hi @ross-spencer - I believe I've responded to each of your suggestions and comments, and this PR is now ready for code review. Thanks again for the great suggestions here! In addition to responding to the in-line comments, I've rebased this branch and discovered a bug along the way.
I realized right after marking this PR as ready for re-review that I forgot to update the "File format count" report to use AIP.create_date.
One question around style/presentation for the API URL, but approving. The way you approached moving things about made it really simple to re-review. Lots of this is looking really clean and easy to use now too. Thanks @tw4l
    data.FIELD_STORAGE_NAME: "Storage Service Name",
    data.FIELD_TRANSFER_NAME: "Transfer Name",
    data.FIELD_VERSION: "Version",
    fields.FIELD_AIP_NAME: "AIP Name",
The fields module is a nice intuitive change; it feels much easier to make use of now.
AIPscan/API/namespace_report_data.py (outdated)
from AIPscan.Data import report_data

api = Namespace(
    "report_data", description="Retrieve data optimized for AIPscan reports"
I wonder which is nicer to have here:
- With underscore: http://127.0.0.1:5000/api/report_data/format-version-count/1
- With dash: http://127.0.0.1:5000/api/report-data/format-version-count/1

It looks like both are valid - my preference is with the dash, because we have dashes in the endpoint name too. IDK, what do you think?
Ah, good catch! I agree the dash is nicer and more consistent. I'll make that change before merging.
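A minimal sketch of the agreed change, assuming the flask-restx Namespace shown in the snippet above (only the name string changes):

from flask_restx import Namespace

# Dashed namespace name, so the URL reads /api/report-data/... and
# matches the dashed endpoint segment format-version-count.
api = Namespace(
    "report-data", description="Retrieve data optimized for AIPscan reports"
)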
This commit also:
- Adds a new Data.report_data module and report-data API namespace for data endpoints optimized for reports implemented in the AIPscan UI, as opposed to more general data overview endpoints.
- Modifies the file format count report to filter dates on AIP.create_date to bring the two reports into line with each other.
- Adds new helper functions for creating test objects in the Data.tests package's conftest module.
Force-pushed cb05db4 → e73fc58.
Connected to #93
This PR implements the "Format versions count" report in the UI and API, with support for limiting results by start and end date. This commit also removes the Bootstrap alerts from the report selection template, as there is no longer a need for any "Coming soon"-type messages, and users should be able to tell that their reports are loading in a new tab without these alerts.
NB: With reference to this conversation in #65 (comment), I suspect that some changes will need to be made around implementing the start_date and end_date parameters before merging. This might mean changing this report to use files' ingestion Event date, or it might mean modifying the "File format count" report to use AIP.create_date, as I've done here. My rationale for using AIP.create_date is that that date is likely the most visible date for Archivematica users, since it is displayed in both the Archival Storage tab and the Storage Service's Package tab. But honestly, I'm not really sure which is the better approach and would like some input, as I'd like to set a standard practice and use that moving forward wherever possible.

In many cases, file ingestion dates and AIP creation dates will be the same, or at least similar. But in some cases, as in the case of files that are stored in a transfer backlog for weeks or months prior to becoming part of a SIP (and, eventually, an AIP), they may end up being quite different.
So when we say "start date" and "end date" in relation to something like this report, or the "File format count" report, do we mean when the file was first ingested into an Archivematica instance, or when it became part of an AIP? Might that answer change for other reports? How might we better communicate what these dates mean to users? (I've tried in the API documentation with some success, I think, but am more at a loss for what to do in the GUI)
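For concreteness, a hedged sketch of the AIP.create_date range filter this discussion implies (model and column names come from the test fixtures above; the PR's actual query shape, which also aggregates file formats, may differ):

from AIPscan.models import AIP

def aips_created_between(storage_service_id, start_date, end_date):
    # Illustrative only: inclusive lower bound, exclusive upper bound
    # (callers pass end_date already bumped by one day, per the
    # parse_datetime_bounds discussion earlier in the thread).
    return AIP.query.filter(
        AIP.storage_service_id == storage_service_id,
        AIP.create_date >= start_date,
        AIP.create_date < end_date,
    ).all()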