Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fms_find_unique #938

Merged
merged 2 commits into from
Mar 21, 2022
Merged

Conversation

uramirez8707
Copy link
Contributor

Description
Adds a function to string_utils_mod to determine the number of unique strings in a sorted array and adds a test for it. This will be used in the diag_manager update to determine the number of unique diag_fields in a diag_yaml.

How Has This Been Tested?
CI, including the added unit test.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • New check tests, if applicable, are included
  • make distcheck passes

Comment on lines 166 to 181
// Finds the number of unique strings in an array
int fms_find_unique(char** arr, int *n)
{
int i;
int nfind;

nfind=1;
//printf("n is %i", *n);
for(i=1; i<*n; i++){
//printf("Comparing %s and %s \n",arr[i], arr[i-1]);
if (strcmp(arr[i], arr[i-1]) != 0){ nfind = nfind + 1;}
}

//printf("nfind=%i",nfind);
return nfind;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm misreading the logic, this only works for a sorted array. To do this without a pre-sort, you'd need a nested loop structure. You can test this by moving the fms_find_unique in your unit test to be line 59 prior to the sort. If you want to stick with a single loop, you'd need add an fms_sort_this prior to the search-loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was only intended to work with a sorted array:

!> @brief c function that finds the number of unique strings in a SORTED array of c pointers
!! @return number of unique strings
function fms_find_unique(my_pointer, p_size) bind(c)&
result(ntimes)
use iso_c_binding
type(c_ptr), intent(in) :: my_pointer(*) !< Array of sorted c pointer
integer(kind=c_int), intent(in) :: p_size !< Size of the array
integer(kind=c_int) :: ntimes
end function fms_find_unique

But the documentation should be more clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add a call to the fms_sort_this function to ensure a sort is done prior to the comparison search. That would remove the requirement to pass in only sorted arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the call

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a quick way to test if it's sorted? Is the sorting of a sorted array fast enough that it doesn't matter that you sort it again?

Comment on lines 173 to 179
//printf("n is %i", *n);
for(i=1; i<*n; i++){
//printf("Comparing %s and %s \n",arr[i], arr[i-1]);
if (strcmp(arr[i], arr[i-1]) != 0){ nfind = nfind + 1;}
}

//printf("nfind=%i",nfind);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unused printfs should be cleaned up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 166 to 167
// Finds the number of unique strings in an array
int fms_find_unique(char** arr, int *n)
Copy link
Member

@thomas-robinson thomas-robinson Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These C function variables should be documented https://www.doxygen.nl/manual/commands.html#cmdparam
You also need a \return for the return value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, I think

@thomas-robinson thomas-robinson merged commit 359939a into NOAA-GFDL:dmUpdate Mar 21, 2022
int nfind; // Number of unique strings in an array
int * ids = calloc(*n, sizeof(int)); // Array of integers initialized to 0

fms_sort_this(arr, n, ids);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uramirez8707
The fms_sort_this function uses qsort function from stdlib.h. I was able to look at the code and web comments and it seems to me that it uses the known/published tricks to avoid the worst (O(N^2)) cases. But I am not 100% sure. To be closer to sure, one can check if he array is already sorted or reverse is sorted. This is fast (O(N)) in compariosn to qsort. One can also test and determine the performance as functin of size of input, with test data in sorted order, reverse sorted order, and random order.

@uramirez8707 uramirez8707 deleted the find_unique branch May 18, 2022 14:21
rem1776 pushed a commit to rem1776/FMS that referenced this pull request May 1, 2024
rem1776 pushed a commit to rem1776/FMS that referenced this pull request May 1, 2024
rem1776 pushed a commit to rem1776/FMS that referenced this pull request May 1, 2024
rem1776 pushed a commit to rem1776/FMS that referenced this pull request May 1, 2024
rem1776 pushed a commit to rem1776/FMS that referenced this pull request May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants