Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

History Entries table does not contain last played track, although rekordbox see it #32

Closed
IvanOnishchenko opened this issue Jun 7, 2024 · 17 comments
Assignees
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed

Comments

@IvanOnishchenko
Copy link

IvanOnishchenko commented Jun 7, 2024

I use https://ide.kaitai.io/# and rekordbox_pdb.ksy to read export.pdb files.

I noticed that HISTORY_ENTRIES tabel of export.pdb has one less track than what I played, last track missing, but Recordbox sees all tracks.

@brunchboy
Copy link
Member

Interesting. Please study your database, and figure out why this is happening, and submit a PR to correct the structure definition.

@brunchboy brunchboy added the help wanted Extra attention is needed label Jun 7, 2024
@IvanOnishchenko
Copy link
Author

IvanOnishchenko commented Jun 7, 2024

I've attached files in case anyone can help figure this out.

Hystory list from recordbox: https://drive.google.com/file/d/1-mSBPSp0x3Z_bD1z21wKkebytMMGN4Xh/view?usp=sharing
Export.pdb from stick: https://drive.google.com/file/d/1ImR27xHuCVxFHyysDsBUKG0kghH-v9qr/view?usp=sharing

Thank you!

@brunchboy
Copy link
Member

Please try to figure it out yourself! You are the person who is best positioned in the world to do so, because you have both the data, and a desire to see the change happen. That is what drives open-source projects. When I created this project, I knew nothing about Kaitai, and I learned it all as I went, so you can do the same.

@brunchboy
Copy link
Member

Thank you for uploading that data, and for making it such a focused example of the problem. I had some spare time tonight, and poked into your files, and you have identified a misinterpretation we made in the structure format that is probably affecting a lot of tables. We had assumed num_rows represented the total valid row entries in a table, regardless of whether they had been marked as present or not. But it turns out that it seems to indicate the total number of remaining rows in the table after skipping the ones that have been marked as deleted. So we need to scan num_row_groups * 16 row entries, looking for all rows whose row_present_flag is full. I will assign this issue to myself and work on updating Crate Digger to reflect this improved understanding.

@brunchboy brunchboy added bug Something isn't working good first issue Good for newcomers labels Jun 9, 2024
@brunchboy
Copy link
Member

If you want to test the fix yourself in the Kaitai web IDE, you can change the repeat-expr in line 289 of rekordbox_psb.ksy to the much simpler value 16.

@IvanOnishchenko
Copy link
Author

Wow! Now I see all tracks! Thank you!

@Holzhaus
Copy link
Contributor

Holzhaus commented Aug 3, 2024

I have a question on this: Is the calculation of num_row_groups still correct? https://github.com/Deep-Symmetry/crate-digger/blob/main/src/main/kaitai/rekordbox_pdb.ksy#L243-L248

Or could there be any number of row groups, and we have to keep reading row groups it until we found all (via presence flags)?

@brunchboy
Copy link
Member

I think num_row_groups is correct, you just need to scan all 16 entries in each of those row groups until all rows mentioned in num_rows have been found as present.

@Holzhaus
Copy link
Contributor

Holzhaus commented Aug 7, 2024

I'm wondering because in the Database Exports Analysis you wrote:

Note: The row counter entries represent the number of actually-present rows in the page. To find them, you need to scan all 16 entries of each of the row groups present in the page, ignoring any whose row presence bit is zero.

I guess "row counter entries" refers to num_rows (i.e. num_rows_small or num_rows_large)? If so, this statement is quite confusing, because that would mean we check how many rows are present in a page, and then we start reading row groups until num_rows equals the total number of 1s in the row groups that were read.

Something like:

num_rows = 123
rows_found = 0
while rows_found < num_rows:
    rowgroup = read_next_rowgroup()
    rows_found += rowgroup.row_presence_flags.count_ones()

But this will not work (I checked). Either something is wrong with num_rows, or the statement is not really correct. The kaitai itself does not adhere to it, as far as I can tell by using the following files in the Kaitai Web IDE:

In this example, the workflow is as follows:

  1. According to num_rows there should be 7 existing rows (at least if I understand the note from the docs correctly)
  2. We calculate the number of row groups: $(n_{rows} - 1) / 16 + 1 = (7 - 1) / 16 + 1 = 6 / 16 + 1 = 0 + 1 = 1$ which means that there is only one rowgroup
  3. According to the row_presence_flags there are only 2 rows in the group

image

If the note is correct, where are the remaining 5 rows? I already checked, the other rows groups in that page do not contain any rows (it starts to fail after 45 rows groups).

@brunchboy
Copy link
Member

brunchboy commented Aug 12, 2024

I’m afraid it has been too many years since I was working at this level for the details to be fresh in my head. Do you have a sample export file that you can share that Crate Digger fails to find all the rows for a table with? Thinking about this now, I think you are probably right that we can’t use num_rows to figure out how many row groups there are, because of the fact that deleted rows might use up several row groups. So how are you determining how many row groups to scan?

@brunchboy
Copy link
Member

Perhaps we need to just assume there are as many row groups as fit in the index page, and keep scanning until we find num_rows non-deleted rows? Sadly this is not going to be something that can be expressed coherently in Kaitai, I fear. Although we might be saved by the lazy nature of the row group evaluation.

@Holzhaus
Copy link
Contributor

Perhaps we need to just assume there are as many row groups as fit in the index page, and keep scanning until we find num_rows non-deleted rows?

Nope, this is what I tried and it doesn't work. The export.pdb linked my comment above just contains the two demo tracks, but num_rows is 7. My initial implementation tried to read row groups until it finds 7 non-deleted rows, but since there are only two non-deleted rows it crashes at some point because it tries to interpret actual row data at the start of the heap as row group data.

@Holzhaus
Copy link
Contributor

Holzhaus commented Aug 12, 2024

Just to clarify: the current kaitai code works. I'm just saying the corresponding docs are confusing and likely wrong.

My current interpretation of num_rows is that:

  • num_rows is the number of rows in the table, but that does not mean that these rows are actually present (I think that was also what the docs said before this change).
  • Additionally, we cannot make assumptions about where the rows are located inside in a row group.
  • Hence, we always need to scan the entire row group.
  • the number of row groups is determined by ${\lceil}\frac{\texttt{num\_rows}}{16}{\rceil}$

@brunchboy
Copy link
Member

Yes, I think that is the best interpretation from the evidence. num_rows tells us how many rows have ever been allocated, which we can use to calculate num_row_groups, and we just need to scan all row groups for any whose presence is true.

@brunchboy
Copy link
Member

Thanks for this follow up, I will re-update the documentation and implementation when I have some time. I will re-open this issue in the mean time.

@brunchboy brunchboy reopened this Aug 12, 2024
brunchboy added a commit that referenced this issue Aug 13, 2024
@brunchboy
Copy link
Member

Ok, I have fixed the documentation and the Kaitai mapping explanation. It seems my implementation was already working this way, but the reasoning behind the explanation was wrong. Thanks again for pointing this out, I have added credit in the change log!

@brunchboy
Copy link
Member

(Oh, and please let me know if it is still confusing or could be further improved in your eyes!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants