Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UTF-8 / multi-byte charset support to ADIF parser #830

Merged
merged 1 commit into from
Mar 30, 2021

Conversation

mikaelnousiainen
Copy link
Contributor

This PR demonstrates the changes required to add UTF-8 / multi-byte charset support to the ADIF parser. The ADIF text data can never be accessed using direct index references (e.g. $data[$index]) and all of these need to be replaced with mb_substr. Additionally, other functions handling strings, like strlen, strlower, etc, have been replaced with their multi-byte counterparts.

The code here is certainly not very clean and could be improved further, but it works at least on my test ADIF logs with lots of non-ASCII chars and long comments.

@AndreasK79
Copy link
Contributor

Just tested ADIF import and export. It worked without any trouble.

@magicbug
Copy link
Owner

@mikaelnousiainen thanks for this, plan to merge tomorrow, suffering from poor health least week, just want to confirm everything is ok my end 👍🏻

@mikaelnousiainen
Copy link
Contributor Author

@magicbug Thanks. The only thing I'm worried about is that the speed of imports might be slower now. Importing an ADIF with a bit over 1000 QSOs logged in WSJT-X took a significant time, maybe around 10-15 seconds on an otherwise idle, single-core cloud instance. Some of the parsing could be optimized in the future. Anyway, hope you'll recover soon!

@AndreasK79
Copy link
Contributor

@mikaelnousiainen yeah, I did notice the slowdown here as well. Recently I made some modifications to the parser. On my local machine, it made a huge difference. It dropped the time from many hours to down to around 30 seconds on 26k QSOs. With this PR, it takes around 1 minute. I guess further optimizations are possible, but I guess Peter will decide how he proceeds.

@mikaelnousiainen
Copy link
Contributor Author

@AndreasK79 Would you like share those optimizations here?

@AndreasK79
Copy link
Contributor

AndreasK79 commented Jan 27, 2021

@mikaelnousiainen I have. They were merged into the parser a few weeks back. See here: #604

Copy link
Owner

@magicbug magicbug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not requesting you add the code but before merge I'd like to do the following to track speed changes in the adif code

  • In development mode run the Codeigniter benchmarking tool to calculate processing time for a while which is then logged to file using Codeigniters logging features.

This is purely to make sure there's not a huge jump, it might also be useful to have a set few ADIF files to check against.

Sorry for the delays been quite unwell the last couple of months

@AndreasK79
Copy link
Contributor

@magicbug

I did a benchmark here by enabling $this->output->enable_profiler(TRUE); in the controller.
There are some slowdowns, but not too bad:

501 qsos:
--with
Loading Time: Base Classes 0.0092
Controller Execution Time ( Adif / Import ) 10.4986
Total Execution Time 10.5081

--without
Loading Time: Base Classes 0.0092
Controller Execution Time ( Adif / Import ) 9.1137
Total Execution Time 9.1232

7777 qsos:
--with:
Loading Time: Base Classes 0.0086
Controller Execution Time ( Adif / Import ) 157.7302
Total Execution Time 157.7391

--without:
Loading Time: Base Classes 0.0081
Controller Execution Time ( Adif / Import ) 149.7410
Total Execution Time 149.7495

There are other things that can be done to speedup things, as many calls are made to the database just for one QSO.

@magicbug
Copy link
Owner

Merged

@magicbug magicbug merged commit fba5fe9 into magicbug:master Mar 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants