
floppy: apply NCO-based 2nd-order PLL to WDATA line #864

Open · wants to merge 1 commit into base: master

Conversation

@mx-shift

Centurion Finch Floppy Controller
(FFC, https://github.com/Nakazoto/CenturionComputer/wiki/FFC-Board) is an 8" floppy controller that applies up to 350ns write precompensation. FlashFloppy's existing approach of re-aligning the bit clock to the last pulse edge causes enough additional error for bitcells to be missed or injected in this case.

This commit applies a 2nd-order (proportional and integral) PLL to the incoming WDATA to mitigate both frequency offset and extreme write precompensation. The implementation is based on a virtual numerically-controlled oscillator with a frequency 2^16 times FlashFloppy's tick rate. Proportional and integral constants were chosen to nominally provide a 715kHz natural loop frequency and a damping factor of 1 and then adjusted to be powers of 2 to allow computation to use bit shifts instead of multiplication and division.
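
As a rough illustration of the shape of that loop (not the PR's actual code; the constants and names here are assumed for the example), the per-edge update looks something like:

    #include <stdint.h>

    #define PLL_P_DIV 8    /* k_l = 1/8   -- assumed example constants */
    #define PLL_I_DIV 256  /* k_i = 1/256 */

    static int32_t pll_integral; /* accumulated frequency correction (16.16) */

    /* Called once per WDATA edge. phase_error is the signed distance of the
     * edge from where the NCO expected it, in 16.16 fixed-point ticks. */
    static uint32_t pll_update(uint32_t nominal_step, int32_t phase_error)
    {
        pll_integral += phase_error / PLL_I_DIV;      /* integral term */
        return nominal_step
             + phase_error / PLL_P_DIV                /* proportional term */
             + pll_integral;
    }

Because both divisors are powers of two, the divisions are cheap (see the later discussion of signed division versus arithmetic shift).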

This commit has been successfully tested with a variety of PC-compatible floppy controllers as well as Centurion FFC.

@keirf (Owner) commented Jan 21, 2024

Thanks for this! I need to consider whether to apply this as the default, or the only, clock recovery. I can probably simplify the edges of this code a bit, where it interfaces to the rest of the firmware. My main concern is how long it takes to run per bit clock, as the old STM32 Goteks aren't very fast, and FlashFloppy is tested up to ED data rate (MFM 500ns clock). What Gotek model(s) has this been tested on?

However, fundamentally it is more principled than ae42450, and adjusting the frequency rather than messing directly with the clock phase seems obviously better for dealing with aggressive precomp. I can test this against Greaseweazle easily enough, but my gut says that this implementation should be quite a bit more robust.

Possibly I should try something like this in Greaseweazle too, which has a PLL that's not really a PLL. Albeit the existing implementation has proven to work pretty well even on marginal disks. I could have two implementations though, and then A/B them.

@mx-shift (Author)

I only have 2x Gotek SFRKC30.AT4.35 so I've only tested with the Artery AT32F435. I was pretty careful to choose an implementation that avoids expensive math. There's one uint32_t multiply and one int32_t divide per sample. Everything else is adds, subtracts, and bit shifts. I didn't check the disassembly to see whether the compiler replaces the multiply and divide.

@keirf (Owner) commented Jan 21, 2024

The other question that comes to mind is how were the PLL parameters arrived at? I'm no PLL expert, and I don't know whether for example the loop frequency and damping factors chosen are good ones, nor whether the resulting calculated k_l and k_i are good. Was there much trial and error?

@mx-shift (Author) commented Jan 21, 2024

I'll admit that I'm not particularly strong on PLL design either. I included a link to https://www.dsprelated.com/showarticle/967.php in the comment with the parameters as I referred to that site often while developing this implementation.

I did experiment with various parameters to see what range of loop frequency would work. I left the damping factor alone as all the information I turned up said 1 was a safe choice. Ultimately, I found anything from 300kHz up to 750kHz worked fine for a 1us MFM clock with write precomp ranging from 100ns up to 350ns. Given that, I worked backwards from k_l and k_i values that were convenient for bit shifting and also resulted in a loop frequency in that range. That gave k_l = 1/8, k_i = 1/256 (715kHz) and k_l = 1/16, k_i = 1/1024 (358kHz). Both worked fine with 3.5" HD and Centurion 8" DD flux images and on real hardware.
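
For reference, one standard formulation of the second-order loop gains (consistent with the numbers above, taking f_s = 72MHz and zeta = 1):

    k_l = 2\zeta \cdot \frac{2\pi f_n}{f_s}, \qquad
    k_i = \left(\frac{2\pi f_n}{f_s}\right)^2

    f_n = 715 kHz: k_l = 2(0.0624) = 0.125  = 1/8,  k_i = (0.0624)^2 ≈ 1/256
    f_n = 358 kHz: k_l = 2(0.0312) = 0.0625 = 1/16, k_i = (0.0312)^2 ≈ 1/1024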

To do all this experimentation (and not need Usagi Electric to fire up the Centurion dozens of times), I wrote a tool (https://github.com/mx-shift/Centurion_Floppy/tree/main/kryoflux_to_flashfloppy) that converts Kryoflux files to a list of tick counts approximating how the AT32F435's timer capture unit generates samples via the DMA engine. This tool can also apply additional write precompensation. Then I extracted the core algorithm from IRQ_wdata_dma into another tool (https://github.com/mx-shift/Centurion_Floppy/tree/main/flashfloppy_to_hfe) so I could try different implementations and parameters. You'll notice that the tool has algorithms from FlashFloppy v4.31, a snapshot of master, Greaseweazle's two PLLs, a design based on an FDC9216, and finally this NCO-based PLL approach.

If you have Kryoflux files of other disks you'd like to verify these algorithms against, I'll gladly run them through the tooling and verify HxC can decode the generated HFEs. As I only got into floppy emulation because Usagi Electric was having difficulty with Centurion, I don't have a large library of flux images to draw from for verification.

@keirf (Owner) commented Jan 21, 2024

Okay that's very awesome! I assume you found this NCO-PLL scheme to work best? Sounds like you used empirical methods to pick good k_l and k_i, and f_n emerged from that rather than working forwards, and that certainly seems fine to me.

Regarding the signed divisions, you should find that these compile to three instructions. Unfortunately conventional signed division (which rounds to zero) is not the same as arithmetic shift (which rounds to -inf). For the purposes here, however, perhaps arithmetic shift is okay and will save two instructions? It is "safe enough" to try >> here because, although it's implementation-defined how C right-shifts a negative number, in practice everyone uses arithmetic shift if it's available (which it is on almost any ISA, including ARM). Leaving unsigned divisions as divisions is okay as the compiler can substitute a single LSR there itself, no problem.

The main weirdness of ASR is that a negative number repeatedly shifted will tend to -1 rather than 0.
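
A tiny illustration of that rounding difference (assuming the compiler implements >> on signed values as an arithmetic shift, which in practice it does on ARM):

    #include <stdio.h>

    int main(void)
    {
        int x = -5;
        printf("%d\n", x / 4);   /* -1: division truncates toward zero */
        printf("%d\n", x >> 2);  /* -2: ASR rounds toward -infinity */
        printf("%d\n", -1 >> 1); /* -1: repeated ASR converges to -1, not 0 */
        return 0;
    }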

@keirf (Owner) commented Jan 21, 2024

Did you in the end find any reason to pick between 358kHz vs 715kHz? I'd certainly be inclined to go with what experiments say if they are conclusive either way but, since WDATA should have a very stable clock close to what FlashFloppy expects and basically all the noise is bit shifts introduced by precomp, the lower-bandwidth 358kHz parameters would be a natural choice.

By the way, you've probably seen this doc before, but I found this useful for thinking about PLL behaviours in relation to floppy drive data: https://info-coach.fr/atari/hardware/_fd-hard/AN-505.pdf
Yes, I do skip most of the maths!

@mx-shift (Author) commented Jan 21, 2024 via email

@keirf (Owner) commented Jan 21, 2024

Well that's fine, I have no strong reason to prefer 358kHz vs 715kHz. Clearly the latter has really good tolerance for bit shifts!

And yes, makes sense I suppose to leave as is. I have a feeling this may only get applied for the '435 build of FlashFloppy anyway. That given, you could probably even have used single-precision floats, no problem! Three-instruction divisions are unlikely to need optimisation there.

@mx-shift (Author) commented Jan 21, 2024 via email

@keirf (Owner) commented Jan 21, 2024

Interesting, having a play with your conversion and analysis tools. I don't actually find 350ns precomp reliable with the 715kHz PLL and an MFM clock of 1us. For example (paraphrasing slightly, but this is the sequence of commands ignoring paths):

    dd if=/dev/urandom of=img1440.img bs=1024 count=1440
    gw convert --format=ibm.1440 img1440.img img1440_00.0.raw
    kryoflux_to_flashfloppy img1440_00.0.raw --write-precomp-ns 350
    flashfloppy_to_hfe img1440_00.0.revolution1.ff_samples a.hfe nco_715k

If I inspect the result using HxC software's track analyzer, there are plenty of badly decoded bitcells and most sectors fail data CRC.

Also the resulting HFE seems a bit long, hence HxC software thinks the RPM is approx 298. This is probably just some artifact of your HFE-stuffing code.

@keirf (Owner) commented Jan 21, 2024

Warning. Ramble:

Another thing I took to wondering is if there is a better way to handle precomp shifts, given that we basically know that the only high-frequency noise we will see is the deterministic shifting caused by precomp. Whenever I start thinking about it I reckon I always end up wanting something like a conservative low-bandwidth PLL though. It is interesting that a PLL (with appropriate parameters!) can tolerate even this aggressive shifting. It's not immediately obvious to me that it should be possible. After all the flux timings are expected to be written direct to disk, and only passed through a PLL on readback, at which point the precomp shifts should be mostly compensated by effects of the magnetic media.

@mx-shift (Author) commented Jan 21, 2024 via email

@keirf (Owner) commented Jan 21, 2024

No Greaseweazle doesn't apply any precomp to image conversions. Could the problem be that the data is more random than previous tests, and that certain bit sequences are defeating the higher bandwidth PLL? The lower bandwidth version worked okay on this test input. Worth investigating anyway.

@mx-shift (Author)

I believe you are right. I'm starting to play with even lower bandwidth versions to see if 400ns can be tolerated.

@keirf (Owner) commented Jan 21, 2024

Another interesting measurement for low bandwidth PLLs will be their tolerance to frequency shift. This can also be tested with Greaseweazle as a custom format can be specified with a shifted bitrate (eg 490-510 in place of 500kbps). We should make sure the PLL can lock and lock reasonably fast to deviations of a few percent I'd say?

@mx-shift (Author)

I was thinking along the same lines. I started to work through the formulas again and remembered that f_n (loop natural frequency) determines how far the PLL will "pull in" an input that has a frequency offset.

One way to think of this PLL design is as a PLL that tracks a nominal 72MHz clock. Since MFM doesn't provide a clock pulse consistently, I'm simulating the PLL clock running N cycles until the MFM edge and then spreading the accumulated error found at that edge across those N cycles. So when thinking about how much the PLL will tolerate a frequency offset, it's in reference to the nominal 72MHz clock and the tolerance is divided across however many ticks occur before an MFM edge.

For nco_715k, this means the PLL tolerates roughly 72MHz +/- 715kHz. With a 1MHz MFM clock, it's possible to see 4us between MFM pulses. So the 715kHz tolerance is spread over 288 PLL clock ticks resulting in 2.482kHz tolerance on the incoming 1MHz MFM clock. The tolerance actually gets worse with slower MFM clocks.
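
Spelling that arithmetic out:

    N = 72\,\mathrm{MHz} \times 4\,\mu\mathrm{s} = 288 \text{ ticks}, \qquad
    \frac{715\,\mathrm{kHz}}{288} \approx 2.48\,\mathrm{kHz}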

To experiment with slower MFM clocks, I modified https://github.com/mx-shift/Centurion_Floppy/tree/main/flashfloppy_to_hfe to allow specifying the HFE bitrate which is used to calculate write_bc_ticks just like FlashFloppy does. With a 250kbps HFE, both nco_715k and nco_358k were able to track an ibm.360 image. (Aside: I also added https://github.com/mx-shift/Centurion_Floppy/blob/main/scripts/kryo_to_hfe.sh to quickly generate HFEs with 0, 300, 350, and 400ns precomp from a kryoflux image).

I also added nco_178k to see what happens if the loop natural frequency is dropped further. This works fine with ibm.1440 and ibm.360 images containing random data and converted with gw convert. Note that an ibm.360 converted to a 250kbps HFE with nco_178k spreads the 178k tolerance over 576 PLL clock ticks resulting in 309Hz tolerance. nco_358k is only slightly better at 621Hz.

One way to improve this is to slow the PLL virtual clock down to be much closer to the expected range of MFM clocks. That requires some thinking through how to handle the PLL virtual clock and the timer samples overflowing at different times.

@mx-shift (Author)

Playing around a bit more, I think I was wrong about the tolerance being spread across the measurement time. The PLL only gets updated at the next MFM edge. Even if that edge is 4us after the previous one, the phase error is divided by the number of PLL clock ticks before updating the PLL. So the PLL can tolerate x% at the update, which translates to x% across the measurement window.

Zeta is what becomes important for tolerating jitter while keeping a wide loop natural frequency. As zeta approaches 0, jitter tolerance improves, up to a point where the PLL loses stability. As zeta increases above 1, phase tracking of the input clock improves, again up to a point where the PLL loses stability.

Putting these two pieces together, I calculated parameters for loop natural frequencies of 3%, 2%, and 1% of 72MHz and zeta at various points between 0 and 1. I've added all of these to https://github.com/mx-shift/Centurion_Floppy/tree/main/flashfloppy_to_hfe as nco_<freq>_<zeta>. No matter what I do, I can't tolerate 400ns of write precomp. nco_1440k_0p25 seems to be a good balance: 2% frequency tolerance while also tolerating 350ns write precomp. I have not tried this with intentional frequency offsets.

I did try it against https://github.com/Nakazoto/CenturionComputer/tree/main/Drives/Floppy/FluxReading/Old/RawDS_Kryoflux, which is a Centurion floppy read via Greaseweazle. Note that since this was created from actual media, it only has moderate write precomp remaining, as most of it was compensated for by the drive and media. Also note that Centurion adds additional write precomp starting at track 42. So looking at track 42 with an additional 150ns or so of write precomp is an example of what Centurion FFC looks like going into FlashFloppy: it has both a small frequency offset (and drift) as well as extreme write precomp. nco_1440k_0p25 seems to handle this just fine.

@mx-shift (Author) commented Jan 22, 2024

Updated the PR with the numerical accuracy fix for the integral and to adjust the PLL to a 1440kHz loop frequency and a loop damping of 0.25.

Centurion Finch Floppy Controller
(FFC, https://github.com/Nakazoto/CenturionComputer/wiki/FFC-Board)
is an 8" floppy controller that applies up to 350ns write
precompensation. FlashFloppy's existing approach of re-aligning
the bit clock to the last pulse edge causes enough additional error
for bitcells to be missed or injected in this case.

This commit applies a 2nd-order (proportional and integral) PLL to
the incoming WDATA to mitigate both frequency offset and extreme
write precompensation. The implementation is based on a
virtual numerically-controlled oscillator nominally running at the
same frequency as the WDATA timer but with a 16.16 fixed-point
representation, allowing for considerable precision in frequency
adjustment. Proportional and integral constants were calculated to
nominally provide a 1440kHz natural loop frequency and a damping
factor of 0.25, allowing for 2% frequency offset and strong jitter
rejection against write precompensation.

@keirf (Owner) commented Jan 22, 2024

Okay, I will have a play with this when I have some time. Could be next weekend. I can't really comment authoritatively on how to interpret PLL parameters, especially when it's a digital approximation of an APLL applied to an incomplete clock stream. I think the natural frequency is actually to do with the frequency that the PLL oscillates at when perturbed. I'm also not sure the absolute f_n and f_s have much meaning for us when the only parameters we depend on in the code (k_l and k_i) depend only on the ratio between f_n and f_s.

Anyway, empirical evaluation of sensible-looking 2^-N values for k_l and k_i seems a reasonable approach.

@keirf (Owner) commented Jan 22, 2024

Interestingly, a quick fairly arbitrary test with data bitrate +/- 3% of nominal, and 350ns precomp, seems handled better by nco_358k than nco_1440k_0p25.

Seems there's the possibility of searching this space somewhat automatically, at the cost of implementing a harness:

  1. Greaseweazle can generate pseudorandom IBM 1.44M Kryoflux files with arbitrary bitrate
  2. kryoflux_to_flashfloppy can add arbitrary precomp
  3. flashfloppy_to_hfe could be generalised to accept arbitrary pow2 k_i and k_l divisors.
  4. Greaseweazle can analyse resulting HFE looking for valid sectors

Then run a multi-dim search across (rate,precomp,k_l,k_i) -> nr_valid_sectors

Then pick optimal k_l,k_i for preferred tradeoff between worst-case rate vs precomp. We could chuck in a few of the other 'pll' algorithms as baselines.
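
In sketch form, the search is just nested loops over the pipeline above (the helper and its step sizes here are hypothetical; a real harness would shell out to the tools listed):

    #include <stdio.h>

    /* Hypothetical stand-in for steps 1-4: generate flux at `rate` kbps,
     * apply `precomp` ns, decode with k_l = 1/p_div and k_i = 1/i_div,
     * and return the number of valid sectors recovered. */
    static int nr_valid_sectors(int rate, int precomp, int p_div, int i_div)
    {
        /* Placeholder: would invoke gw convert, kryoflux_to_flashfloppy,
         * flashfloppy_to_hfe, and gw again to count good sectors. */
        (void)rate; (void)precomp; (void)p_div; (void)i_div;
        return 0;
    }

    int main(void)
    {
        for (int p_div = 4; p_div <= 256; p_div *= 2)
            for (int i_div = 32; i_div <= 4096; i_div *= 2)
                for (int rate = 460; rate <= 540; rate += 10)
                    for (int precomp = 0; precomp <= 350; precomp += 50)
                        printf("k_l=1/%d k_i=1/%d rate=%d precomp=%d -> %d\n",
                               p_div, i_div, rate, precomp,
                               nr_valid_sectors(rate, precomp, p_div, i_div));
        return 0;
    }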

Then the whole thing seems empirically watertight at least, within our constraints on values for k_l,k_i.

@keirf (Owner) commented Jan 26, 2024

I forked and added a parametric NCO and a test harness at https://github.com/keirf/Centurion_Floppy

I searched across bit rates up to +/-8% from 500kbps nominal. And across precomp values up to 350ns. The best parameters I can find are k_l = 1/16 and k_i = 1/128 (though k_i = 1/256 is a very close second).

It is good that the k_l matches your preferred NCOs (358k and 1440k_0p25), while k_i sits somewhere between the two, closer to nco_358k.

@mx-shift (Author) commented Jan 26, 2024 via email

@keirf (Owner) commented Jan 26, 2024

Dividing by write_bc_ticks is simply to remove the data frequency from phase calculations. It's nothing to do with averaging out the phase error across bitcells. That would be wrong, and it's not what's done here, so all good.

That said, I want to finesse the code some on the FlashFloppy side, and then I will want to re-extract that into the test program and re-run the search. But I haven't found anything in the existing NCO-PLL code I'd strictly call wrong.

@mx-shift (Author) commented Jan 26, 2024 via email

@keirf (Owner) commented Jan 26, 2024

But think about the dimensions of the values. phase_step is dimensionless, and is multiplied by write_bc_ticks (clock ticks) to get bc_step (clock ticks). distance_from_curr_bc_left and bc_step are also in units of clock ticks. But phase_error needs to be dimensionless because it is used to calculate phase_step, hence dividing through by something in units of clock ticks surely makes sense.
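
In sketch form (variable names as used in this thread; the exact offset expression is an assumption, not the real code):

    #include <stdint.h>

    /* Illustrates only the units involved, not the PR's actual calculation. */
    static int32_t calc_phase_error(
        uint32_t phase_step,                 /* dimensionless ratio (16.16) */
        uint16_t write_bc_ticks,             /* nominal bitcell, clock ticks */
        int32_t distance_from_curr_bc_left)  /* clock ticks (16.16) */
    {
        uint32_t bc_step = phase_step * write_bc_ticks;  /* clock ticks (16.16) */
        int32_t err_ticks = distance_from_curr_bc_left
                          - (int32_t)(bc_step / 2);      /* still clock ticks */
        return err_ticks / (int32_t)write_bc_ticks;      /* dimensionless again */
    }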

Anyway, inarguably the code does seem to work :)

@mx-shift (Author) commented Jan 26, 2024 via email

@mx-shift (Author)

Finally found some time to revisit this. I cleaned up https://github.com/mx-shift/Centurion_Floppy/tree/main/flashfloppy_to_hfe quite a bit and replaced all the nco_* variations with a parameterized version as you had done in your fork. I also reworked your harness.py into sweep_ibm.py which scans across both ibm.360 and ibm.1440 formats to see what happens with different MFM clock rates. I also added summarize_results.py which will show how well the bitrate/precomp combinations are covered as well as the coverage of the set of parameters that had the widest coverage.

I've also worked up a revision of the NCO PLL that behaves like the math says it should. I call this nco_v2 in flashfloppy_to_hfe. sweep_ibm.py will run both nco_v1 and nco_v2 and generate results for each. I haven't let it run completely but early results show nco_v2 working quite well with p_mul=1, p_div=2048, i_mul=1, i_div=8192.

I also added a CSV output of phase_error over time to flashfloppy_to_hfe. TechTangents provided me with a kryoflux image of a floppy that failed formatting in a Xerox 645S word processor because the drive spindle speed varied so much that it couldn't read its own writes. I ran that through flashfloppy_master, greaseweazle_default_pll, nco_v1, and nco_v2 and plotted all of their phase errors:
[screenshot: phase error over time for flashfloppy_master, greaseweazle_default_pll, nco_v1, and nco_v2 on the Xerox 645S disk]

nco_v2 slightly edges out nco_v1. I also did the same with a Centurion floppy as I know it has a modest amount of write precomp:
[screenshot: the same phase-error comparison on a Centurion floppy]

Again, nco_v2 is slightly better than nco_v1. I'll update again once my run of sweep_ibm.py finishes.

@keirf (Owner) commented Feb 20, 2024

Nice. Can it tolerate 400ns precomp in a 1.44m pc disk?

@mx-shift (Author) commented Feb 20, 2024 via email

@keirf (Owner) commented Feb 20, 2024

Well this is great anyway. I still haven't got my head round the mathematical correctness of v1 versus v2, and I need to give it a bit more thought, but it's hard to argue against v2's empirical results, the dropped divide, and the common-sense interpretation that it propagates and accumulates errors with greater precision.

My main concern is whether v2 is sufficiently "scale invariant" to various write_bc_ticks values. Sweeping over precomp and clock variations at base clock 1us versus 2us will certainly be interesting: will the same parameters work best for both base clocks in v1 and/or v2?

@mx-shift (Author)

TL;DR Use nco_v2[p_mul=1,p_div=2048,i_mul=1,i_div=8192] and it should work for everything up to and including ibm.2880.

Sweep finished overnight. Results are pretty clear: both nco_v1[p_mul=1,p_div=16,i_mul=1,i_div=128] and nco_v2[p_mul=1,p_div=2048,i_mul=1,i_div=8192] can cover:

  • ibm.360 with +/- 8% data rate and up to 400ns precomp
  • ibm.1440 with -8% to +6% data rate and up to 350ns precomp

nco_v1 can also do ibm.1440 with +7% data rate and 350ns precomp. I'm surprised they are different on this one data point.

I also looked at ibm.2880 for those two specific parameter sets. I can't use the same tooling since it is all built around generating HFE which can't store a track long enough for ibm.2880. Instead, I looked at the phase error plot of a nominal ibm.2880 just to check that nothing went particularly haywire.

[image: phase error over time for a nominal ibm.2880]

nco_v1 does slightly better here but both nco_v1 and nco_v2 are worse than flashfloppy_master. The phase error is well within the tolerance needed for correct decode though (bitcells are 36 sample ticks wide at 1000kbps). I suspect the extra phase-error noise for both nco_v1 and nco_v2 comes down to using 32 bits for the NCO. Moving up to 64 bits might give it enough precision to settle down instead of oscillating around 35kHz, which is very close to where the math says this filter should oscillate.

Full summaries of results:
ibm.360.summary.txt
ibm.1440.summary.txt

@mx-shift (Author)

I'll gladly do the work to update this PR with nco_v2. Up to you.

@keirf (Owner) commented Feb 21, 2024

Doesn't seem there's much between 'em.

You can do the ibm.2880 bit rate but not the exact format. Just do ibm.1440 but at rate=1000 and rpm=600. You'll need to add an rpm config parameter to your class Format and write that into the generated diskdef. Something like:

    Format(
        'ibm.ed',
        heads=2,
        cylinders=80,
        sectors_per_cylinder=18,
        bytes_per_sector=512,
        data_rate_kbps=1000,
        rpm=600,
    )

100ms of track should be plenty in which to evaluate the PLLs.

@mx-shift (Author) commented Feb 21, 2024 via email

@keirf (Owner) commented Feb 21, 2024

Let's get the higher rate numbers and see. We do indeed want one set of parameters for all rates if possible. So whichever of v1 and v2 is best for that is going to win.

@mx-shift (Author) commented Feb 22, 2024

Results are not what I expected. Both nco_v1 and nco_v2 have a wide range of parameters that work fine at 1000kbps. What's really surprising is nco_v1[p_mul=1, p_div=16, i_mul=1, i_div=128] works for 1000kbps but nco_v2[p_mul=1, p_div=2048, i_mul=1, i_div=8192] does not. I've started another run with an nco_v3 to test a theory but it seems like nco_v1[p_mul=1, p_div=16, i_mul=1, i_div=128] is a solid choice for FlashFloppy.

ibm.ed.summary.txt

@keirf (Owner) commented Feb 22, 2024

Yes interesting, not only is nco_v2[p_mul=1, p_div=2048, i_mul=1, i_div=8192] a narrow miss, but nco_v1[p_mul=1, p_div=16, i_mul=1, i_div=128] remains at the very heart of a successful 2d range of parameters. It wasn't obvious to me which would end up winning in practice.

Is there anything else from v2 that's worth considering in combo with v1 for a v3 or v4?

@mx-shift (Author) commented Feb 22, 2024

v3 is two small adjustments to v2. nco_v3[p_mul=1, p_div=2048, i_mul=1, i_div=8192] is just shy of nco_v1[p_mul=1, p_div=16, i_mul=1, i_div=128] in coverage. I could probably eke out those last few cells (ibm1440 >+6% RPM +350ns) with a bit more work, but nco_v1 already handles them. I suspect an i_div between 4096 and 8192 would pick them up, but then we lose the performance advantage of dividing by a power of 2. Let me know which version you want to proceed with and I'll update this PR accordingly.

omnibus_summary.txt

@keirf (Owner) commented Feb 22, 2024

I'd stick with nco_v1[p_mul=1, p_div=16, i_mul=1, i_div=128].

I also note from previous comment: The other difference is nco_v1 assumes both the PLL uint32 and the sample clock uint16 will overflow at the same time.

Is this the following nco_v2 code?

        if (pll_period < PLL_PERIOD_MIN) {
            pll_period = PLL_PERIOD_MIN;
        } else if (pll_period > PLL_PERIOD_MAX) {
            pll_period = PLL_PERIOD_MAX;
        }

@keirf (Owner) commented Feb 22, 2024

Ultimately my path forward is as follows: get an acceptable patch ready for the firmware, then port that as directly as possible back out into the test harness (so the code is line-for-line as close to identical as possible), and check it against the nco_v1 baseline again. This will also allow firmware optimisations to be tested with confidence.

@mx-shift (Author) commented Feb 22, 2024 via email

@keirf (Owner) commented Feb 22, 2024

No, that bit of code is limiting the tunable range of the NCO. If it is left unlimited, it can go so low that the decoding process gets very slow.

That might be worth keeping then?

As for the rest, as long as phase_step (aka pll_period) doesn't get very large, I don't see that prev_sample and prev_bc_left can really deviate far enough that the arithmetic involving curr_sample will overflow if we use the latter (since they should be within plus or minus bc_step/2 of each other).

I also don't see how the more complex code in nco_v2 helps anyway: prev_pll_edge appears to always be prev_sample<<16 plus a fixed delta.
