Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange input signal behavior #158

Closed
chykon opened this issue Sep 18, 2022 · 8 comments · Fixed by #166
Closed

Strange input signal behavior #158

chykon opened this issue Sep 18, 2022 · 8 comments · Fixed by #166
Assignees
Labels
bug Something isn't working

Comments

@chykon
Copy link
Contributor

chykon commented Sep 18, 2022

Describe the bug

The incoming signal behaves incorrectly in a certain case. Adding a useless assignment operator solves the problem.

To Reproduce

Steps to reproduce the behavior:

  1. Get project
  2. Go to file and comment highlighted line
  3. Run another file

Expected behavior

The "codepoint" input does not require the use of a useless assignment operator to work correctly.

Actual behavior

The "codepoint" input does not work correctly.

Additional details

Dart SDK version: 2.18.0 (stable) (Fri Aug 26 10:22:54 2022 +0000) on "linux_x64"
name: utf8encoder_hw
version: 0.1.0-prototype.1
description: A hardware UTF-8 encoder written in Dart using the ROHD framework.
repository: https://github.com/chykon/utf8encoder-hw
issue_tracker: https://github.com/chykon/utf8encoder-hw/issues

environment:
  sdk: '>=2.18.0 <3.0.0'

dependencies:
  rohd: ^0.3.2

dev_dependencies:
  test: ^1.21.5
  very_good_analysis: ^3.0.1

Additional context
To me, this is some weird issue that occurs with code point U+2020 but does not occur with U+2019 and U+2021. I also want to note that I encountered a problem when testing using this file. Before the appearance of the U+2020 code point, there are many other three-byte sequences in the file, but there are no problems with them.

@chykon chykon added the bug Something isn't working label Sep 18, 2022
@chykon
Copy link
Contributor Author

chykon commented Sep 19, 2022

Made a minimal example using one file.

I also stumbled upon the sudden appearance of unknown bits. There are comments in the code regarding this. Maybe these problems are somehow related? Or should I open a separate issue?

import 'dart:convert';
import 'dart:io';
import 'package:rohd/rohd.dart';

class UTF8Encoder extends Module {
  UTF8Encoder(Logic codepoint) {
    codepoint = addInput('codepoint', codepoint, width: 21);
    final bytes = addOutput('bytes', width: 32);

    final count = Logic(name: 'count', width: 2);
    final offset = Logic(name: 'offset', width: 8);
    final byte = Logic(name: 'byte', width: 8);

    final debug1 = Logic(name: 'debug_1', width: 21);

    Combinational([
      count < 0,

      If((codepoint >= 0) & codepoint.lte(0x7F), then: [
        bytes < codepoint.zeroExtend(32)
      ], orElse: [
        If((codepoint >= 0x800) & codepoint.lte(0xFFFF), then:[
          count < 2,
          offset < 0xE0
        ])
      ]),

      If(~count.eq(0), then: [
        byte < ((codepoint >>> (Const(6, width: 5) * count.zeroExtend(5))) +
                offset.zeroExtend(21)).slice(7, 0),
        // NOTE: Codepoint doesn't work correctly without this.
        debug1 < codepoint,
        // An attempt to directly substitute "byte" into "bytes" in this case
        // leads to the error "Cannot convert invalid LogicValue to int:
        // 32'bxxxxxxxx101000001000000011100010", but is it possible for
        // unknown bits to appear if "zeroExtend" is executed?
        /*
        bytes < ((codepoint >>> (Const(6, width: 5) * count.zeroExtend(5))) +
                 offset.zeroExtend(21)).slice(7, 0).zeroExtend(32),
        */
        bytes < byte.zeroExtend(32),
        bytes < bytes.withSet(8, Const(0x80, width: 8) |
                 ((codepoint >>> (Const(6, width: 4) * (count - 1).zeroExtend(4)))
                   .slice(7, 0) & Const(0x3F, width: 8))),
        count < count - 1,
        If(~count.eq(0), then: [
          bytes < bytes.withSet(16, Const(0x80, width: 8) |
                   ((codepoint >>> (Const(6, width: 3) * (count - 1).zeroExtend(3)))
                     .slice(7, 0) & Const(0x3F, width: 8))),
          count < count - 1,
          If(~count.eq(0), then: [
            bytes < bytes.withSet(24, Const(0x80, width: 8) |
                     (codepoint.slice(7, 0) & Const(0x3F, width: 8))),
          ])
        ])
      ])
    ]);
  }

  Logic get bytes => output('bytes');
}

Future<void> main() async {
  // Preparing inputs.
  final codepoint = Logic(width: 21);

  // Create and build module instance.
  final utf8encoder = UTF8Encoder(codepoint);
  await utf8encoder.build();

  // Generate SystemVerilog code.
  File('code.sv').writeAsStringSync(utf8encoder.generateSynth());

  // Prepare WaveDumper for simulation tracking.
  WaveDumper(utf8encoder);

  // Prepare input and output.
  final codepoints = '†† †† † †'.runes;
  final utf8Bytes = <int>[];

  // Run simulation.
  SimpleClockGenerator(10);
  await Simulator.tick();
  await Simulator.tick();
  for (final inputCodepoint in codepoints) {
    codepoint.inject(inputCodepoint);
    await Simulator.tick();
    await Simulator.tick();
    await Simulator.tick();
    for (var i = 0; i < 4; ++i) {
      final tempByte = (utf8encoder.bytes.value.toInt() >> (8 * i)) & 0xFF;
      if ((tempByte != 0) || (i == 0)) {
        utf8Bytes.add(tempByte);
      }
    }
  }
  codepoint.inject(0);
  await Simulator.tick();
  await Simulator.tick();

  // Print result.
  stdout.writeln(utf8.decode(utf8Bytes));
}

@mkorbel1
Copy link
Contributor

I'm able to reproduce a failure in your example code, but I'm still trying to figure out what the failure means since I'm not that familiar with the details of UTF8 encoding. It is certainly strange that some extraneous conditional assignments have an effect on the behavior, so I expect something suspicious happening within ROHD here, so thank you for filing this!

As a side note, if you use put instead of inject, you could ditch all the ticks (of which there are more than necessary anyways), since your logic is purely combinational -- the Simulator doesn't need to be involved. Also, the clock generator is not doing anything (since the clk is not attached to anything).

@chykon
Copy link
Contributor Author

chykon commented Sep 19, 2022

Further shortening of the code:

import 'dart:io';
import 'package:rohd/rohd.dart';

class ExampleModule extends Module {
  ExampleModule(Logic codepoint) {
    codepoint = addInput('codepoint', codepoint, width: 21);
    final bytes = addOutput('bytes', width: 32);
    final count = Logic(name: 'count', width: 2);

    Combinational([
      If(codepoint.eq(0x2020), then:[
        count < 2,
        bytes < ((codepoint >>> (Const(6, width: 5) * count.zeroExtend(5))) +
                 Const(0xE0, width: 21)).slice(7, 0).zeroExtend(32),
        count < count - 2,
      ]),
    ]);
  }

  Logic get bytes => output('bytes');
}

Future<void> main() async {
  final codepoint = Logic(width: 21);
  final exampleModule = ExampleModule(codepoint);
  await exampleModule.build();
  File('code.sv').writeAsStringSync(exampleModule.generateSynth());
  final codepoints = '†† †† † † q†† †'.runes;
  for (final inputCodepoint in codepoints) {
    codepoint.put(inputCodepoint);
    print(exampleModule.bytes.value);
  }
}

I don't know how to reduce it yet. When count < count - 2 is removed, the error disappears and everything works, also everything starts working when you try to substitute Const(2, width: 5) instead of count.zeroExtend(5).

Tried to discard all the details regarding UTF-8 encoding. Just watch the console output.

32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100010
32'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
32'bxxxxxxxxxxxxxxxxxxxxxxxx11100000

Ignore completely unknown lines - these are spaces and the character q. The logic for handling them has been dropped, but without them the problem is not reproducible. Up to the first whitespace character, the first bytes are well-formed, which is E2, but then the output becomes E0. Also note that after the q character and up to the first whitespace character, everything also works fine.

My guess is that the codepoint value stops updating. If we shift the code point U+2020 (†) 12 bits to the right, we get 2, to which we then add the offset E0. But in the case of the code point U+0020 (space), after the shift we get 0, and after adding the offset we get the offset itself.


Thanks for the advice! But in fact, I used inject with tick to generate the timing diagram, and SimpleClockGenerator had to be used, because otherwise the message No more to tick. is sent to the console for every tick. However, I took your suggestions and got a more compact version of the code above.

@chykon
Copy link
Contributor Author

chykon commented Sep 19, 2022

I also seem to have missed the fact that the 24 most significant bits are unknown even though the zeroExtend operation is in progress.

@mkorbel1
Copy link
Contributor

Thank you for the shortened example, it's very helpful towards debugging this!

@mkorbel1 mkorbel1 self-assigned this Sep 20, 2022
@mkorbel1
Copy link
Contributor

I think I've root caused a collection of issues in Combinational sensitivity lists that causes the behavior you're seeing, as well as some unexpected intermediate signal contention and incomplete swizzle execution. I've written a small suite of tests around your examples and I think I have things fixed up now. I'm cleaning up my code and will create a pull request with proposed fixes soon.

@mkorbel1
Copy link
Contributor

I've created a PR that should fix this issue: #166

You can try it out and let me know if you see any issues still. I plan to merge it in soon.

Thanks, again, for filing this @chykon! It was an interesting and fruitful debug.

@chykon
Copy link
Contributor Author

chykon commented Sep 28, 2022

Checked - everything works. Great job!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants