Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random stability features needed #117

Closed
msmftc opened this issue Oct 4, 2021 · 10 comments
Closed

Random stability features needed #117

msmftc opened this issue Oct 4, 2021 · 10 comments

Comments

@msmftc
Copy link

msmftc commented Oct 4, 2021

PyVSC needs to add features to support random stability. Random stability is the idea that randomized tests should be exactly reproducible (given a fixed random seed), and that changes to a small feature of a test should not affect randomization outside of that feature.

PyVSC cannot support random stability right now because a single, global random number generator (RNG) supplies all random numbers. If a test has several randobj instances then the sequence of random values produced by a particular instance in a loop of randomize() calls depends on the order of ALL randomize() calls on ALL randobj's in the test. This causes a few problems:

  • Multi-threaded tests are not reproducible because the order of randomize() calls between threads is not predictable.
  • Changes to a single randobj instance affects randomization throughout the test, not just randomization of that instance.
  • Within a test it is not possible to make a particular randobj instance generate the same random sequence multiple times, since there is no way to set the random seed for just a particular instance.

A solution would be to provide methods that pass an RNG instance or a seed into a randobj. All randomization for that randobj would need to use its private RNG. I've examined PyVSC source code, and it looks like this solution has been started, but is not complete. Is there a plan to complete it?

I've written an example to demonstrate random (in)stability. The example has two different randobj's, and randomize() is called ten times on each. A command-line parameter alters the order of randomize() calls between the two randobj's. If the example was random stable, then the sequence of numbers from each randobj would always be the same, regardless of randomize() order. Since the test is not random stable, the sequences vary.

import sys
import random
import vsc

@vsc.randobj
class RandByte:
    def __init__(self):
        self.byte = vsc.rand_bit_t(8)

    def __str__(self):
        return("byte = %d".format(self.byte))


@vsc.randobj
class RandWord:
    def __init__(self):
        self.word = vsc.rand_bit_t(16)

    def __str__(self):
        return("word = %d".format(self.word))

def main():
    random.seed(100)   # randobj's should not depend on global seed, but they currently do.

    randByte = RandByte()   # randobj methods should set either an RNG instance or a seed. 
    randWord = RandWord()
    byte_l = []
    word_l = []

    # Order of randomize() calls should not affect random sequence from a particular
    # randobj, but it currently does.
    if(len(sys.argv) > 1 and sys.argv[1] == '1'):
        for i in range(10):
            randByte.randomize()
            randWord.randomize()
            byte_l.append(randByte.byte)
            word_l.append(randWord.word)
    elif(len(sys.argv) > 1 and sys.argv[1] == '2'):
        for i in range(10):
            randWord.randomize()
            word_l.append(randWord.word)
        for i in range(10):
            randByte.randomize()
            byte_l.append(randByte.byte)
    else:
        sys.exit("First command-line argument must be '1' or '2'.")

    print("Byte list")
    for i in range(10):
        print("  byte {}:  {}".format(i, byte_l[i]))
    print("Word list")
    for i in range(10):
        print("  word {}:  {}".format(i, word_l[i]))

main()
$ python randStability1.py 1
Byte list
  byte 0:  89
  byte 1:  104
  byte 2:  209
  byte 3:  63
  byte 4:  121
  byte 5:  175
  byte 6:  222
  byte 7:  112
  byte 8:  243
  byte 9:  19
Word list
  word 0:  26802
  word 1:  24719
  word 2:  52830
  word 3:  25249
  word 4:  37991
  word 5:  52656
  word 6:  42828
  word 7:  59728
  word 8:  13635
  word 9:  6471
$ python randStability1.py 2
Byte list
  byte 0:  175
  byte 1:  205
  byte 2:  222
  byte 3:  167
  byte 4:  112
  byte 5:  233
  byte 6:  243
  byte 7:  53
  byte 8:  19
  byte 9:  25
Word list
  word 0:  22904
  word 1:  26802
  word 2:  26721
  word 3:  24719
  word 4:  53699
  word 5:  52830
  word 6:  16265
  word 7:  25249
  word 8:  31093
  word 9:  37991
@mballance
Copy link
Member

mballance commented Oct 9, 2021

Hi @msmftc,
I've been giving this more thought. Here are some design and requirement notes on this feature. Please review and let me know your thoughts.

Requirements

  • Support a RandState object for capturing and propagating random state
  • Support creating RandState from a numeric seed
  • Support setting RandState for a randomizable object
  • Support getting the RandState of a randomizable object (assume state mutates as randomizations occur)
  • Have sensible default behavior when users do not set the random state of a randomizable object

Operation

  • Every randomizable object provides a set_randstate and get_randstate method.
  • State is copy-in/copy-out to avoid aliasing across objects that may be randomized in different threads
  • RandState mutates as randomizations are performed
  • If the user does not set rand-state for an object, it is initialized on the first randomization using the next random number from the random package as the seed.
  • RandState provides a 'clone' method that supports saving a copy of the random state for later use.

Applying this to your example should look something like the below:

import sys
import random
import vsc

@vsc.randobj
class RandByte:
    def __init__(self):
        self.byte = vsc.rand_bit_t(8)

    def __str__(self):
        return("byte = %d".format(self.byte))


@vsc.randobj
class RandWord:
    def __init__(self):
        self.word = vsc.rand_bit_t(16)

    def __str__(self):
        return("word = %d".format(self.word))

def main():
    rs = RandState(100) # Create a random state from a numeric seed

    randByte = RandByte()   # randobj methods should set either an RNG instance or a seed. 
    randByte.set_randstate(rs) # Sets the randstate to the current state of 'rs'
    randWord = RandWord()
    randWord.set_randstate(rs) # Sets the randstate to the current state of 'rs'

    # Note: because a copy of the randstate is taken, the subsequent randomizations
    # of 'randByte' and 'randWord' are independent. 
    # Note, also, that subsequent randomizations do not mutate the state held in 'rs'.

    byte_l = []
    word_l = []

    # Order of randomize() calls should not affect random sequence from a particular
    # randobj, but it currently does.
    if(len(sys.argv) > 1 and sys.argv[1] == '1'):
        for i in range(10):
            randByte.randomize()
            randWord.randomize()
            byte_l.append(randByte.byte)
            word_l.append(randWord.word)
    elif(len(sys.argv) > 1 and sys.argv[1] == '2'):
        for i in range(10):
            randWord.randomize()
            word_l.append(randWord.word)
        for i in range(10):
            randByte.randomize()
            byte_l.append(randByte.byte)
    else:
        sys.exit("First command-line argument must be '1' or '2'.")

    print("Byte list")
    for i in range(10):
        print("  byte {}:  {}".format(i, byte_l[i]))
    print("Word list")
    for i in range(10):
        print("  word {}:  {}".format(i, word_l[i]))

main()

@msmftc
Copy link
Author

msmftc commented Oct 12, 2021

@mballance, these features look good, but I recommend adding a couple more options for seeding RandState objects. The goal of these options is to uniquely seed each RandState obj while preserving random stability.

First, there could be an option to seed RandObj with the combination of an integer and a string, for example:

rs = RandState(100, "core[0].worker[3].opc")

The integer and string are hashed together to produce a unique seed for the RandState. UVM uses this technique. The integer is generally a global seed that is set for the whole test, and the string is generally a hierarchical pathname for the object being seeded. Assuming every randomizable object has a unique pathname, then each gets a unique seed. When a test is altered, seeds stay constant for any object whose pathname is unchanged. The order in which objects are seeded does not affect randomization.

Second, RandState could have a method that returns random integers so that a RandState can be seeded from another RandState. For example:

rs1 = RandState(100)
rs2 = RandState(rs1.randInt())

This enables a hierarchical method of seeding RandObj's, similar to what SystemVerilog does. Imagine a test that uses many threads, and each thread has the ability to start additional threads. The test could be written such that the initial thread is given an integer seed. The thread creates a RandState with that seed (rs_top = RandState(seed)), then calls rs_top.randInt() to seed a new RandState for each RandObj in the thread. When the thread needs to start child threads, it again calls rs_top.randInt() to provide seeds to those children. This keeps randomization stable for all threads even though we cannot predict the program order in which grandchild threads will be started.

If you include these options in RandState then your plan should work well.

@mballance
Copy link
Member

Hi @msmftc,
I'm good with adding a 'randInt' method to RandState. That certainly makes sense.

I'm wondering if the string-based seed creation could be kept separate from PyVSC. PyVSC itself doesn't maintain a globally-unique instance path for all randomizable objects. That's somewhat dependent on the user's test environment, so the user will need to generate appropriate string paths anyway. I had a look at UVM seed management, and I could see using a similar approach (ie a singleton object that relates paths to seed values) in Python. Can you manage this in your environment -- at least to start?

Thanks,
Matthew

@msmftc
Copy link
Author

msmftc commented Oct 13, 2021

@mballance,

I'm not suggesting that PyVSC manage instance paths for randomizable objects. That is the user's responsibility. I'm recommending that RandState have a constructor or method that hashes an arbitrary integer with an arbitrary string to create a unique random seed. The rest of the work is up to the user (setting a global seed, tracking unique pathnames for each object).

Including the hashing in RandState saves users from needing to provide their own hash functions, which have potential to be implemented badly.

@mballance
Copy link
Member

@msmftc, I'm good with that approach: provide a utility method on RandState for constructing a RandState object from the combination of an integer and string seed.
Since it seems we're agreed on the specifics, I'll proceed to begin implementation.

mballance added a commit that referenced this issue Oct 29, 2021
- (#117) Add support for explicitly managing random stability

Signed-off-by: Matthew Ballance <matt.ballance@gmail.com>
@mballance
Copy link
Member

mballance commented Oct 29, 2021

Hi @msmftc,
The 0.6.3 release contains an implementation of explicit random-stability management. There is a brief discussion of the new features in the documentation: https://pyvsc.readthedocs.io/en/latest/methods.html#managing-random-stability. However, it's probably well-worth taking a look at the new tests for the feature as well. For example, here is an example showing creation of a rand-state object from an integer seed and string component:

def test_same_seed_str(self):
@vsc.randobj
class item_c(object):
def __init__(self):
self.a = vsc.rand_uint8_t()
self.b = vsc.rand_uint8_t()
@vsc.constraint
def ab_c(self):
self.a < self.b
ci = item_c()
v1 = []
v2 = []
print("Iteration 1")
rs1 = RandState.mkFromSeed(10, "abc")
ci.set_randstate(rs1)
for _ in range(10):
ci.randomize()
v1.append((ci.a,ci.b))
print("Iteration 2")
rs2 = RandState.mkFromSeed(10, "abc")
ci.set_randstate(rs2)
for _ in range(10):
ci.randomize()
v2.append((ci.a,ci.b))
# Check that two lists are not exactly equal
all_equal = True
for i in range(len(v1)):
print("[%d] v1=(%d,%d) v2=(%d,%d)" % (
i, v1[i][0], v1[i][1],
v2[i][0], v2[i][1]))
if v1[i][0] != v2[i][0] or v1[i][1] != v2[i][1]:
all_equal = False
# break
self.assertTrue(all_equal)

I'll leave this open for now until you're able to try things out.

-Matthew

@msmftc
Copy link
Author

msmftc commented Nov 19, 2021

@mballance,

I've begun testing the random stability feature. I wrote a simple example test showing that PyVSC provides random repeatability within a single run, but is completely unrepeatable across multiple runs. I've included the example below.

I looked at the PyVSC source code in rand_state.py and found the likely root cause. In method mkFromSeed() you are seeding a RandState object with seed + hash(strval). In Python 3.3 and later, the hash() of a given string changes from run to run. We need a hash function that doesn't change.

The following example demonstrates repeatability within a run, unrepeatability across multiple runs, and different hash() results across multiple runs:

import vsc

@vsc.randobj
class Selector():
    def __init__(self):
        self.number = vsc.rand_bit_t(8)
    

contextName = "test[n1]/process"
selector = Selector()
for i in range(2):
    randState = vsc.RandState.mkFromSeed(1, contextName)
    selector.set_randstate(randState)

    numStr = f"Selection {i}:"
    for j in range(10):
        selector.randomize()
        numStr += f"\t{int(selector.number)}"

    print(numStr)

print(f"hash({contextName}):  {hash(contextName)}")

Run 1:

Selection 0:    93      135     136     153     228     129     12      48      244     153
Selection 1:    93      135     136     153     228     129     12      48      244     153
hash(test[n1]/process):  -714692566165798307

Run 2:

Selection 0:    120     24      164     207     60      186     195     30      58      58
Selection 1:    120     24      164     207     60      186     195     30      58      58
hash(test[n1]/process):  4775314088417376212

An alternative method for getting an integer seed in mkFromSeed() could be to use a random.Random() instance, which can be seeded with any string. For example:

def mkFromSeed(cls, seed, strVal=None):
    if strval is not None:
        rng = random.Random()
        rng.seed(f"{seed} + {strVal}")
        seed = rng.randInt(0, 0xFFFFFFFF)
    return RandState(seed)

However, I think there is a better way to implement the RandState class. Rather than using your own XOR and shift operations to produce random integers, you could build RandState around an instance of random.Random() and take advantage of its pertinent methods: seed(), getstate(), setstate(), and randint(). That would take advantage of the Mersenne Twister RNG algorithm, which is well-tested. A possible downside to this approach is memory consumption - I don't know how much memory each random.Random() instance consumes, but it could be several kilobytes.

I could contribute code if you need help.

@mballance
Copy link
Member

Hi @msmftc, I had overlooked that hash(str) isn't deterministic across runs.
I'm certainly open to using Random() as the core of RandState. Early on with PyVSC, I faced some random-stability issues that I thought were connected to using multiple Random instances (which is why I moved to using only a single instance). However, those issues may have been due to use of other features.

If you're open to contributing code to use 'Random' as the core of RandState, I'll be happy to accept it.

@msmftc
Copy link
Author

msmftc commented Nov 29, 2021

I've submitted a pull request that implements RandState with Random() objects providing random number generation. I've tested it in my private code and it fixes the random stability bug.

@mballance
Copy link
Member

Thanks, @msmftc. I've merged your PR, and started the release process for 0.6.5. I'll close this issue for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants