How to troubleshoot a bad model fit? #51

Open
Javier-Acuna opened this issue Jul 12, 2021 · 0 comments

Hello,

I tried to use your library with some measurement data, but the resulting model is not a good fit.

The x and y values are already normalized. I have tried different train/test partitions and raised max_iter to 50,000, but beyond that I don't know what else to try. Is the number of data points too small? Can ffx not handle noisy (non-deterministic) data? Any help would be very much appreciated.

Here is some sample code:

import ffx
import numpy as np
import matplotlib.pyplot as plt

# Data to model, two measurements of 'y' for each 'x' value
x = np.array([[-1.392], [-0.985], [-0.308], [0.293], [1.046], [1.347], [-1.392], [-0.985], [-0.308], [ 0.293], [ 1.046], [ 1.347]])
y = np.array([[-1.691], [-0.925], [ 0.109], [0.768], [0.826], [0.829], [-1.673], [-1.049], [ 0.123], [ 0.833], [ 0.947], [ 0.903]])

# Plot y vs x
fig, ax = plt.subplots(1)
ax.scatter(x, y, facecolor='b', marker='o')

# Split into train and test sets; two options:
use_alternating_split = False
if use_alternating_split:  # every other sample goes to the train set
    x_train = x[1:12:2].reshape( (6,1) )
    y_train = y[1:12:2].reshape( (6,1) )
    
    x_test = x[0:12:2].reshape( (6,1) )
    y_test = y[0:12:2].reshape( (6,1) )
else:  # first measurement of each 'x' value goes to the train set
    x_train = x[0:6].reshape( (6,1) )
    y_train = y[0:6].reshape( (6,1) )
    
    x_test = x[6:12].reshape( (6,1) )
    y_test = y[6:12].reshape( (6,1) )

# Plot train/test sets
fig, ax = plt.subplots(1)
ax.scatter(x_train, y_train, facecolor='b', marker='o', label='train')
ax.scatter(x_test,  y_test,  facecolor='b', marker='x', label='test')
ax.legend()

# max_iter was changed to 50000 in model_factories.py
models = ffx.run(x_train, y_train, x_test, y_test, varnames=['x'])

for model in models:
    yhat = model.simulate(x_test)
    print(model)

    fig, ax = plt.subplots(1)
    ax.scatter(x, y, facecolor='b', marker='o', label='measurement')
    ax.scatter(x_test, yhat, facecolor='r', marker='x', label='model')
    ax.legend()

plt.show()  # display all figures when run as a script

The models I obtain with the first partition are:
0.227
0.187 + 0.179*x
[Figure: Figure_Github_ffx_Partition_1]

and with the second partition are:
-0.0140
0.0116 / (1.0 - 0.150*abs(x))
[Figure: Figure_Github_ffx_Partition_2]
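
To put a number on how far off these models are, the four reported expressions can be evaluated against all twelve measurements and compared with an ordinary least-squares quadratic from NumPy. This is only a sanity-check sketch: the `rmse` helper and the quadratic baseline are my own additions and are not part of ffx, and the baseline is fitted on every point, so it says nothing about generalization.

```python
import numpy as np

# Same measurements as in the script above, flattened to 1-D
x = np.array([-1.392, -0.985, -0.308, 0.293, 1.046, 1.347] * 2)
y = np.array([-1.691, -0.925, 0.109, 0.768, 0.826, 0.829,
              -1.673, -1.049, 0.123, 0.833, 0.947, 0.903])


def rmse(y_true, y_pred):
    """Root-mean-square error between two 1-D arrays (helper, not from ffx)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# The model expressions printed by ffx, transcribed by hand
reported = {
    "0.227": lambda v: np.full_like(v, 0.227),
    "0.187 + 0.179*x": lambda v: 0.187 + 0.179 * v,
    "-0.0140": lambda v: np.full_like(v, -0.0140),
    "0.0116 / (1.0 - 0.150*abs(x))": lambda v: 0.0116 / (1.0 - 0.150 * np.abs(v)),
}
errors = {name: rmse(y, f(x)) for name, f in reported.items()}

# Plain least-squares quadratic as a reference point (assumption: degree 2)
coeffs = np.polyfit(x, y, deg=2)
baseline_error = rmse(y, np.polyval(coeffs, x))

for name, err in errors.items():
    print(f"{name:32s} RMSE = {err:.3f}")
print(f"{'quadratic baseline (np.polyfit)':32s} RMSE = {baseline_error:.3f}")
```

If the baseline error is much smaller than the errors of the ffx models, the data itself is probably easy enough to fit, and the problem is more likely in how the fit is set up.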

Do you have any ideas about what I should try? Any help would be much appreciated.
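
One thing that may be worth ruling out is the shape of the target arrays. This is an assumption, not a confirmed cause: the script passes y as a column of shape (6, 1), and if any step subtracts such a column from a 1-D prediction, NumPy broadcasting produces a 6x6 matrix instead of six residuals. The sketch below only demonstrates that NumPy behaviour; `y_hat` is a hypothetical stand-in for a prediction vector, and nothing here calls ffx.

```python
import numpy as np

# Train targets shaped as in the script above: a column of shape (6, 1)
y_col = np.array([[-1.691], [-0.925], [0.109], [0.768], [0.826], [0.829]])

# Hypothetical 1-D prediction vector (stand-in, not produced by ffx)
y_hat = np.zeros(6)

residual_2d = y_hat - y_col          # (6,) minus (6, 1) broadcasts to (6, 6)
residual_1d = y_hat - y_col.ravel()  # (6,) minus (6,) stays (6,)

print(residual_2d.shape)  # (6, 6)
print(residual_1d.shape)  # (6,)
```

If this turns out to be relevant, passing `y_train.ravel()` and `y_test.ravel()` to `ffx.run` would be the first variation to try.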
