Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent dimensions for n-dimensional arrays. #54

Closed
AndyPohlNZ opened this issue Jul 27, 2022 · 8 comments
Closed

Inconsistent dimensions for n-dimensional arrays. #54

AndyPohlNZ opened this issue Jul 27, 2022 · 8 comments

Comments

@AndyPohlNZ
Copy link

Hi Rob,

Thanks for putting work into getting stan and julia playing nice together and for the fix to the above issue. Sorry for raising this but the fix implemented does not seem to generalize to multidimensional arrays. I am using the master branch of StanSample: StanSample v6.9.2 https://github.com/StanJulia/StanSample.jl#master . See the modified example below:

using StanSample
ProjDir = @__DIR__
n1 = 50
n2 = 2
n3 = 27
x = fill(0, n1, n2, n3)

stan_data = Dict("x" => x, "n1" => n1, "n2" => n2, "n3" => n3)
println(size(stan_data["x"]))

mdl = "
data { 
    int n1;
    int<lower=1> n2;
    int<lower=1> n3;
    array[n1, n2, n3]real x;            
}

parameters {
   real mu;
} 

model {
    mu ~ normal(0, 1);
}
"

tmpdir = joinpath(ProjDir, "tmp")
isdir(tmpdir) && rm(tmpdir; recursive=true)
stan_model = SampleModel("temp_model", mdl, tmpdir)
stan_sample(
    stan_model;
    data=stan_data,
    seed=123,
    num_chains=4,
    num_samples=1000,
    num_warmups=1000,
    save_warmup=false
)

Returns the following error:
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=x; position=0; dims declared=(50,2,27); dims found=(27,2,50)

Originally posted by @AndyPohlNZ in #51 (comment)

@goedman
Copy link
Collaborator

goedman commented Jul 27, 2022

Hi Andy, Thanks for raising this issue. I see it as well. Back to the drawing board...

@goedman
Copy link
Collaborator

goedman commented Jul 28, 2022

Hi Andy ( @AndyPohlNZ ), the issue was caused by the update_json_files() function in StanBase.jl. I just pushed StanBase.jl v4.7.3 which I think should fix the problem.

I have not seen a model to check the results of the fix. Also, I'm a bit concerned this is another point solution, not something that deals with Array{T, N} in general.

@AndyPohlNZ
Copy link
Author

Hi Rob,
Just updated to StanBase.jl v4.7.3 which does seem to work for the example above although as you suspected the fix doesn't hold in general....

See this slightly more complex example which I hope is a useful test case for testing dimensions of both input and output:

ProjDir = @__DIR__
using StanSample
using DataFrames
using Random, Distributions


# Dimensions
n1 = 1;
n2 = 2;
n3 = 3;
# Number of observations
N = 500;

# True values
σ = 0.01;
μ₁ = [1.0, 2.0, 3.0];
μ₂ = [10, 20, 30];

μ = Array{Float32}(undef, n1, n2, n3);
μ[1, 1, :] = μ₁;
μ[1, 2, :] = μ₂;

# Observations
y = Array{Float32}(undef, N, n1, n2, n3);
for i in 1:N
    for j in 1:n1
        for k in 1:n2
            for l in 1:n3
                y[i, j, k, l] = rand(Normal(μ[j, k, l], σ))
            end
        end
    end
end


mdl = "
data {
    int<lower=1> N;
    int<lower=1> n1;
    int<lower=1> n2;
    int<lower=1> n3;
    array[N, n1, n2] vector[n3] y;
}

parameters {
    array[n1, n2] vector[n3] mu;
    real<lower=0> sigma;
}
model {
    // Priors
    sigma ~ inv_gamma(0.01, 0.01);
    for (i in 1:n1) {
        for (j in 1:n2) {
            mu[i, j] ~ normal(rep_vector(0, n3), 1e6);
        }
    }

    // Model
    for (i in 1:N){
        for(j in 1:n1){
            for(k in 1:n2){
                y[i, j, k] ~ normal(mu[j, k], sigma);
            }
        }
    }
}
"

stan_data = Dict(
    "y" => y,
    "N" => N,
    "n1" => n1,
    "n2" => n2,
    "n3" => n3,
);

println(size(stan_data["y"]))

tmpdir = joinpath(ProjDir, "tmp")
isdir(tmpdir) && rm(tmpdir; recursive=true)
stan_model = SampleModel("multidimensional_inference", mdl, tmpdir)
stan_sample(
    stan_model;
    data=stan_data,
    seed=123,
    num_chains=4,
    num_samples=1000,
    num_warmups=1000,
    save_warmup=false
)

samps = DataFrame(read_samples(stan_model))
println(describe(samps))

Error suggests dimensions are still inconsistent:
Exception: mismatch in dimension declared and found in context; processing stage=data initialization; variable name=y; position=0; dims declared=(500,1,2,3); dims found=(3,2,1,500)

@goedman
Copy link
Collaborator

goedman commented Jul 29, 2022

Hi Andy, just in the process of pushing StanBase.jl v4.7.4. This afternoon, driving up to the mountains here in Colorado, I realized a better approach is to simply permutedims() the input matrix if we go from Julia to Stan. The new version checks if the input is a subtype of Array and then I do permute dims(input, length(size(input)):-1:1). Basically reverse the axis.

I wrote a simple test program that I think demonstrates it works for 2, 3 and 4 (and up) dimensions:

using StanSample, Statistics, Test

ProjDir = @__DIR__
n1 = 2
n2 = 3
n3 = 4
n4 = 4

stan0_2 = "
data { 
    int n1;
    int<lower=1> n2;
    array[n1, n2] real x;            
}

generated quantities {
    array[n1] real mu;
    for (i in 1:n1)
        mu[i] =  x[i, 1] + x[i, 2] +x[i, 3];
}
";

x = Array(reshape(1:n1*n2, n1, n2))
data = Dict("x" => x, "n1" => n1, "n2" => n2)

tmpdir = joinpath(ProjDir, "tmp")
m0_2s = SampleModel("m0_2s", stan0_2, tmpdir)
rc0_2s = stan_sample(m0_2s; data)

if success(rc0_2s)
    post0_2s = read_samples(m0_2s, :dataframe)
    sums_stan_2 = Int.(mean(Array(post0_2s); dims=1))[1, :]
    sums_julia_2 = [sum(x[i, :]) for i in 1:n1]
    @test sums_stan_2 == sums_julia_2
end

stan0_3 = "
data { 
    int n1;
    int<lower=1> n2;
    int<lower=1> n3;
    array[n1, n2, n3] real x;            
}

generated quantities {
    array[n1, n2] real mu;
    for (i in 1:n1)
        for (j in 1:n2)
            mu[i, j] =  x[i, j, 1] + x[i, j, 2] +x[i, j, 3] + x[i, j, 4];
}
";

x = Array(reshape(1:n1*n2*n3, n1, n2, n3))
data = Dict("x" => x, "n1" => n1, "n2" => n2, "n3" => n3)

tmpdir = joinpath(ProjDir, "tmp")
m0_3s = SampleModel("m0_3s", stan0_3, tmpdir)
rc0_3s = stan_sample(m0_3s; data)

if success(rc0_3s)
    post0_3s = read_samples(m0_3s, :dataframe)
    sums_stan_3 = Int.(mean(Array(post0_1s); dims=1))[1, :]
    sums_julia_3 = [sum(x[i, j, :]) for j in 1:n2 for i in 1:n1]
    @test sums_stan_3 == sums_julia_3
end

stan0_4 = "
data { 
    int n1;
    int<lower=1> n2;
    int<lower=1> n3;
    int<lower=1> n4;
    array[n1, n2, n3, n4] real x;            
}

generated quantities {
    array[n1, n2, n3] real mu;
    for (i in 1:n1)
        for (j in 1:n2)
            for (k in 1:n3)
                mu[i, j, k] =  x[i,j,k,1] + x[i,j,k,2] + x[i,j,k,3] + x[i,j,k,4];
}
";

x = Array(reshape(1:n1*n2*n3*n4, n1, n2, n3, n4))
data = Dict("x" => x, "n1" => n1, "n2" => n2, "n3" => n3, "n4" => n4)

m0_4s = SampleModel("m0_4s", stan0_4, tmpdir)
rc0_4s = stan_sample(m0_4s; data)

if success(rc0_4s)
    post0_4s = read_samples(m0_4s, :dataframe)
    sums_stan_4 = Int.(mean(Array(post0_4s); dims=1))[1, :]
    sums_julia_4 = [sum(x[i, j, k, :]) for k in 1:n3 for j in 1:n2 for i in 1:n1]
    @test sums_stan_4 == sums_julia_4
end

which I'm adding to the StanSample test set.

I'll try your test script above but I think to Stan array[N, n1, n2] vector[n3] y; is identical to array[N, n1, n2, n3] y;

@goedman
Copy link
Collaborator

goedman commented Jul 29, 2022

I'm getting:

7×7 DataFrame
 Row │ variable  mean        min          median      max         nmissing  eltype   
     │ Symbol    Float64     Float64      Float64     Float64     Int64     DataType 
─────┼───────────────────────────────────────────────────────────────────────────────
   1 │ mu.1.1.1   1.00011     0.998632     1.00012     1.00163           0  Float64
   2 │ mu.1.2.1   9.99936     9.99774      9.99937    10.0009            0  Float64
   3 │ mu.1.1.2   1.9995      1.9977       1.99949     2.00175           0  Float64
   4 │ mu.1.2.2  19.9993     19.9975      19.9993     20.0008            0  Float64
   5 │ mu.1.1.3   3.00083     2.99928      3.00084     3.00253           0  Float64
   6 │ mu.1.2.3  29.9998     29.9984      29.9998     30.0016            0  Float64
   7 │ sigma      0.0100989   0.00968489   0.0100999   0.0105766         0  Float64

which I think is correct.

@AndyPohlNZ
Copy link
Author

I can confirm that things seem to be working correctly with StanBase v.4.7.4. Thanks so much for your quick turnaround with this issue! Enjoy the mountains :).

@goedman
Copy link
Collaborator

goedman commented Jul 29, 2022

Hi Andy, is it ok if I add your script as a test case? It's a great example! Thanks for you help, it has bothered me a long time that the JSON stuff was never put to the test.

@AndyPohlNZ
Copy link
Author

Hi Rob. Absolutely feel free to add the example as a test case.

goedman added a commit that referenced this issue Dec 7, 2022
c7a425a Allow JSON data as a string (#60)
50fe2ec Update README examples section
e380e82 Reword docstrings
fb513e3 Reorganize and update documentation (#57)
96805fd Compile methods: split stanc args from make args, add default include path (#58)
f62bf46 Fix broken link
676db6b Bump actions/setup-python from 2 to 4 (#55)
34f10dd Remove CmdStan Dependency (#51)
821883f Prefix any C-exposed symbols with `bs_` (#54)
81129b0 Add the option for autodiff Hessian calculations (#52)

git-subtree-dir: deps/data/bridgestan
git-subtree-split: c7a425aac54120bafa643b722ed24b2a32111782
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants