when parsing a json array, [] object yielded at end #7

Closed
ilackarms opened this issue Sep 14, 2017 · 21 comments

@ilackarms

ilackarms commented Sep 14, 2017

I've noticed that when I parse an array, after each of the objects in the array has been yielded, an empty array [] is yielded at the end.

reproduce:

input:

{
    "kind": "PodList",
    "apiVersion": "v1",
    "metadata": {
        "selfLink": "/api/v1/pods",
        "resourceVersion": "1315"
    },
    "items": [
        {
            "metadata": {
                "name": "redis-master3",
                "namespace": "default",
                "selfLink": "/api/v1/pods/redis-master3?namespace=default",
                "uid": "1da148b4-cef5-11e4-ac24-3c970e4a436a",
                "resourceVersion": "1301",
                "creationTimestamp": "2015-03-20T13:34:48+02:00",
                "labels": {
                    "mylabel": "mylabelvalue",
                    "role": "pod"
                }
            },
            "spec": {
                "volumes": null,
                "containers": [
                    {
                        "name": "master",
                        "image": "dockerfile/redis",
                        "ports": [
                            {
                                "hostPort": 6379,
                                "containerPort": 6379,
                                "protocol": "TCP"
                            }
                        ],
                        "resources": {
                            "limits": {
                                "cpu": "100m"
                            }
                        },
                        "terminationMessagePath": "/dev/termination-log",
                        "imagePullPolicy": "IfNotPresent",
                        "securityContext": {
                            "capabilities": {}
                        }
                    },
                    {
                        "name": "php-redis",
                        "image": "kubernetes/example-guestbook-php-redis",
                        "ports": [
                            {
                                "hostPort": 8000,
                                "containerPort": 80,
                                "protocol": "TCP"
                            }
                        ],
                        "resources": {
                            "limits": {
                                "cpu": "100m",
                                "memory": "50000000"
                            }
                        },
                        "terminationMessagePath": "/dev/termination-log",
                        "imagePullPolicy": "IfNotPresent",
                        "securityContext": {
                            "capabilities": {}
                        }
                    }
                ],
                "restartPolicy": {
                    "always": {}
                },
                "dnsPolicy": "ClusterFirst"
            },
            "status": {
                "phase": "Pending"
            }
        }
    ]
}

parsing code:

streamer.get(key: 'items') do |object|
  p object
end

result:

{"metadata"=>{"name"=>"redis-master3", "namespace"=>"default", "selfLink"=>"/api/v1/pods/redis-master3?namespace=default", "uid"=>"1da148b4-cef5-11e4-ac24-3c970e4a436a", "resourceVersion"=>"1301", "creationTimestamp"=>"2015-03-20T13:34:48+02:00", "labels"=>{"mylabel"=>"mylabelvalue", "role"=>"pod"}}, "spec"=>{"volumes"=>nil, "containers"=>[{"name"=>"master", "image"=>"dockerfile/redis", "ports"=>[{"hostPort"=>6379, "containerPort"=>6379, "protocol"=>"TCP"}], "resources"=>{"limits"=>{"cpu"=>"100m"}}, "terminationMessagePath"=>"/dev/termination-log", "imagePullPolicy"=>"IfNotPresent", "securityContext"=>{"capabilities"=>{}}}, {"name"=>"php-redis", "image"=>"kubernetes/example-guestbook-php-redis", "ports"=>[{"hostPort"=>8000, "containerPort"=>80, "protocol"=>"TCP"}], "resources"=>{"limits"=>{"cpu"=>"100m", "memory"=>"50000000"}}, "terminationMessagePath"=>"/dev/termination-log", "imagePullPolicy"=>"IfNotPresent", "securityContext"=>{"capabilities"=>{}}}], "restartPolicy"=>{"always"=>{}}, "dnsPolicy"=>"ClusterFirst"}, "status"=>{"phase"=>"Pending"}}
[]
@thisismydesign
Owner

The issue is caused by a bug: values within an array were handled as if they had keys (namely the previous key in the JSON object), when they should not have keys at all.
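
To illustrate the distinction described above, here is a minimal sketch of an event-driven builder in plain Ruby. It is not json-streamer's actual code: the class name ToyBuilder and its method names are assumptions, loosely modeled on the callbacks of SAX-style JSON parsers. The point it shows is that a finished value is appended when its parent is an array and is stored under the pending key only when its parent is an object.

```ruby
# Toy event-driven builder (hypothetical, not json-streamer's implementation).
class ToyBuilder
  attr_reader :result

  def initialize
    @stack = []  # open containers, innermost last
    @keys = []   # pending key per open container (stays nil for arrays)
    @result = nil
  end

  # Remember the key for the next value of the innermost object.
  def key(name)
    @keys[-1] = name
  end

  def start_object
    push_container({})
  end

  def start_array
    push_container([])
  end

  def end_object
    pop_container
  end

  def end_array
    pop_container
  end

  def value(val)
    attach(val)
  end

  private

  def push_container(container)
    @stack.push(container)
    @keys.push(nil)
  end

  def pop_container
    finished = @stack.pop
    @keys.pop
    if @stack.empty?
      @result = finished
    else
      attach(finished)
    end
  end

  def attach(val)
    parent = @stack.last
    if parent.is_a?(Array)
      parent << val             # array elements never take a key
    else
      parent[@keys.last] = val  # object members use the pending key
      @keys[-1] = nil
    end
  end
end

# Events for {"name": "master", "ports": [{"hostPort": 6379}]}
builder = ToyBuilder.new
builder.start_object
builder.key("name")
builder.value("master")
builder.key("ports")
builder.start_array
builder.start_object
builder.key("hostPort")
builder.value(6379)
builder.end_object
builder.end_array
builder.end_object
p builder.result
```

A builder that reused the last seen key for array elements would mislabel or drop entries, which matches the kind of symptom reported in this thread.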

@thisismydesign
Owner

Issue is fixed in v1.1.1. Please verify and close this issue if you're satisfied.

@ilackarms
Author

now I'm only getting the first object in the array

@thisismydesign
Owner

What do you mean?

@thisismydesign
Owner

I see what you mean, hang on..

@ilackarms
Author

input:

{
    "kind": "ServiceList",
    "apiVersion": "v1",
    "metadata": {
        "selfLink": "/api/v1/services",
        "resourceVersion": "59"
    },
    "items": [
        {
            "metadata": {
                "name": "kubernetes",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes?namespace=default",
                "uid": "016e9dcd-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "6",
                "creationTimestamp": "2015-03-19T15:08:16+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 443,
                "protocol": "TCP",
                "selector": null,
                "clusterIP": "10.0.0.2",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        },
        {
            "metadata": {
                "name": "kubernetes-ro",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes-ro?namespace=default",
                "uid": "015b78bf-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "5",
                "creationTimestamp": "2015-03-19T15:08:15+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 80,
                "protocol": "TCP",
                "selector": null,
                "clusterIP": "10.0.0.1",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        }
    ]
}

output:

{"metadata"=>{"name"=>"kubernetes", "namespace"=>"default", "selfLink"=>"/api/v1/services/kubernetes?namespace=default", "uid"=>"016e9dcd-ce39-11e4-ac24-3c970e4a436a", "resourceVersion"=>"6", "creationTimestamp"=>"2015-03-19T15:08:16+02:00", "labels"=>{"component"=>"apiserver", "provider"=>"kubernetes"}}, "spec"=>{"port"=>443, "protocol"=>"TCP", "selector"=>nil, "clusterIP"=>"10.0.0.2", "containerPort"=>0, "sessionAffinity"=>"None"}, "status"=>{}}

only the first object is yielded

@ilackarms
Author

i am also experiencing another new bug: certain keys are rendered as nil

input:

{
    "kind": "PodList",
    "apiVersion": "v1",
    "metadata": {
        "selfLink": "/api/v1/pods",
        "resourceVersion": "1315"
    },
    "items": [
        {
            "metadata": {
                "name": "redis-master3",
                "namespace": "default",
                "selfLink": "/api/v1/pods/redis-master3?namespace=default",
                "uid": "1da148b4-cef5-11e4-ac24-3c970e4a436a",
                "resourceVersion": "1301",
                "creationTimestamp": "2015-03-20T13:34:48+02:00",
                "labels": {
                    "mylabel": "mylabelvalue",
                    "role": "pod"
                }
            },
            "spec": {
                "volumes": null,
                "containers": [
                    {
                        "name": "master",
                        "image": "dockerfile/redis",
                        "ports": [
                            {
                                "hostPort": 6379,
                                "containerPort": 6379,
                                "protocol": "TCP"
                            }
                        ],
                        "resources": {
                            "limits": {
                                "cpu": "100m"
                            }
                        },
                        "terminationMessagePath": "/dev/termination-log",
                        "imagePullPolicy": "IfNotPresent",
                        "securityContext": {
                            "capabilities": {}
                        }
                    },
                    {
                        "name": "php-redis",
                        "image": "kubernetes/example-guestbook-php-redis",
                        "ports": [
                            {
                                "hostPort": 8000,
                                "containerPort": 80,
                                "protocol": "TCP"
                            }
                        ],
                        "resources": {
                            "limits": {
                                "cpu": "100m",
                                "memory": "50000000"
                            }
                        },
                        "terminationMessagePath": "/dev/termination-log",
                        "imagePullPolicy": "IfNotPresent",
                        "securityContext": {
                            "capabilities": {}
                        }
                    }
                ],
                "restartPolicy": {
                    "always": {}
                },
                "dnsPolicy": "ClusterFirst"
            },
            "status": {
                "phase": "Pending"
            }
        }
    ]
}

output:

{"metadata"=>{"name"=>"redis-master3", "namespace"=>"default", "selfLink"=>"/api/v1/pods/redis-master3?namespace=default", "uid"=>"1da148b4-cef5-11e4-ac24-3c970e4a436a", "resourceVersion"=>"1301", "creationTimestamp"=>"2015-03-20T13:34:48+02:00", "labels"=>{"mylabel"=>"mylabelvalue", "role"=>"pod"}}, "spec"=>{"volumes"=>nil, "labels"=>[{"name"=>"master", "image"=>"dockerfile/redis", nil=>[{"hostPort"=>6379, "containerPort"=>6379, "protocol"=>"TCP"}], "resources"=>{"limits"=>{"cpu"=>"100m"}}, "terminationMessagePath"=>"/dev/termination-log", "imagePullPolicy"=>"IfNotPresent", "securityContext"=>{"capabilities"=>{}}}, {"name"=>"php-redis", "image"=>"kubernetes/example-guestbook-php-redis", "securityContext"=>{"capabilities"=>{}}, "resources"=>{"limits"=>{"cpu"=>"100m", "memory"=>"50000000"}}, "terminationMessagePath"=>"/dev/termination-log", "imagePullPolicy"=>"IfNotPresent"}], "restartPolicy"=>{"always"=>{}}, "dnsPolicy"=>"ClusterFirst"}, "status"=>{"phase"=>"Pending"}}

(notice the entry nil=>[{"hostPort"=>6379, "containerPort"=>6379, "protocol"=>"TCP"}], which was parsed from the "ports" key above)

@thisismydesign
Owner

It turns out the cause was identified correctly (values within an array were handled as if they had keys), but I made wrong assumptions about how to fix it. v1.1.2 should be fine; I also added more tests covering the handling of arrays. Please verify again.

@ilackarms
Author

Now I'm getting the whole object back as a single array. What I want to do is yield each object within the array one by one. Is this possible with json-streamer? I've tried playing with combinations of key: 'items' and nesting_level: X, but so far nothing has worked.

@thisismydesign
Owner

thisismydesign commented Sep 18, 2017

Since the items key points to an array, it will return an array; there's no way around that using the key matcher.

However..

Input: #7 (comment)
Parameters: {nesting_level: 2, yield_values: false}
Result:

{"metadata"=>{"name"=>"kubernetes", "namespace"=>"default", "selfLink"=>"/api/v1/services/kubernetes?namespace=default", "uid"=>"016e9dcd-ce39-11e4-ac24-3c970e4a436a", "resourceVersion"=>"6", "creationTimestamp"=>"2015-03-19T15:08:16+02:00", "labels"=>{"component"=>"apiserver", "provider"=>"kubernetes"}}, "spec"=>{"port"=>443, "protocol"=>"TCP", "selector"=>"null", "clusterIP"=>"10.0.0.2", "containerPort"=>0, "sessionAffinity"=>"None"}, "status"=>{}}

{"metadata"=>{"name"=>"kubernetes-ro", "namespace"=>"default", "selfLink"=>"/api/v1/services/kubernetes-ro?namespace=default", "uid"=>"015b78bf-ce39-11e4-ac24-3c970e4a436a", "resourceVersion"=>"5", "creationTimestamp"=>"2015-03-19T15:08:15+02:00", "labels"=>{"component"=>"apiserver", "provider"=>"kubernetes"}}, "spec"=>{"port"=>80, "protocol"=>"TCP", "selector"=>"null", "clusterIP"=>"10.0.0.1", "containerPort"=>0, "sessionAffinity"=>"None"}, "status"=>{}}

Is this what you're looking for?

Using v1.3.0 (latest).
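
For readers unsure what a nesting level of 2 selects, the following is a rough illustration in plain Ruby using only the standard json library. It is not how json-streamer works internally and it loads the whole document into memory; the helper name each_at_level is an assumption made for this sketch. It treats the root as level 0 and skips scalars, which approximates yield_values: false.

```ruby
require 'json'

# Yields every Hash or Array found exactly `target` levels below the root.
# The root counts as level 0. Scalars are skipped.
# Hypothetical helper for illustration only.
def each_at_level(node, target, level = 0, &block)
  return unless node.is_a?(Hash) || node.is_a?(Array)

  if level == target
    block.call(node)
    return
  end

  children = node.is_a?(Hash) ? node.values : node
  children.each { |child| each_at_level(child, target, level + 1, &block) }
end

# Trimmed-down version of the ServiceList input from this thread.
doc = JSON.parse(<<~JSON)
  {
    "kind": "ServiceList",
    "metadata": { "resourceVersion": "59" },
    "items": [
      { "metadata": { "name": "kubernetes" } },
      { "metadata": { "name": "kubernetes-ro" } }
    ]
  }
JSON

names = []
each_at_level(doc, 2) { |object| names << object["metadata"]["name"] }
p names # prints ["kubernetes", "kubernetes-ro"]
```

Note that a level-based selection does not care which key the enclosing array sits under, so a second array at the same depth would be yielded as well.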

@ilackarms
Author

@thisismydesign I'm looking for the same output, but with the parameters {key: 'items', nesting_level: 2}. However, I notice that everything with nesting_level 2 gets printed. I'd only like to access the contents of the array "items". Is that possible?

@ilackarms
Author

ilackarms commented Sep 18, 2017

e.g. with json body

{
    "kind": "ServiceList",
    "apiVersion": "v1",
    "metadata": {
        "selfLink": "/api/v1/services",
        "resourceVersion": "59"
    },
    "items1": [
        {
            "metadata": {
                "name": "kubernetes",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes?namespace=default",
                "uid": "016e9dcd-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "6",
                "creationTimestamp": "2015-03-19T15:08:16+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 443,
                "protocol": "TCP",
                "selector": "null",
                "clusterIP": "10.0.0.2",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        },
        {
            "metadata": {
                "name": "kubernetes-ro",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes-ro?namespace=default",
                "uid": "015b78bf-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "5",
                "creationTimestamp": "2015-03-19T15:08:15+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 80,
                "protocol": "TCP",
                "selector": "null",
                "clusterIP": "10.0.0.1",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        }
    ],
    "items2": [
        {
            "metadata": {
                "name": "kubernetes",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes?namespace=default",
                "uid": "016e9dcd-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "6",
                "creationTimestamp": "2015-03-19T15:08:16+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 443,
                "protocol": "TCP",
                "selector": "null",
                "clusterIP": "10.0.0.2",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        },
        {
            "metadata": {
                "name": "kubernetes-ro",
                "namespace": "default",
                "selfLink": "/api/v1/services/kubernetes-ro?namespace=default",
                "uid": "015b78bf-ce39-11e4-ac24-3c970e4a436a",
                "resourceVersion": "5",
                "creationTimestamp": "2015-03-19T15:08:15+02:00",
                "labels": {
                    "component": "apiserver",
                    "provider": "kubernetes"
                }
            },
            "spec": {
                "port": 80,
                "protocol": "TCP",
                "selector": "null",
                "clusterIP": "10.0.0.1",
                "containerPort": 0,
                "sessionAffinity": "None"
            },
            "status": {}
        }
    ]
}

i'd like to yield the contents of items1, item-by-item

@thisismydesign
Owner

Not possible, unfortunately. The closest you can get is what you already did: get the items1 array and iterate over it. Is that acceptable for your use case?

I think this would be possible with JSONPath though. As I mentioned in your PR, I'll think about supporting it, but I'm not sure about the effort yet.

@ilackarms
Author

that would defeat the purpose of using json-streamer; what we want to do is yield items one-by-one from a very large array rather than having to load the whole thing into memory

@thisismydesign
Owner

thisismydesign commented Sep 18, 2017

Just wanted to point out that, depending on how the data is distributed across the separate itemN keys, it may still be an improvement.

In any case: how important is this for you? Would you consider implementing your own solution or is it rather just nice to have?

@thisismydesign
Owner

Actually.. since the aggregator is exposed you can technically do this:

nesting_level = 2
key = 'items1'
streamer.get(nesting_level: nesting_level, yield_values: false) do |object|
  if streamer.aggregator[nesting_level-2]&.dig(:key) == key
    p "ensured that #{object} is within #{key}"
  end
end

Probably easier than reinventing the wheel, at least for now.

I plan to keep aggregator exposed for exactly such cases.

@thisismydesign
Owner

I was planning to do some refactoring and now was a great time to do it as it solves your issue.

In v2.0.0 I created abstraction layers for the callback handler and conditions. This allows

  • substituting json-stream with any other parser that provides SAX-like events
  • custom conditions that can handle virtually any scenario

For your use case:

conditions = Json::Streamer::Conditions.new
conditions.yield_object = lambda do |aggregator:, object:|
  aggregator.level.eql?(2) && aggregator.key_for_level(1).eql?('items1')
end

streamer.get_with_conditions(conditions) do |object|
  p object
end

See also the new section of the README and this test case.
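
If several arrays need this treatment, the condition can be produced by a small helper. The sketch below is an assumption-laden illustration: items_under is a hypothetical helper, and StubAggregator is a stand-in that mimics only the two aggregator methods used in the snippet above (level and key_for_level), so the lambda can be exercised without the gem.

```ruby
# Hypothetical stand-in for the aggregator; it only mimics `level` and
# `key_for_level` as they are used in this thread.
StubAggregator = Struct.new(:level, :keys_by_level) do
  def key_for_level(depth)
    keys_by_level[depth]
  end
end

# Builds a condition lambda that accepts objects sitting directly inside the
# array stored under `key`, where that array lives at `array_level`.
def items_under(key, array_level: 1)
  lambda do |aggregator:, object:|
    aggregator.level.eql?(array_level + 1) &&
      aggregator.key_for_level(array_level).eql?(key)
  end
end

condition = items_under('items1')

inside_items1 = StubAggregator.new(2, { 1 => 'items1' })
inside_items2 = StubAggregator.new(2, { 1 => 'items2' })

p condition.call(aggregator: inside_items1, object: {}) # prints true
p condition.call(aggregator: inside_items2, object: {}) # prints false
```

With the real library, the returned lambda would presumably be assigned to conditions.yield_object in the same way as the inline lambda above.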

@ilackarms
Author

cool! will try it out now

@thisismydesign
Owner

@ilackarms any update?

@ilackarms
Author

perfect! works well. thank you so much for the support

@thisismydesign
Owner

Glad to hear that it's working. My pleasure. :)
