[BUG] ReadNextEvents method gets stuck #3450

Open
tender-barbarian opened this issue May 23, 2024 · 10 comments

@tender-barbarian

Describe the bug

ReadNextEvents method of EventHistoryCollector managed object gets stuck.

It looks like this (simplification):

  • Last read event ID is 1000
  • Next event in line which can be read is 2001 - meaning there is a gap of 1001 events
  • ReadNextEvents does not return anything as max page size is 1000

Why the gap? I assume it is because the API user does not have access to all inventory objects, so events coming from inventory objects the user cannot read are not returned.
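The suspected mechanism can be modelled in a few lines of Go. This is only a sketch of the assumption above, not govmomi or vCenter code: the `event` and `collector` types and the `readNext`, `build` and `drain` helpers are hypothetical, and the rule that the cursor advances only past events that were actually returned is a guess about server behaviour.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a vCenter event.
type event struct {
	Key     int
	Visible bool // whether the calling user may read it
}

// collector models the ASSUMED server behaviour described in this issue.
// It is not govmomi or vCenter code.
type collector struct {
	events []event
	pos    int // index of the next event to scan
}

// readNext scans at most maxCount events starting at the cursor, returns
// only the visible ones, and advances the cursor just past the last event
// it returned. If the whole window is hidden, the cursor does not move.
func (c *collector) readNext(maxCount int) []event {
	end := c.pos + maxCount
	if end > len(c.events) {
		end = len(c.events)
	}
	var page []event
	next := c.pos
	for i := c.pos; i < end; i++ {
		if c.events[i].Visible {
			page = append(page, c.events[i])
			next = i + 1
		}
	}
	c.pos = next
	return page
}

// build returns `before` visible events, then `hidden` hidden ones, then
// `after` visible ones, with keys starting at 1.
func build(before, hidden, after int) []event {
	total := before + hidden + after
	events := make([]event, 0, total)
	for i := 0; i < total; i++ {
		visible := i < before || i >= before+hidden
		events = append(events, event{Key: i + 1, Visible: visible})
	}
	return events
}

// drain reads pages until one comes back empty and returns how many
// events were read in total.
func drain(c *collector, maxCount int) int {
	total := 0
	for {
		page := c.readNext(maxCount)
		if len(page) == 0 {
			return total
		}
		total += len(page)
	}
}

func main() {
	for _, gap := range []int{500, 1000} {
		c := &collector{events: build(1000, gap, 100)}
		fmt.Printf("gap %d: read %d of 1100 visible events\n", gap, drain(c, 1000))
	}
}
```

Under this model, a gap of 500 hidden events produces one short page and reading continues to the end. A gap of 1000 hidden events makes every later read come back empty, so the last 100 visible events are never reached.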

To Reproduce

This might be a bit hard to reproduce: I'm not sure it can be simulated using vcsim, since there is no user authorisation there.

Steps to reproduce the behavior:

  1. Create a user without access to the full inventory
  2. Log in to the API
  3. Get a collector via CreateCollectorForEvents
  4. Attempt to read events via the ReadNextEvents method, with a filter that starts far enough in the past that there are enough events to pull
  5. Observe the behaviour: ReadNextEvents returns a varying number of events. For example, given a query size of 100 events, it returns fewer, or gets completely stuck if the gap in events is bigger than the provided query size.

Expected behavior

The collector should only load events for which the given user has read rights. Currently it looks like it loads everything and then determines which events can be returned based on the user's access rights.
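A minimal sketch of that expectation, again as a hypothetical model and not the actual server logic: here the access check is applied while the page is being filled, so hidden events never count against `maxCount`. The `event` and `filteredCollector` types and the `readNext` method are illustrative names only.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a vCenter event.
type event struct {
	Key     int
	Visible bool // whether the calling user may read it
}

// filteredCollector sketches the EXPECTED behaviour: hidden events are
// skipped while the page is filled, so only visible events count against
// the page size. This is an illustration, not vCenter code.
type filteredCollector struct {
	events []event
	pos    int // index of the next event to scan
}

// readNext keeps scanning until it has maxCount visible events or runs
// out of events, so a run of hidden events can never produce an empty
// page while visible events remain.
func (c *filteredCollector) readNext(maxCount int) []event {
	var page []event
	for c.pos < len(c.events) && len(page) < maxCount {
		if c.events[c.pos].Visible {
			page = append(page, c.events[c.pos])
		}
		c.pos++
	}
	return page
}

func main() {
	// 1000 visible events, then 1000 hidden ones, then 100 visible ones.
	var events []event
	for i := 0; i < 2100; i++ {
		events = append(events, event{Key: i + 1, Visible: i < 1000 || i >= 2000})
	}

	c := &filteredCollector{events: events}
	total := 0
	for {
		page := c.readNext(100)
		if len(page) == 0 {
			break
		}
		total += len(page)
	}
	fmt.Printf("read %d visible events\n", total)
}
```

With filtering done before the page is cut, the 1000 hidden events in the middle are skipped in a single call and all 1100 visible events are returned.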

Affected version

Govmomi latest master
VMware vCenter - all versions as far as I can tell

Additional context

Yes, I know there is a simple solution to this: always use a user that has full inventory read rights. But for obvious reasons this is not ideal, or even possible, in some environments.

For example, in my case, we share vCenter with another entity, which does not want us accessing their events.

@paveljanda

👍

@tgeek77

tgeek77 commented May 23, 2024

👍

@TomasFlam

👍

@dougm
Member

dougm commented Jun 6, 2024

Hi folks, I'm not able to reproduce this using:

% govc about | grep FullName
FullName:     VMware vCenter Server 8.0.2 build-23319993

And with the example here: https://github.com/vmware/govmomi/tree/main/examples/events
Created a user with limited permissions like so:

#!/bin/bash -e

pass=$(govc env GOVC_PASSWORD) # using same password as Administrator

create() {
  id="$1"

  # create a user
  if ! govc sso.user.id "$id" 2>/dev/null ; then
    govc sso.user.create -p "$pass" "$id"
  fi

  # create a role with limited permissions
  if ! govc role.ls "$id" 2>/dev/null ; then
    govc role.create "$id" $(govc role.ls Admin | grep VirtualMachine)
  fi

  # create a vm folder (relative to $GOVC_DATACENTER)
  folder="vm/$id"
  if ! govc object.collect "$folder" 2>/dev/null ; then
    govc folder.create "$folder"
  fi

  # grant user limited permissions for the folder
  govc permissions.set -principal "$id@vsphere.local" -role "$id" "$folder"
}

create limited

With a few dozen VMs in $folder that the limited user can read, and a few VMs outside of that, I used govc vm.power on/off to generate events. Run the example as Administrator:

% export GOVMOMI_URL="Administrator@vsphere.local:$password@$vcenter_ip" GOVMOMI_INSECURE=true
% go run main.go -b 48h | wc -l
3097

Then as limited user:

% export GOVMOMI_URL="limited@vsphere.local:$password@$vcenter_ip" GOVMOMI_INSECURE=true
% go run main.go -b 48h | wc -l
2533

No hanging, but I do see a few of these (expected):

... [EventEx] The user does not have permission to view the entity associated with this event

Are you able to reproduce the issue with the same example or other self-contained program you can share?
Please also share your build number from govc about.

@tender-barbarian
Author

tender-barbarian commented Jun 6, 2024

@dougm my guess is that it doesn't get stuck because there are not enough entities producing logs to cause a big enough gap between events for the collector to get stuck. But I will try your example 👍

Our version: VMware vCenter Server 7.0.3 build-20990077

@tender-barbarian
Author

tender-barbarian commented Jun 7, 2024

@dougm ok, so I tried your script. Then I moved a single VM to the limited folder and powered it on/off several times to generate events.

Results

Admin user

% export GOVMOMI_URL="$admin_username@$domain:$password@$vcenter_ip" GOVMOMI_INSECURE=true
% go run main.go -b 48h | wc -l                                                                
    1090

Limited user

% export GOVMOMI_URL="limited@$domain:$password@$vcenter_ip" GOVMOMI_INSECURE=true 
% go run main.go -b 48h | wc -l                                                               
       0

Limited user again with shorter time

% export GOVMOMI_URL="limited@$domain:$password@$vcenter_ip" GOVMOMI_INSECURE=true 
% go run main.go -b 1h | wc -l                                                               
      12

And just to confirm: those 12 events are power on/offs from the VM in the limited folder.

Conclusion

So as you can see, with a 48h timeframe the collector does not return anything. I assume it's because it cannot skip more than 1000 events, but there were 1088 events to go through to get to the events allowed for the limited user. Once the gap was shortened by using a smaller timeframe, it started to work again.

All of this is just my assumption... I might be mistaken (I hope). So please tell me if I'm doing anything wrong.

I tried all of this on an older vCenter in our test environment (VMware vCenter Server 6.7.0 build-15976728), as I didn't have user creation rights in our prod vCenter. But I previously observed the same behaviour on a newer vCenter.

Try changing the maxCount param in collector.ReadNextEvents(ctx, 100), for example to 10, and see if the collector gets stuck.

@tender-barbarian
Author

@dougm any news? Did you manage to reproduce?

@dougm
Member

dougm commented Jul 11, 2024

I haven't yet. Not sure this will make a difference, but the next couple of things I'd look at are in the patch below, if you want to try them.

diff --git a/examples/events/main.go b/examples/events/main.go
index 9a8bf1a2..8d0781f1 100644
--- a/examples/events/main.go
+++ b/examples/events/main.go
@@ -25,8 +25,11 @@ import (
 
 	"github.com/vmware/govmomi/event"
 	"github.com/vmware/govmomi/examples"
+	"github.com/vmware/govmomi/find"
+	"github.com/vmware/govmomi/property"
 	"github.com/vmware/govmomi/vim25"
 	"github.com/vmware/govmomi/vim25/methods"
+	"github.com/vmware/govmomi/vim25/mo"
 	"github.com/vmware/govmomi/vim25/types"
 )
 
@@ -41,7 +44,10 @@ func main() {
 	examples.Run(func(ctx context.Context, c *vim25.Client) error {
 		m := event.NewManager(c)
 
-		ref := c.ServiceContent.RootFolder
+		folder, err := find.NewFinder(c).Folder(ctx, "/DC0/vm/limited") // just 1 folder, rather than all
+		if err != nil {
+			return err
+		}
 
 		now, err := methods.GetCurrentTime(ctx, c) // vCenter server time (UTC)
 		if err != nil {
@@ -49,14 +55,17 @@ func main() {
 		}
 
 		filter := types.EventFilterSpec{
-			EventTypeId: flag.Args(), // e.g. VmEvent
+			EventTypeId: []string{"VmReconfiguredEvent", "VmPoweredOnEvent", "VmPoweredOffEvent"}, // specific event types
 			Entity: &types.EventFilterSpecByEntity{
-				Entity:    ref,
+				Entity:    folder.Reference(),
 				Recursion: types.EventFilterSpecRecursionOptionAll,
 			},
 			Time: &types.EventFilterSpecByTime{
 				BeginTime: types.NewTime(now.Add(*begin * -1)),
 			},
+			UserName: &types.EventFilterSpecByUsername{
+				UserList: []string{"VSPHERE.LOCAL\\Administrator"}, // specific user(s)
+			},
 		}
 		if *end != 0 {
 			filter.Time.EndTime = types.NewTime(now.Add(*end * -1))
@@ -69,6 +78,7 @@ func main() {
 
 		defer collector.Destroy(ctx)
 
+		total := 0
 		for {
 			events, err := collector.ReadNextEvents(ctx, 100)
 			if err != nil {
@@ -88,8 +98,19 @@ func main() {
 				kind := reflect.TypeOf(events[i]).Elem().Name()
 				fmt.Printf("%d [%s] [%s] %s\n", event.Key, event.CreatedTime.Format(time.ANSIC), kind, event.FullFormattedMessage)
 			}
+			total += len(events)
 		}
 
+		// err = collector.SetPageSize(ctx, 100) // to change size of latestPage
+		var hc mo.EventHistoryCollector
+		pc := property.DefaultCollector(c)
+		err = pc.RetrieveOne(ctx, collector.Reference(), []string{"latestPage"}, &hc) // does latestPage have different set of (most recent) events?
+		if err != nil {
+			return err
+		}
+
+		fmt.Printf("read %d events, latestPage is %d events\n", total, len(hc.LatestPage))
+
 		return nil
 	})
 }

Contributor

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Mark as fresh by adding the comment /remove-lifecycle stale.
